All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount
@ 2021-04-20 11:08 Gao Xiang
  2021-04-20 11:08 ` [PATCH v2 2/2] xfs: turn on lazysbcount unconditionally Gao Xiang
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Gao Xiang @ 2021-04-20 11:08 UTC (permalink / raw)
  To: linux-xfs
  Cc: Darrick J. Wong, Dave Chinner, Gao Xiang, Zorro Lang, Carlos Maiolino

There are many paths which could trigger xfs_log_sb(), e.g.
  xfs_bmap_add_attrfork()
    -> xfs_log_sb()
, which overrides on-disk fdblocks by in-core per-CPU fdblocks.

However, for !lazysbcount cases, on-disk fdblocks is actually updated
by xfs_trans_apply_sb_deltas(), and generally it isn't equal to
in-core per-CPU fdblocks due to xfs_reserve_blocks() or whatever,
see the comment in xfs_unmountfs().

It could be observed by the following steps reported by Zorro:

1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev
2. mount $dev $mnt
3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load)
4. umount $mnt
5. xfs_repair -n $dev

yet due to commit f46e5a174655 ("xfs: fold sbcount quiesce logging
into log covering"), xfs_sync_sb() will also be triggered if log
covering is needed and !lazysbcount when xfs_unmountfs(), so hard
to reproduce on kernel 5.12+ for clean unmount.

on-disk sb_icount and sb_ifree are also updated in
xfs_trans_apply_sb_deltas() for !lazysbcount cases, however, which
are always equal to per-CPU counters, so only fdblocks matters.

After this patch, I've seen no strange so far on older kernels
for the testcase above without lazysbcount.

Reported-by: Zorro Lang <zlang@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
---
changes since v1:
 - update commit message.

 fs/xfs/libxfs/xfs_sb.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 60e6d255e5e2..423dada3f64c 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -928,7 +928,13 @@ xfs_log_sb(
 
 	mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
 	mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
-	mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
+	if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) {
+		struct xfs_dsb	*dsb = bp->b_addr;
+
+		mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks);
+	} else {
+		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
+	}
 
 	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
 	xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 2/2] xfs: turn on lazysbcount unconditionally
  2021-04-20 11:08 [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount Gao Xiang
@ 2021-04-20 11:08 ` Gao Xiang
  2021-04-20 16:22   ` Darrick J. Wong
  2021-04-20 17:42 ` [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount Darrick J. Wong
  2021-04-20 21:25 ` Dave Chinner
  2 siblings, 1 reply; 17+ messages in thread
From: Gao Xiang @ 2021-04-20 11:08 UTC (permalink / raw)
  To: linux-xfs; +Cc: Darrick J. Wong, Dave Chinner, Gao Xiang

As Dave mentioned [1], "/me is now wondering why we even bother
with !lazy-count anymore.

We've updated the agr btree block accounting unconditionally since
lazy-count was added, and scrub will always report a mismatch in
counts if they exist regardless of lazy-count. So why don't we just
start ignoring the on-disk value and always use lazy-count based
updates? "

Therefore, turn on lazy sb counters if it's still disabled at the
mount time, or at remount_rw if fs was mounted as read-only.
xfs_initialize_perag_data() is reused here since no need to scan
agf/agi once more again.

After this patch, we could get rid of this whole set of subtle
conditional behaviours in the codebase.

[1] https://lore.kernel.org/r/20210417223201.GU63242@dread.disaster.area
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
---
Enabling lazysbcount is only addressed in this patch, I'll send
out a seperated following patch to cleanup all unused conditions
later.

Also tr_sb is reused here since only agf is modified for each ag,
and before lazysbcount sb feature is enabled (m_update_sb = true),
agf_btreeblks field shouldn't matter for such AGs.

 fs/xfs/libxfs/xfs_format.h |  6 +++
 fs/xfs/libxfs/xfs_sb.c     | 93 +++++++++++++++++++++++++++++++++++---
 fs/xfs/xfs_mount.c         |  2 +-
 fs/xfs/xfs_super.c         |  5 ++
 4 files changed, 98 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 76e2461b9e66..9081d7876d66 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -385,6 +385,12 @@ static inline bool xfs_sb_version_haslazysbcount(struct xfs_sb *sbp)
 		(sbp->sb_features2 & XFS_SB_VERSION2_LAZYSBCOUNTBIT));
 }
 
+static inline void xfs_sb_version_addlazysbcount(struct xfs_sb *sbp)
+{
+	sbp->sb_versionnum |= XFS_SB_VERSION_MOREBITSBIT;
+	sbp->sb_features2 |= XFS_SB_VERSION2_LAZYSBCOUNTBIT;
+}
+
 static inline bool xfs_sb_version_hasattr2(struct xfs_sb *sbp)
 {
 	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) ||
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 423dada3f64c..6353e0d4cab1 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -18,6 +18,7 @@
 #include "xfs_trace.h"
 #include "xfs_trans.h"
 #include "xfs_buf_item.h"
+#include "xfs_btree.h"
 #include "xfs_bmap_btree.h"
 #include "xfs_alloc_btree.h"
 #include "xfs_log.h"
@@ -841,6 +842,55 @@ xfs_sb_mount_common(
 	mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp);
 }
 
+static int
+xfs_fixup_agf_btreeblks(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agfbp,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_btree_cur	*cur;
+	struct xfs_perag	*pag = agfbp->b_pag;
+	struct xfs_agf		*agf = agfbp->b_addr;
+	xfs_agblock_t		btreeblks, blocks;
+	int			error;
+
+	cur = xfs_allocbt_init_cursor(mp, tp, agfbp, agno, XFS_BTNUM_BNO);
+	error = xfs_btree_count_blocks(cur, &blocks);
+	if (error)
+		goto err;
+	xfs_btree_del_cursor(cur, error);
+	btreeblks = blocks - 1;
+
+	cur = xfs_allocbt_init_cursor(mp, tp, agfbp, agno, XFS_BTNUM_CNT);
+	error = xfs_btree_count_blocks(cur, &blocks);
+	if (error)
+		goto err;
+	xfs_btree_del_cursor(cur, error);
+	btreeblks += blocks - 1;
+
+	/*
+	 * although rmapbt doesn't exist in v4 fses, but it'd be better
+	 * to turn it as a generic helper.
+	 */
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		cur = xfs_rmapbt_init_cursor(mp, tp, agfbp, agno);
+		error = xfs_btree_count_blocks(cur, &blocks);
+		if (error)
+			goto err;
+		xfs_btree_del_cursor(cur, error);
+		btreeblks += blocks - 1;
+	}
+
+	agf->agf_btreeblks = cpu_to_be32(btreeblks);
+	pag->pagf_btreeblks = btreeblks;
+	xfs_alloc_log_agf(tp, agfbp, XFS_AGF_BTREEBLKS);
+	return 0;
+err:
+	xfs_btree_del_cursor(cur, error);
+	return error;
+}
+
 /*
  * xfs_initialize_perag_data
  *
@@ -864,27 +914,51 @@ xfs_initialize_perag_data(
 	uint64_t	btree = 0;
 	uint64_t	fdblocks;
 	int		error = 0;
+	bool		conv = !(mp->m_flags & XFS_MOUNT_RDONLY) &&
+				!xfs_sb_version_haslazysbcount(sbp);
+
+	if (conv)
+		xfs_warn(mp, "enabling lazy-counters...");
 
 	for (index = 0; index < agcount; index++) {
+		struct xfs_trans	*tp = NULL;
+		struct xfs_buf		*agfbp;
+
+		if (conv) {
+			error = xfs_trans_alloc(mp, &M_RES(mp)->tr_sb,
+					0, 0, 0, &tp);
+			if (error)
+				return error;
+		}
+
 		/*
-		 * read the agf, then the agi. This gets us
+		 * read the agi, then the agf. This gets us
 		 * all the information we need and populates the
 		 * per-ag structures for us.
 		 */
-		error = xfs_alloc_pagf_init(mp, NULL, index, 0);
-		if (error)
+		error = xfs_ialloc_pagi_init(mp, tp, index);
+		if (error) {
+err_out:
+			if (tp)
+				xfs_trans_cancel(tp);
 			return error;
+		}
 
-		error = xfs_ialloc_pagi_init(mp, NULL, index);
+		error = xfs_alloc_read_agf(mp, tp, index, 0, &agfbp);
 		if (error)
-			return error;
-		pag = xfs_perag_get(mp, index);
+			goto err_out;
+		pag = agfbp->b_pag;
 		ifree += pag->pagi_freecount;
 		ialloc += pag->pagi_count;
 		bfree += pag->pagf_freeblks;
 		bfreelst += pag->pagf_flcount;
+		if (tp) {
+			error = xfs_fixup_agf_btreeblks(mp, tp, agfbp, index);
+			xfs_trans_commit(tp);
+		} else {
+			xfs_buf_relse(agfbp);
+		}
 		btree += pag->pagf_btreeblks;
-		xfs_perag_put(pag);
 	}
 	fdblocks = bfree + bfreelst + btree;
 
@@ -900,6 +974,11 @@ xfs_initialize_perag_data(
 		goto out;
 	}
 
+	if (conv) {
+		xfs_sb_version_addlazysbcount(sbp);
+		mp->m_update_sb = true;
+		xfs_warn(mp, "lazy-counters has been enabled.");
+	}
 	/* Overwrite incore superblock counters with just-read data */
 	spin_lock(&mp->m_sb_lock);
 	sbp->sb_ifree = ifree;
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index cb1e2c4702c3..b3b13acd45d6 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -626,7 +626,7 @@ xfs_check_summary_counts(
 	 * superblock to be correct and we don't need to do anything here.
 	 * Otherwise, recalculate the summary counters.
 	 */
-	if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) ||
+	if ((xfs_sb_version_haslazysbcount(&mp->m_sb) &&
 	     XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) &&
 	    !xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS))
 		return 0;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index a2dab05332ac..16197a890c15 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1678,6 +1678,11 @@ xfs_remount_rw(
 	}
 
 	mp->m_flags &= ~XFS_MOUNT_RDONLY;
+	if (!xfs_sb_version_haslazysbcount(sbp)) {
+		error = xfs_initialize_perag_data(mp, sbp->sb_agcount);
+		if (error)
+			return error;
+	}
 
 	/*
 	 * If this is the first remount to writeable state we might have some
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/2] xfs: turn on lazysbcount unconditionally
  2021-04-20 11:08 ` [PATCH v2 2/2] xfs: turn on lazysbcount unconditionally Gao Xiang
@ 2021-04-20 16:22   ` Darrick J. Wong
  2021-04-20 20:00     ` Gao Xiang
  0 siblings, 1 reply; 17+ messages in thread
From: Darrick J. Wong @ 2021-04-20 16:22 UTC (permalink / raw)
  To: Gao Xiang; +Cc: linux-xfs, Dave Chinner

On Tue, Apr 20, 2021 at 07:08:55PM +0800, Gao Xiang wrote:
> As Dave mentioned [1], "/me is now wondering why we even bother
> with !lazy-count anymore.
> 
> We've updated the agr btree block accounting unconditionally since
> lazy-count was added, and scrub will always report a mismatch in
> counts if they exist regardless of lazy-count. So why don't we just
> start ignoring the on-disk value and always use lazy-count based
> updates? "
> 
> Therefore, turn on lazy sb counters if it's still disabled at the
> mount time, or at remount_rw if fs was mounted as read-only.
> xfs_initialize_perag_data() is reused here since no need to scan
> agf/agi once more again.
> 
> After this patch, we could get rid of this whole set of subtle
> conditional behaviours in the codebase.
> 
> [1] https://lore.kernel.org/r/20210417223201.GU63242@dread.disaster.area
> Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> ---
> Enabling lazysbcount is only addressed in this patch, I'll send
> out a seperated following patch to cleanup all unused conditions
> later.
> 
> Also tr_sb is reused here since only agf is modified for each ag,
> and before lazysbcount sb feature is enabled (m_update_sb = true),
> agf_btreeblks field shouldn't matter for such AGs.
> 
>  fs/xfs/libxfs/xfs_format.h |  6 +++
>  fs/xfs/libxfs/xfs_sb.c     | 93 +++++++++++++++++++++++++++++++++++---
>  fs/xfs/xfs_mount.c         |  2 +-
>  fs/xfs/xfs_super.c         |  5 ++
>  4 files changed, 98 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index 76e2461b9e66..9081d7876d66 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -385,6 +385,12 @@ static inline bool xfs_sb_version_haslazysbcount(struct xfs_sb *sbp)
>  		(sbp->sb_features2 & XFS_SB_VERSION2_LAZYSBCOUNTBIT));
>  }
>  
> +static inline void xfs_sb_version_addlazysbcount(struct xfs_sb *sbp)
> +{
> +	sbp->sb_versionnum |= XFS_SB_VERSION_MOREBITSBIT;
> +	sbp->sb_features2 |= XFS_SB_VERSION2_LAZYSBCOUNTBIT;
> +}
> +
>  static inline bool xfs_sb_version_hasattr2(struct xfs_sb *sbp)
>  {
>  	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) ||
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index 423dada3f64c..6353e0d4cab1 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -18,6 +18,7 @@
>  #include "xfs_trace.h"
>  #include "xfs_trans.h"
>  #include "xfs_buf_item.h"
> +#include "xfs_btree.h"
>  #include "xfs_bmap_btree.h"
>  #include "xfs_alloc_btree.h"
>  #include "xfs_log.h"
> @@ -841,6 +842,55 @@ xfs_sb_mount_common(
>  	mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp);
>  }
>  
> +static int
> +xfs_fixup_agf_btreeblks(
> +	struct xfs_mount	*mp,
> +	struct xfs_trans	*tp,
> +	struct xfs_buf		*agfbp,
> +	xfs_agnumber_t		agno)
> +{
> +	struct xfs_btree_cur	*cur;
> +	struct xfs_perag	*pag = agfbp->b_pag;
> +	struct xfs_agf		*agf = agfbp->b_addr;
> +	xfs_agblock_t		btreeblks, blocks;
> +	int			error;
> +
> +	cur = xfs_allocbt_init_cursor(mp, tp, agfbp, agno, XFS_BTNUM_BNO);
> +	error = xfs_btree_count_blocks(cur, &blocks);
> +	if (error)
> +		goto err;
> +	xfs_btree_del_cursor(cur, error);
> +	btreeblks = blocks - 1;
> +
> +	cur = xfs_allocbt_init_cursor(mp, tp, agfbp, agno, XFS_BTNUM_CNT);
> +	error = xfs_btree_count_blocks(cur, &blocks);
> +	if (error)
> +		goto err;
> +	xfs_btree_del_cursor(cur, error);
> +	btreeblks += blocks - 1;
> +
> +	/*
> +	 * although rmapbt doesn't exist in v4 fses, but it'd be better
> +	 * to turn it as a generic helper.
> +	 */
> +	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> +		cur = xfs_rmapbt_init_cursor(mp, tp, agfbp, agno);
> +		error = xfs_btree_count_blocks(cur, &blocks);
> +		if (error)
> +			goto err;
> +		xfs_btree_del_cursor(cur, error);
> +		btreeblks += blocks - 1;
> +	}
> +
> +	agf->agf_btreeblks = cpu_to_be32(btreeblks);
> +	pag->pagf_btreeblks = btreeblks;
> +	xfs_alloc_log_agf(tp, agfbp, XFS_AGF_BTREEBLKS);
> +	return 0;
> +err:
> +	xfs_btree_del_cursor(cur, error);
> +	return error;
> +}
> +
>  /*
>   * xfs_initialize_perag_data
>   *
> @@ -864,27 +914,51 @@ xfs_initialize_perag_data(
>  	uint64_t	btree = 0;
>  	uint64_t	fdblocks;
>  	int		error = 0;
> +	bool		conv = !(mp->m_flags & XFS_MOUNT_RDONLY) &&
> +				!xfs_sb_version_haslazysbcount(sbp);
> +
> +	if (conv)
> +		xfs_warn(mp, "enabling lazy-counters...");
>  
>  	for (index = 0; index < agcount; index++) {
> +		struct xfs_trans	*tp = NULL;
> +		struct xfs_buf		*agfbp;
> +
> +		if (conv) {
> +			error = xfs_trans_alloc(mp, &M_RES(mp)->tr_sb,
> +					0, 0, 0, &tp);
> +			if (error)
> +				return error;
> +		}
> +
>  		/*
> -		 * read the agf, then the agi. This gets us
> +		 * read the agi, then the agf. This gets us
>  		 * all the information we need and populates the
>  		 * per-ag structures for us.
>  		 */
> -		error = xfs_alloc_pagf_init(mp, NULL, index, 0);
> -		if (error)
> +		error = xfs_ialloc_pagi_init(mp, tp, index);
> +		if (error) {
> +err_out:
> +			if (tp)
> +				xfs_trans_cancel(tp);
>  			return error;
> +		}
>  
> -		error = xfs_ialloc_pagi_init(mp, NULL, index);
> +		error = xfs_alloc_read_agf(mp, tp, index, 0, &agfbp);
>  		if (error)
> -			return error;
> -		pag = xfs_perag_get(mp, index);
> +			goto err_out;
> +		pag = agfbp->b_pag;
>  		ifree += pag->pagi_freecount;
>  		ialloc += pag->pagi_count;
>  		bfree += pag->pagf_freeblks;
>  		bfreelst += pag->pagf_flcount;
> +		if (tp) {
> +			error = xfs_fixup_agf_btreeblks(mp, tp, agfbp, index);

Lazysbcount upgrades should be done from a separate function, not mixed
in with perag initialization.  Also, why is it necessary to walk all the
space btrees to set agf_btreeblks?

> +			xfs_trans_commit(tp);
> +		} else {
> +			xfs_buf_relse(agfbp);
> +		}
>  		btree += pag->pagf_btreeblks;
> -		xfs_perag_put(pag);
>  	}
>  	fdblocks = bfree + bfreelst + btree;
>  
> @@ -900,6 +974,11 @@ xfs_initialize_perag_data(
>  		goto out;
>  	}
>  
> +	if (conv) {
> +		xfs_sb_version_addlazysbcount(sbp);
> +		mp->m_update_sb = true;
> +		xfs_warn(mp, "lazy-counters has been enabled.");

But we don't log the sb update?

As far as the feature upgrade goes, is it necessary to bwrite the
primary super to disk (and then log the change)[1] to prevent a truly
ancient kernel that doesn't support lazysbcount from trying to recover
the log and ending up with an unsupported feature set?

[1] https://lore.kernel.org/linux-xfs/161723934343.3149451.16679733325094950568.stgit@magnolia/

> +	}
>  	/* Overwrite incore superblock counters with just-read data */
>  	spin_lock(&mp->m_sb_lock);
>  	sbp->sb_ifree = ifree;
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index cb1e2c4702c3..b3b13acd45d6 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -626,7 +626,7 @@ xfs_check_summary_counts(
>  	 * superblock to be correct and we don't need to do anything here.
>  	 * Otherwise, recalculate the summary counters.
>  	 */
> -	if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) ||
> +	if ((xfs_sb_version_haslazysbcount(&mp->m_sb) &&

Not clear why the logic here inverts?

--D

>  	     XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) &&
>  	    !xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS))
>  		return 0;
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index a2dab05332ac..16197a890c15 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1678,6 +1678,11 @@ xfs_remount_rw(
>  	}
>  
>  	mp->m_flags &= ~XFS_MOUNT_RDONLY;
> +	if (!xfs_sb_version_haslazysbcount(sbp)) {
> +		error = xfs_initialize_perag_data(mp, sbp->sb_agcount);
> +		if (error)
> +			return error;
> +	}
>  
>  	/*
>  	 * If this is the first remount to writeable state we might have some
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount
  2021-04-20 11:08 [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount Gao Xiang
  2021-04-20 11:08 ` [PATCH v2 2/2] xfs: turn on lazysbcount unconditionally Gao Xiang
@ 2021-04-20 17:42 ` Darrick J. Wong
  2021-04-20 21:25 ` Dave Chinner
  2 siblings, 0 replies; 17+ messages in thread
From: Darrick J. Wong @ 2021-04-20 17:42 UTC (permalink / raw)
  To: Gao Xiang; +Cc: linux-xfs, Dave Chinner, Zorro Lang, Carlos Maiolino

On Tue, Apr 20, 2021 at 07:08:54PM +0800, Gao Xiang wrote:
> There are many paths which could trigger xfs_log_sb(), e.g.
>   xfs_bmap_add_attrfork()
>     -> xfs_log_sb()
> , which overrides on-disk fdblocks by in-core per-CPU fdblocks.
> 
> However, for !lazysbcount cases, on-disk fdblocks is actually updated
> by xfs_trans_apply_sb_deltas(), and generally it isn't equal to
> in-core per-CPU fdblocks due to xfs_reserve_blocks() or whatever,
> see the comment in xfs_unmountfs().
> 
> It could be observed by the following steps reported by Zorro:
> 
> 1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev
> 2. mount $dev $mnt
> 3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load)
> 4. umount $mnt
> 5. xfs_repair -n $dev
> 
> yet due to commit f46e5a174655 ("xfs: fold sbcount quiesce logging
> into log covering"), xfs_sync_sb() will also be triggered if log
> covering is needed and !lazysbcount when xfs_unmountfs(), so hard
> to reproduce on kernel 5.12+ for clean unmount.
> 
> on-disk sb_icount and sb_ifree are also updated in
> xfs_trans_apply_sb_deltas() for !lazysbcount cases, however, which
> are always equal to per-CPU counters, so only fdblocks matters.
> 
> After this patch, I've seen no strange so far on older kernels
> for the testcase above without lazysbcount.
> 
> Reported-by: Zorro Lang <zlang@redhat.com>
> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
> Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Still seems a bit roundabout to me, but fmeh, this is all deprecated
anyways, so:

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
> changes since v1:
>  - update commit message.
> 
>  fs/xfs/libxfs/xfs_sb.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index 60e6d255e5e2..423dada3f64c 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -928,7 +928,13 @@ xfs_log_sb(
>  
>  	mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
>  	mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> -	mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> +	if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> +		struct xfs_dsb	*dsb = bp->b_addr;
> +
> +		mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks);
> +	} else {
> +		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> +	}
>  
>  	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
>  	xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/2] xfs: turn on lazysbcount unconditionally
  2021-04-20 16:22   ` Darrick J. Wong
@ 2021-04-20 20:00     ` Gao Xiang
  2021-04-22  0:01       ` Darrick J. Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Gao Xiang @ 2021-04-20 20:00 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, Dave Chinner

Hi Darrick,

On Tue, Apr 20, 2021 at 09:22:50AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 20, 2021 at 07:08:55PM +0800, Gao Xiang wrote:
> > As Dave mentioned [1], "/me is now wondering why we even bother
> > with !lazy-count anymore.
> > 
> > We've updated the agr btree block accounting unconditionally since
> > lazy-count was added, and scrub will always report a mismatch in
> > counts if they exist regardless of lazy-count. So why don't we just
> > start ignoring the on-disk value and always use lazy-count based
> > updates? "
> > 
> > Therefore, turn on lazy sb counters if it's still disabled at the
> > mount time, or at remount_rw if fs was mounted as read-only.
> > xfs_initialize_perag_data() is reused here since no need to scan
> > agf/agi once more again.
> > 
> > After this patch, we could get rid of this whole set of subtle
> > conditional behaviours in the codebase.
> > 
> > [1] https://lore.kernel.org/r/20210417223201.GU63242@dread.disaster.area
> > Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> > ---
> > Enabling lazysbcount is only addressed in this patch, I'll send
> > out a seperated following patch to cleanup all unused conditions
> > later.
> > 
> > Also tr_sb is reused here since only agf is modified for each ag,
> > and before lazysbcount sb feature is enabled (m_update_sb = true),
> > agf_btreeblks field shouldn't matter for such AGs.
> > 
> >  fs/xfs/libxfs/xfs_format.h |  6 +++
> >  fs/xfs/libxfs/xfs_sb.c     | 93 +++++++++++++++++++++++++++++++++++---
> >  fs/xfs/xfs_mount.c         |  2 +-
> >  fs/xfs/xfs_super.c         |  5 ++
> >  4 files changed, 98 insertions(+), 8 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> > index 76e2461b9e66..9081d7876d66 100644
> > --- a/fs/xfs/libxfs/xfs_format.h
> > +++ b/fs/xfs/libxfs/xfs_format.h
> > @@ -385,6 +385,12 @@ static inline bool xfs_sb_version_haslazysbcount(struct xfs_sb *sbp)
> >  		(sbp->sb_features2 & XFS_SB_VERSION2_LAZYSBCOUNTBIT));
> >  }
> >  
> > +static inline void xfs_sb_version_addlazysbcount(struct xfs_sb *sbp)
> > +{
> > +	sbp->sb_versionnum |= XFS_SB_VERSION_MOREBITSBIT;
> > +	sbp->sb_features2 |= XFS_SB_VERSION2_LAZYSBCOUNTBIT;
> > +}
> > +
> >  static inline bool xfs_sb_version_hasattr2(struct xfs_sb *sbp)
> >  {
> >  	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) ||
> > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> > index 423dada3f64c..6353e0d4cab1 100644
> > --- a/fs/xfs/libxfs/xfs_sb.c
> > +++ b/fs/xfs/libxfs/xfs_sb.c
> > @@ -18,6 +18,7 @@
> >  #include "xfs_trace.h"
> >  #include "xfs_trans.h"
> >  #include "xfs_buf_item.h"
> > +#include "xfs_btree.h"
> >  #include "xfs_bmap_btree.h"
> >  #include "xfs_alloc_btree.h"
> >  #include "xfs_log.h"
> > @@ -841,6 +842,55 @@ xfs_sb_mount_common(
> >  	mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp);
> >  }
> >  
> > +static int
> > +xfs_fixup_agf_btreeblks(
> > +	struct xfs_mount	*mp,
> > +	struct xfs_trans	*tp,
> > +	struct xfs_buf		*agfbp,
> > +	xfs_agnumber_t		agno)
> > +{
> > +	struct xfs_btree_cur	*cur;
> > +	struct xfs_perag	*pag = agfbp->b_pag;
> > +	struct xfs_agf		*agf = agfbp->b_addr;
> > +	xfs_agblock_t		btreeblks, blocks;
> > +	int			error;
> > +
> > +	cur = xfs_allocbt_init_cursor(mp, tp, agfbp, agno, XFS_BTNUM_BNO);
> > +	error = xfs_btree_count_blocks(cur, &blocks);
> > +	if (error)
> > +		goto err;
> > +	xfs_btree_del_cursor(cur, error);
> > +	btreeblks = blocks - 1;
> > +
> > +	cur = xfs_allocbt_init_cursor(mp, tp, agfbp, agno, XFS_BTNUM_CNT);
> > +	error = xfs_btree_count_blocks(cur, &blocks);
> > +	if (error)
> > +		goto err;
> > +	xfs_btree_del_cursor(cur, error);
> > +	btreeblks += blocks - 1;
> > +
> > +	/*
> > +	 * although rmapbt doesn't exist in v4 fses, but it'd be better
> > +	 * to turn it as a generic helper.
> > +	 */
> > +	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> > +		cur = xfs_rmapbt_init_cursor(mp, tp, agfbp, agno);
> > +		error = xfs_btree_count_blocks(cur, &blocks);
> > +		if (error)
> > +			goto err;
> > +		xfs_btree_del_cursor(cur, error);
> > +		btreeblks += blocks - 1;
> > +	}
> > +
> > +	agf->agf_btreeblks = cpu_to_be32(btreeblks);
> > +	pag->pagf_btreeblks = btreeblks;
> > +	xfs_alloc_log_agf(tp, agfbp, XFS_AGF_BTREEBLKS);
> > +	return 0;
> > +err:
> > +	xfs_btree_del_cursor(cur, error);
> > +	return error;
> > +}
> > +
> >  /*
> >   * xfs_initialize_perag_data
> >   *
> > @@ -864,27 +914,51 @@ xfs_initialize_perag_data(
> >  	uint64_t	btree = 0;
> >  	uint64_t	fdblocks;
> >  	int		error = 0;
> > +	bool		conv = !(mp->m_flags & XFS_MOUNT_RDONLY) &&
> > +				!xfs_sb_version_haslazysbcount(sbp);
> > +
> > +	if (conv)
> > +		xfs_warn(mp, "enabling lazy-counters...");
> >  
> >  	for (index = 0; index < agcount; index++) {
> > +		struct xfs_trans	*tp = NULL;
> > +		struct xfs_buf		*agfbp;
> > +
> > +		if (conv) {
> > +			error = xfs_trans_alloc(mp, &M_RES(mp)->tr_sb,
> > +					0, 0, 0, &tp);
> > +			if (error)
> > +				return error;
> > +		}
> > +
> >  		/*
> > -		 * read the agf, then the agi. This gets us
> > +		 * read the agi, then the agf. This gets us
> >  		 * all the information we need and populates the
> >  		 * per-ag structures for us.
> >  		 */
> > -		error = xfs_alloc_pagf_init(mp, NULL, index, 0);
> > -		if (error)
> > +		error = xfs_ialloc_pagi_init(mp, tp, index);
> > +		if (error) {
> > +err_out:
> > +			if (tp)
> > +				xfs_trans_cancel(tp);
> >  			return error;
> > +		}
> >  
> > -		error = xfs_ialloc_pagi_init(mp, NULL, index);
> > +		error = xfs_alloc_read_agf(mp, tp, index, 0, &agfbp);
> >  		if (error)
> > -			return error;
> > -		pag = xfs_perag_get(mp, index);
> > +			goto err_out;
> > +		pag = agfbp->b_pag;
> >  		ifree += pag->pagi_freecount;
> >  		ialloc += pag->pagi_count;
> >  		bfree += pag->pagf_freeblks;
> >  		bfreelst += pag->pagf_flcount;
> > +		if (tp) {
> > +			error = xfs_fixup_agf_btreeblks(mp, tp, agfbp, index);
> 
> Lazysbcount upgrades should be done from a separate function, not mixed
> in with perag initialization. 

I've seen some previous discussion about multiple AG total scan time cost.
Yeah, if another extra scan really accepts here, I could update instead.

> Also, why is it necessary to walk all the space btrees to set agf_btreeblks?

If my understanding is correct, I think because without lazysbcount,
even pagf_btreeblks is updated unconditionally now, but that counter
is unreliable for quite ancient kernels which don't have such update
logic.

Kindly correct me if I'm wrong here.

> 
> > +			xfs_trans_commit(tp);
> > +		} else {
> > +			xfs_buf_relse(agfbp);
> > +		}
> >  		btree += pag->pagf_btreeblks;
> > -		xfs_perag_put(pag);
> >  	}
> >  	fdblocks = bfree + bfreelst + btree;
> >  
> > @@ -900,6 +974,11 @@ xfs_initialize_perag_data(
> >  		goto out;
> >  	}
> >  
> > +	if (conv) {
> > +		xfs_sb_version_addlazysbcount(sbp);
> > +		mp->m_update_sb = true;
> > +		xfs_warn(mp, "lazy-counters has been enabled.");
> 
> But we don't log the sb update?
> 
> As far as the feature upgrade goes, is it necessary to bwrite the
> primary super to disk (and then log the change)[1] to prevent a truly
> ancient kernel that doesn't support lazysbcount from trying to recover
> the log and ending up with an unsupported feature set?

Not quite sure if it does harm to ancient kernels with such
unsupported feature. may I ask for more details? :)

but yeah, if any issues here, I should follow
 1) bwrite sb block first;
 2) log sb

> 
> [1] https://lore.kernel.org/linux-xfs/161723934343.3149451.16679733325094950568.stgit@magnolia/
> 
> > +	}
> >  	/* Overwrite incore superblock counters with just-read data */
> >  	spin_lock(&mp->m_sb_lock);
> >  	sbp->sb_ifree = ifree;
> > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > index cb1e2c4702c3..b3b13acd45d6 100644
> > --- a/fs/xfs/xfs_mount.c
> > +++ b/fs/xfs/xfs_mount.c
> > @@ -626,7 +626,7 @@ xfs_check_summary_counts(
> >  	 * superblock to be correct and we don't need to do anything here.
> >  	 * Otherwise, recalculate the summary counters.
> >  	 */
> > -	if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) ||
> > +	if ((xfs_sb_version_haslazysbcount(&mp->m_sb) &&
> 
> Not clear why the logic here inverts?

.. thus xfs_initialize_perag_data() below can be called then.

Thanks,
Gao Xiang

> 
> --D
> 
> >  	     XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) &&
> >  	    !xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS))
> >  		return 0;
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index a2dab05332ac..16197a890c15 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -1678,6 +1678,11 @@ xfs_remount_rw(
> >  	}
> >  
> >  	mp->m_flags &= ~XFS_MOUNT_RDONLY;
> > +	if (!xfs_sb_version_haslazysbcount(sbp)) {
> > +		error = xfs_initialize_perag_data(mp, sbp->sb_agcount);
> > +		if (error)
> > +			return error;
> > +	}
> >  
> >  	/*
> >  	 * If this is the first remount to writeable state we might have some
> > -- 
> > 2.27.0
> > 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount
  2021-04-20 11:08 [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount Gao Xiang
  2021-04-20 11:08 ` [PATCH v2 2/2] xfs: turn on lazysbcount unconditionally Gao Xiang
  2021-04-20 17:42 ` [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount Darrick J. Wong
@ 2021-04-20 21:25 ` Dave Chinner
  2021-04-20 21:54   ` Gao Xiang
  2 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2021-04-20 21:25 UTC (permalink / raw)
  To: Gao Xiang; +Cc: linux-xfs, Darrick J. Wong, Zorro Lang, Carlos Maiolino

On Tue, Apr 20, 2021 at 07:08:54PM +0800, Gao Xiang wrote:
> There are many paths which could trigger xfs_log_sb(), e.g.
>   xfs_bmap_add_attrfork()
>     -> xfs_log_sb()
> , which overrides on-disk fdblocks by in-core per-CPU fdblocks.
> 
> However, for !lazysbcount cases, on-disk fdblocks is actually updated
> by xfs_trans_apply_sb_deltas(), and generally it isn't equal to
> in-core per-CPU fdblocks due to xfs_reserve_blocks() or whatever,
> see the comment in xfs_unmountfs().
> 
> It could be observed by the following steps reported by Zorro:
> 
> 1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev
> 2. mount $dev $mnt
> 3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load)
> 4. umount $mnt
> 5. xfs_repair -n $dev
> 
> yet due to commit f46e5a174655 ("xfs: fold sbcount quiesce logging
> into log covering"), xfs_sync_sb() will also be triggered if log
> covering is needed and !lazysbcount when xfs_unmountfs(), so hard
> to reproduce on kernel 5.12+ for clean unmount.
> 
> on-disk sb_icount and sb_ifree are also updated in
> xfs_trans_apply_sb_deltas() for !lazysbcount cases, however, which
> are always equal to per-CPU counters, so only fdblocks matters.
> 
> After this patch, I've seen no strange so far on older kernels
> for the testcase above without lazysbcount.
> 
> Reported-by: Zorro Lang <zlang@redhat.com>
> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
> Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> ---
> changes since v1:
>  - update commit message.
> 
>  fs/xfs/libxfs/xfs_sb.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index 60e6d255e5e2..423dada3f64c 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -928,7 +928,13 @@ xfs_log_sb(
>  
>  	mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
>  	mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> -	mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> +	if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> +		struct xfs_dsb	*dsb = bp->b_addr;
> +
> +		mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks);
> +	} else {
> +		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> +	}

THis really needs a comment explaining why this is done this way.
It's not obvious from reading the code why we pull the the fdblock
count off disk and then, in  xfs_sb_to_disk(), we write it straight
back to disk.

It's also not clear to me that summing the inode counters is correct
in the case of the !lazysbcount for the similar reasons - the percpu
counter is not guaranteed to be absolutely accurate here, yet the
values in the disk buffer are. Perhaps we should be updating the
m_sb values in xfs_trans_apply_sb_deltas() for the !lazycount case,
and only summing them here for the lazycount case...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount
  2021-04-20 21:25 ` Dave Chinner
@ 2021-04-20 21:54   ` Gao Xiang
  2021-04-21  1:45     ` Dave Chinner
  0 siblings, 1 reply; 17+ messages in thread
From: Gao Xiang @ 2021-04-20 21:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, Darrick J. Wong, Zorro Lang, Carlos Maiolino

Hi Dave,

On Wed, Apr 21, 2021 at 07:25:06AM +1000, Dave Chinner wrote:
> On Tue, Apr 20, 2021 at 07:08:54PM +0800, Gao Xiang wrote:
> > There are many paths which could trigger xfs_log_sb(), e.g.
> >   xfs_bmap_add_attrfork()
> >     -> xfs_log_sb()
> > , which overrides on-disk fdblocks by in-core per-CPU fdblocks.
> > 
> > However, for !lazysbcount cases, on-disk fdblocks is actually updated
> > by xfs_trans_apply_sb_deltas(), and generally it isn't equal to
> > in-core per-CPU fdblocks due to xfs_reserve_blocks() or whatever,
> > see the comment in xfs_unmountfs().
> > 
> > It could be observed by the following steps reported by Zorro:
> > 
> > 1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev
> > 2. mount $dev $mnt
> > 3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load)
> > 4. umount $mnt
> > 5. xfs_repair -n $dev
> > 
> > yet due to commit f46e5a174655 ("xfs: fold sbcount quiesce logging
> > into log covering"), xfs_sync_sb() will also be triggered if log
> > covering is needed and !lazysbcount when xfs_unmountfs(), so hard
> > to reproduce on kernel 5.12+ for clean unmount.
> > 
> > on-disk sb_icount and sb_ifree are also updated in
> > xfs_trans_apply_sb_deltas() for !lazysbcount cases, however, which
> > are always equal to per-CPU counters, so only fdblocks matters.
> > 
> > After this patch, I've seen no strange so far on older kernels
> > for the testcase above without lazysbcount.
> > 
> > Reported-by: Zorro Lang <zlang@redhat.com>
> > Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
> > Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> > ---
> > changes since v1:
> >  - update commit message.
> > 
> >  fs/xfs/libxfs/xfs_sb.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> > index 60e6d255e5e2..423dada3f64c 100644
> > --- a/fs/xfs/libxfs/xfs_sb.c
> > +++ b/fs/xfs/libxfs/xfs_sb.c
> > @@ -928,7 +928,13 @@ xfs_log_sb(
> >  
> >  	mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> >  	mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> > -	mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > +	if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > +		struct xfs_dsb	*dsb = bp->b_addr;
> > +
> > +		mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks);
> > +	} else {
> > +		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > +	}
> 
> THis really needs a comment explaining why this is done this way.
> It's not obvious from reading the code why we pull the the fdblock
> count off disk and then, in  xfs_sb_to_disk(), we write it straight
> back to disk.
> 
> It's also not clear to me that summing the inode counters is correct
> in the case of the !lazysbcount for the similar reasons - the percpu
> counter is not guaranteed to be absolutely accurate here, yet the
> values in the disk buffer are. Perhaps we should be updating the
> m_sb values in xfs_trans_apply_sb_deltas() for the !lazycount case,
> and only summing them here for the lazycount case...

But if updating m_sb values in xfs_trans_apply_sb_deltas(), we
should also update on-disk sb counters in xfs_trans_apply_sb_deltas()
and log sb for !lazysbcount (since for such cases, sb counter update
should be considered immediately.)

That will indeed cause more modification, I'm not quite sure if it's
quite ok honestly. But if you assume that's more clear, I could submit
an alternative instead later.

Thanks,
Gao Xiang

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount
  2021-04-20 21:54   ` Gao Xiang
@ 2021-04-21  1:45     ` Dave Chinner
  2021-04-21  3:01       ` Gao Xiang
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2021-04-21  1:45 UTC (permalink / raw)
  To: Gao Xiang; +Cc: linux-xfs, Darrick J. Wong, Zorro Lang, Carlos Maiolino

On Wed, Apr 21, 2021 at 05:54:43AM +0800, Gao Xiang wrote:
> Hi Dave,
> 
> On Wed, Apr 21, 2021 at 07:25:06AM +1000, Dave Chinner wrote:
> > On Tue, Apr 20, 2021 at 07:08:54PM +0800, Gao Xiang wrote:
> > > There are many paths which could trigger xfs_log_sb(), e.g.
> > >   xfs_bmap_add_attrfork()
> > >     -> xfs_log_sb()
> > > , which overrides on-disk fdblocks by in-core per-CPU fdblocks.
> > > 
> > > However, for !lazysbcount cases, on-disk fdblocks is actually updated
> > > by xfs_trans_apply_sb_deltas(), and generally it isn't equal to
> > > in-core per-CPU fdblocks due to xfs_reserve_blocks() or whatever,
> > > see the comment in xfs_unmountfs().
> > > 
> > > It could be observed by the following steps reported by Zorro:
> > > 
> > > 1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev
> > > 2. mount $dev $mnt
> > > 3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load)
> > > 4. umount $mnt
> > > 5. xfs_repair -n $dev
> > > 
> > > yet due to commit f46e5a174655 ("xfs: fold sbcount quiesce logging
> > > into log covering"), xfs_sync_sb() will also be triggered if log
> > > covering is needed and !lazysbcount when xfs_unmountfs(), so hard
> > > to reproduce on kernel 5.12+ for clean unmount.
> > > 
> > > on-disk sb_icount and sb_ifree are also updated in
> > > xfs_trans_apply_sb_deltas() for !lazysbcount cases, however, which
> > > are always equal to per-CPU counters, so only fdblocks matters.
> > > 
> > > After this patch, I've seen no strange so far on older kernels
> > > for the testcase above without lazysbcount.
> > > 
> > > Reported-by: Zorro Lang <zlang@redhat.com>
> > > Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
> > > Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> > > ---
> > > changes since v1:
> > >  - update commit message.
> > > 
> > >  fs/xfs/libxfs/xfs_sb.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> > > index 60e6d255e5e2..423dada3f64c 100644
> > > --- a/fs/xfs/libxfs/xfs_sb.c
> > > +++ b/fs/xfs/libxfs/xfs_sb.c
> > > @@ -928,7 +928,13 @@ xfs_log_sb(
> > >  
> > >  	mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> > >  	mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> > > -	mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > > +	if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > > +		struct xfs_dsb	*dsb = bp->b_addr;
> > > +
> > > +		mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks);
> > > +	} else {
> > > +		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > > +	}
> > 
> > THis really needs a comment explaining why this is done this way.
> > It's not obvious from reading the code why we pull the the fdblock
> > count off disk and then, in  xfs_sb_to_disk(), we write it straight
> > back to disk.
> > 
> > It's also not clear to me that summing the inode counters is correct
> > in the case of the !lazysbcount for the similar reasons - the percpu
> > counter is not guaranteed to be absolutely accurate here, yet the
> > values in the disk buffer are. Perhaps we should be updating the
> > m_sb values in xfs_trans_apply_sb_deltas() for the !lazycount case,
> > and only summing them here for the lazycount case...
> 
> But if updating m_sb values in xfs_trans_apply_sb_deltas(), we
> should also update on-disk sb counters in xfs_trans_apply_sb_deltas()
> and log sb for !lazysbcount (since for such cases, sb counter update
> should be considered immediately.)

I don't follow - xfs_trans_apply_sb_deltas() already logs the
changes to the superblock made in the transaction.

xfs_trans_unreserve_and_mod_sb() does the in-memory counter updates
after xfs_trans_apply_sb_deltas() applies them to the on-disk
superblock in the buffer and logs them.

But nowhere on a !lazysbcount setup are mp->m_sb.sb_fdcount/ifree/
icount values being updated, and hence they are not valid at any
time except for during log quiesce where all the in memory
reservations have been removed and the per-cpu counters are synced
to mp->m_sb.

I'm suggesting that xfs_trans_unreserve_and_mod_sb() also updates
the mp->m_sb.sb_fdcount/ifree/icount values for !lazysbcount, as we
currently do not do this and this will keep them uptodate for any
caller of xfs_sb_to_disk().

i.e. we have three choices:

1. avoid writing the counters in xfs_sb_to_disk() for !lazycount.
2. read them from the buffer before writing them back to the buffer.
3. keep them up to date correctly via xfs_trans_unreserve_and_mod_sb.

#1 is bad because there are cases where we want to write the
counters even for !lazysbcount filesystems (e.g. mkfs, repair, etc).

#2 is essentially a hack around the fact that mp->m_sb is not kept
up to date in the in-memory superblock for !lazysbcount filesystems.

#3 keeps the in-memory superblock up to date for !lazysbcount case
so they are coherent with the on-disk values and hence we only need
to update the in-memory superblock counts for lazysbcount
filesystems before calling xfs_sb_to_disk().

#3 is my preferred solution.

> That will indeed cause more modification, I'm not quite sure if it's
> quite ok honestly. But if you assume that's more clear, I could submit
> an alternative instead later.

I think the version you posted doesn't fix the entire problem. It
merely slaps a band-aid over the symptom that is being seen, and
doesn't address all the non-coherent data that can be written to the
superblock here.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount
  2021-04-21  1:45     ` Dave Chinner
@ 2021-04-21  3:01       ` Gao Xiang
  2021-04-22  1:44         ` Dave Chinner
  0 siblings, 1 reply; 17+ messages in thread
From: Gao Xiang @ 2021-04-21  3:01 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, Darrick J. Wong, Zorro Lang, Carlos Maiolino

On Wed, Apr 21, 2021 at 11:45:26AM +1000, Dave Chinner wrote:
> On Wed, Apr 21, 2021 at 05:54:43AM +0800, Gao Xiang wrote:
> > Hi Dave,
> > 
> > On Wed, Apr 21, 2021 at 07:25:06AM +1000, Dave Chinner wrote:
> > > On Tue, Apr 20, 2021 at 07:08:54PM +0800, Gao Xiang wrote:
> > > > There are many paths which could trigger xfs_log_sb(), e.g.
> > > >   xfs_bmap_add_attrfork()
> > > >     -> xfs_log_sb()
> > > > , which overrides on-disk fdblocks by in-core per-CPU fdblocks.
> > > > 
> > > > However, for !lazysbcount cases, on-disk fdblocks is actually updated
> > > > by xfs_trans_apply_sb_deltas(), and generally it isn't equal to
> > > > in-core per-CPU fdblocks due to xfs_reserve_blocks() or whatever,
> > > > see the comment in xfs_unmountfs().
> > > > 
> > > > It could be observed by the following steps reported by Zorro:
> > > > 
> > > > 1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev
> > > > 2. mount $dev $mnt
> > > > 3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load)
> > > > 4. umount $mnt
> > > > 5. xfs_repair -n $dev
> > > > 
> > > > yet due to commit f46e5a174655 ("xfs: fold sbcount quiesce logging
> > > > into log covering"), xfs_sync_sb() will also be triggered if log
> > > > covering is needed and !lazysbcount when xfs_unmountfs(), so hard
> > > > to reproduce on kernel 5.12+ for clean unmount.
> > > > 
> > > > on-disk sb_icount and sb_ifree are also updated in
> > > > xfs_trans_apply_sb_deltas() for !lazysbcount cases, however, which
> > > > are always equal to per-CPU counters, so only fdblocks matters.
> > > > 
> > > > After this patch, I've seen no strange so far on older kernels
> > > > for the testcase above without lazysbcount.
> > > > 
> > > > Reported-by: Zorro Lang <zlang@redhat.com>
> > > > Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
> > > > Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> > > > ---
> > > > changes since v1:
> > > >  - update commit message.
> > > > 
> > > >  fs/xfs/libxfs/xfs_sb.c | 8 +++++++-
> > > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> > > > index 60e6d255e5e2..423dada3f64c 100644
> > > > --- a/fs/xfs/libxfs/xfs_sb.c
> > > > +++ b/fs/xfs/libxfs/xfs_sb.c
> > > > @@ -928,7 +928,13 @@ xfs_log_sb(
> > > >  
> > > >  	mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> > > >  	mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> > > > -	mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > > > +	if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > > > +		struct xfs_dsb	*dsb = bp->b_addr;
> > > > +
> > > > +		mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks);
> > > > +	} else {
> > > > +		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > > > +	}
> > > 
> > > THis really needs a comment explaining why this is done this way.
> > > It's not obvious from reading the code why we pull the the fdblock
> > > count off disk and then, in  xfs_sb_to_disk(), we write it straight
> > > back to disk.
> > > 
> > > It's also not clear to me that summing the inode counters is correct
> > > in the case of the !lazysbcount for the similar reasons - the percpu
> > > counter is not guaranteed to be absolutely accurate here, yet the
> > > values in the disk buffer are. Perhaps we should be updating the
> > > m_sb values in xfs_trans_apply_sb_deltas() for the !lazycount case,
> > > and only summing them here for the lazycount case...
> > 
> > But if updating m_sb values in xfs_trans_apply_sb_deltas(), we
> > should also update on-disk sb counters in xfs_trans_apply_sb_deltas()
> > and log sb for !lazysbcount (since for such cases, sb counter update
> > should be considered immediately.)
> 
> I don't follow - xfs_trans_apply_sb_deltas() already logs the
> changes to the superblock made in the transaction.
> 
> xfs_trans_unreserve_and_mod_sb() does the in-memory counter updates
> after xfs_trans_apply_sb_deltas() applies them to the on-disk
> superblock in the buffer and logs them.
> 
> But nowhere on a !lazysbcount setup are mp->m_sb.sb_fdcount/ifree/
> icount values being updated, and hence they are not valid at any
> time except for during log quiesce where all the in memory
> reservations have been removed and the per-cpu counters are synced
> to mp->m_sb.
> 
> I'm suggesting that xfs_trans_unreserve_and_mod_sb() also updates
> the mp->m_sb.sb_fdcount/ifree/icount values for !lazysbcount, as we
> currently do not do this and this will keep them uptodate for any
> caller of xfs_sb_to_disk().
> 
> i.e. we have three choices:
> 
> 1. avoid writing the counters in xfs_sb_to_disk() for !lazycount.
> 2. read them from the buffer before writing them back to the buffer.
> 3. keep them up to date correctly via xfs_trans_unreserve_and_mod_sb.
> 
> #1 is bad because there are cases where we want to write the
> counters even for !lazysbcount filesystems (e.g. mkfs, repair, etc).
> 
> #2 is essentially a hack around the fact that mp->m_sb is not kept
> up to date in the in-memory superblock for !lazysbcount filesystems.
> 
> #3 keeps the in-memory superblock up to date for !lazysbcount case
> so they are coherent with the on-disk values and hence we only need
> to update the in-memory superblock counts for lazysbcount
> filesystems before calling xfs_sb_to_disk().
> 
> #3 is my preferred solution.
> 
> > That will indeed cause more modification, I'm not quite sure if it's
> > quite ok honestly. But if you assume that's more clear, I could submit
> > an alternative instead later.
> 
> I think the version you posted doesn't fix the entire problem. It
> merely slaps a band-aid over the symptom that is being seen, and
> doesn't address all the non-coherent data that can be written to the
> superblock here.

As I explained on IRC as well, I think for !lazysbcount cases, fdblocks,
icount and ifree are protected by sb buffer lock. and the only users of
these three are:
 1) xfs_trans_apply_sb_deltas()
 2) xfs_log_sb()

So I've seen no need to update sb_icount, sb_ifree in that way (I mean
my v2, although I agree it's a bit hacky.) only sb_fdblocks matters.

But the reason why this patch exist is only to backport to old stable
kernels, since after [PATCH v2 2/2], we can get rid of all of
!lazysbcount cases upstream.

But if we'd like to do more e.g. by taking m_sb_lock, I've seen the
xfs codebase quite varies these years. and I modified some version
like http://paste.debian.net/1194481/

and I'm not sure if we be64_add_cpu() on-disk sb counters in
xfs_trans_apply_sb_deltas() and then mp->m_sb.sb_fdblocks +=
tp->t_fdblocks_delta again in xfs_trans_unreserve_and_mod_sb()
seems much better.

Thanks,
Gao Xiang

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/2] xfs: turn on lazysbcount unconditionally
  2021-04-20 20:00     ` Gao Xiang
@ 2021-04-22  0:01       ` Darrick J. Wong
  2021-04-22  1:51         ` Gao Xiang
  0 siblings, 1 reply; 17+ messages in thread
From: Darrick J. Wong @ 2021-04-22  0:01 UTC (permalink / raw)
  To: Gao Xiang; +Cc: linux-xfs, Dave Chinner

On Wed, Apr 21, 2021 at 04:00:29AM +0800, Gao Xiang wrote:
> Hi Darrick,
> 
> On Tue, Apr 20, 2021 at 09:22:50AM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 20, 2021 at 07:08:55PM +0800, Gao Xiang wrote:
> > > As Dave mentioned [1], "/me is now wondering why we even bother
> > > with !lazy-count anymore.
> > > 
> > > We've updated the agr btree block accounting unconditionally since
> > > lazy-count was added, and scrub will always report a mismatch in
> > > counts if they exist regardless of lazy-count. So why don't we just
> > > start ignoring the on-disk value and always use lazy-count based
> > > updates? "
> > > 
> > > Therefore, turn on lazy sb counters if it's still disabled at the
> > > mount time, or at remount_rw if fs was mounted as read-only.
> > > xfs_initialize_perag_data() is reused here since no need to scan
> > > agf/agi once more again.
> > > 
> > > After this patch, we could get rid of this whole set of subtle
> > > conditional behaviours in the codebase.
> > > 
> > > [1] https://lore.kernel.org/r/20210417223201.GU63242@dread.disaster.area
> > > Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> > > ---
> > > Enabling lazysbcount is only addressed in this patch, I'll send
> > > out a seperated following patch to cleanup all unused conditions
> > > later.
> > > 
> > > Also tr_sb is reused here since only agf is modified for each ag,
> > > and before lazysbcount sb feature is enabled (m_update_sb = true),
> > > agf_btreeblks field shouldn't matter for such AGs.
> > > 
> > >  fs/xfs/libxfs/xfs_format.h |  6 +++
> > >  fs/xfs/libxfs/xfs_sb.c     | 93 +++++++++++++++++++++++++++++++++++---
> > >  fs/xfs/xfs_mount.c         |  2 +-
> > >  fs/xfs/xfs_super.c         |  5 ++
> > >  4 files changed, 98 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> > > index 76e2461b9e66..9081d7876d66 100644
> > > --- a/fs/xfs/libxfs/xfs_format.h
> > > +++ b/fs/xfs/libxfs/xfs_format.h
> > > @@ -385,6 +385,12 @@ static inline bool xfs_sb_version_haslazysbcount(struct xfs_sb *sbp)
> > >  		(sbp->sb_features2 & XFS_SB_VERSION2_LAZYSBCOUNTBIT));
> > >  }
> > >  
> > > +static inline void xfs_sb_version_addlazysbcount(struct xfs_sb *sbp)
> > > +{
> > > +	sbp->sb_versionnum |= XFS_SB_VERSION_MOREBITSBIT;
> > > +	sbp->sb_features2 |= XFS_SB_VERSION2_LAZYSBCOUNTBIT;
> > > +}
> > > +
> > >  static inline bool xfs_sb_version_hasattr2(struct xfs_sb *sbp)
> > >  {
> > >  	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) ||
> > > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> > > index 423dada3f64c..6353e0d4cab1 100644
> > > --- a/fs/xfs/libxfs/xfs_sb.c
> > > +++ b/fs/xfs/libxfs/xfs_sb.c
> > > @@ -18,6 +18,7 @@
> > >  #include "xfs_trace.h"
> > >  #include "xfs_trans.h"
> > >  #include "xfs_buf_item.h"
> > > +#include "xfs_btree.h"
> > >  #include "xfs_bmap_btree.h"
> > >  #include "xfs_alloc_btree.h"
> > >  #include "xfs_log.h"
> > > @@ -841,6 +842,55 @@ xfs_sb_mount_common(
> > >  	mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp);
> > >  }
> > >  
> > > +static int
> > > +xfs_fixup_agf_btreeblks(
> > > +	struct xfs_mount	*mp,
> > > +	struct xfs_trans	*tp,
> > > +	struct xfs_buf		*agfbp,
> > > +	xfs_agnumber_t		agno)
> > > +{
> > > +	struct xfs_btree_cur	*cur;
> > > +	struct xfs_perag	*pag = agfbp->b_pag;
> > > +	struct xfs_agf		*agf = agfbp->b_addr;
> > > +	xfs_agblock_t		btreeblks, blocks;
> > > +	int			error;
> > > +
> > > +	cur = xfs_allocbt_init_cursor(mp, tp, agfbp, agno, XFS_BTNUM_BNO);
> > > +	error = xfs_btree_count_blocks(cur, &blocks);
> > > +	if (error)
> > > +		goto err;
> > > +	xfs_btree_del_cursor(cur, error);
> > > +	btreeblks = blocks - 1;
> > > +
> > > +	cur = xfs_allocbt_init_cursor(mp, tp, agfbp, agno, XFS_BTNUM_CNT);
> > > +	error = xfs_btree_count_blocks(cur, &blocks);
> > > +	if (error)
> > > +		goto err;
> > > +	xfs_btree_del_cursor(cur, error);
> > > +	btreeblks += blocks - 1;
> > > +
> > > +	/*
> > > +	 * although rmapbt doesn't exist in v4 fses, but it'd be better
> > > +	 * to turn it as a generic helper.
> > > +	 */
> > > +	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> > > +		cur = xfs_rmapbt_init_cursor(mp, tp, agfbp, agno);
> > > +		error = xfs_btree_count_blocks(cur, &blocks);
> > > +		if (error)
> > > +			goto err;
> > > +		xfs_btree_del_cursor(cur, error);
> > > +		btreeblks += blocks - 1;
> > > +	}
> > > +
> > > +	agf->agf_btreeblks = cpu_to_be32(btreeblks);
> > > +	pag->pagf_btreeblks = btreeblks;
> > > +	xfs_alloc_log_agf(tp, agfbp, XFS_AGF_BTREEBLKS);
> > > +	return 0;
> > > +err:
> > > +	xfs_btree_del_cursor(cur, error);
> > > +	return error;
> > > +}
> > > +
> > >  /*
> > >   * xfs_initialize_perag_data
> > >   *
> > > @@ -864,27 +914,51 @@ xfs_initialize_perag_data(
> > >  	uint64_t	btree = 0;
> > >  	uint64_t	fdblocks;
> > >  	int		error = 0;
> > > +	bool		conv = !(mp->m_flags & XFS_MOUNT_RDONLY) &&
> > > +				!xfs_sb_version_haslazysbcount(sbp);
> > > +
> > > +	if (conv)
> > > +		xfs_warn(mp, "enabling lazy-counters...");
> > >  
> > >  	for (index = 0; index < agcount; index++) {
> > > +		struct xfs_trans	*tp = NULL;
> > > +		struct xfs_buf		*agfbp;
> > > +
> > > +		if (conv) {
> > > +			error = xfs_trans_alloc(mp, &M_RES(mp)->tr_sb,
> > > +					0, 0, 0, &tp);
> > > +			if (error)
> > > +				return error;
> > > +		}
> > > +
> > >  		/*
> > > -		 * read the agf, then the agi. This gets us
> > > +		 * read the agi, then the agf. This gets us
> > >  		 * all the information we need and populates the
> > >  		 * per-ag structures for us.
> > >  		 */
> > > -		error = xfs_alloc_pagf_init(mp, NULL, index, 0);
> > > -		if (error)
> > > +		error = xfs_ialloc_pagi_init(mp, tp, index);
> > > +		if (error) {
> > > +err_out:
> > > +			if (tp)
> > > +				xfs_trans_cancel(tp);
> > >  			return error;
> > > +		}
> > >  
> > > -		error = xfs_ialloc_pagi_init(mp, NULL, index);
> > > +		error = xfs_alloc_read_agf(mp, tp, index, 0, &agfbp);
> > >  		if (error)
> > > -			return error;
> > > -		pag = xfs_perag_get(mp, index);
> > > +			goto err_out;
> > > +		pag = agfbp->b_pag;
> > >  		ifree += pag->pagi_freecount;
> > >  		ialloc += pag->pagi_count;
> > >  		bfree += pag->pagf_freeblks;
> > >  		bfreelst += pag->pagf_flcount;
> > > +		if (tp) {
> > > +			error = xfs_fixup_agf_btreeblks(mp, tp, agfbp, index);
> > 
> > Lazysbcount upgrades should be done from a separate function, not mixed
> > in with perag initialization. 
> 
> I've seen some previous discussion about multiple AG total scan time cost.
> Yeah, if another extra scan really accepts here, I could update instead.
> 
> > Also, why is it necessary to walk all the space btrees to set agf_btreeblks?
> 
> If my understanding is correct, I think because without lazysbcount,
> even pagf_btreeblks is updated unconditionally now, but that counter
> is unreliable for quite ancient kernels which don't have such update
> logic.
> 
> Kindly correct me if I'm wrong here.

Ah, you're right.  The agf_btreeblks field in the AGF only exists if
lazysbcount is enabled, which means that adding the feature means that
we have to scan every AG to compute the correct value.

Still, we only need to do this walk once per filesystem, so I'd prefer
not to clutter up the xfs_initialize_perag_data code for the sake of a
onetime upgrade for a deprecated ondisk format.

In my mind it's a feature to be able to do:

#if IS_ENABLED(CONFIG_XFS_V4)
int
xfs_fs_set_lazycount(...)
{
	/* walk AGs, fix AGF... */
	/* lock super */
	/* set lazysbcount */
	/* bwrite super */
	/* log super changes */
	/* commit the whole mess */
}
#else
# define xfs_fs_set_lazycount(..)	(-ENOSYS)
#endif

Because then we know that this is all XFSv4 code and can easily make it
go away.

The other question I have is: Do we /really/ want to QA and support
this in the kernel?  Considering that we already have xfs_admin -c1?

> > 
> > > +			xfs_trans_commit(tp);
> > > +		} else {
> > > +			xfs_buf_relse(agfbp);
> > > +		}
> > >  		btree += pag->pagf_btreeblks;
> > > -		xfs_perag_put(pag);
> > >  	}
> > >  	fdblocks = bfree + bfreelst + btree;
> > >  
> > > @@ -900,6 +974,11 @@ xfs_initialize_perag_data(
> > >  		goto out;
> > >  	}
> > >  
> > > +	if (conv) {
> > > +		xfs_sb_version_addlazysbcount(sbp);
> > > +		mp->m_update_sb = true;
> > > +		xfs_warn(mp, "lazy-counters has been enabled.");
> > 
> > But we don't log the sb update?
> > 
> > As far as the feature upgrade goes, is it necessary to bwrite the
> > primary super to disk (and then log the change)[1] to prevent a truly
> > ancient kernel that doesn't support lazysbcount from trying to recover
> > the log and ending up with an unsupported feature set?
> 
> Not quite sure if it does harm to ancient kernels with such
> unsupported feature. may I ask for more details? :)

1. Walk AG to update btreeblks.
2. Commit feature flag update in superblock.
3. Log flushes to disk before the superblock update gets written to
   sector 0.
<crash>
4. Boot ancient kernel that doesn't understand lazysbcount
   (from USB recovery stick).
5. Mount begins, because sector 0 doesn't have the lazysbcount bit set.
6. Log recovery replays the primary super update over sector 0, and the
   new contents of sector 0 say lazysbcount is enabled.
7. Superblock now says it has lazysbcount, what does the kernel do?

> 
> but yeah, if any issues here, I should follow
>  1) bwrite sb block first;
>  2) log sb
> 
> > 
> > [1] https://lore.kernel.org/linux-xfs/161723934343.3149451.16679733325094950568.stgit@magnolia/
> > 
> > > +	}
> > >  	/* Overwrite incore superblock counters with just-read data */
> > >  	spin_lock(&mp->m_sb_lock);
> > >  	sbp->sb_ifree = ifree;
> > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > > index cb1e2c4702c3..b3b13acd45d6 100644
> > > --- a/fs/xfs/xfs_mount.c
> > > +++ b/fs/xfs/xfs_mount.c
> > > @@ -626,7 +626,7 @@ xfs_check_summary_counts(
> > >  	 * superblock to be correct and we don't need to do anything here.
> > >  	 * Otherwise, recalculate the summary counters.
> > >  	 */
> > > -	if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) ||
> > > +	if ((xfs_sb_version_haslazysbcount(&mp->m_sb) &&
> > 
> > Not clear why the logic here inverts?
> 
> .. thus xfs_initialize_perag_data() below can be called then.

That seems like all the more reason to make it a separate function, TBH.

--D

> 
> Thanks,
> Gao Xiang
> 
> > 
> > --D
> > 
> > >  	     XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) &&
> > >  	    !xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS))
> > >  		return 0;
> > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > index a2dab05332ac..16197a890c15 100644
> > > --- a/fs/xfs/xfs_super.c
> > > +++ b/fs/xfs/xfs_super.c
> > > @@ -1678,6 +1678,11 @@ xfs_remount_rw(
> > >  	}
> > >  
> > >  	mp->m_flags &= ~XFS_MOUNT_RDONLY;
> > > +	if (!xfs_sb_version_haslazysbcount(sbp)) {
> > > +		error = xfs_initialize_perag_data(mp, sbp->sb_agcount);
> > > +		if (error)
> > > +			return error;
> > > +	}
> > >  
> > >  	/*
> > >  	 * If this is the first remount to writeable state we might have some
> > > -- 
> > > 2.27.0
> > > 
> > 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount
  2021-04-21  3:01       ` Gao Xiang
@ 2021-04-22  1:44         ` Dave Chinner
  2021-04-22  2:06           ` Gao Xiang
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2021-04-22  1:44 UTC (permalink / raw)
  To: Gao Xiang; +Cc: linux-xfs, Darrick J. Wong, Zorro Lang, Carlos Maiolino

On Wed, Apr 21, 2021 at 11:01:29AM +0800, Gao Xiang wrote:
> On Wed, Apr 21, 2021 at 11:45:26AM +1000, Dave Chinner wrote:
> > On Wed, Apr 21, 2021 at 05:54:43AM +0800, Gao Xiang wrote:
> > #1 is bad because there are cases where we want to write the
> > counters even for !lazysbcount filesystems (e.g. mkfs, repair, etc).
> > 
> > #2 is essentially a hack around the fact that mp->m_sb is not kept
> > up to date in the in-memory superblock for !lazysbcount filesystems.
> > 
> > #3 keeps the in-memory superblock up to date for !lazysbcount case
> > so they are coherent with the on-disk values and hence we only need
> > to update the in-memory superblock counts for lazysbcount
> > filesystems before calling xfs_sb_to_disk().
> > 
> > #3 is my preferred solution.
> > 
> > > That will indeed cause more modification, I'm not quite sure if it's
> > > quite ok honestly. But if you assume that's more clear, I could submit
> > > an alternative instead later.
> > 
> > I think the version you posted doesn't fix the entire problem. It
> > merely slaps a band-aid over the symptom that is being seen, and
> > doesn't address all the non-coherent data that can be written to the
> > superblock here.
> 
> As I explained on IRC as well, I think for !lazysbcount cases, fdblocks,
> icount and ifree are protected by sb buffer lock. and the only users of
> these three are:
>  1) xfs_trans_apply_sb_deltas()
>  2) xfs_log_sb()

That's just a happy accident and not intentional in any way. Just
fixing the case that occurs while holding the sb buffer lock doesn't
actually fix the underlying problem, it just uses this as a bandaid.

> 
> So I've seen no need to update sb_icount, sb_ifree in that way (I mean
> my v2, although I agree it's a bit hacky.) only sb_fdblocks matters.
> 
> But the reason why this patch exist is only to backport to old stable
> kernels, since after [PATCH v2 2/2], we can get rid of all of
> !lazysbcount cases upstream.
> 
> But if we'd like to do more e.g. by taking m_sb_lock, I've seen the
> xfs codebase quite varies these years. and I modified some version
> like http://paste.debian.net/1194481/

I said on IRC that this is what xfs_trans_unreserve_and_mod_sb() is
for. For !lazysbcount filesystems the transaction will be marked
dirty (i.e XFS_TRANS_SB_DIRTY is set) and so we'll always run the
slow path that takes the m_sb_lock and updates mp->m_sb. 

It's faster for me to explain this by patch than any other way. See
below.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

xfs: update superblock counters correctly for !lazysbcount

From: Dave Chinner <dchinner@redhat.com>

Keep the mount superblock counters up to date for !lazysbcount
filesystems so that when we log the superblock they do not need
updating in any way because they are already correct.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_sb.c | 16 +++++++++++++---
 fs/xfs/xfs_trans.c     |  3 +++
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 9630f9e2f540..7d4c238540d4 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -794,9 +794,19 @@ xfs_log_sb(
 	struct xfs_mount	*mp = tp->t_mountp;
 	struct xfs_buf		*bp = xfs_trans_getsb(tp);
 
-	mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
-	mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
-	mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
+	/*
+	 * Lazy sb counters don't update the in-core superblock so do that now.
+	 * If this is at unmount, the counters will be exactly correct, but at
+	 * any other time they will only be ballpark correct because of
+	 * reservations that have been taken out percpu counters. If we have an
+	 * unclean shutdown, this will be corrected by log recovery rebuilding
+	 * the counters from the AGF block counts.
+	 */
+	if (xfs_sb_version_haslazysbcount(&mp->m_sb)) {
+		mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
+		mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
+		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
+	}
 
 	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
 	xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index bcc978011869..438e41931b55 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -629,6 +629,9 @@ xfs_trans_unreserve_and_mod_sb(
 
 	/* apply remaining deltas */
 	spin_lock(&mp->m_sb_lock);
+	mp->m_sb.sb_fdblocks += blkdelta;
+	mp->m_sb.sb_icount += idelta;
+	mp->m_sb.sb_ifree += ifreedelta;
 	mp->m_sb.sb_frextents += rtxdelta;
 	mp->m_sb.sb_dblocks += tp->t_dblocks_delta;
 	mp->m_sb.sb_agcount += tp->t_agcount_delta;

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/2] xfs: turn on lazysbcount unconditionally
  2021-04-22  0:01       ` Darrick J. Wong
@ 2021-04-22  1:51         ` Gao Xiang
  2021-04-22  5:11           ` Zorro Lang
  0 siblings, 1 reply; 17+ messages in thread
From: Gao Xiang @ 2021-04-22  1:51 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, Dave Chinner, Zorro Lang

On Wed, Apr 21, 2021 at 05:01:40PM -0700, Darrick J. Wong wrote:
> On Wed, Apr 21, 2021 at 04:00:29AM +0800, Gao Xiang wrote:

...

> > > >  /*
> > > >   * xfs_initialize_perag_data
> > > >   *
> > > > @@ -864,27 +914,51 @@ xfs_initialize_perag_data(
> > > >  	uint64_t	btree = 0;
> > > >  	uint64_t	fdblocks;
> > > >  	int		error = 0;
> > > > +	bool		conv = !(mp->m_flags & XFS_MOUNT_RDONLY) &&
> > > > +				!xfs_sb_version_haslazysbcount(sbp);
> > > > +
> > > > +	if (conv)
> > > > +		xfs_warn(mp, "enabling lazy-counters...");
> > > >  
> > > >  	for (index = 0; index < agcount; index++) {
> > > > +		struct xfs_trans	*tp = NULL;
> > > > +		struct xfs_buf		*agfbp;
> > > > +
> > > > +		if (conv) {
> > > > +			error = xfs_trans_alloc(mp, &M_RES(mp)->tr_sb,
> > > > +					0, 0, 0, &tp);
> > > > +			if (error)
> > > > +				return error;
> > > > +		}
> > > > +
> > > >  		/*
> > > > -		 * read the agf, then the agi. This gets us
> > > > +		 * read the agi, then the agf. This gets us
> > > >  		 * all the information we need and populates the
> > > >  		 * per-ag structures for us.
> > > >  		 */
> > > > -		error = xfs_alloc_pagf_init(mp, NULL, index, 0);
> > > > -		if (error)
> > > > +		error = xfs_ialloc_pagi_init(mp, tp, index);
> > > > +		if (error) {
> > > > +err_out:
> > > > +			if (tp)
> > > > +				xfs_trans_cancel(tp);
> > > >  			return error;
> > > > +		}
> > > >  
> > > > -		error = xfs_ialloc_pagi_init(mp, NULL, index);
> > > > +		error = xfs_alloc_read_agf(mp, tp, index, 0, &agfbp);
> > > >  		if (error)
> > > > -			return error;
> > > > -		pag = xfs_perag_get(mp, index);
> > > > +			goto err_out;
> > > > +		pag = agfbp->b_pag;
> > > >  		ifree += pag->pagi_freecount;
> > > >  		ialloc += pag->pagi_count;
> > > >  		bfree += pag->pagf_freeblks;
> > > >  		bfreelst += pag->pagf_flcount;
> > > > +		if (tp) {
> > > > +			error = xfs_fixup_agf_btreeblks(mp, tp, agfbp, index);
> > > 
> > > Lazysbcount upgrades should be done from a separate function, not mixed
> > > in with perag initialization. 
> > 
> > I've seen some previous discussion about multiple AG total scan time cost.
> > Yeah, if another extra scan really accepts here, I could update instead.
> > 
> > > Also, why is it necessary to walk all the space btrees to set agf_btreeblks?
> > 
> > If my understanding is correct, I think because without lazysbcount,
> > even pagf_btreeblks is updated unconditionally now, but that counter
> > is unreliable for quite ancient kernels which don't have such update
> > logic.
> > 
> > Kindly correct me if I'm wrong here.
> 
> Ah, you're right.  The agf_btreeblks field in the AGF only exists if
> lazysbcount is enabled, which means that adding the feature means that
> we have to scan every AG to compute the correct value.
> 
> Still, we only need to do this walk once per filesystem, so I'd prefer
> not to clutter up the xfs_initialize_perag_data code for the sake of a
> onetime upgrade for a deprecated ondisk format.
> 
> In my mind it's a feature to be able to do:
> 
> #if IS_ENABLED(CONFIG_XFS_V4)
> int
> xfs_fs_set_lazycount(...)
> {
> 	/* walk AGs, fix AGF... */
> 	/* lock super */
> 	/* set lazysbcount */
> 	/* bwrite super */
> 	/* log super changes */
> 	/* commit the whole mess */
> }
> #else
> # define xfs_fs_set_lazycount(..)	(-ENOSYS)
> #endif
> 
> Because then we know that this is all XFSv4 code and can easily make it
> go away.

Yeah, sounds better. I could refine this in the next version. :)

> 
> The other question I have is: Do we /really/ want to QA and support
> this in the kernel?  Considering that we already have xfs_admin -c1?

I think we might ask Zorro for this whole thing, since no end users
actually report this. :) (Cc Zorro here.) Although the reality is
we still support !lazysbcount fses even it isn't looked after at all.

My another suggestion completely forbids !lazysbcount from mounting
in months (or right now.)

Just warn users to use xfs_admin -c1 to convert this. And after months,
warn users to convert this and also forbid it from mounting.

> 
> > > 
> > > > +			xfs_trans_commit(tp);
> > > > +		} else {
> > > > +			xfs_buf_relse(agfbp);
> > > > +		}
> > > >  		btree += pag->pagf_btreeblks;
> > > > -		xfs_perag_put(pag);
> > > >  	}
> > > >  	fdblocks = bfree + bfreelst + btree;
> > > >  
> > > > @@ -900,6 +974,11 @@ xfs_initialize_perag_data(
> > > >  		goto out;
> > > >  	}
> > > >  
> > > > +	if (conv) {
> > > > +		xfs_sb_version_addlazysbcount(sbp);
> > > > +		mp->m_update_sb = true;
> > > > +		xfs_warn(mp, "lazy-counters has been enabled.");
> > > 
> > > But we don't log the sb update?
> > > 
> > > As far as the feature upgrade goes, is it necessary to bwrite the
> > > primary super to disk (and then log the change)[1] to prevent a truly
> > > ancient kernel that doesn't support lazysbcount from trying to recover
> > > the log and ending up with an unsupported feature set?
> > 
> > Not quite sure if it does harm to ancient kernels with such
> > unsupported feature. may I ask for more details? :)
> 
> 1. Walk AG to update btreeblks.
> 2. Commit feature flag update in superblock.
> 3. Log flushes to disk before the superblock update gets written to
>    sector 0.
> <crash>
> 4. Boot ancient kernel that doesn't understand lazysbcount
>    (from USB recovery stick).
> 5. Mount begins, because sector 0 doesn't have the lazysbcount bit set.
> 6. Log recovery replays the primary super update over sector 0, and the
>    new contents of sector 0 say lazysbcount is enabled.
> 7. Superblock now says it has lazysbcount, what does the kernel do?

Yeah, but I'm not sure if it has some bad effect if ancient kernels
do it in this way. I mean (I think) it's somewhat different from
log_incompat thing.

If ancient kernels just replay the log, and then sb read verified
and refuse to proceed (but fs is not corrupted...) I think that
would be fine?

I'm not sure about the whole thing on ancient kernels. So very
curious about this. I will look into the whole thing as well.

> 
> > 
> > but yeah, if any issues here, I should follow
> >  1) bwrite sb block first;
> >  2) log sb
> > 
> > > 
> > > [1] https://lore.kernel.org/linux-xfs/161723934343.3149451.16679733325094950568.stgit@magnolia/
> > > 
> > > > +	}
> > > >  	/* Overwrite incore superblock counters with just-read data */
> > > >  	spin_lock(&mp->m_sb_lock);
> > > >  	sbp->sb_ifree = ifree;
> > > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > > > index cb1e2c4702c3..b3b13acd45d6 100644
> > > > --- a/fs/xfs/xfs_mount.c
> > > > +++ b/fs/xfs/xfs_mount.c
> > > > @@ -626,7 +626,7 @@ xfs_check_summary_counts(
> > > >  	 * superblock to be correct and we don't need to do anything here.
> > > >  	 * Otherwise, recalculate the summary counters.
> > > >  	 */
> > > > -	if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) ||
> > > > +	if ((xfs_sb_version_haslazysbcount(&mp->m_sb) &&
> > > 
> > > Not clear why the logic here inverts?
> > 
> > .. thus xfs_initialize_perag_data() below can be called then.
> 
> That seems like all the more reason to make it a separate function, TBH.

Yeah, will refine this later.

Thanks,
Gao Xiang

> 
> --D
> 
> > 
> > Thanks,
> > Gao Xiang
> > 
> > > 
> > > --D
> > > 
> > > >  	     XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) &&
> > > >  	    !xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS))
> > > >  		return 0;
> > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > > index a2dab05332ac..16197a890c15 100644
> > > > --- a/fs/xfs/xfs_super.c
> > > > +++ b/fs/xfs/xfs_super.c
> > > > @@ -1678,6 +1678,11 @@ xfs_remount_rw(
> > > >  	}
> > > >  
> > > >  	mp->m_flags &= ~XFS_MOUNT_RDONLY;
> > > > +	if (!xfs_sb_version_haslazysbcount(sbp)) {
> > > > +		error = xfs_initialize_perag_data(mp, sbp->sb_agcount);
> > > > +		if (error)
> > > > +			return error;
> > > > +	}
> > > >  
> > > >  	/*
> > > >  	 * If this is the first remount to writeable state we might have some
> > > > -- 
> > > > 2.27.0
> > > > 
> > > 
> > 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount
  2021-04-22  1:44         ` Dave Chinner
@ 2021-04-22  2:06           ` Gao Xiang
  2021-04-22  3:01             ` Dave Chinner
  0 siblings, 1 reply; 17+ messages in thread
From: Gao Xiang @ 2021-04-22  2:06 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, Darrick J. Wong, Zorro Lang, Carlos Maiolino

Hi Dave,

On Thu, Apr 22, 2021 at 11:44:46AM +1000, Dave Chinner wrote:
> On Wed, Apr 21, 2021 at 11:01:29AM +0800, Gao Xiang wrote:
> > On Wed, Apr 21, 2021 at 11:45:26AM +1000, Dave Chinner wrote:
> > > On Wed, Apr 21, 2021 at 05:54:43AM +0800, Gao Xiang wrote:
> > > #1 is bad because there are cases where we want to write the
> > > counters even for !lazysbcount filesystems (e.g. mkfs, repair, etc).
> > > 
> > > #2 is essentially a hack around the fact that mp->m_sb is not kept
> > > up to date in the in-memory superblock for !lazysbcount filesystems.
> > > 
> > > #3 keeps the in-memory superblock up to date for !lazysbcount case
> > > so they are coherent with the on-disk values and hence we only need
> > > to update the in-memory superblock counts for lazysbcount
> > > filesystems before calling xfs_sb_to_disk().
> > > 
> > > #3 is my preferred solution.
> > > 
> > > > That will indeed cause more modification, I'm not quite sure if it's
> > > > quite ok honestly. But if you assume that's more clear, I could submit
> > > > an alternative instead later.
> > > 
> > > I think the version you posted doesn't fix the entire problem. It
> > > merely slaps a band-aid over the symptom that is being seen, and
> > > doesn't address all the non-coherent data that can be written to the
> > > superblock here.
> > 
> > As I explained on IRC as well, I think for !lazysbcount cases, fdblocks,
> > icount and ifree are protected by sb buffer lock. and the only users of
> > these three are:
> >  1) xfs_trans_apply_sb_deltas()
> >  2) xfs_log_sb()
> 
> That's just a happy accident and not intentional in any way. Just
> fixing the case that occurs while holding the sb buffer lock doesn't
> actually fix the underlying problem, it just uses this as a bandaid.

I think for !lazysbcases, sb buffer lock is only a reliable lock that
can be relied on for serialzing (since we need to make sure each sb
write matches the corresponding fdblocks, ifree, icount. So sb buffer
needs be locked every time. So so need to recalc on dirty log.)
> 
> > 
> > So I've seen no need to update sb_icount, sb_ifree in that way (I mean
> > my v2, although I agree it's a bit hacky.) only sb_fdblocks matters.
> > 
> > But the reason why this patch exist is only to backport to old stable
> > kernels, since after [PATCH v2 2/2], we can get rid of all of
> > !lazysbcount cases upstream.
> > 
> > But if we'd like to do more e.g. by taking m_sb_lock, I've seen the
> > xfs codebase quite varies these years. and I modified some version
> > like http://paste.debian.net/1194481/
> 
> I said on IRC that this is what xfs_trans_unreserve_and_mod_sb() is
> for. For !lazysbcount filesystems the transaction will be marked
> dirty (i.e XFS_TRANS_SB_DIRTY is set) and so we'll always run the
> slow path that takes the m_sb_lock and updates mp->m_sb. 
> 
> It's faster for me to explain this by patch than any other way. See
> below.

I know what you mean, but there exists 3 things:
 1) we be64_add_cpu() on-disk fdblocks, ifree, icount at
    xfs_trans_apply_sb_deltas(), and then do the same bahavior in
    xfs_trans_unreserve_and_mod_sb() for in-memory counters again.
    that is (somewhat) fragile.

 2) m_sb_lock behaves no effect at this. This lock between
    xfs_log_sb() and xfs_trans_unreserve_and_mod_sb() is still
    sb buffer lock for !lazysbcount cases.

 3) in-memory sb counters are serialized by some spinlock now,
    so I'm not sure sb per-CPU counters behave for lazysbcount
    cases, are these used for better performance?

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 
> xfs: update superblock counters correctly for !lazysbcount
> 
> From: Dave Chinner <dchinner@redhat.com>
> 
> Keep the mount superblock counters up to date for !lazysbcount
> filesystems so that when we log the superblock they do not need
> updating in any way because they are already correct.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_sb.c | 16 +++++++++++++---
>  fs/xfs/xfs_trans.c     |  3 +++
>  2 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index 9630f9e2f540..7d4c238540d4 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -794,9 +794,19 @@ xfs_log_sb(
>  	struct xfs_mount	*mp = tp->t_mountp;
>  	struct xfs_buf		*bp = xfs_trans_getsb(tp);
>  
> -	mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> -	mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> -	mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> +	/*
> +	 * Lazy sb counters don't update the in-core superblock so do that now.
> +	 * If this is at unmount, the counters will be exactly correct, but at
> +	 * any other time they will only be ballpark correct because of
> +	 * reservations that have been taken out percpu counters. If we have an
> +	 * unclean shutdown, this will be corrected by log recovery rebuilding
> +	 * the counters from the AGF block counts.
> +	 */
> +	if (xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> +		mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> +		mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> +		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> +	}
>  
>  	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
>  	xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index bcc978011869..438e41931b55 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -629,6 +629,9 @@ xfs_trans_unreserve_and_mod_sb(
>  
>  	/* apply remaining deltas */
>  	spin_lock(&mp->m_sb_lock);
> +	mp->m_sb.sb_fdblocks += blkdelta;

not sure that is quite equal to blkdelta, since (I think) we might need
to apply t_res_fdblocks_delta for !lazysbcount cases but not lazysbcount
cases, but I'm not quite sure, just saw the comment above
xfs_trans_unreserve_and_mod_sb() and the implementation of
xfs_trans_apply_sb_deltas().

Thanks,
Gao Xiang

> +	mp->m_sb.sb_icount += idelta;
> +	mp->m_sb.sb_ifree += ifreedelta;
>  	mp->m_sb.sb_frextents += rtxdelta;
>  	mp->m_sb.sb_dblocks += tp->t_dblocks_delta;
>  	mp->m_sb.sb_agcount += tp->t_agcount_delta;
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount
  2021-04-22  2:06           ` Gao Xiang
@ 2021-04-22  3:01             ` Dave Chinner
  2021-04-22  3:12               ` Gao Xiang
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2021-04-22  3:01 UTC (permalink / raw)
  To: Gao Xiang; +Cc: linux-xfs, Darrick J. Wong, Zorro Lang, Carlos Maiolino

On Thu, Apr 22, 2021 at 10:06:13AM +0800, Gao Xiang wrote:
> Hi Dave,
> 
> On Thu, Apr 22, 2021 at 11:44:46AM +1000, Dave Chinner wrote:
> > On Wed, Apr 21, 2021 at 11:01:29AM +0800, Gao Xiang wrote:
> > > On Wed, Apr 21, 2021 at 11:45:26AM +1000, Dave Chinner wrote:
> > > > On Wed, Apr 21, 2021 at 05:54:43AM +0800, Gao Xiang wrote:
> > > > #1 is bad because there are cases where we want to write the
> > > > counters even for !lazysbcount filesystems (e.g. mkfs, repair, etc).
> > > > 
> > > > #2 is essentially a hack around the fact that mp->m_sb is not kept
> > > > up to date in the in-memory superblock for !lazysbcount filesystems.
> > > > 
> > > > #3 keeps the in-memory superblock up to date for !lazysbcount case
> > > > so they are coherent with the on-disk values and hence we only need
> > > > to update the in-memory superblock counts for lazysbcount
> > > > filesystems before calling xfs_sb_to_disk().
> > > > 
> > > > #3 is my preferred solution.
> > > > 
> > > > > That will indeed cause more modification, I'm not quite sure if it's
> > > > > quite ok honestly. But if you assume that's more clear, I could submit
> > > > > an alternative instead later.
> > > > 
> > > > I think the version you posted doesn't fix the entire problem. It
> > > > merely slaps a band-aid over the symptom that is being seen, and
> > > > doesn't address all the non-coherent data that can be written to the
> > > > superblock here.
> > > 
> > > As I explained on IRC as well, I think for !lazysbcount cases, fdblocks,
> > > icount and ifree are protected by sb buffer lock. and the only users of
> > > these three are:
> > >  1) xfs_trans_apply_sb_deltas()
> > >  2) xfs_log_sb()
> > 
> > That's just a happy accident and not intentional in any way. Just
> > fixing the case that occurs while holding the sb buffer lock doesn't
> > actually fix the underlying problem, it just uses this as a bandaid.
> 
> I think for !lazysbcases, sb buffer lock is only a reliable lock that
> can be relied on for serialzing (since we need to make sure each sb
> write matches the corresponding fdblocks, ifree, icount. So sb buffer
> needs be locked every time. So so need to recalc on dirty log.)
> > 
> > > 
> > > So I've seen no need to update sb_icount, sb_ifree in that way (I mean
> > > my v2, although I agree it's a bit hacky.) only sb_fdblocks matters.
> > > 
> > > But the reason why this patch exist is only to backport to old stable
> > > kernels, since after [PATCH v2 2/2], we can get rid of all of
> > > !lazysbcount cases upstream.
> > > 
> > > But if we'd like to do more e.g. by taking m_sb_lock, I've seen the
> > > xfs codebase quite varies these years. and I modified some version
> > > like http://paste.debian.net/1194481/
> > 
> > I said on IRC that this is what xfs_trans_unreserve_and_mod_sb() is
> > for. For !lazysbcount filesystems the transaction will be marked
> > dirty (i.e XFS_TRANS_SB_DIRTY is set) and so we'll always run the
> > slow path that takes the m_sb_lock and updates mp->m_sb. 
> > 
> > It's faster for me to explain this by patch than any other way. See
> > below.
> 
> I know what you mean, but there exists 3 things:
>  1) we be64_add_cpu() on-disk fdblocks, ifree, icount at
>     xfs_trans_apply_sb_deltas(), and then do the same bahavior in
>     xfs_trans_unreserve_and_mod_sb() for in-memory counters again.
>     that is (somewhat) fragile.

That's exactly how the superblock updates have been done since the
mid 1990s. It's the way it was intended to work:

- xfs_trans_apply_sb_deltas() applies the changes to the on
  disk superblock
- xfs_trans_unreserve_and_mod_sb() applies the changes to the
  in-memory superblock.

All my patch does is follow the long established separation of
update responsibilities. It is actually returning the code to the
behaviour we had before lazy superblock counters were introduced.

>  2) m_sb_lock behaves no effect at this. This lock between
>     xfs_log_sb() and xfs_trans_unreserve_and_mod_sb() is still
>     sb buffer lock for !lazysbcount cases.

The m_sb_lock doesn't need to have any effect on this. It's to
prevent concurrent updates of the in-core superblock, not to prevent
access to the superblock buffer.

i.e. the superblock buffer lock protects against concurrent updates
of the superblock buffer, and hence while progating and logging
changes to the superblock buffer we have to have the superblock
buffer locked.

>  3) in-memory sb counters are serialized by some spinlock now,

No, they are not. Lazysbcount does not set XFS_TRANS_SB_DIRTY
for pure ifree/icount/fdblock updates, so it never runs the code
I modified in xfs_trans_unreserve_and_mod_sb() unless some other
part of the superblock is changed.

For !lazysbcount, we always run this path because XFS_TRANS_SB_DIRTY
is always set.

>     so I'm not sure sb per-CPU counters behave for lazysbcount
>     cases, are these used for better performance?

It does not change behaviour of anything at all, execpt the counter
values for !lazysbcount filesystems are now always kept correctly up
to date.

> >  	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
> >  	xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
> > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > index bcc978011869..438e41931b55 100644
> > --- a/fs/xfs/xfs_trans.c
> > +++ b/fs/xfs/xfs_trans.c
> > @@ -629,6 +629,9 @@ xfs_trans_unreserve_and_mod_sb(
> >  
> >  	/* apply remaining deltas */
> >  	spin_lock(&mp->m_sb_lock);
> > +	mp->m_sb.sb_fdblocks += blkdelta;
> 
> not sure that is quite equal to blkdelta, since (I think) we might need
> to apply t_res_fdblocks_delta for !lazysbcount cases but not lazysbcount
> cases, but I'm not quite sure, just saw the comment above
> xfs_trans_unreserve_and_mod_sb() and the implementation of
> xfs_trans_apply_sb_deltas().

Yes, I forgot about the special delayed allocation space accounting.
We'll have to add that, too, so it becomes:

+	mp->m_sb.sb_fdblocks += blkdelta + tp->t_res_fdblocks_delta;
+	mp->m_sb.sb_icount += idelta;
+	mp->m_sb.sb_ifree += ifreedelta;

But this doesn't change the structure of the patch in any way.

CHeers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount
  2021-04-22  3:01             ` Dave Chinner
@ 2021-04-22  3:12               ` Gao Xiang
  2021-04-22 15:58                 ` Darrick J. Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Gao Xiang @ 2021-04-22  3:12 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, Darrick J. Wong, Zorro Lang, Carlos Maiolino

On Thu, Apr 22, 2021 at 01:01:02PM +1000, Dave Chinner wrote:
> On Thu, Apr 22, 2021 at 10:06:13AM +0800, Gao Xiang wrote:
> > Hi Dave,
> > 
> > On Thu, Apr 22, 2021 at 11:44:46AM +1000, Dave Chinner wrote:
> > > On Wed, Apr 21, 2021 at 11:01:29AM +0800, Gao Xiang wrote:
> > > > On Wed, Apr 21, 2021 at 11:45:26AM +1000, Dave Chinner wrote:
> > > > > On Wed, Apr 21, 2021 at 05:54:43AM +0800, Gao Xiang wrote:
> > > > > #1 is bad because there are cases where we want to write the
> > > > > counters even for !lazysbcount filesystems (e.g. mkfs, repair, etc).
> > > > > 
> > > > > #2 is essentially a hack around the fact that mp->m_sb is not kept
> > > > > up to date in the in-memory superblock for !lazysbcount filesystems.
> > > > > 
> > > > > #3 keeps the in-memory superblock up to date for !lazysbcount case
> > > > > so they are coherent with the on-disk values and hence we only need
> > > > > to update the in-memory superblock counts for lazysbcount
> > > > > filesystems before calling xfs_sb_to_disk().
> > > > > 
> > > > > #3 is my preferred solution.
> > > > > 
> > > > > > That will indeed cause more modification, I'm not quite sure if it's
> > > > > > quite ok honestly. But if you assume that's more clear, I could submit
> > > > > > an alternative instead later.
> > > > > 
> > > > > I think the version you posted doesn't fix the entire problem. It
> > > > > merely slaps a band-aid over the symptom that is being seen, and
> > > > > doesn't address all the non-coherent data that can be written to the
> > > > > superblock here.
> > > > 
> > > > As I explained on IRC as well, I think for !lazysbcount cases, fdblocks,
> > > > icount and ifree are protected by sb buffer lock. and the only users of
> > > > these three are:
> > > >  1) xfs_trans_apply_sb_deltas()
> > > >  2) xfs_log_sb()
> > > 
> > > That's just a happy accident and not intentional in any way. Just
> > > fixing the case that occurs while holding the sb buffer lock doesn't
> > > actually fix the underlying problem, it just uses this as a bandaid.
> > 
> > I think for !lazysbcases, sb buffer lock is only a reliable lock that
> > can be relied on for serialzing (since we need to make sure each sb
> > write matches the corresponding fdblocks, ifree, icount. So sb buffer
> > needs be locked every time. So so need to recalc on dirty log.)
> > > 
> > > > 
> > > > So I've seen no need to update sb_icount, sb_ifree in that way (I mean
> > > > my v2, although I agree it's a bit hacky.) only sb_fdblocks matters.
> > > > 
> > > > But the reason why this patch exist is only to backport to old stable
> > > > kernels, since after [PATCH v2 2/2], we can get rid of all of
> > > > !lazysbcount cases upstream.
> > > > 
> > > > But if we'd like to do more e.g. by taking m_sb_lock, I've seen the
> > > > xfs codebase quite varies these years. and I modified some version
> > > > like http://paste.debian.net/1194481/
> > > 
> > > I said on IRC that this is what xfs_trans_unreserve_and_mod_sb() is
> > > for. For !lazysbcount filesystems the transaction will be marked
> > > dirty (i.e XFS_TRANS_SB_DIRTY is set) and so we'll always run the
> > > slow path that takes the m_sb_lock and updates mp->m_sb. 
> > > 
> > > It's faster for me to explain this by patch than any other way. See
> > > below.
> > 
> > I know what you mean, but there exists 3 things:
> >  1) we be64_add_cpu() on-disk fdblocks, ifree, icount at
> >     xfs_trans_apply_sb_deltas(), and then do the same bahavior in
> >     xfs_trans_unreserve_and_mod_sb() for in-memory counters again.
> >     that is (somewhat) fragile.
> 
> That's exactly how the superblock updates have been done since the
> mid 1990s. It's the way it was intended to work:
> 
> - xfs_trans_apply_sb_deltas() applies the changes to the on
>   disk superblock
> - xfs_trans_unreserve_and_mod_sb() applies the changes to the
>   in-memory superblock.
> 
> All my patch does is follow the long established separation of
> update responsibilities. It is actually returning the code to the
> behaviour we had before lazy superblock counters were introduced.
> 
> >  2) m_sb_lock behaves no effect at this. This lock between
> >     xfs_log_sb() and xfs_trans_unreserve_and_mod_sb() is still
> >     sb buffer lock for !lazysbcount cases.
> 
> The m_sb_lock doesn't need to have any effect on this. It's to
> prevent concurrent updates of the in-core superblock, not to prevent
> access to the superblock buffer.
> 
> i.e. the superblock buffer lock protects against concurrent updates
> of the superblock buffer, and hence while progating and logging
> changes to the superblock buffer we have to have the superblock
> buffer locked.
> 
> >  3) in-memory sb counters are serialized by some spinlock now,
> 
> No, they are not. Lazysbcount does not set XFS_TRANS_SB_DIRTY
> for pure ifree/icount/fdblock updates, so it never runs the code
> I modified in xfs_trans_unreserve_and_mod_sb() unless some other
> part of the superblock is changed.
> 
> For !lazysbcount, we always run this path because XFS_TRANS_SB_DIRTY
> is always set.
> 
> >     so I'm not sure sb per-CPU counters behave for lazysbcount
> >     cases, are these used for better performance?
> 
> It does not change behaviour of anything at all, execpt the counter
> values for !lazysbcount filesystems are now always kept correctly up
> to date.
> 
> > >  	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
> > >  	xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
> > > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > > index bcc978011869..438e41931b55 100644
> > > --- a/fs/xfs/xfs_trans.c
> > > +++ b/fs/xfs/xfs_trans.c
> > > @@ -629,6 +629,9 @@ xfs_trans_unreserve_and_mod_sb(
> > >  
> > >  	/* apply remaining deltas */
> > >  	spin_lock(&mp->m_sb_lock);
> > > +	mp->m_sb.sb_fdblocks += blkdelta;
> > 
> > not sure that is quite equal to blkdelta, since (I think) we might need
> > to apply t_res_fdblocks_delta for !lazysbcount cases but not lazysbcount
> > cases, but I'm not quite sure, just saw the comment above
> > xfs_trans_unreserve_and_mod_sb() and the implementation of
> > xfs_trans_apply_sb_deltas().
> 
> Yes, I forgot about the special delayed allocation space accounting.
> We'll have to add that, too, so it becomes:
> 
> +	mp->m_sb.sb_fdblocks += blkdelta + tp->t_res_fdblocks_delta;
> +	mp->m_sb.sb_icount += idelta;
> +	mp->m_sb.sb_ifree += ifreedelta;
> 
> But this doesn't change the structure of the patch in any way.

Anyway, I think this'd be absolutely fine to fix this issue as well,
so:

Reviewed-by: Gao Xiang <hsiangkao@redhat.com>

(Although I still insist on my v2 [just my own thought] since in-memory
 sb counters are totally unused/reserved compared with on-disk sb counters
 for sb_fdblocks and per-CPU sb counters for sb_ifree / sb_icount for
 the whole !lazysbcount cases, maybe adding some comments is better.
 But I'm also fine if the patch goes like this ;) )

Thanks,
Gao Xiang

> 
> CHeers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/2] xfs: turn on lazysbcount unconditionally
  2021-04-22  1:51         ` Gao Xiang
@ 2021-04-22  5:11           ` Zorro Lang
  0 siblings, 0 replies; 17+ messages in thread
From: Zorro Lang @ 2021-04-22  5:11 UTC (permalink / raw)
  To: Gao Xiang; +Cc: Darrick J. Wong, linux-xfs, Dave Chinner

On Thu, Apr 22, 2021 at 09:51:26AM +0800, Gao Xiang wrote:
> On Wed, Apr 21, 2021 at 05:01:40PM -0700, Darrick J. Wong wrote:
> > On Wed, Apr 21, 2021 at 04:00:29AM +0800, Gao Xiang wrote:
> 
> ...
> 
> > > > >  /*
> > > > >   * xfs_initialize_perag_data
> > > > >   *
> > > > > @@ -864,27 +914,51 @@ xfs_initialize_perag_data(
> > > > >  	uint64_t	btree = 0;
> > > > >  	uint64_t	fdblocks;
> > > > >  	int		error = 0;
> > > > > +	bool		conv = !(mp->m_flags & XFS_MOUNT_RDONLY) &&
> > > > > +				!xfs_sb_version_haslazysbcount(sbp);
> > > > > +
> > > > > +	if (conv)
> > > > > +		xfs_warn(mp, "enabling lazy-counters...");
> > > > >  
> > > > >  	for (index = 0; index < agcount; index++) {
> > > > > +		struct xfs_trans	*tp = NULL;
> > > > > +		struct xfs_buf		*agfbp;
> > > > > +
> > > > > +		if (conv) {
> > > > > +			error = xfs_trans_alloc(mp, &M_RES(mp)->tr_sb,
> > > > > +					0, 0, 0, &tp);
> > > > > +			if (error)
> > > > > +				return error;
> > > > > +		}
> > > > > +
> > > > >  		/*
> > > > > -		 * read the agf, then the agi. This gets us
> > > > > +		 * read the agi, then the agf. This gets us
> > > > >  		 * all the information we need and populates the
> > > > >  		 * per-ag structures for us.
> > > > >  		 */
> > > > > -		error = xfs_alloc_pagf_init(mp, NULL, index, 0);
> > > > > -		if (error)
> > > > > +		error = xfs_ialloc_pagi_init(mp, tp, index);
> > > > > +		if (error) {
> > > > > +err_out:
> > > > > +			if (tp)
> > > > > +				xfs_trans_cancel(tp);
> > > > >  			return error;
> > > > > +		}
> > > > >  
> > > > > -		error = xfs_ialloc_pagi_init(mp, NULL, index);
> > > > > +		error = xfs_alloc_read_agf(mp, tp, index, 0, &agfbp);
> > > > >  		if (error)
> > > > > -			return error;
> > > > > -		pag = xfs_perag_get(mp, index);
> > > > > +			goto err_out;
> > > > > +		pag = agfbp->b_pag;
> > > > >  		ifree += pag->pagi_freecount;
> > > > >  		ialloc += pag->pagi_count;
> > > > >  		bfree += pag->pagf_freeblks;
> > > > >  		bfreelst += pag->pagf_flcount;
> > > > > +		if (tp) {
> > > > > +			error = xfs_fixup_agf_btreeblks(mp, tp, agfbp, index);
> > > > 
> > > > Lazysbcount upgrades should be done from a separate function, not mixed
> > > > in with perag initialization. 
> > > 
> > > I've seen some previous discussion about multiple AG total scan time cost.
> > > Yeah, if another extra scan really accepts here, I could update instead.
> > > 
> > > > Also, why is it necessary to walk all the space btrees to set agf_btreeblks?
> > > 
> > > If my understanding is correct, I think because without lazysbcount,
> > > even pagf_btreeblks is updated unconditionally now, but that counter
> > > is unreliable for quite ancient kernels which don't have such update
> > > logic.
> > > 
> > > Kindly correct me if I'm wrong here.
> > 
> > Ah, you're right.  The agf_btreeblks field in the AGF only exists if
> > lazysbcount is enabled, which means that adding the feature means that
> > we have to scan every AG to compute the correct value.
> > 
> > Still, we only need to do this walk once per filesystem, so I'd prefer
> > not to clutter up the xfs_initialize_perag_data code for the sake of a
> > onetime upgrade for a deprecated ondisk format.
> > 
> > In my mind it's a feature to be able to do:
> > 
> > #if IS_ENABLED(CONFIG_XFS_V4)
> > int
> > xfs_fs_set_lazycount(...)
> > {
> > 	/* walk AGs, fix AGF... */
> > 	/* lock super */
> > 	/* set lazysbcount */
> > 	/* bwrite super */
> > 	/* log super changes */
> > 	/* commit the whole mess */
> > }
> > #else
> > # define xfs_fs_set_lazycount(..)	(-ENOSYS)
> > #endif
> > 
> > Because then we know that this is all XFSv4 code and can easily make it
> > go away.
> 
> Yeah, sounds better. I could refine this in the next version. :)
> 
> > 
> > The other question I have is: Do we /really/ want to QA and support
> > this in the kernel?  Considering that we already have xfs_admin -c1?
> 
> I think we might ask Zorro for this whole thing, since no end users
> actually report this. :) (Cc Zorro here.) Although the reality is
> we still support !lazysbcount fses even it isn't looked after at all.

There's not complaint from cumtomers yet. Due to lazy-count only can be disabled
on V4 xfs, and not disabled by default. So nearly no one use it now (Except I'm
the one who's picky:)

I never thougth it brings in so much argument. I think if the logic is clear, and
easy to fix, better to fix it simply, then remove the whole related code later
when it's deprecated.
If there's risk, we can keep it unitl it's deprecated, especially there's not more
complaints on it.

Thanks,
Zorro

> 
> My another suggestion completely forbids !lazysbcount from mounting
> in months (or right now.)
> 
> Just warn users to use xfs_admin -c1 to convert this. And after months,
> warn users to convert this and also forbid it from mounting.
> 
> > 
> > > > 
> > > > > +			xfs_trans_commit(tp);
> > > > > +		} else {
> > > > > +			xfs_buf_relse(agfbp);
> > > > > +		}
> > > > >  		btree += pag->pagf_btreeblks;
> > > > > -		xfs_perag_put(pag);
> > > > >  	}
> > > > >  	fdblocks = bfree + bfreelst + btree;
> > > > >  
> > > > > @@ -900,6 +974,11 @@ xfs_initialize_perag_data(
> > > > >  		goto out;
> > > > >  	}
> > > > >  
> > > > > +	if (conv) {
> > > > > +		xfs_sb_version_addlazysbcount(sbp);
> > > > > +		mp->m_update_sb = true;
> > > > > +		xfs_warn(mp, "lazy-counters has been enabled.");
> > > > 
> > > > But we don't log the sb update?
> > > > 
> > > > As far as the feature upgrade goes, is it necessary to bwrite the
> > > > primary super to disk (and then log the change)[1] to prevent a truly
> > > > ancient kernel that doesn't support lazysbcount from trying to recover
> > > > the log and ending up with an unsupported feature set?
> > > 
> > > Not quite sure if it does harm to ancient kernels with such
> > > unsupported feature. may I ask for more details? :)
> > 
> > 1. Walk AG to update btreeblks.
> > 2. Commit feature flag update in superblock.
> > 3. Log flushes to disk before the superblock update gets written to
> >    sector 0.
> > <crash>
> > 4. Boot ancient kernel that doesn't understand lazysbcount
> >    (from USB recovery stick).
> > 5. Mount begins, because sector 0 doesn't have the lazysbcount bit set.
> > 6. Log recovery replays the primary super update over sector 0, and the
> >    new contents of sector 0 say lazysbcount is enabled.
> > 7. Superblock now says it has lazysbcount, what does the kernel do?
> 
> Yeah, but I'm not sure if it has some bad effect if ancient kernels
> do it in this way. I mean (I think) it's somewhat different from
> log_incompat thing.
> 
> If ancient kernels just replay the log, and then sb read verified
> and refuse to proceed (but fs is not corrupted...) I think that
> would be fine?
> 
> I'm not sure about the whole thing on ancient kernels. So very
> curious about this. I will look into the whole thing as well.
> 
> > 
> > > 
> > > but yeah, if any issues here, I should follow
> > >  1) bwrite sb block first;
> > >  2) log sb
> > > 
> > > > 
> > > > [1] https://lore.kernel.org/linux-xfs/161723934343.3149451.16679733325094950568.stgit@magnolia/
> > > > 
> > > > > +	}
> > > > >  	/* Overwrite incore superblock counters with just-read data */
> > > > >  	spin_lock(&mp->m_sb_lock);
> > > > >  	sbp->sb_ifree = ifree;
> > > > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > > > > index cb1e2c4702c3..b3b13acd45d6 100644
> > > > > --- a/fs/xfs/xfs_mount.c
> > > > > +++ b/fs/xfs/xfs_mount.c
> > > > > @@ -626,7 +626,7 @@ xfs_check_summary_counts(
> > > > >  	 * superblock to be correct and we don't need to do anything here.
> > > > >  	 * Otherwise, recalculate the summary counters.
> > > > >  	 */
> > > > > -	if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) ||
> > > > > +	if ((xfs_sb_version_haslazysbcount(&mp->m_sb) &&
> > > > 
> > > > Not clear why the logic here inverts?
> > > 
> > > .. thus xfs_initialize_perag_data() below can be called then.
> > 
> > That seems like all the more reason to make it a separate function, TBH.
> 
> Yeah, will refine this later.
> 
> Thanks,
> Gao Xiang
> 
> > 
> > --D
> > 
> > > 
> > > Thanks,
> > > Gao Xiang
> > > 
> > > > 
> > > > --D
> > > > 
> > > > >  	     XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) &&
> > > > >  	    !xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS))
> > > > >  		return 0;
> > > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > > > index a2dab05332ac..16197a890c15 100644
> > > > > --- a/fs/xfs/xfs_super.c
> > > > > +++ b/fs/xfs/xfs_super.c
> > > > > @@ -1678,6 +1678,11 @@ xfs_remount_rw(
> > > > >  	}
> > > > >  
> > > > >  	mp->m_flags &= ~XFS_MOUNT_RDONLY;
> > > > > +	if (!xfs_sb_version_haslazysbcount(sbp)) {
> > > > > +		error = xfs_initialize_perag_data(mp, sbp->sb_agcount);
> > > > > +		if (error)
> > > > > +			return error;
> > > > > +	}
> > > > >  
> > > > >  	/*
> > > > >  	 * If this is the first remount to writeable state we might have some
> > > > > -- 
> > > > > 2.27.0
> > > > > 
> > > > 
> > > 
> > 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount
  2021-04-22  3:12               ` Gao Xiang
@ 2021-04-22 15:58                 ` Darrick J. Wong
  0 siblings, 0 replies; 17+ messages in thread
From: Darrick J. Wong @ 2021-04-22 15:58 UTC (permalink / raw)
  To: Gao Xiang; +Cc: Dave Chinner, linux-xfs, Zorro Lang, Carlos Maiolino

On Thu, Apr 22, 2021 at 11:12:15AM +0800, Gao Xiang wrote:
> On Thu, Apr 22, 2021 at 01:01:02PM +1000, Dave Chinner wrote:
> > On Thu, Apr 22, 2021 at 10:06:13AM +0800, Gao Xiang wrote:
> > > Hi Dave,
> > > 
> > > On Thu, Apr 22, 2021 at 11:44:46AM +1000, Dave Chinner wrote:
> > > > On Wed, Apr 21, 2021 at 11:01:29AM +0800, Gao Xiang wrote:
> > > > > On Wed, Apr 21, 2021 at 11:45:26AM +1000, Dave Chinner wrote:
> > > > > > On Wed, Apr 21, 2021 at 05:54:43AM +0800, Gao Xiang wrote:
> > > > > > #1 is bad because there are cases where we want to write the
> > > > > > counters even for !lazysbcount filesystems (e.g. mkfs, repair, etc).
> > > > > > 
> > > > > > #2 is essentially a hack around the fact that mp->m_sb is not kept
> > > > > > up to date in the in-memory superblock for !lazysbcount filesystems.
> > > > > > 
> > > > > > #3 keeps the in-memory superblock up to date for !lazysbcount case
> > > > > > so they are coherent with the on-disk values and hence we only need
> > > > > > to update the in-memory superblock counts for lazysbcount
> > > > > > filesystems before calling xfs_sb_to_disk().
> > > > > > 
> > > > > > #3 is my preferred solution.
> > > > > > 
> > > > > > > That will indeed cause more modification, I'm not quite sure if it's
> > > > > > > quite ok honestly. But if you assume that's more clear, I could submit
> > > > > > > an alternative instead later.
> > > > > > 
> > > > > > I think the version you posted doesn't fix the entire problem. It
> > > > > > merely slaps a band-aid over the symptom that is being seen, and
> > > > > > doesn't address all the non-coherent data that can be written to the
> > > > > > superblock here.
> > > > > 
> > > > > As I explained on IRC as well, I think for !lazysbcount cases, fdblocks,
> > > > > icount and ifree are protected by sb buffer lock. and the only users of
> > > > > these three are:
> > > > >  1) xfs_trans_apply_sb_deltas()
> > > > >  2) xfs_log_sb()
> > > > 
> > > > That's just a happy accident and not intentional in any way. Just
> > > > fixing the case that occurs while holding the sb buffer lock doesn't
> > > > actually fix the underlying problem, it just uses this as a bandaid.
> > > 
> > > I think for !lazysbcases, sb buffer lock is only a reliable lock that
> > > can be relied on for serialzing (since we need to make sure each sb
> > > write matches the corresponding fdblocks, ifree, icount. So sb buffer
> > > needs be locked every time. So so need to recalc on dirty log.)
> > > > 
> > > > > 
> > > > > So I've seen no need to update sb_icount, sb_ifree in that way (I mean
> > > > > my v2, although I agree it's a bit hacky.) only sb_fdblocks matters.
> > > > > 
> > > > > But the reason why this patch exist is only to backport to old stable
> > > > > kernels, since after [PATCH v2 2/2], we can get rid of all of
> > > > > !lazysbcount cases upstream.
> > > > > 
> > > > > But if we'd like to do more e.g. by taking m_sb_lock, I've seen the
> > > > > xfs codebase quite varies these years. and I modified some version
> > > > > like http://paste.debian.net/1194481/
> > > > 
> > > > I said on IRC that this is what xfs_trans_unreserve_and_mod_sb() is
> > > > for. For !lazysbcount filesystems the transaction will be marked
> > > > dirty (i.e XFS_TRANS_SB_DIRTY is set) and so we'll always run the
> > > > slow path that takes the m_sb_lock and updates mp->m_sb. 
> > > > 
> > > > It's faster for me to explain this by patch than any other way. See
> > > > below.
> > > 
> > > I know what you mean, but there exists 3 things:
> > >  1) we be64_add_cpu() on-disk fdblocks, ifree, icount at
> > >     xfs_trans_apply_sb_deltas(), and then do the same bahavior in
> > >     xfs_trans_unreserve_and_mod_sb() for in-memory counters again.
> > >     that is (somewhat) fragile.
> > 
> > That's exactly how the superblock updates have been done since the
> > mid 1990s. It's the way it was intended to work:
> > 
> > - xfs_trans_apply_sb_deltas() applies the changes to the on
> >   disk superblock
> > - xfs_trans_unreserve_and_mod_sb() applies the changes to the
> >   in-memory superblock.
> > 
> > All my patch does is follow the long established separation of
> > update responsibilities. It is actually returning the code to the
> > behaviour we had before lazy superblock counters were introduced.
> > 
> > >  2) m_sb_lock behaves no effect at this. This lock between
> > >     xfs_log_sb() and xfs_trans_unreserve_and_mod_sb() is still
> > >     sb buffer lock for !lazysbcount cases.
> > 
> > The m_sb_lock doesn't need to have any effect on this. It's to
> > prevent concurrent updates of the in-core superblock, not to prevent
> > access to the superblock buffer.
> > 
> > i.e. the superblock buffer lock protects against concurrent updates
> > of the superblock buffer, and hence while progating and logging
> > changes to the superblock buffer we have to have the superblock
> > buffer locked.
> > 
> > >  3) in-memory sb counters are serialized by some spinlock now,
> > 
> > No, they are not. Lazysbcount does not set XFS_TRANS_SB_DIRTY
> > for pure ifree/icount/fdblock updates, so it never runs the code
> > I modified in xfs_trans_unreserve_and_mod_sb() unless some other
> > part of the superblock is changed.
> > 
> > For !lazysbcount, we always run this path because XFS_TRANS_SB_DIRTY
> > is always set.
> > 
> > >     so I'm not sure sb per-CPU counters behave for lazysbcount
> > >     cases, are these used for better performance?
> > 
> > It does not change behaviour of anything at all, execpt the counter
> > values for !lazysbcount filesystems are now always kept correctly up
> > to date.
> > 
> > > >  	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
> > > >  	xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
> > > > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > > > index bcc978011869..438e41931b55 100644
> > > > --- a/fs/xfs/xfs_trans.c
> > > > +++ b/fs/xfs/xfs_trans.c
> > > > @@ -629,6 +629,9 @@ xfs_trans_unreserve_and_mod_sb(
> > > >  
> > > >  	/* apply remaining deltas */
> > > >  	spin_lock(&mp->m_sb_lock);
> > > > +	mp->m_sb.sb_fdblocks += blkdelta;
> > > 
> > > not sure that is quite equal to blkdelta, since (I think) we might need
> > > to apply t_res_fdblocks_delta for !lazysbcount cases but not lazysbcount
> > > cases, but I'm not quite sure, just saw the comment above
> > > xfs_trans_unreserve_and_mod_sb() and the implementation of
> > > xfs_trans_apply_sb_deltas().
> > 
> > Yes, I forgot about the special delayed allocation space accounting.
> > We'll have to add that, too, so it becomes:
> > 
> > +	mp->m_sb.sb_fdblocks += blkdelta + tp->t_res_fdblocks_delta;
> > +	mp->m_sb.sb_icount += idelta;
> > +	mp->m_sb.sb_ifree += ifreedelta;
> > 
> > But this doesn't change the structure of the patch in any way.
> 
> Anyway, I think this'd be absolutely fine to fix this issue as well,
> so:
> 
> Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
> 
> (Although I still insist on my v2 [just my own thought] since in-memory
>  sb counters are totally unused/reserved compared with on-disk sb counters
>  for sb_fdblocks and per-CPU sb counters for sb_ifree / sb_icount for
>  the whole !lazysbcount cases, maybe adding some comments is better.
>  But I'm also fine if the patch goes like this ;) )

Does this patch (+ other fixes) fix the problem?  If Zorro says it's ok,
please send this as a formal patch submission so it isn't buried in a
thread.

--D

> Thanks,
> Gao Xiang
> 
> > 
> > CHeers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com
> > 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-04-22 15:58 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-20 11:08 [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount Gao Xiang
2021-04-20 11:08 ` [PATCH v2 2/2] xfs: turn on lazysbcount unconditionally Gao Xiang
2021-04-20 16:22   ` Darrick J. Wong
2021-04-20 20:00     ` Gao Xiang
2021-04-22  0:01       ` Darrick J. Wong
2021-04-22  1:51         ` Gao Xiang
2021-04-22  5:11           ` Zorro Lang
2021-04-20 17:42 ` [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount Darrick J. Wong
2021-04-20 21:25 ` Dave Chinner
2021-04-20 21:54   ` Gao Xiang
2021-04-21  1:45     ` Dave Chinner
2021-04-21  3:01       ` Gao Xiang
2021-04-22  1:44         ` Dave Chinner
2021-04-22  2:06           ` Gao Xiang
2021-04-22  3:01             ` Dave Chinner
2021-04-22  3:12               ` Gao Xiang
2021-04-22 15:58                 ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.