Re: [PATCH] xfs: don't use in-core per-cpu fdblocks for !lazysbcount

From: Gao Xiang <hsiangkao@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>,
	linux-xfs@vger.kernel.org, Zorro Lang <zlang@redhat.com>
Subject: Re: [PATCH] xfs: don't use in-core per-cpu fdblocks for !lazysbcount
Date: Sun, 18 Apr 2021 07:59:48 +0800	[thread overview]
Message-ID: <20210417235948.GB2266103@xiangao.remote.csb> (raw)
In-Reply-To: <20210417223201.GU63242@dread.disaster.area>

Hi Dave,

On Sun, Apr 18, 2021 at 08:32:01AM +1000, Dave Chinner wrote:
> On Sat, Apr 17, 2021 at 10:20:13AM +0800, Gao Xiang wrote:

...

> > > > Hmm... is this really needed?  I thought in !lazysbcount mode,
> > > > xfs_trans_apply_sb_deltas updates the ondisk super buffer directly.
> > > > So aren't all three of these updates unnecessary?
> > > 
> > > Yup, now I understand the issue, the fix is simply to avoid these
> > > updates for !lazysb. i.e. it should just be:
> > > 
> > > 	if (xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > > 		mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> > > 		mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> > > 		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > > 	}
> > > 	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
> > 
> > I did as this because xfs_sb_to_disk() will override them, see:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/libxfs/xfs_sb.c#n629
> > 
> > ...
> > 	to->sb_icount = cpu_to_be64(from->sb_icount);
> > 	to->sb_ifree = cpu_to_be64(from->sb_ifree);
> > 	to->sb_fdblocks = cpu_to_be64(from->sb_fdblocks);
> 
> > As an alternative, I was once to wrap it as:
> > 
> > xfs_sb_to_disk() {
> > ...
> > 	if (xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > 		to->sb_icount = cpu_to_be64(from->sb_icount);
> > 		to->sb_ifree = cpu_to_be64(from->sb_ifree);
> > 		to->sb_fdblocks = cpu_to_be64(from->sb_fdblocks);
> > 	}
> > ...
> > }
> 

...

> 
> That is, xfs_trans_apply_sb_deltas() only applies deltas to the
> directly to the in-memory superblock in the case of !lazy-count, so
> these counters are actually a correct representation of the on-disk
> value of the accounting when lazy-count=0.
> 
> Hence we should always be able to write the counters in mp->m_sb
> directly to the on-disk superblock buffer in the case of
> lazy-count=0 and the values should be correct. lazy-count=1 only
> updates the mp->m_sb counters from the per-cpu counters so that the
> on-disk counters aren't wildly inaccruate, and so that when we
> unmount/freeze/etc the counters are actually correct.
> 
> Long story short, I think xfs_sb_to_disk() always updating the
> on-disk superblock from mp->m_sb is safe to do as the counters in
> mp->m_sb are updated in the same manner during transaction commit as
> the superblock buffer counters for lazy-count=0....

Thanks for your long words, I have to say I don't quite get what's
your thought here, if my understanding is correct,
xfs_trans_apply_sb_deltas() for !lazy-count case just directly
update on-disk superblock (rather than in-memory superblock), see:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/xfs_trans.c?h=v5.12-rc2#n501

	if (!xfs_sb_version_haslazysbcount(&(tp->t_mountp->m_sb))) {
		if (tp->t_icount_delta)
			be64_add_cpu(&sbp->sb_icount, tp->t_icount_delta);
		if (tp->t_ifree_delta)
			be64_add_cpu(&sbp->sb_ifree, tp->t_ifree_delta);
		if (tp->t_fdblocks_delta)
			be64_add_cpu(&sbp->sb_fdblocks, tp->t_fdblocks_delta);
		if (tp->t_res_fdblocks_delta)
			be64_add_cpu(&sbp->sb_fdblocks, tp->t_res_fdblocks_delta);
	}

That is why I think in-memory mp->m_sb.sb_icount, mp->m_sb.sb_ifree,
mp->m_sb.sb_fdblocks are all outdated at all (kindly correct me if I'm wrong
here)... so xfs_sb_to_disk() will replace on-disk fields with outdated
in-memory counters.

> 
> > Yet after I observed the other callers of xfs_sb_to_disk() (e.g. growfs
> > and online repair), I think a better modification is the way I proposed
> > here, so no need to update xfs_sb_to_disk() and the other callers (since
> > !lazysbcount is not recommended at all.)
> 
> Yup that's the original reason for having a fields flag to do
> condition update of the on-disk buffer from the in-memory state.
> Different code has diferrent requirements, but it looked like this
> didn't matter for lazy-count filesystems because other checks
> avoided the update of m_sb fields. What was missed in that
> optimisation was the fact lazy-count=0 never updated the counters
> directly.
> 
> /me is now wondering why we even bother with !lazy-count anymore.
> 
> WE've updated the agr btree block accounting unconditionally since
> lazy-count was added, and scrub will always report a mismatch in
> counts if they exist regardless of lazy-count. So why don't we just
> start ignoring the on-disk value and always use lazy-count based
> updates?
> 
> We only added it as mkfs option/feature bit because of the recovery
> issue with not being able to account for btree blocks properly at
> mount time, but now we have mechanisms for counting blocks in btrees
> so even that has gone away. So we could actually just turn
> on lazy-count at mount time, and we could get rid of this whole
> set of subtle conditional behaviours we clearly aren't able to
> exercise effectively...

If my understanding of the words above is correct, maybe that could
be unfriendly when users turned back to some old kernels. But
considering lazysbcount has been landed for quite quite long time,
I think that is practical as 2 patches:
 1) fix sb counters for !lazysbcount;
 2) turn on lazysbcount at the mount time from now (and warn users).

> 
> > It's easier to backport and less conflict, and btw !lazysbcount also need
> > to be warned out and deprecated from now.
> 
> You have to use -m crc=0 to turn off lazycount, and the deprecation
> warning should come from -m crc=0...

Yes, but I think 2030 is too far for this !lazysbcount feature, since
it seems easy to cause potential bugs. I think maybe we could get rid
of it as soon as possible.

Thanks,
Gao Xiang

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>