All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Feng Tang <feng.tang@intel.com>
Cc: Oliver Sang <oliver.sang@intel.com>,
	Dave Chinner <dchinner@redhat.com>,
	oe-lkp@lists.linux.dev, lkp@intel.com,
	linux-kernel@vger.kernel.org,
	"Darrick J. Wong" <djwong@kernel.org>,
	linux-xfs@vger.kernel.org, ying.huang@intel.com,
	fengwei.yin@intel.com
Subject: Re: [linus:master] [xfs]  2edf06a50f:  fsmark.files_per_sec -5.7% regression
Date: Tue, 16 May 2023 08:20:34 +1000	[thread overview]
Message-ID: <20230515222034.GG3223426@dread.disaster.area> (raw)
In-Reply-To: <ZGDyAOewWqjY5xvJ@feng-clx>

On Sun, May 14, 2023 at 10:36:48PM +0800, Feng Tang wrote:
> Hi Dave,
> 
> On Sat, May 13, 2023 at 09:05:04AM +1000, Dave Chinner wrote:
> > On Fri, May 12, 2023 at 03:44:29PM +0800, Oliver Sang wrote:
> [...]
> > > Thanks a lot for guidance!
> > > 
> > > we plan to disable XFS_DEBUG (as well as XFS_WARN) in our performance tests.
> > > want to consult with you if this is the correct thing to do?
> > 
> > You can use XFS_WARN=y with performance tests - that elides all the
> > debug specific code that changes behaviour but leaves all the
> > ASSERT-based correctness checks in the code.
> > 
> > > and I guess we should still keep them in functional tests, am I right?
> > 
> > Yes.
> > 
> > > BTW, regarding this case, we tested again with disabling XFS_DEBUG (as well as
> > > XFS_WARN), kconfig is attached, only diff with last time is:
> > > -CONFIG_XFS_DEBUG=y
> > > -CONFIG_XFS_ASSERT_FATAL=y
> > > +# CONFIG_XFS_WARN is not set
> > > +# CONFIG_XFS_DEBUG is not set
> > > 
> > > but we still observed similar regression:
> > > 
> > > ecd788a92460eef4 2edf06a50f5bbe664283f3c55c4
> > > ---------------- ---------------------------
> > >          %stddev     %change         %stddev
> > >              \          |                \
> > >    8176057 ± 15%      +4.7%    8558110        fsmark.app_overhead
> > >      14484            -6.3%      13568        fsmark.files_per_sec
> > 
> > So the application spent 5% more CPU time in userspace, and the rate
> > the kernel processed IO went down by 6%. Seems to me like
> > everything is running slower, not just the kernel code....
> > 
> > >     100.50 ±  5%      +0.3%     100.83        turbostat.Avg_MHz
> > >       5.54 ± 11%      +0.3        5.82        turbostat.Busy%
> > >       1863 ± 19%      -6.9%       1733        turbostat.Bzy_MHz
> > 
> > Evidence that the CPU is running at a 7% lower clock rate when the
> > results are 6% slower is a bit suspicious to me. Shouldn't the CPU
> > clock rate be fixed to the same value for A-B performance regression
> > testing?
> 
> For commit 2edf06a50f5, it seems to change the semantics a little
> about handling of 'flags' for xfs_alloc_fix_freelist(). With the debug
> below, the performance is restored.
> 
> 
> ecd788a92460eef4 2edf06a50f5bbe664283f3c55c4 68721405630744da1c07c9c1c3c 
> ---------------- --------------------------- --------------------------- 
> 
>      14349            -5.7%      13527            +0.6%      14437        fsmark.files_per_sec
>     486.29            +5.8%     514.28            -0.5%     483.70        fsmark.time.elapsed_time
> 
> Please help to review if the debug patch miss anything as I don't
> know the internals of xfs, thanks.

Well spotted. :)

The relevant commit dropped the trylock flag, so the perf regression
and change of allocator behaviour is due to it blocking on a busy AG
instead of skipping to the next uncontended on and so all
allocations came from extents in the last block of the free space
btree in that AG.

> 
> ---
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 98defd19e09e..8c85cc68c5f4 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -3246,12 +3246,12 @@ xfs_alloc_vextent_set_fsbno(
>   */
>  static int
>  __xfs_alloc_vextent_this_ag(
> -	struct xfs_alloc_arg	*args)
> +	struct xfs_alloc_arg	*args, int flag)
>  {
>  	struct xfs_mount	*mp = args->mp;
>  	int			error;
>  
> -	error = xfs_alloc_fix_freelist(args, 0);
> +	error = xfs_alloc_fix_freelist(args, flag);
>  	if (error) {
>  		trace_xfs_alloc_vextent_nofix(args);
>  		return error;
> @@ -3289,7 +3289,7 @@ xfs_alloc_vextent_this_ag(
>  	}
>  
>  	args->pag = xfs_perag_get(mp, args->agno);
> -	error = __xfs_alloc_vextent_this_ag(args);
> +	error = __xfs_alloc_vextent_this_ag(args, 0);
>  
>  	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
>  	xfs_perag_put(args->pag);
> @@ -3329,7 +3329,7 @@ xfs_alloc_vextent_iterate_ags(
>  	args->agno = start_agno;
>  	for (;;) {
>  		args->pag = xfs_perag_get(mp, args->agno);
> -		error = __xfs_alloc_vextent_this_ag(args);
> +		error = __xfs_alloc_vextent_this_ag(args, flags);
>  		if (error) {
>  			args->agbno = NULLAGBLOCK;
>  			break;

I don't think this is the right way to fix this. The code is -very-
different at the end of the series that this is in the middle of,
and I need to check what callers now use the trylock behaviour to
determine how the trylock flag shold be passed around...

> Also for the turbostat.Bzy_MHz diff, IIUC, 0Day always uses
> 'performance' cpufreq governor. And as the test case is running
> 32 thread in a platform with 96 CPUs, there are many CPUs in idle
> state in average, and I suspect the Bzy_MHz may be calculated 
> considering those cpufreq and cpuidle factors.

If "busy MHz" includes the speed of idle CPUs, then it's not really
a measure of the speed of "busy" CPUs. If what you say is true, then
it is, at best, badly names - it would just be the "average Mhz",
right?

-Dave.

-- 
Dave Chinner
david@fromorbit.com

  parent reply	other threads:[~2023-05-15 22:20 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-09  2:13 [linus:master] [xfs] 2edf06a50f: fsmark.files_per_sec -5.7% regression kernel test robot
2023-05-09  6:54 ` Dave Chinner
2023-05-09  7:10   ` Dave Chinner
2023-05-12  7:44     ` Oliver Sang
2023-05-12 23:05       ` Dave Chinner
2023-05-14 14:36         ` Feng Tang
2023-05-15 16:57           ` Darrick J. Wong
2023-05-15 22:20           ` Dave Chinner [this message]
2023-05-16  2:46             ` Feng Tang
2023-05-16  3:07               ` Zhang, Rui

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230515222034.GG3223426@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=feng.tang@intel.com \
    --cc=fengwei.yin@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=lkp@intel.com \
    --cc=oe-lkp@lists.linux.dev \
    --cc=oliver.sang@intel.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.