linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Carel Si <beibei.si@intel.com>
Cc: lkp@intel.com, linux-kernel@vger.kernel.org, lkp@lists.01.org,
	fengwei.yin@intel.com, linux-xfs@vger.kernel.org,
	oliver.sang@intel.com
Subject: Re: [LKP] Re: [xfs]  32678f1513:  aim7.jobs-per-min -5.6% regression
Date: Mon, 9 May 2022 10:03:26 +1000	[thread overview]
Message-ID: <20220509000326.GN1098723@dread.disaster.area> (raw)
In-Reply-To: <20220507110941.GA10880@linux.intel.com>

On Sat, May 07, 2022 at 07:09:46PM +0800, Carel Si wrote:
> Hi Dave,
> 
> On Sat, May 07, 2022 at 07:29:24AM +1000, Dave Chinner wrote:
> > On Fri, May 06, 2022 at 05:22:50PM +0800, kernel test robot wrote:
> > > 
> > > 
> > > Greeting,
> > > 
> > > FYI, we noticed a -5.6% regression of aim7.jobs-per-min due to commit:
> > > 
> > > 
> > > commit: 32678f151338b9a321e9e27139a63c81f353acb7 ("[PATCH 1/4] xfs: detect self referencing btree sibling pointers")
> > > url: https://github.com/intel-lab-lkp/linux/commits/Dave-Chinner/xfs-fix-random-format-verification-issues/20220502-162206
> > > base: https://git.kernel.org/cgit/fs/xfs/xfs-linux.git for-next
> > > patch link: https://lore.kernel.org/linux-xfs/20220502082018.1076561-2-david@fromorbit.com
> > 
> > Well, that answers the concern I had about the impact of 
> > changing the way endian conversions were done in that patch.
> > 
> > > a44a027a8b2a20fe 32678f151338b9a321e9e27139a 
> > > ---------------- --------------------------- 
> > >          %stddev     %change         %stddev
> > >              \          |                \  
> > >     464232            -5.6%     438315        aim7.jobs-per-min
> > ....
> > >       0.13 ±  5%      +0.2        0.33 ±  6%  perf-profile.children.cycles-pp.__xfs_btree_check_sblock
> > ....
> > >       0.11 ±  4%      +0.2        0.30 ±  5%  perf-profile.self.cycles-pp.__xfs_btree_check_sblock
> > 
> > Because there is it, right at the bottom of the profile.
> > 
> > Can you try the patch below and see if that fixes the issue?
> 
> We tested below patch, it didn't fix the issue, still has -6.4% regression [1] 
> comparing to a44a027a8b ("Merge tag 'large-extent-counters-v9' of 
> https://github.com/chandanr/linux into xfs-5.19-for-next"). 

Really? It made the regression *worse*?

Oh, wait, *that* wasn't in the last set of profiles:

....
>      35.30 ±  4%      +1.3       36.59 ±  3%      +1.9       37.24 ±  3%  perf-profile.children.cycles-pp.osq_lock
>      36.88 ±  4%      +1.3       38.19 ±  2%      +1.9       38.83 ±  3%  perf-profile.children.cycles-pp.rwsem_optimistic_spin
>      37.40 ±  4%      +1.4       38.77 ±  2%      +2.0       39.35 ±  3%  perf-profile.children.cycles-pp.rwsem_down_write_slowpath
....
>       1.89 ±  3%      +0.1        1.96 ±  4%      +0.0        1.90 ±  2%  perf-profile.self.cycles-pp.rwsem_spin_on_owner
>       0.11 ±  4%      +0.2        0.30 ±  5%      +0.2        0.29        perf-profile.self.cycles-pp.__xfs_btree_check_sblock
>      35.08 ±  4%      +1.3       36.33 ±  2%      +1.9       37.01 ±  3%  perf-profile.self.cycles-pp.osq_lock

This test is hammering an rwsem which is them spinning on exclusive
write locks somewhere. That's where all the increase in system time
has come from, and that's what's causing the difference in
perofrmance. I bet this test is hammering single file write IO from
all 96 CPUs at once, and that's where all the problems start.

IOWs, this likley has nothing to with the btree validation change,
and everything to do with the system being driven into a lock and
cacheline contention corner. When there is lock contention like this
on a large CPU count system, we can't infer anything at all from any
other measurement because the cacheline contention skews
everything...

I'm not going to look at this any further, and I think I'm going to
ignore AIM7 write test regressions in future until the test is
reconfigured to avoid this sort of "100 cpus all trying to do
buffered writes to the same file" contention issue.

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

      reply	other threads:[~2022-05-09  1:29 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20220502082018.1076561-2-david@fromorbit.com>
2022-05-06  9:22 ` [xfs] 32678f1513: aim7.jobs-per-min -5.6% regression kernel test robot
2022-05-06 21:29   ` Dave Chinner
2022-05-07 11:09     ` [LKP] " Carel Si
2022-05-09  0:03       ` Dave Chinner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220509000326.GN1098723@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=beibei.si@intel.com \
    --cc=fengwei.yin@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=lkp@intel.com \
    --cc=lkp@lists.01.org \
    --cc=oliver.sang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).