All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oliver Sang <oliver.sang@intel.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Nadav Amit <namit@vmware.com>, Ingo Molnar <mingo@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	"lkp@lists.01.org" <lkp@lists.01.org>,
	"lkp@intel.com" <lkp@intel.com>,
	"ying.huang@intel.com" <ying.huang@intel.com>,
	"feng.tang@intel.com" <feng.tang@intel.com>,
	"zhengjun.xing@linux.intel.com" <zhengjun.xing@linux.intel.com>,
	"fengwei.yin@intel.com" <fengwei.yin@intel.com>
Subject: Re: [x86/mm/tlb] 6035152d8e: will-it-scale.per_thread_ops -13.2% regression
Date: Fri, 18 Mar 2022 10:56:31 +0800	[thread overview]
Message-ID: <20220318025631.GA12658@xsang-OptiPlex-9020> (raw)
In-Reply-To: <285cf565-ebf7-8cf5-4e57-2245608e79fb@intel.com>

Hi Dave, Hi Nadav,

On Thu, Mar 17, 2022 at 01:49:48PM -0700, Dave Hansen wrote:
> On 3/17/22 13:32, Nadav Amit wrote:
> > Can you please clarify how the bot works - did it notice a performance
> > regression and then started bisecting, or did it just check one patch
> > at a time?
> 
> Oliver can tell us for sure, but it usually finds things by bisecting.
> It will pick an upstream commit and compare it to the latest baseline.
> If it sees a delta it starts bisecting for the triggering commit.
> 
> It isn't a literal 'git bisect', but it's logically similar.

yes, this is exactly how 0-day bot works.

regarding below from Nadav,
> > I ask because I got a different report from the report that a
> > subsequent patch ("x86/mm/tlb: Privatize cpu_tlbstate”) made a
> > 23.3% improvement [1] for a very similar (yet different) test.

yes, we also noticed this:
* 2f4305b19fe6a x86/mm/tlb: Privatize cpu_tlbstate
* 4ce94eabac16b x86/mm/tlb: Flush remote and local TLBs concurrently
* 6035152d8eebe x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()
* 4c1ba3923e6c8 x86/mm/tlb: Unify flush_tlb_func_local() and flush_tlb_func_remote()
* a32a4d8a815c4 smp: Run functions concurrently in smp_call_function_many_cond()
* a38fd87484648 (tag: v5.12-rc2,

but we confirmed there is no obvious performance change on this test upon
2f4305b19fe6a ("x86/mm/tlb: Privatize cpu_tlbstate")

below are what we tested along mainline recently, from latest to old for this
test:

7e57714cd0ad2 Linux 5.17-rc6                                                                                             5533 5551 5572 5536 5544 5523
9137eda53752e Merge tag 'configfs-5.17-2022-02-25' of git://git.infradead.org/users/hch/configfs                         5591
53ab78cd6d5ab Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux                 5571 5569 5525 5542
d8152cfe2f21d Merge tag 'pci-v5.17-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci                 5569 5541
23d04328444a8 Merge tag 'for-5.17/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux         5535 5565 5526
5c1ee569660d4 Merge branch 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup                   5480 5527 5486
038101e6b2cd5 Merge tag 'platform-drivers-x86-v5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 5508
cfb92440ee71a Linux 5.17-rc5                                                                                             5506
754e0b0e35608 Linux 5.17-rc4                                                                                             5498
e783362eb54cd Linux 5.17-rc1                                                                                             5557
df0cc57e057f1 Linux 5.16                                                                                                 5571
2f4305b19fe6a x86/mm/tlb: Privatize cpu_tlbstate                                                                         5601 5642 5674 5634 5678 5702
6035152d8eebe x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()                                        5598 5571 5571 5639 5579 5587 5571 5582
4c1ba3923e6c8 x86/mm/tlb: Unify flush_tlb_func_local() and flush_tlb_func_remote()                                       6292 6508 6478 6505 6411 6475 6269 6494 6474

as above show, the performance drop caused by 6035152d8eebe seems not recover
on 2f4305b19fe6a and following.


as a contrast, for the report
"[x86/mm/tlb] 2f4305b19f: will-it-scale.per_thread_ops 23.3% improvement"
which is a different subtest under will-it-scale, also on another platform:
(we have much more history tests on it before we upgrade the ucode for
this platform, so I only show partial of them):

df0cc57e057f1 Linux 5.16                                                                                                 3247
fc74e0a40e4f9 Linux 5.16-rc7                                                                                             3242
8bb7eca972ad5 Linux 5.15                                                                                                 2856 2879 2900 2871 2890
519d81956ee27 Linux 5.15-rc6                                                                                             2822
64570fbc14f8d Linux 5.15-rc5                                                                                             2820 2839 2852 2833
62fb9874f5da5 Linux 5.13                                                                                                 3311 3299 3288
13311e74253fe Linux 5.13-rc7                                                                                             3302 3316 3303
9f4ad9e425a1d Linux 5.12                                                                                                 2765 2774 2779 2784 2768
1608e4cf31b88 x86/mm/tlb: Remove unnecessary uses of the inline keyword                                                  3448 3447 3483 3506 3494
291c4011dd7ac cpumask: Mark functions as pure                                                                            3469 3520 3419 3437 3418 3499
09c5272e48614 x86/mm/tlb: Do not make is_lazy dirty for no reason                                                        3421 3473 3392 3463 3474 3434
2f4305b19fe6a x86/mm/tlb: Privatize cpu_tlbstate                                                                         3509 3475 3368 3450 3445 3442
4ce94eabac16b x86/mm/tlb: Flush remote and local TLBs concurrently                                                       2796 2792 2796 2812 2784 2796 2779
6035152d8eebe x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()                                        2755 2797 2792
4c1ba3923e6c8 x86/mm/tlb: Unify flush_tlb_func_local() and flush_tlb_func_remote()                                       2836 2827 2825
a38fd87484648 Linux 5.12-rc2                                                                                             2997 2981 3003

as above, there is a performance improvement on 2f4305b19fe6a.
but the data from this subtest seems more fluctuated along mainline.

> 
> I did ask the 0day folks privately if they had any more performance data
> on that commit: good, bad or neutral.

we don't have other performance data on this commit so far.
but this may mean there is no other bisection bisected to this commit.

> 
> That commit didn't actually look to me like it was fundamental to
> anything built after it.  It might not revert cleanly, but it doesn't
> look like it would be hard to logically remove.  What other side-effects
> are you worried about?
> 
> BTW, there's also a dirt simple hack to do the on_each_cpu_cond_mask()
> without a retpoline:
> 
> 	if ((cond_func == tlb_is_not_lazy) &&
>             !tlb_is_not_lazy(...))
> 		continue;
> 
> You can't do that literally in arch-independent code, but you get the point.
> 
> I know folks have discussed ways of doing this kind of stuff for other
> high-value indirect calls.  I need to see if there's anything around
> that we could use.

WARNING: multiple messages have this Message-ID (diff)
From: Oliver Sang <oliver.sang@intel.com>
To: lkp@lists.01.org
Subject: Re: [x86/mm/tlb] 6035152d8e: will-it-scale.per_thread_ops -13.2% regression
Date: Fri, 18 Mar 2022 10:56:31 +0800	[thread overview]
Message-ID: <20220318025631.GA12658@xsang-OptiPlex-9020> (raw)
In-Reply-To: <285cf565-ebf7-8cf5-4e57-2245608e79fb@intel.com>

[-- Attachment #1: Type: text/plain, Size: 7258 bytes --]

Hi Dave, Hi Nadav,

On Thu, Mar 17, 2022 at 01:49:48PM -0700, Dave Hansen wrote:
> On 3/17/22 13:32, Nadav Amit wrote:
> > Can you please clarify how the bot works - did it notice a performance
> > regression and then started bisecting, or did it just check one patch
> > at a time?
> 
> Oliver can tell us for sure, but it usually finds things by bisecting.
> It will pick an upstream commit and compare it to the latest baseline.
> If it sees a delta it starts bisecting for the triggering commit.
> 
> It isn't a literal 'git bisect', but it's logically similar.

yes, this is exactly how 0-day bot works.

regarding below from Nadav,
> > I ask because I got a different report from the report that a
> > subsequent patch ("x86/mm/tlb: Privatize cpu_tlbstate”) made a
> > 23.3% improvement [1] for a very similar (yet different) test.

yes, we also noticed this:
* 2f4305b19fe6a x86/mm/tlb: Privatize cpu_tlbstate
* 4ce94eabac16b x86/mm/tlb: Flush remote and local TLBs concurrently
* 6035152d8eebe x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()
* 4c1ba3923e6c8 x86/mm/tlb: Unify flush_tlb_func_local() and flush_tlb_func_remote()
* a32a4d8a815c4 smp: Run functions concurrently in smp_call_function_many_cond()
* a38fd87484648 (tag: v5.12-rc2,

but we confirmed there is no obvious performance change on this test upon
2f4305b19fe6a ("x86/mm/tlb: Privatize cpu_tlbstate")

below are what we tested along mainline recently, from latest to old for this
test:

7e57714cd0ad2 Linux 5.17-rc6                                                                                             5533 5551 5572 5536 5544 5523
9137eda53752e Merge tag 'configfs-5.17-2022-02-25' of git://git.infradead.org/users/hch/configfs                         5591
53ab78cd6d5ab Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux                 5571 5569 5525 5542
d8152cfe2f21d Merge tag 'pci-v5.17-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci                 5569 5541
23d04328444a8 Merge tag 'for-5.17/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux         5535 5565 5526
5c1ee569660d4 Merge branch 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup                   5480 5527 5486
038101e6b2cd5 Merge tag 'platform-drivers-x86-v5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 5508
cfb92440ee71a Linux 5.17-rc5                                                                                             5506
754e0b0e35608 Linux 5.17-rc4                                                                                             5498
e783362eb54cd Linux 5.17-rc1                                                                                             5557
df0cc57e057f1 Linux 5.16                                                                                                 5571
2f4305b19fe6a x86/mm/tlb: Privatize cpu_tlbstate                                                                         5601 5642 5674 5634 5678 5702
6035152d8eebe x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()                                        5598 5571 5571 5639 5579 5587 5571 5582
4c1ba3923e6c8 x86/mm/tlb: Unify flush_tlb_func_local() and flush_tlb_func_remote()                                       6292 6508 6478 6505 6411 6475 6269 6494 6474

as above show, the performance drop caused by 6035152d8eebe seems not recover
on 2f4305b19fe6a and following.


as a contrast, for the report
"[x86/mm/tlb] 2f4305b19f: will-it-scale.per_thread_ops 23.3% improvement"
which is a different subtest under will-it-scale, also on another platform:
(we have much more history tests on it before we upgrade the ucode for
this platform, so I only show partial of them):

df0cc57e057f1 Linux 5.16                                                                                                 3247
fc74e0a40e4f9 Linux 5.16-rc7                                                                                             3242
8bb7eca972ad5 Linux 5.15                                                                                                 2856 2879 2900 2871 2890
519d81956ee27 Linux 5.15-rc6                                                                                             2822
64570fbc14f8d Linux 5.15-rc5                                                                                             2820 2839 2852 2833
62fb9874f5da5 Linux 5.13                                                                                                 3311 3299 3288
13311e74253fe Linux 5.13-rc7                                                                                             3302 3316 3303
9f4ad9e425a1d Linux 5.12                                                                                                 2765 2774 2779 2784 2768
1608e4cf31b88 x86/mm/tlb: Remove unnecessary uses of the inline keyword                                                  3448 3447 3483 3506 3494
291c4011dd7ac cpumask: Mark functions as pure                                                                            3469 3520 3419 3437 3418 3499
09c5272e48614 x86/mm/tlb: Do not make is_lazy dirty for no reason                                                        3421 3473 3392 3463 3474 3434
2f4305b19fe6a x86/mm/tlb: Privatize cpu_tlbstate                                                                         3509 3475 3368 3450 3445 3442
4ce94eabac16b x86/mm/tlb: Flush remote and local TLBs concurrently                                                       2796 2792 2796 2812 2784 2796 2779
6035152d8eebe x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()                                        2755 2797 2792
4c1ba3923e6c8 x86/mm/tlb: Unify flush_tlb_func_local() and flush_tlb_func_remote()                                       2836 2827 2825
a38fd87484648 Linux 5.12-rc2                                                                                             2997 2981 3003

as above, there is a performance improvement on 2f4305b19fe6a.
but the data from this subtest seems more fluctuated along mainline.

> 
> I did ask the 0day folks privately if they had any more performance data
> on that commit: good, bad or neutral.

we don't have other performance data on this commit so far.
but this may mean there is no other bisection bisected to this commit.

> 
> That commit didn't actually look to me like it was fundamental to
> anything built after it.  It might not revert cleanly, but it doesn't
> look like it would be hard to logically remove.  What other side-effects
> are you worried about?
> 
> BTW, there's also a dirt simple hack to do the on_each_cpu_cond_mask()
> without a retpoline:
> 
> 	if ((cond_func == tlb_is_not_lazy) &&
>             !tlb_is_not_lazy(...))
> 		continue;
> 
> You can't do that literally in arch-independent code, but you get the point.
> 
> I know folks have discussed ways of doing this kind of stuff for other
> high-value indirect calls.  I need to see if there's anything around
> that we could use.

  reply	other threads:[~2022-03-18  2:56 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-17  9:04 [x86/mm/tlb] 6035152d8e: will-it-scale.per_thread_ops -13.2% regression kernel test robot
2022-03-17  9:04 ` kernel test robot
2022-03-17 18:38 ` Dave Hansen
2022-03-17 18:38   ` Dave Hansen
2022-03-17 19:02   ` Nadav Amit
2022-03-17 19:02     ` Nadav Amit
2022-03-17 19:11     ` Dave Hansen
2022-03-17 19:11       ` Dave Hansen
2022-03-17 20:32       ` Nadav Amit
2022-03-17 20:32         ` Nadav Amit
2022-03-17 20:49         ` Dave Hansen
2022-03-17 20:49           ` Dave Hansen
2022-03-18  2:56           ` Oliver Sang [this message]
2022-03-18  2:56             ` Oliver Sang
2022-03-18  0:16         ` Dave Hansen
2022-03-18  0:16           ` Dave Hansen
2022-03-18  0:20           ` Nadav Amit
2022-03-18  0:20             ` Nadav Amit
2022-03-18  0:45             ` Dave Hansen
2022-03-18  0:45               ` Dave Hansen
2022-03-18  3:02               ` Nadav Amit
2022-03-18  3:02                 ` Nadav Amit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220318025631.GA12658@xsang-OptiPlex-9020 \
    --to=oliver.sang@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=feng.tang@intel.com \
    --cc=fengwei.yin@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkp@intel.com \
    --cc=lkp@lists.01.org \
    --cc=mingo@kernel.org \
    --cc=namit@vmware.com \
    --cc=ying.huang@intel.com \
    --cc=zhengjun.xing@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.