From: Oliver Sang <oliver.sang@intel.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Nadav Amit <namit@vmware.com>, Ingo Molnar <mingo@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	"lkp@lists.01.org" <lkp@lists.01.org>,
	"lkp@intel.com" <lkp@intel.com>,
	"ying.huang@intel.com" <ying.huang@intel.com>,
	"feng.tang@intel.com" <feng.tang@intel.com>,
	"zhengjun.xing@linux.intel.com" <zhengjun.xing@linux.intel.com>,
	"fengwei.yin@intel.com" <fengwei.yin@intel.com>
Subject: Re: [x86/mm/tlb] 6035152d8e: will-it-scale.per_thread_ops -13.2% regression
Date: Fri, 18 Mar 2022 10:56:31 +0800
Message-ID: <20220318025631.GA12658@xsang-OptiPlex-9020>
In-Reply-To: <285cf565-ebf7-8cf5-4e57-2245608e79fb@intel.com>

Hi Dave, Hi Nadav,

On Thu, Mar 17, 2022 at 01:49:48PM -0700, Dave Hansen wrote:
> On 3/17/22 13:32, Nadav Amit wrote:
> > Can you please clarify how the bot works - did it notice a performance
> > regression and then started bisecting, or did it just check one patch
> > at a time?
>
> Oliver can tell us for sure, but it usually finds things by bisecting.
> It will pick an upstream commit and compare it to the latest baseline.
> If it sees a delta it starts bisecting for the triggering commit.
>
> It isn't a literal 'git bisect', but it's logically similar.

yes, this is exactly how the 0-day bot works.

regarding below from Nadav,
> > I ask because I got a different report from the report that a
> > subsequent patch ("x86/mm/tlb: Privatize cpu_tlbstate") made a
> > 23.3% improvement [1] for a very similar (yet different) test.
yes, we also noticed this:

* 2f4305b19fe6a x86/mm/tlb: Privatize cpu_tlbstate
* 4ce94eabac16b x86/mm/tlb: Flush remote and local TLBs concurrently
* 6035152d8eebe x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()
* 4c1ba3923e6c8 x86/mm/tlb: Unify flush_tlb_func_local() and flush_tlb_func_remote()
* a32a4d8a815c4 smp: Run functions concurrently in smp_call_function_many_cond()
* a38fd87484648 (tag: v5.12-rc2)

but we confirmed there is no obvious performance change on this test upon
2f4305b19fe6a ("x86/mm/tlb: Privatize cpu_tlbstate")

below are what we tested along mainline recently, from latest to oldest, for
this test:

7e57714cd0ad2 Linux 5.17-rc6
    5533 5551 5572 5536 5544 5523
9137eda53752e Merge tag 'configfs-5.17-2022-02-25' of git://git.infradead.org/users/hch/configfs
    5591
53ab78cd6d5ab Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
    5571 5569 5525 5542
d8152cfe2f21d Merge tag 'pci-v5.17-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
    5569 5541
23d04328444a8 Merge tag 'for-5.17/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
    5535 5565 5526
5c1ee569660d4 Merge branch 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
    5480 5527 5486
038101e6b2cd5 Merge tag 'platform-drivers-x86-v5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
    5508
cfb92440ee71a Linux 5.17-rc5
    5506
754e0b0e35608 Linux 5.17-rc4
    5498
e783362eb54cd Linux 5.17-rc1
    5557
df0cc57e057f1 Linux 5.16
    5571
2f4305b19fe6a x86/mm/tlb: Privatize cpu_tlbstate
    5601 5642 5674 5634 5678 5702
6035152d8eebe x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()
    5598 5571 5571 5639 5579 5587 5571 5582
4c1ba3923e6c8 x86/mm/tlb: Unify flush_tlb_func_local() and flush_tlb_func_remote()
    6292 6508 6478 6505 6411 6475 6269 6494 6474

as the above shows, the performance drop caused by 6035152d8eebe does not seem
to recover on 2f4305b19fe6a or the commits after it.
as a contrast, for the report "[x86/mm/tlb] 2f4305b19f:
will-it-scale.per_thread_ops 23.3% improvement", which is a different subtest
under will-it-scale, also on another platform (we have a much longer test
history on it from before we upgraded the ucode on this platform, so I only
show part of the data):

df0cc57e057f1 Linux 5.16
    3247
fc74e0a40e4f9 Linux 5.16-rc7
    3242
8bb7eca972ad5 Linux 5.15
    2856 2879 2900 2871 2890
519d81956ee27 Linux 5.15-rc6
    2822
64570fbc14f8d Linux 5.15-rc5
    2820 2839 2852 2833
62fb9874f5da5 Linux 5.13
    3311 3299 3288
13311e74253fe Linux 5.13-rc7
    3302 3316 3303
9f4ad9e425a1d Linux 5.12
    2765 2774 2779 2784 2768
1608e4cf31b88 x86/mm/tlb: Remove unnecessary uses of the inline keyword
    3448 3447 3483 3506 3494
291c4011dd7ac cpumask: Mark functions as pure
    3469 3520 3419 3437 3418 3499
09c5272e48614 x86/mm/tlb: Do not make is_lazy dirty for no reason
    3421 3473 3392 3463 3474 3434
2f4305b19fe6a x86/mm/tlb: Privatize cpu_tlbstate
    3509 3475 3368 3450 3445 3442
4ce94eabac16b x86/mm/tlb: Flush remote and local TLBs concurrently
    2796 2792 2796 2812 2784 2796 2779
6035152d8eebe x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()
    2755 2797 2792
4c1ba3923e6c8 x86/mm/tlb: Unify flush_tlb_func_local() and flush_tlb_func_remote()
    2836 2827 2825
a38fd87484648 Linux 5.12-rc2
    2997 2981 3003

as the above shows, there is a performance improvement on 2f4305b19fe6a, but
the data from this subtest fluctuates more along mainline.

>
> I did ask the 0day folks privately if they had any more performance data
> on that commit: good, bad or neutral.

we don't have other performance data on this commit so far, which may also
mean that no other bisection has pointed to this commit.

>
> That commit didn't actually look to me like it was fundamental to
> anything built after it.  It might not revert cleanly, but it doesn't
> look like it would be hard to logically remove.  What other side-effects
> are you worried about?
>
> BTW, there's also a dirt simple hack to do the on_each_cpu_cond_mask()
> without a retpoline:
>
> 	if ((cond_func == tlb_is_not_lazy) &&
> 	    !tlb_is_not_lazy(...))
> 		continue;
>
> You can't do that literally in arch-independent code, but you get the point.
>
> I know folks have discussed ways of doing this kind of stuff for other
> high-value indirect calls.  I need to see if there's anything around
> that we could use.