From: Nadav Amit <namit@vmware.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: kernel test robot <oliver.sang@intel.com>,
	Ingo Molnar <mingo@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	"lkp@lists.01.org" <lkp@lists.01.org>,
	"lkp@intel.com" <lkp@intel.com>,
	"ying.huang@intel.com" <ying.huang@intel.com>,
	"feng.tang@intel.com" <feng.tang@intel.com>,
	"zhengjun.xing@linux.intel.com" <zhengjun.xing@linux.intel.com>,
	"fengwei.yin@intel.com" <fengwei.yin@intel.com>
Subject: Re: [x86/mm/tlb] 6035152d8e: will-it-scale.per_thread_ops -13.2% regression
Date: Thu, 17 Mar 2022 20:32:35 +0000	[thread overview]
Message-ID: <DC37F01B-A80F-4839-B4FB-C21F64943E64@vmware.com> (raw)
In-Reply-To: <96f9b880-876f-bf4d-8eb0-9ae8bbc8df6d@intel.com>



> On Mar 17, 2022, at 12:11 PM, Dave Hansen <dave.hansen@intel.com> wrote:
> 
> On 3/17/22 12:02, Nadav Amit wrote:
>>> This new "early lazy check" behavior could theoretically work both ways.
>>> If threads tended to be waking up from idle when TLB flushes were being
>>> sent, this would tend to reduce the number of IPIs.  But, since they
>>> tend to be going to sleep it increases the number of IPIs.
>>> 
>>> Anybody have a better theory?  I think we should probably revert the commit.
>> 
>> Let’s get back to the motivation behind this patch.
>> 
>> Originally we had an indirect branch that, on systems vulnerable
>> to Spectre v2, translates into a retpoline.
>> 
>> So I would not characterize this patch’s purpose as “early lazy check”
>> but rather “more efficient lazy check”. There was very little code
>> executed between the call to on_each_cpu_cond_mask() and the actual
>> check of tlb_is_not_lazy(). So what seems to happen in this
>> test-case - according to what you say - is that *slower* checks
>> of is-lazy allow fewer IPIs to be sent, since some cores go
>> into an idle state in the meantime.
>> 
>> Was this test run with retpolines? If there is a difference in
>> performance without retpoline - I am probably wrong.
> 
> Nope, no retpolines:

Err..

> 
>> /sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Enhanced IBRS, IBPB: conditional, RSB filling
> 
> which is the same situation as the "Xeon Platinum 8358" which found this
> in 0day.
> 
> Maybe the increased IPIs with this approach end up being a wash with the
> reduced retpoline overhead.
> 
> Did you have any specific performance numbers that show the benefit on
> retpoline systems?

I had profiled this thing to death at the time. I don’t have the numbers
with me now though. I did not run will-it-scale but a similar benchmark
that I wrote.

Another possibility is that this patch alone, without the subsequent
patches in the series, has some negative impact. I do not have a good
explanation, but can we rule this one out?

Can you please clarify how the bot works - did it notice a performance
regression and then start bisecting, or did it just check one patch
at a time?

I ask because I got a different report saying that a
subsequent patch (“x86/mm/tlb: Privatize cpu_tlbstate”) made a
23.3% improvement [1] on a very similar (yet different) test.

Without a good explanation, my knee-jerk reaction is that this seems
like a pathological case. I do not expect a performance improvement
without retpolines, and perhaps the few cycles by which the is-lazy
test is performed earlier matter.

I’m not married to this patch, but before a revert it would be good
to know why it even matters. I wonder whether you can confirm that
reverting the patch (without the rest of the series) even helps. If
it does, I’ll try to run some tests to understand what the heck is
going on.

[1] https://lists.ofono.org/hyperkitty/list/lkp@lists.01.org/thread/UTC7DVZX4O5DKT2WUTWBTCVQ6W5QLGFA/



  reply	other threads:[~2022-03-17 20:32 UTC|newest]

Thread overview: 22+ messages
2022-03-17  9:04 [x86/mm/tlb] 6035152d8e: will-it-scale.per_thread_ops -13.2% regression kernel test robot
2022-03-17 18:38 ` Dave Hansen
2022-03-17 19:02   ` Nadav Amit
2022-03-17 19:11     ` Dave Hansen
2022-03-17 20:32       ` Nadav Amit [this message]
2022-03-17 20:49         ` Dave Hansen
2022-03-18  2:56           ` Oliver Sang
2022-03-18  0:16         ` Dave Hansen
2022-03-18  0:20           ` Nadav Amit
2022-03-18  0:45             ` Dave Hansen
2022-03-18  3:02               ` Nadav Amit
