Re: [x86/mm/tlb] 6035152d8e: will-it-scale.per_thread_ops -13.2% regression

From: Nadav Amit <namit@vmware.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: kernel test robot <oliver.sang@intel.com>,
	Ingo Molnar <mingo@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	"lkp@lists.01.org" <lkp@lists.01.org>,
	"lkp@intel.com" <lkp@intel.com>,
	"ying.huang@intel.com" <ying.huang@intel.com>,
	"feng.tang@intel.com" <feng.tang@intel.com>,
	"zhengjun.xing@linux.intel.com" <zhengjun.xing@linux.intel.com>,
	"fengwei.yin@intel.com" <fengwei.yin@intel.com>,
	Andy Lutomirski <luto@kernel.org>
Subject: Re: [x86/mm/tlb] 6035152d8e: will-it-scale.per_thread_ops -13.2% regression
Date: Fri, 18 Mar 2022 00:20:04 +0000
Message-ID: <A185DAD5-3AA7-445B-B57D-AFAF6B55D144@vmware.com>
In-Reply-To: <dd8be93c-ded6-b962-50d4-96b1c3afb2b7@intel.com>

> On Mar 17, 2022, at 5:16 PM, Dave Hansen <dave.hansen@intel.com> wrote:
> 
> On 3/17/22 13:32, Nadav Amit wrote:
>> I’m not married to this patch, but before a revert it would be good
>> to know why it even matters. I wonder whether you can confirm that
>> reverting the patch (without the rest of the series) even helps. If
>> it does, I’ll try to run some tests to understand what the heck is
>> going on.
> 
> I went back and tested on an "Intel(R) Core(TM) i7-8086K CPU @ 4.00GHz"
> which is evidently a 6-core "Coffee Lake".  It needs retpolines:
> 
>> /sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Full
>> generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB
>> filling
> 
> I ran the will-it-scale test:
> 
> 	./malloc1_threads -s 30 -t 12
> 
> and took the 30-second average "ops/sec" at the two commits:
> 
> 	4c1ba3923e: 197876
> 	6035152d8e: 199367 (+0.75%)
> 
> Bigger is better.  So, a small win, but probably mostly in the
> noise.  The number of IPIs definitely went up, probably by 3-4%, to get
> that win.
> 
> IPI costs go up the more threads you throw at it.  Retpoline costs do
> too, but only because you do *more* of them.  Systems with no retpolines
> get hit harder by the IPI costs and get no upside from removing the
> retpoline.
> 
> So, we've got a small (<1%, possibly zero) win on the bulk of systems
> (which have retpolines).  Newer, retpoline-free systems see a
> double-digit regression.  The bigger the system, the bigger the
> regression (probably).
> 
> I tend to think the bigger regression wins and we should probably revert
> the patch, or at least back out its behavior.
> 
> Nadav, do you have some different data or a different take?

Thanks for testing.

I don’t have other data right now. Let me run some measurements later
tonight. I understand your explanation, but I still do not see how
much “later” the lazy check can happen for it to really matter. It is
just strange.