Date: Tue, 9 Feb 2016 12:23:58 +0100
From: Ingo Molnar
To: Will Deacon
Cc: Linus Torvalds, Boqun Feng, Paul McKenney, Peter Zijlstra,
    "Maciej W. Rozycki", David Daney, Måns Rullgård, Ralf Baechle,
    Linux Kernel Mailing List
Subject: Re: [RFC][PATCH] mips: Fix arch_spin_unlock()
Message-ID: <20160209112358.GA500@gmail.com>
References: <20160202093440.GD1239@fixme-laptop.cn.ibm.com>
    <20160202175127.GO10166@arm.com> <20160202193037.GQ10166@arm.com>
    <20160203083338.GA1772@gmail.com> <20160203133210.GC20217@arm.com>
    <20160203190307.GB15852@arm.com>
In-Reply-To: <20160203190307.GB15852@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

* Will Deacon wrote:

> On Wed, Feb 03, 2016 at 01:32:10PM +0000, Will Deacon wrote:
> > On Wed, Feb 03, 2016 at 09:33:39AM +0100, Ingo Molnar wrote:
> > > In fact I'd suggest to test this via a quick runtime hack like this in rcupdate.h:
> > >
> > > 	extern int panic_timeout;
> > >
> > > 	...
> > >
> > > 	if (panic_timeout)
> > > 		smp_load_acquire(p);
> > > 	else
> > > 		typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p);
> > >
> > > (or so)
> >
> > So the problem with this is that a LOAD LOAD sequence isn't an
> > ordering hazard on ARM, so you're potentially at the mercy of the branch
> > predictor as to whether you get an acquire.
> > That's not to say it won't
> > be discarded as soon as the conditional is resolved, but it could
> > screw up the benchmarking.
> >
> > I'd be better off doing some runtime patching, but that's not something
> > I can knock up in a couple of minutes (so I'll add it to my list).
>
> ... so I actually got that up and running, believe it or not. Filthy stuff.

Wow! I tried to implement the simpler solution by hacking rcupdate.h, but got
drowned in nasty circular header file dependencies and gave up...

If you are not overly embarrassed by posting hacky patches, mind posting your
solution?

> The good news is that you're right, and I'm now seeing ~1% difference between
> the runs with ~0.3% noise for either of them. I still think that's significant,
> but it's a lot more reassuring than 4%.

hm, so for such marginal effects I think we could improve the testing method a
bit: we could improve 'perf bench sched messaging' to allow 'steady state
testing': to not exit+restart all the processes between test iterations, but to
continuously measure and print out current performance figures.

I.e. every 10 seconds it could print a decaying running average of current
throughput.

That way you could patch/unpatch the instructions without having to restart the
tasks. If you still see an effect (in the numbers reported every 10 seconds),
then that's a guaranteed result.

[ We have such functionality in 'perf bench numa' (the --show-convergence
  option), for similar reasons, to allow runtime monitoring and tweaking of
  kernel parameters. ]

Thanks,

	Ingo
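
A minimal sketch of the "decaying running average" idea, assuming an
exponentially weighted average is what is meant. Nothing here is existing perf
code: the names decaying_avg, decaying_avg_init() and decaying_avg_update() are
hypothetical, and the choice of weight is only an illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of perf: exponentially decaying average. */
struct decaying_avg {
	double alpha;	/* weight of the newest sample, 0 < alpha <= 1 */
	double value;	/* current running average */
	bool primed;	/* false until the first sample has been seen */
};

static void decaying_avg_init(struct decaying_avg *avg, double alpha)
{
	assert(alpha > 0.0 && alpha <= 1.0);
	avg->alpha = alpha;
	avg->value = 0.0;
	avg->primed = false;
}

/*
 * Fold one interval's throughput (e.g. messages/sec measured over the
 * last 10 seconds) into the average and return the new average.
 */
static double decaying_avg_update(struct decaying_avg *avg, double sample)
{
	if (!avg->primed) {
		/* Seed with the first sample so the average does not start at 0. */
		avg->value = sample;
		avg->primed = true;
	} else {
		avg->value = avg->alpha * sample +
			     (1.0 - avg->alpha) * avg->value;
	}
	return avg->value;
}
```

A reporting loop would call decaying_avg_update() once per interval and print
the returned value. A larger alpha makes the printed figure react faster after
the instructions are patched or unpatched, at the cost of more visible noise.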