Thanks Boris!

So the patch is helpful and has no impact on other/older machines.
I will re-send a new version according to the comments.
Any further comments are appreciated!

Regards
Ling

> -----Original Message-----
> From: Borislav Petkov [mailto:bp@alien8.de]
> Sent: Sunday, October 14, 2012 6:58 PM
> To: Ma, Ling
> Cc: Konrad Rzeszutek Wilk; mingo@elte.hu; hpa@zytor.com;
> tglx@linutronix.de; linux-kernel@vger.kernel.org; iant@google.com;
> George Spelvin
> Subject: Re: [PATCH RFC 2/2] [x86] Optimize copy_page by re-arranging
> instruction sequence and saving register
>
> On Fri, Oct 12, 2012 at 08:04:11PM +0200, Borislav Petkov wrote:
> > Right, so benchmark shows around 20% speedup on Bulldozer but this is
> > a microbenchmark and before pursue this further, we need to verify
> > whether this brings any palpable speedup with a real benchmark, I
> > don't know, kernbench, netbench, whatever. Even something as boring as
> > kernel build. And probably check for perf regressions on the rest of
> > the uarches.
>
> Ok, so to summarize, on AMD we're using REP MOVSQ which is even faster
> than the unrolled version. I've added the REP MOVSQ version to the
> µbenchmark. It nicely validates that we're correctly setting
> X86_FEATURE_REP_GOOD on everything >= F10h and some K8s.
>
> So, to answer Konrad's question: those patches don't concern AMD
> machines.
>
> Thanks.
>
> --
> Regards/Gruss,
> Boris.