From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751834AbbBUVhY (ORCPT ); Sat, 21 Feb 2015 16:37:24 -0500
Received: from mail.skyhub.de ([78.46.96.112]:54179 "EHLO mail.skyhub.de"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1750746AbbBUVhX (ORCPT ); Sat, 21 Feb 2015 16:37:23 -0500
Date: Sat, 21 Feb 2015 22:36:25 +0100
From: Borislav Petkov
To: Ingo Molnar
Cc: Andy Lutomirski, Oleg Nesterov, Rik van Riel, x86@kernel.org,
        linux-kernel@vger.kernel.org, Linus Torvalds
Subject: Re: [RFC PATCH] x86, fpu: Use eagerfpu by default on all CPUs
Message-ID: <20150221213625.GD32073@pd.tnic>
References: <20150221093150.GA27841@gmail.com>
        <20150221163840.GA32073@pd.tnic>
        <20150221172914.GB32073@pd.tnic>
        <20150221183952.GD8406@gmail.com>
        <20150221191527.GC32073@pd.tnic>
        <20150221192352.GA10027@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20150221192352.GA10027@gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Feb 21, 2015 at 08:23:52PM +0100, Ingo Molnar wrote:
> to switch between the modes?

I went all out and did a debugfs file, see patch at the end, which counts FPU
saves.
Then I ran this script:

---
#!/bin/bash

D="/sys/kernel/debug/fpu/eager"

echo "Lazy FPU: "
echo 0 > $D
echo -n " FPU saves before: "; cat $D

perf stat -a -e task-clock,cycles,instructions,branch-misses,cache-misses,faults,context-switches,migrations --sync --pre ~/bin/pre-build-kernel.sh make -s -j12

echo -n " FPU saves after: "; cat $D

echo ""
echo "Eager FPU: "
echo 1 > $D
echo -n " FPU saves before: "; cat $D

perf stat -a -e task-clock,cycles,instructions,branch-misses,cache-misses,faults,context-switches,migrations --sync --pre ~/bin/pre-build-kernel.sh make -s -j12

echo -n " FPU saves after: "; cat $D
---

which spit this:

Lazy FPU:
 FPU saves before: 3
Setup is 16252 bytes (padded to 16384 bytes).
System is 4222 kB
CRC c79a13ab
Kernel: arch/x86/boot/bzImage is ready (#41)

 Performance counter stats for 'system wide':

    1315527.989020      task-clock (msec)         #    6.003 CPUs utilized           [100.00%]
 3,042,312,057,208      cycles                    #    2.313 GHz                     [100.00%]
 2,790,807,863,402      instructions              #    0.92  insns per cycle         [100.00%]
    31,658,299,111      branch-misses             #   24.065 M/sec                   [100.00%]
    27,504,255,277      cache-misses              #   20.907 M/sec
        26,802,015      faults                    #    0.020 M/sec                   [100.00%]
         1,248,899      context-switches          #    0.949 K/sec                   [100.00%]
            69,553      migrations                #    0.053 K/sec

     219.127929718 seconds time elapsed

 FPU saves after: 704186

Eager FPU:
 FPU saves before: 4
Setup is 16252 bytes (padded to 16384 bytes).
System is 4222 kB
CRC 6767bb2e
Kernel: arch/x86/boot/bzImage is ready (#42)

 Performance counter stats for 'system wide':

    1321651.543922      task-clock (msec)         #    6.003 CPUs utilized           [100.00%]
 3,044,403,437,364      cycles                    #    2.303 GHz                     [100.00%]
 2,790,835,886,565      instructions              #    0.92  insns per cycle         [100.00%]
    31,638,090,259      branch-misses             #   23.938 M/sec                   [100.00%]
    27,491,643,095      cache-misses              #   20.801 M/sec
        26,869,732      faults                    #    0.020 M/sec                   [100.00%]
         1,252,034      context-switches          #    0.947 K/sec                   [100.00%]
            69,247      migrations                #    0.052 K/sec

     220.148034331 seconds time elapsed

 FPU saves after: 901638

---

so we have a one-second slowdown and 200K more FPU saves in eager mode.
Provided I haven't made a mistake, it looks like the increase in cycles is
mirrored in the run taking 1 second longer.

I've not done the --repeat 10 thing again, maybe I should do it too, just to
be fair, as this is a single run.

---
 arch/x86/include/asm/fpu-internal.h |  4 ++++
 arch/x86/kernel/xsave.c             | 47 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index e97622f57722..7141f353e960 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -38,6 +38,8 @@ int ia32_setup_frame(int sig, struct ksignal *ksig,
 # define ia32_setup_rt_frame	__setup_rt_frame
 #endif
 
+
+extern unsigned long fpu_saved;
 extern unsigned int mxcsr_feature_mask;
 extern void fpu_init(void);
 extern void eager_fpu_init(void);
@@ -242,6 +244,8 @@ static inline void fpu_fxsave(struct fpu *fpu)
  */
 static inline int fpu_save_init(struct fpu *fpu)
 {
+	fpu_saved++;
+
 	if (use_xsave()) {
 		fpu_xsave(fpu);
 
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 0de1fae2bdf0..029de8b629d0 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -14,6 +14,8 @@
 #include
 #include
 
+#include
+
 /*
  * Supported feature mask by the CPU and the kernel.
  */
@@ -638,7 +640,7 @@ static void __init xstate_enable_boot_cpu(void)
 	setup_init_fpu_buf();
 
 	/* Auto enable eagerfpu for xsaveopt */
-	if (cpu_has_xsaveopt && eagerfpu != DISABLE)
+	if (eagerfpu != DISABLE)
 		eagerfpu = ENABLE;
 
 	if (pcntxt_mask & XSTATE_EAGER) {
@@ -739,3 +741,46 @@ void *get_xsave_addr(struct xsave_struct *xsave, int xstate)
 	return (void *)xsave + xstate_comp_offsets[feature];
 }
 EXPORT_SYMBOL_GPL(get_xsave_addr);
+
+unsigned long fpu_saved;
+
+static int eager_get(void *data, u64 *val)
+{
+	*val = fpu_saved;
+
+	return 0;
+}
+
+static int eager_set(void *data, u64 val)
+{
+	if (val)
+		setup_force_cpu_cap(X86_FEATURE_EAGER_FPU);
+	else
+		setup_clear_cpu_cap(X86_FEATURE_EAGER_FPU);
+
+	fpu_saved = 0;
+
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(eager_fops, eager_get, eager_set, "%llu\n");
+
+static int __init setup_eagerfpu_knob(void)
+{
+	static struct dentry *d_eager, *f_eager;
+
+	d_eager = debugfs_create_dir("fpu", NULL);
+	if (!d_eager) {
+		pr_err("Error creating fpu debugfs dir\n");
+		return -ENOMEM;
+	}
+
+	f_eager = debugfs_create_file("eager", 0644, d_eager, NULL, &eager_fops);
+	if (!f_eager) {
+		pr_err("Error creating fpu debugfs node\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+late_initcall(setup_eagerfpu_knob);
-- 
2.2.0.33.gc18b867

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
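
As a cross-check of the lazy vs. eager comparison in the mail (about one
second slower and roughly 200K more FPU saves in eager mode), the deltas can
be recomputed from the printed figures. This is only a minimal sketch: it
assumes the counter readings and perf totals are used exactly as they appear
in the two quoted runs, the variable names are made up for illustration, and
the elapsed times are written in nanoseconds so that bash integer arithmetic
is enough.

```shell
#!/bin/bash
# Figures copied from the two perf runs quoted in the mail (single run each).
# All variable names here are hypothetical, chosen for this sketch only.

# FPU saves per run: "after" counter reading minus "before" reading.
lazy_saves=$((704186 - 3))
eager_saves=$((901638 - 4))

# Total cycles as reported by perf stat.
lazy_cycles=3042312057208
eager_cycles=3044403437364

# Elapsed time in nanoseconds (219.127929718 s and 220.148034331 s).
lazy_ns=219127929718
eager_ns=220148034331

saves_delta=$((eager_saves - lazy_saves))
saves_pct=$((saves_delta * 100 / lazy_saves))            # truncated percent
cycles_delta=$((eager_cycles - lazy_cycles))
cycles_ppm=$((cycles_delta * 1000000 / lazy_cycles))     # truncated parts per million
elapsed_ms=$(( (eager_ns - lazy_ns) / 1000000 ))         # truncated milliseconds

echo "extra FPU saves in eager mode: $saves_delta (+${saves_pct}%)"
echo "extra cycles in eager mode:    $cycles_delta (${cycles_ppm} ppm)"
echo "extra elapsed time:            ${elapsed_ms} ms"
```

On these figures the cycle count differs by well under 0.1%, so a repeated
measurement (the --repeat run mentioned in the mail) would probably be needed
to tell the difference apart from run-to-run noise.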