From: Ingo Molnar <mingo@kernel.org>
To: linux-kernel@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>,
	Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Fenghua Yu <fenghua.yu@intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: [PATCH 207/208] x86/fpu: Add FPU performance measurement subsystem
Date: Tue,  5 May 2015 19:58:31 +0200
Message-ID: <1430848712-28064-47-git-send-email-mingo@kernel.org>
In-Reply-To: <1430848712-28064-1-git-send-email-mingo@kernel.org>

Add a short FPU performance suite that runs once during bootup.

It can be enabled via CONFIG_X86_DEBUG_FPU_PERFORMANCE=y. Example output:

  x86/fpu:##################################################################
  x86/fpu: Running FPU performance measurement suite (cache hot):
  x86/fpu: Cost of: null                                      :   108 cycles
  x86/fpu:########  CPU instructions:           ############################
  x86/fpu: Cost of: NOP                         insn          :     0 cycles
  x86/fpu: Cost of: RDTSC                       insn          :    12 cycles
  x86/fpu: Cost of: RDMSR                       insn          :   100 cycles
  x86/fpu: Cost of: WRMSR                       insn          :   396 cycles
  x86/fpu: Cost of: CLI                         insn  same-IF :     0 cycles
  x86/fpu: Cost of: CLI                         insn  flip-IF :     0 cycles
  x86/fpu: Cost of: STI                         insn  same-IF :     0 cycles
  x86/fpu: Cost of: STI                         insn  flip-IF :     0 cycles
  x86/fpu: Cost of: PUSHF                       insn          :     0 cycles
  x86/fpu: Cost of: POPF                        insn  same-IF :    20 cycles
  x86/fpu: Cost of: POPF                        insn  flip-IF :    28 cycles
  x86/fpu:########  IRQ save/restore APIs:      ############################
  x86/fpu: Cost of: local_irq_save()            fn            :    20 cycles
  x86/fpu: Cost of: local_irq_restore()         fn    same-IF :    24 cycles
  x86/fpu: Cost of: local_irq_restore()         fn    flip-IF :    28 cycles
  x86/fpu: Cost of: irq_save()+restore()        fn    same-IF :    48 cycles
  x86/fpu: Cost of: irq_save()+restore()        fn    flip-IF :    48 cycles
  x86/fpu:########  locking APIs:               ############################
  x86/fpu: Cost of: smp_mb()                    fn            :    40 cycles
  x86/fpu: Cost of: cpu_relax()                 fn            :     8 cycles
  x86/fpu: Cost of: spin_lock()+unlock()        fn            :    64 cycles
  x86/fpu: Cost of: read_lock()+unlock()        fn            :    76 cycles
  x86/fpu: Cost of: write_lock()+unlock()       fn            :    52 cycles
  x86/fpu: Cost of: rcu_read_lock()+unlock()    fn            :    16 cycles
  x86/fpu: Cost of: preempt_disable()+enable()  fn            :    20 cycles
  x86/fpu: Cost of: mutex_lock()+unlock()       fn            :    56 cycles
  x86/fpu:########  MM instructions:            ############################
  x86/fpu: Cost of: __flush_tlb()               fn            :   132 cycles
  x86/fpu: Cost of: __flush_tlb_global()        fn            :   920 cycles
  x86/fpu: Cost of: __flush_tlb_one()           fn            :   288 cycles
  x86/fpu: Cost of: __flush_tlb_range()         fn            :   412 cycles
  x86/fpu:########  FPU instructions:           ############################
  x86/fpu: Cost of: CR0                         read          :     4 cycles
  x86/fpu: Cost of: CR0                         write         :   208 cycles
  x86/fpu: Cost of: CR0::TS                     fault         :  1156 cycles
  x86/fpu: Cost of: FNINIT                      insn          :    76 cycles
  x86/fpu: Cost of: FWAIT                       insn          :     0 cycles
  x86/fpu: Cost of: FSAVE                       insn          :   168 cycles
  x86/fpu: Cost of: FRSTOR                      insn          :   160 cycles
  x86/fpu: Cost of: FXSAVE                      insn          :    84 cycles
  x86/fpu: Cost of: FXRSTOR                     insn          :    44 cycles
  x86/fpu: Cost of: FXRSTOR                     fault         :   688 cycles
  x86/fpu: Cost of: XSAVE                       insn          :   104 cycles
  x86/fpu: Cost of: XRSTOR                      insn          :    80 cycles
  x86/fpu: Cost of: XRSTOR                      fault         :   884 cycles
  x86/fpu:##################################################################

The same measurements, on an AMD system:

  x86/fpu:##################################################################
  x86/fpu: Running FPU performance measurement suite (cache hot):
  x86/fpu: Cost of: null                                      :   144 cycles
  x86/fpu:########  CPU instructions:           ############################
  x86/fpu: Cost of: NOP                         insn          :     4 cycles
  x86/fpu: Cost of: RDTSC                       insn          :    71 cycles
  x86/fpu: Cost of: RDMSR                       insn          :    43 cycles
  x86/fpu: Cost of: WRMSR                       insn          :   148 cycles
  x86/fpu: Cost of: CLI                         insn  same-IF :     8 cycles
  x86/fpu: Cost of: CLI                         insn  flip-IF :     5 cycles
  x86/fpu: Cost of: STI                         insn  same-IF :    28 cycles
  x86/fpu: Cost of: STI                         insn  flip-IF :     0 cycles
  x86/fpu: Cost of: PUSHF                       insn          :    15 cycles
  x86/fpu: Cost of: POPF                        insn  same-IF :     8 cycles
  x86/fpu: Cost of: POPF                        insn  flip-IF :    12 cycles
  x86/fpu:########  IRQ save/restore APIs:      ############################
  x86/fpu: Cost of: local_irq_save()            fn            :     0 cycles
  x86/fpu: Cost of: local_irq_restore()         fn    same-IF :     7 cycles
  x86/fpu: Cost of: local_irq_restore()         fn    flip-IF :    20 cycles
  x86/fpu: Cost of: irq_save()+restore()        fn    same-IF :    20 cycles
  x86/fpu: Cost of: irq_save()+restore()        fn    flip-IF :    20 cycles
  x86/fpu:########  locking APIs:               ############################
  x86/fpu: Cost of: smp_mb()                    fn            :    38 cycles
  x86/fpu: Cost of: cpu_relax()                 fn            :     7 cycles
  x86/fpu: Cost of: spin_lock()+unlock()        fn            :    89 cycles
  x86/fpu: Cost of: read_lock()+unlock()        fn            :    91 cycles
  x86/fpu: Cost of: write_lock()+unlock()       fn            :    85 cycles
  x86/fpu: Cost of: rcu_read_lock()+unlock()    fn            :    30 cycles
  x86/fpu: Cost of: preempt_disable()+enable()  fn            :    38 cycles
  x86/fpu: Cost of: mutex_lock()+unlock()       fn            :    64 cycles
  x86/fpu:########  MM instructions:            ############################
  x86/fpu: Cost of: __flush_tlb()               fn            :   134 cycles
  x86/fpu: Cost of: __flush_tlb_global()        fn            :   547 cycles
  x86/fpu: Cost of: __flush_tlb_one()           fn            :   128 cycles
  x86/fpu: Cost of: __flush_tlb_range()         fn            :   539 cycles
  x86/fpu:########  FPU instructions:           ############################
  x86/fpu: Cost of: CR0                         read          :    16 cycles
  x86/fpu: Cost of: CR0                         write         :    83 cycles
  x86/fpu: Cost of: CR0::TS                     fault         :   691 cycles
  x86/fpu: Cost of: FNINIT                      insn          :   118 cycles
  x86/fpu: Cost of: FWAIT                       insn          :     4 cycles
  x86/fpu: Cost of: FSAVE                       insn          :   156 cycles
  x86/fpu: Cost of: FRSTOR                      insn          :   151 cycles
  x86/fpu: Cost of: FXSAVE                      insn          :    73 cycles
  x86/fpu: Cost of: FXRSTOR                     insn          :    86 cycles
  x86/fpu: Cost of: FXRSTOR                     fault         :   441 cycles
  x86/fpu:##################################################################

Note that there can be some jitter in the results between bootups:
the measurement takes the shortest of all runs, which is relatively,
but not completely, stable. Results for very cheap operations should
therefore be taken with a grain of salt: a NOP obviously does not
execute in 0 cycles, as the first log above would suggest. Results
are expected to be relatively accurate for more complex instructions.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/Kconfig.debug             |  15 ++
 arch/x86/include/asm/fpu/measure.h |  13 ++
 arch/x86/kernel/cpu/bugs.c         |   2 +
 arch/x86/kernel/cpu/bugs_64.c      |   2 +
 arch/x86/kernel/fpu/Makefile       |   8 +-
 arch/x86/kernel/fpu/measure.c      | 509 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 548 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 2fd3ebbb4e33..8329635101f8 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -344,4 +344,19 @@ config X86_DEBUG_FPU
 
 	  If unsure, say N.
 
+config X86_DEBUG_FPU_PERFORMANCE
+	bool "Measure x86 FPU performance"
+	depends on DEBUG_KERNEL
+	---help---
+	  If this option is enabled then the kernel will run a short
+	  FPU (Floating Point Unit) benchmarking suite during bootup,
+	  to measure the cost of various FPU hardware operations and
+	  other kernel APIs.
+
+	  The results are printed to the kernel log.
+
+	  This extra benchmarking code will be freed after bootup.
+
+	  If unsure, say N.
+
 endmenu
diff --git a/arch/x86/include/asm/fpu/measure.h b/arch/x86/include/asm/fpu/measure.h
new file mode 100644
index 000000000000..d003809491c2
--- /dev/null
+++ b/arch/x86/include/asm/fpu/measure.h
@@ -0,0 +1,13 @@
+/*
+ * x86 FPU performance measurement methods:
+ */
+#ifndef _ASM_X86_FPU_MEASURE_H
+#define _ASM_X86_FPU_MEASURE_H
+
+#ifdef CONFIG_X86_DEBUG_FPU_PERFORMANCE
+extern void fpu__measure(void);
+#else
+static inline void fpu__measure(void) { }
+#endif
+
+#endif /* _ASM_X86_FPU_MEASURE_H */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index bd17db15a2c1..1b947415d903 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -13,6 +13,7 @@
 #include <asm/processor.h>
 #include <asm/processor-flags.h>
 #include <asm/fpu/internal.h>
+#include <asm/fpu/measure.h>
 #include <asm/msr.h>
 #include <asm/paravirt.h>
 #include <asm/alternative.h>
@@ -37,6 +38,7 @@ void __init check_bugs(void)
 
 	init_utsname()->machine[1] =
 		'0' + (boot_cpu_data.x86 > 6 ? 6 : boot_cpu_data.x86);
+	fpu__measure();
 	alternative_instructions();
 
 	fpu__init_check_bugs();
diff --git a/arch/x86/kernel/cpu/bugs_64.c b/arch/x86/kernel/cpu/bugs_64.c
index 04f0fe5af83e..846c24aa14cf 100644
--- a/arch/x86/kernel/cpu/bugs_64.c
+++ b/arch/x86/kernel/cpu/bugs_64.c
@@ -8,6 +8,7 @@
 #include <asm/alternative.h>
 #include <asm/bugs.h>
 #include <asm/processor.h>
+#include <asm/fpu/measure.h>
 #include <asm/mtrr.h>
 #include <asm/cacheflush.h>
 
@@ -18,6 +19,7 @@ void __init check_bugs(void)
 	printk(KERN_INFO "CPU: ");
 	print_cpu_info(&boot_cpu_data);
 #endif
+	fpu__measure();
 	alternative_instructions();
 
 	/*
diff --git a/arch/x86/kernel/fpu/Makefile b/arch/x86/kernel/fpu/Makefile
index 68279efb811a..e7676c20bdde 100644
--- a/arch/x86/kernel/fpu/Makefile
+++ b/arch/x86/kernel/fpu/Makefile
@@ -2,4 +2,10 @@
 # Build rules for the FPU support code:
 #
 
-obj-y				+= init.o bugs.o core.o regset.o signal.o xstate.o
+obj-y					+= init.o bugs.o core.o regset.o signal.o xstate.o
+
+# Make the measured functions as simple as possible:
+CFLAGS_measure.o += -fomit-frame-pointer
+CFLAGS_REMOVE_measure.o = -pg
+
+obj-$(CONFIG_X86_DEBUG_FPU_PERFORMANCE) += measure.o
diff --git a/arch/x86/kernel/fpu/measure.c b/arch/x86/kernel/fpu/measure.c
new file mode 100644
index 000000000000..6232cdf240d8
--- /dev/null
+++ b/arch/x86/kernel/fpu/measure.c
@@ -0,0 +1,509 @@
+/*
+ * FPU performance measurement routines
+ */
+#include <asm/fpu/internal.h>
+#include <asm/tlbflush.h>
+
+#include <linux/kernel.h>
+
+/*
+ * Number of repeated measurements we do. We pick the fastest one:
+ */
+static int loops = 1000;
+
+/*
+ * Various small functions, whose overhead we measure:
+ */
+
+typedef void (*bench_fn_t)(void) __aligned(32);
+
+static void fn_empty(void)
+{
+}
+
+/* Basic instructions: */
+
+static void fn_nop(void)
+{
+	asm volatile ("nop");
+}
+
+static void fn_rdtsc(void)
+{
+	u32 low, high;
+
+	asm volatile ("rdtsc": "=a"(low), "=d"(high));
+}
+
+static void fn_rdmsr(void)
+{
+	u64 efer;
+
+	rdmsrl_safe(MSR_EFER, &efer);
+}
+
+static void fn_wrmsr(void)
+{
+	u64 efer;
+
+	if (!rdmsrl_safe(MSR_EFER, &efer))
+		wrmsrl_safe(MSR_EFER, efer);
+}
+
+static void fn_cli_same(void)
+{
+	asm volatile ("cli");
+}
+
+static void fn_cli_flip(void)
+{
+	asm volatile ("sti");
+	asm volatile ("cli");
+}
+
+static void fn_sti_same(void)
+{
+	asm volatile ("sti");
+}
+
+static void fn_sti_flip(void)
+{
+	asm volatile ("cli");
+	asm volatile ("sti");
+}
+
+static void fn_pushf(void)
+{
+	arch_local_save_flags();
+}
+
+static void fn_popf_baseline(void)
+{
+	arch_local_save_flags();
+	asm volatile ("cli");
+}
+
+static void fn_popf_flip(void)
+{
+	unsigned long flags = arch_local_save_flags();
+	asm volatile ("cli");
+
+	arch_local_irq_restore(flags);
+}
+
+static void fn_popf_same(void)
+{
+	unsigned long flags = arch_local_save_flags();
+
+	arch_local_irq_restore(flags);
+}
+
+/* Basic IRQ save/restore APIs: */
+
+static void fn_irq_save_baseline(void)
+{
+	local_irq_enable();
+}
+
+static void fn_irq_save(void)
+{
+	unsigned long flags;
+
+	local_irq_enable();
+	local_irq_save(flags);
+}
+
+static void fn_irq_restore_flip(void)
+{
+	unsigned long flags;
+
+	local_irq_enable();
+	local_irq_save(flags);
+	local_irq_restore(flags);
+}
+
+static void fn_irq_restore_same(void)
+{
+	unsigned long flags;
+
+	local_irq_disable();
+	local_irq_save(flags);
+	local_irq_restore(flags);
+}
+
+static void fn_irq_save_restore_flip(void)
+{
+	unsigned long flags;
+
+	local_irq_enable();
+
+	local_irq_save(flags);
+	local_irq_restore(flags);
+}
+
+static void fn_irq_save_restore_same(void)
+{
+	unsigned long flags;
+
+	local_irq_disable();
+
+	local_irq_save(flags);
+	local_irq_restore(flags);
+}
+
+/* Basic locking primitives: */
+
+static void fn_smp_mb(void)
+{
+	smp_mb();
+}
+
+static void fn_cpu_relax(void)
+{
+	cpu_relax();
+}
+
+static DEFINE_SPINLOCK(test_spinlock);
+
+static void fn_spin_lock_unlock(void)
+{
+	spin_lock(&test_spinlock);
+	spin_unlock(&test_spinlock);
+}
+
+static DEFINE_RWLOCK(test_rwlock);
+
+static void fn_read_lock_unlock(void)
+{
+	read_lock(&test_rwlock);
+	read_unlock(&test_rwlock);
+}
+
+static void fn_write_lock_unlock(void)
+{
+	write_lock(&test_rwlock);
+	write_unlock(&test_rwlock);
+}
+
+static void fn_rcu_read_lock_unlock(void)
+{
+	rcu_read_lock();
+	rcu_read_unlock();
+}
+
+static void fn_preempt_disable_enable(void)
+{
+	preempt_disable();
+	preempt_enable();
+}
+
+static DEFINE_MUTEX(test_mutex);
+
+static void fn_mutex_lock_unlock(void)
+{
+	local_irq_enable();
+
+	mutex_lock(&test_mutex);
+	mutex_unlock(&test_mutex);
+}
+
+/* MM instructions: */
+
+static void fn_flush_tlb(void)
+{
+	__flush_tlb();
+}
+
+static void fn_flush_tlb_global(void)
+{
+	__flush_tlb_global();
+}
+
+static char tlb_flush_target[PAGE_SIZE] __aligned(4096);
+
+static void fn_flush_tlb_one(void)
+{
+	unsigned long addr = (unsigned long)&tlb_flush_target;
+
+	tlb_flush_target[0]++;
+	__flush_tlb_one(addr);
+}
+
+static void fn_flush_tlb_range(void)
+{
+	unsigned long start = (unsigned long)&tlb_flush_target;
+	unsigned long end = start+PAGE_SIZE;
+	struct mm_struct *mm_saved;
+
+	tlb_flush_target[0]++;
+
+	mm_saved = current->mm;
+	current->mm = current->active_mm;
+
+	flush_tlb_mm_range(current->active_mm, start, end, 0);
+
+	current->mm = mm_saved;
+}
+
+/* FPU instructions: */
+
+static void fn_read_cr0(void)
+{
+	read_cr0();
+}
+
+static void fn_rw_cr0(void)
+{
+	write_cr0(read_cr0());
+}
+
+static void fn_cr0_fault(void)
+{
+	struct fpu *fpu = &current->thread.fpu;
+	u32 cr0 = read_cr0();
+
+	write_cr0(cr0 | X86_CR0_TS);
+
+	asm volatile("fwait");
+
+	/* Zap the FP state we created via the fault: */
+	fpu->fpregs_active = 0;
+	fpu->fpstate_active = 0;
+
+	write_cr0(cr0);
+}
+
+static void fn_fninit(void)
+{
+	asm volatile ("fninit");
+}
+
+static void fn_fwait(void)
+{
+	asm volatile("fwait");
+}
+
+static void fn_fsave(void)
+{
+	static struct fregs_state fstate __aligned(32);
+
+	copy_fregs_to_user(&fstate);
+}
+
+static void fn_frstor(void)
+{
+	static struct fregs_state fstate __aligned(32);
+
+	copy_fregs_to_user(&fstate);
+	copy_user_to_fregs(&fstate);
+}
+
+static void fn_fxsave(void)
+{
+	static struct fxregs_state fxstate __aligned(32);
+
+	copy_fxregs_to_user(&fxstate);
+}
+
+static void fn_fxrstor(void)
+{
+	static struct fxregs_state fxstate __aligned(32);
+
+	copy_fxregs_to_user(&fxstate);
+	copy_user_to_fxregs(&fxstate);
+}
+
+/*
+ * Provoke #GP on invalid FXRSTOR:
+ */
+static void fn_fxrstor_fault(void)
+{
+	static struct fxregs_state fxstate __aligned(32);
+	struct fpu *fpu = &current->thread.fpu;
+
+	copy_fxregs_to_user(&fxstate);
+
+	/* Set invalid MXCSR value, this will generate a #GP: */
+	fxstate.mxcsr = -1;
+
+	copy_user_to_fxregs(&fxstate);
+
+	/* Zap any FP state we created via the fault: */
+	fpu->fpregs_active = 0;
+	fpu->fpstate_active = 0;
+}
+
+static void fn_xsave(void)
+{
+	static struct xregs_state x __aligned(32);
+
+	copy_xregs_to_kernel_booting(&x);
+}
+
+static void fn_xrstor(void)
+{
+	static struct xregs_state x __aligned(32);
+
+	copy_xregs_to_kernel_booting(&x);
+	copy_kernel_to_xregs_booting(&x, -1);
+}
+
+/*
+ * Provoke #GP on invalid XRSTOR:
+ */
+static void fn_xrstor_fault(void)
+{
+	static struct xregs_state x __aligned(32);
+
+	copy_xregs_to_kernel_booting(&x);
+
+	/* Set invalid MXCSR value, this will generate a #GP: */
+	x.i387.mxcsr = -1;
+
+	copy_kernel_to_xregs_booting(&x, -1);
+}
+
+static s64
+measure(s64 null_overhead, bench_fn_t bench_fn,
+	const char *txt_1, const char *txt_2, const char *txt_3)
+{
+	unsigned long flags;
+	u32 cr0_saved;
+	int eager_saved;
+	u64 t0, t1;
+	s64 delta, delta_min;
+	int i;
+
+	delta_min = LONG_MAX;
+
+	/* Disable eagerfpu, so that we can provoke CR0::TS faults: */
+	eager_saved = boot_cpu_has(X86_FEATURE_EAGER_FPU);
+	setup_clear_cpu_cap(X86_FEATURE_EAGER_FPU);
+
+	/* Save CR0 so that we can freely set it to any value during measurement: */
+	cr0_saved = read_cr0();
+	/* Clear TS, so that we can measure FPU ops by default: */
+	write_cr0(cr0_saved & ~X86_CR0_TS);
+
+	local_irq_save(flags);
+
+	asm volatile (".align 32\n");
+
+	for (i = 0; i < loops; i++) {
+		rdtscll(t0);
+		mb();
+
+		bench_fn();
+
+		mb();
+		rdtscll(t1);
+		delta = t1-t0;
+		if (delta <= 0)
+			continue;
+
+		delta_min = min(delta_min, delta);
+	}
+
+	local_irq_restore(flags);
+	write_cr0(cr0_saved);
+
+	if (eager_saved)
+		setup_force_cpu_cap(X86_FEATURE_EAGER_FPU);
+
+	delta_min = max(0LL, delta_min-null_overhead);
+
+	if (txt_1) {
+		if (!txt_2)
+			txt_2 = "";
+		if (!txt_3)
+			txt_3 = "";
+		pr_info("x86/fpu: Cost of: %-27s %-5s %-8s: %5Ld cycles\n", txt_1, txt_2, txt_3, delta_min);
+	}
+
+	return delta_min;
+}
+
+/*
+ * Measure all the above primitives:
+ */
+void __init fpu__measure(void)
+{
+	s64 cost;
+	s64 rdmsr_cost;
+	s64 cli_cost, sti_cost, popf_cost, irq_save_cost;
+	s64 cr0_read_cost, cr0_write_cost;
+	s64 save_cost;
+
+	pr_info("x86/fpu:##################################################################\n");
+	pr_info("x86/fpu: Running FPU performance measurement suite (cache hot):\n");
+
+	cost = measure(0, fn_empty, "null", NULL, NULL);
+
+	pr_info("x86/fpu:########  CPU instructions:           ############################\n");
+	measure(cost, fn_nop, "NOP", "insn", NULL);
+	measure(cost, fn_rdtsc, "RDTSC", "insn", NULL);
+
+	rdmsr_cost = measure(cost, fn_rdmsr, "RDMSR", "insn", NULL);
+	measure(cost+rdmsr_cost, fn_wrmsr, "WRMSR", "insn", NULL);
+
+	cli_cost = measure(cost, fn_cli_same, "CLI", "insn", "same-IF");
+	measure(cost+cli_cost, fn_cli_flip, "CLI", "insn", "flip-IF");
+
+	sti_cost = measure(cost, fn_sti_same, "STI", "insn", "same-IF");
+	measure(cost+sti_cost, fn_sti_flip, "STI", "insn", "flip-IF");
+
+	measure(cost, fn_pushf,	"PUSHF", "insn", NULL);
+
+	popf_cost = measure(cost, fn_popf_baseline, NULL, NULL, NULL);
+	measure(cost+popf_cost, fn_popf_same, "POPF", "insn", "same-IF");
+	measure(cost+popf_cost, fn_popf_flip, "POPF", "insn", "flip-IF");
+
+	pr_info("x86/fpu:########  IRQ save/restore APIs:      ############################\n");
+	irq_save_cost = measure(cost, fn_irq_save_baseline, NULL, NULL, NULL);
+	irq_save_cost += measure(cost+irq_save_cost, fn_irq_save, "local_irq_save()", "fn", NULL);
+	measure(cost+irq_save_cost, fn_irq_restore_same, "local_irq_restore()", "fn", "same-IF");
+	measure(cost+irq_save_cost, fn_irq_restore_flip, "local_irq_restore()", "fn", "flip-IF");
+	measure(cost+sti_cost, fn_irq_save_restore_same, "irq_save()+restore()", "fn", "same-IF");
+	measure(cost+sti_cost, fn_irq_save_restore_flip, "irq_save()+restore()", "fn", "flip-IF");
+
+	pr_info("x86/fpu:########  locking APIs:               ############################\n");
+	measure(cost, fn_smp_mb, "smp_mb()", "fn", NULL);
+	measure(cost, fn_cpu_relax, "cpu_relax()", "fn", NULL);
+	measure(cost, fn_spin_lock_unlock, "spin_lock()+unlock()", "fn", NULL);
+	measure(cost, fn_read_lock_unlock, "read_lock()+unlock()", "fn", NULL);
+	measure(cost, fn_write_lock_unlock, "write_lock()+unlock()", "fn", NULL);
+	measure(cost, fn_rcu_read_lock_unlock, "rcu_read_lock()+unlock()", "fn", NULL);
+	measure(cost, fn_preempt_disable_enable, "preempt_disable()+enable()", "fn", NULL);
+	measure(cost+sti_cost, fn_mutex_lock_unlock, "mutex_lock()+unlock()", "fn", NULL);
+
+	pr_info("x86/fpu:########  MM instructions:            ############################\n");
+	measure(cost, fn_flush_tlb, "__flush_tlb()", "fn", NULL);
+	measure(cost, fn_flush_tlb_global, "__flush_tlb_global()", "fn", NULL);
+	measure(cost, fn_flush_tlb_one, "__flush_tlb_one()", "fn", NULL);
+	measure(cost, fn_flush_tlb_range, "__flush_tlb_range()", "fn", NULL);
+
+	pr_info("x86/fpu:########  FPU instructions:           ############################\n");
+	cr0_read_cost = measure(cost, fn_read_cr0, "CR0", "read", NULL);
+	cr0_write_cost = measure(cost+cr0_read_cost, fn_rw_cr0,	"CR0", "write", NULL);
+
+	measure(cost+cr0_read_cost+cr0_write_cost, fn_cr0_fault, "CR0::TS", "fault", NULL);
+
+	measure(cost, fn_fninit, "FNINIT", "insn", NULL);
+	measure(cost, fn_fwait,	"FWAIT", "insn", NULL);
+
+	save_cost = measure(cost, fn_fsave, "FSAVE", "insn", NULL);
+	measure(cost+save_cost, fn_frstor, "FRSTOR", "insn", NULL);
+
+	if (cpu_has_fxsr) {
+		save_cost = measure(cost, fn_fxsave, "FXSAVE", "insn", NULL);
+		measure(cost+save_cost, fn_fxrstor, "FXRSTOR", "insn", NULL);
+		measure(cost+save_cost, fn_fxrstor_fault, "FXRSTOR", "fault", NULL);
+	}
+	if (cpu_has_xsaveopt) {
+		save_cost = measure(cost, fn_xsave, "XSAVE", "insn", NULL);
+		measure(cost+save_cost, fn_xrstor, "XRSTOR", "insn", NULL);
+		measure(cost+save_cost, fn_xrstor_fault, "XRSTOR", "fault", NULL);
+	}
+	pr_info("x86/fpu:##################################################################\n");
+}
-- 
2.1.0


