* [PATCH v6 0/2] x86, mwaitt: introduce AMD mwaitt support
@ 2015-08-05  3:18 Huang Rui
  2015-08-05  3:18 ` [PATCH v6 1/2] x86, mwaitt: add monitorx and mwaitx instruction Huang Rui
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Huang Rui @ 2015-08-05  3:18 UTC (permalink / raw)
  To: Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, Len Brown,
	John Stultz, Frédéric Weisbecker
  Cc: linux-kernel, x86, Andreas Herrmann, Borislav Petkov,
	Fengguang Wu, Aaron Lu, Tony Li, Huang Rui

Hi,

This patch set introduces support for a new instruction on AMD Carrizo
(Family 15h, Model 60h-6fh). It adds an MWAITX-based delay function with
a configurable timer.

Andy and Boris suggested using mwaitx in the delay method.

Following Peter's suggestion on the last version (v5), this series
provides test results.
http://marc.info/?l=linux-kernel&m=143436586513713&w=2

For background discussion, please see:
http://marc.info/?l=linux-kernel&m=143202042530498&w=2
http://marc.info/?l=linux-kernel&m=143161327003541&w=2
http://marc.info/?l=linux-kernel&m=143222815331016&w=2

Patch set is rebased on tip/master.

Changes from v1 -> v2
- Remove the mwaitx idle implementation, which was disputed because it
  showed no power improvement.
- Add a patch which implements another use case: delay.
- Introduce a kernel parameter (delay) to make the delay method configurable.

Changes from v2 -> v3
- Add compared data on commit message
- Remove kernel parameter
- Add a hint to avoid entering deep C-states in future
- Update the mwaitx delay method as per Peter's suggestion

Changes from v3 -> v4
- Put the MONITORX/MWAITX description into comments

Changes from v4 -> v5
- Remove mwaitx function
- Use mwaitx_delay at init_amd
- Use cpu_tss as the monitor target address

Changes from v5 -> v6
- Move definitions into patch 1
- Completed the power consumption testing both with MWAITX and without
  MWAITX
- Use mwaitx_delay at bsp_init_amd

In MWAITX delay, the CPU core will be quiesced in a waiting phase,
diminishing its power consumption.

Run a simple test to measure power consumption:

cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc;
sleep 10000s;
cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc;

* TSC-based default delay:      485115 uWatts average power
* MWAITX-based delay:           252738 uWatts average power

Thus, that's about 232 milliwatts less power consumption. The test
method relies on support for the AMD CPU accumulated power algorithm in
fam15h_power, for which patches are forthcoming.
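The arithmetic behind the two averages can be sketched as follows. This is a minimal sketch, assuming that power1_acc reports accumulated energy in microjoules; that unit is an assumption, since the fam15h_power interface was still forthcoming. The helper names avg_power_uw() and power_saving_uw() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper. ASSUMPTION: power1_acc is accumulated energy in
 * microjoules. Average power (uW) = energy delta (uJ) / elapsed time (s).
 */
static uint64_t avg_power_uw(uint64_t acc_start_uj, uint64_t acc_end_uj,
			     uint64_t elapsed_s)
{
	assert(elapsed_s > 0);
	assert(acc_end_uj >= acc_start_uj);
	return (acc_end_uj - acc_start_uj) / elapsed_s;
}

/* Difference between a baseline and an improved average power, in uW. */
static uint64_t power_saving_uw(uint64_t baseline_uw, uint64_t improved_uw)
{
	return baseline_uw > improved_uw ? baseline_uw - improved_uw : 0;
}
```

With the reported figures, power_saving_uw(485115, 252738) gives 232377 uW, or roughly 232 mW. That is about 48% of the TSC-based baseline.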

Thanks,
Rui

Huang Rui (2):
  x86, mwaitt: add monitorx and mwaitx instruction
  x86, mwaitt: introduce mwaitx delay with a configurable timer

 arch/x86/include/asm/cpufeature.h |  1 +
 arch/x86/include/asm/delay.h      |  1 +
 arch/x86/include/asm/mwait.h      | 43 +++++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/amd.c         |  4 ++++
 arch/x86/lib/delay.c              | 48 ++++++++++++++++++++++++++++++++++++++-
 5 files changed, 96 insertions(+), 1 deletion(-)

-- 
1.9.1



* [PATCH v6 1/2] x86, mwaitt: add monitorx and mwaitx instruction
  2015-08-05  3:18 [PATCH v6 0/2] x86, mwaitt: introduce AMD mwaitt support Huang Rui
@ 2015-08-05  3:18 ` Huang Rui
  2015-08-05  3:18 ` [PATCH v6 2/2] x86, mwaitt: introduce mwaitx delay with a configurable timer Huang Rui
  2015-08-05  4:01 ` [PATCH v6 0/2] x86, mwaitt: introduce AMD mwaitt support Borislav Petkov
  2 siblings, 0 replies; 13+ messages in thread
From: Huang Rui @ 2015-08-05  3:18 UTC (permalink / raw)
  To: Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, Len Brown,
	John Stultz, Frédéric Weisbecker
  Cc: linux-kernel, x86, Andreas Herrmann, Borislav Petkov,
	Fengguang Wu, Aaron Lu, Tony Li, Huang Rui, Andreas Herrmann

On AMD Carrizo processors (Family 15h, Model 60h-6fh), there is a new
feature called MWAITT (MWAIT with a timer) as an extension of
MONITOR/MWAIT.

MWAITT, also called MWAITX (MWAIT with extensions), has a configurable
timer that causes MWAITX to exit on expiration.
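Software has to confirm the feature exists before it uses these opcodes. Below is a rough userspace sketch. It relies on the fact that the cpufeature bit added in this series (word 6, bit 29) corresponds to CPUID Fn8000_0001 ECX[29]. It uses the __get_cpuid() helper from GCC/Clang's <cpuid.h>. The function names are illustrative and do not come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* MONITORX/MWAITX support is reported in CPUID Fn8000_0001 ECX bit 29. */
#define CPUID_EXT_FEATURES	0x80000001u
#define ECX_MWAITX_BIT		29

/* Pure decode step, kept separate so it can be checked without hardware. */
static bool ecx_has_mwaitx(uint32_t ecx)
{
	return (ecx >> ECX_MWAITX_BIT) & 1u;
}

/* Query the running CPU; false on non-x86 builds or CPUs without the leaf. */
static bool cpu_has_mwaitx(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid() returns 0 if the extended leaf is not supported. */
	if (!__get_cpuid(CPUID_EXT_FEATURES, &eax, &ebx, &ecx, &edx))
		return false;
	return ecx_has_mwaitx(ecx);
#else
	return false;
#endif
}
```

In the kernel itself, this check is just cpu_has(c, X86_FEATURE_...) on the feature bit that the patch defines.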

Compared with MONITOR/MWAIT, there are minor differences in opcode and
input parameters.

MWAITX ECX[1]: enable timer if set
MWAITX EBX[31:0]: max wait time expressed in SW P0 clocks

                MWAIT                           MWAITX
opcode          0f 01 c9           |            0f 01 fb
ECX[0]                  value of RFLAGS.IF seen by instruction
ECX[1]          unused/#GP if set  |            enable timer if set
ECX[31:2]                     unused/#GP if set
EAX                           unused (reserve for hint)
EBX[31:0]       unused             |            max wait time (loops)

                MONITOR                         MONITORX
opcode          0f 01 c8           |            0f 01 fa
EAX                     (logical) address to monitor
ECX                     #GP if not zero

The software P0 frequency is the same as the TSC frequency.

Max timeout = EBX/(TSC frequency)
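As a worked example of the timeout formula, the sketch below converts an EBX value into nanoseconds. It assumes a hypothetical 2 GHz TSC. The constants mirror the ECX[1] and EBX[31:0] fields described above, and the helper name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define MWAITX_ECX_TIMER_ENABLE	(1u << 1)	/* ECX[1]: enable timer */
#define MWAITX_MAX_LOOPS	((uint32_t)-1)	/* largest EBX[31:0] value */
#define NSEC_PER_SEC		1000000000ull

/*
 * Max timeout = EBX / (TSC frequency), expressed in nanoseconds.
 * EBX * 1e9 fits in 64 bits for any 32-bit EBX (< 4.3e18 < 1.8e19).
 */
static uint64_t mwaitx_timeout_ns(uint32_t ebx, uint64_t tsc_hz)
{
	assert(tsc_hz > 0);
	return (uint64_t)ebx * NSEC_PER_SEC / tsc_hz;
}
```

With a 2 GHz TSC, EBX = 2000000 gives 1 ms. The largest value, MWAITX_MAX_LOOPS, caps a single MWAITX at about 2.15 s.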

Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andreas Herrmann <herrmann.der.user@gmail.com>
---
 arch/x86/include/asm/cpufeature.h |  1 +
 arch/x86/include/asm/mwait.h      | 43 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 4b11974..9978a98 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -177,6 +177,7 @@
 #define X86_FEATURE_PERFCTR_NB  ( 6*32+24) /* NB performance counter extensions */
 #define X86_FEATURE_BPEXT	(6*32+26) /* data breakpoint extension */
 #define X86_FEATURE_PERFCTR_L2	( 6*32+28) /* L2 performance counter extensions */
+#define X86_FEATURE_MWAITT	( 6*32+29) /* MWAIT extension (MONITORX/MWAITX) */
 
 /*
  * Auxiliary flags: Linux defined - For features scattered in various
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 653dfa7..47f3540 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -14,6 +14,9 @@
 #define CPUID5_ECX_INTERRUPT_BREAK	0x2
 
 #define MWAIT_ECX_INTERRUPT_BREAK	0x1
+#define MWAITX_ECX_TIMER_ENABLE		BIT(1)
+#define MWAITX_MAX_LOOPS		((u32)-1)
+#define MWAITX_DISABLE_CSTATES		0xf
 
 static inline void __monitor(const void *eax, unsigned long ecx,
 			     unsigned long edx)
@@ -23,6 +26,14 @@ static inline void __monitor(const void *eax, unsigned long ecx,
 		     :: "a" (eax), "c" (ecx), "d"(edx));
 }
 
+static inline void __monitorx(const void *eax, unsigned long ecx,
+			      unsigned long edx)
+{
+	/* "monitorx %eax, %ecx, %edx;" */
+	asm volatile(".byte 0x0f, 0x01, 0xfa;"
+		     :: "a" (eax), "c" (ecx), "d"(edx));
+}
+
 static inline void __mwait(unsigned long eax, unsigned long ecx)
 {
 	/* "mwait %eax, %ecx;" */
@@ -30,6 +41,38 @@ static inline void __mwait(unsigned long eax, unsigned long ecx)
 		     :: "a" (eax), "c" (ecx));
 }
 
+/*
+ * MWAITT allows for both a timer value to get you out of the MWAIT as
+ * well as the normal exit conditions.
+ *
+ * MWAITX ECX[1]: enable timer if set
+ * MWAITX EBX[31:0]: max wait time expressed in SW P0 clocks
+ *
+ * Below is the compared data between MWAIT and MWAITX on AMD
+ * processors:
+ *                 MWAIT                           MWAITX
+ * opcode          0f 01 c9           |            0f 01 fb
+ * ECX[0]                  value of RFLAGS.IF seen by instruction
+ * ECX[1]          unused/#GP if set  |            enable timer if set
+ * ECX[31:2]                     unused/#GP if set
+ * EAX                           unused (reserve for hint)
+ * EBX[31:0]       unused             |            max wait time (loops)
+ *
+ *                 MONITOR                         MONITORX
+ * opcode          0f 01 c8           |            0f 01 fa
+ * EAX                     (logical) address to monitor
+ * ECX                     #GP if not zero
+ *
+ * The software P0 frequency is the same as the TSC frequency.
+ */
+static inline void __mwaitx(unsigned long eax, unsigned long ebx,
+			    unsigned long ecx)
+{
+	/* "mwaitx %eax, %ebx, %ecx;" */
+	asm volatile(".byte 0x0f, 0x01, 0xfb;"
+		     :: "a" (eax), "b" (ebx), "c" (ecx));
+}
+
 static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
 {
 	trace_hardirqs_on();
-- 
1.9.1



* [PATCH v6 2/2] x86, mwaitt: introduce mwaitx delay with a configurable timer
  2015-08-05  3:18 [PATCH v6 0/2] x86, mwaitt: introduce AMD mwaitt support Huang Rui
  2015-08-05  3:18 ` [PATCH v6 1/2] x86, mwaitt: add monitorx and mwaitx instruction Huang Rui
@ 2015-08-05  3:18 ` Huang Rui
  2015-08-06 15:14   ` Borislav Petkov
  2015-08-05  4:01 ` [PATCH v6 0/2] x86, mwaitt: introduce AMD mwaitt support Borislav Petkov
  2 siblings, 1 reply; 13+ messages in thread
From: Huang Rui @ 2015-08-05  3:18 UTC (permalink / raw)
  To: Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, Len Brown,
	John Stultz, Frédéric Weisbecker
  Cc: linux-kernel, x86, Andreas Herrmann, Borislav Petkov,
	Fengguang Wu, Aaron Lu, Tony Li, Huang Rui, Andreas Herrmann

MWAITX can enable a timer and a corresponding timer value specified in
SW P0 clocks. The SW P0 frequency is the same as TSC. The timer
provides an upper bound on how long the instruction waits before
exiting.

The kernel's delay function can leverage the MWAITX timer. This patch
provides a new method (delay_mwaitx) to implement the delay.

In MWAITX delay, the CPU core will be quiesced in a waiting phase,
diminishing its power consumption.

Run a simple test to measure power consumption:

cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc;
sleep 10000s;
cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc;

* TSC-based default delay:      485115 uWatts average power
* MWAITX-based delay:           252738 uWatts average power

Thus, that's about 232 milliwatts less power consumption. The test
method relies on support for the AMD CPU accumulated power algorithm in
fam15h_power, for which patches are forthcoming.

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Suggested-by: Borislav Petkov <bp@suse.de>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andreas Herrmann <herrmann.der.user@gmail.com>
---
 arch/x86/include/asm/delay.h |  1 +
 arch/x86/kernel/cpu/amd.c    |  4 ++++
 arch/x86/lib/delay.c         | 48 +++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/delay.h b/arch/x86/include/asm/delay.h
index 9b3b4f2..36a760b 100644
--- a/arch/x86/include/asm/delay.h
+++ b/arch/x86/include/asm/delay.h
@@ -4,5 +4,6 @@
 #include <asm-generic/delay.h>
 
 void use_tsc_delay(void);
+void use_mwaitx_delay(void);
 
 #endif /* _ASM_X86_DELAY_H */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 51ad2af..730e620 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -11,6 +11,7 @@
 #include <asm/cpu.h>
 #include <asm/smp.h>
 #include <asm/pci-direct.h>
+#include <asm/delay.h>
 
 #ifdef CONFIG_X86_64
 # include <asm/mmconfig.h>
@@ -506,6 +507,9 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
 		/* A random value per boot for bit slice [12:upper_bit) */
 		va_align.bits = get_random_int() & va_align.mask;
 	}
+
+	if (cpu_has(c, X86_FEATURE_MWAITT))
+		use_mwaitx_delay();
 }
 
 static void early_init_amd(struct cpuinfo_x86 *c)
diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
index 4453d52..f8236cb 100644
--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -20,6 +20,7 @@
 #include <asm/processor.h>
 #include <asm/delay.h>
 #include <asm/timer.h>
+#include <asm/mwait.h>
 
 #ifdef CONFIG_SMP
 # include <asm/smp.h>
@@ -84,6 +85,45 @@ static void delay_tsc(unsigned long __loops)
 }
 
 /*
+ * On AMD platforms MWAITX has a configurable 32-bit timer, that
+ * counts with TSC frequency. And the input value is the loop of the
+ * counter, it will exit when the timer expires.
+ */
+static void delay_mwaitx(unsigned long __loops)
+{
+	u32 delay, loops = __loops;
+	u64 end, start;
+
+	start = rdtsc_ordered();
+
+	for (;;) {
+		delay = min(MWAITX_MAX_LOOPS, loops);
+
+		/*
+		 * Use cpu_tss as a cacheline-aligned, seldomly
+		 * accessed per-cpu variable as the monitor target.
+		 */
+		__monitorx(this_cpu_ptr(&cpu_tss), 0, 0);
+		/*
+		 * AMD, like Intel, supports the EAX hint and EAX=0xf
+		 * means, do not enter any deep C-state and we use it
+		 * here in delay() to minimize wakeup latency.
+		 */
+		__mwaitx(MWAITX_DISABLE_CSTATES, delay,
+			 MWAITX_ECX_TIMER_ENABLE);
+
+		end = rdtsc_ordered();
+
+		if (loops <= end - start)
+			break;
+
+		loops -= end - start;
+
+		start = end;
+	}
+}
+
+/*
  * Since we calibrate only once at boot, this
  * function should be set once at boot and not changed
  */
@@ -91,7 +131,13 @@ static void (*delay_fn)(unsigned long) = delay_loop;
 
 void use_tsc_delay(void)
 {
-	delay_fn = delay_tsc;
+	if (delay_fn == delay_loop)
+		delay_fn = delay_tsc;
+}
+
+void use_mwaitx_delay(void)
+{
+	delay_fn = delay_mwaitx;
 }
 
 int read_current_timer(unsigned long *timer_val)
-- 
1.9.1



* Re: [PATCH v6 0/2] x86, mwaitt: introduce AMD mwaitt support
  2015-08-05  3:18 [PATCH v6 0/2] x86, mwaitt: introduce AMD mwaitt support Huang Rui
  2015-08-05  3:18 ` [PATCH v6 1/2] x86, mwaitt: add monitorx and mwaitx instruction Huang Rui
  2015-08-05  3:18 ` [PATCH v6 2/2] x86, mwaitt: introduce mwaitx delay with a configurable timer Huang Rui
@ 2015-08-05  4:01 ` Borislav Petkov
  2 siblings, 0 replies; 13+ messages in thread
From: Borislav Petkov @ 2015-08-05  4:01 UTC (permalink / raw)
  To: Huang Rui
  Cc: Andy Lutomirski, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, linux-kernel, x86,
	Andreas Herrmann, Fengguang Wu, Aaron Lu, Tony Li

On Wed, Aug 05, 2015 at 11:18:50AM +0800, Huang Rui wrote:
> cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc;
> sleep 10000s;
> cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc;
> 
> * TSC-based default delay:      485115 uWatts average power
> * MWAITX-based delay:           252738 uWatts average power
> 
> Thus, that's about 240 milliWatts less power consumption. The test
> method relies on the support of AMD CPU accumulated power algorithm in
> fam15_power for which patches are forthcoming.

Cool, the power consumption drop is actually even measurable.

Also, I think implementing it as a loop, as Peter suggested, was the
right thing to do due to this statement in MWAITX's definition in the
APM:

"There is no indication after exiting MWAITX of why the processor exited
or if the timer expired. It is up to software to check whether the
awaiting store has occurred, and if not, determining how much time
has elapsed if it wants to re-establish the MONITORX with a new timer
value."
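The APM text explains why delay_mwaitx() re-reads the TSC and re-arms in a loop. MWAITX may return early, and it reports nothing about why it returned. The sketch below runs the same loop structure in userspace against a simulated TSC. The early-wakeup model (spurious_wake_after) and the fake_* helpers are hypothetical stand-ins, not hardware behaviour.

```c
#include <assert.h>
#include <stdint.h>

#define MWAITX_MAX_LOOPS ((uint32_t)-1)

/* Simulated TSC plus a hypothetical early-wakeup model (not real hardware). */
static uint64_t fake_tsc;
static uint64_t spurious_wake_after;	/* 0: only the timer ends the wait */
static unsigned int mwaitx_calls;

static uint64_t fake_rdtsc(void)
{
	return fake_tsc;
}

/*
 * Stand-in for MONITORX + MWAITX with the timer enabled: waits at most
 * 'ebx' ticks, may return earlier, and says nothing about why it returned.
 */
static void fake_mwaitx(uint32_t ebx)
{
	uint64_t slept = ebx;

	if (spurious_wake_after && spurious_wake_after < slept)
		slept = spurious_wake_after;
	fake_tsc += slept;
	mwaitx_calls++;
}

/* Same shape as delay_mwaitx(): re-arm until enough TSC ticks have passed. */
static void sim_delay_mwaitx(uint64_t loops)
{
	uint64_t start = fake_rdtsc(), end, delay;

	for (;;) {
		delay = loops < MWAITX_MAX_LOOPS ? loops : MWAITX_MAX_LOOPS;
		fake_mwaitx((uint32_t)delay);
		end = fake_rdtsc();
		if (loops <= end - start)
			break;
		loops -= end - start;
		start = end;
	}
}
```

If wakeups arrive every 100 ticks, a 1000-tick delay takes ten MWAITX calls and still waits the full 1000 ticks, because the loop measures elapsed time instead of trusting the timer.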

So all in all, those patches are starting to shape up nicely. One small
nit I have is using "MWAITT" (with a T) together with MWAITX while the
APM calls it only MWAITX. But I can fix that when applying and drop all
MWAITT occurrences.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH v6 2/2] x86, mwaitt: introduce mwaitx delay with a configurable timer
  2015-08-05  3:18 ` [PATCH v6 2/2] x86, mwaitt: introduce mwaitx delay with a configurable timer Huang Rui
@ 2015-08-06 15:14   ` Borislav Petkov
  2015-08-07  4:46     ` Huang Rui
  0 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2015-08-06 15:14 UTC (permalink / raw)
  To: Huang Rui
  Cc: Andy Lutomirski, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, linux-kernel, x86, Fengguang Wu,
	Aaron Lu, Tony Li, Andreas Herrmann

On Wed, Aug 05, 2015 at 11:18:52AM +0800, Huang Rui wrote:
> MWAITX can enable a timer and a corresponding timer value specified in
> SW P0 clocks. The SW P0 frequency is the same as TSC. The timer
> provides an upper bound on how long the instruction waits before
> exiting.
> 
> The implementation of delay function in kernel can leverage the timer
> of MWAITX. This patch provides a new method (delay_mwaitx) to measure
> delay time.

...

> +static void delay_mwaitx(unsigned long __loops)
> +{
> +	u32 delay, loops = __loops;
> +	u64 end, start;

Hmm, this truncates __loops in case someone wants to delay for more
than (u32)-1 TSC clocks. I guess the right thing to do is to do the
calculation with u64s and MWAITX_MAX_LOOPS will keep us within bounds.
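The truncation is easy to show in isolation. The sketch below assumes a 64-bit unsigned long, as on x86-64. It compares what the v6 assignment to a u32 keeps with the 64-bit count that the fix retains. The helper names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define MWAITX_MAX_LOOPS ((uint32_t)-1)

/* v6 behaviour: 'u32 loops = __loops;' silently drops the upper 32 bits. */
static uint32_t loops_as_u32(uint64_t requested)
{
	return (uint32_t)requested;
}

/*
 * Fixed behaviour: keep the total in 64 bits and clamp only each MWAITX
 * chunk. For requested > 0, this is the minimum number of MWAITX calls
 * needed if every call runs to its full timeout.
 */
static uint64_t mwaitx_chunks(uint64_t requested)
{
	assert(requested > 0);
	return (requested + MWAITX_MAX_LOOPS - 1) / MWAITX_MAX_LOOPS;
}
```

A request of 0x100000005 TSC clocks shrinks to just 5 clocks under the u32 version. With 64-bit arithmetic it becomes two MWAITX iterations.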

Here's what I did:

---
From: Huang Rui <ray.huang@amd.com>
Date: Wed, 5 Aug 2015 11:18:52 +0800
Subject: [PATCH] x86/asm: Introduce an MWAITX-based delay with a configurable
 timer
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

MWAITX can enable a timer and a corresponding timer value specified in
SW P0 clocks. The SW P0 frequency is the same as TSC. The timer provides
an upper bound on how long the instruction waits before exiting.

This way, a delay function in the kernel can leverage the MWAITX
timer.

When a CPU core executes MWAITX, it will be quiesced in a waiting phase,
diminishing its power consumption. This way, we can save power in
comparison to our default TSC-based delays.

A simple test shows that:

$ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc
$ sleep 10000s
$ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc

Results:

* TSC-based default delay:      485115 uWatts average power
* MWAITX-based delay:           252738 uWatts average power

Thus, that's about 232 milliwatts less power consumption. The test
method relies on support for the AMD CPU accumulated power algorithm in
fam15h_power, for which patches are forthcoming.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Suggested-by: Borislav Petkov <bp@suse.de>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Andreas Herrmann <herrmann.der.user@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Li <tony.li@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1438744732-1459-3-git-send-email-ray.huang@amd.com
[ Fix delay truncation. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/delay.h |  1 +
 arch/x86/kernel/cpu/amd.c    |  4 ++++
 arch/x86/lib/delay.c         | 47 +++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/delay.h b/arch/x86/include/asm/delay.h
index 9b3b4f2754c7..36a760bda462 100644
--- a/arch/x86/include/asm/delay.h
+++ b/arch/x86/include/asm/delay.h
@@ -4,5 +4,6 @@
 #include <asm-generic/delay.h>
 
 void use_tsc_delay(void);
+void use_mwaitx_delay(void);
 
 #endif /* _ASM_X86_DELAY_H */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 51ad2af84a72..4a70fc6d400a 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -11,6 +11,7 @@
 #include <asm/cpu.h>
 #include <asm/smp.h>
 #include <asm/pci-direct.h>
+#include <asm/delay.h>
 
 #ifdef CONFIG_X86_64
 # include <asm/mmconfig.h>
@@ -506,6 +507,9 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
 		/* A random value per boot for bit slice [12:upper_bit) */
 		va_align.bits = get_random_int() & va_align.mask;
 	}
+
+	if (cpu_has(c, X86_FEATURE_MWAITX))
+		use_mwaitx_delay();
 }
 
 static void early_init_amd(struct cpuinfo_x86 *c)
diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
index 4453d52a143d..e912b2f6d36e 100644
--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -20,6 +20,7 @@
 #include <asm/processor.h>
 #include <asm/delay.h>
 #include <asm/timer.h>
+#include <asm/mwait.h>
 
 #ifdef CONFIG_SMP
 # include <asm/smp.h>
@@ -84,6 +85,44 @@ static void delay_tsc(unsigned long __loops)
 }
 
 /*
+ * On some AMD platforms, MWAITX has a configurable 32-bit timer, that
+ * counts with TSC frequency. The input value is the loop of the
+ * counter, it will exit when the timer expires.
+ */
+static void delay_mwaitx(unsigned long __loops)
+{
+	u64 start, end, delay, loops = __loops;
+
+	start = rdtsc_ordered();
+
+	for (;;) {
+		delay = min_t(u64, MWAITX_MAX_LOOPS, loops);
+
+		/*
+		 * Use cpu_tss as a cacheline-aligned, seldomly
+		 * accessed per-cpu variable as the monitor target.
+		 */
+		__monitorx(this_cpu_ptr(&cpu_tss), 0, 0);
+
+		/*
+		 * AMD, like Intel, supports the EAX hint and EAX=0xf
+		 * means, do not enter any deep C-state and we use it
+		 * here in delay() to minimize wakeup latency.
+		 */
+		__mwaitx(MWAITX_DISABLE_CSTATES, delay, MWAITX_ECX_TIMER_ENABLE);
+
+		end = rdtsc_ordered();
+
+		if (loops <= end - start)
+			break;
+
+		loops -= end - start;
+
+		start = end;
+	}
+}
+
+/*
  * Since we calibrate only once at boot, this
  * function should be set once at boot and not changed
  */
@@ -91,7 +130,13 @@ static void (*delay_fn)(unsigned long) = delay_loop;
 
 void use_tsc_delay(void)
 {
-	delay_fn = delay_tsc;
+	if (delay_fn == delay_loop)
+		delay_fn = delay_tsc;
+}
+
+void use_mwaitx_delay(void)
+{
+	delay_fn = delay_mwaitx;
 }
 
 int read_current_timer(unsigned long *timer_val)
-- 
2.5.0.rc2.28.g6003e7f

-- 
Regards/Gruss,
    Boris.



* Re: [PATCH v6 2/2] x86, mwaitt: introduce mwaitx delay with a configurable timer
  2015-08-06 15:14   ` Borislav Petkov
@ 2015-08-07  4:46     ` Huang Rui
  0 siblings, 0 replies; 13+ messages in thread
From: Huang Rui @ 2015-08-07  4:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Rafael J. Wysocki, Len Brown, John Stultz,
	Frédéric Weisbecker, linux-kernel, x86, Fengguang Wu,
	Aaron Lu, Tony Li, Andreas Herrmann

On Thu, Aug 06, 2015 at 05:14:40PM +0200, Borislav Petkov wrote:
> On Wed, Aug 05, 2015 at 11:18:52AM +0800, Huang Rui wrote:
> > +static void delay_mwaitx(unsigned long __loops)
> > +{
> > +	u32 delay, loops = __loops;
> > +	u64 end, start;
> 
> Hmm, this truncates __loops in case someone wants to delay for more
> than (u32)-1 TSC clocks. I guess the right thing to do is to do the
> calculation with u64s and MWAITX_MAX_LOOPS will keep us within bounds.
> 

Yes, you're right. Thanks for the update. :)

Thanks,
Rui


* [PATCH 0/3] tip-queue 2015-08-10
@ 2015-08-10 10:19 Borislav Petkov
  2015-08-10 10:19 ` [PATCH 1/3] x86/microcode: Use kmemdup() rather than duplicating its implementation Borislav Petkov
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Borislav Petkov @ 2015-08-10 10:19 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML

From: Borislav Petkov <bp@suse.de>

Hi Ingo,

here are a few patches which are done baking. More specifically, the use
of this new MWAITX insn on AMD as a more power-optimal delay function.
Patch 3's commit message has details as to how exactly it was measured.
The power measurement interface will be part of fam15h_power soon too.

Please queue for 4.3.

Btw, patch 3 uses rdtsc_ordered() which is in tip/x86/asm... Patch 1
goes to tip/x86/microcode, of course.

Thanks.

Andrzej Hajda (1):
  x86/microcode: Use kmemdup() rather than duplicating its
    implementation

Huang Rui (2):
  x86/asm: Add MONITORX/MWAITX insns support
  x86/asm: Introduce an MWAITX-based delay with a configurable timer



* [PATCH 1/3] x86/microcode: Use kmemdup() rather than duplicating its implementation
  2015-08-10 10:19 [PATCH 0/3] tip-queue 2015-08-10 Borislav Petkov
@ 2015-08-10 10:19 ` Borislav Petkov
  2015-08-22 13:58   ` [tip:x86/microcode] " tip-bot for Andrzej Hajda
  2015-08-10 10:19 ` [PATCH 2/3] x86/asm: Add MONITORX/MWAITX insns support Borislav Petkov
  2015-08-10 10:19 ` [PATCH 3/3] x86/asm: Introduce an MWAITX-based delay with a configurable timer Borislav Petkov
  2 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2015-08-10 10:19 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML

From: Andrzej Hajda <a.hajda@samsung.com>

The patch was generated using the fixed Coccinelle semantic patch
scripts/coccinelle/api/memdup.cocci.
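The semantic patch rewrites an "allocate, then memcpy()" sequence into a single kmemdup() call. The sketch below is a rough userspace analogue of that helper, using malloc() in place of the kernel allocator. The name memdup_sketch() is hypothetical. In the amd.c hunk, the kzalloc() zeroing was redundant anyway, because memcpy() overwrote the whole buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace analogue of kmemdup(src, len, gfp): allocate 'len' bytes and
 * copy 'src' into them in one call. Returns NULL on allocation failure,
 * so callers keep a single error check and no separate memcpy().
 */
static void *memdup_sketch(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}
```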

Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1438934377-4922-8-git-send-email-a.hajda@samsung.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/microcode/amd.c         | 4 +---
 arch/x86/kernel/cpu/microcode/intel_early.c | 4 +---
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index c7d2415b8a24..be37f101ce5a 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -366,15 +366,13 @@ static int verify_and_add_patch(u8 family, u8 *fw, unsigned int leftover)
 		return -EINVAL;
 	}
 
-	patch->data = kzalloc(patch_size, GFP_KERNEL);
+	patch->data = kmemdup(fw + SECTION_HDR_SIZE, patch_size, GFP_KERNEL);
 	if (!patch->data) {
 		pr_err("Patch data allocation failure.\n");
 		kfree(patch);
 		return -EINVAL;
 	}
 
-	/* All looks ok, copy patch... */
-	memcpy(patch->data, fw + SECTION_HDR_SIZE, patch_size);
 	INIT_LIST_HEAD(&patch->plist);
 	patch->patch_id  = mc_hdr->patch_id;
 	patch->equiv_cpu = proc_id;
diff --git a/arch/x86/kernel/cpu/microcode/intel_early.c b/arch/x86/kernel/cpu/microcode/intel_early.c
index 8187b7247d1c..101f0ac5f6e1 100644
--- a/arch/x86/kernel/cpu/microcode/intel_early.c
+++ b/arch/x86/kernel/cpu/microcode/intel_early.c
@@ -207,13 +207,11 @@ save_microcode(struct mc_saved_data *mc_saved_data,
 		mc_hdr = &mc->hdr;
 		size   = get_totalsize(mc_hdr);
 
-		saved_ptr[i] = kmalloc(size, GFP_KERNEL);
+		saved_ptr[i] = kmemdup(mc, size, GFP_KERNEL);
 		if (!saved_ptr[i]) {
 			ret = -ENOMEM;
 			goto err;
 		}
-
-		memcpy(saved_ptr[i], mc, size);
 	}
 
 	/*
-- 
2.5.0.rc2.28.g6003e7f



* [PATCH 2/3] x86/asm: Add MONITORX/MWAITX insns support
  2015-08-10 10:19 [PATCH 0/3] tip-queue 2015-08-10 Borislav Petkov
  2015-08-10 10:19 ` [PATCH 1/3] x86/microcode: Use kmemdup() rather than duplicating its implementation Borislav Petkov
@ 2015-08-10 10:19 ` Borislav Petkov
  2015-08-22 13:58   ` [tip:x86/asm] x86/asm: Add MONITORX/MWAITX instruction support tip-bot for Huang Rui
  2015-08-10 10:19 ` [PATCH 3/3] x86/asm: Introduce an MWAITX-based delay with a configurable timer Borislav Petkov
  2 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2015-08-10 10:19 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML

From: Huang Rui <ray.huang@amd.com>

AMD Carrizo processors (Family 15h, Models 60h-6fh) add a new feature
called MWAITX (MWAIT with extensions) as an extension to MONITOR/MWAIT.

This new instruction controls a configurable timer which causes the core
to exit the wait state on timer expiration, in addition to the "normal"
MWAIT exit condition of a store to a monitored VA.

Compared to MONITOR/MWAIT, there are minor differences in opcode and
input parameters:

MWAITX ECX[1]: enable timer if set
MWAITX EBX[31:0]: max wait time expressed in SW P0 clocks == TSC.
The software P0 frequency is the same as the TSC frequency.

                MWAIT                           MWAITX
opcode          0f 01 c9           |            0f 01 fb
ECX[0]                  value of RFLAGS.IF seen by instruction
ECX[1]          unused/#GP if set  |            enable timer if set
ECX[31:2]                     unused/#GP if set
EAX                           unused (reserve for hint)
EBX[31:0]       unused             |            max wait time (SW P0 == TSC)

                MONITOR                         MONITORX
opcode          0f 01 c8           |            0f 01 fa
EAX                     (logical) address to monitor
ECX                     #GP if not zero

Max timeout = EBX/(TSC frequency)
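Callers usually need the inverse: the EBX value that yields a desired timeout, clamped to the 32-bit field. The sketch below works under the same formula. It assumes a known TSC frequency (2 GHz in the checks) and that us * tsc_hz fits in 64 bits. The helper name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define MWAITX_MAX_LOOPS	((uint32_t)-1)
#define USEC_PER_SEC		1000000ull

/*
 * Inverse of "Max timeout = EBX / (TSC frequency)": EBX for a timeout in
 * microseconds, clamped to the 32-bit field. Assumes us * tsc_hz does not
 * overflow 64 bits, which holds for realistic delays and frequencies.
 */
static uint32_t mwaitx_ebx_for_us(uint64_t us, uint64_t tsc_hz)
{
	uint64_t ticks;

	assert(tsc_hz > 0);
	ticks = us * tsc_hz / USEC_PER_SEC;
	return ticks > MWAITX_MAX_LOOPS ? MWAITX_MAX_LOOPS : (uint32_t)ticks;
}
```

A 5 s request at 2 GHz needs 10^10 ticks, which exceeds the field, so it gets clamped. In that case the caller has to loop, as delay_mwaitx() does.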

Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andreas Herrmann <herrmann.der.user@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dirk Brandewie <dirk.j.brandewie@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Li <tony.li@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1438744732-1459-2-git-send-email-ray.huang@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/cpufeature.h |  1 +
 arch/x86/include/asm/mwait.h      | 45 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 3d6606fb97d0..a39e5708209b 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -176,6 +176,7 @@
 #define X86_FEATURE_PERFCTR_NB  ( 6*32+24) /* NB performance counter extensions */
 #define X86_FEATURE_BPEXT	(6*32+26) /* data breakpoint extension */
 #define X86_FEATURE_PERFCTR_L2	( 6*32+28) /* L2 performance counter extensions */
+#define X86_FEATURE_MWAITX	( 6*32+29) /* MWAIT extension (MONITORX/MWAITX) */
 
 /*
  * Auxiliary flags: Linux defined - For features scattered in various
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 653dfa7662e1..c70689b5e5aa 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -14,6 +14,9 @@
 #define CPUID5_ECX_INTERRUPT_BREAK	0x2
 
 #define MWAIT_ECX_INTERRUPT_BREAK	0x1
+#define MWAITX_ECX_TIMER_ENABLE		BIT(1)
+#define MWAITX_MAX_LOOPS		((u32)-1)
+#define MWAITX_DISABLE_CSTATES		0xf
 
 static inline void __monitor(const void *eax, unsigned long ecx,
 			     unsigned long edx)
@@ -23,6 +26,14 @@ static inline void __monitor(const void *eax, unsigned long ecx,
 		     :: "a" (eax), "c" (ecx), "d"(edx));
 }
 
+static inline void __monitorx(const void *eax, unsigned long ecx,
+			      unsigned long edx)
+{
+	/* "monitorx %eax, %ecx, %edx;" */
+	asm volatile(".byte 0x0f, 0x01, 0xfa;"
+		     :: "a" (eax), "c" (ecx), "d"(edx));
+}
+
 static inline void __mwait(unsigned long eax, unsigned long ecx)
 {
 	/* "mwait %eax, %ecx;" */
@@ -30,6 +41,40 @@ static inline void __mwait(unsigned long eax, unsigned long ecx)
 		     :: "a" (eax), "c" (ecx));
 }
 
+/*
+ * MWAITX allows for a timer expiration to get the core out of a wait state in
+ * addition to the default MWAIT exit condition of a store appearing at a
+ * monitored virtual address.
+ *
+ * Registers:
+ *
+ * MWAITX ECX[1]: enable timer if set
+ * MWAITX EBX[31:0]: max wait time expressed in SW P0 clocks. The software P0
+ * frequency is the same as the TSC frequency.
+ *
+ * Below is a comparison between MWAIT and MWAITX on AMD processors:
+ *
+ *                 MWAIT                           MWAITX
+ * opcode          0f 01 c9           |            0f 01 fb
+ * ECX[0]                  value of RFLAGS.IF seen by instruction
+ * ECX[1]          unused/#GP if set  |            enable timer if set
+ * ECX[31:2]                     unused/#GP if set
+ * EAX                           unused (reserve for hint)
+ * EBX[31:0]       unused             |            max wait time (P0 clocks)
+ *
+ *                 MONITOR                         MONITORX
+ * opcode          0f 01 c8           |            0f 01 fa
+ * EAX                     (logical) address to monitor
+ * ECX                     #GP if not zero
+ */
+static inline void __mwaitx(unsigned long eax, unsigned long ebx,
+			    unsigned long ecx)
+{
+	/* "mwaitx %eax, %ebx, %ecx;" */
+	asm volatile(".byte 0x0f, 0x01, 0xfb;"
+		     :: "a" (eax), "b" (ebx), "c" (ecx));
+}
+
 static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
 {
 	trace_hardirqs_on();
-- 
2.5.0.rc2.28.g6003e7f


* [PATCH 3/3] x86/asm: Introduce an MWAITX-based delay with a configurable timer
  2015-08-10 10:19 [PATCH 0/3] tip-queue 2015-08-10 Borislav Petkov
  2015-08-10 10:19 ` [PATCH 1/3] x86/microcode: Use kmemdup() rather than duplicating its implementation Borislav Petkov
  2015-08-10 10:19 ` [PATCH 2/3] x86/asm: Add MONITORX/MWAITX insns support Borislav Petkov
@ 2015-08-10 10:19 ` Borislav Petkov
  2015-08-22 13:58   ` [tip:x86/asm] x86/asm/delay: " tip-bot for Huang Rui
  2 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2015-08-10 10:19 UTC
  To: Ingo Molnar; +Cc: LKML

From: Huang Rui <ray.huang@amd.com>

MWAITX can enable a timer whose value is specified in SW P0 clocks; the
SW P0 frequency is the same as the TSC frequency. The timer provides an
upper bound on how long the instruction waits before exiting.

A delay function in the kernel can therefore leverage the MWAITX timer.

When a CPU core executes MWAITX, it will be quiesced in a waiting phase,
diminishing its power consumption. This way, we can save power in
comparison to our default TSC-based delays.

A simple test reads the accumulated power counter before and after a
long sleep:

$ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc
$ sleep 10000s
$ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc

Results:

* TSC-based default delay:      485115 uWatts average power
* MWAITX-based delay:           252738 uWatts average power

That is about 232 mW (roughly 48%) less average power. The test method
relies on support for the AMD CPU accumulated power algorithm in the
fam15h_power driver, for which patches are forthcoming.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Suggested-by: Borislav Petkov <bp@suse.de>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Andreas Herrmann <herrmann.der.user@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Li <tony.li@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1438744732-1459-3-git-send-email-ray.huang@amd.com
[ Fix delay truncation. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/delay.h |  1 +
 arch/x86/kernel/cpu/amd.c    |  4 ++++
 arch/x86/lib/delay.c         | 47 +++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/delay.h b/arch/x86/include/asm/delay.h
index 9b3b4f2754c7..36a760bda462 100644
--- a/arch/x86/include/asm/delay.h
+++ b/arch/x86/include/asm/delay.h
@@ -4,5 +4,6 @@
 #include <asm-generic/delay.h>
 
 void use_tsc_delay(void);
+void use_mwaitx_delay(void);
 
 #endif /* _ASM_X86_DELAY_H */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 51ad2af84a72..4a70fc6d400a 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -11,6 +11,7 @@
 #include <asm/cpu.h>
 #include <asm/smp.h>
 #include <asm/pci-direct.h>
+#include <asm/delay.h>
 
 #ifdef CONFIG_X86_64
 # include <asm/mmconfig.h>
@@ -506,6 +507,9 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
 		/* A random value per boot for bit slice [12:upper_bit) */
 		va_align.bits = get_random_int() & va_align.mask;
 	}
+
+	if (cpu_has(c, X86_FEATURE_MWAITX))
+		use_mwaitx_delay();
 }
 
 static void early_init_amd(struct cpuinfo_x86 *c)
diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
index 4453d52a143d..e912b2f6d36e 100644
--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -20,6 +20,7 @@
 #include <asm/processor.h>
 #include <asm/delay.h>
 #include <asm/timer.h>
+#include <asm/mwait.h>
 
 #ifdef CONFIG_SMP
 # include <asm/smp.h>
@@ -84,6 +85,44 @@ static void delay_tsc(unsigned long __loops)
 }
 
 /*
+ * On some AMD platforms, MWAITX has a configurable 32-bit timer that
+ * counts at TSC frequency. The input value is the number of TSC cycles
+ * to wait; MWAITX exits when the timer expires.
+ */
+static void delay_mwaitx(unsigned long __loops)
+{
+	u64 start, end, delay, loops = __loops;
+
+	start = rdtsc_ordered();
+
+	for (;;) {
+		delay = min_t(u64, MWAITX_MAX_LOOPS, loops);
+
+		/*
+		 * Use cpu_tss as a cacheline-aligned, rarely
+		 * accessed per-cpu variable as the monitor target.
+		 */
+		__monitorx(this_cpu_ptr(&cpu_tss), 0, 0);
+
+		/*
+		 * AMD, like Intel, supports the EAX hint; EAX=0xf
+		 * means "do not enter any deep C-state". We use it
+		 * here in delay() to minimize wakeup latency.
+		 */
+		__mwaitx(MWAITX_DISABLE_CSTATES, delay, MWAITX_ECX_TIMER_ENABLE);
+
+		end = rdtsc_ordered();
+
+		if (loops <= end - start)
+			break;
+
+		loops -= end - start;
+
+		start = end;
+	}
+}
+
+/*
  * Since we calibrate only once at boot, this
  * function should be set once at boot and not changed
  */
@@ -91,7 +130,13 @@ static void (*delay_fn)(unsigned long) = delay_loop;
 
 void use_tsc_delay(void)
 {
-	delay_fn = delay_tsc;
+	if (delay_fn == delay_loop)
+		delay_fn = delay_tsc;
+}
+
+void use_mwaitx_delay(void)
+{
+	delay_fn = delay_mwaitx;
 }
 
 int read_current_timer(unsigned long *timer_val)
-- 
2.5.0.rc2.28.g6003e7f


* [tip:x86/microcode] x86/microcode: Use kmemdup() rather than duplicating its implementation
  2015-08-10 10:19 ` [PATCH 1/3] x86/microcode: Use kmemdup() rather than duplicating its implementation Borislav Petkov
@ 2015-08-22 13:58   ` tip-bot for Andrzej Hajda
  0 siblings, 0 replies; 13+ messages in thread
From: tip-bot for Andrzej Hajda @ 2015-08-22 13:58 UTC
  To: linux-tip-commits
  Cc: m.szyprowski, mingo, b.zolnierkie, tglx, bp, hpa, torvalds,
	linux-kernel, a.hajda, peterz

Commit-ID:  d4e963644768b33aa3db7f470c35d74ed78d8354
Gitweb:     http://git.kernel.org/tip/d4e963644768b33aa3db7f470c35d74ed78d8354
Author:     Andrzej Hajda <a.hajda@samsung.com>
AuthorDate: Mon, 10 Aug 2015 12:19:52 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 22 Aug 2015 14:49:35 +0200

x86/microcode: Use kmemdup() rather than duplicating its implementation

The patch was generated using the fixed Coccinelle semantic patch
scripts/coccinelle/api/memdup.cocci.

Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1439201994-28067-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/microcode/amd.c         | 4 +---
 arch/x86/kernel/cpu/microcode/intel_early.c | 4 +---
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index c7d2415..be37f10 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -366,15 +366,13 @@ static int verify_and_add_patch(u8 family, u8 *fw, unsigned int leftover)
 		return -EINVAL;
 	}
 
-	patch->data = kzalloc(patch_size, GFP_KERNEL);
+	patch->data = kmemdup(fw + SECTION_HDR_SIZE, patch_size, GFP_KERNEL);
 	if (!patch->data) {
 		pr_err("Patch data allocation failure.\n");
 		kfree(patch);
 		return -EINVAL;
 	}
 
-	/* All looks ok, copy patch... */
-	memcpy(patch->data, fw + SECTION_HDR_SIZE, patch_size);
 	INIT_LIST_HEAD(&patch->plist);
 	patch->patch_id  = mc_hdr->patch_id;
 	patch->equiv_cpu = proc_id;
diff --git a/arch/x86/kernel/cpu/microcode/intel_early.c b/arch/x86/kernel/cpu/microcode/intel_early.c
index 8187b72..101f0ac 100644
--- a/arch/x86/kernel/cpu/microcode/intel_early.c
+++ b/arch/x86/kernel/cpu/microcode/intel_early.c
@@ -207,13 +207,11 @@ save_microcode(struct mc_saved_data *mc_saved_data,
 		mc_hdr = &mc->hdr;
 		size   = get_totalsize(mc_hdr);
 
-		saved_ptr[i] = kmalloc(size, GFP_KERNEL);
+		saved_ptr[i] = kmemdup(mc, size, GFP_KERNEL);
 		if (!saved_ptr[i]) {
 			ret = -ENOMEM;
 			goto err;
 		}
-
-		memcpy(saved_ptr[i], mc, size);
 	}
 
 	/*

* [tip:x86/asm] x86/asm: Add MONITORX/MWAITX instruction support
  2015-08-10 10:19 ` [PATCH 2/3] x86/asm: Add MONITORX/MWAITX insns support Borislav Petkov
@ 2015-08-22 13:58   ` tip-bot for Huang Rui
  0 siblings, 0 replies; 13+ messages in thread
From: tip-bot for Huang Rui @ 2015-08-22 13:58 UTC
  To: linux-tip-commits
  Cc: bitbucket, tglx, torvalds, hpa, dirk.j.brandewie, tony.li,
	peterz, bp, rjw, lenb, alexander.shishkin, luto, ray.huang,
	mingo, linux-kernel, josh, fengguang.wu, herrmann.der.user,
	aaron.lu, ross.zwisler, dave.hansen, john.stultz, fweisbec

Commit-ID:  f96756746c7909de37db3d03ac5fd5cfb2757f38
Gitweb:     http://git.kernel.org/tip/f96756746c7909de37db3d03ac5fd5cfb2757f38
Author:     Huang Rui <ray.huang@amd.com>
AuthorDate: Mon, 10 Aug 2015 12:19:53 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 22 Aug 2015 14:52:16 +0200

x86/asm: Add MONITORX/MWAITX instruction support

AMD Carrizo processors (Family 15h, Models 60h-6fh) added a new
feature called MWAITX (MWAIT with extensions) as an extension to
MONITOR/MWAIT.

The new instruction controls a configurable timer which causes
the core to exit the wait state on timer expiration, in addition
to the "normal" MWAIT exit condition of a store to the monitored
VA.

Compared to MONITOR/MWAIT, there are minor differences in opcode
and input parameters:

MWAITX ECX[1]: enable timer if set
MWAITX EBX[31:0]: max wait time expressed in SW P0 clocks. The
software P0 frequency is the same as the TSC frequency.

                MWAIT                           MWAITX
opcode          0f 01 c9           |            0f 01 fb
ECX[0]                  value of RFLAGS.IF seen by instruction
ECX[1]          unused/#GP if set  |            enable timer if set
ECX[31:2]                     unused/#GP if set
EAX                           unused (reserve for hint)
EBX[31:0]       unused             |            max wait time (SW P0 == TSC)

                MONITOR                         MONITORX
opcode          0f 01 c8           |            0f 01 fa
EAX                     (logical) address to monitor
ECX                     #GP if not zero

Max timeout = EBX/(TSC frequency)

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andreas Herrmann <herrmann.der.user@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dirk Brandewie <dirk.j.brandewie@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Li <tony.li@amd.com>
Link: http://lkml.kernel.org/r/1439201994-28067-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/cpufeature.h |  1 +
 arch/x86/include/asm/mwait.h      | 45 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 3d6606f..a39e570 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -176,6 +176,7 @@
 #define X86_FEATURE_PERFCTR_NB  ( 6*32+24) /* NB performance counter extensions */
 #define X86_FEATURE_BPEXT	(6*32+26) /* data breakpoint extension */
 #define X86_FEATURE_PERFCTR_L2	( 6*32+28) /* L2 performance counter extensions */
+#define X86_FEATURE_MWAITX	( 6*32+29) /* MWAIT extension (MONITORX/MWAITX) */
 
 /*
  * Auxiliary flags: Linux defined - For features scattered in various
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 653dfa7..c70689b 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -14,6 +14,9 @@
 #define CPUID5_ECX_INTERRUPT_BREAK	0x2
 
 #define MWAIT_ECX_INTERRUPT_BREAK	0x1
+#define MWAITX_ECX_TIMER_ENABLE		BIT(1)
+#define MWAITX_MAX_LOOPS		((u32)-1)
+#define MWAITX_DISABLE_CSTATES		0xf
 
 static inline void __monitor(const void *eax, unsigned long ecx,
 			     unsigned long edx)
@@ -23,6 +26,14 @@ static inline void __monitor(const void *eax, unsigned long ecx,
 		     :: "a" (eax), "c" (ecx), "d"(edx));
 }
 
+static inline void __monitorx(const void *eax, unsigned long ecx,
+			      unsigned long edx)
+{
+	/* "monitorx %eax, %ecx, %edx;" */
+	asm volatile(".byte 0x0f, 0x01, 0xfa;"
+		     :: "a" (eax), "c" (ecx), "d"(edx));
+}
+
 static inline void __mwait(unsigned long eax, unsigned long ecx)
 {
 	/* "mwait %eax, %ecx;" */
@@ -30,6 +41,40 @@ static inline void __mwait(unsigned long eax, unsigned long ecx)
 		     :: "a" (eax), "c" (ecx));
 }
 
+/*
+ * MWAITX allows for a timer expiration to get the core out of a wait state in
+ * addition to the default MWAIT exit condition of a store appearing at a
+ * monitored virtual address.
+ *
+ * Registers:
+ *
+ * MWAITX ECX[1]: enable timer if set
+ * MWAITX EBX[31:0]: max wait time expressed in SW P0 clocks. The software P0
+ * frequency is the same as the TSC frequency.
+ *
+ * Below is a comparison between MWAIT and MWAITX on AMD processors:
+ *
+ *                 MWAIT                           MWAITX
+ * opcode          0f 01 c9           |            0f 01 fb
+ * ECX[0]                  value of RFLAGS.IF seen by instruction
+ * ECX[1]          unused/#GP if set  |            enable timer if set
+ * ECX[31:2]                     unused/#GP if set
+ * EAX                           unused (reserve for hint)
+ * EBX[31:0]       unused             |            max wait time (P0 clocks)
+ *
+ *                 MONITOR                         MONITORX
+ * opcode          0f 01 c8           |            0f 01 fa
+ * EAX                     (logical) address to monitor
+ * ECX                     #GP if not zero
+ */
+static inline void __mwaitx(unsigned long eax, unsigned long ebx,
+			    unsigned long ecx)
+{
+	/* "mwaitx %eax, %ebx, %ecx;" */
+	asm volatile(".byte 0x0f, 0x01, 0xfb;"
+		     :: "a" (eax), "b" (ebx), "c" (ecx));
+}
+
 static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
 {
 	trace_hardirqs_on();

* [tip:x86/asm] x86/asm/delay: Introduce an MWAITX-based delay with a configurable timer
  2015-08-10 10:19 ` [PATCH 3/3] x86/asm: Introduce an MWAITX-based delay with a configurable timer Borislav Petkov
@ 2015-08-22 13:58   ` tip-bot for Huang Rui
  0 siblings, 0 replies; 13+ messages in thread
From: tip-bot for Huang Rui @ 2015-08-22 13:58 UTC
  To: linux-tip-commits
  Cc: jolsa, aaron.lu, bp, rjw, tony.li, john.stultz, luto, hpa,
	peterz, linux-kernel, fweisbec, hecmargi, fengguang.wu, tglx,
	torvalds, mingo, Aravind.Gopalakrishnan, pbonzini,
	herrmann.der.user, jacob.w.shin, ray.huang, lenb

Commit-ID:  b466bdb614823aaaa7188e85516177d2850f4782
Gitweb:     http://git.kernel.org/tip/b466bdb614823aaaa7188e85516177d2850f4782
Author:     Huang Rui <ray.huang@amd.com>
AuthorDate: Mon, 10 Aug 2015 12:19:54 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 22 Aug 2015 14:52:16 +0200

x86/asm/delay: Introduce an MWAITX-based delay with a configurable timer

MWAITX can enable a timer whose value is specified in SW P0
clocks; the SW P0 frequency is the same as the TSC frequency.
The timer provides an upper bound on how long the instruction
waits before exiting.

A delay function in the kernel can therefore leverage the
MWAITX timer.

When a CPU core executes MWAITX, it will be quiesced in a
waiting phase, diminishing its power consumption. This way, we
can save power in comparison to our default TSC-based delays.

A simple test reads the accumulated power counter before and
after a long sleep:

	$ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc
	$ sleep 10000s
	$ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc

Results:

	* TSC-based default delay:      485115 uWatts average power
	* MWAITX-based delay:           252738 uWatts average power

That is about 232 mW (roughly 48%) less average power. The
test method relies on support for the AMD CPU accumulated power
algorithm in the fam15h_power driver, for which patches are
forthcoming.

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Suggested-by: Borislav Petkov <bp@suse.de>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Huang Rui <ray.huang@amd.com>
[ Fix delay truncation. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Andreas Herrmann <herrmann.der.user@gmail.com>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Li <tony.li@amd.com>
Link: http://lkml.kernel.org/r/1438744732-1459-3-git-send-email-ray.huang@amd.com
Link: http://lkml.kernel.org/r/1439201994-28067-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/delay.h |  1 +
 arch/x86/kernel/cpu/amd.c    |  4 ++++
 arch/x86/lib/delay.c         | 47 +++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/delay.h b/arch/x86/include/asm/delay.h
index 9b3b4f2..36a760b 100644
--- a/arch/x86/include/asm/delay.h
+++ b/arch/x86/include/asm/delay.h
@@ -4,5 +4,6 @@
 #include <asm-generic/delay.h>
 
 void use_tsc_delay(void);
+void use_mwaitx_delay(void);
 
 #endif /* _ASM_X86_DELAY_H */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 51ad2af..4a70fc6 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -11,6 +11,7 @@
 #include <asm/cpu.h>
 #include <asm/smp.h>
 #include <asm/pci-direct.h>
+#include <asm/delay.h>
 
 #ifdef CONFIG_X86_64
 # include <asm/mmconfig.h>
@@ -506,6 +507,9 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
 		/* A random value per boot for bit slice [12:upper_bit) */
 		va_align.bits = get_random_int() & va_align.mask;
 	}
+
+	if (cpu_has(c, X86_FEATURE_MWAITX))
+		use_mwaitx_delay();
 }
 
 static void early_init_amd(struct cpuinfo_x86 *c)
diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
index 4453d52..e912b2f 100644
--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -20,6 +20,7 @@
 #include <asm/processor.h>
 #include <asm/delay.h>
 #include <asm/timer.h>
+#include <asm/mwait.h>
 
 #ifdef CONFIG_SMP
 # include <asm/smp.h>
@@ -84,6 +85,44 @@ static void delay_tsc(unsigned long __loops)
 }
 
 /*
+ * On some AMD platforms, MWAITX has a configurable 32-bit timer that
+ * counts at TSC frequency. The input value is the number of TSC cycles
+ * to wait; MWAITX exits when the timer expires.
+ */
+static void delay_mwaitx(unsigned long __loops)
+{
+	u64 start, end, delay, loops = __loops;
+
+	start = rdtsc_ordered();
+
+	for (;;) {
+		delay = min_t(u64, MWAITX_MAX_LOOPS, loops);
+
+		/*
+		 * Use cpu_tss as a cacheline-aligned, rarely
+		 * accessed per-cpu variable as the monitor target.
+		 */
+		__monitorx(this_cpu_ptr(&cpu_tss), 0, 0);
+
+		/*
+		 * AMD, like Intel, supports the EAX hint; EAX=0xf
+		 * means "do not enter any deep C-state". We use it
+		 * here in delay() to minimize wakeup latency.
+		 */
+		__mwaitx(MWAITX_DISABLE_CSTATES, delay, MWAITX_ECX_TIMER_ENABLE);
+
+		end = rdtsc_ordered();
+
+		if (loops <= end - start)
+			break;
+
+		loops -= end - start;
+
+		start = end;
+	}
+}
+
+/*
  * Since we calibrate only once at boot, this
  * function should be set once at boot and not changed
  */
@@ -91,7 +130,13 @@ static void (*delay_fn)(unsigned long) = delay_loop;
 
 void use_tsc_delay(void)
 {
-	delay_fn = delay_tsc;
+	if (delay_fn == delay_loop)
+		delay_fn = delay_tsc;
+}
+
+void use_mwaitx_delay(void)
+{
+	delay_fn = delay_mwaitx;
 }
 
 int read_current_timer(unsigned long *timer_val)

Thread overview: 13+ messages
2015-08-10 10:19 [PATCH 0/3] tip-queue 2015-08-10 Borislav Petkov
2015-08-10 10:19 ` [PATCH 1/3] x86/microcode: Use kmemdup() rather than duplicating its implementation Borislav Petkov
2015-08-22 13:58   ` [tip:x86/microcode] " tip-bot for Andrzej Hajda
2015-08-10 10:19 ` [PATCH 2/3] x86/asm: Add MONITORX/MWAITX insns support Borislav Petkov
2015-08-22 13:58   ` [tip:x86/asm] x86/asm: Add MONITORX/MWAITX instruction support tip-bot for Huang Rui
2015-08-10 10:19 ` [PATCH 3/3] x86/asm: Introduce an MWAITX-based delay with a configurable timer Borislav Petkov
2015-08-22 13:58   ` [tip:x86/asm] x86/asm/delay: " tip-bot for Huang Rui
  -- strict thread matches above, loose matches on Subject: below --
2015-08-05  3:18 [PATCH v6 0/2] x86, mwaitt: introduce AMD mwaitt support Huang Rui
2015-08-05  3:18 ` [PATCH v6 1/2] x86, mwaitt: add monitorx and mwaitx instruction Huang Rui
2015-08-05  3:18 ` [PATCH v6 2/2] x86, mwaitt: introduce mwaitx delay with a configurable timer Huang Rui
2015-08-06 15:14   ` Borislav Petkov
2015-08-07  4:46     ` Huang Rui
2015-08-05  4:01 ` [PATCH v6 0/2] x86, mwaitt: introduce AMD mwaitt support Borislav Petkov
