[2/2] x86/asm/delay: Introduce TPAUSE delay

Message ID 1582744258-42744-3-git-send-email-kyung.min.park@intel.com
State New
Series
  • x86/delay: Introduce TPAUSE instruction

Commit Message

Park, Kyung Min Feb. 26, 2020, 7:10 p.m. UTC
TPAUSE instructs the processor to enter an implementation-dependent
optimized state. The instruction execution wakes up when the time-stamp
counter reaches or exceeds the implicit EDX:EAX 64-bit input value.
The instruction execution also wakes up due to the expiration of
the operating system time-limit or by an external interrupt
or exceptions such as a debug exception or a machine check exception.

TPAUSE offers a choice of two lower power states:
 1. Light-weight power/performance optimized state C0.1
 2. Improved power/performance optimized state C0.2
This way, it can save power with low wake-up latency in comparison to
a spinloop-based delay. The selection between the two is governed by the
input register.
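
As a rough illustration of the register setup described above, the following user-space sketch splits a 64-bit TSC deadline into its EDX:EAX halves and carries the state selector alongside it. It does not execute TPAUSE. The struct and the helper tpause_prepare() are hypothetical names used only for this example; the two state values mirror the TPAUSE_C01_STATE and TPAUSE_C02_STATE definitions in the patch.

```c
#include <stdint.h>

/* Values mirror the patch: 1 requests C0.1, 0 requests C0.2. */
#define TPAUSE_C01_STATE 1u
#define TPAUSE_C02_STATE 0u

/* Hypothetical container for the three register inputs (not a kernel type). */
struct tpause_args {
	uint32_t ecx;	/* requested state */
	uint32_t edx;	/* upper 32 bits of the TSC deadline */
	uint32_t eax;	/* lower 32 bits of the TSC deadline */
};

/* Hypothetical helper: compute the deadline and split it into EDX:EAX. */
static struct tpause_args tpause_prepare(uint64_t start, uint64_t cycles,
					 uint32_t state)
{
	uint64_t until = start + cycles;
	struct tpause_args args = {
		.ecx = state,
		.edx = (uint32_t)(until >> 32),
		.eax = (uint32_t)(until & 0xffffffffu),
	};

	return args;
}
```

Calling tpause_prepare(tsc_now, cycles, TPAUSE_C02_STATE) would yield the three values that the patch's tpause() passes to __tpause().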

TPAUSE is available on processors with X86_FEATURE_WAITPKG.

Reviewed-by: Tony Luck <tony.luck@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Kyung Min Park <kyung.min.park@intel.com>
---
 arch/x86/include/asm/mwait.h | 17 +++++++++++++++++
 arch/x86/lib/delay.c         | 26 +++++++++++++++++++++++++-
 2 files changed, 42 insertions(+), 1 deletion(-)

Comments

Andi Kleen Feb. 26, 2020, 9:10 p.m. UTC | #1
On Wed, Feb 26, 2020 at 11:10:58AM -0800, Kyung Min Park wrote:
> TPAUSE instructs the processor to enter an implementation-dependent
> optimized state. The instruction execution wakes up when the time-stamp
> counter reaches or exceeds the implicit EDX:EAX 64-bit input value.
> The instruction execution also wakes up due to the expiration of
> the operating system time-limit or by an external interrupt

This is actually a behavior change. Today's udelay() will continue
after processing the interrupt. Your patches don't.

I don't think it's a problem though. The interrupt will cause
a long enough delay that exceeds any reasonable udelay() requirements.

There would be a difference if someone did really long udelay()s, much
longer than typical interrupts. In that case you might end up
with a truncated udelay, but such long udelays are not something that we
would encourage.

I don't think you need to change anything in the code, but should
probably document this behavior.

-Andi
Luck, Tony Feb. 26, 2020, 9:20 p.m. UTC | #2
On Wed, Feb 26, 2020 at 01:10:40PM -0800, Andi Kleen wrote:
> On Wed, Feb 26, 2020 at 11:10:58AM -0800, Kyung Min Park wrote:
> > TPAUSE instructs the processor to enter an implementation-dependent
> > optimized state. The instruction execution wakes up when the time-stamp
> > counter reaches or exceeds the implicit EDX:EAX 64-bit input value.
> > The instruction execution also wakes up due to the expiration of
> > the operating system time-limit or by an external interrupt
> 
> This is actually a behavior change. Today's udelay() will continue
> after processing the interrupt. Your patches don't.

The instruction-level TPAUSE is called inside delay_wait(),
which checks to see if we were interrupted early and loops to issue
another TPAUSE if needed.

-Tony
Fenghua Yu Feb. 26, 2020, 9:31 p.m. UTC | #3
On Wed, Feb 26, 2020 at 01:10:40PM -0800, Andi Kleen wrote:
> On Wed, Feb 26, 2020 at 11:10:58AM -0800, Kyung Min Park wrote:
> > TPAUSE instructs the processor to enter an implementation-dependent
> > optimized state. The instruction execution wakes up when the time-stamp
> > counter reaches or exceeds the implicit EDX:EAX 64-bit input value.
> > The instruction execution also wakes up due to the expiration of
> > the operating system time-limit or by an external interrupt
> 
> This is actually a behavior change. Today's udelay() will continue
> after processing the interrupt. Your patches don't.
> 
> I don't think it's a problem though. The interrupt will cause
> a long enough delay that exceeds any reasonable udelay() requirements.
> 
> There would be a difference if someone did really long udelay()s, much
> longer than typical interrupts. In that case you might end up
> with a truncated udelay, but such long udelays are not something that we
> would encourage.

TPAUSE sits in a loop that checks whether this udelay() has reached its
deadline. Coming back from an interrupt, the loop checks the deadline and
finds there is still time left to delay. Then udelay() goes back to TPAUSE.
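
The loop described above can be sketched in user space as follows. This is a simulation under stated assumptions, not the kernel's code: fake_tsc, read_tsc(), wait_until() and delay_cycles() are hypothetical names, the counter is a plain variable rather than the hardware TSC, and the stand-in wait pretends an interrupt arrives every 300 cycles so that early wake-ups actually occur.

```c
#include <stdint.h>

/* Hypothetical simulated time-stamp counter (not the hardware TSC). */
static uint64_t fake_tsc;
/* Counts how often the simulated wait was (re)issued. */
static unsigned int wait_calls;

static uint64_t read_tsc(void)
{
	return fake_tsc;
}

/*
 * Stand-in for a TPAUSE-style wait: returns at the deadline, or early
 * after 300 simulated cycles, as if an interrupt had woken the CPU.
 */
static void wait_until(uint64_t start, uint64_t cycles)
{
	uint64_t until = start + cycles;
	uint64_t step = 300;

	wait_calls++;
	fake_tsc = (until - fake_tsc > step) ? fake_tsc + step : until;
}

/* Re-check the elapsed time after every wake-up and wait again if needed. */
static void delay_cycles(uint64_t cycles)
{
	uint64_t start = read_tsc();

	for (;;) {
		wait_until(start, cycles);

		uint64_t now = read_tsc();

		if (now - start >= cycles)
			break;		/* full delay has elapsed */

		cycles -= now - start;	/* woken early: wait for the rest */
		start = now;
	}
}
```

With the simulated counter starting at 1000, delay_cycles(1000) is woken early three times and only returns once the counter reads 2000, so the caller still gets the full delay.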

Thanks.

-Fenghua
Andi Kleen Feb. 26, 2020, 9:59 p.m. UTC | #4
On Wed, Feb 26, 2020 at 01:20:34PM -0800, Luck, Tony wrote:
> On Wed, Feb 26, 2020 at 01:10:40PM -0800, Andi Kleen wrote:
> > On Wed, Feb 26, 2020 at 11:10:58AM -0800, Kyung Min Park wrote:
> > > TPAUSE instructs the processor to enter an implementation-dependent
> > > optimized state. The instruction execution wakes up when the time-stamp
> > > counter reaches or exceeds the implicit EDX:EAX 64-bit input value.
> > > The instruction execution also wakes up due to the expiration of
> > > the operating system time-limit or by an external interrupt
> > 
> > This is actually a behavior change. Today's udelay() will continue
> > after processing the interrupt. Your patches don't.
> 
> The instruction-level TPAUSE is called inside delay_wait(),
> which checks to see if we were interrupted early and loops to issue
> another TPAUSE if needed.

Ah right. It was already solved for mwaitx. Great.

-Andi

Patch

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 9d5252c..2067501 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -22,6 +22,8 @@ 
 #define MWAITX_ECX_TIMER_ENABLE		BIT(1)
 #define MWAITX_MAX_LOOPS		((u32)-1)
 #define MWAITX_DISABLE_CSTATES		0xf0
+#define TPAUSE_C01_STATE		1
+#define TPAUSE_C02_STATE		0
 
 static inline void __monitor(const void *eax, unsigned long ecx,
 			     unsigned long edx)
@@ -120,4 +122,19 @@  static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
 	current_clr_polling();
 }
 
+/*
+ * Caller can specify whether to enter C0.1 (low latency, less
+ * power saving) or C0.2 state (saves more power, but longer wakeup
+ * latency). This may be overridden by the IA32_UMWAIT_CONTROL MSR
+ * which can force requests for C0.2 to be downgraded to C0.1.
+ */
+static inline void __tpause(unsigned int ecx, unsigned int edx,
+			    unsigned int eax)
+{
+	/* "tpause %ecx, %edx, %eax;" */
+	asm volatile(".byte 0x66, 0x0f, 0xae, 0xf1\t\n"
+		     :
+		     : "c"(ecx), "d"(edx), "a"(eax));
+}
+
 #endif /* _ASM_X86_MWAIT_H */
diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
index 6be29cf..3553150 100644
--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -86,6 +86,26 @@  static void delay_tsc(unsigned long __loops)
 }
 
 /*
+ * On Intel the TPAUSE instruction waits until any of:
+ * 1) the TSC counter reaches or exceeds the value provided in EDX:EAX
+ * 2) global timeout in IA32_UMWAIT_CONTROL is exceeded
+ * 3) an external interrupt occurs
+ */
+static void tpause(u64 start, u64 cycles)
+{
+	u64 until = start + cycles;
+	unsigned int eax, edx;
+
+	eax = (unsigned int)(until & 0xffffffff);
+	edx = (unsigned int)(until >> 32);
+
+	/* Hard code the deeper (C0.2) sleep state because exit latency is
+	 * small compared to the "microseconds" that udelay() will delay.
+	 */
+	__tpause(TPAUSE_C02_STATE, edx, eax);
+}
+
+/*
  * On some AMD platforms, MWAITX has a configurable 32-bit timer, that
  * counts with TSC frequency. The input value is the loop of the
  * counter, it will exit when the timer expires.
@@ -153,8 +173,12 @@  static void (*delay_platform)(unsigned long) = delay_loop;
 
 void use_tsc_delay(void)
 {
-	if (delay_platform == delay_loop)
+	if (static_cpu_has(X86_FEATURE_WAITPKG)) {
+		wait_func = tpause;
+		delay_platform = delay_iterate;
+	} else if (delay_platform == delay_loop) {
 		delay_platform = delay_tsc;
+	}
 }
 
 void use_mwaitx_delay(void)