linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/TSC: Use RDTSCP
@ 2018-11-19 18:45 Borislav Petkov
  2018-11-19 19:02 ` Lendacky, Thomas
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Borislav Petkov @ 2018-11-19 18:45 UTC (permalink / raw)
  To: X86 ML
  Cc: LKML, Andy Lutomirski, H. Peter Anvin, John Stultz, Thomas Lendacky

From: Borislav Petkov <bp@suse.de>

Currently, the kernel uses

  [LM]FENCE; RDTSC

in the timekeeping code, to guarantee monotonicity of time where the
*FENCE is selected based on vendor.

Replace that sequence with RDTSCP which is faster or on-par and gives
the same guarantees.

A microbenchmark on Intel shows that the change is on-par.

On AMD, the change is either on-par with the current LFENCE-prefixed
RDTSC and some are slightly better with RDTSCP.

The comparison is done with the LFENCE-prefixed RDTSC (and not with the
MFENCE-prefixed one, as one would normally expect) because all modern
AMD families make LFENCE serializing and thus avoid the heavy MFENCE by
effectively enabling X86_FEATURE_LFENCE_RDTSC.

Co-developed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: x86@kernel.org
---
 arch/x86/include/asm/msr.h | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 91e4cf189914..f00f2b61d326 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -217,6 +217,8 @@ static __always_inline unsigned long long rdtsc(void)
  */
 static __always_inline unsigned long long rdtsc_ordered(void)
 {
+	DECLARE_ARGS(val, low, high);
+
 	/*
 	 * The RDTSC instruction is not ordered relative to memory
 	 * access.  The Intel SDM and the AMD APM are both vague on this
@@ -227,9 +229,18 @@ static __always_inline unsigned long long rdtsc_ordered(void)
 	 * ordering guarantees as reading from a global memory location
 	 * that some other imaginary CPU is updating continuously with a
 	 * time stamp.
+	 *
+	 * Thus, use the preferred barrier on the respective CPU, aiming for
+	 * RDTSCP as the default.
 	 */
-	barrier_nospec();
-	return rdtsc();
+	asm volatile(ALTERNATIVE_2("mfence; rdtsc",
+				   "lfence; rdtsc", X86_FEATURE_LFENCE_RDTSC,
+				   "rdtscp", X86_FEATURE_RDTSCP)
+			: EAX_EDX_RET(val, low, high)
+			/* RDTSCP clobbers ECX with MSR_TSC_AUX. */
+			:: "ecx");
+
+	return EAX_EDX_VAL(val, low, high);
 }
 
 static inline unsigned long long native_read_pmc(int counter)
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86/TSC: Use RDTSCP
  2018-11-19 18:45 [PATCH] x86/TSC: Use RDTSCP Borislav Petkov
@ 2018-11-19 19:02 ` Lendacky, Thomas
  2018-11-19 19:52 ` Andy Lutomirski
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Lendacky, Thomas @ 2018-11-19 19:02 UTC (permalink / raw)
  To: Borislav Petkov, X86 ML
  Cc: LKML, Andy Lutomirski, H. Peter Anvin, John Stultz

On 11/19/2018 12:45 PM, Borislav Petkov wrote:
> From: Borislav Petkov <bp@suse.de>
> 
> Currently, the kernel uses
> 
>   [LM]FENCE; RDTSC
> 
> in the timekeeping code, to guarantee monotonicity of time where the
> *FENCE is selected based on vendor.
> 
> Replace that sequence with RDTSCP which is faster or on-par and gives
> the same guarantees.
> 
> A microbenchmark on Intel shows that the change is on-par.
> 
> On AMD, the change is either on-par with the current LFENCE-prefixed
> RDTSC and some are slightly better with RDTSCP.
> 
> The comparison is done with the LFENCE-prefixed RDTSC (and not with the
> MFENCE-prefixed one, as one would normally expect) because all modern
> AMD families make LFENCE serializing and thus avoid the heavy MFENCE by
> effectively enabling X86_FEATURE_LFENCE_RDTSC.

Thanks for this. I had been wanting to do something similar since, on AMD,
a guest may not see the LFENCE-serializing bit if the hypervisor doesn't
expose it, and have to fall back to MFENCE.

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>

> 
> Co-developed-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
> Cc: x86@kernel.org
> ---
>  arch/x86/include/asm/msr.h | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
> index 91e4cf189914..f00f2b61d326 100644
> --- a/arch/x86/include/asm/msr.h
> +++ b/arch/x86/include/asm/msr.h
> @@ -217,6 +217,8 @@ static __always_inline unsigned long long rdtsc(void)
>   */
>  static __always_inline unsigned long long rdtsc_ordered(void)
>  {
> +	DECLARE_ARGS(val, low, high);
> +
>  	/*
>  	 * The RDTSC instruction is not ordered relative to memory
>  	 * access.  The Intel SDM and the AMD APM are both vague on this
> @@ -227,9 +229,18 @@ static __always_inline unsigned long long rdtsc_ordered(void)
>  	 * ordering guarantees as reading from a global memory location
>  	 * that some other imaginary CPU is updating continuously with a
>  	 * time stamp.
> +	 *
> +	 * Thus, use the preferred barrier on the respective CPU, aiming for
> +	 * RDTSCP as the default.
>  	 */
> -	barrier_nospec();
> -	return rdtsc();
> +	asm volatile(ALTERNATIVE_2("mfence; rdtsc",
> +				   "lfence; rdtsc", X86_FEATURE_LFENCE_RDTSC,
> +				   "rdtscp", X86_FEATURE_RDTSCP)
> +			: EAX_EDX_RET(val, low, high)
> +			/* RDTSCP clobbers ECX with MSR_TSC_AUX. */
> +			:: "ecx");
> +
> +	return EAX_EDX_VAL(val, low, high);
>  }
>  
>  static inline unsigned long long native_read_pmc(int counter)
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86/TSC: Use RDTSCP
  2018-11-19 18:45 [PATCH] x86/TSC: Use RDTSCP Borislav Petkov
  2018-11-19 19:02 ` Lendacky, Thomas
@ 2018-11-19 19:52 ` Andy Lutomirski
  2018-11-19 20:17   ` H. Peter Anvin
  2018-11-20  9:11 ` [tip:x86/timers] " tip-bot for Borislav Petkov
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Andy Lutomirski @ 2018-11-19 19:52 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, LKML, Andrew Lutomirski, H. Peter Anvin, John Stultz,
	Tom Lendacky

On Mon, Nov 19, 2018 at 10:46 AM Borislav Petkov <bp@alien8.de> wrote:
>
> From: Borislav Petkov <bp@suse.de>
>
> Currently, the kernel uses
>
>   [LM]FENCE; RDTSC
>
> in the timekeeping code, to guarantee monotonicity of time where the
> *FENCE is selected based on vendor.
>
> Replace that sequence with RDTSCP which is faster or on-par and gives
> the same guarantees.
>
> A microbenchmark on Intel shows that the change is on-par.
>
> On AMD, the change is either on-par with the current LFENCE-prefixed
> RDTSC and some are slightly better with RDTSCP.
>
> The comparison is done with the LFENCE-prefixed RDTSC (and not with the
> MFENCE-prefixed one, as one would normally expect) because all modern
> AMD families make LFENCE serializing and thus avoid the heavy MFENCE by
> effectively enabling X86_FEATURE_LFENCE_RDTSC.
>

I thought I benchmarked this on Intel at some point and found the
LFENCE;RDTSC variant to be slightly faster.  But I believe you, so:

Acked-by: Andy Lutomirski <luto@kernel.org>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86/TSC: Use RDTSCP
  2018-11-19 19:52 ` Andy Lutomirski
@ 2018-11-19 20:17   ` H. Peter Anvin
  2018-11-19 20:40     ` Borislav Petkov
  0 siblings, 1 reply; 15+ messages in thread
From: H. Peter Anvin @ 2018-11-19 20:17 UTC (permalink / raw)
  To: Andy Lutomirski, Borislav Petkov; +Cc: X86 ML, LKML, John Stultz, Tom Lendacky

On 11/19/18 11:52 AM, Andy Lutomirski wrote:
> 
> I thought I benchmarked this on Intel at some point and found the
> LFENCE;RDTSC variant to be slightly faster.  But I believe you, so:
> 
> Acked-by: Andy Lutomirski <luto@kernel.org>
> 

As long as the difference isn't significant, the simplicity would seem to be a
win.

	-hpa

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86/TSC: Use RDTSCP
  2018-11-19 20:17   ` H. Peter Anvin
@ 2018-11-19 20:40     ` Borislav Petkov
  2018-11-19 20:48       ` hpa
  0 siblings, 1 reply; 15+ messages in thread
From: Borislav Petkov @ 2018-11-19 20:40 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Andy Lutomirski, X86 ML, LKML, John Stultz, Tom Lendacky

On Mon, Nov 19, 2018 at 12:17:35PM -0800, H. Peter Anvin wrote:
> On 11/19/18 11:52 AM, Andy Lutomirski wrote:
> > 
> > I thought I benchmarked this on Intel at some point and found the
> > LFENCE;RDTSC variant to be slightly faster.  But I believe you, so:
> > 
> > Acked-by: Andy Lutomirski <luto@kernel.org>
> > 
> 
> As long as the difference isn't significant, the simplicity would seem to be a
> win.

Right, I think by simplicity you mean RDTSCP. :)

Also in the sense that you have a single instruction which gives you
that barrier of waiting for all older insns to retire before reading the
TSC vs two where you don't necessarily know what happens on the uarch
level. I mean, nothing probably happens but RDTSCP is still simpler :)

Also, hpa, isn't LFENCE; RDTSC and RDTSCP equivalent on Intel? In the
sense that RDTSCP microcode practically adds an LFENCE before reading
the TSC?

Because SDM says:

"The RDTSCP instruction is not a serializing instruction, but it does
wait until all previous instructions have executed and all previous
loads are globally visible."

which sounds pretty much like an LFENCE to me:

"LFENCE does not execute until all prior instructions have completed
locally, and no later instruction begins execution until LFENCE
completes."

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86/TSC: Use RDTSCP
  2018-11-19 20:40     ` Borislav Petkov
@ 2018-11-19 20:48       ` hpa
  0 siblings, 0 replies; 15+ messages in thread
From: hpa @ 2018-11-19 20:48 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Andy Lutomirski, X86 ML, LKML, John Stultz, Tom Lendacky

On November 19, 2018 12:40:25 PM PST, Borislav Petkov <bp@alien8.de> wrote:
>On Mon, Nov 19, 2018 at 12:17:35PM -0800, H. Peter Anvin wrote:
>> On 11/19/18 11:52 AM, Andy Lutomirski wrote:
>> > 
>> > I thought I benchmarked this on Intel at some point and found the
>> > LFENCE;RDTSC variant to be slightly faster.  But I believe you, so:
>> > 
>> > Acked-by: Andy Lutomirski <luto@kernel.org>
>> > 
>> 
>> As long as the difference isn't significant, the simplicity would
>seem to be a
>> win.
>
>Right, I think by simplicity you mean RDTSCP. :)
>
>Also in the sense that you have a single instruction which gives you
>that barrier of waiting for all older insns to retire before reading
>the
>TSC vs two where you don't necessarily know what happens on the uarch
>level. I mean, nothing probably happens but RDTSCP is still simpler :)
>
>Also, hpa, isn't LFENCE; RDTSC and RDTSCP equivalent on Intel? In the
>sense that RDTSCP microcode practically adds an LFENCE before reading
>the TSC?
>
>Because SDM says:
>
>"The RDTSCP instruction is not a serializing instruction, but it does
>wait until all previous instructions have executed and all previous
>loads are globally visible."
>
>which sounds pretty much like an LFENCE to me:
>
>"LFENCE does not execute until all prior instructions have completed
>locally, and no later instruction begins execution until LFENCE
>completes."

I don't know the exact sequence of fencing operations it is equivalent to, but it is probably something quite similar.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [tip:x86/timers] x86/TSC: Use RDTSCP
  2018-11-19 18:45 [PATCH] x86/TSC: Use RDTSCP Borislav Petkov
  2018-11-19 19:02 ` Lendacky, Thomas
  2018-11-19 19:52 ` Andy Lutomirski
@ 2018-11-20  9:11 ` tip-bot for Borislav Petkov
  2018-11-23 20:03 ` [PATCH] " Guenter Roeck
  2019-01-16 11:59 ` [tip:x86/alternatives] " tip-bot for Borislav Petkov
  4 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Borislav Petkov @ 2018-11-20  9:11 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, hpa, tglx, luto, bp, john.stultz, thomas.lendacky, linux-kernel

Commit-ID:  2e94061096c5c3aa6c3fe3ec2bec176c1f9c1b07
Gitweb:     https://git.kernel.org/tip/2e94061096c5c3aa6c3fe3ec2bec176c1f9c1b07
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Thu, 12 Apr 2018 13:11:36 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Mon, 19 Nov 2018 21:55:32 +0100

x86/TSC: Use RDTSCP

Currently, the kernel uses

  [LM]FENCE; RDTSC

in the timekeeping code, to guarantee monotonicity of time where the
*FENCE is selected based on vendor.

Replace that sequence with RDTSCP which is faster or on-par and gives
the same guarantees.

A microbenchmark on Intel shows that the change is on-par.

On AMD, the change is either on-par with the current LFENCE-prefixed
RDTSC and some are slightly better with RDTSCP.

The comparison is done with the LFENCE-prefixed RDTSC (and not with the
MFENCE-prefixed one, as one would normally expect) because all modern
AMD families make LFENCE serializing and thus avoid the heavy MFENCE by
effectively enabling X86_FEATURE_LFENCE_RDTSC.

Co-developed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20181119184556.11479-1-bp@alien8.de
---
 arch/x86/include/asm/msr.h | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 91e4cf189914..f00f2b61d326 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -217,6 +217,8 @@ static __always_inline unsigned long long rdtsc(void)
  */
 static __always_inline unsigned long long rdtsc_ordered(void)
 {
+	DECLARE_ARGS(val, low, high);
+
 	/*
 	 * The RDTSC instruction is not ordered relative to memory
 	 * access.  The Intel SDM and the AMD APM are both vague on this
@@ -227,9 +229,18 @@ static __always_inline unsigned long long rdtsc_ordered(void)
 	 * ordering guarantees as reading from a global memory location
 	 * that some other imaginary CPU is updating continuously with a
 	 * time stamp.
+	 *
+	 * Thus, use the preferred barrier on the respective CPU, aiming for
+	 * RDTSCP as the default.
 	 */
-	barrier_nospec();
-	return rdtsc();
+	asm volatile(ALTERNATIVE_2("mfence; rdtsc",
+				   "lfence; rdtsc", X86_FEATURE_LFENCE_RDTSC,
+				   "rdtscp", X86_FEATURE_RDTSCP)
+			: EAX_EDX_RET(val, low, high)
+			/* RDTSCP clobbers ECX with MSR_TSC_AUX. */
+			:: "ecx");
+
+	return EAX_EDX_VAL(val, low, high);
 }
 
 static inline unsigned long long native_read_pmc(int counter)

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86/TSC: Use RDTSCP
  2018-11-19 18:45 [PATCH] x86/TSC: Use RDTSCP Borislav Petkov
                   ` (2 preceding siblings ...)
  2018-11-20  9:11 ` [tip:x86/timers] " tip-bot for Borislav Petkov
@ 2018-11-23 20:03 ` Guenter Roeck
  2018-11-23 20:22   ` hpa
  2018-11-23 20:44   ` Borislav Petkov
  2019-01-16 11:59 ` [tip:x86/alternatives] " tip-bot for Borislav Petkov
  4 siblings, 2 replies; 15+ messages in thread
From: Guenter Roeck @ 2018-11-23 20:03 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, LKML, Andy Lutomirski, H. Peter Anvin, John Stultz,
	Thomas Lendacky

Hi,

On Mon, Nov 19, 2018 at 07:45:56PM +0100, Borislav Petkov wrote:
> From: Borislav Petkov <bp@suse.de>
> 
> Currently, the kernel uses
> 
>   [LM]FENCE; RDTSC
> 
> in the timekeeping code, to guarantee monotonicity of time where the
> *FENCE is selected based on vendor.
> 
> Replace that sequence with RDTSCP which is faster or on-par and gives
> the same guarantees.
> 
> A microbenchmark on Intel shows that the change is on-par.
> 
> On AMD, the change is either on-par with the current LFENCE-prefixed
> RDTSC and some are slightly better with RDTSCP.
> 
> The comparison is done with the LFENCE-prefixed RDTSC (and not with the
> MFENCE-prefixed one, as one would normally expect) because all modern
> AMD families make LFENCE serializing and thus avoid the heavy MFENCE by
> effectively enabling X86_FEATURE_LFENCE_RDTSC.
> 
> Co-developed-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
> Cc: x86@kernel.org
> ---
>  arch/x86/include/asm/msr.h | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
> index 91e4cf189914..f00f2b61d326 100644
> --- a/arch/x86/include/asm/msr.h
> +++ b/arch/x86/include/asm/msr.h
> @@ -217,6 +217,8 @@ static __always_inline unsigned long long rdtsc(void)
>   */
>  static __always_inline unsigned long long rdtsc_ordered(void)
>  {
> +	DECLARE_ARGS(val, low, high);
> +
>  	/*
>  	 * The RDTSC instruction is not ordered relative to memory
>  	 * access.  The Intel SDM and the AMD APM are both vague on this
> @@ -227,9 +229,18 @@ static __always_inline unsigned long long rdtsc_ordered(void)
>  	 * ordering guarantees as reading from a global memory location
>  	 * that some other imaginary CPU is updating continuously with a
>  	 * time stamp.
> +	 *
> +	 * Thus, use the preferred barrier on the respective CPU, aiming for
> +	 * RDTSCP as the default.
>  	 */
> -	barrier_nospec();
> -	return rdtsc();
> +	asm volatile(ALTERNATIVE_2("mfence; rdtsc",
> +				   "lfence; rdtsc", X86_FEATURE_LFENCE_RDTSC,
> +				   "rdtscp", X86_FEATURE_RDTSCP)
> +			: EAX_EDX_RET(val, low, high)
> +			/* RDTSCP clobbers ECX with MSR_TSC_AUX. */
> +			:: "ecx");
> +
> +	return EAX_EDX_VAL(val, low, high);
>  }
>  

This patch results in a crash with certain qemu emulations.

[    0.756869] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[    0.762233] invalid opcode: 0000 [#1] PTI
[    0.762435] CPU: 0 PID: 1 Comm: swapper Not tainted 4.20.0-rc3-next-20181123 #2
[    0.762644] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[    0.762832] EIP: read_tsc+0x4/0x10

To reproduce:

make ARCH=i386 defconfig
echo "CONFIG_HIGHMEM64G=y" >> .config
make ARCH=i386 olddefconfig
make -j30 ARCH=i386

qemu-system-i386 \
	-kernel arch/x86/boot/bzImage \
	-M q35 -cpu pentium3 -no-reboot -m 256 \
	-initrd rootfs.cpio \
	--append 'earlycon=uart8250,io,0x3f8,9600n8 rdinit=/sbin/init panic=-1 console=ttyS0 console=tty' \
	-nographic -monitor none

initrd or root file system doesn't really matter since the code never
gets there, but it is available from here:
	https://github.com/groeck/linux-build-test/blob/master/rootfs/x86/rootfs.cpio.gz
The qemu version does not matter either (I tried with 2.5 and 3.0).

Reverting the patch fixes the problem.

Guenter

---
crash log:

...
[    0.756366] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[    0.756718] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[    0.756869] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[    0.762233] invalid opcode: 0000 [#1] PTI
[    0.762435] CPU: 0 PID: 1 Comm: swapper Not tainted 4.20.0-rc3-next-20181123
#2
[    0.762644] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
[    0.762832] EIP: read_tsc+0x4/0x10
[    0.762832] Code: 00 01 00 eb 89 90 55 89 e5 5d c3 90 90 90 90 90 90 90 90 90 90 90 55 a1 44 5a 8b c5 89 e5 5d c3 8d b6 00 00 00 00 55 89 e5 57 <0f> ae f0b
[    0.762832] EAX: c58062c0 EBX: c58062c0 ECX: 00000008 EDX: c4c1f0a0
[    0.762832] ESI: c58168c0 EDI: 00200086 EBP: cb495ed4 ESP: cb495ed0
[    0.762832] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS: 00200002
[    0.762832] CR0: 80050033 CR2: 00000000 CR3: 0597a000 CR4: 000006f0
[    0.762832] Call Trace:
[    0.762832]  tk_setup_internals.constprop.10+0x3d/0x260
[    0.762832]  timekeeping_notify+0x56/0xc0
[    0.762832]  __clocksource_select+0xf3/0x150
[    0.762832]  ? boot_override_clock+0x42/0x42
[    0.762832]  clocksource_done_booting+0x2d/0x3b
[    0.762832]  do_one_initcall+0x68/0x15e
[    0.762832]  ? set_debug_rodata+0xf/0xf
[    0.762832]  kernel_init_freeable+0xec/0x18b
[    0.762832]  ? rest_init+0x80/0x80
[    0.762832]  kernel_init+0x8/0xf0
[    0.762832]  ret_from_fork+0x2e/0x38
[    0.762832] Modules linked in:
[    0.762832] ---[ end trace ba4ba4587aec46c9 ]---
[    0.762832] EIP: read_tsc+0x4/0x10
[    0.762832] Code: 00 01 00 eb 89 90 55 89 e5 5d c3 90 90 90 90 90 90 90 90 90 90 90 55 a1 44 5a 8b c5 89 e5 5d c3 8d b6 00 00 00 00 55 89 e5 57 <0f> ae f0b
[    0.762832] EAX: c58062c0 EBX: c58062c0 ECX: 00000008 EDX: c4c1f0a0
[    0.762832] ESI: c58168c0 EDI: 00200086 EBP: cb495ed4 ESP: c597f47c
[    0.762832] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS: 00200002
[    0.762832] CR0: 80050033 CR2: 00000000 CR3: 0597a000 CR4: 000006f0

---
bisect log:

# bad: [8c9733fd9806c71e7f2313a280f98cb3051f93df] Add linux-next specific files for 20181123
# good: [9ff01193a20d391e8dbce4403dd5ef87c7eaaca6] Linux 4.20-rc3
git bisect start 'HEAD' 'v4.20-rc3'
# good: [34c2101b4f765edf1b91c2837da9c60fbf9f6912] Merge remote-tracking branch 'spi-nor/spi-nor/next'
git bisect good 34c2101b4f765edf1b91c2837da9c60fbf9f6912
# good: [15367a0657fc8027ff3466bf0202bde9f270259b] Merge remote-tracking branch 'kgdb/kgdb-next'
git bisect good 15367a0657fc8027ff3466bf0202bde9f270259b
# bad: [d29686ab179c34c5dbaac067a9effbeeb6a8073e] Merge remote-tracking branch 'soundwire/next'
git bisect bad d29686ab179c34c5dbaac067a9effbeeb6a8073e
# bad: [7cd63670817c236ccaf21ffe9d7b4921afeab130] Merge remote-tracking branch 'tip/auto-latest'
git bisect bad 7cd63670817c236ccaf21ffe9d7b4921afeab130
# good: [f9c8ebdcb052214954136426990e13a88e45e906] Merge remote-tracking branch 'audit/next'
git bisect good f9c8ebdcb052214954136426990e13a88e45e906
# good: [4142a8a10e7673fa71ff4cb5b4693c4dda299f7c] Merge remote-tracking branch 'spi/for-next'
git bisect good 4142a8a10e7673fa71ff4cb5b4693c4dda299f7c
# good: [fc922a4c7c72131f6c65e4af4e36ff25f3af1485] Merge branch 'x86/cpu'
git bisect good fc922a4c7c72131f6c65e4af4e36ff25f3af1485
# good: [906d11c1ff007ad18e2f3477cca04c437ce19c64] Merge branch 'x86/microcode'
git bisect good 906d11c1ff007ad18e2f3477cca04c437ce19c64
# good: [d836e1d42739a86738dd92a1b7ab6729b2485cba] Merge branch 'x86/platform'
git bisect good d836e1d42739a86738dd92a1b7ab6729b2485cba
# bad: [191bba684b7459842e72b52718ec61f8bdbd8499] Merge branch 'x86/timers'
git bisect bad 191bba684b7459842e72b52718ec61f8bdbd8499
# good: [f9cc0921aa4edd9bd4dd6943d5e38f69a301e8ee] Merge branch 'x86/pti'
git bisect good f9cc0921aa4edd9bd4dd6943d5e38f69a301e8ee
# bad: [2e94061096c5c3aa6c3fe3ec2bec176c1f9c1b07] x86/TSC: Use RDTSCP
git bisect bad 2e94061096c5c3aa6c3fe3ec2bec176c1f9c1b07
# good: [a786ef152cdcfebc923a67f63c7815806eefcf81] x86/tsc: Make calibration refinement more robust
git bisect good a786ef152cdcfebc923a67f63c7815806eefcf81
# first bad commit: [2e94061096c5c3aa6c3fe3ec2bec176c1f9c1b07] x86/TSC: Use RDTSCP

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86/TSC: Use RDTSCP
  2018-11-23 20:03 ` [PATCH] " Guenter Roeck
@ 2018-11-23 20:22   ` hpa
  2018-11-23 20:44     ` Thomas Gleixner
  2018-11-23 20:44   ` Borislav Petkov
  1 sibling, 1 reply; 15+ messages in thread
From: hpa @ 2018-11-23 20:22 UTC (permalink / raw)
  To: Guenter Roeck, Borislav Petkov
  Cc: X86 ML, LKML, Andy Lutomirski, John Stultz, Thomas Lendacky

On November 23, 2018 12:03:07 PM PST, Guenter Roeck <linux@roeck-us.net> wrote:
>Hi,
>
>On Mon, Nov 19, 2018 at 07:45:56PM +0100, Borislav Petkov wrote:
>> From: Borislav Petkov <bp@suse.de>
>> 
>> Currently, the kernel uses
>> 
>>   [LM]FENCE; RDTSC
>> 
>> in the timekeeping code, to guarantee monotonicity of time where the
>> *FENCE is selected based on vendor.
>> 
>> Replace that sequence with RDTSCP which is faster or on-par and gives
>> the same guarantees.
>> 
>> A microbenchmark on Intel shows that the change is on-par.
>> 
>> On AMD, the change is either on-par with the current LFENCE-prefixed
>> RDTSC and some are slightly better with RDTSCP.
>> 
>> The comparison is done with the LFENCE-prefixed RDTSC (and not with
>the
>> MFENCE-prefixed one, as one would normally expect) because all modern
>> AMD families make LFENCE serializing and thus avoid the heavy MFENCE
>by
>> effectively enabling X86_FEATURE_LFENCE_RDTSC.
>> 
>> Co-developed-by: Thomas Gleixner <tglx@linutronix.de>
>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>> Signed-off-by: Borislav Petkov <bp@suse.de>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: John Stultz <john.stultz@linaro.org>
>> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
>> Cc: x86@kernel.org
>> ---
>>  arch/x86/include/asm/msr.h | 15 +++++++++++++--
>>  1 file changed, 13 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
>> index 91e4cf189914..f00f2b61d326 100644
>> --- a/arch/x86/include/asm/msr.h
>> +++ b/arch/x86/include/asm/msr.h
>> @@ -217,6 +217,8 @@ static __always_inline unsigned long long
>rdtsc(void)
>>   */
>>  static __always_inline unsigned long long rdtsc_ordered(void)
>>  {
>> +	DECLARE_ARGS(val, low, high);
>> +
>>  	/*
>>  	 * The RDTSC instruction is not ordered relative to memory
>>  	 * access.  The Intel SDM and the AMD APM are both vague on this
>> @@ -227,9 +229,18 @@ static __always_inline unsigned long long
>rdtsc_ordered(void)
>>  	 * ordering guarantees as reading from a global memory location
>>  	 * that some other imaginary CPU is updating continuously with a
>>  	 * time stamp.
>> +	 *
>> +	 * Thus, use the preferred barrier on the respective CPU, aiming
>for
>> +	 * RDTSCP as the default.
>>  	 */
>> -	barrier_nospec();
>> -	return rdtsc();
>> +	asm volatile(ALTERNATIVE_2("mfence; rdtsc",
>> +				   "lfence; rdtsc", X86_FEATURE_LFENCE_RDTSC,
>> +				   "rdtscp", X86_FEATURE_RDTSCP)
>> +			: EAX_EDX_RET(val, low, high)
>> +			/* RDTSCP clobbers ECX with MSR_TSC_AUX. */
>> +			:: "ecx");
>> +
>> +	return EAX_EDX_VAL(val, low, high);
>>  }
>>  
>
>This patch results in a crash with certain qemu emulations.
>
>[    0.756869] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
>[    0.762233] invalid opcode: 0000 [#1] PTI
>[    0.762435] CPU: 0 PID: 1 Comm: swapper Not tainted
>4.20.0-rc3-next-20181123 #2
>[    0.762644] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
>Ubuntu-1.8.2-1ubuntu1 04/01/2014
>[    0.762832] EIP: read_tsc+0x4/0x10
>
>To reproduce:
>
>make ARCH=i386 defconfig
>echo "CONFIG_HIGHMEM64G=y" >> .config
>make ARCH=i386 olddefconfig
>make -j30 ARCH=i386
>
>qemu-system-i386 \
>	-kernel arch/x86/boot/bzImage \
>	-M q35 -cpu pentium3 -no-reboot -m 256 \
>	-initrd rootfs.cpio \
>	--append 'earlycon=uart8250,io,0x3f8,9600n8 rdinit=/sbin/init panic=-1
>console=ttyS0 console=tty' \
>	-nographic -monitor none
>
>initrd or root file system doesn't really matter since the code never
>gets there, but it is available from here:
>	https://github.com/groeck/linux-build-test/blob/master/rootfs/x86/rootfs.cpio.gz
>The qemu version does not matter either (I tried with 2.5 and 3.0).
>
>Reverting the patch fixes the problem.
>
>Guenter
>
>---
>crash log:
>
>...
>[    0.756366] HPET: 3 timers in total, 0 timers will be used for
>per-cpu timer
>[    0.756718] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
>[    0.756869] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
>[    0.762233] invalid opcode: 0000 [#1] PTI
>[    0.762435] CPU: 0 PID: 1 Comm: swapper Not tainted
>4.20.0-rc3-next-20181123
>#2
>[    0.762644] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
>Ubuntu-1.8.2-1ubuntu1 04/01/2014
>[    0.762832] EIP: read_tsc+0x4/0x10
>[    0.762832] Code: 00 01 00 eb 89 90 55 89 e5 5d c3 90 90 90 90 90 90
>90 90 90 90 90 55 a1 44 5a 8b c5 89 e5 5d c3 8d b6 00 00 00 00 55 89 e5
>57 <0f> ae f0b
>[    0.762832] EAX: c58062c0 EBX: c58062c0 ECX: 00000008 EDX: c4c1f0a0
>[    0.762832] ESI: c58168c0 EDI: 00200086 EBP: cb495ed4 ESP: cb495ed0
>[    0.762832] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS:
>00200002
>[    0.762832] CR0: 80050033 CR2: 00000000 CR3: 0597a000 CR4: 000006f0
>[    0.762832] Call Trace:
>[    0.762832]  tk_setup_internals.constprop.10+0x3d/0x260
>[    0.762832]  timekeeping_notify+0x56/0xc0
>[    0.762832]  __clocksource_select+0xf3/0x150
>[    0.762832]  ? boot_override_clock+0x42/0x42
>[    0.762832]  clocksource_done_booting+0x2d/0x3b
>[    0.762832]  do_one_initcall+0x68/0x15e
>[    0.762832]  ? set_debug_rodata+0xf/0xf
>[    0.762832]  kernel_init_freeable+0xec/0x18b
>[    0.762832]  ? rest_init+0x80/0x80
>[    0.762832]  kernel_init+0x8/0xf0
>[    0.762832]  ret_from_fork+0x2e/0x38
>[    0.762832] Modules linked in:
>[    0.762832] ---[ end trace ba4ba4587aec46c9 ]---
>[    0.762832] EIP: read_tsc+0x4/0x10
>[    0.762832] Code: 00 01 00 eb 89 90 55 89 e5 5d c3 90 90 90 90 90 90
>90 90 90 90 90 55 a1 44 5a 8b c5 89 e5 5d c3 8d b6 00 00 00 00 55 89 e5
>57 <0f> ae f0b
>[    0.762832] EAX: c58062c0 EBX: c58062c0 ECX: 00000008 EDX: c4c1f0a0
>[    0.762832] ESI: c58168c0 EDI: 00200086 EBP: cb495ed4 ESP: c597f47c
>[    0.762832] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS:
>00200002
>[    0.762832] CR0: 80050033 CR2: 00000000 CR3: 0597a000 CR4: 000006f0
>
>---
>bisect log:
>
># bad: [8c9733fd9806c71e7f2313a280f98cb3051f93df] Add linux-next
>specific files for 20181123
># good: [9ff01193a20d391e8dbce4403dd5ef87c7eaaca6] Linux 4.20-rc3
>git bisect start 'HEAD' 'v4.20-rc3'
># good: [34c2101b4f765edf1b91c2837da9c60fbf9f6912] Merge
>remote-tracking branch 'spi-nor/spi-nor/next'
>git bisect good 34c2101b4f765edf1b91c2837da9c60fbf9f6912
># good: [15367a0657fc8027ff3466bf0202bde9f270259b] Merge
>remote-tracking branch 'kgdb/kgdb-next'
>git bisect good 15367a0657fc8027ff3466bf0202bde9f270259b
># bad: [d29686ab179c34c5dbaac067a9effbeeb6a8073e] Merge remote-tracking
>branch 'soundwire/next'
>git bisect bad d29686ab179c34c5dbaac067a9effbeeb6a8073e
># bad: [7cd63670817c236ccaf21ffe9d7b4921afeab130] Merge remote-tracking
>branch 'tip/auto-latest'
>git bisect bad 7cd63670817c236ccaf21ffe9d7b4921afeab130
># good: [f9c8ebdcb052214954136426990e13a88e45e906] Merge
>remote-tracking branch 'audit/next'
>git bisect good f9c8ebdcb052214954136426990e13a88e45e906
># good: [4142a8a10e7673fa71ff4cb5b4693c4dda299f7c] Merge
>remote-tracking branch 'spi/for-next'
>git bisect good 4142a8a10e7673fa71ff4cb5b4693c4dda299f7c
># good: [fc922a4c7c72131f6c65e4af4e36ff25f3af1485] Merge branch
>'x86/cpu'
>git bisect good fc922a4c7c72131f6c65e4af4e36ff25f3af1485
># good: [906d11c1ff007ad18e2f3477cca04c437ce19c64] Merge branch
>'x86/microcode'
>git bisect good 906d11c1ff007ad18e2f3477cca04c437ce19c64
># good: [d836e1d42739a86738dd92a1b7ab6729b2485cba] Merge branch
>'x86/platform'
>git bisect good d836e1d42739a86738dd92a1b7ab6729b2485cba
># bad: [191bba684b7459842e72b52718ec61f8bdbd8499] Merge branch
>'x86/timers'
>git bisect bad 191bba684b7459842e72b52718ec61f8bdbd8499
># good: [f9cc0921aa4edd9bd4dd6943d5e38f69a301e8ee] Merge branch
>'x86/pti'
>git bisect good f9cc0921aa4edd9bd4dd6943d5e38f69a301e8ee
># bad: [2e94061096c5c3aa6c3fe3ec2bec176c1f9c1b07] x86/TSC: Use RDTSCP
>git bisect bad 2e94061096c5c3aa6c3fe3ec2bec176c1f9c1b07
># good: [a786ef152cdcfebc923a67f63c7815806eefcf81] x86/tsc: Make
>calibration refinement more robust
>git bisect good a786ef152cdcfebc923a67f63c7815806eefcf81
># first bad commit: [2e94061096c5c3aa6c3fe3ec2bec176c1f9c1b07] x86/TSC:
>Use RDTSCP

Right, because that cpu predates RDTSCP, so it needs to use a fallback.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86/TSC: Use RDTSCP
  2018-11-23 20:22   ` hpa
@ 2018-11-23 20:44     ` Thomas Gleixner
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2018-11-23 20:44 UTC (permalink / raw)
  To: hpa
  Cc: Guenter Roeck, Borislav Petkov, X86 ML, LKML, Andy Lutomirski,
	John Stultz, Thomas Lendacky

On Fri, 23 Nov 2018, hpa@zytor.com wrote:
> On November 23, 2018 12:03:07 PM PST, Guenter Roeck <linux@roeck-us.net> wrote:
> ># first bad commit: [2e94061096c5c3aa6c3fe3ec2bec176c1f9c1b07] x86/TSC:
> >Use RDTSCP
> 
> Right, because that cpu predates RDTSCP, so it needs to use a fallback.

Well, that's not the problem.

The default here is MFENCE; RDTSC; But P3 does not have MFENCE either :)

We need alternative_3 with default RDTSC

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86/TSC: Use RDTSCP
  2018-11-23 20:03 ` [PATCH] " Guenter Roeck
  2018-11-23 20:22   ` hpa
@ 2018-11-23 20:44   ` Borislav Petkov
  2018-11-23 21:03     ` Guenter Roeck
  1 sibling, 1 reply; 15+ messages in thread
From: Borislav Petkov @ 2018-11-23 20:44 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: X86 ML, LKML, Andy Lutomirski, H. Peter Anvin, John Stultz,
	Thomas Lendacky

On Fri, Nov 23, 2018 at 12:03:07PM -0800, Guenter Roeck wrote:
> [    0.762832] EIP: read_tsc+0x4/0x10
> [    0.762832] Code: 00 01 00 eb 89 90 55 89 e5 5d c3 90 90 90 90 90 90 90 90 90 90 90 55 a1 44 5a 8b c5 89 e5 5d c3 8d b6 00 00 00 00 55 89 e5 57 <0f> ae f0b

Where does that 'b' in f0b come from?

But ok, I was able to reproduce and decode myself. So if the Code:
section is correct, qemu chokes on MFENCE.

[ 0.854209] Code: 90 90 90 90 90 90 90 a1 84 37 11 cd c3 8d b4 26 00 00 00 00 8d 76 00 c3 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00 90 55 89 e5 <0f> ae f0 0f 31 5d c3 8d b6 00 00 00 00 55 89 e5 57 31 ff 56 53 89
All code
========
   0:   90                      nop
   1:   90                      nop
   2:   90                      nop
   3:   90                      nop
   4:   90                      nop
   5:   90                      nop
   6:   90                      nop
   7:   a1 84 37 11 cd          mov    0xcd113784,%eax
   c:   c3                      ret    
   d:   8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
  14:   8d 76 00                lea    0x0(%esi),%esi
  17:   c3                      ret    
  18:   8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
  1f:   8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
  26:   90                      nop
  27:   55                      push   %ebp
  28:   89 e5                   mov    %esp,%ebp
  2a:*  0f ae f0                mfence          <-- trapping instruction
  2d:   0f 31                   rdtsc  
  2f:   5d                      pop    %ebp
  30:   c3                      ret    
  31:   8d b6 00 00 00 00       lea    0x0(%esi),%esi
  37:   55                      push   %ebp
  38:   89 e5                   mov    %esp,%ebp
  3a:   57                      push   %edi
  3b:   31 ff                   xor    %edi,%edi
  3d:   56                      push   %esi
  3e:   53                      push   %ebx
  3f:   89                      .byte 0x89

Doing this:

        asm volatile(ALTERNATIVE_2("mfence", ...

fails too which confirms that P3 can't do MFENCE.

I need to think about how to handle that old cruft.

Thx for the report.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86/TSC: Use RDTSCP
  2018-11-23 20:44   ` Borislav Petkov
@ 2018-11-23 21:03     ` Guenter Roeck
  2018-11-23 21:07       ` Borislav Petkov
  0 siblings, 1 reply; 15+ messages in thread
From: Guenter Roeck @ 2018-11-23 21:03 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, LKML, Andy Lutomirski, H. Peter Anvin, John Stultz,
	Thomas Lendacky

On 11/23/18 12:44 PM, Borislav Petkov wrote:
> On Fri, Nov 23, 2018 at 12:03:07PM -0800, Guenter Roeck wrote:
>> [    0.762832] EIP: read_tsc+0x4/0x10
>> [    0.762832] Code: 00 01 00 eb 89 90 55 89 e5 5d c3 90 90 90 90 90 90 90 90 90 90 90 55 a1 44 5a 8b c5 89 e5 5d c3 8d b6 00 00 00 00 55 89 e5 57 <0f> ae f0b
> 
> Where does that 'b' in f0b come from?
> 

Good catch.

It is a cut off screen log. x86 boots change xterm configuration from
wrap to non-wrap, and I did a cut-and-paste instead of copying the log
to a file. Sorry for that.

Guenter

> But ok, I was able to reproduce and decode myself. So if the Code:
> section is correct, qemu chokes on MFENCE.
> 
> [ 0.854209] Code: 90 90 90 90 90 90 90 a1 84 37 11 cd c3 8d b4 26 00 00 00 00 8d 76 00 c3 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00 90 55 89 e5 <0f> ae f0 0f 31 5d c3 8d b6 00 00 00 00 55 89 e5 57 31 ff 56 53 89
> All code
> ========
>     0:   90                      nop
>     1:   90                      nop
>     2:   90                      nop
>     3:   90                      nop
>     4:   90                      nop
>     5:   90                      nop
>     6:   90                      nop
>     7:   a1 84 37 11 cd          mov    0xcd113784,%eax
>     c:   c3                      ret
>     d:   8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
>    14:   8d 76 00                lea    0x0(%esi),%esi
>    17:   c3                      ret
>    18:   8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
>    1f:   8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
>    26:   90                      nop
>    27:   55                      push   %ebp
>    28:   89 e5                   mov    %esp,%ebp
>    2a:*  0f ae f0                mfence          <-- trapping instruction
>    2d:   0f 31                   rdtsc
>    2f:   5d                      pop    %ebp
>    30:   c3                      ret
>    31:   8d b6 00 00 00 00       lea    0x0(%esi),%esi
>    37:   55                      push   %ebp
>    38:   89 e5                   mov    %esp,%ebp
>    3a:   57                      push   %edi
>    3b:   31 ff                   xor    %edi,%edi
>    3d:   56                      push   %esi
>    3e:   53                      push   %ebx
>    3f:   89                      .byte 0x89
> 
> Doing this:
> 
>          asm volatile(ALTERNATIVE_2("mfence", ...
> 
> fails too which confirms that P3 can't do MFENCE.
> 
> I need to think about how to handle that old cruft.
> 
> Thx for the report.
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86/TSC: Use RDTSCP
  2018-11-23 21:03     ` Guenter Roeck
@ 2018-11-23 21:07       ` Borislav Petkov
  2018-12-07 18:39         ` Borislav Petkov
  0 siblings, 1 reply; 15+ messages in thread
From: Borislav Petkov @ 2018-11-23 21:07 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: X86 ML, LKML, Andy Lutomirski, H. Peter Anvin, John Stultz,
	Thomas Lendacky

On Fri, Nov 23, 2018 at 01:03:25PM -0800, Guenter Roeck wrote:
> It is a cut off screen log. x86 boots change xterm configuration from
> wrap to non-wrap, and I did a cut-and-paste instead of copying the log
> to a file. Sorry for that.

No worries.

It was a head-scratcher though because look what it decodes to with the
'b':

[ 0.854209] Code: 90 90 90 90 90 90 90 a1 84 37 11 cd c3 8d b4 26 00 00 00 00 8d 76 00 c3 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00 90 55 89 e5 <0f> ae f0b 0f 31 5d c3 8d b6 00 00 00 00 55 89 e5 57 31 ff 56 53 89
All code
========
   0:   90                      nop
   1:   90                      nop
   2:   90                      nop
   3:   90                      nop
   4:   90                      nop
   5:   90                      nop
   6:   90                      nop
   7:   a1 84 37 11 cd          mov    0xcd113784,%eax
   c:   c3                      ret
   d:   8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
  14:   8d 76 00                lea    0x0(%esi),%esi
  17:   c3                      ret
  18:   8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
  1f:   8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
  26:   90                      nop
  27:   55                      push   %ebp
  28:   89 e5                   mov    %esp,%ebp
  2a:*  0f ae 0b                fxrstor (%ebx)          <-- trapping instruction
  2d:   0f 31                   rdtsc
  2f:   5d                      pop    %ebp
  30:   c3                      ret

FXRSTOR?!?!?

:-)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86/TSC: Use RDTSCP
  2018-11-23 21:07       ` Borislav Petkov
@ 2018-12-07 18:39         ` Borislav Petkov
  0 siblings, 0 replies; 15+ messages in thread
From: Borislav Petkov @ 2018-12-07 18:39 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: X86 ML, LKML, Andy Lutomirski, H. Peter Anvin, John Stultz,
	Thomas Lendacky

Ok,

I have something which looks like it works here, even with the pentium3
qemu CPU. I'll be hammering on it in the coming days but if you wanna
give it a try, here's a conglomerate patch:

---
From 0c6adce02d2e7f3b5bbdc4cbe3eb3dae99448def Mon Sep 17 00:00:00 2001
From: Borislav Petkov <bp@suse.de>
Date: Fri, 7 Dec 2018 18:54:23 +0100
Subject: [PATCH] WIP

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/alternative.h | 26 +++++++++++++++++++++++++-
 arch/x86/include/asm/msr.h         | 16 ++++++++++++++--
 arch/x86/kernel/alternative.c      |  4 ++--
 3 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index ea9886651c39..db8ebe5dd5be 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -114,6 +114,16 @@ static inline int alternatives_text_reserved(void *start, void *end)
 		"(" alt_max_short(alt_rlen(num1), alt_rlen(num2)) " - (" alt_slen ")), 0x90\n"	\
 	alt_end_marker ":\n"
 
+#define OLDINSTR_3(oldinsn, n1, n2, n3)								\
+	"# ALT: oldinstr\n"										\
+	"661:\n\t" oldinsn "\n662:\n"								\
+	"# ALT: padding\n"										\
+	".skip -((" alt_max_short(alt_max_short(alt_rlen(n1), alt_rlen(n2)), alt_rlen(n3))	\
+		" - (" alt_slen ")) > 0) * "							\
+		"(" alt_max_short(alt_max_short(alt_rlen(n1), alt_rlen(n2)), alt_rlen(n3))	\
+		" - (" alt_slen ")), 0x90\n"							\
+	alt_end_marker ":\n"
+
 #define ALTINSTR_ENTRY(feature, num)					      \
 	" .long 661b - .\n"				/* label           */ \
 	" .long " b_replacement(num)"f - .\n"		/* new instruction */ \
@@ -122,7 +132,8 @@ static inline int alternatives_text_reserved(void *start, void *end)
 	" .byte " alt_rlen(num) "\n"			/* replacement len */ \
 	" .byte " alt_pad_len "\n"			/* pad len */
 
-#define ALTINSTR_REPLACEMENT(newinstr, feature, num)	/* replacement */     \
+#define ALTINSTR_REPLACEMENT(newinstr, feature, num)	/* replacement */	\
+	"# ALT: replacement " #num "\n"						\
 	b_replacement(num)":\n\t" newinstr "\n" e_replacement(num) ":\n\t"
 
 /* alternative assembly primitive: */
@@ -146,6 +157,19 @@ static inline int alternatives_text_reserved(void *start, void *end)
 	ALTINSTR_REPLACEMENT(newinstr2, feature2, 2)			\
 	".popsection\n"
 
+#define ALTERNATIVE_3(oldinsn, newinsn1, feat1, newinsn2, feat2, newinsn3, feat3) \
+	OLDINSTR_3(oldinsn, 1, 2, 3)						\
+	".pushsection .altinstructions,\"a\"\n"					\
+	ALTINSTR_ENTRY(feat1, 1)						\
+	ALTINSTR_ENTRY(feat2, 2)						\
+	ALTINSTR_ENTRY(feat3, 3)						\
+	".popsection\n"								\
+	".pushsection .altinstr_replacement, \"ax\"\n"				\
+	ALTINSTR_REPLACEMENT(newinsn1, feat1, 1)				\
+	ALTINSTR_REPLACEMENT(newinsn2, feat2, 2)				\
+	ALTINSTR_REPLACEMENT(newinsn3, feat3, 3)				\
+	".popsection\n"
+
 /*
  * Alternative instructions for different CPU types or capabilities.
  *
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 91e4cf189914..5cc3930cb465 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -217,6 +217,8 @@ static __always_inline unsigned long long rdtsc(void)
  */
 static __always_inline unsigned long long rdtsc_ordered(void)
 {
+	DECLARE_ARGS(val, low, high);
+
 	/*
 	 * The RDTSC instruction is not ordered relative to memory
 	 * access.  The Intel SDM and the AMD APM are both vague on this
@@ -227,9 +229,19 @@ static __always_inline unsigned long long rdtsc_ordered(void)
 	 * ordering guarantees as reading from a global memory location
 	 * that some other imaginary CPU is updating continuously with a
 	 * time stamp.
+	 *
+	 * Thus, use the preferred barrier on the respective CPU, aiming for
+	 * RDTSCP as the default.
 	 */
-	barrier_nospec();
-	return rdtsc();
+	asm volatile(ALTERNATIVE_3("rdtsc",
+				   "mfence; rdtsc", X86_FEATURE_MFENCE_RDTSC,
+				   "lfence; rdtsc", X86_FEATURE_LFENCE_RDTSC,
+				   "rdtscp", X86_FEATURE_RDTSCP)
+			: EAX_EDX_RET(val, low, high)
+			/* RDTSCP clobbers ECX with MSR_TSC_AUX. */
+			:: "ecx");
+
+	return EAX_EDX_VAL(val, low, high);
 }
 
 static inline unsigned long long native_read_pmc(int counter)
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index ebeac487a20c..d458c7973c56 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -393,10 +393,10 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
 			continue;
 		}
 
-		DPRINTK("feat: %d*32+%d, old: (%px len: %d), repl: (%px, len: %d), pad: %d",
+		DPRINTK("feat: %d*32+%d, old: (%pS (%px) len: %d), repl: (%px, len: %d), pad: %d",
 			a->cpuid >> 5,
 			a->cpuid & 0x1f,
-			instr, a->instrlen,
+			instr, instr, a->instrlen,
 			replacement, a->replacementlen, a->padlen);
 
 		DUMP_BYTES(instr, a->instrlen, "%px: old_insn: ", instr);
-- 
2.19.1


-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [tip:x86/alternatives] x86/TSC: Use RDTSCP
  2018-11-19 18:45 [PATCH] x86/TSC: Use RDTSCP Borislav Petkov
                   ` (3 preceding siblings ...)
  2018-11-23 20:03 ` [PATCH] " Guenter Roeck
@ 2019-01-16 11:59 ` tip-bot for Borislav Petkov
  4 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Borislav Petkov @ 2019-01-16 11:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, luto, linux-kernel, thomas.lendacky, john.stultz, hpa, tglx, bp

Commit-ID:  093ae8f9a86a974c920b613860f1f7fd5bbd70ab
Gitweb:     https://git.kernel.org/tip/093ae8f9a86a974c920b613860f1f7fd5bbd70ab
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Thu, 12 Apr 2018 13:11:36 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Wed, 16 Jan 2019 12:43:08 +0100

x86/TSC: Use RDTSCP

Currently, the kernel uses

  [LM]FENCE; RDTSC

in the timekeeping code, to guarantee monotonicity of time where the
*FENCE is selected based on vendor.

Replace that sequence with RDTSCP which is faster or on-par and gives
the same guarantees.

A microbenchmark on Intel shows that the change is on-par.

On AMD, the change is either on-par with the current LFENCE-prefixed
RDTSC or slightly better with RDTSCP.

The comparison is done with the LFENCE-prefixed RDTSC (and not with the
MFENCE-prefixed one, as one would normally expect) because all modern
AMD families make LFENCE serializing and thus avoid the heavy MFENCE by
effectively enabling X86_FEATURE_LFENCE_RDTSC.

Co-developed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20181119184556.11479-1-bp@alien8.de
---
 arch/x86/include/asm/msr.h | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 91e4cf189914..5cc3930cb465 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -217,6 +217,8 @@ static __always_inline unsigned long long rdtsc(void)
  */
 static __always_inline unsigned long long rdtsc_ordered(void)
 {
+	DECLARE_ARGS(val, low, high);
+
 	/*
 	 * The RDTSC instruction is not ordered relative to memory
 	 * access.  The Intel SDM and the AMD APM are both vague on this
@@ -227,9 +229,19 @@ static __always_inline unsigned long long rdtsc_ordered(void)
 	 * ordering guarantees as reading from a global memory location
 	 * that some other imaginary CPU is updating continuously with a
 	 * time stamp.
+	 *
+	 * Thus, use the preferred barrier on the respective CPU, aiming for
+	 * RDTSCP as the default.
 	 */
-	barrier_nospec();
-	return rdtsc();
+	asm volatile(ALTERNATIVE_3("rdtsc",
+				   "mfence; rdtsc", X86_FEATURE_MFENCE_RDTSC,
+				   "lfence; rdtsc", X86_FEATURE_LFENCE_RDTSC,
+				   "rdtscp", X86_FEATURE_RDTSCP)
+			: EAX_EDX_RET(val, low, high)
+			/* RDTSCP clobbers ECX with MSR_TSC_AUX. */
+			:: "ecx");
+
+	return EAX_EDX_VAL(val, low, high);
 }
 
 static inline unsigned long long native_read_pmc(int counter)

^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-01-16 12:00 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-19 18:45 [PATCH] x86/TSC: Use RDTSCP Borislav Petkov
2018-11-19 19:02 ` Lendacky, Thomas
2018-11-19 19:52 ` Andy Lutomirski
2018-11-19 20:17   ` H. Peter Anvin
2018-11-19 20:40     ` Borislav Petkov
2018-11-19 20:48       ` hpa
2018-11-20  9:11 ` [tip:x86/timers] " tip-bot for Borislav Petkov
2018-11-23 20:03 ` [PATCH] " Guenter Roeck
2018-11-23 20:22   ` hpa
2018-11-23 20:44     ` Thomas Gleixner
2018-11-23 20:44   ` Borislav Petkov
2018-11-23 21:03     ` Guenter Roeck
2018-11-23 21:07       ` Borislav Petkov
2018-12-07 18:39         ` Borislav Petkov
2019-01-16 11:59 ` [tip:x86/alternatives] " tip-bot for Borislav Petkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).