linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled
@ 2020-08-21  2:50 Sean Christopherson
  2020-08-21  7:24 ` peterz
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Sean Christopherson @ 2020-08-21  2:50 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86
  Cc: H. Peter Anvin, linux-kernel, Dave Hansen, Chang Seok Bae,
	Peter Zijlstra, Sasha Levin, Paolo Bonzini, kvm, Tom Lendacky,
	Sean Christopherson

Don't use RDPID in the paranoid entry flow if KVM is enabled as doing so
can consume a KVM guest's MSR_TSC_AUX value if an NMI arrives in KVM's
run loop.

As a performance optimization, KVM loads the guest's TSC_AUX when a CPU
first enters its run loop, and on AMD's SVM doesn't restore the host's
value until the CPU exits the run loop.  VMX is even more aggressive and
defers restoring the host's value until the CPU returns to userspace.
This optimization obviously relies on the kernel not consuming TSC_AUX,
which falls apart if an NMI arrives in the run loop.

Removing KVM's optimizaton would be painful, as both SVM and VMX would
need to context switch the MSR on every VM-Enter (2x WRMSR + 1x RDMSR),
whereas using LSL instead RDPID is a minor blip.

Fixes: eaad981291ee3 ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Chang Seok Bae <chang.seok.bae@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Reported-by: Tom Lendacky <thomas.lendacky@amd.com>
Debugged-by: Tom Lendacky <thomas.lendacky@amd.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---

Andy, I know you said "unconditionally", but it felt weird adding a
comment way down in GET_PERCPU_BASE without plumbing a param in to help
provide context.  But, paranoid_entry is the only user so adding a param
that is unconditional also felt weird.  That being said, I definitely
don't have a strong opinion one way or the other.

 arch/x86/entry/calling.h  | 10 +++++++---
 arch/x86/entry/entry_64.S |  7 ++++++-
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 98e4d8886f11c..a925c0cf89c1a 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -342,9 +342,9 @@ For 32-bit we have the following conventions - kernel is built with
 #endif
 .endm
 
-.macro SAVE_AND_SET_GSBASE scratch_reg:req save_reg:req
+.macro SAVE_AND_SET_GSBASE scratch_reg:req save_reg:req no_rdpid=0
 	rdgsbase \save_reg
-	GET_PERCPU_BASE \scratch_reg
+	GET_PERCPU_BASE \scratch_reg \no_rdpid
 	wrgsbase \scratch_reg
 .endm
 
@@ -375,11 +375,15 @@ For 32-bit we have the following conventions - kernel is built with
  * We normally use %gs for accessing per-CPU data, but we are setting up
  * %gs here and obviously can not use %gs itself to access per-CPU data.
  */
-.macro GET_PERCPU_BASE reg:req
+.macro GET_PERCPU_BASE reg:req no_rdpid=0
+	.if \no_rdpid
+	LOAD_CPU_AND_NODE_SEG_LIMIT \reg
+	.else
 	ALTERNATIVE \
 		"LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \
 		"RDPID	\reg", \
 		X86_FEATURE_RDPID
+	.endif
 	andq	$VDSO_CPUNODE_MASK, \reg
 	movq	__per_cpu_offset(, \reg, 8), \reg
 .endm
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 70dea93378162..fd915c46297c5 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -842,8 +842,13 @@ SYM_CODE_START_LOCAL(paranoid_entry)
 	 *
 	 * The MSR write ensures that no subsequent load is based on a
 	 * mispredicted GSBASE. No extra FENCE required.
+	 *
+	 * Disallow RDPID if KVM is enabled as it may consume a guest's TSC_AUX
+	 * if an NMI arrives in KVM's run loop.  KVM loads guest's TSC_AUX on
+	 * VM-Enter and may not restore the host's value until the CPU returns
+	 * to userspace, i.e. KVM depends on the kernel not using TSC_AUX.
 	 */
-	SAVE_AND_SET_GSBASE scratch_reg=%rax save_reg=%rbx
+	SAVE_AND_SET_GSBASE scratch_reg=%rax save_reg=%rbx no_rdpid=IS_ENABLED(CONFIG_KVM)
 	ret
 
 .Lparanoid_entry_checkgs:
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled
  2020-08-21  2:50 [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled Sean Christopherson
@ 2020-08-21  7:24 ` peterz
  2020-08-21  7:44   ` Borislav Petkov
  2020-08-21  7:47 ` Borislav Petkov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: peterz @ 2020-08-21  7:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86, H. Peter Anvin, linux-kernel, Dave Hansen, Chang Seok Bae,
	Sasha Levin, Paolo Bonzini, kvm, Tom Lendacky

On Thu, Aug 20, 2020 at 07:50:50PM -0700, Sean Christopherson wrote:
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 70dea93378162..fd915c46297c5 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -842,8 +842,13 @@ SYM_CODE_START_LOCAL(paranoid_entry)
>  	 *
>  	 * The MSR write ensures that no subsequent load is based on a
>  	 * mispredicted GSBASE. No extra FENCE required.
> +	 *
> +	 * Disallow RDPID if KVM is enabled as it may consume a guest's TSC_AUX
> +	 * if an NMI arrives in KVM's run loop.  KVM loads guest's TSC_AUX on
> +	 * VM-Enter and may not restore the host's value until the CPU returns
> +	 * to userspace, i.e. KVM depends on the kernel not using TSC_AUX.
>  	 */
> -	SAVE_AND_SET_GSBASE scratch_reg=%rax save_reg=%rbx
> +	SAVE_AND_SET_GSBASE scratch_reg=%rax save_reg=%rbx no_rdpid=IS_ENABLED(CONFIG_KVM)
>  	ret

With distro configs that's going to be a guaranteed no_rdpid. Also with
a grand total of 0 performance numbers that RDPID is even worth it, I'd
suggest to just unconditionally remove that thing. Simpler code
all-around.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled
  2020-08-21  7:24 ` peterz
@ 2020-08-21  7:44   ` Borislav Petkov
  0 siblings, 0 replies; 19+ messages in thread
From: Borislav Petkov @ 2020-08-21  7:44 UTC (permalink / raw)
  To: peterz
  Cc: Sean Christopherson, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, x86, H. Peter Anvin, linux-kernel, Dave Hansen,
	Chang Seok Bae, Sasha Levin, Paolo Bonzini, kvm, Tom Lendacky

On Fri, Aug 21, 2020 at 09:24:14AM +0200, peterz@infradead.org wrote:
> With distro configs that's going to be a guaranteed no_rdpid. Also with
> a grand total of 0 performance numbers that RDPID is even worth it, I'd
> suggest to just unconditionally remove that thing. Simpler code
> all-around.

I was just about to say the same thing - all this dancing around just to
keep RDPID better be backed by numbers, otherwise just remove the damn
thing in that path and use LSL and be done with it.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled
  2020-08-21  2:50 [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled Sean Christopherson
  2020-08-21  7:24 ` peterz
@ 2020-08-21  7:47 ` Borislav Petkov
  2020-08-21  8:09   ` Paolo Bonzini
  2020-08-21  8:56 ` kernel test robot
  2020-08-21 10:28 ` kernel test robot
  3 siblings, 1 reply; 19+ messages in thread
From: Borislav Petkov @ 2020-08-21  7:47 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, x86,
	H. Peter Anvin, linux-kernel, Dave Hansen, Chang Seok Bae,
	Peter Zijlstra, Sasha Levin, Paolo Bonzini, kvm, Tom Lendacky

On Thu, Aug 20, 2020 at 07:50:50PM -0700, Sean Christopherson wrote:
> +	 * Disallow RDPID if KVM is enabled as it may consume a guest's TSC_AUX
> +	 * if an NMI arrives in KVM's run loop.  KVM loads guest's TSC_AUX on
> +	 * VM-Enter and may not restore the host's value until the CPU returns
> +	 * to userspace, i.e. KVM depends on the kernel not using TSC_AUX.
>  	 */

And frankly, this is really unfair. The kernel should be able to use any
MSR. IOW, KVM needs to be fixed here. I'm sure it context-switches other
MSRs so one more MSR is not a big deal.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled
  2020-08-21  7:47 ` Borislav Petkov
@ 2020-08-21  8:09   ` Paolo Bonzini
  2020-08-21  8:16     ` Borislav Petkov
  2020-08-21  9:28     ` Thomas Gleixner
  0 siblings, 2 replies; 19+ messages in thread
From: Paolo Bonzini @ 2020-08-21  8:09 UTC (permalink / raw)
  To: Borislav Petkov, Sean Christopherson
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, x86,
	H. Peter Anvin, linux-kernel, Dave Hansen, Chang Seok Bae,
	Peter Zijlstra, Sasha Levin, kvm, Tom Lendacky

On 21/08/20 09:47, Borislav Petkov wrote:
> On Thu, Aug 20, 2020 at 07:50:50PM -0700, Sean Christopherson wrote:
>> +	 * Disallow RDPID if KVM is enabled as it may consume a guest's TSC_AUX
>> +	 * if an NMI arrives in KVM's run loop.  KVM loads guest's TSC_AUX on
>> +	 * VM-Enter and may not restore the host's value until the CPU returns
>> +	 * to userspace, i.e. KVM depends on the kernel not using TSC_AUX.
>>  	 */
> And frankly, this is really unfair. The kernel should be able to use any
> MSR. IOW, KVM needs to be fixed here. I'm sure it context-switches other
> MSRs so one more MSR is not a big deal.

The only MSR that KVM needs to context-switch manually are XSS and
SPEC_CTRL.  They tend to be the same on host and guest in which case
they can be optimized away.

All the other MSRs (EFER and PAT are those that come to mind) are
handled by the microcode and thus they don't have the slowness of
RDMSR/WRMSR

One more MSR *is* a big deal: KVM's vmentry+vmexit cost is around 1000
cycles, adding 100 clock cycles for 2 WRMSRs is a 10% increase.

Paolo


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled
  2020-08-21  8:09   ` Paolo Bonzini
@ 2020-08-21  8:16     ` Borislav Petkov
  2020-08-21  9:05       ` Paolo Bonzini
  2020-08-21  9:28     ` Thomas Gleixner
  1 sibling, 1 reply; 19+ messages in thread
From: Borislav Petkov @ 2020-08-21  8:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, x86, H. Peter Anvin, linux-kernel, Dave Hansen,
	Chang Seok Bae, Peter Zijlstra, Sasha Levin, kvm, Tom Lendacky

On Fri, Aug 21, 2020 at 10:09:01AM +0200, Paolo Bonzini wrote:
> One more MSR *is* a big deal: KVM's vmentry+vmexit cost is around 1000
> cycles, adding 100 clock cycles for 2 WRMSRs is a 10% increase.

The kernel uses TSC_AUX so we can't reserve it to KVM either.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled
  2020-08-21  2:50 [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled Sean Christopherson
  2020-08-21  7:24 ` peterz
  2020-08-21  7:47 ` Borislav Petkov
@ 2020-08-21  8:56 ` kernel test robot
  2020-08-21 10:28 ` kernel test robot
  3 siblings, 0 replies; 19+ messages in thread
From: kernel test robot @ 2020-08-21  8:56 UTC (permalink / raw)
  To: Sean Christopherson, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86
  Cc: kbuild-all, clang-built-linux, H. Peter Anvin, linux-kernel,
	Dave Hansen, Chang Seok Bae, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 4670 bytes --]

Hi Sean,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/auto-latest]
[also build test ERROR on v5.9-rc1 next-20200821]
[cannot apply to tip/x86/asm luto/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Sean-Christopherson/x86-entry-64-Disallow-RDPID-in-paranoid-entry-if-KVM-is-enabled/20200821-105339
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git a9bd3a91d6e49ebd2d7d8ace91d4cc339c382a31
config: x86_64-randconfig-a005-20200820 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project b587ca93be114d07ec3bf654add97d7872325281)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   arch/x86/entry/entry_64.S: Assembler messages:
>> arch/x86/entry/entry_64.S:851: Error: too many positional arguments
   clang-12: error: assembler command failed with exit code 1 (use -v to see invocation)

# https://github.com/0day-ci/linux/commit/bebb51882f9c18938e44b6a7b66fdf0452eea142
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Sean-Christopherson/x86-entry-64-Disallow-RDPID-in-paranoid-entry-if-KVM-is-enabled/20200821-105339
git checkout bebb51882f9c18938e44b6a7b66fdf0452eea142
vim +851 arch/x86/entry/entry_64.S

   794	
   795	/*
   796	 * Save all registers in pt_regs. Return GSBASE related information
   797	 * in EBX depending on the availability of the FSGSBASE instructions:
   798	 *
   799	 * FSGSBASE	R/EBX
   800	 *     N        0 -> SWAPGS on exit
   801	 *              1 -> no SWAPGS on exit
   802	 *
   803	 *     Y        GSBASE value at entry, must be restored in paranoid_exit
   804	 */
   805	SYM_CODE_START_LOCAL(paranoid_entry)
   806		UNWIND_HINT_FUNC
   807		cld
   808		PUSH_AND_CLEAR_REGS save_ret=1
   809		ENCODE_FRAME_POINTER 8
   810	
   811		/*
   812		 * Always stash CR3 in %r14.  This value will be restored,
   813		 * verbatim, at exit.  Needed if paranoid_entry interrupted
   814		 * another entry that already switched to the user CR3 value
   815		 * but has not yet returned to userspace.
   816		 *
   817		 * This is also why CS (stashed in the "iret frame" by the
   818		 * hardware at entry) can not be used: this may be a return
   819		 * to kernel code, but with a user CR3 value.
   820		 *
   821		 * Switching CR3 does not depend on kernel GSBASE so it can
   822		 * be done before switching to the kernel GSBASE. This is
   823		 * required for FSGSBASE because the kernel GSBASE has to
   824		 * be retrieved from a kernel internal table.
   825		 */
   826		SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
   827	
   828		/*
   829		 * Handling GSBASE depends on the availability of FSGSBASE.
   830		 *
   831		 * Without FSGSBASE the kernel enforces that negative GSBASE
   832		 * values indicate kernel GSBASE. With FSGSBASE no assumptions
   833		 * can be made about the GSBASE value when entering from user
   834		 * space.
   835		 */
   836		ALTERNATIVE "jmp .Lparanoid_entry_checkgs", "", X86_FEATURE_FSGSBASE
   837	
   838		/*
   839		 * Read the current GSBASE and store it in %rbx unconditionally,
   840		 * retrieve and set the current CPUs kernel GSBASE. The stored value
   841		 * has to be restored in paranoid_exit unconditionally.
   842		 *
   843		 * The MSR write ensures that no subsequent load is based on a
   844		 * mispredicted GSBASE. No extra FENCE required.
   845		 *
   846		 * Disallow RDPID if KVM is enabled as it may consume a guest's TSC_AUX
   847		 * if an NMI arrives in KVM's run loop.  KVM loads guest's TSC_AUX on
   848		 * VM-Enter and may not restore the host's value until the CPU returns
   849		 * to userspace, i.e. KVM depends on the kernel not using TSC_AUX.
   850		 */
 > 851		SAVE_AND_SET_GSBASE scratch_reg=%rax save_reg=%rbx no_rdpid=IS_ENABLED(CONFIG_KVM)
   852		ret
   853	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 24946 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled
  2020-08-21  8:16     ` Borislav Petkov
@ 2020-08-21  9:05       ` Paolo Bonzini
  2020-08-21  9:22         ` Borislav Petkov
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2020-08-21  9:05 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Sean Christopherson, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, x86, H. Peter Anvin, linux-kernel, Dave Hansen,
	Chang Seok Bae, Peter Zijlstra, Sasha Levin, kvm, Tom Lendacky

On 21/08/20 10:16, Borislav Petkov wrote:
> On Fri, Aug 21, 2020 at 10:09:01AM +0200, Paolo Bonzini wrote:
>> One more MSR *is* a big deal: KVM's vmentry+vmexit cost is around 1000
>> cycles, adding 100 clock cycles for 2 WRMSRs is a 10% increase.
> 
> The kernel uses TSC_AUX so we can't reserve it to KVM either.

KVM only uses TSC_AUX while in kernel space, because the kernel hadn't
used it until now.  That's for a good reason:

* if possible, __this_cpu_read(cpu_number) is always faster.

* The kernel can just block preemption at its will and has no need for
the atomic rdtsc+vgetcpu provided by RDTSCP.

So far, the kernel had always used LSL instead of RDPID when
__this_cpu_read was not available.  In one place, RDTSCP is used as an
ordered rdtsc but it discards the TSC_AUX value.  RDPID is also used in
the vDSO but it isn't kernel space.

Hence the assumption that KVM makes (and has made ever since TSC_AUX was
introduced.  What is the difference in speed between LSL and RDPID?  I
don't have a machine that has RDPID to test it, unfortunately.

Paolo


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled
  2020-08-21  9:05       ` Paolo Bonzini
@ 2020-08-21  9:22         ` Borislav Petkov
  2020-08-21  9:44           ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: Borislav Petkov @ 2020-08-21  9:22 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, x86, H. Peter Anvin, linux-kernel, Dave Hansen,
	Chang Seok Bae, Peter Zijlstra, Sasha Levin, kvm, Tom Lendacky

On Fri, Aug 21, 2020 at 11:05:24AM +0200, Paolo Bonzini wrote:
> ... RDTSCP is used as an ordered rdtsc but it discards the TSC_AUX
> value.

... and now because KVM is using it, the kernel can forget using
TSC_AUX. Are you kidding me?!

> Hence the assumption that KVM makes (and has made ever since TSC_AUX was
> introduced.

And *this* is the problem. KVM can't just be grabbing MSRs and claiming
them because it started using them first. You guys need to stop this. If
it is a shared resource, there better be an agreement about sharing it.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled
  2020-08-21  8:09   ` Paolo Bonzini
  2020-08-21  8:16     ` Borislav Petkov
@ 2020-08-21  9:28     ` Thomas Gleixner
  2020-08-21  9:37       ` Paolo Bonzini
  2020-08-21 19:55       ` hpa
  1 sibling, 2 replies; 19+ messages in thread
From: Thomas Gleixner @ 2020-08-21  9:28 UTC (permalink / raw)
  To: Paolo Bonzini, Borislav Petkov, Sean Christopherson
  Cc: Andy Lutomirski, Ingo Molnar, x86, H. Peter Anvin, linux-kernel,
	Dave Hansen, Chang Seok Bae, Peter Zijlstra, Sasha Levin, kvm,
	Tom Lendacky

On Fri, Aug 21 2020 at 10:09, Paolo Bonzini wrote:
> On 21/08/20 09:47, Borislav Petkov wrote:
>> On Thu, Aug 20, 2020 at 07:50:50PM -0700, Sean Christopherson wrote:
>>> +	 * Disallow RDPID if KVM is enabled as it may consume a guest's TSC_AUX
>>> +	 * if an NMI arrives in KVM's run loop.  KVM loads guest's TSC_AUX on
>>> +	 * VM-Enter and may not restore the host's value until the CPU returns
>>> +	 * to userspace, i.e. KVM depends on the kernel not using TSC_AUX.
>>>  	 */
>> And frankly, this is really unfair. The kernel should be able to use any
>> MSR. IOW, KVM needs to be fixed here. I'm sure it context-switches other
>> MSRs so one more MSR is not a big deal.
>
> The only MSR that KVM needs to context-switch manually are XSS and
> SPEC_CTRL.  They tend to be the same on host and guest in which case
> they can be optimized away.
>
> All the other MSRs (EFER and PAT are those that come to mind) are
> handled by the microcode and thus they don't have the slowness of
> RDMSR/WRMSR
>
> One more MSR *is* a big deal: KVM's vmentry+vmexit cost is around 1000
> cycles, adding 100 clock cycles for 2 WRMSRs is a 10% increase.

We all know that MSRs are slow, but as a general rule I have to make it
entirely clear that the kernel has precedence over KVM.

If the kernel wants to use an MSR for it's own purposes then KVM has to
deal with that and not the other way round. Preventing the kernel from
using a facility freely is not an option ever.

The insanities of KVM performance optimizations have bitten us more than
once.

For this particular case at hand I don't care much and we should just
rip the whole RDPID thing out unconditionally. We still have zero
numbers about the performance difference vs. LSL.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled
  2020-08-21  9:28     ` Thomas Gleixner
@ 2020-08-21  9:37       ` Paolo Bonzini
  2020-08-21 19:55       ` hpa
  1 sibling, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2020-08-21  9:37 UTC (permalink / raw)
  To: Thomas Gleixner, Borislav Petkov, Sean Christopherson
  Cc: Andy Lutomirski, Ingo Molnar, x86, H. Peter Anvin, linux-kernel,
	Dave Hansen, Chang Seok Bae, Peter Zijlstra, Sasha Levin, kvm,
	Tom Lendacky

On 21/08/20 11:28, Thomas Gleixner wrote:
> We all know that MSRs are slow, but as a general rule I have to make it
> entirely clear that the kernel has precedence over KVM.

I totally agree.  I just don't think that it matters _in this case_,
because the kernel hardly has any reason to use TSC_AUX while in ring0.

Paolo

> If the kernel wants to use an MSR for it's own purposes then KVM has to
> deal with that and not the other way round. Preventing the kernel from
> using a facility freely is not an option ever.
> 
> The insanities of KVM performance optimizations have bitten us more than
> once.
> 
> For this particular case at hand I don't care much and we should just
> rip the whole RDPID thing out unconditionally. We still have zero
> numbers about the performance difference vs. LSL.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled
  2020-08-21  9:22         ` Borislav Petkov
@ 2020-08-21  9:44           ` Paolo Bonzini
  2020-08-21  9:48             ` Borislav Petkov
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2020-08-21  9:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Sean Christopherson, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, x86, H. Peter Anvin, linux-kernel, Dave Hansen,
	Chang Seok Bae, Peter Zijlstra, Sasha Levin, kvm, Tom Lendacky

On 21/08/20 11:22, Borislav Petkov wrote:
>> Hence the assumption that KVM makes (and has made ever since TSC_AUX was
>> introduced.
> And *this* is the problem. KVM can't just be grabbing MSRs and claiming
> them because it started using them first. You guys need to stop this. If
> it is a shared resource, there better be an agreement about sharing it.

It's not like we grab MSRs every day.  The user-return notifier restores
6 MSRs (7 on very old processors).  The last two that were added were
MSR_TSC_AUX itself in 2009 (!) and MSR_IA32_TSX_CTRL last year.

Paolo


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled
  2020-08-21  9:44           ` Paolo Bonzini
@ 2020-08-21  9:48             ` Borislav Petkov
  2020-08-21 10:07               ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: Borislav Petkov @ 2020-08-21  9:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, x86, H. Peter Anvin, linux-kernel, Dave Hansen,
	Chang Seok Bae, Peter Zijlstra, Sasha Levin, kvm, Tom Lendacky

On Fri, Aug 21, 2020 at 11:44:33AM +0200, Paolo Bonzini wrote:
> It's not like we grab MSRs every day.  The user-return notifier restores
> 6 MSRs (7 on very old processors).  The last two that were added were
> MSR_TSC_AUX itself in 2009 (!) and MSR_IA32_TSX_CTRL last year.

What about "If it is a shared resource, there better be an agreement
about sharing it." is not clear?

It doesn't matter how many or which resources - there needs to be a
contract for shared use so that shared use is possible. It is that
simple.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled
  2020-08-21  9:48             ` Borislav Petkov
@ 2020-08-21 10:07               ` Paolo Bonzini
  2020-08-22 16:42                 ` Andy Lutomirski
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2020-08-21 10:07 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Sean Christopherson, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, x86, H. Peter Anvin, linux-kernel, Dave Hansen,
	Chang Seok Bae, Peter Zijlstra, Sasha Levin, kvm, Tom Lendacky

On 21/08/20 11:48, Borislav Petkov wrote:
>> It's not like we grab MSRs every day.  The user-return notifier restores
>> 6 MSRs (7 on very old processors).  The last two that were added were
>> MSR_TSC_AUX itself in 2009 (!) and MSR_IA32_TSX_CTRL last year.
> What about "If it is a shared resource, there better be an agreement
> about sharing it." is not clear?
> 
> It doesn't matter how many or which resources - there needs to be a
> contract for shared use so that shared use is possible. It is that
> simple.

Sure, and I'll make sure to have that discussion the next time we add a
shared MSR in 2029.

In the meanwhile:

* for the syscall MSRs, patches to share them were reviewed by hpa and
peterz so I guess there's no problem.

* for MSR_TSC_AUX, which is the one that is causing problems, everybody
seems to agree with just using LSL (in the lack specific numbers on
performance improvements from RDPID).

* for MSR_IA32_TSX_CTRL.RTM_DISABLE, which is the only one that was
added in the last 10 years, I'm pretty sure there are no plans for using
the Trusty Sidechannel eXtension in the kernel.  So that should be fine
too.  (The CPUID_CLEAR bit of the MSR is not shared).

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled
  2020-08-21  2:50 [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled Sean Christopherson
                   ` (2 preceding siblings ...)
  2020-08-21  8:56 ` kernel test robot
@ 2020-08-21 10:28 ` kernel test robot
  3 siblings, 0 replies; 19+ messages in thread
From: kernel test robot @ 2020-08-21 10:28 UTC (permalink / raw)
  To: Sean Christopherson, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86
  Cc: kbuild-all, H. Peter Anvin, linux-kernel, Dave Hansen,
	Chang Seok Bae, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 4318 bytes --]

Hi Sean,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/auto-latest]
[also build test ERROR on v5.9-rc1 next-20200821]
[cannot apply to tip/x86/asm luto/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Sean-Christopherson/x86-entry-64-Disallow-RDPID-in-paranoid-entry-if-KVM-is-enabled/20200821-105339
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git a9bd3a91d6e49ebd2d7d8ace91d4cc339c382a31
config: x86_64-randconfig-s022-20200821 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.2-191-g10164920-dirty
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   arch/x86/entry/entry_64.S: Assembler messages:
>> arch/x86/entry/entry_64.S:851: Error: too many positional arguments

# https://github.com/0day-ci/linux/commit/bebb51882f9c18938e44b6a7b66fdf0452eea142
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Sean-Christopherson/x86-entry-64-Disallow-RDPID-in-paranoid-entry-if-KVM-is-enabled/20200821-105339
git checkout bebb51882f9c18938e44b6a7b66fdf0452eea142
vim +851 arch/x86/entry/entry_64.S

   794	
   795	/*
   796	 * Save all registers in pt_regs. Return GSBASE related information
   797	 * in EBX depending on the availability of the FSGSBASE instructions:
   798	 *
   799	 * FSGSBASE	R/EBX
   800	 *     N        0 -> SWAPGS on exit
   801	 *              1 -> no SWAPGS on exit
   802	 *
   803	 *     Y        GSBASE value at entry, must be restored in paranoid_exit
   804	 */
   805	SYM_CODE_START_LOCAL(paranoid_entry)
   806		UNWIND_HINT_FUNC
   807		cld
   808		PUSH_AND_CLEAR_REGS save_ret=1
   809		ENCODE_FRAME_POINTER 8
   810	
   811		/*
   812		 * Always stash CR3 in %r14.  This value will be restored,
   813		 * verbatim, at exit.  Needed if paranoid_entry interrupted
   814		 * another entry that already switched to the user CR3 value
   815		 * but has not yet returned to userspace.
   816		 *
   817		 * This is also why CS (stashed in the "iret frame" by the
   818		 * hardware at entry) can not be used: this may be a return
   819		 * to kernel code, but with a user CR3 value.
   820		 *
   821		 * Switching CR3 does not depend on kernel GSBASE so it can
   822		 * be done before switching to the kernel GSBASE. This is
   823		 * required for FSGSBASE because the kernel GSBASE has to
   824		 * be retrieved from a kernel internal table.
   825		 */
   826		SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
   827	
   828		/*
   829		 * Handling GSBASE depends on the availability of FSGSBASE.
   830		 *
   831		 * Without FSGSBASE the kernel enforces that negative GSBASE
   832		 * values indicate kernel GSBASE. With FSGSBASE no assumptions
   833		 * can be made about the GSBASE value when entering from user
   834		 * space.
   835		 */
   836		ALTERNATIVE "jmp .Lparanoid_entry_checkgs", "", X86_FEATURE_FSGSBASE
   837	
   838		/*
   839		 * Read the current GSBASE and store it in %rbx unconditionally,
   840		 * retrieve and set the current CPUs kernel GSBASE. The stored value
   841		 * has to be restored in paranoid_exit unconditionally.
   842		 *
   843		 * The MSR write ensures that no subsequent load is based on a
   844		 * mispredicted GSBASE. No extra FENCE required.
   845		 *
   846		 * Disallow RDPID if KVM is enabled as it may consume a guest's TSC_AUX
   847		 * if an NMI arrives in KVM's run loop.  KVM loads guest's TSC_AUX on
   848		 * VM-Enter and may not restore the host's value until the CPU returns
   849		 * to userspace, i.e. KVM depends on the kernel not using TSC_AUX.
   850		 */
 > 851		SAVE_AND_SET_GSBASE scratch_reg=%rax save_reg=%rbx no_rdpid=IS_ENABLED(CONFIG_KVM)
   852		ret
   853	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29093 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled
  2020-08-21  9:28     ` Thomas Gleixner
  2020-08-21  9:37       ` Paolo Bonzini
@ 2020-08-21 19:55       ` hpa
  2020-08-21 20:02         ` Peter Zijlstra
  1 sibling, 1 reply; 19+ messages in thread
From: hpa @ 2020-08-21 19:55 UTC (permalink / raw)
  To: Thomas Gleixner, Paolo Bonzini, Borislav Petkov, Sean Christopherson
  Cc: Andy Lutomirski, Ingo Molnar, x86, linux-kernel, Dave Hansen,
	Chang Seok Bae, Peter Zijlstra, Sasha Levin, kvm, Tom Lendacky

On August 21, 2020 2:28:32 AM PDT, Thomas Gleixner <tglx@linutronix.de> wrote:
>On Fri, Aug 21 2020 at 10:09, Paolo Bonzini wrote:
>> On 21/08/20 09:47, Borislav Petkov wrote:
>>> On Thu, Aug 20, 2020 at 07:50:50PM -0700, Sean Christopherson wrote:
>>>> +	 * Disallow RDPID if KVM is enabled as it may consume a guest's
>TSC_AUX
>>>> +	 * if an NMI arrives in KVM's run loop.  KVM loads guest's
>TSC_AUX on
>>>> +	 * VM-Enter and may not restore the host's value until the CPU
>returns
>>>> +	 * to userspace, i.e. KVM depends on the kernel not using
>TSC_AUX.
>>>>  	 */
>>> And frankly, this is really unfair. The kernel should be able to use
>any
>>> MSR. IOW, KVM needs to be fixed here. I'm sure it context-switches
>other
>>> MSRs so one more MSR is not a big deal.
>>
>> The only MSR that KVM needs to context-switch manually are XSS and
>> SPEC_CTRL.  They tend to be the same on host and guest in which case
>> they can be optimized away.
>>
>> All the other MSRs (EFER and PAT are those that come to mind) are
>> handled by the microcode and thus they don't have the slowness of
>> RDMSR/WRMSR
>>
>> One more MSR *is* a big deal: KVM's vmentry+vmexit cost is around
>1000
>> cycles, adding 100 clock cycles for 2 WRMSRs is a 10% increase.
>
>We all know that MSRs are slow, but as a general rule I have to make it
>entirely clear that the kernel has precedence over KVM.
>
>If the kernel wants to use an MSR for it's own purposes then KVM has to
>deal with that and not the other way round. Preventing the kernel from
>using a facility freely is not an option ever.
>
>The insanities of KVM performance optimizations have bitten us more
>than
>once.
>
>For this particular case at hand I don't care much and we should just
>rip the whole RDPID thing out unconditionally. We still have zero
>numbers about the performance difference vs. LSL.
>
>Thanks,
>
>        tglx

It is hardly going to be a performance difference for paranoid entry, which is hopefully rare enough that it falls into the noise.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled
  2020-08-21 19:55       ` hpa
@ 2020-08-21 20:02         ` Peter Zijlstra
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2020-08-21 20:02 UTC (permalink / raw)
  To: hpa
  Cc: Thomas Gleixner, Paolo Bonzini, Borislav Petkov,
	Sean Christopherson, Andy Lutomirski, Ingo Molnar, x86,
	linux-kernel, Dave Hansen, Chang Seok Bae, Sasha Levin, kvm,
	Tom Lendacky

On Fri, Aug 21, 2020 at 12:55:53PM -0700, hpa@zytor.com wrote:

> It is hardly going to be a performance difference for paranoid entry,
> which is hopefully rare enough that it falls into the noise.

Try perf some day ;-)

But yeah, given the utter trainwreck that NMIs are anyway, this is all
not going to matter much.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled
  2020-08-21 10:07               ` Paolo Bonzini
@ 2020-08-22 16:42                 ` Andy Lutomirski
  2020-09-16 16:54                   ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: Andy Lutomirski @ 2020-08-22 16:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, X86 ML, H. Peter Anvin, LKML,
	Dave Hansen, Chang Seok Bae, Peter Zijlstra, Sasha Levin,
	kvm list, Tom Lendacky

On Fri, Aug 21, 2020 at 3:07 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 21/08/20 11:48, Borislav Petkov wrote:
> >> It's not like we grab MSRs every day.  The user-return notifier restores
> >> 6 MSRs (7 on very old processors).  The last two that were added were
> >> MSR_TSC_AUX itself in 2009 (!) and MSR_IA32_TSX_CTRL last year.
> > What about "If it is a shared resource, there better be an agreement
> > about sharing it." is not clear?
> >
> > It doesn't matter how many or which resources - there needs to be a
> > contract for shared use so that shared use is possible. It is that
> > simple.
>
> Sure, and I'll make sure to have that discussion the next time we add a
> shared MSR in 2029.
>
> In the meanwhile:
>
> * for the syscall MSRs, patches to share them were reviewed by hpa and
> peterz so I guess there's no problem.
>
> * for MSR_TSC_AUX, which is the one that is causing problems, everybody
> seems to agree with just using LSL (in the lack specific numbers on
> performance improvements from RDPID).
>
> * for MSR_IA32_TSX_CTRL.RTM_DISABLE, which is the only one that was
> added in the last 10 years, I'm pretty sure there are no plans for using
> the Trusty Sidechannel eXtension in the kernel.  So that should be fine
> too.  (The CPUID_CLEAR bit of the MSR is not shared).
>

Regardless of how anyone feels about who owns what in arch/x86 vs
arch/x86/kvm, the x86 architecture is a mess that comes from Intel and
AMD, and we have to deal with it.  On VMX, when a VM exits, the VM's
value of MSR_TSC_AUX is live, and we can take an NMI, MCE, or
abominable new #SX, #VE, #VC, etc on the next instruction boundary.
And unless we use the atomic MSR switch mechanism, the result is that
we're going through the entry path with guest-controlled MSRs.

So I think we can chalk this one up to obnoxious hardware.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled
  2020-08-22 16:42                 ` Andy Lutomirski
@ 2020-09-16 16:54                   ` Paolo Bonzini
  0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2020-09-16 16:54 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Borislav Petkov, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, X86 ML, H. Peter Anvin, LKML, Dave Hansen,
	Chang Seok Bae, Peter Zijlstra, Sasha Levin, kvm list,
	Tom Lendacky

On 22/08/20 18:42, Andy Lutomirski wrote:
> On VMX, when a VM exits, the VM's
> value of MSR_TSC_AUX is live, and we can take an NMI, MCE, or
> abominable new #SX, #VE, #VC, etc on the next instruction boundary.
> And unless we use the atomic MSR switch mechanism, the result is that
> we're going through the entry path with guest-controlled MSRs.

If anything of that is a problem, we can and will use the atomic MSR
switching; it's not worth doing complicated stuff if you're going to pay
the price of rdmsr/wrmsr anyway.

The remaining cases are MSRs that are really meant for usermode (such as
the syscall MSRs) and especially the edge cases of these two MSRs that
the kernel doesn't mind too much about.  But they are really really
rare, I don't expect any new one coming soon and if they are ever needed
(by SGX perhaps?!?) I'll certainly loop you guys in.

Paolo


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-09-16 20:51 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-21  2:50 [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled Sean Christopherson
2020-08-21  7:24 ` peterz
2020-08-21  7:44   ` Borislav Petkov
2020-08-21  7:47 ` Borislav Petkov
2020-08-21  8:09   ` Paolo Bonzini
2020-08-21  8:16     ` Borislav Petkov
2020-08-21  9:05       ` Paolo Bonzini
2020-08-21  9:22         ` Borislav Petkov
2020-08-21  9:44           ` Paolo Bonzini
2020-08-21  9:48             ` Borislav Petkov
2020-08-21 10:07               ` Paolo Bonzini
2020-08-22 16:42                 ` Andy Lutomirski
2020-09-16 16:54                   ` Paolo Bonzini
2020-08-21  9:28     ` Thomas Gleixner
2020-08-21  9:37       ` Paolo Bonzini
2020-08-21 19:55       ` hpa
2020-08-21 20:02         ` Peter Zijlstra
2020-08-21  8:56 ` kernel test robot
2020-08-21 10:28 ` kernel test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).