linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/7] KVM TSS cleanups and speedups
@ 2017-02-21 19:14 Andy Lutomirski
  2017-02-21 19:14 ` [PATCH v2 1/7] x86/asm: Define the kernel TSS limit in a macro Andy Lutomirski
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Andy Lutomirski @ 2017-02-21 19:14 UTC (permalink / raw)
  To: Paolo Bonzini, X86 ML
  Cc: kvm list, linux-kernel, Borislav Petkov, Thomas Garnier,
	Jim Mattson, Andy Lutomirski

The first four patches here are intended to be straightforward
cleanups and to make a better base for Thomas' GDT series.  They may
be a slight speedup, too, because they remove an STR instruction
from the VMX entry path.

The last two patches are a reasonably large speedup but need careful
review.

FWIW, I can see lots of additional easy-ish speedups here.  For example:

 - The GDT reload on VM exit isn't really needed at all.  Instead let's
   just change the kernel limit to 0xFFFF.  Doing that naively would
   waste memory, but doing it carefully on top of Thomas' series would
   be straightforward and almost free.

 - RDMSR from MSR_GS_BASE is totally pointless.

 - Once I or someone finishes the FSGSBASE series, we get a big speedup
   there.

 - The LDT reload code should be split up and optimized better, I think.

Changes from v1:
 - Fix some changelog typos.
 - Fix the bug that Paolo found.
 - Rename the helpers to make their usage more obvious.
 - Move clearing __tss_limit_invalid into force_reload_TR() as a tiny
   optimization.
 - Add a test case.  It doesn't test all the machinations, but at least
   it checks basic functionality.

Andy Lutomirski (7):
  x86/asm: Define the kernel TSS limit in a macro
  x86/kvm/vmx: Don't fetch the TSS base from the GDT
  x86/kvm/vmx: Get rid of segment_base() on 64-bit kernels
  x86/kvm/vmx: Simplify segment_base()
  x86/asm/64: Drop __cacheline_aligned from struct x86_hw_tss
  x86/kvm/vmx: Defer TR reload after VM exit
  selftests/x86: Add a basic selftest for ioperm

 arch/x86/include/asm/desc.h          |  62 +++++++++++--
 arch/x86/include/asm/processor.h     |  12 ++-
 arch/x86/kernel/ioport.c             |  11 +++
 arch/x86/kernel/process.c            |  10 ++
 arch/x86/kvm/vmx.c                   |  63 ++++++-------
 tools/testing/selftests/x86/Makefile |   2 +-
 tools/testing/selftests/x86/ioperm.c | 171 +++++++++++++++++++++++++++++++++++
 7 files changed, 284 insertions(+), 47 deletions(-)
 create mode 100644 tools/testing/selftests/x86/ioperm.c

-- 
2.9.3

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/7] x86/asm: Define the kernel TSS limit in a macro
  2017-02-21 19:14 [PATCH v2 0/7] KVM TSS cleanups and speedups Andy Lutomirski
@ 2017-02-21 19:14 ` Andy Lutomirski
  2017-02-21 19:14 ` [PATCH v2 2/7] x86/kvm/vmx: Don't fetch the TSS base from the GDT Andy Lutomirski
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Andy Lutomirski @ 2017-02-21 19:14 UTC (permalink / raw)
  To: Paolo Bonzini, X86 ML
  Cc: kvm list, linux-kernel, Borislav Petkov, Thomas Garnier,
	Jim Mattson, Andy Lutomirski

Rather than open-coding the kernel TSS limit in set_tss_desc(), make
it a real macro near the TSS layout definition.

This is purely a cleanup.

Cc: Thomas Garnier <thgarnie@google.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/desc.h      | 10 +---------
 arch/x86/include/asm/processor.h | 10 ++++++++++
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 12080d87da3b..2e781bcc5e12 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -177,16 +177,8 @@ static inline void __set_tss_desc(unsigned cpu, unsigned int entry, void *addr)
 	struct desc_struct *d = get_cpu_gdt_table(cpu);
 	tss_desc tss;
 
-	/*
-	 * sizeof(unsigned long) coming from an extra "long" at the end
-	 * of the iobitmap. See tss_struct definition in processor.h
-	 *
-	 * -1? seg base+limit should be pointing to the address of the
-	 * last valid byte
-	 */
 	set_tssldt_descriptor(&tss, (unsigned long)addr, DESC_TSS,
-			      IO_BITMAP_OFFSET + IO_BITMAP_BYTES +
-			      sizeof(unsigned long) - 1);
+			      __KERNEL_TSS_LIMIT);
 	write_gdt_entry(d, entry, &tss, DESC_TSS);
 }
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index e6cfe7ba2d65..feb2ab95b8f6 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -342,6 +342,16 @@ struct tss_struct {
 
 DECLARE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss);
 
+/*
+ * sizeof(unsigned long) coming from an extra "long" at the end
+ * of the iobitmap.
+ *
+ * -1? seg base+limit should be pointing to the address of the
+ * last valid byte
+ */
+#define __KERNEL_TSS_LIMIT	\
+	(IO_BITMAP_OFFSET + IO_BITMAP_BYTES + sizeof(unsigned long) - 1)
+
 #ifdef CONFIG_X86_32
 DECLARE_PER_CPU(unsigned long, cpu_current_top_of_stack);
 #endif
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/7] x86/kvm/vmx: Don't fetch the TSS base from the GDT
  2017-02-21 19:14 [PATCH v2 0/7] KVM TSS cleanups and speedups Andy Lutomirski
  2017-02-21 19:14 ` [PATCH v2 1/7] x86/asm: Define the kernel TSS limit in a macro Andy Lutomirski
@ 2017-02-21 19:14 ` Andy Lutomirski
  2017-02-21 19:14 ` [PATCH v2 3/7] x86/kvm/vmx: Get rid of segment_base() on 64-bit kernels Andy Lutomirski
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Andy Lutomirski @ 2017-02-21 19:14 UTC (permalink / raw)
  To: Paolo Bonzini, X86 ML
  Cc: kvm list, linux-kernel, Borislav Petkov, Thomas Garnier,
	Jim Mattson, Andy Lutomirski, Radim Krčmář

The current CPU's TSS base is a foregone conclusion, so there's no need
to parse it out of the segment tables.  This should save a couple of
(as STR is surely microcoded and poorly optimized) but, more importantly,
it's a cleanup and it means that segment_base() will never be called on
64-bit kernels.

Cc: Thomas Garnier <thgarnie@google.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kvm/vmx.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a236decb81e4..46420aaf1684 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2088,13 +2088,6 @@ static unsigned long segment_base(u16 selector)
 	return v;
 }
 
-static inline unsigned long kvm_read_tr_base(void)
-{
-	u16 tr;
-	asm("str %0" : "=g"(tr));
-	return segment_base(tr);
-}
-
 static void vmx_save_host_state(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -2294,10 +2287,11 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 		/*
 		 * Linux uses per-cpu TSS and GDT, so set these when switching
-		 * processors.
+		 * processors.  See 22.2.4.
 		 */
-		vmcs_writel(HOST_TR_BASE, kvm_read_tr_base()); /* 22.2.4 */
-		vmcs_writel(HOST_GDTR_BASE, gdt->address);   /* 22.2.4 */
+		vmcs_writel(HOST_TR_BASE,
+			    (unsigned long)this_cpu_ptr(&cpu_tss));
+		vmcs_writel(HOST_GDTR_BASE, gdt->address);
 
 		rdmsrl(MSR_IA32_SYSENTER_ESP, sysenter_esp);
 		vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 3/7] x86/kvm/vmx: Get rid of segment_base() on 64-bit kernels
  2017-02-21 19:14 [PATCH v2 0/7] KVM TSS cleanups and speedups Andy Lutomirski
  2017-02-21 19:14 ` [PATCH v2 1/7] x86/asm: Define the kernel TSS limit in a macro Andy Lutomirski
  2017-02-21 19:14 ` [PATCH v2 2/7] x86/kvm/vmx: Don't fetch the TSS base from the GDT Andy Lutomirski
@ 2017-02-21 19:14 ` Andy Lutomirski
  2017-02-21 19:14 ` [PATCH v2 4/7] x86/kvm/vmx: Simplify segment_base() Andy Lutomirski
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Andy Lutomirski @ 2017-02-21 19:14 UTC (permalink / raw)
  To: Paolo Bonzini, X86 ML
  Cc: kvm list, linux-kernel, Borislav Petkov, Thomas Garnier,
	Jim Mattson, Andy Lutomirski, Radim Krčmář

segment_base() was a bit buggy: it didn't handle all of the segment
types that need 64-bit fixups.  The bug was harmless, though, because
the function was never called in any interesting context on 64-bit
kernels and was only used for code and data segments on 32-bit kernels.

To avoid confusion, make it explicitly 32-bit only.

Cc: Thomas Garnier <thgarnie@google.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kvm/vmx.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 46420aaf1684..b1810a0edec3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2059,6 +2059,12 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
 	}
 }
 
+#ifdef CONFIG_X86_32
+/*
+ * On 32-bit kernels, VM exits still load the FS and GS bases from the
+ * VMCS rather than the segment table.  KVM uses this helper to figure
+ * out the current bases to poke them into the VMCS before entry.
+ */
 static unsigned long segment_base(u16 selector)
 {
 	struct desc_ptr *gdt = this_cpu_ptr(&host_gdt);
@@ -2081,12 +2087,9 @@ static unsigned long segment_base(u16 selector)
 	}
 	d = (struct desc_struct *)(table_base + (selector & ~7));
 	v = get_desc_base(d);
-#ifdef CONFIG_X86_64
-       if (d->s == 0 && (d->type == 2 || d->type == 9 || d->type == 11))
-               v |= ((unsigned long)((struct ldttss_desc64 *)d)->base3) << 32;
-#endif
 	return v;
 }
+#endif
 
 static void vmx_save_host_state(struct kvm_vcpu *vcpu)
 {
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 4/7] x86/kvm/vmx: Simplify segment_base()
  2017-02-21 19:14 [PATCH v2 0/7] KVM TSS cleanups and speedups Andy Lutomirski
                   ` (2 preceding siblings ...)
  2017-02-21 19:14 ` [PATCH v2 3/7] x86/kvm/vmx: Get rid of segment_base() on 64-bit kernels Andy Lutomirski
@ 2017-02-21 19:14 ` Andy Lutomirski
  2017-02-21 19:14 ` [PATCH v2 5/7] x86/asm/64: Drop __cacheline_aligned from struct x86_hw_tss Andy Lutomirski
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Andy Lutomirski @ 2017-02-21 19:14 UTC (permalink / raw)
  To: Paolo Bonzini, X86 ML
  Cc: kvm list, linux-kernel, Borislav Petkov, Thomas Garnier,
	Jim Mattson, Andy Lutomirski, Radim Krčmář

Use actual pointer types for pointers (instead of unsigned long) and
replace hardcoded constants with the appropriate self-documenting
macros.

The function is still a bit messy, but this seems a lot better than
before to me.

This is mostly borrowed from a patch by Thomas Garnier.

Cc: Thomas Garnier <thgarnie@google.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kvm/vmx.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b1810a0edec3..17ef11c64702 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2069,24 +2069,23 @@ static unsigned long segment_base(u16 selector)
 {
 	struct desc_ptr *gdt = this_cpu_ptr(&host_gdt);
 	struct desc_struct *d;
-	unsigned long table_base;
+	struct desc_struct *table;
 	unsigned long v;
 
-	if (!(selector & ~3))
+	if (!(selector & ~SEGMENT_RPL_MASK))
 		return 0;
 
-	table_base = gdt->address;
+	table = (struct desc_struct *)gdt->address;
 
-	if (selector & 4) {           /* from ldt */
+	if ((selector & SEGMENT_TI_MASK) == SEGMENT_LDT) {
 		u16 ldt_selector = kvm_read_ldt();
 
-		if (!(ldt_selector & ~3))
+		if (!(ldt_selector & ~SEGMENT_RPL_MASK))
 			return 0;
 
-		table_base = segment_base(ldt_selector);
+		table = (struct desc_struct *)segment_base(ldt_selector);
 	}
-	d = (struct desc_struct *)(table_base + (selector & ~7));
-	v = get_desc_base(d);
+	v = get_desc_base(&table[selector >> 3]);
 	return v;
 }
 #endif
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 5/7] x86/asm/64: Drop __cacheline_aligned from struct x86_hw_tss
  2017-02-21 19:14 [PATCH v2 0/7] KVM TSS cleanups and speedups Andy Lutomirski
                   ` (3 preceding siblings ...)
  2017-02-21 19:14 ` [PATCH v2 4/7] x86/kvm/vmx: Simplify segment_base() Andy Lutomirski
@ 2017-02-21 19:14 ` Andy Lutomirski
  2017-02-21 19:14 ` [PATCH v2 6/7] x86/kvm/vmx: Defer TR reload after VM exit Andy Lutomirski
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Andy Lutomirski @ 2017-02-21 19:14 UTC (permalink / raw)
  To: Paolo Bonzini, X86 ML
  Cc: kvm list, linux-kernel, Borislav Petkov, Thomas Garnier,
	Jim Mattson, Andy Lutomirski

Historically, the entire TSS + io bitmap structure was cacheline
aligned, but commit ca241c75037b ("x86: unify tss_struct") changed
it (presumably inadvertently) so that the fixed-layout hardware part
is cacheline-aligned and the io bitmap is after the padding.  This
wastes 24 bytes (the hardware part should be 104 bytes, but this
pads it to 128 bytes), serves no purpose, and causes sizeof(struct
x86_hw_tss) to have a confusing value.

Drop the pointless alignment.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/processor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index feb2ab95b8f6..f385eca5407a 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -304,7 +304,7 @@ struct x86_hw_tss {
 	u16			reserved5;
 	u16			io_bitmap_base;
 
-} __attribute__((packed)) ____cacheline_aligned;
+} __attribute__((packed));
 #endif
 
 /*
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 6/7] x86/kvm/vmx: Defer TR reload after VM exit
  2017-02-21 19:14 [PATCH v2 0/7] KVM TSS cleanups and speedups Andy Lutomirski
                   ` (4 preceding siblings ...)
  2017-02-21 19:14 ` [PATCH v2 5/7] x86/asm/64: Drop __cacheline_aligned from struct x86_hw_tss Andy Lutomirski
@ 2017-02-21 19:14 ` Andy Lutomirski
  2017-02-21 19:14 ` [PATCH v2 7/7] selftests/x86: Add a basic selftest for ioperm Andy Lutomirski
  2017-02-22 15:13 ` [PATCH v2 0/7] KVM TSS cleanups and speedups Paolo Bonzini
  7 siblings, 0 replies; 12+ messages in thread
From: Andy Lutomirski @ 2017-02-21 19:14 UTC (permalink / raw)
  To: Paolo Bonzini, X86 ML
  Cc: kvm list, linux-kernel, Borislav Petkov, Thomas Garnier,
	Jim Mattson, Andy Lutomirski

Intel's VMX is daft and resets the hidden TSS limit register to 0x67
on VM exit, and the 0x67 is not configurable.  KVM currently reloads
TR using the LTR instruction on every exit, but this is quite slow
because LTR is serializing.

The 0x67 limit is entirely harmless unless ioperm() is in use, so
defer the reload until a task using ioperm() is actually running.

Here's some poorly done benchmarking using kvm-unit-tests:

Before:

cpuid 1313
vmcall 1195
mov_from_cr8 11
mov_to_cr8 17
inl_from_pmtimer 6770
inl_from_qemu 6856
inl_from_kernel 2435
outl_to_kernel 1402

After:

cpuid 1291
vmcall 1181
mov_from_cr8 11
mov_to_cr8 16
inl_from_pmtimer 6457
inl_from_qemu 6209
inl_from_kernel 2339
outl_to_kernel 1391

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/desc.h | 52 +++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/ioport.c    | 11 ++++++++++
 arch/x86/kernel/process.c   | 10 +++++++++
 arch/x86/kvm/vmx.c          | 23 ++++++++------------
 4 files changed, 82 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 2e781bcc5e12..178e58336f67 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -205,6 +205,58 @@ static inline void native_load_tr_desc(void)
 	asm volatile("ltr %w0"::"q" (GDT_ENTRY_TSS*8));
 }
 
+DECLARE_PER_CPU(bool, __tss_limit_invalid);
+
+static inline void force_reload_TR(void)
+{
+	struct desc_struct *d = get_cpu_gdt_table(smp_processor_id());
+	tss_desc tss;
+
+	memcpy(&tss, &d[GDT_ENTRY_TSS], sizeof(tss_desc));
+
+	/*
+	 * LTR requires an available TSS, and the TSS is currently
+	 * busy.  Make it be available so that LTR will work.
+	 */
+	tss.type = DESC_TSS;
+	write_gdt_entry(d, GDT_ENTRY_TSS, &tss, DESC_TSS);
+
+	load_TR_desc();
+	this_cpu_write(__tss_limit_invalid, false);
+}
+
+/*
+ * If you do something evil that corrupts the cached TSS limit (I'm looking
+ * at you, VMX exits), call this function.
+ *
+ * The optimization here is that the TSS limit only matters for Linux if the
+ * IO bitmap is in use.  If the TSS limit gets forced to its minimum value,
+ * everything works except that IO bitmap will be ignored and all CPL 3 IO
+ * instructions will #GP, which is exactly what we want for normal tasks.
+ */
+static inline void invalidate_tss_limit(void)
+{
+	DEBUG_LOCKS_WARN_ON(preemptible());
+
+	if (unlikely(test_thread_flag(TIF_IO_BITMAP)))
+		force_reload_TR();
+	else
+		this_cpu_write(__tss_limit_invalid, true);
+}
+
+/*
+ * Call this if you need the TSS limit to be correct, which should be the case
+ * if and only if you have TIF_IO_BITMAP set or you're switching to a task
+ * with TIF_IO_BITMAP set.
+ */
+static inline void refresh_tss_limit(void)
+{
+	DEBUG_LOCKS_WARN_ON(preemptible());
+
+	if (unlikely(this_cpu_read(__tss_limit_invalid)))
+		force_reload_TR();
+}
+
 static inline void native_load_gdt(const struct desc_ptr *dtr)
 {
 	asm volatile("lgdt %0"::"m" (*dtr));
diff --git a/arch/x86/kernel/ioport.c b/arch/x86/kernel/ioport.c
index 589b3193f102..875d3d25dd6a 100644
--- a/arch/x86/kernel/ioport.c
+++ b/arch/x86/kernel/ioport.c
@@ -16,6 +16,7 @@
 #include <linux/syscalls.h>
 #include <linux/bitmap.h>
 #include <asm/syscalls.h>
+#include <asm/desc.h>
 
 /*
  * this changes the io permissions bitmap in the current task.
@@ -45,6 +46,16 @@ asmlinkage long sys_ioperm(unsigned long from, unsigned long num, int turn_on)
 		memset(bitmap, 0xff, IO_BITMAP_BYTES);
 		t->io_bitmap_ptr = bitmap;
 		set_thread_flag(TIF_IO_BITMAP);
+
+		/*
+		 * Now that we have an IO bitmap, we need our TSS limit to be
+		 * correct.  It's fine if we are preempted after doing this:
+		 * with TIF_IO_BITMAP set, context switches will keep our TSS
+		 * limit correct.
+		 */
+		preempt_disable();
+		refresh_tss_limit();
+		preempt_enable();
 	}
 
 	/*
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b615a1113f58..0b302591b51f 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -32,6 +32,7 @@
 #include <asm/mce.h>
 #include <asm/vm86.h>
 #include <asm/switch_to.h>
+#include <asm/desc.h>
 
 /*
  * per-CPU TSS segments. Threads are completely 'soft' on Linux,
@@ -64,6 +65,9 @@ __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = {
 };
 EXPORT_PER_CPU_SYMBOL(cpu_tss);
 
+DEFINE_PER_CPU(bool, __tss_limit_invalid);
+EXPORT_PER_CPU_SYMBOL_GPL(__tss_limit_invalid);
+
 /*
  * this gets called so that we can store lazy state into memory and copy the
  * current task into the new thread.
@@ -209,6 +213,12 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 		 */
 		memcpy(tss->io_bitmap, next->io_bitmap_ptr,
 		       max(prev->io_bitmap_max, next->io_bitmap_max));
+
+		/*
+		 * Make sure that the TSS limit is correct for the CPU
+		 * to notice the IO bitmap.
+		 */
+		refresh_tss_limit();
 	} else if (test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)) {
 		/*
 		 * Clear any possible leftover bits:
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 17ef11c64702..733c8a3c912e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1992,19 +1992,6 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
 	m->host[i].value = host_val;
 }
 
-static void reload_tss(void)
-{
-	/*
-	 * VT restores TR but not its size.  Useless.
-	 */
-	struct desc_ptr *gdt = this_cpu_ptr(&host_gdt);
-	struct desc_struct *descs;
-
-	descs = (void *)gdt->address;
-	descs[GDT_ENTRY_TSS].type = 9; /* available TSS */
-	load_TR_desc();
-}
-
 static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
 {
 	u64 guest_efer = vmx->vcpu.arch.efer;
@@ -2174,7 +2161,7 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
 		loadsegment(es, vmx->host_state.es_sel);
 	}
 #endif
-	reload_tss();
+	invalidate_tss_limit();
 #ifdef CONFIG_X86_64
 	wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
 #endif
@@ -2295,6 +2282,14 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 			    (unsigned long)this_cpu_ptr(&cpu_tss));
 		vmcs_writel(HOST_GDTR_BASE, gdt->address);
 
+		/*
+		 * A VM exit changes the host TR limit to 0x67.  This is
+		 * okay, since 0x67 covers everything except the IO bitmap
+		 * and we have code to handle the IO bitmap being lost
+		 * after a VM exit.
+		 */
+		BUILD_BUG_ON(IO_BITMAP_OFFSET - 1 != 0x67);
+
 		rdmsrl(MSR_IA32_SYSENTER_ESP, sysenter_esp);
 		vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 7/7] selftests/x86: Add a basic selftest for ioperm
  2017-02-21 19:14 [PATCH v2 0/7] KVM TSS cleanups and speedups Andy Lutomirski
                   ` (5 preceding siblings ...)
  2017-02-21 19:14 ` [PATCH v2 6/7] x86/kvm/vmx: Defer TR reload after VM exit Andy Lutomirski
@ 2017-02-21 19:14 ` Andy Lutomirski
  2017-02-22 15:13 ` [PATCH v2 0/7] KVM TSS cleanups and speedups Paolo Bonzini
  7 siblings, 0 replies; 12+ messages in thread
From: Andy Lutomirski @ 2017-02-21 19:14 UTC (permalink / raw)
  To: Paolo Bonzini, X86 ML
  Cc: kvm list, linux-kernel, Borislav Petkov, Thomas Garnier,
	Jim Mattson, Andy Lutomirski

This doesn't fully exercise the interaction between KVM and ioperm(),
but it does test basic functionality.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 tools/testing/selftests/x86/Makefile |   2 +-
 tools/testing/selftests/x86/ioperm.c | 171 +++++++++++++++++++++++++++++++++++
 2 files changed, 172 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/x86/ioperm.c

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 8c1cb423cfe6..5743e2966adb 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -5,7 +5,7 @@ include ../lib.mk
 .PHONY: all all_32 all_64 warn_32bit_failure clean
 
 TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt ptrace_syscall test_mremap_vdso \
-			check_initial_reg_state sigreturn ldt_gdt iopl \
+			check_initial_reg_state sigreturn ldt_gdt iopl ioperm \
 			protection_keys test_vdso
 TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
diff --git a/tools/testing/selftests/x86/ioperm.c b/tools/testing/selftests/x86/ioperm.c
new file mode 100644
index 000000000000..3c4c7feb93af
--- /dev/null
+++ b/tools/testing/selftests/x86/ioperm.c
@@ -0,0 +1,171 @@
+/*
+ * ioperm.c - Test case for ioperm(2)
+ * Copyright (c) 2015 Andrew Lutomirski
+ */
+
+#define _GNU_SOURCE
+#include <err.h>
+#include <stdio.h>
+#include <stdint.h>
+#include <signal.h>
+#include <setjmp.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <stdbool.h>
+#include <sched.h>
+#include <sys/io.h>
+
+static int nerrs = 0;
+
+static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
+		       int flags)
+{
+	struct sigaction sa;
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_sigaction = handler;
+	sa.sa_flags = SA_SIGINFO | flags;
+	sigemptyset(&sa.sa_mask);
+	if (sigaction(sig, &sa, 0))
+		err(1, "sigaction");
+
+}
+
+static void clearhandler(int sig)
+{
+	struct sigaction sa;
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_handler = SIG_DFL;
+	sigemptyset(&sa.sa_mask);
+	if (sigaction(sig, &sa, 0))
+		err(1, "sigaction");
+}
+
+static jmp_buf jmpbuf;
+
+static void sigsegv(int sig, siginfo_t *si, void *ctx_void)
+{
+	siglongjmp(jmpbuf, 1);
+}
+
+static bool try_outb(unsigned short port)
+{
+	sethandler(SIGSEGV, sigsegv, SA_RESETHAND);
+	if (sigsetjmp(jmpbuf, 1) != 0) {
+		return false;	/* SA_RESETHAND already restored SIG_DFL */
+	} else {
+		asm volatile ("outb %%al, %w[port]"
+			      : : [port] "Nd" (port), "a" (0));
+		clearhandler(SIGSEGV);
+		return true;
+	}
+}
+
+static void expect_ok(unsigned short port)
+{
+	if (!try_outb(port)) {
+		printf("[FAIL]\toutb to 0x%02hx failed\n", port);
+		exit(1);
+	}
+
+	printf("[OK]\toutb to 0x%02hx worked\n", port);
+}
+
+static void expect_gp(unsigned short port)
+{
+	if (try_outb(port)) {
+		printf("[FAIL]\toutb to 0x%02hx worked\n", port);
+		exit(1);
+	}
+
+	printf("[OK]\toutb to 0x%02hx failed\n", port);
+}
+
+int main(void)
+{
+	cpu_set_t cpuset;
+	CPU_ZERO(&cpuset);
+	CPU_SET(0, &cpuset);
+	if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0)
+		err(1, "sched_setaffinity to CPU 0");
+
+	expect_gp(0x80);
+	expect_gp(0xed);
+
+	/*
+	 * Probe for ioperm support.  Note that clearing ioperm bits
+	 * works even as nonroot.
+	 */
+	printf("[RUN]\tenable 0x80\n");
+	if (ioperm(0x80, 1, 1) != 0) {
+		printf("[OK]\tioperm(0x80, 1, 1) failed (%d) -- try running as root\n",
+		       errno);
+		return 0;
+	}
+	expect_ok(0x80);
+	expect_gp(0xed);
+
+	printf("[RUN]\tdisable 0x80\n");
+	if (ioperm(0x80, 1, 0) != 0) {
+		printf("[FAIL]\tioperm(0x80, 1, 0) failed (%d)\n", errno);
+		return 1;
+	}
+	expect_gp(0x80);
+	expect_gp(0xed);
+
+	/* Make sure that fork() preserves ioperm. */
+	if (ioperm(0x80, 1, 1) != 0) {
+		printf("[FAIL]\tioperm(0x80, 1, 1) failed (%d)\n", errno);
+		return 1;
+	}
+
+	pid_t child = fork();
+	if (child == -1)
+		err(1, "fork");
+
+	if (child == 0) {
+		printf("[RUN]\tchild: check that we inherited permissions\n");
+		expect_ok(0x80);
+		expect_gp(0xed);
+		return 0;
+	} else {
+		int status;
+		if (waitpid(child, &status, 0) != child ||
+		    !WIFEXITED(status)) {
+			printf("[FAIL]\tChild died\n");
+			nerrs++;
+		} else if (WEXITSTATUS(status) != 0) {
+			printf("[FAIL]\tChild failed\n");
+			nerrs++;
+		} else {
+			printf("[OK]\tChild succeeded\n");
+		}
+	}
+
+	/* Test the capability checks. */
+
+	printf("\tDrop privileges\n");
+	if (setresuid(1, 1, 1) != 0) {
+		printf("[WARN]\tDropping privileges failed\n");
+		return 0;
+	}
+
+	printf("[RUN]\tdisable 0x80\n");
+	if (ioperm(0x80, 1, 0) != 0) {
+		printf("[FAIL]\tioperm(0x80, 1, 0) failed (%d)\n", errno);
+		return 1;
+	}
+	printf("[OK]\tit worked\n");
+
+	printf("[RUN]\tenable 0x80 again\n");
+	if (ioperm(0x80, 1, 1) == 0) {
+		printf("[FAIL]\tit succeeded but should have failed.\n");
+		return 1;
+	}
+	printf("[OK]\tit failed\n");
+	return 0;
+}
+
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/7] KVM TSS cleanups and speedups
  2017-02-21 19:14 [PATCH v2 0/7] KVM TSS cleanups and speedups Andy Lutomirski
                   ` (6 preceding siblings ...)
  2017-02-21 19:14 ` [PATCH v2 7/7] selftests/x86: Add a basic selftest for ioperm Andy Lutomirski
@ 2017-02-22 15:13 ` Paolo Bonzini
  2017-02-22 15:17   ` Andy Lutomirski
  7 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2017-02-22 15:13 UTC (permalink / raw)
  To: Andy Lutomirski, X86 ML
  Cc: kvm list, linux-kernel, Borislav Petkov, Thomas Garnier, Jim Mattson

On 21/02/2017 20:14, Andy Lutomirski wrote:
> The first four patches here are intended to be straightforward
> cleanups and to make a better base for Thomas' GDT series.  They may
> be a slight speedup, too, because they remove an STR instruction
> from the VMX entry path.
> 
> The last two patches are a reasonably large speedup but need careful
> review.
> 
> FWIW, I can see lots of additional easy-ish speedups here.  For example:
> 
>  - The GDT reload on VM exit isn't really needed at all.  Instead let's
>    just change the kernel limit to 0xFFFF.  Doing that naively would
>    waste memory, but doing it carefully on top of Thomas' series would
>    be straightforward and almost free.
> 
>  - RDMSR from MSR_GS_BASE is totally pointless.
> 
>  - Once I or someone finishes the FSGSBASE series, we get a big speedup
>    there.
> 
>  - The LDT reload code should be split up and optimized better, I think.
> 
> Changes from v1:
>  - Fix some changelog typos.
>  - Fix the bug that Paolo found.
>  - Rename the helpers to make their usage more obvious.
>  - Move clearing __tss_limit_invalid into force_reload_TR() as a tiny
>    optimization.
>  - Add a test case.  It doesn't test all the machinations, but at least
>    it checks basic functionality.
> 
> Andy Lutomirski (7):
>   x86/asm: Define the kernel TSS limit in a macro
>   x86/kvm/vmx: Don't fetch the TSS base from the GDT
>   x86/kvm/vmx: Get rid of segment_base() on 64-bit kernels
>   x86/kvm/vmx: Simplify segment_base()
>   x86/asm/64: Drop __cacheline_aligned from struct x86_hw_tss
>   x86/kvm/vmx: Defer TR reload after VM exit
>   selftests/x86: Add a basic selftest for ioperm
> 
>  arch/x86/include/asm/desc.h          |  62 +++++++++++--
>  arch/x86/include/asm/processor.h     |  12 ++-
>  arch/x86/kernel/ioport.c             |  11 +++
>  arch/x86/kernel/process.c            |  10 ++
>  arch/x86/kvm/vmx.c                   |  63 ++++++-------
>  tools/testing/selftests/x86/Makefile |   2 +-
>  tools/testing/selftests/x86/ioperm.c | 171 +++++++++++++++++++++++++++++++++++
>  7 files changed, 284 insertions(+), 47 deletions(-)
>  create mode 100644 tools/testing/selftests/x86/ioperm.c
> 

I pushed and tagged before seeing this v2. :(  The differences seem to
be x86-only, so I suppose Ingo can handle them if you resubmit.

Paolo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/7] KVM TSS cleanups and speedups
  2017-02-22 15:13 ` [PATCH v2 0/7] KVM TSS cleanups and speedups Paolo Bonzini
@ 2017-02-22 15:17   ` Andy Lutomirski
  2017-02-22 15:25     ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: Andy Lutomirski @ 2017-02-22 15:17 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Andy Lutomirski, X86 ML, kvm list, linux-kernel, Borislav Petkov,
	Thomas Garnier, Jim Mattson

On Wed, Feb 22, 2017 at 7:13 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 21/02/2017 20:14, Andy Lutomirski wrote:
>> The first four patches here are intended to be straightforward
>> cleanups and to make a better base for Thomas' GDT series.  They may
>> be a slight speedup, too, because they remove an STR instruction
>> from the VMX entry path.
>>
>> The last two patches are a reasonably large speedup but need careful
>> review.
>>
>> FWIW, I can see lots of additional easy-ish speedups here.  For example:
>>
>>  - The GDT reload on VM exit isn't really needed at all.  Instead let's
>>    just change the kernel limit to 0xFFFF.  Doing that naively would
>>    waste memory, but doing it carefully on top of Thomas' series would
>>    be straightforward and almost free.
>>
>>  - RDMSR from MSR_GS_BASE is totally pointless.
>>
>>  - Once I or someone finishes the FSGSBASE series, we get a big speedup
>>    there.
>>
>>  - The LDT reload code should be split up and optimized better, I think.
>>
>> Changes from v1:
>>  - Fix some changelog typos.
>>  - Fix the bug that Paolo found.
>>  - Rename the helpers to make their usage more obvious.
>>  - Move clearing __tss_limit_invalid into force_reload_TR() as a tiny
>>    optimization.
>>  - Add a test case.  It doesn't test all the machinations, but at least
>>    it checks basic functionality.
>>
>> Andy Lutomirski (7):
>>   x86/asm: Define the kernel TSS limit in a macro
>>   x86/kvm/vmx: Don't fetch the TSS base from the GDT
>>   x86/kvm/vmx: Get rid of segment_base() on 64-bit kernels
>>   x86/kvm/vmx: Simplify segment_base()
>>   x86/asm/64: Drop __cacheline_aligned from struct x86_hw_tss
>>   x86/kvm/vmx: Defer TR reload after VM exit
>>   selftests/x86: Add a basic selftest for ioperm
>>
>>  arch/x86/include/asm/desc.h          |  62 +++++++++++--
>>  arch/x86/include/asm/processor.h     |  12 ++-
>>  arch/x86/kernel/ioport.c             |  11 +++
>>  arch/x86/kernel/process.c            |  10 ++
>>  arch/x86/kvm/vmx.c                   |  63 ++++++-------
>>  tools/testing/selftests/x86/Makefile |   2 +-
>>  tools/testing/selftests/x86/ioperm.c | 171 +++++++++++++++++++++++++++++++++++
>>  7 files changed, 284 insertions(+), 47 deletions(-)
>>  create mode 100644 tools/testing/selftests/x86/ioperm.c
>>
>
> I pushed and tagged before seeing this v2. :(  The differences seem to
> be x86-only, so I suppose Ingo can handle them if you resubmit.
>

I renamed the helpers to make it less likely that someone would repeat
my little buglet.  I can submit a patch that just has the differences,
but I think it should go in through your tree.  Would that work?

--Andy


-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: [PATCH v2 0/7] KVM TSS cleanups and speedups
  2017-02-22 15:17   ` Andy Lutomirski
@ 2017-02-22 15:25     ` Paolo Bonzini
  2017-02-22 15:36       ` Andy Lutomirski
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2017-02-22 15:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, X86 ML, kvm list, linux-kernel, Borislav Petkov,
	Thomas Garnier, Jim Mattson



On 22/02/2017 16:17, Andy Lutomirski wrote:
>> I pushed and tagged before seeing this v2. :(  The differences seem to
>> be x86-only, so I suppose Ingo can handle them if you resubmit.
>>
> I renamed the helpers to make it less likely that someone would repeat
> my little buglet.  I can submit a patch that just has the differences,
> but I think it should go in through your tree.  Would that work?

That's fine for me as long as it is for Ingo. :)

Paolo


* Re: [PATCH v2 0/7] KVM TSS cleanups and speedups
  2017-02-22 15:25     ` Paolo Bonzini
@ 2017-02-22 15:36       ` Andy Lutomirski
  0 siblings, 0 replies; 12+ messages in thread
From: Andy Lutomirski @ 2017-02-22 15:36 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Andy Lutomirski, X86 ML, kvm list, linux-kernel, Borislav Petkov,
	Thomas Garnier, Jim Mattson

On Wed, Feb 22, 2017 at 7:25 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 22/02/2017 16:17, Andy Lutomirski wrote:
>>> I pushed and tagged before seeing this v2. :(  The differences seem to
>>> be x86-only, so I suppose Ingo can handle them if you resubmit.
>>>
>> I renamed the helpers to make it less likely that someone would repeat
>> my little buglet.  I can submit a patch that just has the differences,
>> but I think it should go in through your tree.  Would that work?
>
> That's fine for me as long as it is for Ingo. :)
>

Sent.


end of thread, other threads:[~2017-02-22 15:49 UTC | newest]

Thread overview: 12+ messages
2017-02-21 19:14 [PATCH v2 0/7] KVM TSS cleanups and speedups Andy Lutomirski
2017-02-21 19:14 ` [PATCH v2 1/7] x86/asm: Define the kernel TSS limit in a macro Andy Lutomirski
2017-02-21 19:14 ` [PATCH v2 2/7] x86/kvm/vmx: Don't fetch the TSS base from the GDT Andy Lutomirski
2017-02-21 19:14 ` [PATCH v2 3/7] x86/kvm/vmx: Get rid of segment_base() on 64-bit kernels Andy Lutomirski
2017-02-21 19:14 ` [PATCH v2 4/7] x86/kvm/vmx: Simplify segment_base() Andy Lutomirski
2017-02-21 19:14 ` [PATCH v2 5/7] x86/asm/64: Drop __cacheline_aligned from struct x86_hw_tss Andy Lutomirski
2017-02-21 19:14 ` [PATCH v2 6/7] x86/kvm/vmx: Defer TR reload after VM exit Andy Lutomirski
2017-02-21 19:14 ` [PATCH v2 7/7] selftests/x86: Add a basic selftest for ioperm Andy Lutomirski
2017-02-22 15:13 ` [PATCH v2 0/7] KVM TSS cleanups and speedups Paolo Bonzini
2017-02-22 15:17   ` Andy Lutomirski
2017-02-22 15:25     ` Paolo Bonzini
2017-02-22 15:36       ` Andy Lutomirski
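
[Archive note: the cover letter's "change the kernel limit to 0xFFFF" idea can be sanity-checked with some quick arithmetic. The sketch below uses only the standard x86 architectural sizes (64-bit hardware TSS of 104 bytes, one I/O-permission bit per port); it is purely illustrative and does not reflect the kernel's actual struct x86_hw_tss layout, which includes padding and offsets not shown here.]

```python
# Back-of-the-envelope arithmetic for the TSS-limit discussion.
# Architectural sizes only; illustrative, not the kernel's real layout.

IO_BITMAP_BITS = 65536                   # one permission bit per I/O port
IO_BITMAP_BYTES = IO_BITMAP_BITS // 8    # 8 KiB bitmap
HW_TSS_BYTES = 104                       # 64-bit hardware TSS size

# A TSS limit covering only the hardware TSS: the ioperm() bitmap is
# outside the segment, so user-space I/O instructions fault and the
# limit can be raised lazily (the deferral the series exploits).
tss_limit_no_bitmap = HW_TSS_BYTES - 1

# A limit that also covers the I/O bitmap plus the trailing
# all-ones terminator byte the CPU requires after the bitmap.
tss_limit_with_bitmap = HW_TSS_BYTES + IO_BITMAP_BYTES + 1 - 1

# The "just use 0xFFFF" idea: one fixed limit that always covers the
# bitmap, at the cost of (naively) backing a full 64 KiB segment --
# the memory waste the cover letter says careful layering would avoid.
tss_limit_max = 0xFFFF

print(hex(tss_limit_no_bitmap))      # limit for the bare hardware TSS
print(hex(tss_limit_with_bitmap))    # limit including the I/O bitmap
print(tss_limit_max >= tss_limit_with_bitmap)
```

The point of the comparison: 0xFFFF comfortably covers the bitmap-bearing layout, so a single fixed limit would remove the need to reload the GDT entry on VM exit, provided the extra segment span is backed cheaply.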
