LKML Archive on lore.kernel.org
 help / color / Atom feed
* [patch V3 00/20] x86/iopl: Prevent user space from using CLI/STI with iopl(3)
@ 2019-11-13 20:42 Thomas Gleixner
  2019-11-13 20:42 ` [patch V3 01/20] x86/ptrace: Prevent truncation of bitmap size Thomas Gleixner
                   ` (20 more replies)
  0 siblings, 21 replies; 53+ messages in thread
From: Thomas Gleixner @ 2019-11-13 20:42 UTC (permalink / raw)
  To: LKML
  Cc: x86, Andy Lutomirski, Linus Torvalds, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

This is the third version of the attempt to confine the unwanted side
effects of iopl(). The first version of this series can be found here:

   https://lore.kernel.org/r/20191106193459.581614484@linutronix.de

Second version is here:

   https://lore.kernel.org/r/20191111220314.519933535@linutronix.de

The V1 cover letter also contains a longer variant of the
background. Summary:

iopl(level = 3) enables aside of access to all 65536 I/O ports also the
usage of CLI/STI in user space.

Disabling interrupts in user space can lead to system lockups and breaks
assumptions in the kernel that userspace always runs with interrupts
enabled.

iopl() is often preferred over ioperm() as it avoids the overhead of
copying the tasks I/O bitmap to the TSS bitmap on context switch. This
overhead can be avoided by providing a all zeroes bitmap in the TSS and
switching the TSS bitmap offset to this permit all IO bitmap. It's
marginally slower than iopl() which is a one time setup, but prevents the
usage of CLI/STI in user space.

The changes vs. V3:

    - Split out the restructuring of the first/subsequent ioperm()
      invocation into a seperate patch to address the inconsisteny which
      Andy detected in the patch which introduces the concept of
      invalidating the I/O bitmap base to speed up context switching.
      This change is moved in front so the subsequent changes are
      functionally correct.

    - Moved the non HW TSS data related to I/O bitmap(s) into a seperate
      data structure. Modified version of Ingos proposed patch.

    - Made struct memeber names more consistent (Ingo)

    - Dropped the bitmap union. It is not longer necessary because V2
      already dropped the finer grained copying algorithm. The sequence
      count approach should avoid most of the copying overhead when the
      number of ioperm() using processes is very low which is the normal
      case.

    - Dropped the pointer storage of the bitmap in the TSS data as it is
      not required (Peter, Andy)

    - Fixed the missing refcount setting in the bitmap duplication code
      path. (Peter, Andy)

    - Updated changelog and comment to explain the bitmap invalidation
      logic. (Andy)

    - Removed TIF_IO_BITMAP from the TIF flags which are evaluated on the
      next task for entering the slow path.

    - Folded the NULL pointer check fix

    - Simplified the config option in the legacy removal patch (Andy)

    - Extended the scope of the config option to disable ioperm() along
      with iopl() which also mokes all related storage and functions
      compile time conditional. (Andy)

The series is also available from git:

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/iopl

Thanks,

	tglx
---
 arch/x86/Kconfig                        |   18 ++
 arch/x86/entry/common.c                 |    4 
 arch/x86/include/asm/io_bitmap.h        |   29 ++++
 arch/x86/include/asm/paravirt.h         |    4 
 arch/x86/include/asm/paravirt_types.h   |    2 
 arch/x86/include/asm/pgtable_32_types.h |    2 
 arch/x86/include/asm/processor.h        |  113 ++++++++++-------
 arch/x86/include/asm/ptrace.h           |    6 
 arch/x86/include/asm/switch_to.h        |   10 +
 arch/x86/include/asm/thread_info.h      |   14 +-
 arch/x86/include/asm/xen/hypervisor.h   |    2 
 arch/x86/kernel/cpu/common.c            |  188 ++++++++++++----------------
 arch/x86/kernel/doublefault.c           |    2 
 arch/x86/kernel/ioport.c                |  209 +++++++++++++++++++++-----------
 arch/x86/kernel/paravirt.c              |    2 
 arch/x86/kernel/process.c               |  200 ++++++++++++++++++++++++------
 arch/x86/kernel/process_32.c            |   77 -----------
 arch/x86/kernel/process_64.c            |   86 -------------
 arch/x86/kernel/ptrace.c                |   12 +
 arch/x86/kvm/vmx/vmx.c                  |    8 -
 arch/x86/mm/cpu_entry_area.c            |    8 +
 arch/x86/xen/enlighten_pv.c             |   10 -
 tools/testing/selftests/x86/ioperm.c    |   16 ++
 tools/testing/selftests/x86/iopl.c      |  129 ++++++++++++++++++-
 24 files changed, 674 insertions(+), 477 deletions(-)


^ permalink raw reply	[flat|nested] 53+ messages in thread
* [tip: x86/iopl] x86/io: Speedup schedule out of I/O bitmap user
@ 2019-11-16 11:51 tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 53+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2019-11-16 11:51 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, linux-kernel

The following commit has been merged into the x86/iopl branch of tip:

Commit-ID:     ecc7e37d4dadd16f6be125ca496feccd05454da4
Gitweb:        https://git.kernel.org/tip/ecc7e37d4dadd16f6be125ca496feccd05454da4
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Mon, 11 Nov 2019 23:03:20 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Sat, 16 Nov 2019 11:24:01 +01:00

x86/io: Speedup schedule out of I/O bitmap user

There is no requirement to update the TSS I/O bitmap when a thread using it is
scheduled out and the incoming thread does not use it.

For the permission check based on the TSS I/O bitmap the CPU calculates the memory
location of the I/O bitmap by the address of the TSS and the io_bitmap_base member
of the tss_struct. The easiest way to invalidate the I/O bitmap is to switch the
offset to an address outside of the TSS limit.

If an I/O instruction is issued from user space the TSS limit causes #GP to be
raised in the same was as valid I/O bitmap with all bits set to 1 would do.

This removes the extra work when an I/O bitmap using task is scheduled out
and puts the burden on the rare I/O bitmap users when they are scheduled
in.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/processor.h | 38 +++++++++++++------
 arch/x86/kernel/cpu/common.c     |  3 +-
 arch/x86/kernel/doublefault.c    |  2 +-
 arch/x86/kernel/ioport.c         |  4 ++-
 arch/x86/kernel/process.c        | 63 +++++++++++++++++--------------
 5 files changed, 69 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 6e0a3b4..6d0059c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -330,8 +330,23 @@ struct x86_hw_tss {
 #define IO_BITMAP_BITS			65536
 #define IO_BITMAP_BYTES			(IO_BITMAP_BITS/8)
 #define IO_BITMAP_LONGS			(IO_BITMAP_BYTES/sizeof(long))
-#define IO_BITMAP_OFFSET		(offsetof(struct tss_struct, io_bitmap) - offsetof(struct tss_struct, x86_tss))
-#define INVALID_IO_BITMAP_OFFSET	0x8000
+
+#define IO_BITMAP_OFFSET_VALID				\
+	(offsetof(struct tss_struct, io_bitmap) -	\
+	 offsetof(struct tss_struct, x86_tss))
+
+/*
+ * sizeof(unsigned long) coming from an extra "long" at the end
+ * of the iobitmap.
+ *
+ * -1? seg base+limit should be pointing to the address of the
+ * last valid byte
+ */
+#define __KERNEL_TSS_LIMIT	\
+	(IO_BITMAP_OFFSET_VALID + IO_BITMAP_BYTES + sizeof(unsigned long) - 1)
+
+/* Base offset outside of TSS_LIMIT so unpriviledged IO causes #GP */
+#define IO_BITMAP_OFFSET_INVALID	(__KERNEL_TSS_LIMIT + 1)
 
 struct entry_stack {
 	unsigned long		words[64];
@@ -350,6 +365,15 @@ struct tss_struct {
 	struct x86_hw_tss	x86_tss;
 
 	/*
+	 * Store the dirty size of the last io bitmap offender. The next
+	 * one will have to do the cleanup as the switch out to a non io
+	 * bitmap user will just set x86_tss.io_bitmap_base to a value
+	 * outside of the TSS limit. So for sane tasks there is no need to
+	 * actually touch the io_bitmap at all.
+	 */
+	unsigned int		io_bitmap_prev_max;
+
+	/*
 	 * The extra 1 is there because the CPU will access an
 	 * additional byte beyond the end of the IO permission
 	 * bitmap. The extra byte must be all 1 bits, and must
@@ -360,16 +384,6 @@ struct tss_struct {
 
 DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw);
 
-/*
- * sizeof(unsigned long) coming from an extra "long" at the end
- * of the iobitmap.
- *
- * -1? seg base+limit should be pointing to the address of the
- * last valid byte
- */
-#define __KERNEL_TSS_LIMIT	\
-	(IO_BITMAP_OFFSET + IO_BITMAP_BYTES + sizeof(unsigned long) - 1)
-
 /* Per CPU interrupt stacks */
 struct irq_stack {
 	char		stack[IRQ_STACK_SIZE];
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index d52ec1a..8c1000a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1860,7 +1860,8 @@ void cpu_init(void)
 
 	/* Initialize the TSS. */
 	tss_setup_ist(tss);
-	tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET;
+	tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID;
+	tss->io_bitmap_prev_max = 0;
 	memset(tss->io_bitmap, 0xff, sizeof(tss->io_bitmap));
 	set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
 
diff --git a/arch/x86/kernel/doublefault.c b/arch/x86/kernel/doublefault.c
index 0b8cedb..cedb07d 100644
--- a/arch/x86/kernel/doublefault.c
+++ b/arch/x86/kernel/doublefault.c
@@ -54,7 +54,7 @@ struct x86_hw_tss doublefault_tss __cacheline_aligned = {
 	.sp0		= STACK_START,
 	.ss0		= __KERNEL_DS,
 	.ldt		= 0,
-	.io_bitmap_base	= INVALID_IO_BITMAP_OFFSET,
+	.io_bitmap_base	= IO_BITMAP_OFFSET_INVALID,
 
 	.ip		= (unsigned long) doublefault_fn,
 	/* 0x2 bit is always set */
diff --git a/arch/x86/kernel/ioport.c b/arch/x86/kernel/ioport.c
index 80fa36b..eed218a 100644
--- a/arch/x86/kernel/ioport.c
+++ b/arch/x86/kernel/ioport.c
@@ -82,6 +82,10 @@ long ksys_ioperm(unsigned long from, unsigned long num, int turn_on)
 	/* Update the TSS */
 	tss = this_cpu_ptr(&cpu_tss_rw);
 	memcpy(tss->io_bitmap, t->io_bitmap_ptr, bytes_updated);
+	/* Store the new end of the zero bits */
+	tss->io_bitmap_prev_max = bytes;
+	/* Make the bitmap base in the TSS valid */
+	tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_VALID;
 	/* Make sure the TSS limit covers the I/O bitmap. */
 	refresh_tss_limit();
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 7e07f0b..2444fe2 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -72,18 +72,9 @@ __visible DEFINE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw) = {
 #ifdef CONFIG_X86_32
 		.ss0 = __KERNEL_DS,
 		.ss1 = __KERNEL_CS,
-		.io_bitmap_base	= INVALID_IO_BITMAP_OFFSET,
 #endif
+		.io_bitmap_base	= IO_BITMAP_OFFSET_INVALID,
 	 },
-#ifdef CONFIG_X86_32
-	 /*
-	  * Note that the .io_bitmap member must be extra-big. This is because
-	  * the CPU will access an additional byte beyond the end of the IO
-	  * permission bitmap. The extra byte must be all 1 bits, and must
-	  * be within the limit.
-	  */
-	.io_bitmap		= { [0 ... IO_BITMAP_LONGS] = ~0 },
-#endif
 };
 EXPORT_PER_CPU_SYMBOL(cpu_tss_rw);
 
@@ -112,18 +103,18 @@ void exit_thread(struct task_struct *tsk)
 	struct thread_struct *t = &tsk->thread;
 	unsigned long *bp = t->io_bitmap_ptr;
 	struct fpu *fpu = &t->fpu;
+	struct tss_struct *tss;
 
 	if (bp) {
-		struct tss_struct *tss = &per_cpu(cpu_tss_rw, get_cpu());
+		preempt_disable();
+		tss = this_cpu_ptr(&cpu_tss_rw);
 
 		t->io_bitmap_ptr = NULL;
-		clear_thread_flag(TIF_IO_BITMAP);
-		/*
-		 * Careful, clear this in the TSS too:
-		 */
-		memset(tss->io_bitmap, 0xff, t->io_bitmap_max);
 		t->io_bitmap_max = 0;
-		put_cpu();
+		clear_thread_flag(TIF_IO_BITMAP);
+		/* Invalidate the io bitmap base in the TSS */
+		tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID;
+		preempt_enable();
 		kfree(bp);
 	}
 
@@ -369,29 +360,47 @@ void arch_setup_new_exec(void)
 	}
 }
 
-static inline void switch_to_bitmap(struct thread_struct *prev,
-				    struct thread_struct *next,
+static inline void switch_to_bitmap(struct thread_struct *next,
 				    unsigned long tifp, unsigned long tifn)
 {
 	struct tss_struct *tss = this_cpu_ptr(&cpu_tss_rw);
 
 	if (tifn & _TIF_IO_BITMAP) {
 		/*
-		 * Copy the relevant range of the IO bitmap.
-		 * Normally this is 128 bytes or less:
+		 * Copy at least the size of the incoming tasks bitmap
+		 * which covers the last permitted I/O port.
+		 *
+		 * If the previous task which used an io bitmap had more
+		 * bits permitted, then the copy needs to cover those as
+		 * well so they get turned off.
 		 */
 		memcpy(tss->io_bitmap, next->io_bitmap_ptr,
-		       max(prev->io_bitmap_max, next->io_bitmap_max));
+		       max(tss->io_bitmap_prev_max, next->io_bitmap_max));
+
+		/* Store the new max and set io_bitmap_base valid */
+		tss->io_bitmap_prev_max = next->io_bitmap_max;
+		tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_VALID;
+
 		/*
-		 * Make sure that the TSS limit is correct for the CPU
-		 * to notice the IO bitmap.
+		 * Make sure that the TSS limit is covering the io bitmap.
+		 * It might have been cut down by a VMEXIT to 0x67 which
+		 * would cause a subsequent I/O access from user space to
+		 * trigger a #GP because tbe bitmap is outside the TSS
+		 * limit.
 		 */
 		refresh_tss_limit();
 	} else if (tifp & _TIF_IO_BITMAP) {
 		/*
-		 * Clear any possible leftover bits:
+		 * Do not touch the bitmap. Let the next bitmap using task
+		 * deal with the mess. Just make the io_bitmap_base invalid
+		 * by moving it outside the TSS limit so any subsequent I/O
+		 * access from user space will trigger a #GP.
+		 *
+		 * This is correct even when VMEXIT rewrites the TSS limit
+		 * to 0x67 as the only requirement is that the base points
+		 * outside the limit.
 		 */
-		memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
+		tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID;
 	}
 }
 
@@ -605,7 +614,7 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
 
 	tifn = READ_ONCE(task_thread_info(next_p)->flags);
 	tifp = READ_ONCE(task_thread_info(prev_p)->flags);
-	switch_to_bitmap(prev, next, tifp, tifn);
+	switch_to_bitmap(next, tifp, tifn);
 
 	propagate_user_return_notify(prev_p, next_p);
 

^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, back to index

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-13 20:42 [patch V3 00/20] x86/iopl: Prevent user space from using CLI/STI with iopl(3) Thomas Gleixner
2019-11-13 20:42 ` [patch V3 01/20] x86/ptrace: Prevent truncation of bitmap size Thomas Gleixner
2019-11-15 21:12   ` [tip: x86/iopl] " tip-bot2 for Thomas Gleixner
2019-11-13 20:42 ` [patch V3 02/20] x86/process: Unify copy_thread_tls() Thomas Gleixner
2019-11-13 21:10   ` Linus Torvalds
2019-11-13 21:41     ` Thomas Gleixner
2019-11-13 22:10       ` Linus Torvalds
2019-11-13 22:33         ` Thomas Gleixner
2019-11-13 21:44     ` Brian Gerst
2019-11-15 21:12   ` [tip: x86/iopl] " tip-bot2 for Thomas Gleixner
2019-11-13 20:42 ` [patch V3 03/20] x86/cpu: Unify cpu_init() Thomas Gleixner
2019-11-15 21:12   ` [tip: x86/iopl] " tip-bot2 for Thomas Gleixner
2019-11-13 20:42 ` [patch V3 04/20] x86/tss: Fix and move VMX BUILD_BUG_ON() Thomas Gleixner
2019-11-15 21:12   ` [tip: x86/iopl] " tip-bot2 for Thomas Gleixner
2019-11-13 20:42 ` [patch V3 05/20] x86/iopl: Cleanup include maze Thomas Gleixner
2019-11-15 21:12   ` [tip: x86/iopl] " tip-bot2 for Thomas Gleixner
2019-11-13 20:42 ` [patch V3 06/20] x86/ioperm: Simplify first ioperm() invocation logic Thomas Gleixner
2019-11-15 21:12   ` [tip: x86/iopl] " tip-bot2 for Thomas Gleixner
2019-11-13 20:42 ` [patch V3 07/20] x86/ioperm: Avoid bitmap allocation if no permissions are set Thomas Gleixner
2019-11-15 21:12   ` [tip: x86/iopl] " tip-bot2 for Thomas Gleixner
2019-11-13 20:42 ` [patch V3 08/20] x86/io: Speedup schedule out of I/O bitmap user Thomas Gleixner
2019-11-15 21:12   ` [tip: x86/iopl] " tip-bot2 for Thomas Gleixner
2019-11-13 20:42 ` [patch V3 09/20] x86/tss: Move I/O bitmap data into a seperate struct Thomas Gleixner
2019-11-15 21:12   ` [tip: x86/iopl] " tip-bot2 for Thomas Gleixner
2019-11-13 20:42 ` [patch V3 10/20] x86/ioperm: Move iobitmap data into a struct Thomas Gleixner
2019-11-15 21:12   ` [tip: x86/iopl] " tip-bot2 for Thomas Gleixner
2019-11-13 20:42 ` [patch V3 11/20] x86/ioperm: Add bitmap sequence number Thomas Gleixner
2019-11-15 21:12   ` [tip: x86/iopl] " tip-bot2 for Thomas Gleixner
2019-11-13 20:42 ` [patch V3 12/20] x86/ioperm: Move TSS bitmap update to exit to user work Thomas Gleixner
2019-11-13 21:19   ` Linus Torvalds
2019-11-13 21:21     ` Linus Torvalds
2019-11-13 21:44       ` Thomas Gleixner
2019-11-15 21:12   ` [tip: x86/iopl] " tip-bot2 for Thomas Gleixner
2019-11-13 20:42 ` [patch V3 13/20] x86/ioperm: Remove bitmap if all permissions dropped Thomas Gleixner
2019-11-15 21:12   ` [tip: x86/iopl] " tip-bot2 for Thomas Gleixner
2019-11-13 20:42 ` [patch V3 14/20] x86/ioperm: Share I/O bitmap if identical Thomas Gleixner
2019-11-15 21:12   ` [tip: x86/iopl] " tip-bot2 for Thomas Gleixner
2019-11-13 20:42 ` [patch V3 15/20] selftests/x86/ioperm: Extend testing so the shared bitmap is exercised Thomas Gleixner
2019-11-15 21:12   ` [tip: x86/iopl] " tip-bot2 for Thomas Gleixner
2019-11-13 20:42 ` [patch V3 16/20] x86/iopl: Fixup misleading comment Thomas Gleixner
2019-11-15 21:12   ` [tip: x86/iopl] " tip-bot2 for Thomas Gleixner
2019-11-13 20:42 ` [patch V3 17/20] x86/iopl: Restrict iopl() permission scope Thomas Gleixner
2019-11-14 18:13   ` Borislav Petkov
2019-11-14 18:39     ` Thomas Gleixner
2019-11-15 21:12   ` [tip: x86/iopl] " tip-bot2 for Thomas Gleixner
2019-11-13 20:42 ` [patch V3 18/20] x86/iopl: Remove legacy IOPL option Thomas Gleixner
2019-11-15 21:12   ` [tip: x86/iopl] " tip-bot2 for Thomas Gleixner
2019-11-13 20:42 ` [patch V3 19/20] x86/ioperm: Extend IOPL config to control ioperm() as well Thomas Gleixner
2019-11-15 21:12   ` [tip: x86/iopl] " tip-bot2 for Thomas Gleixner
2019-11-13 20:43 ` [patch V3 20/20] selftests/x86/iopl: Extend test to cover IOPL emulation Thomas Gleixner
2019-11-15 21:12   ` [tip: x86/iopl] " tip-bot2 for Thomas Gleixner
2019-11-14  8:43 ` [patch V3 00/20] x86/iopl: Prevent user space from using CLI/STI with iopl(3) Peter Zijlstra
2019-11-16 11:51 [tip: x86/iopl] x86/io: Speedup schedule out of I/O bitmap user tip-bot2 for Thomas Gleixner

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git