linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch V2 00/16] x86/iopl: Prevent user space from using CLI/STI with iopl(3)
@ 2019-11-11 22:03 Thomas Gleixner
  2019-11-11 22:03 ` [patch V2 01/16] x86/ptrace: Prevent truncation of bitmap size Thomas Gleixner
                   ` (16 more replies)
  0 siblings, 17 replies; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-11 22:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

This is the second version of the attempt to confine the unwanted side
effects of iopl(). The first version of this series can be found here:

   https://lore.kernel.org/r/20191106193459.581614484@linutronix.de

The V1 cover letter also contains a longer variant of the
background. Summary:

iopl(level = 3) enables aside of access to all 65536 I/O ports also the
usage of CLI/STI in user space.

Disabling interrupts in user space can lead to system lockups and breaks
assumptions in the kernel that userspace always runs with interrupts
enabled.

iopl() is often preferred over ioperm() as it avoids the overhead of
copying the tasks I/O bitmap to the TSS bitmap on context switch. This
overhead can be avoided by providing a all zeroes bitmap in the TSS and
switching the TSS bitmap offset to this permit all IO bitmap. It's
marginally slower than iopl() which is a one time setup, but prevents the
usage of CLI/STI in user space.

The changes vs. V1:

    - Fix the reported fallout on 32bit (0-day/Ingo)

    - Implement a sequence count based conditional update (Linus)

    - Drop the copy optimization

    - Move the bitmap copying out of the context switch into the exit to
      user mode machinery. The context switch merely invalidates the TSS
      bitmap offset when a task using an I/O bitmap gets scheduled out.

    - Move all bitmap information into a data structure to avoid adding
      more fields to thread_struct.

    - Add a refcount so the potentially pointless duplication of the bitmap
      at fork can be avoided. 

    - Better sharing of update functions (Andy)

    - More updates to self tests to verify the share/unshare mechanism and
      the restore of an I/O bitmap when iopl() permissions are dropped.

    - Pick up a few acked/reviewed-by tags as applicable

The series is also available from git:

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/iopl

Thanks,

	tglx
---
 arch/x86/Kconfig                        |   26 ++++
 arch/x86/entry/common.c                 |    4 
 arch/x86/include/asm/iobitmap.h         |   25 ++++
 arch/x86/include/asm/paravirt.h         |    4 
 arch/x86/include/asm/paravirt_types.h   |    2 
 arch/x86/include/asm/pgtable_32_types.h |    2 
 arch/x86/include/asm/processor.h        |   97 +++++++++-------
 arch/x86/include/asm/ptrace.h           |    6 
 arch/x86/include/asm/switch_to.h        |   10 +
 arch/x86/include/asm/xen/hypervisor.h   |    2 
 arch/x86/kernel/cpu/common.c            |  175 +++++++++++-----------------
 arch/x86/kernel/doublefault.c           |    2 
 arch/x86/kernel/ioport.c                |  176 ++++++++++++++++++-----------
 arch/x86/kernel/paravirt.c              |    2 
 arch/x86/kernel/process.c               |  194 +++++++++++++++++++++++++-------
 arch/x86/kernel/process_32.c            |   77 ------------
 arch/x86/kernel/process_64.c            |   86 --------------
 arch/x86/kernel/ptrace.c                |   12 +
 arch/x86/kvm/vmx/vmx.c                  |    8 -
 arch/x86/mm/cpu_entry_area.c            |    8 +
 arch/x86/xen/enlighten_pv.c             |   10 -
 tools/testing/selftests/x86/ioperm.c    |   16 ++
 tools/testing/selftests/x86/iopl.c      |  129 +++++++++++++++++++--
 23 files changed, 614 insertions(+), 459 deletions(-)



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [patch V2 01/16] x86/ptrace: Prevent truncation of bitmap size
  2019-11-11 22:03 [patch V2 00/16] x86/iopl: Prevent user space from using CLI/STI with iopl(3) Thomas Gleixner
@ 2019-11-11 22:03 ` Thomas Gleixner
  2019-11-12 15:34   ` Andy Lutomirski
  2019-11-11 22:03 ` [patch V2 02/16] x86/process: Unify copy_thread_tls() Thomas Gleixner
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-11 22:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin, Ingo Molnar

From: Thomas Gleixner <tglx@linutronix.de>

The active() callback of the IO bitmap regset divides the IO bitmap size by
the word size (32/64 bit). As the I/O bitmap size is in bytes the active
check fails for bitmap sizes of 1-3 bytes on 32bit and 1-7 bytes on 64bit.

Use DIV_ROUND_UP() instead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>

---
 arch/x86/kernel/ptrace.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -697,7 +697,7 @@ static int ptrace_set_debugreg(struct ta
 static int ioperm_active(struct task_struct *target,
 			 const struct user_regset *regset)
 {
-	return target->thread.io_bitmap_max / regset->size;
+	return DIV_ROUND_UP(target->thread.io_bitmap_max, regset->size);
 }
 
 static int ioperm_get(struct task_struct *target,





^ permalink raw reply	[flat|nested] 57+ messages in thread

* [patch V2 02/16] x86/process: Unify copy_thread_tls()
  2019-11-11 22:03 [patch V2 00/16] x86/iopl: Prevent user space from using CLI/STI with iopl(3) Thomas Gleixner
  2019-11-11 22:03 ` [patch V2 01/16] x86/ptrace: Prevent truncation of bitmap size Thomas Gleixner
@ 2019-11-11 22:03 ` Thomas Gleixner
  2019-11-11 22:03 ` [patch V2 03/16] x86/cpu: Unify cpu_init() Thomas Gleixner
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-11 22:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

From: Thomas Gleixner <tglx@linutronix.de>

While looking at the TSS io bitmap it turned out that any change in that
area would require identical changes to copy_thread_tls(). The 32 and 64
bit variants share sufficient code to consolidate them into a common
function to avoid duplication of upcoming modifications.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/ptrace.h    |    6 ++
 arch/x86/include/asm/switch_to.h |   10 ++++
 arch/x86/kernel/process.c        |   94 +++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/process_32.c     |   68 ----------------------------
 arch/x86/kernel/process_64.c     |   75 -------------------------------
 5 files changed, 110 insertions(+), 143 deletions(-)

--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -361,5 +361,11 @@ extern int do_get_thread_area(struct tas
 extern int do_set_thread_area(struct task_struct *p, int idx,
 			      struct user_desc __user *info, int can_allocate);
 
+#ifdef CONFIG_X86_64
+# define do_set_thread_area_64(p, s, t)	do_arch_prctl_64(p, s, t)
+#else
+# define do_set_thread_area_64(p, s, t)	(0)
+#endif
+
 #endif /* !__ASSEMBLY__ */
 #endif /* _ASM_X86_PTRACE_H */
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -103,7 +103,17 @@ static inline void update_task_stack(str
 	if (static_cpu_has(X86_FEATURE_XENPV))
 		load_sp0(task_top_of_stack(task));
 #endif
+}
 
+static inline void kthread_frame_init(struct inactive_task_frame *frame,
+				      unsigned long fun, unsigned long arg)
+{
+	frame->bx = fun;
+#ifdef CONFIG_X86_32
+	frame->di = arg;
+#else
+	frame->r12 = arg;
+#endif
 }
 
 #endif /* _ASM_X86_SWITCH_TO_H */
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -132,6 +132,100 @@ void exit_thread(struct task_struct *tsk
 	fpu__drop(fpu);
 }
 
+static int set_new_tls(struct task_struct *p, unsigned long tls)
+{
+	struct user_desc __user *utls = (struct user_desc __user *)tls;
+
+	if (in_ia32_syscall())
+		return do_set_thread_area(p, -1, utls, 0);
+	else
+		return do_set_thread_area_64(p, ARCH_SET_FS, tls);
+}
+
+static inline int copy_io_bitmap(struct task_struct *tsk)
+{
+	if (likely(!test_tsk_thread_flag(current, TIF_IO_BITMAP)))
+		return 0;
+
+	tsk->thread.io_bitmap_ptr = kmemdup(current->thread.io_bitmap_ptr,
+					    IO_BITMAP_BYTES, GFP_KERNEL);
+	if (!tsk->thread.io_bitmap_ptr) {
+		tsk->thread.io_bitmap_max = 0;
+		return -ENOMEM;
+	}
+	set_tsk_thread_flag(tsk, TIF_IO_BITMAP);
+	return 0;
+}
+
+static inline void free_io_bitmap(struct task_struct *tsk)
+{
+	if (tsk->thread.io_bitmap_ptr) {
+		kfree(tsk->thread.io_bitmap_ptr);
+		tsk->thread.io_bitmap_ptr = NULL;
+		tsk->thread.io_bitmap_max = 0;
+	}
+}
+
+int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
+		    unsigned long arg, struct task_struct *p, unsigned long tls)
+{
+	struct inactive_task_frame *frame;
+	struct fork_frame *fork_frame;
+	struct pt_regs *childregs;
+	int ret;
+
+	childregs = task_pt_regs(p);
+	fork_frame = container_of(childregs, struct fork_frame, regs);
+	frame = &fork_frame->frame;
+
+	frame->bp = 0;
+	frame->ret_addr = (unsigned long) ret_from_fork;
+	p->thread.sp = (unsigned long) fork_frame;
+	p->thread.io_bitmap_ptr = NULL;
+	memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
+
+#ifdef CONFIG_X86_64
+	savesegment(gs, p->thread.gsindex);
+	p->thread.gsbase = p->thread.gsindex ? 0 : current->thread.gsbase;
+	savesegment(fs, p->thread.fsindex);
+	p->thread.fsbase = p->thread.fsindex ? 0 : current->thread.fsbase;
+	savesegment(es, p->thread.es);
+	savesegment(ds, p->thread.ds);
+#else
+	/* Clear all status flags including IF and set fixed bit. */
+	frame->flags = X86_EFLAGS_FIXED;
+#endif
+
+	/* Kernel thread ? */
+	if (unlikely(p->flags & PF_KTHREAD)) {
+		memset(childregs, 0, sizeof(struct pt_regs));
+		kthread_frame_init(frame, sp, arg);
+		return 0;
+	}
+
+	frame->bx = 0;
+	*childregs = *current_pt_regs();
+	childregs->ax = 0;
+	if (sp)
+		childregs->sp = sp;
+
+#ifdef CONFIG_X86_32
+	task_user_gs(p) = get_user_gs(current_pt_regs());
+#endif
+
+	ret = copy_io_bitmap(p);
+	if (ret)
+		return ret;
+
+	/* Set a new TLS for the child thread? */
+	if (clone_flags & CLONE_SETTLS) {
+		ret = set_new_tls(p, tls);
+		if (ret)
+			free_io_bitmap(p);
+	}
+	return ret;
+}
+
 void flush_thread(void)
 {
 	struct task_struct *tsk = current;
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -112,74 +112,6 @@ void release_thread(struct task_struct *
 	release_vm86_irqs(dead_task);
 }
 
-int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
-	unsigned long arg, struct task_struct *p, unsigned long tls)
-{
-	struct pt_regs *childregs = task_pt_regs(p);
-	struct fork_frame *fork_frame = container_of(childregs, struct fork_frame, regs);
-	struct inactive_task_frame *frame = &fork_frame->frame;
-	struct task_struct *tsk;
-	int err;
-
-	/*
-	 * For a new task use the RESET flags value since there is no before.
-	 * All the status flags are zero; DF and all the system flags must also
-	 * be 0, specifically IF must be 0 because we context switch to the new
-	 * task with interrupts disabled.
-	 */
-	frame->flags = X86_EFLAGS_FIXED;
-	frame->bp = 0;
-	frame->ret_addr = (unsigned long) ret_from_fork;
-	p->thread.sp = (unsigned long) fork_frame;
-	p->thread.sp0 = (unsigned long) (childregs+1);
-	memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
-
-	if (unlikely(p->flags & PF_KTHREAD)) {
-		/* kernel thread */
-		memset(childregs, 0, sizeof(struct pt_regs));
-		frame->bx = sp;		/* function */
-		frame->di = arg;
-		p->thread.io_bitmap_ptr = NULL;
-		return 0;
-	}
-	frame->bx = 0;
-	*childregs = *current_pt_regs();
-	childregs->ax = 0;
-	if (sp)
-		childregs->sp = sp;
-
-	task_user_gs(p) = get_user_gs(current_pt_regs());
-
-	p->thread.io_bitmap_ptr = NULL;
-	tsk = current;
-	err = -ENOMEM;
-
-	if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) {
-		p->thread.io_bitmap_ptr = kmemdup(tsk->thread.io_bitmap_ptr,
-						IO_BITMAP_BYTES, GFP_KERNEL);
-		if (!p->thread.io_bitmap_ptr) {
-			p->thread.io_bitmap_max = 0;
-			return -ENOMEM;
-		}
-		set_tsk_thread_flag(p, TIF_IO_BITMAP);
-	}
-
-	err = 0;
-
-	/*
-	 * Set a new TLS for the child thread?
-	 */
-	if (clone_flags & CLONE_SETTLS)
-		err = do_set_thread_area(p, -1,
-			(struct user_desc __user *)tls, 0);
-
-	if (err && p->thread.io_bitmap_ptr) {
-		kfree(p->thread.io_bitmap_ptr);
-		p->thread.io_bitmap_max = 0;
-	}
-	return err;
-}
-
 void
 start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
 {
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -371,81 +371,6 @@ void x86_gsbase_write_task(struct task_s
 	task->thread.gsbase = gsbase;
 }
 
-int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
-		unsigned long arg, struct task_struct *p, unsigned long tls)
-{
-	int err;
-	struct pt_regs *childregs;
-	struct fork_frame *fork_frame;
-	struct inactive_task_frame *frame;
-	struct task_struct *me = current;
-
-	childregs = task_pt_regs(p);
-	fork_frame = container_of(childregs, struct fork_frame, regs);
-	frame = &fork_frame->frame;
-
-	frame->bp = 0;
-	frame->ret_addr = (unsigned long) ret_from_fork;
-	p->thread.sp = (unsigned long) fork_frame;
-	p->thread.io_bitmap_ptr = NULL;
-
-	savesegment(gs, p->thread.gsindex);
-	p->thread.gsbase = p->thread.gsindex ? 0 : me->thread.gsbase;
-	savesegment(fs, p->thread.fsindex);
-	p->thread.fsbase = p->thread.fsindex ? 0 : me->thread.fsbase;
-	savesegment(es, p->thread.es);
-	savesegment(ds, p->thread.ds);
-	memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
-
-	if (unlikely(p->flags & PF_KTHREAD)) {
-		/* kernel thread */
-		memset(childregs, 0, sizeof(struct pt_regs));
-		frame->bx = sp;		/* function */
-		frame->r12 = arg;
-		return 0;
-	}
-	frame->bx = 0;
-	*childregs = *current_pt_regs();
-
-	childregs->ax = 0;
-	if (sp)
-		childregs->sp = sp;
-
-	err = -ENOMEM;
-	if (unlikely(test_tsk_thread_flag(me, TIF_IO_BITMAP))) {
-		p->thread.io_bitmap_ptr = kmemdup(me->thread.io_bitmap_ptr,
-						  IO_BITMAP_BYTES, GFP_KERNEL);
-		if (!p->thread.io_bitmap_ptr) {
-			p->thread.io_bitmap_max = 0;
-			return -ENOMEM;
-		}
-		set_tsk_thread_flag(p, TIF_IO_BITMAP);
-	}
-
-	/*
-	 * Set a new TLS for the child thread?
-	 */
-	if (clone_flags & CLONE_SETTLS) {
-#ifdef CONFIG_IA32_EMULATION
-		if (in_ia32_syscall())
-			err = do_set_thread_area(p, -1,
-				(struct user_desc __user *)tls, 0);
-		else
-#endif
-			err = do_arch_prctl_64(p, ARCH_SET_FS, tls);
-		if (err)
-			goto out;
-	}
-	err = 0;
-out:
-	if (err && p->thread.io_bitmap_ptr) {
-		kfree(p->thread.io_bitmap_ptr);
-		p->thread.io_bitmap_max = 0;
-	}
-
-	return err;
-}
-
 static void
 start_thread_common(struct pt_regs *regs, unsigned long new_ip,
 		    unsigned long new_sp,



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [patch V2 03/16] x86/cpu: Unify cpu_init()
  2019-11-11 22:03 [patch V2 00/16] x86/iopl: Prevent user space from using CLI/STI with iopl(3) Thomas Gleixner
  2019-11-11 22:03 ` [patch V2 01/16] x86/ptrace: Prevent truncation of bitmap size Thomas Gleixner
  2019-11-11 22:03 ` [patch V2 02/16] x86/process: Unify copy_thread_tls() Thomas Gleixner
@ 2019-11-11 22:03 ` Thomas Gleixner
  2019-11-11 22:03 ` [patch V2 04/16] x86/tss: Fix and move VMX BUILD_BUG_ON() Thomas Gleixner
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-11 22:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

From: Thomas Gleixner <tglx@linutronix.de>

Similar to copy_thread_tls() the 32bit and 64bit implementations of
cpu_init() are very similar and unification avoids duplicate changes in the
future.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
---
V2: Fix 32bit build by removing the pointless #ifdef around the uv header include.
---
 arch/x86/kernel/cpu/common.c |  175 ++++++++++++++++---------------------------
 1 file changed, 66 insertions(+), 109 deletions(-)

--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -53,10 +53,7 @@
 #include <asm/microcode_intel.h>
 #include <asm/intel-family.h>
 #include <asm/cpu_device_id.h>
-
-#ifdef CONFIG_X86_LOCAL_APIC
 #include <asm/uv/uv.h>
-#endif
 
 #include "cpu.h"
 
@@ -1749,7 +1746,7 @@ static void wait_for_master_cpu(int cpu)
 }
 
 #ifdef CONFIG_X86_64
-static void setup_getcpu(int cpu)
+static inline void setup_getcpu(int cpu)
 {
 	unsigned long cpudata = vdso_encode_cpunode(cpu, early_cpu_to_node(cpu));
 	struct desc_struct d = { };
@@ -1769,7 +1766,43 @@ static void setup_getcpu(int cpu)
 
 	write_gdt_entry(get_cpu_gdt_rw(cpu), GDT_ENTRY_CPUNODE, &d, DESCTYPE_S);
 }
+
+static inline void ucode_cpu_init(int cpu)
+{
+	if (cpu)
+		load_ucode_ap();
+}
+
+static inline void tss_setup_ist(struct tss_struct *tss)
+{
+	/* Set up the per-CPU TSS IST stacks */
+	tss->x86_tss.ist[IST_INDEX_DF] = __this_cpu_ist_top_va(DF);
+	tss->x86_tss.ist[IST_INDEX_NMI] = __this_cpu_ist_top_va(NMI);
+	tss->x86_tss.ist[IST_INDEX_DB] = __this_cpu_ist_top_va(DB);
+	tss->x86_tss.ist[IST_INDEX_MCE] = __this_cpu_ist_top_va(MCE);
+}
+
+static inline void gdt_setup_doublefault_tss(int cpu) { }
+
+#else /* CONFIG_X86_64 */
+
+static inline void setup_getcpu(int cpu) { }
+
+static inline void ucode_cpu_init(int cpu)
+{
+	show_ucode_info_early();
+}
+
+static inline void tss_setup_ist(struct tss_struct *tss) { }
+
+static inline void gdt_setup_doublefault_tss(int cpu)
+{
+#ifdef CONFIG_DOUBLEFAULT
+	/* Set up the doublefault TSS pointer in the GDT */
+	__set_tss_desc(cpu, GDT_ENTRY_DOUBLEFAULT_TSS, &doublefault_tss);
 #endif
+}
+#endif /* !CONFIG_X86_64 */
 
 /*
  * cpu_init() initializes state that is per-CPU. Some data is already
@@ -1777,21 +1810,15 @@ static void setup_getcpu(int cpu)
  * and IDT. We reload them nevertheless, this function acts as a
  * 'CPU state barrier', nothing should get across.
  */
-#ifdef CONFIG_X86_64
-
 void cpu_init(void)
 {
+	struct tss_struct *tss = this_cpu_ptr(&cpu_tss_rw);
+	struct task_struct *cur = current;
 	int cpu = raw_smp_processor_id();
-	struct task_struct *me;
-	struct tss_struct *t;
-	int i;
 
 	wait_for_master_cpu(cpu);
 
-	if (cpu)
-		load_ucode_ap();
-
-	t = &per_cpu(cpu_tss_rw, cpu);
+	ucode_cpu_init(cpu);
 
 #ifdef CONFIG_NUMA
 	if (this_cpu_read(numa_node) == 0 &&
@@ -1800,63 +1827,48 @@ void cpu_init(void)
 #endif
 	setup_getcpu(cpu);
 
-	me = current;
-
 	pr_debug("Initializing CPU#%d\n", cpu);
 
-	cr4_clear_bits(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE);
+	if (IS_ENABLED(CONFIG_X86_64) || cpu_feature_enabled(X86_FEATURE_VME) ||
+	    boot_cpu_has(X86_FEATURE_TSC) || boot_cpu_has(X86_FEATURE_DE))
+		cr4_clear_bits(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE);
 
 	/*
 	 * Initialize the per-CPU GDT with the boot GDT,
 	 * and set up the GDT descriptor:
 	 */
-
 	switch_to_new_gdt(cpu);
-	loadsegment(fs, 0);
-
 	load_current_idt();
 
-	memset(me->thread.tls_array, 0, GDT_ENTRY_TLS_ENTRIES * 8);
-	syscall_init();
+	if (IS_ENABLED(CONFIG_X86_64)) {
+		loadsegment(fs, 0);
+		memset(cur->thread.tls_array, 0, GDT_ENTRY_TLS_ENTRIES * 8);
+		syscall_init();
+
+		wrmsrl(MSR_FS_BASE, 0);
+		wrmsrl(MSR_KERNEL_GS_BASE, 0);
+		barrier();
 
-	wrmsrl(MSR_FS_BASE, 0);
-	wrmsrl(MSR_KERNEL_GS_BASE, 0);
-	barrier();
-
-	x86_configure_nx();
-	x2apic_setup();
-
-	/*
-	 * set up and load the per-CPU TSS
-	 */
-	if (!t->x86_tss.ist[0]) {
-		t->x86_tss.ist[IST_INDEX_DF] = __this_cpu_ist_top_va(DF);
-		t->x86_tss.ist[IST_INDEX_NMI] = __this_cpu_ist_top_va(NMI);
-		t->x86_tss.ist[IST_INDEX_DB] = __this_cpu_ist_top_va(DB);
-		t->x86_tss.ist[IST_INDEX_MCE] = __this_cpu_ist_top_va(MCE);
+		x2apic_setup();
 	}
 
-	t->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET;
-
-	/*
-	 * <= is required because the CPU will access up to
-	 * 8 bits beyond the end of the IO permission bitmap.
-	 */
-	for (i = 0; i <= IO_BITMAP_LONGS; i++)
-		t->io_bitmap[i] = ~0UL;
-
 	mmgrab(&init_mm);
-	me->active_mm = &init_mm;
-	BUG_ON(me->mm);
+	cur->active_mm = &init_mm;
+	BUG_ON(cur->mm);
 	initialize_tlbstate_and_flush();
-	enter_lazy_tlb(&init_mm, me);
+	enter_lazy_tlb(&init_mm, cur);
 
-	/*
-	 * Initialize the TSS.  sp0 points to the entry trampoline stack
-	 * regardless of what task is running.
-	 */
+	/* Initialize the TSS. */
+	tss_setup_ist(tss);
+	tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET;
+	memset(tss->io_bitmap, 0xff, sizeof(tss->io_bitmap));
 	set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
+
 	load_TR_desc();
+	/*
+	 * sp0 points to the entry trampoline stack regardless of what task
+	 * is running.
+	 */
 	load_sp0((unsigned long)(cpu_entry_stack(cpu) + 1));
 
 	load_mm_ldt(&init_mm);
@@ -1864,6 +1876,8 @@ void cpu_init(void)
 	clear_all_debug_regs();
 	dbg_restore_debug_regs();
 
+	gdt_setup_doublefault_tss(cpu);
+
 	fpu__init_cpu();
 
 	if (is_uv_system())
@@ -1872,63 +1886,6 @@ void cpu_init(void)
 	load_fixmap_gdt(cpu);
 }
 
-#else
-
-void cpu_init(void)
-{
-	int cpu = smp_processor_id();
-	struct task_struct *curr = current;
-	struct tss_struct *t = &per_cpu(cpu_tss_rw, cpu);
-
-	wait_for_master_cpu(cpu);
-
-	show_ucode_info_early();
-
-	pr_info("Initializing CPU#%d\n", cpu);
-
-	if (cpu_feature_enabled(X86_FEATURE_VME) ||
-	    boot_cpu_has(X86_FEATURE_TSC) ||
-	    boot_cpu_has(X86_FEATURE_DE))
-		cr4_clear_bits(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE);
-
-	load_current_idt();
-	switch_to_new_gdt(cpu);
-
-	/*
-	 * Set up and load the per-CPU TSS and LDT
-	 */
-	mmgrab(&init_mm);
-	curr->active_mm = &init_mm;
-	BUG_ON(curr->mm);
-	initialize_tlbstate_and_flush();
-	enter_lazy_tlb(&init_mm, curr);
-
-	/*
-	 * Initialize the TSS.  sp0 points to the entry trampoline stack
-	 * regardless of what task is running.
-	 */
-	set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
-	load_TR_desc();
-	load_sp0((unsigned long)(cpu_entry_stack(cpu) + 1));
-
-	load_mm_ldt(&init_mm);
-
-	t->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET;
-
-#ifdef CONFIG_DOUBLEFAULT
-	/* Set up doublefault TSS pointer in the GDT */
-	__set_tss_desc(cpu, GDT_ENTRY_DOUBLEFAULT_TSS, &doublefault_tss);
-#endif
-
-	clear_all_debug_regs();
-	dbg_restore_debug_regs();
-
-	fpu__init_cpu();
-
-	load_fixmap_gdt(cpu);
-}
-#endif
-
 /*
  * The microcode loader calls this upon late microcode load to recheck features,
  * only when microcode has been updated. Caller holds microcode_mutex and CPU



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [patch V2 04/16] x86/tss: Fix and move VMX BUILD_BUG_ON()
  2019-11-11 22:03 [patch V2 00/16] x86/iopl: Prevent user space from using CLI/STI with iopl(3) Thomas Gleixner
                   ` (2 preceding siblings ...)
  2019-11-11 22:03 ` [patch V2 03/16] x86/cpu: Unify cpu_init() Thomas Gleixner
@ 2019-11-11 22:03 ` Thomas Gleixner
  2019-11-11 22:44   ` Paolo Bonzini
  2019-11-12 15:37   ` Andy Lutomirski
  2019-11-11 22:03 ` [patch V2 05/16] x86/iopl: Cleanup include maze Thomas Gleixner
                   ` (12 subsequent siblings)
  16 siblings, 2 replies; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-11 22:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin, Paolo Bonzini

The BUILD_BUG_ON(IO_BITMAP_OFFSET - 1 == 0x67) in the VMX code is bogus in
two aspects:

1) This wants to be in generic x86 code simply to catch issues even when
   VMX is disabled in Kconfig.

2) The IO_BITMAP_OFFSET is not the right thing to check because it makes
   asssumptions about the layout of tss_struct. Nothing requires that the
   I/O bitmap is placed right after x86_tss, which is the hardware mandated
   tss structure. It pointlessly makes restrictions on the struct
   tss_struct layout.

The proper thing to check is:

    - Offset of x86_tss in tss_struct is 0
    - Size of x86_tss == 0x68

Move it to the other build time TSS checks and make it do the right thing.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
---
V2: New patch
---
 arch/x86/kvm/vmx/vmx.c       |    8 --------
 arch/x86/mm/cpu_entry_area.c |    8 ++++++++
 2 files changed, 8 insertions(+), 8 deletions(-)

--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1338,14 +1338,6 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu
 			    (unsigned long)&get_cpu_entry_area(cpu)->tss.x86_tss);
 		vmcs_writel(HOST_GDTR_BASE, (unsigned long)gdt);   /* 22.2.4 */
 
-		/*
-		 * VM exits change the host TR limit to 0x67 after a VM
-		 * exit.  This is okay, since 0x67 covers everything except
-		 * the IO bitmap and have have code to handle the IO bitmap
-		 * being lost after a VM exit.
-		 */
-		BUILD_BUG_ON(IO_BITMAP_OFFSET - 1 != 0x67);
-
 		rdmsrl(MSR_IA32_SYSENTER_ESP, sysenter_esp);
 		vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
 
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -161,6 +161,14 @@ static void __init setup_cpu_entry_area(
 	BUILD_BUG_ON((offsetof(struct tss_struct, x86_tss) ^
 		      offsetofend(struct tss_struct, x86_tss)) & PAGE_MASK);
 	BUILD_BUG_ON(sizeof(struct tss_struct) % PAGE_SIZE != 0);
+	/*
+	 * VMX changes the host TR limit to 0x67 after a VM exit. This is
+	 * okay, since 0x67 covers the size of struct x86_hw_tss. Make sure
+	 * that this is correct.
+	 */
+	BUILD_BUG_ON(offsetof(struct tss_struct, x86_tss) != 0);
+	BUILD_BUG_ON(sizeof(struct x86_hw_tss) != 0x68);
+
 	cea_map_percpu_pages(&cea->tss, &per_cpu(cpu_tss_rw, cpu),
 			     sizeof(struct tss_struct) / PAGE_SIZE, tss_prot);
 



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [patch V2 05/16] x86/iopl: Cleanup include maze
  2019-11-11 22:03 [patch V2 00/16] x86/iopl: Prevent user space from using CLI/STI with iopl(3) Thomas Gleixner
                   ` (3 preceding siblings ...)
  2019-11-11 22:03 ` [patch V2 04/16] x86/tss: Fix and move VMX BUILD_BUG_ON() Thomas Gleixner
@ 2019-11-11 22:03 ` Thomas Gleixner
  2019-11-12 15:37   ` Andy Lutomirski
  2019-11-11 22:03 ` [patch V2 06/16] x86/io: Speedup schedule out of I/O bitmap user Thomas Gleixner
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-11 22:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

Get rid of superfluous includes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 arch/x86/kernel/ioport.c |   16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

--- a/arch/x86/kernel/ioport.c
+++ b/arch/x86/kernel/ioport.c
@@ -3,22 +3,14 @@
  * This contains the io-permission bitmap code - written by obz, with changes
  * by Linus. 32/64 bits code unification by Miguel Botón.
  */
-
-#include <linux/sched.h>
-#include <linux/sched/task_stack.h>
-#include <linux/kernel.h>
 #include <linux/capability.h>
-#include <linux/errno.h>
-#include <linux/types.h>
-#include <linux/ioport.h>
 #include <linux/security.h>
-#include <linux/smp.h>
-#include <linux/stddef.h>
-#include <linux/slab.h>
-#include <linux/thread_info.h>
 #include <linux/syscalls.h>
 #include <linux/bitmap.h>
-#include <asm/syscalls.h>
+#include <linux/ioport.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
 #include <asm/desc.h>
 
 /*



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [patch V2 06/16] x86/io: Speedup schedule out of I/O bitmap user
  2019-11-11 22:03 [patch V2 00/16] x86/iopl: Prevent user space from using CLI/STI with iopl(3) Thomas Gleixner
                   ` (4 preceding siblings ...)
  2019-11-11 22:03 ` [patch V2 05/16] x86/iopl: Cleanup include maze Thomas Gleixner
@ 2019-11-11 22:03 ` Thomas Gleixner
  2019-11-12 16:00   ` Andy Lutomirski
  2019-11-11 22:03 ` [patch V2 07/16] x86/ioperm: Move iobitmap data into a struct Thomas Gleixner
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-11 22:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

From: Thomas Gleixner <tglx@linutronix.de>

There is no requirement to update the TSS I/O bitmap when a thread using it is
scheduled out and the incoming thread does not use it.

For the permission check based on the TSS I/O bitmap the CPU calculates the memory
location of the I/O bitmap by the address of the TSS and the io_bitmap_base member
of the tss_struct. The easiest way to invalidate the I/O bitmap is to switch the
offset to an address outside of the TSS limit.

If an I/O instruction is issued from user space the TSS limit causes #GP to be
raised in the same was as valid I/O bitmap with all bits set to 1 would do.

This removes the extra work when an I/O bitmap using task is scheduled out
and puts the burden on the rare I/O bitmap users when they are scheduled
in.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/processor.h |   38 ++++++++++++++++-------
 arch/x86/kernel/cpu/common.c     |    3 +
 arch/x86/kernel/doublefault.c    |    2 -
 arch/x86/kernel/ioport.c         |    7 +++-
 arch/x86/kernel/process.c        |   63 ++++++++++++++++++++++-----------------
 5 files changed, 70 insertions(+), 43 deletions(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -330,8 +330,23 @@ struct x86_hw_tss {
 #define IO_BITMAP_BITS			65536
 #define IO_BITMAP_BYTES			(IO_BITMAP_BITS/8)
 #define IO_BITMAP_LONGS			(IO_BITMAP_BYTES/sizeof(long))
-#define IO_BITMAP_OFFSET		(offsetof(struct tss_struct, io_bitmap) - offsetof(struct tss_struct, x86_tss))
-#define INVALID_IO_BITMAP_OFFSET	0x8000
+
+#define IO_BITMAP_OFFSET_VALID				\
+	(offsetof(struct tss_struct, io_bitmap) -	\
+	 offsetof(struct tss_struct, x86_tss))
+
+/*
+ * sizeof(unsigned long) coming from an extra "long" at the end
+ * of the iobitmap.
+ *
+ * -1? seg base+limit should be pointing to the address of the
+ * last valid byte
+ */
+#define __KERNEL_TSS_LIMIT	\
+	(IO_BITMAP_OFFSET_VALID + IO_BITMAP_BYTES + sizeof(unsigned long) - 1)
+
+/* Base offset outside of TSS_LIMIT so unpriviledged IO causes #GP */
+#define IO_BITMAP_OFFSET_INVALID	(__KERNEL_TSS_LIMIT + 1)
 
 struct entry_stack {
 	unsigned long		words[64];
@@ -350,6 +365,15 @@ struct tss_struct {
 	struct x86_hw_tss	x86_tss;
 
 	/*
+	 * Store the dirty size of the last io bitmap offender. The next
+	 * one will have to do the cleanup as the switch out to a non io
+	 * bitmap user will just set x86_tss.io_bitmap_base to a value
+	 * outside of the TSS limit. So for sane tasks there is no need to
+	 * actually touch the io_bitmap at all.
+	 */
+	unsigned int		io_bitmap_prev_max;
+
+	/*
 	 * The extra 1 is there because the CPU will access an
 	 * additional byte beyond the end of the IO permission
 	 * bitmap. The extra byte must be all 1 bits, and must
@@ -360,16 +384,6 @@ struct tss_struct {
 
 DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw);
 
-/*
- * sizeof(unsigned long) coming from an extra "long" at the end
- * of the iobitmap.
- *
- * -1? seg base+limit should be pointing to the address of the
- * last valid byte
- */
-#define __KERNEL_TSS_LIMIT	\
-	(IO_BITMAP_OFFSET + IO_BITMAP_BYTES + sizeof(unsigned long) - 1)
-
 /* Per CPU interrupt stacks */
 struct irq_stack {
 	char		stack[IRQ_STACK_SIZE];
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1860,7 +1860,8 @@ void cpu_init(void)
 
 	/* Initialize the TSS. */
 	tss_setup_ist(tss);
-	tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET;
+	tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID;
+	tss->io_bitmap_prev_max = 0;
 	memset(tss->io_bitmap, 0xff, sizeof(tss->io_bitmap));
 	set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
 
--- a/arch/x86/kernel/doublefault.c
+++ b/arch/x86/kernel/doublefault.c
@@ -54,7 +54,7 @@ struct x86_hw_tss doublefault_tss __cach
 	.sp0		= STACK_START,
 	.ss0		= __KERNEL_DS,
 	.ldt		= 0,
-	.io_bitmap_base	= INVALID_IO_BITMAP_OFFSET,
+	.io_bitmap_base	= IO_BITMAP_OFFSET_INVALID,
 
 	.ip		= (unsigned long) doublefault_fn,
 	/* 0x2 bit is always set */
--- a/arch/x86/kernel/ioport.c
+++ b/arch/x86/kernel/ioport.c
@@ -40,8 +40,6 @@ long ksys_ioperm(unsigned long from, uns
 			return -ENOMEM;
 
 		memset(bitmap, 0xff, IO_BITMAP_BYTES);
-		t->io_bitmap_ptr = bitmap;
-		set_thread_flag(TIF_IO_BITMAP);
 
 		/*
 		 * Now that we have an IO bitmap, we need our TSS limit to be
@@ -50,6 +48,11 @@ long ksys_ioperm(unsigned long from, uns
 		 * limit correct.
 		 */
 		preempt_disable();
+		t->io_bitmap_ptr = bitmap;
+		set_thread_flag(TIF_IO_BITMAP);
+		/* Make the bitmap base in the TSS valid */
+		tss = this_cpu_ptr(&cpu_tss_rw);
+		tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_VALID;
 		refresh_tss_limit();
 		preempt_enable();
 	}
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -72,18 +72,9 @@
 #ifdef CONFIG_X86_32
 		.ss0 = __KERNEL_DS,
 		.ss1 = __KERNEL_CS,
-		.io_bitmap_base	= INVALID_IO_BITMAP_OFFSET,
 #endif
+		.io_bitmap_base	= IO_BITMAP_OFFSET_INVALID,
 	 },
-#ifdef CONFIG_X86_32
-	 /*
-	  * Note that the .io_bitmap member must be extra-big. This is because
-	  * the CPU will access an additional byte beyond the end of the IO
-	  * permission bitmap. The extra byte must be all 1 bits, and must
-	  * be within the limit.
-	  */
-	.io_bitmap		= { [0 ... IO_BITMAP_LONGS] = ~0 },
-#endif
 };
 EXPORT_PER_CPU_SYMBOL(cpu_tss_rw);
 
@@ -112,18 +103,18 @@ void exit_thread(struct task_struct *tsk
 	struct thread_struct *t = &tsk->thread;
 	unsigned long *bp = t->io_bitmap_ptr;
 	struct fpu *fpu = &t->fpu;
+	struct tss_struct *tss;
 
 	if (bp) {
-		struct tss_struct *tss = &per_cpu(cpu_tss_rw, get_cpu());
+		preempt_disable();
+		tss = this_cpu_ptr(&cpu_tss_rw);
 
 		t->io_bitmap_ptr = NULL;
-		clear_thread_flag(TIF_IO_BITMAP);
-		/*
-		 * Careful, clear this in the TSS too:
-		 */
-		memset(tss->io_bitmap, 0xff, t->io_bitmap_max);
 		t->io_bitmap_max = 0;
-		put_cpu();
+		clear_thread_flag(TIF_IO_BITMAP);
+		/* Invalidate the io bitmap base in the TSS */
+		tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID;
+		preempt_enable();
 		kfree(bp);
 	}
 
@@ -363,29 +354,47 @@ void arch_setup_new_exec(void)
 	}
 }
 
-static inline void switch_to_bitmap(struct thread_struct *prev,
-				    struct thread_struct *next,
+static inline void switch_to_bitmap(struct thread_struct *next,
 				    unsigned long tifp, unsigned long tifn)
 {
 	struct tss_struct *tss = this_cpu_ptr(&cpu_tss_rw);
 
 	if (tifn & _TIF_IO_BITMAP) {
 		/*
-		 * Copy the relevant range of the IO bitmap.
-		 * Normally this is 128 bytes or less:
+		 * Copy at least the size of the incoming tasks bitmap
+		 * which covers the last permitted I/O port.
+		 *
+		 * If the previous task which used an io bitmap had more
+		 * bits permitted, then the copy needs to cover those as
+		 * well so they get turned off.
 		 */
 		memcpy(tss->io_bitmap, next->io_bitmap_ptr,
-		       max(prev->io_bitmap_max, next->io_bitmap_max));
+		       max(tss->io_bitmap_prev_max, next->io_bitmap_max));
+
+		/* Store the new max and set io_bitmap_base valid */
+		tss->io_bitmap_prev_max = next->io_bitmap_max;
+		tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_VALID;
+
 		/*
-		 * Make sure that the TSS limit is correct for the CPU
-		 * to notice the IO bitmap.
+		 * Make sure that the TSS limit is covering the io bitmap.
+		 * It might have been cut down by a VMEXIT to 0x67 which
+		 * would cause a subsequent I/O access from user space to
+		 * trigger a #GP because tbe bitmap is outside the TSS
+		 * limit.
 		 */
 		refresh_tss_limit();
 	} else if (tifp & _TIF_IO_BITMAP) {
 		/*
-		 * Clear any possible leftover bits:
+		 * Do not touch the bitmap. Let the next bitmap using task
+		 * deal with the mess. Just make the io_bitmap_base invalid
+		 * by moving it outside the TSS limit so any subsequent I/O
+		 * access from user space will trigger a #GP.
+		 *
+		 * This is correct even when VMEXIT rewrites the TSS limit
+		 * to 0x67 as the only requirement is that the base points
+		 * outside the limit.
 		 */
-		memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
+		tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID;
 	}
 }
 
@@ -599,7 +608,7 @@ void __switch_to_xtra(struct task_struct
 
 	tifn = READ_ONCE(task_thread_info(next_p)->flags);
 	tifp = READ_ONCE(task_thread_info(prev_p)->flags);
-	switch_to_bitmap(prev, next, tifp, tifn);
+	switch_to_bitmap(next, tifp, tifn);
 
 	propagate_user_return_notify(prev_p, next_p);
 



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [patch V2 07/16] x86/ioperm: Move iobitmap data into a struct
  2019-11-11 22:03 [patch V2 00/16] x86/iopl: Prevent user space from using CLI/STI with iopl(3) Thomas Gleixner
                   ` (5 preceding siblings ...)
  2019-11-11 22:03 ` [patch V2 06/16] x86/io: Speedup schedule out of I/O bitmap user Thomas Gleixner
@ 2019-11-11 22:03 ` Thomas Gleixner
  2019-11-12 16:02   ` Andy Lutomirski
  2019-11-11 22:03 ` [patch V2 08/16] x86/ioperm: Add bitmap sequence number Thomas Gleixner
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-11 22:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

No point in having all the data in thread_struct, especially as upcoming
changes add more.

Make the bitmap in the new struct accessible as array of longs and as array
of characters via a union, so both the bitmap functions and the update
logic can avoid type casts.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 arch/x86/include/asm/iobitmap.h  |   15 ++++++++
 arch/x86/include/asm/processor.h |   21 +++++------
 arch/x86/kernel/cpu/common.c     |    2 -
 arch/x86/kernel/ioport.c         |   69 +++++++++++++++++++--------------------
 arch/x86/kernel/process.c        |   38 +++++++++++----------
 arch/x86/kernel/ptrace.c         |   12 ++++--
 6 files changed, 89 insertions(+), 68 deletions(-)

--- /dev/null
+++ b/arch/x86/include/asm/iobitmap.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_IOBITMAP_H
+#define _ASM_X86_IOBITMAP_H
+
+#include <asm/processor.h>
+
+struct io_bitmap {
+	unsigned int		io_bitmap_max;
+	union {
+		unsigned long	bits[IO_BITMAP_LONGS];
+		unsigned char	bitmap_bytes[IO_BITMAP_BYTES];
+	};
+};
+
+#endif
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -7,6 +7,7 @@
 /* Forward declaration, a strange C thing */
 struct task_struct;
 struct mm_struct;
+struct io_bitmap;
 struct vm86;
 
 #include <asm/math_emu.h>
@@ -328,22 +329,22 @@ struct x86_hw_tss {
  * IO-bitmap sizes:
  */
 #define IO_BITMAP_BITS			65536
-#define IO_BITMAP_BYTES			(IO_BITMAP_BITS/8)
-#define IO_BITMAP_LONGS			(IO_BITMAP_BYTES/sizeof(long))
+#define IO_BITMAP_BYTES			(IO_BITMAP_BITS / BITS_PER_BYTE)
+#define IO_BITMAP_LONGS			(IO_BITMAP_BYTES / sizeof(long))
 
 #define IO_BITMAP_OFFSET_VALID				\
-	(offsetof(struct tss_struct, io_bitmap) -	\
+	(offsetof(struct tss_struct, io_bitmap_bytes) -	\
 	 offsetof(struct tss_struct, x86_tss))
 
 /*
- * sizeof(unsigned long) coming from an extra "long" at the end
- * of the iobitmap.
+ * The extra byte at the end is required by the hardware. It has all
+ * bits set.
  *
  * -1? seg base+limit should be pointing to the address of the
  * last valid byte
  */
 #define __KERNEL_TSS_LIMIT	\
-	(IO_BITMAP_OFFSET_VALID + IO_BITMAP_BYTES + sizeof(unsigned long) - 1)
+	(IO_BITMAP_OFFSET_VALID + IO_BITMAP_BYTES + 1 - 1)
 
 /* Base offset outside of TSS_LIMIT so unpriviledged IO causes #GP */
 #define IO_BITMAP_OFFSET_INVALID	(__KERNEL_TSS_LIMIT + 1)
@@ -379,7 +380,8 @@ struct tss_struct {
 	 * bitmap. The extra byte must be all 1 bits, and must
 	 * be within the limit.
 	 */
-	unsigned long		io_bitmap[IO_BITMAP_LONGS + 1];
+	unsigned char		io_bitmap_bytes[IO_BITMAP_BYTES + 1]
+				__aligned(sizeof(unsigned long));
 } __aligned(PAGE_SIZE);
 
 DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw);
@@ -494,10 +496,8 @@ struct thread_struct {
 	struct vm86		*vm86;
 #endif
 	/* IO permissions: */
-	unsigned long		*io_bitmap_ptr;
+	struct io_bitmap	*io_bitmap;
 	unsigned long		iopl;
-	/* Max allowed port in the bitmap, in bytes: */
-	unsigned		io_bitmap_max;
 
 	mm_segment_t		addr_limit;
 
@@ -855,7 +855,6 @@ static inline void spin_lock_prefetch(co
 #define INIT_THREAD  {							  \
 	.sp0			= TOP_OF_INIT_STACK,			  \
 	.sysenter_cs		= __KERNEL_CS,				  \
-	.io_bitmap_ptr		= NULL,					  \
 	.addr_limit		= KERNEL_DS,				  \
 }
 
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1862,7 +1862,7 @@ void cpu_init(void)
 	tss_setup_ist(tss);
 	tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID;
 	tss->io_bitmap_prev_max = 0;
-	memset(tss->io_bitmap, 0xff, sizeof(tss->io_bitmap));
+	memset(tss->io_bitmap_bytes, 0xff, sizeof(tss->io_bitmap_bytes));
 	set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
 
 	load_TR_desc();
--- a/arch/x86/kernel/ioport.c
+++ b/arch/x86/kernel/ioport.c
@@ -11,6 +11,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 
+#include <asm/iobitmap.h>
 #include <asm/desc.h>
 
 /*
@@ -18,9 +19,10 @@
  */
 long ksys_ioperm(unsigned long from, unsigned long num, int turn_on)
 {
+	unsigned int i, max_long, bytes, bytes_updated;
 	struct thread_struct *t = &current->thread;
 	struct tss_struct *tss;
-	unsigned int i, max_long, bytes, bytes_updated;
+	struct io_bitmap *iobm;
 
 	if ((from + num <= from) || (from + num > IO_BITMAP_BITS))
 		return -EINVAL;
@@ -33,62 +35,61 @@ long ksys_ioperm(unsigned long from, uns
 	 * IO bitmap up. ioperm() is much less timing critical than clone(),
 	 * this is why we delay this operation until now:
 	 */
-	if (!t->io_bitmap_ptr) {
-		unsigned long *bitmap = kmalloc(IO_BITMAP_BYTES, GFP_KERNEL);
+	iobm = t->io_bitmap;
 
-		if (!bitmap)
-			return -ENOMEM;
+	if (!iobm) {
+		if (!turn_on)
+			return 0;
 
-		memset(bitmap, 0xff, IO_BITMAP_BYTES);
+		iobm = kmalloc(sizeof(*iobm), GFP_KERNEL);
+		if (!iobm)
+			return -ENOMEM;
 
-		/*
-		 * Now that we have an IO bitmap, we need our TSS limit to be
-		 * correct.  It's fine if we are preempted after doing this:
-		 * with TIF_IO_BITMAP set, context switches will keep our TSS
-		 * limit correct.
-		 */
-		preempt_disable();
-		t->io_bitmap_ptr = bitmap;
-		set_thread_flag(TIF_IO_BITMAP);
-		/* Make the bitmap base in the TSS valid */
-		tss = this_cpu_ptr(&cpu_tss_rw);
-		tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_VALID;
-		refresh_tss_limit();
-		preempt_enable();
+		memset(iobm->bits, 0xff, sizeof(iobm->bits));
 	}
 
 	/*
-	 * do it in the per-thread copy and in the TSS ...
-	 *
-	 * Disable preemption via get_cpu() - we must not switch away
-	 * because the ->io_bitmap_max value must match the bitmap
-	 * contents:
+	 * Update the tasks bitmap and the TSS copy. This requires to have
+	 * preemption disabled to protect against a context switch.
 	 */
-	tss = &per_cpu(cpu_tss_rw, get_cpu());
+	preempt_disable();
+	tss = this_cpu_ptr(&cpu_tss_rw);
 
+	/* Update the bitmap */
 	if (turn_on)
-		bitmap_clear(t->io_bitmap_ptr, from, num);
+		bitmap_clear(iobm->bits, from, num);
 	else
-		bitmap_set(t->io_bitmap_ptr, from, num);
+		bitmap_set(iobm->bits, from, num);
 
 	/*
 	 * Search for a (possibly new) maximum. This is simple and stupid,
 	 * to keep it obviously correct:
 	 */
 	max_long = 0;
-	for (i = 0; i < IO_BITMAP_LONGS; i++)
-		if (t->io_bitmap_ptr[i] != ~0UL)
+	for (i = 0; i < IO_BITMAP_LONGS; i++) {
+		if (iobm->bits[i] != ~0UL)
 			max_long = i;
+	}
 
 	bytes = (max_long + 1) * sizeof(unsigned long);
-	bytes_updated = max(bytes, t->io_bitmap_max);
+	bytes_updated = max(bytes, iobm->io_bitmap_max);
 
-	t->io_bitmap_max = bytes;
+	iobm->io_bitmap_max = bytes;
 
 	/* Update the TSS: */
-	memcpy(tss->io_bitmap, t->io_bitmap_ptr, bytes_updated);
+	memcpy(tss->io_bitmap_bytes, iobm->bitmap_bytes, bytes_updated);
 
-	put_cpu();
+	/* Set the tasks io_bitmap pointer (might be the same) */
+	t->io_bitmap = iobm;
+	/* Mark it active for context switching */
+	set_thread_flag(TIF_IO_BITMAP);
+
+	/* Make the bitmap base in the TSS valid */
+	tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_VALID;
+
+	/* Make sure the TSS limit covers the I/O bitmap. */
+	refresh_tss_limit();
+	preempt_enable();
 
 	return 0;
 }
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -41,6 +41,7 @@
 #include <asm/desc.h>
 #include <asm/prctl.h>
 #include <asm/spec-ctrl.h>
+#include <asm/iobitmap.h>
 #include <asm/proto.h>
 
 #include "process.h"
@@ -101,21 +102,20 @@ int arch_dup_task_struct(struct task_str
 void exit_thread(struct task_struct *tsk)
 {
 	struct thread_struct *t = &tsk->thread;
-	unsigned long *bp = t->io_bitmap_ptr;
+	struct io_bitmap *bm = t->io_bitmap;
 	struct fpu *fpu = &t->fpu;
 	struct tss_struct *tss;
 
-	if (bp) {
+	if (bm) {
 		preempt_disable();
 		tss = this_cpu_ptr(&cpu_tss_rw);
 
-		t->io_bitmap_ptr = NULL;
-		t->io_bitmap_max = 0;
+		t->io_bitmap = NULL;
 		clear_thread_flag(TIF_IO_BITMAP);
 		/* Invalidate the io bitmap base in the TSS */
 		tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID;
 		preempt_enable();
-		kfree(bp);
+		kfree(bm);
 	}
 
 	free_vm86(t);
@@ -135,25 +135,25 @@ static int set_new_tls(struct task_struc
 
 static inline int copy_io_bitmap(struct task_struct *tsk)
 {
+	struct io_bitmap *bm = current->thread.io_bitmap;
+
 	if (likely(!test_tsk_thread_flag(current, TIF_IO_BITMAP)))
 		return 0;
 
-	tsk->thread.io_bitmap_ptr = kmemdup(current->thread.io_bitmap_ptr,
-					    IO_BITMAP_BYTES, GFP_KERNEL);
-	if (!tsk->thread.io_bitmap_ptr) {
-		tsk->thread.io_bitmap_max = 0;
+	tsk->thread.io_bitmap = kmemdup(bm, sizeof(*bm), GFP_KERNEL);
+
+	if (!tsk->thread.io_bitmap)
 		return -ENOMEM;
-	}
+
 	set_tsk_thread_flag(tsk, TIF_IO_BITMAP);
 	return 0;
 }
 
 static inline void free_io_bitmap(struct task_struct *tsk)
 {
-	if (tsk->thread.io_bitmap_ptr) {
-		kfree(tsk->thread.io_bitmap_ptr);
-		tsk->thread.io_bitmap_ptr = NULL;
-		tsk->thread.io_bitmap_max = 0;
+	if (tsk->thread.io_bitmap) {
+		kfree(tsk->thread.io_bitmap);
+		tsk->thread.io_bitmap = NULL;
 	}
 }
 
@@ -172,7 +172,7 @@ int copy_thread_tls(unsigned long clone_
 	frame->bp = 0;
 	frame->ret_addr = (unsigned long) ret_from_fork;
 	p->thread.sp = (unsigned long) fork_frame;
-	p->thread.io_bitmap_ptr = NULL;
+	p->thread.io_bitmap = NULL;
 	memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
 
 #ifdef CONFIG_X86_64
@@ -360,6 +360,8 @@ static inline void switch_to_bitmap(stru
 	struct tss_struct *tss = this_cpu_ptr(&cpu_tss_rw);
 
 	if (tifn & _TIF_IO_BITMAP) {
+		struct io_bitmap *iobm = next->io_bitmap;
+
 		/*
 		 * Copy at least the size of the incoming tasks bitmap
 		 * which covers the last permitted I/O port.
@@ -368,11 +370,11 @@ static inline void switch_to_bitmap(stru
 		 * bits permitted, then the copy needs to cover those as
 		 * well so they get turned off.
 		 */
-		memcpy(tss->io_bitmap, next->io_bitmap_ptr,
-		       max(tss->io_bitmap_prev_max, next->io_bitmap_max));
+		memcpy(tss->io_bitmap_bytes, iobm->bitmap_bytes,
+		       max(tss->io_bitmap_prev_max, iobm->io_bitmap_max));
 
 		/* Store the new max and set io_bitmap_base valid */
-		tss->io_bitmap_prev_max = next->io_bitmap_max;
+		tss->io_bitmap_prev_max = iobm->io_bitmap_max;
 		tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_VALID;
 
 		/*
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -42,6 +42,7 @@
 #include <asm/traps.h>
 #include <asm/syscall.h>
 #include <asm/fsgsbase.h>
+#include <asm/iobitmap.h>
 
 #include "tls.h"
 
@@ -697,7 +698,9 @@ static int ptrace_set_debugreg(struct ta
 static int ioperm_active(struct task_struct *target,
 			 const struct user_regset *regset)
 {
-	return DIV_ROUND_UP(target->thread.io_bitmap_max, regset->size);
+	struct io_bitmap *iobm = target->thread.io_bitmap;
+
+	return iobm ? DIV_ROUND_UP(iobm->io_bitmap_max, regset->size) : 0;
 }
 
 static int ioperm_get(struct task_struct *target,
@@ -705,12 +708,13 @@ static int ioperm_get(struct task_struct
 		      unsigned int pos, unsigned int count,
 		      void *kbuf, void __user *ubuf)
 {
-	if (!target->thread.io_bitmap_ptr)
+	struct io_bitmap *iobm = target->thread.io_bitmap;
+
+	if (!iobm)
 		return -ENXIO;
 
 	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
-				   target->thread.io_bitmap_ptr,
-				   0, IO_BITMAP_BYTES);
+				   iobm->bitmap_bytes, 0, IO_BITMAP_BYTES);
 }
 
 /*



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [patch V2 08/16] x86/ioperm: Add bitmap sequence number
  2019-11-11 22:03 [patch V2 00/16] x86/iopl: Prevent user space from using CLI/STI with iopl(3) Thomas Gleixner
                   ` (6 preceding siblings ...)
  2019-11-11 22:03 ` [patch V2 07/16] x86/ioperm: Move iobitmap data into a struct Thomas Gleixner
@ 2019-11-11 22:03 ` Thomas Gleixner
  2019-11-12  9:22   ` Peter Zijlstra
  2019-11-12 16:08   ` [patch V2 08/16] x86/ioperm: Add bitmap sequence number Andy Lutomirski
  2019-11-11 22:03 ` [patch V2 09/16] x86/ioperm: Move TSS bitmap update to exit to user work Thomas Gleixner
                   ` (8 subsequent siblings)
  16 siblings, 2 replies; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-11 22:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin, Linus Torvalds

Add a globally unique sequence number which is incremented when ioperm() is
changing the I/O bitmap of a task. Store the new sequence number in the
io_bitmap structure and compare it along with the actual struct pointer
with the one which was last loaded on a CPU. Only update the bitmap if
either of the two changes. That should further reduce the overhead of I/O
bitmap scheduling when there are only a few I/O bitmap users on the system.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 arch/x86/include/asm/iobitmap.h  |    1 
 arch/x86/include/asm/processor.h |    8 +++++++
 arch/x86/kernel/cpu/common.c     |    1 
 arch/x86/kernel/ioport.c         |    9 +++++---
 arch/x86/kernel/process.c        |   40 +++++++++++++++++++++++++++++----------
 5 files changed, 46 insertions(+), 13 deletions(-)

--- a/arch/x86/include/asm/iobitmap.h
+++ b/arch/x86/include/asm/iobitmap.h
@@ -5,6 +5,7 @@
 #include <asm/processor.h>
 
 struct io_bitmap {
+	u64			sequence;
 	unsigned int		io_bitmap_max;
 	union {
 		unsigned long	bits[IO_BITMAP_LONGS];
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -366,6 +366,14 @@ struct tss_struct {
 	struct x86_hw_tss	x86_tss;
 
 	/*
+	 * The bitmap pointer and the sequence number of the last active
+	 * bitmap. last_bitmap cannot be dereferenced. It's solely for
+	 * comparison.
+	 */
+	struct io_bitmap	*last_bitmap;
+	u64			last_sequence;
+
+	/*
 	 * Store the dirty size of the last io bitmap offender. The next
 	 * one will have to do the cleanup as the switch out to a non io
 	 * bitmap user will just set x86_tss.io_bitmap_base to a value
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1861,6 +1861,7 @@ void cpu_init(void)
 	/* Initialize the TSS. */
 	tss_setup_ist(tss);
 	tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID;
+	tss->last_bitmap = NULL;
 	tss->io_bitmap_prev_max = 0;
 	memset(tss->io_bitmap_bytes, 0xff, sizeof(tss->io_bitmap_bytes));
 	set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
--- a/arch/x86/kernel/ioport.c
+++ b/arch/x86/kernel/ioport.c
@@ -14,6 +14,8 @@
 #include <asm/iobitmap.h>
 #include <asm/desc.h>
 
+static atomic64_t io_bitmap_sequence;
+
 /*
  * this changes the io permissions bitmap in the current task.
  */
@@ -76,14 +78,15 @@ long ksys_ioperm(unsigned long from, uns
 
 	iobm->io_bitmap_max = bytes;
 
-	/* Update the TSS: */
-	memcpy(tss->io_bitmap_bytes, iobm->bitmap_bytes, bytes_updated);
-
+	/* Update the sequence number to force an update in switch_to() */
+	iobm->sequence = atomic64_add_return(1, &io_bitmap_sequence);
 	/* Set the tasks io_bitmap pointer (might be the same) */
 	t->io_bitmap = iobm;
 	/* Mark it active for context switching */
 	set_thread_flag(TIF_IO_BITMAP);
 
+	/* Update the TSS: */
+	memcpy(tss->io_bitmap_bytes, iobm->bitmap_bytes, bytes_updated);
 	/* Make the bitmap base in the TSS valid */
 	tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_VALID;
 
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -354,6 +354,29 @@ void arch_setup_new_exec(void)
 	}
 }
 
+static void switch_to_update_io_bitmap(struct tss_struct *tss,
+				       struct io_bitmap *iobm)
+{
+	/*
+	 * Copy at least the byte range of the incoming tasks bitmap which
+	 * covers the permitted I/O ports.
+	 *
+	 * If the previous task which used an I/O bitmap had more bits
+	 * permitted, then the copy needs to cover those as well so they
+	 * get turned off.
+	 */
+	memcpy(tss->io_bitmap_bytes, iobm->bitmap_bytes,
+	       max(tss->io_bitmap_prev_max, iobm->io_bitmap_max));
+
+	/*
+	 * Store the new max and the sequence number of this bitmap
+	 * and a pointer to the bitmap itself.
+	 */
+	tss->io_bitmap_prev_max = iobm->io_bitmap_max;
+	tss->last_sequence = iobm->sequence;
+	tss->last_bitmap = iobm;
+}
+
 static inline void switch_to_bitmap(struct thread_struct *next,
 				    unsigned long tifp, unsigned long tifn)
 {
@@ -363,18 +386,15 @@ static inline void switch_to_bitmap(stru
 		struct io_bitmap *iobm = next->io_bitmap;
 
 		/*
-		 * Copy at least the size of the incoming tasks bitmap
-		 * which covers the last permitted I/O port.
-		 *
-		 * If the previous task which used an io bitmap had more
-		 * bits permitted, then the copy needs to cover those as
-		 * well so they get turned off.
+		 * Only copy bitmap data when the bitmap or the sequence
+		 * number differs. The update time is accounted to the
+		 * incoming task.
 		 */
-		memcpy(tss->io_bitmap_bytes, iobm->bitmap_bytes,
-		       max(tss->io_bitmap_prev_max, iobm->io_bitmap_max));
+		if (tss->last_bitmap != iobm ||
+		    tss->last_sequence != iobm->sequence)
+			switch_to_update_io_bitmap(tss, iobm);
 
-		/* Store the new max and set io_bitmap_base valid */
-		tss->io_bitmap_prev_max = iobm->io_bitmap_max;
+		/* Enable the bitmap */
 		tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_VALID;
 
 		/*



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [patch V2 09/16] x86/ioperm: Move TSS bitmap update to exit to user work
  2019-11-11 22:03 [patch V2 00/16] x86/iopl: Prevent user space from using CLI/STI with iopl(3) Thomas Gleixner
                   ` (7 preceding siblings ...)
  2019-11-11 22:03 ` [patch V2 08/16] x86/ioperm: Add bitmap sequence number Thomas Gleixner
@ 2019-11-11 22:03 ` Thomas Gleixner
  2019-11-12 16:16   ` Andy Lutomirski
  2019-11-11 22:03 ` [patch V2 10/16] x86/ioperm: Remove bitmap if all permissions dropped Thomas Gleixner
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-11 22:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

There is no point to update the TSS bitmap for tasks which use I/O bitmaps
on every context switch. It's enough to update it right before exiting to
user space.

That reduces the context switch bitmap handling to invalidating the io
bitmap base offset in the TSS when the outgoing task has TIF_IO_BITMAP
set. It also removes the requirement to update the tasks bitmap atomically
in ioperm().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 arch/x86/entry/common.c         |    4 ++
 arch/x86/include/asm/iobitmap.h |    2 +
 arch/x86/kernel/ioport.c        |   27 +++----------------
 arch/x86/kernel/process.c       |   56 +++++++++++++++++++++++++---------------
 4 files changed, 47 insertions(+), 42 deletions(-)

--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -33,6 +33,7 @@
 #include <asm/cpufeature.h>
 #include <asm/fpu/api.h>
 #include <asm/nospec-branch.h>
+#include <asm/iobitmap.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/syscalls.h>
@@ -196,6 +197,9 @@ static void exit_to_usermode_loop(struct
 	/* Reload ti->flags; we may have rescheduled above. */
 	cached_flags = READ_ONCE(ti->flags);
 
+	if (unlikely(cached_flags & _TIF_IO_BITMAP))
+		tss_update_io_bitmap();
+
 	fpregs_assert_state_consistent();
 	if (unlikely(cached_flags & _TIF_NEED_FPU_LOAD))
 		switch_fpu_return();
--- a/arch/x86/include/asm/iobitmap.h
+++ b/arch/x86/include/asm/iobitmap.h
@@ -13,4 +13,6 @@ struct io_bitmap {
 	};
 };
 
+void tss_update_io_bitmap(void);
+
 #endif
--- a/arch/x86/kernel/ioport.c
+++ b/arch/x86/kernel/ioport.c
@@ -21,9 +21,8 @@ static atomic64_t io_bitmap_sequence;
  */
 long ksys_ioperm(unsigned long from, unsigned long num, int turn_on)
 {
-	unsigned int i, max_long, bytes, bytes_updated;
 	struct thread_struct *t = &current->thread;
-	struct tss_struct *tss;
+	unsigned int i, max_long;
 	struct io_bitmap *iobm;
 
 	if ((from + num <= from) || (from + num > IO_BITMAP_BITS))
@@ -51,13 +50,9 @@ long ksys_ioperm(unsigned long from, uns
 	}
 
 	/*
-	 * Update the tasks bitmap and the TSS copy. This requires to have
-	 * preemption disabled to protect against a context switch.
+	 * Update the tasks bitmap. The update of the TSS bitmap happens on
+	 * exit to user mode. So this needs no protection.
 	 */
-	preempt_disable();
-	tss = this_cpu_ptr(&cpu_tss_rw);
-
-	/* Update the bitmap */
 	if (turn_on)
 		bitmap_clear(iobm->bits, from, num);
 	else
@@ -73,27 +68,15 @@ long ksys_ioperm(unsigned long from, uns
 			max_long = i;
 	}
 
-	bytes = (max_long + 1) * sizeof(unsigned long);
-	bytes_updated = max(bytes, iobm->io_bitmap_max);
-
-	iobm->io_bitmap_max = bytes;
+	iobm->io_bitmap_max = (max_long + 1) * sizeof(unsigned long);
 
 	/* Update the sequence number to force an update in switch_to() */
 	iobm->sequence = atomic64_add_return(1, &io_bitmap_sequence);
 	/* Set the tasks io_bitmap pointer (might be the same) */
 	t->io_bitmap = iobm;
-	/* Mark it active for context switching */
+	/* Mark it active for context switching and exit to user mode */
 	set_thread_flag(TIF_IO_BITMAP);
 
-	/* Update the TSS: */
-	memcpy(tss->io_bitmap_bytes, iobm->bitmap_bytes, bytes_updated);
-	/* Make the bitmap base in the TSS valid */
-	tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_VALID;
-
-	/* Make sure the TSS limit covers the I/O bitmap. */
-	refresh_tss_limit();
-	preempt_enable();
-
 	return 0;
 }
 
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -354,8 +354,31 @@ void arch_setup_new_exec(void)
 	}
 }
 
-static void switch_to_update_io_bitmap(struct tss_struct *tss,
-				       struct io_bitmap *iobm)
+static inline void tss_invalidate_io_bitmap(struct tss_struct *tss)
+{
+	/*
+	 * Invalidate the I/O bitmap by moving io_bitmap_base outside the
+	 * TSS limit so any subsequent I/O access from user space will
+	 * trigger a #GP.
+	 *
+	 * This is correct even when VMEXIT rewrites the TSS limit
+	 * to 0x67 as the only requirement is that the base points
+	 * outside the limit.
+	 */
+	tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID;
+}
+
+static inline void switch_to_bitmap(unsigned long tifp)
+{
+	/*
+	 * Invalidate I/O bitmap if the previous task used it. If the next
+	 * task has an I/O bitmap it will handle it on exit to user mode.
+	 */
+	if (tifp & _TIF_IO_BITMAP)
+		tss_invalidate_io_bitmap(this_cpu_ptr(&cpu_tss_rw));
+}
+
+static void tss_copy_io_bitmap(struct tss_struct *tss, struct io_bitmap *iobm)
 {
 	/*
 	 * Copy at least the byte range of the incoming tasks bitmap which
@@ -377,13 +400,15 @@ static void switch_to_update_io_bitmap(s
 	tss->last_bitmap = iobm;
 }
 
-static inline void switch_to_bitmap(struct thread_struct *next,
-				    unsigned long tifp, unsigned long tifn)
+/**
+ * tss_update_io_bitmap - Update I/O bitmap before exiting to usermode
+ */
+void tss_update_io_bitmap(void)
 {
 	struct tss_struct *tss = this_cpu_ptr(&cpu_tss_rw);
 
-	if (tifn & _TIF_IO_BITMAP) {
-		struct io_bitmap *iobm = next->io_bitmap;
+	if (test_thread_flag(TIF_IO_BITMAP)) {
+		struct io_bitmap *iobm = current->thread.io_bitmap;
 
 		/*
 		 * Only copy bitmap data when the bitmap or the sequence
@@ -392,7 +417,7 @@ static inline void switch_to_bitmap(stru
 		 */
 		if (tss->last_bitmap != iobm ||
 		    tss->last_sequence != iobm->sequence)
-			switch_to_update_io_bitmap(tss, iobm);
+			tss_copy_io_bitmap(tss, iobm);
 
 		/* Enable the bitmap */
 		tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_VALID;
@@ -405,18 +430,8 @@ static inline void switch_to_bitmap(stru
 		 * limit.
 		 */
 		refresh_tss_limit();
-	} else if (tifp & _TIF_IO_BITMAP) {
-		/*
-		 * Do not touch the bitmap. Let the next bitmap using task
-		 * deal with the mess. Just make the io_bitmap_base invalid
-		 * by moving it outside the TSS limit so any subsequent I/O
-		 * access from user space will trigger a #GP.
-		 *
-		 * This is correct even when VMEXIT rewrites the TSS limit
-		 * to 0x67 as the only requirement is that the base points
-		 * outside the limit.
-		 */
-		tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID;
+	} else {
+		tss_invalidate_io_bitmap(tss);
 	}
 }
 
@@ -630,7 +645,8 @@ void __switch_to_xtra(struct task_struct
 
 	tifn = READ_ONCE(task_thread_info(next_p)->flags);
 	tifp = READ_ONCE(task_thread_info(prev_p)->flags);
-	switch_to_bitmap(next, tifp, tifn);
+
+	switch_to_bitmap(tifp);
 
 	propagate_user_return_notify(prev_p, next_p);
 



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [patch V2 10/16] x86/ioperm: Remove bitmap if all permissions dropped
  2019-11-11 22:03 [patch V2 00/16] x86/iopl: Prevent user space from using CLI/STI with iopl(3) Thomas Gleixner
                   ` (8 preceding siblings ...)
  2019-11-11 22:03 ` [patch V2 09/16] x86/ioperm: Move TSS bitmap update to exit to user work Thomas Gleixner
@ 2019-11-11 22:03 ` Thomas Gleixner
  2019-11-12 17:43   ` Andy Lutomirski
  2019-11-11 22:03 ` [patch V2 11/16] x86/ioperm: Share I/O bitmap if identical Thomas Gleixner
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-11 22:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

If ioperm() results in a bitmap with all bits set (no permissions to any
I/O port), then handling that bitmap on context switch and exit to user
mode is pointless. Drop it.

Move the bitmap exit handling to the ioport code and reuse it for both the
thread exit path and dropping it. This allows to reuse this code for the
upcoming iopl() emulation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 arch/x86/include/asm/iobitmap.h |    2 ++
 arch/x86/kernel/ioport.c        |   20 +++++++++++++++++++-
 arch/x86/kernel/process.c       |   15 ++-------------
 3 files changed, 23 insertions(+), 14 deletions(-)

--- a/arch/x86/include/asm/iobitmap.h
+++ b/arch/x86/include/asm/iobitmap.h
@@ -13,6 +13,8 @@ struct io_bitmap {
 	};
 };
 
+void io_bitmap_exit(void);
+
 void tss_update_io_bitmap(void);
 
 #endif
--- a/arch/x86/kernel/ioport.c
+++ b/arch/x86/kernel/ioport.c
@@ -16,6 +16,18 @@
 
 static atomic64_t io_bitmap_sequence;
 
+void io_bitmap_exit(void)
+{
+	struct io_bitmap *iobm = current->thread.io_bitmap;
+
+	preempt_disable();
+	current->thread.io_bitmap = NULL;
+	clear_thread_flag(TIF_IO_BITMAP);
+	tss_update_io_bitmap();
+	preempt_enable();
+	kfree(iobm);
+}
+
 /*
  * this changes the io permissions bitmap in the current task.
  */
@@ -62,12 +74,18 @@ long ksys_ioperm(unsigned long from, uns
 	 * Search for a (possibly new) maximum. This is simple and stupid,
 	 * to keep it obviously correct:
 	 */
-	max_long = 0;
+	max_long = UINT_MAX;
 	for (i = 0; i < IO_BITMAP_LONGS; i++) {
 		if (iobm->bits[i] != ~0UL)
 			max_long = i;
 	}
 
+	/* All permissions dropped? */
+	if (max_long == UINT_MAX) {
+		io_bitmap_exit();
+		return 0;
+	}
+
 	iobm->io_bitmap_max = (max_long + 1) * sizeof(unsigned long);
 
 	/* Update the sequence number to force an update in switch_to() */
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -102,21 +102,10 @@ int arch_dup_task_struct(struct task_str
 void exit_thread(struct task_struct *tsk)
 {
 	struct thread_struct *t = &tsk->thread;
-	struct io_bitmap *bm = t->io_bitmap;
 	struct fpu *fpu = &t->fpu;
-	struct tss_struct *tss;
 
-	if (bm) {
-		preempt_disable();
-		tss = this_cpu_ptr(&cpu_tss_rw);
-
-		t->io_bitmap = NULL;
-		clear_thread_flag(TIF_IO_BITMAP);
-		/* Invalidate the io bitmap base in the TSS */
-		tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID;
-		preempt_enable();
-		kfree(bm);
-	}
+	if (test_thread_flag(TIF_IO_BITMAP))
+		io_bitmap_exit();
 
 	free_vm86(t);
 



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [patch V2 11/16] x86/ioperm: Share I/O bitmap if identical
  2019-11-11 22:03 [patch V2 00/16] x86/iopl: Prevent user space from using CLI/STI with iopl(3) Thomas Gleixner
                   ` (9 preceding siblings ...)
  2019-11-11 22:03 ` [patch V2 10/16] x86/ioperm: Remove bitmap if all permissions dropped Thomas Gleixner
@ 2019-11-11 22:03 ` Thomas Gleixner
  2019-11-12  7:14   ` Ingo Molnar
                     ` (2 more replies)
  2019-11-11 22:03 ` [patch V2 12/16] selftests/x86/ioperm: Extend testing so the shared bitmap is exercised Thomas Gleixner
                   ` (5 subsequent siblings)
  16 siblings, 3 replies; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-11 22:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

The I/O bitmap is duplicated on fork. That's wasting memory and slows down
fork. There is no point to do so. As long as the bitmap is not modified it
can be shared between threads and processes.

Add a refcount and just share it on fork. If a task modifies the bitmap
then it has to do the duplication if and only if it is shared.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 arch/x86/include/asm/iobitmap.h |    5 +++++
 arch/x86/kernel/ioport.c        |   38 ++++++++++++++++++++++++++++++++------
 arch/x86/kernel/process.c       |   39 ++++++---------------------------------
 3 files changed, 43 insertions(+), 39 deletions(-)

--- a/arch/x86/include/asm/iobitmap.h
+++ b/arch/x86/include/asm/iobitmap.h
@@ -2,10 +2,12 @@
 #ifndef _ASM_X86_IOBITMAP_H
 #define _ASM_X86_IOBITMAP_H
 
+#include <linux/refcount.h>
 #include <asm/processor.h>
 
 struct io_bitmap {
 	u64			sequence;
+	refcount_t		refcnt;
 	unsigned int		io_bitmap_max;
 	union {
 		unsigned long	bits[IO_BITMAP_LONGS];
@@ -13,6 +15,9 @@ struct io_bitmap {
 	};
 };
 
+struct task_struct;
+
+void io_bitmap_share(struct task_struct *tsk);
 void io_bitmap_exit(void);
 
 void tss_update_io_bitmap(void);
--- a/arch/x86/kernel/ioport.c
+++ b/arch/x86/kernel/ioport.c
@@ -16,6 +16,17 @@
 
 static atomic64_t io_bitmap_sequence;
 
+void io_bitmap_share(struct task_struct *tsk)
+ {
+	/*
+	 * Take a refcount on current's bitmap. It can be used by
+	 * both tasks as long as none of them changes the bitmap.
+	 */
+	refcount_inc(&current->thread.io_bitmap->refcnt);
+	tsk->thread.io_bitmap = current->thread.io_bitmap;
+	set_tsk_thread_flag(tsk, TIF_IO_BITMAP);
+}
+
 void io_bitmap_exit(void)
 {
 	struct io_bitmap *iobm = current->thread.io_bitmap;
@@ -25,7 +36,8 @@ void io_bitmap_exit(void)
 	clear_thread_flag(TIF_IO_BITMAP);
 	tss_update_io_bitmap();
 	preempt_enable();
-	kfree(iobm);
+	if (iobm && refcount_dec_and_test(&iobm->refcnt))
+		kfree(iobm);
 }
 
 /*
@@ -59,8 +71,26 @@ long ksys_ioperm(unsigned long from, uns
 			return -ENOMEM;
 
 		memset(iobm->bits, 0xff, sizeof(iobm->bits));
+		refcount_set(&iobm->refcnt, 1);
+	}
+
+	/*
+	 * If the bitmap is not shared, then nothing can take a refcount as
+	 * current can obviously not fork at the same time. If it's shared
+	 * duplicate it and drop the refcount on the original one.
+	 */
+	if (refcount_read(&iobm->refcnt) > 1) {
+		iobm = kmemdup(iobm, sizeof(*iobm), GFP_KERNEL);
+		if (!iobm)
+			return -ENOMEM;
+		io_bitmap_exit();
 	}
 
+	/* Set the tasks io_bitmap pointer (might be the same) */
+	t->io_bitmap = iobm;
+	/* Mark it active for context switching and exit to user mode */
+	set_thread_flag(TIF_IO_BITMAP);
+
 	/*
 	 * Update the tasks bitmap. The update of the TSS bitmap happens on
 	 * exit to user mode. So this needs no protection.
@@ -88,12 +118,8 @@ long ksys_ioperm(unsigned long from, uns
 
 	iobm->io_bitmap_max = (max_long + 1) * sizeof(unsigned long);
 
-	/* Update the sequence number to force an update in switch_to() */
+	/* Increment the sequence number to force a TSS update */
 	iobm->sequence = atomic64_add_return(1, &io_bitmap_sequence);
-	/* Set the tasks io_bitmap pointer (might be the same) */
-	t->io_bitmap = iobm;
-	/* Mark it active for context switching and exit to user mode */
-	set_thread_flag(TIF_IO_BITMAP);
 
 	return 0;
 }
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -122,37 +122,13 @@ static int set_new_tls(struct task_struc
 		return do_set_thread_area_64(p, ARCH_SET_FS, tls);
 }
 
-static inline int copy_io_bitmap(struct task_struct *tsk)
-{
-	struct io_bitmap *bm = current->thread.io_bitmap;
-
-	if (likely(!test_tsk_thread_flag(current, TIF_IO_BITMAP)))
-		return 0;
-
-	tsk->thread.io_bitmap = kmemdup(bm, sizeof(*bm), GFP_KERNEL);
-
-	if (!tsk->thread.io_bitmap)
-		return -ENOMEM;
-
-	set_tsk_thread_flag(tsk, TIF_IO_BITMAP);
-	return 0;
-}
-
-static inline void free_io_bitmap(struct task_struct *tsk)
-{
-	if (tsk->thread.io_bitmap) {
-		kfree(tsk->thread.io_bitmap);
-		tsk->thread.io_bitmap = NULL;
-	}
-}
-
 int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 		    unsigned long arg, struct task_struct *p, unsigned long tls)
 {
 	struct inactive_task_frame *frame;
 	struct fork_frame *fork_frame;
 	struct pt_regs *childregs;
-	int ret;
+	int ret = 0;
 
 	childregs = task_pt_regs(p);
 	fork_frame = container_of(childregs, struct fork_frame, regs);
@@ -193,16 +169,13 @@ int copy_thread_tls(unsigned long clone_
 	task_user_gs(p) = get_user_gs(current_pt_regs());
 #endif
 
-	ret = copy_io_bitmap(p);
-	if (ret)
-		return ret;
-
 	/* Set a new TLS for the child thread? */
-	if (clone_flags & CLONE_SETTLS) {
+	if (clone_flags & CLONE_SETTLS)
 		ret = set_new_tls(p, tls);
-		if (ret)
-			free_io_bitmap(p);
-	}
+
+	if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP)))
+		io_bitmap_share(p);
+
 	return ret;
 }
 



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [patch V2 12/16] selftests/x86/ioperm: Extend testing so the shared bitmap is exercised
  2019-11-11 22:03 [patch V2 00/16] x86/iopl: Prevent user space from using CLI/STI with iopl(3) Thomas Gleixner
                   ` (10 preceding siblings ...)
  2019-11-11 22:03 ` [patch V2 11/16] x86/ioperm: Share I/O bitmap if identical Thomas Gleixner
@ 2019-11-11 22:03 ` Thomas Gleixner
  2019-11-11 22:03 ` [patch V2 13/16] x86/iopl: Fixup misleading comment Thomas Gleixner
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-11 22:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

Add code to the fork path which forces the shared bitmap to be duplicated
and the reference count to be dropped. Verify that the child modifications
did not affect the parent.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch
---
 tools/testing/selftests/x86/ioperm.c |   16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

--- a/tools/testing/selftests/x86/ioperm.c
+++ b/tools/testing/selftests/x86/ioperm.c
@@ -131,6 +131,17 @@ int main(void)
 		printf("[RUN]\tchild: check that we inherited permissions\n");
 		expect_ok(0x80);
 		expect_gp(0xed);
+		printf("[RUN]\tchild: Extend permissions to 0x81\n");
+		if (ioperm(0x81, 1, 1) != 0) {
+			printf("[FAIL]\tioperm(0x81, 1, 1) failed (%d)", errno);
+			return 1;
+		}
+		printf("[RUN]\tchild: Drop permissions to 0x80\n");
+		if (ioperm(0x80, 1, 0) != 0) {
+			printf("[FAIL]\tioperm(0x80, 1, 0) failed (%d)", errno);
+			return 1;
+		}
+		expect_gp(0x80);
 		return 0;
 	} else {
 		int status;
@@ -146,8 +157,11 @@ int main(void)
 		}
 	}
 
-	/* Test the capability checks. */
+	/* Verify that the child dropping 0x80 did not affect the parent */
+	printf("\tVerify that unsharing the bitmap worked\n");
+	expect_ok(0x80);
 
+	/* Test the capability checks. */
 	printf("\tDrop privileges\n");
 	if (setresuid(1, 1, 1) != 0) {
 		printf("[WARN]\tDropping privileges failed\n");



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [patch V2 13/16] x86/iopl: Fixup misleading comment
  2019-11-11 22:03 [patch V2 00/16] x86/iopl: Prevent user space from using CLI/STI with iopl(3) Thomas Gleixner
                   ` (11 preceding siblings ...)
  2019-11-11 22:03 ` [patch V2 12/16] selftests/x86/ioperm: Extend testing so the shared bitmap is exercised Thomas Gleixner
@ 2019-11-11 22:03 ` Thomas Gleixner
  2019-11-12 18:14   ` Andy Lutomirski
  2019-11-11 22:03 ` [patch V2 14/16] x86/iopl: Restrict iopl() permission scope Thomas Gleixner
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-11 22:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

From: Thomas Gleixner <tglx@linutronix.de>

The comment for the sys_iopl() implementation is outdated and actively
misleading in some parts. Fix it up.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/ioport.c |   35 +++++++++++++++++++++++++++--------
 1 file changed, 27 insertions(+), 8 deletions(-)

--- a/arch/x86/kernel/ioport.c
+++ b/arch/x86/kernel/ioport.c
@@ -41,7 +41,7 @@ void io_bitmap_exit(void)
 }
 
 /*
- * this changes the io permissions bitmap in the current task.
+ * This changes the io permissions bitmap in the current task.
  */
 long ksys_ioperm(unsigned long from, unsigned long num, int turn_on)
 {
@@ -130,14 +130,24 @@ SYSCALL_DEFINE3(ioperm, unsigned long, f
 }
 
 /*
- * sys_iopl has to be used when you want to access the IO ports
- * beyond the 0x3ff range: to get the full 65536 ports bitmapped
- * you'd need 8kB of bitmaps/process, which is a bit excessive.
+ * The sys_iopl functionality depends on the level argument, which if
+ * granted for the task is used by the CPU to check I/O instruction and
+ * CLI/STI against the current priviledge level (CPL). If CPL is less than
+ * or equal the tasks IOPL level the instructions take effect. If not a #GP
+ * is raised. The default IOPL is 0, i.e. no permissions.
  *
- * Here we just change the flags value on the stack: we allow
- * only the super-user to do it. This depends on the stack-layout
- * on system-call entry - see also fork() and the signal handling
- * code.
+ * Setting IOPL to level 0-2 is disabling the userspace access. Only level
+ * 3 enables it. If set it allows the user space thread:
+ *
+ * - Unrestricted access to all 65535 I/O ports
+ * - The usage of CLI/STI instructions
+ *
+ * The advantage over ioperm is that the context switch does not require to
+ * update the I/O bitmap which is especially true when a large number of
+ * ports is accessed. But the allowance of CLI/STI in userspace is
+ * considered a major problem.
+ *
+ * IOPL is strictly per thread and inherited on fork.
  */
 SYSCALL_DEFINE1(iopl, unsigned int, level)
 {
@@ -158,9 +168,18 @@ SYSCALL_DEFINE1(iopl, unsigned int, leve
 		    security_locked_down(LOCKDOWN_IOPORT))
 			return -EPERM;
 	}
+	/*
+	 * Change the flags value on the return stack, which has been set
+	 * up on system-call entry. See also the fork and signal handling
+	 * code how this is handled.
+	 */
 	regs->flags = (regs->flags & ~X86_EFLAGS_IOPL) |
 		(level << X86_EFLAGS_IOPL_BIT);
+	/* Store the new level in the thread struct */
 	t->iopl = level << X86_EFLAGS_IOPL_BIT;
+	/*
+	 * X86_32 switches immediately and XEN handles it via emulation.
+	 */
 	set_iopl_mask(t->iopl);
 
 	return 0;



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [patch V2 14/16] x86/iopl: Restrict iopl() permission scope
  2019-11-11 22:03 [patch V2 00/16] x86/iopl: Prevent user space from using CLI/STI with iopl(3) Thomas Gleixner
                   ` (12 preceding siblings ...)
  2019-11-11 22:03 ` [patch V2 13/16] x86/iopl: Fixup misleading comment Thomas Gleixner
@ 2019-11-11 22:03 ` Thomas Gleixner
  2019-11-11 23:03   ` Thomas Gleixner
                     ` (2 more replies)
  2019-11-11 22:03 ` [patch V2 15/16] x86/iopl: Remove legacy IOPL option Thomas Gleixner
                   ` (2 subsequent siblings)
  16 siblings, 3 replies; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-11 22:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

From: Thomas Gleixner <tglx@linutronix.de>

The access to the full I/O port range can be also provided by the TSS I/O
bitmap, but that would require to copy 8k of data on scheduling in the
task. As shown with the sched out optimization TSS.io_bitmap_base can be
used to switch the incoming task to a preallocated I/O bitmap which has all
bits zero, i.e. allows access to all I/O ports.

Implementing this allows to provide an iopl() emulation mode which restricts
the IOPL level 3 permissions to I/O port access but removes the STI/CLI
permission which is coming with the hardware IOPL mechansim.

Provide a config option to switch IOPL to emulation mode, make it the
default and while at it also provide an option to disable IOPL completely.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---

V2:	Fixed the 32bit build fail by increasing the cpu entry area size
	Move the TSS update out of the iopl() emulation code.
---
 arch/x86/Kconfig                        |   32 ++++++++++++++
 arch/x86/include/asm/pgtable_32_types.h |    2 
 arch/x86/include/asm/processor.h        |   20 ++++++++-
 arch/x86/kernel/ioport.c                |   70 +++++++++++++++++++++++---------
 arch/x86/kernel/process.c               |   29 +++++++++----
 5 files changed, 122 insertions(+), 31 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1254,6 +1254,38 @@ config X86_VSYSCALL_EMULATION
 	 Disabling this option saves about 7K of kernel size and
 	 possibly 4K of additional runtime pagetable memory.
 
+choice
+	prompt "IOPL"
+	default X86_IOPL_EMULATION
+
+config X86_IOPL_EMULATION
+	bool "IOPL Emulation"
+	---help---
+	  Legacy IOPL support is an overbroad mechanism which allows user
+	  space aside of accessing all 65536 I/O ports also to disable
+	  interrupts. To gain this access the caller needs CAP_SYS_RAWIO
+	  capabilities and permission from eventually active security
+	  modules.
+
+	  The emulation restricts the functionality of the syscall to
+	  only allowing the full range I/O port access, but prevents the
+	  ability to disable interrupts from user space.
+
+config X86_IOPL_LEGACY
+	bool "IOPL Legacy"
+	---help---
+	Allow the full IOPL permissions, i.e. user space access to all
+	65536 I/O ports and also the ability to disable interrupts, which
+	is overbroad and can result in system lockups.
+
+config X86_IOPL_NONE
+	bool "IOPL None"
+	---help---
+	Disable the IOPL permission syscall. That's the safest option as
+	no sane application should depend on this functionality.
+
+endchoice
+
 config TOSHIBA
 	tristate "Toshiba Laptop support"
 	depends on X86_32
--- a/arch/x86/include/asm/pgtable_32_types.h
+++ b/arch/x86/include/asm/pgtable_32_types.h
@@ -44,7 +44,7 @@ extern bool __vmalloc_start_set; /* set
  * Define this here and validate with BUILD_BUG_ON() in pgtable_32.c
  * to avoid include recursion hell
  */
-#define CPU_ENTRY_AREA_PAGES	(NR_CPUS * 40)
+#define CPU_ENTRY_AREA_PAGES	(NR_CPUS * 41)
 
 #define CPU_ENTRY_AREA_BASE						\
 	((FIXADDR_TOT_START - PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1))   \
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -332,10 +332,14 @@ struct x86_hw_tss {
 #define IO_BITMAP_BYTES			(IO_BITMAP_BITS / BITS_PER_BYTE)
 #define IO_BITMAP_LONGS			(IO_BITMAP_BYTES / sizeof(long))
 
-#define IO_BITMAP_OFFSET_VALID				\
+#define IO_BITMAP_OFFSET_VALID_MAP			\
 	(offsetof(struct tss_struct, io_bitmap_bytes) -	\
 	 offsetof(struct tss_struct, x86_tss))
 
+#define IO_BITMAP_OFFSET_VALID_ALL			\
+	(offsetof(struct tss_struct, io_bitmap_all) -	\
+	 offsetof(struct tss_struct, x86_tss))
+
 /*
  * The extra byte at the end is required by the hardware. It has all
  * bits set.
@@ -344,7 +348,7 @@ struct x86_hw_tss {
  * last valid byte
  */
 #define __KERNEL_TSS_LIMIT	\
-	(IO_BITMAP_OFFSET_VALID + IO_BITMAP_BYTES + 1 - 1)
+	(IO_BITMAP_OFFSET_VALID_ALL + IO_BITMAP_BYTES + 1 - 1)
 
 /* Base offset outside of TSS_LIMIT so unpriviledged IO causes #GP */
 #define IO_BITMAP_OFFSET_INVALID	(__KERNEL_TSS_LIMIT + 1)
@@ -390,6 +394,12 @@ struct tss_struct {
 	 */
 	unsigned char		io_bitmap_bytes[IO_BITMAP_BYTES + 1]
 				__aligned(sizeof(unsigned long));
+	/*
+	 * Special I/O bitmap to emulate IOPL(3). All bytes zero,
+	 * except the additional byte at the end.
+	 */
+	unsigned char		io_bitmap_all[IO_BITMAP_BYTES + 1]
+				__aligned(sizeof(unsigned long));
 } __aligned(PAGE_SIZE);
 
 DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw);
@@ -505,7 +515,13 @@ struct thread_struct {
 #endif
 	/* IO permissions: */
 	struct io_bitmap	*io_bitmap;
+
+	/*
+	 * IOPL. Priviledge level dependent I/O permission which includes
+	 * user space CLI/STI when granted.
+	 */
 	unsigned long		iopl;
+	unsigned long		iopl_emul;
 
 	mm_segment_t		addr_limit;
 
--- a/arch/x86/kernel/ioport.c
+++ b/arch/x86/kernel/ioport.c
@@ -27,15 +27,28 @@ void io_bitmap_share(struct task_struct
 	set_tsk_thread_flag(tsk, TIF_IO_BITMAP);
 }
 
+static void task_update_io_bitmap(void)
+{
+	struct thread_struct *t = &current->thread;
+
+	preempt_disable();
+	if (t->iopl_emul == 3 || t->io_bitmap) {
+		/* TSS update is handled on exit to user space */
+		set_thread_flag(TIF_IO_BITMAP);
+	} else {
+		clear_thread_flag(TIF_IO_BITMAP);
+		/* Invalidate TSS */
+		tss_update_io_bitmap();
+	}
+	preempt_enable();
+}
+
 void io_bitmap_exit(void)
 {
 	struct io_bitmap *iobm = current->thread.io_bitmap;
 
-	preempt_disable();
 	current->thread.io_bitmap = NULL;
-	clear_thread_flag(TIF_IO_BITMAP);
-	tss_update_io_bitmap();
-	preempt_enable();
+	task_update_io_bitmap();
 	if (iobm && refcount_dec_and_test(&iobm->refcnt))
 		kfree(iobm);
 }
@@ -151,36 +164,55 @@ SYSCALL_DEFINE3(ioperm, unsigned long, f
  */
 SYSCALL_DEFINE1(iopl, unsigned int, level)
 {
-	struct pt_regs *regs = current_pt_regs();
 	struct thread_struct *t = &current->thread;
+	struct pt_regs *regs = current_pt_regs();
+	unsigned int old;
 
 	/*
 	 * Careful: the IOPL bits in regs->flags are undefined under Xen PV
 	 * and changing them has no effect.
 	 */
-	unsigned int old = t->iopl >> X86_EFLAGS_IOPL_BIT;
+	if (IS_ENABLED(CONFIG_X86_IOPL_NONE))
+		return -ENOSYS;
 
 	if (level > 3)
 		return -EINVAL;
+
+	if (IS_ENABLED(CONFIG_X86_IOPL_EMULATION))
+		old = t->iopl_emul;
+	else
+		old = t->iopl >> X86_EFLAGS_IOPL_BIT;
+
+	/* No point in going further if nothing changes */
+	if (level == old)
+		return 0;
+
 	/* Trying to gain more privileges? */
 	if (level > old) {
 		if (!capable(CAP_SYS_RAWIO) ||
 		    security_locked_down(LOCKDOWN_IOPORT))
 			return -EPERM;
 	}
-	/*
-	 * Change the flags value on the return stack, which has been set
-	 * up on system-call entry. See also the fork and signal handling
-	 * code how this is handled.
-	 */
-	regs->flags = (regs->flags & ~X86_EFLAGS_IOPL) |
-		(level << X86_EFLAGS_IOPL_BIT);
-	/* Store the new level in the thread struct */
-	t->iopl = level << X86_EFLAGS_IOPL_BIT;
-	/*
-	 * X86_32 switches immediately and XEN handles it via emulation.
-	 */
-	set_iopl_mask(t->iopl);
+
+	if (IS_ENABLED(CONFIG_X86_IOPL_EMULATION)) {
+		t->iopl_emul = level;
+		task_update_io_bitmap();
+	} else {
+		/*
+		 * Change the flags value on the return stack, which has
+		 * been set up on system-call entry. See also the fork and
+		 * signal handling code how this is handled.
+		 */
+		regs->flags = (regs->flags & ~X86_EFLAGS_IOPL) |
+			(level << X86_EFLAGS_IOPL_BIT);
+		/* Store the new level in the thread struct */
+		t->iopl = level << X86_EFLAGS_IOPL_BIT;
+		/*
+		 * X86_32 switches immediately and XEN handles it via
+		 * emulation.
+		 */
+		set_iopl_mask(t->iopl);
+	}
 
 	return 0;
 }
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -368,22 +368,33 @@ static void tss_copy_io_bitmap(struct ts
 void tss_update_io_bitmap(void)
 {
 	struct tss_struct *tss = this_cpu_ptr(&cpu_tss_rw);
+	u16 *base = &tss->x86_tss.io_bitmap_base;
 
 	if (test_thread_flag(TIF_IO_BITMAP)) {
-		struct io_bitmap *iobm = current->thread.io_bitmap;
+		struct thread_struct *t = &current->thread;
 
 		/*
-		 * Only copy bitmap data when the bitmap or the sequence
-		 * number differs. The update time is accounted to the
-		 * incoming task.
+		 * IF IOPL emulation is enabled and the emulated I/O
+		 * priviledge level is 3, switch to the 'grant all' bitmap.
 		 */
-		if (tss->last_bitmap != iobm ||
-		    tss->last_sequence != iobm->sequence)
-			tss_copy_io_bitmap(tss, iobm);
+		if (IS_ENABLED(CONFIG_X86_IOPL_EMULATION) &&
+		    t->iopl_emul == 3) {
+			*base = IO_BITMAP_OFFSET_VALID_ALL;
+		} else {
+			struct io_bitmap *iobm = t->io_bitmap;
 
-		/* Enable the bitmap */
-		tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_VALID;
+			/*
+			 * Only copy bitmap data when the bitmap or the
+			 * sequence number differs. The update time is
+			 * accounted to the incoming task.
+			 */
+			if (tss->last_bitmap != iobm ||
+			    tss->last_sequence != iobm->sequence)
+				tss_copy_io_bitmap(tss, iobm);
 
+			/* Enable the bitmap */
+			*base = IO_BITMAP_OFFSET_VALID_MAP;
+		}
 		/*
 		 * Make sure that the TSS limit is covering the io bitmap.
 		 * It might have been cut down by a VMEXIT to 0x67 which



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [patch V2 15/16] x86/iopl: Remove legacy IOPL option
  2019-11-11 22:03 [patch V2 00/16] x86/iopl: Prevent user space from using CLI/STI with iopl(3) Thomas Gleixner
                   ` (13 preceding siblings ...)
  2019-11-11 22:03 ` [patch V2 14/16] x86/iopl: Restrict iopl() permission scope Thomas Gleixner
@ 2019-11-11 22:03 ` Thomas Gleixner
  2019-11-12 18:37   ` Andy Lutomirski
  2019-11-11 22:03 ` [patch V2 16/16] selftests/x86/iopl: Extend test to cover IOPL emulation Thomas Gleixner
  2019-11-12  7:40 ` [PATCH] x86/iopl: Factor out IO-bitmap related TSS fields into 'struct x86_io_bitmap' Ingo Molnar
  16 siblings, 1 reply; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-11 22:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

From: Thomas Gleixner <tglx@linutronix.de>

The IOPL emulation via the I/O bitmap is sufficient. Remove the legacy
cruft dealing with the (e)flags based IOPL mechanism.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com> (Paravirt and Xen parts)
---
V2: Adapted to changes in the previous patches.
---
 arch/x86/Kconfig                      |   10 +------
 arch/x86/include/asm/paravirt.h       |    4 --
 arch/x86/include/asm/paravirt_types.h |    2 -
 arch/x86/include/asm/processor.h      |   26 ++----------------
 arch/x86/include/asm/xen/hypervisor.h |    2 -
 arch/x86/kernel/ioport.c              |   47 +++++++---------------------------
 arch/x86/kernel/paravirt.c            |    2 -
 arch/x86/kernel/process_32.c          |    9 ------
 arch/x86/kernel/process_64.c          |   11 -------
 arch/x86/xen/enlighten_pv.c           |   10 -------
 10 files changed, 16 insertions(+), 107 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1269,14 +1269,8 @@ config X86_IOPL_EMULATION
 
 	  The emulation restricts the functionality of the syscall to
 	  only allowing the full range I/O port access, but prevents the
-	  ability to disable interrupts from user space.
-
-config X86_IOPL_LEGACY
-	bool "IOPL Legacy"
-	---help---
-	Allow the full IOPL permissions, i.e. user space access to all
-	65536 I/O ports and also the ability to disable interrupts, which
-	is overbroad and can result in system lockups.
+	  ability to disable interrupts from user space which would be
+	  granted if the hardware IOPL mechanism would be used.
 
 config X86_IOPL_NONE
 	bool "IOPL None"
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -294,10 +294,6 @@ static inline void write_idt_entry(gate_
 {
 	PVOP_VCALL3(cpu.write_idt_entry, dt, entry, g);
 }
-static inline void set_iopl_mask(unsigned mask)
-{
-	PVOP_VCALL1(cpu.set_iopl_mask, mask);
-}
 
 static inline void paravirt_activate_mm(struct mm_struct *prev,
 					struct mm_struct *next)
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -140,8 +140,6 @@ struct pv_cpu_ops {
 
 	void (*load_sp0)(unsigned long sp0);
 
-	void (*set_iopl_mask)(unsigned mask);
-
 	void (*wbinvd)(void);
 
 	/* cpuid emulation, mostly so that caps bits can be disabled */
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -517,10 +517,10 @@ struct thread_struct {
 	struct io_bitmap	*io_bitmap;
 
 	/*
-	 * IOPL. Priviledge level dependent I/O permission which includes
-	 * user space CLI/STI when granted.
+	 * IOPL. Priviledge level dependent I/O permission which is
+	 * emulated via the I/O bitmap to prevent user space from disabling
+	 * interrupts.
 	 */
-	unsigned long		iopl;
 	unsigned long		iopl_emul;
 
 	mm_segment_t		addr_limit;
@@ -553,25 +553,6 @@ static inline void arch_thread_struct_wh
  */
 #define TS_COMPAT		0x0002	/* 32bit syscall active (64BIT)*/
 
-/*
- * Set IOPL bits in EFLAGS from given mask
- */
-static inline void native_set_iopl_mask(unsigned mask)
-{
-#ifdef CONFIG_X86_32
-	unsigned int reg;
-
-	asm volatile ("pushfl;"
-		      "popl %0;"
-		      "andl %1, %0;"
-		      "orl %2, %0;"
-		      "pushl %0;"
-		      "popfl"
-		      : "=&r" (reg)
-		      : "i" (~X86_EFLAGS_IOPL), "r" (mask));
-#endif
-}
-
 static inline void
 native_load_sp0(unsigned long sp0)
 {
@@ -611,7 +592,6 @@ static inline void load_sp0(unsigned lon
 	native_load_sp0(sp0);
 }
 
-#define set_iopl_mask native_set_iopl_mask
 #endif /* CONFIG_PARAVIRT_XXL */
 
 /* Free all resources held by a thread. */
--- a/arch/x86/include/asm/xen/hypervisor.h
+++ b/arch/x86/include/asm/xen/hypervisor.h
@@ -62,6 +62,4 @@ void xen_arch_register_cpu(int num);
 void xen_arch_unregister_cpu(int num);
 #endif
 
-extern void xen_set_iopl_mask(unsigned mask);
-
 #endif /* _ASM_X86_XEN_HYPERVISOR_H */
--- a/arch/x86/kernel/ioport.c
+++ b/arch/x86/kernel/ioport.c
@@ -144,28 +144,23 @@ SYSCALL_DEFINE3(ioperm, unsigned long, f
 
 /*
  * The sys_iopl functionality depends on the level argument, which if
- * granted for the task is used by the CPU to check I/O instruction and
- * CLI/STI against the current priviledge level (CPL). If CPL is less than
- * or equal the tasks IOPL level the instructions take effect. If not a #GP
- * is raised. The default IOPL is 0, i.e. no permissions.
+ * granted for the task is used to enable access to all 65536 I/O ports.
  *
- * Setting IOPL to level 0-2 is disabling the userspace access. Only level
- * 3 enables it. If set it allows the user space thread:
+ * This does not use the IOPL mechanism provided by the CPU as that would
+ * also allow the user space task to use the CLI/STI instructions.
  *
- * - Unrestricted access to all 65535 I/O ports
- * - The usage of CLI/STI instructions
+ * Disabling interrupts in a user space task is dangerous as it might lock
+ * up the machine and the semantics vs. syscalls and exceptions is
+ * undefined.
  *
- * The advantage over ioperm is that the context switch does not require to
- * update the I/O bitmap which is especially true when a large number of
- * ports is accessed. But the allowance of CLI/STI in userspace is
- * considered a major problem.
+ * Setting IOPL to level 0-2 is disabling I/O permissions. Level 3
+ * 3 enables them.
  *
  * IOPL is strictly per thread and inherited on fork.
  */
 SYSCALL_DEFINE1(iopl, unsigned int, level)
 {
 	struct thread_struct *t = &current->thread;
-	struct pt_regs *regs = current_pt_regs();
 	unsigned int old;
 
 	/*
@@ -178,10 +173,7 @@ SYSCALL_DEFINE1(iopl, unsigned int, leve
 	if (level > 3)
 		return -EINVAL;
 
-	if (IS_ENABLED(CONFIG_X86_IOPL_EMULATION))
-		old = t->iopl_emul;
-	else
-		old = t->iopl >> X86_EFLAGS_IOPL_BIT;
+	old = t->iopl_emul;
 
 	/* No point in going further if nothing changes */
 	if (level == old)
@@ -194,25 +186,8 @@ SYSCALL_DEFINE1(iopl, unsigned int, leve
 			return -EPERM;
 	}
 
-	if (IS_ENABLED(CONFIG_X86_IOPL_EMULATION)) {
-		t->iopl_emul = level;
-		task_update_io_bitmap();
-	} else {
-		/*
-		 * Change the flags value on the return stack, which has
-		 * been set up on system-call entry. See also the fork and
-		 * signal handling code how this is handled.
-		 */
-		regs->flags = (regs->flags & ~X86_EFLAGS_IOPL) |
-			(level << X86_EFLAGS_IOPL_BIT);
-		/* Store the new level in the thread struct */
-		t->iopl = level << X86_EFLAGS_IOPL_BIT;
-		/*
-		 * X86_32 switches immediately and XEN handles it via
-		 * emulation.
-		 */
-		set_iopl_mask(t->iopl);
-	}
+	t->iopl_emul = level;
+	task_update_io_bitmap();
 
 	return 0;
 }
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -341,8 +341,6 @@ struct paravirt_patch_template pv_ops =
 	.cpu.iret		= native_iret,
 	.cpu.swapgs		= native_swapgs,
 
-	.cpu.set_iopl_mask	= native_set_iopl_mask,
-
 	.cpu.start_context_switch	= paravirt_nop,
 	.cpu.end_context_switch		= paravirt_nop,
 
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -187,15 +187,6 @@ EXPORT_SYMBOL_GPL(start_thread);
 	 */
 	load_TLS(next, cpu);
 
-	/*
-	 * Restore IOPL if needed.  In normal use, the flags restore
-	 * in the switch assembly will handle this.  But if the kernel
-	 * is running virtualized at a non-zero CPL, the popf will
-	 * not restore flags, so it must be done in a separate step.
-	 */
-	if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl))
-		set_iopl_mask(next->iopl);
-
 	switch_to_extra(prev_p, next_p);
 
 	/*
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -497,17 +497,6 @@ void compat_start_thread(struct pt_regs
 
 	switch_to_extra(prev_p, next_p);
 
-#ifdef CONFIG_XEN_PV
-	/*
-	 * On Xen PV, IOPL bits in pt_regs->flags have no effect, and
-	 * current_pt_regs()->flags may not match the current task's
-	 * intended IOPL.  We need to switch it manually.
-	 */
-	if (unlikely(static_cpu_has(X86_FEATURE_XENPV) &&
-		     prev->iopl != next->iopl))
-		xen_set_iopl_mask(next->iopl);
-#endif
-
 	if (static_cpu_has_bug(X86_BUG_SYSRET_SS_ATTRS)) {
 		/*
 		 * AMD CPUs have a misfeature: SYSRET sets the SS selector but
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -837,15 +837,6 @@ static void xen_load_sp0(unsigned long s
 	this_cpu_write(cpu_tss_rw.x86_tss.sp0, sp0);
 }
 
-void xen_set_iopl_mask(unsigned mask)
-{
-	struct physdev_set_iopl set_iopl;
-
-	/* Force the change at ring 0. */
-	set_iopl.iopl = (mask == 0) ? 1 : (mask >> 12) & 3;
-	HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl);
-}
-
 static void xen_io_delay(void)
 {
 }
@@ -1055,7 +1046,6 @@ static const struct pv_cpu_ops xen_cpu_o
 	.write_idt_entry = xen_write_idt_entry,
 	.load_sp0 = xen_load_sp0,
 
-	.set_iopl_mask = xen_set_iopl_mask,
 	.io_delay = xen_io_delay,
 
 	/* Xen takes care of %gs when switching to usermode for us */



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [patch V2 16/16] selftests/x86/iopl: Extend test to cover IOPL emulation
  2019-11-11 22:03 [patch V2 00/16] x86/iopl: Prevent user space from using CLI/STI with iopl(3) Thomas Gleixner
                   ` (14 preceding siblings ...)
  2019-11-11 22:03 ` [patch V2 15/16] x86/iopl: Remove legacy IOPL option Thomas Gleixner
@ 2019-11-11 22:03 ` Thomas Gleixner
  2019-11-12  7:40 ` [PATCH] x86/iopl: Factor out IO-bitmap related TSS fields into 'struct x86_io_bitmap' Ingo Molnar
  16 siblings, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-11 22:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

From: Thomas Gleixner <tglx@linutronix.de>

Add tests that the now emulated iopl() functionality:

    - does not longer allow user space to disable interrupts.

    - does restore a I/O bitmap when IOPL is dropped

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 tools/testing/selftests/x86/iopl.c |  129 +++++++++++++++++++++++++++++++++----
 1 file changed, 118 insertions(+), 11 deletions(-)

--- a/tools/testing/selftests/x86/iopl.c
+++ b/tools/testing/selftests/x86/iopl.c
@@ -35,6 +35,16 @@ static void sethandler(int sig, void (*h
 
 }
 
+static void clearhandler(int sig)
+{
+	struct sigaction sa;
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_handler = SIG_DFL;
+	sigemptyset(&sa.sa_mask);
+	if (sigaction(sig, &sa, 0))
+		err(1, "sigaction");
+}
+
 static jmp_buf jmpbuf;
 
 static void sigsegv(int sig, siginfo_t *si, void *ctx_void)
@@ -42,25 +52,128 @@ static void sigsegv(int sig, siginfo_t *
 	siglongjmp(jmpbuf, 1);
 }
 
+static bool try_outb(unsigned short port)
+{
+	sethandler(SIGSEGV, sigsegv, SA_RESETHAND);
+	if (sigsetjmp(jmpbuf, 1) != 0) {
+		return false;
+	} else {
+		asm volatile ("outb %%al, %w[port]"
+			      : : [port] "Nd" (port), "a" (0));
+		return true;
+	}
+	clearhandler(SIGSEGV);
+}
+
+static void expect_ok_outb(unsigned short port)
+{
+	if (!try_outb(port)) {
+		printf("[FAIL]\toutb to 0x%02hx failed\n", port);
+		exit(1);
+	}
+
+	printf("[OK]\toutb to 0x%02hx worked\n", port);
+}
+
+static void expect_gp_outb(unsigned short port)
+{
+	if (try_outb(port)) {
+		printf("[FAIL]\toutb to 0x%02hx worked\n", port);
+		nerrs++;
+	}
+
+	printf("[OK]\toutb to 0x%02hx failed\n", port);
+}
+
+static bool try_cli(void)
+{
+	sethandler(SIGSEGV, sigsegv, SA_RESETHAND);
+	if (sigsetjmp(jmpbuf, 1) != 0) {
+		return false;
+	} else {
+		asm volatile ("cli");
+		return true;
+	}
+	clearhandler(SIGSEGV);
+}
+
+static bool try_sti(void)
+{
+	sethandler(SIGSEGV, sigsegv, SA_RESETHAND);
+	if (sigsetjmp(jmpbuf, 1) != 0) {
+		return false;
+	} else {
+		asm volatile ("sti");
+		return true;
+	}
+	clearhandler(SIGSEGV);
+}
+
+static void expect_gp_sti(void)
+{
+	if (try_sti()) {
+		printf("[FAIL]\tSTI worked\n");
+		nerrs++;
+	} else {
+		printf("[OK]\tSTI faulted\n");
+	}
+}
+
+static void expect_gp_cli(void)
+{
+	if (try_cli()) {
+		printf("[FAIL]\tCLI worked\n");
+		nerrs++;
+	} else {
+		printf("[OK]\tCLI faulted\n");
+	}
+}
+
 int main(void)
 {
 	cpu_set_t cpuset;
+
 	CPU_ZERO(&cpuset);
 	CPU_SET(0, &cpuset);
 	if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0)
 		err(1, "sched_setaffinity to CPU 0");
 
 	/* Probe for iopl support.  Note that iopl(0) works even as nonroot. */
-	if (iopl(3) != 0) {
+	switch(iopl(3)) {
+	case 0:
+		break;
+	case -ENOSYS:
+		printf("[OK]\tiopl() nor supported\n");
+		return 0;
+	default:
 		printf("[OK]\tiopl(3) failed (%d) -- try running as root\n",
 		       errno);
 		return 0;
 	}
 
-	/* Restore our original state prior to starting the test. */
+	/* Make sure that CLI/STI are blocked even with IOPL level 3 */
+	expect_gp_cli();
+	expect_gp_sti();
+	expect_ok_outb(0x80);
+
+	/* Establish an I/O bitmap to test the restore */
+	if (ioperm(0x80, 1, 1) != 0)
+		err(1, "ioperm(0x80, 1, 1) failed\n");
+
+	/* Restore our original state prior to starting the fork test. */
 	if (iopl(0) != 0)
 		err(1, "iopl(0)");
 
+	/*
+	 * Verify that IOPL emulation is disabled and the I/O bitmap still
+	 * works.
+	 */
+	expect_ok_outb(0x80);
+	expect_gp_outb(0xed);
+	/* Drop the I/O bitmap */
+	if (ioperm(0x80, 1, 0) != 0)
+		err(1, "ioperm(0x80, 1, 0) failed\n");
+
 	pid_t child = fork();
 	if (child == -1)
 		err(1, "fork");
@@ -90,14 +203,9 @@ int main(void)
 
 	printf("[RUN]\tparent: write to 0x80 (should fail)\n");
 
-	sethandler(SIGSEGV, sigsegv, 0);
-	if (sigsetjmp(jmpbuf, 1) != 0) {
-		printf("[OK]\twrite was denied\n");
-	} else {
-		asm volatile ("outb %%al, $0x80" : : "a" (0));
-		printf("[FAIL]\twrite was allowed\n");
-		nerrs++;
-	}
+	expect_gp_outb(0x80);
+	expect_gp_cli();
+	expect_gp_sti();
 
 	/* Test the capability checks. */
 	printf("\tiopl(3)\n");
@@ -133,4 +241,3 @@ int main(void)
 done:
 	return nerrs ? 1 : 0;
 }
-



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 04/16] x86/tss: Fix and move VMX BUILD_BUG_ON()
  2019-11-11 22:03 ` [patch V2 04/16] x86/tss: Fix and move VMX BUILD_BUG_ON() Thomas Gleixner
@ 2019-11-11 22:44   ` Paolo Bonzini
  2019-11-12 15:37   ` Andy Lutomirski
  1 sibling, 0 replies; 57+ messages in thread
From: Paolo Bonzini @ 2019-11-11 22:44 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

On 11/11/19 23:03, Thomas Gleixner wrote:
> The BUILD_BUG_ON(IO_BITMAP_OFFSET - 1 == 0x67) in the VMX code is bogus in
> two aspects:
> 
> 1) This wants to be in generic x86 code simply to catch issues even when
>    VMX is disabled in Kconfig.
> 
> 2) The IO_BITMAP_OFFSET is not the right thing to check because it makes
>    asssumptions about the layout of tss_struct. Nothing requires that the
>    I/O bitmap is placed right after x86_tss, which is the hardware mandated
>    tss structure. It pointlessly makes restrictions on the struct
>    tss_struct layout.
> 
> The proper thing to check is:
> 
>     - Offset of x86_tss in tss_struct is 0
>     - Size of x86_tss == 0x68
> 
> Move it to the other build time TSS checks and make it do the right thing.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> ---
> V2: New patch
> ---
>  arch/x86/kvm/vmx/vmx.c       |    8 --------
>  arch/x86/mm/cpu_entry_area.c |    8 ++++++++
>  2 files changed, 8 insertions(+), 8 deletions(-)
> 
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1338,14 +1338,6 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu
>  			    (unsigned long)&get_cpu_entry_area(cpu)->tss.x86_tss);
>  		vmcs_writel(HOST_GDTR_BASE, (unsigned long)gdt);   /* 22.2.4 */
>  
> -		/*
> -		 * VM exits change the host TR limit to 0x67 after a VM
> -		 * exit.  This is okay, since 0x67 covers everything except
> -		 * the IO bitmap and have have code to handle the IO bitmap
> -		 * being lost after a VM exit.
> -		 */
> -		BUILD_BUG_ON(IO_BITMAP_OFFSET - 1 != 0x67);
> -
>  		rdmsrl(MSR_IA32_SYSENTER_ESP, sysenter_esp);
>  		vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
>  
> --- a/arch/x86/mm/cpu_entry_area.c
> +++ b/arch/x86/mm/cpu_entry_area.c
> @@ -161,6 +161,14 @@ static void __init setup_cpu_entry_area(
>  	BUILD_BUG_ON((offsetof(struct tss_struct, x86_tss) ^
>  		      offsetofend(struct tss_struct, x86_tss)) & PAGE_MASK);
>  	BUILD_BUG_ON(sizeof(struct tss_struct) % PAGE_SIZE != 0);
> +	/*
> +	 * VMX changes the host TR limit to 0x67 after a VM exit. This is
> +	 * okay, since 0x67 covers the size of struct x86_hw_tss. Make sure
> +	 * that this is correct.
> +	 */
> +	BUILD_BUG_ON(offsetof(struct tss_struct, x86_tss) != 0);
> +	BUILD_BUG_ON(sizeof(struct x86_hw_tss) != 0x68);
> +
>  	cea_map_percpu_pages(&cea->tss, &per_cpu(cpu_tss_rw, cpu),
>  			     sizeof(struct tss_struct) / PAGE_SIZE, tss_prot);
>  
> 
> 

Acked-by: Paolo Bonzini <pbonzini@redhat.com>


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 14/16] x86/iopl: Restrict iopl() permission scope
  2019-11-11 22:03 ` [patch V2 14/16] x86/iopl: Restrict iopl() permission scope Thomas Gleixner
@ 2019-11-11 23:03   ` Thomas Gleixner
  2019-11-12  6:32     ` Ingo Molnar
  2019-11-12  8:42   ` Ingo Molnar
  2019-11-12 18:35   ` Andy Lutomirski
  2 siblings, 1 reply; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-11 23:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

On Mon, 11 Nov 2019, Thomas Gleixner wrote:
> --- a/arch/x86/kernel/ioport.c
> +++ b/arch/x86/kernel/ioport.c
> @@ -27,15 +27,28 @@ void io_bitmap_share(struct task_struct
>  	set_tsk_thread_flag(tsk, TIF_IO_BITMAP);
>  }
>  
> +static void task_update_io_bitmap(void)
> +{
> +	struct thread_struct *t = &current->thread;
> +
> +	preempt_disable();
> +	if (t->iopl_emul == 3 || t->io_bitmap) {
> +		/* TSS update is handled on exit to user space */
> +		set_thread_flag(TIF_IO_BITMAP);
> +	} else {
> +		clear_thread_flag(TIF_IO_BITMAP);
> +		/* Invalidate TSS */
> +		tss_update_io_bitmap();
> +	}
> +	preempt_enable();
> +}
> +
>  void io_bitmap_exit(void)
>  {
>  	struct io_bitmap *iobm = current->thread.io_bitmap;
>  
> -	preempt_disable();
>  	current->thread.io_bitmap = NULL;
> -	clear_thread_flag(TIF_IO_BITMAP);
> -	tss_update_io_bitmap();
> -	preempt_enable();
> +	task_update_io_bitmap();
>  	if (iobm && refcount_dec_and_test(&iobm->refcnt))
>  		kfree(iobm);

This obviously needs the following delta to be folded in. Noticed too late
after fiddling with the test case some more. git tree is updated
accordingly.

Thanks,

	tglx
---
--- a/arch/x86/kernel/ioport.c
+++ b/arch/x86/kernel/ioport.c
@@ -18,12 +18,15 @@ static atomic64_t io_bitmap_sequence;
 
 void io_bitmap_share(struct task_struct *tsk)
  {
-	/*
-	 * Take a refcount on current's bitmap. It can be used by
-	 * both tasks as long as none of them changes the bitmap.
-	 */
-	refcount_inc(&current->thread.io_bitmap->refcnt);
-	tsk->thread.io_bitmap = current->thread.io_bitmap;
+	 /* Can be NULL when current->thread.iopl_emul == 3 */
+	 if (current->thread.io_bitmap) {
+		 /*
+		  * Take a refcount on current's bitmap. It can be used by
+		  * both tasks as long as none of them changes the bitmap.
+		  */
+		 refcount_inc(&current->thread.io_bitmap->refcnt);
+		 tsk->thread.io_bitmap = current->thread.io_bitmap;
+	 }
 	set_tsk_thread_flag(tsk, TIF_IO_BITMAP);
 }
 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 14/16] x86/iopl: Restrict iopl() permission scope
  2019-11-11 23:03   ` Thomas Gleixner
@ 2019-11-12  6:32     ` Ingo Molnar
  0 siblings, 0 replies; 57+ messages in thread
From: Ingo Molnar @ 2019-11-12  6:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin


* Thomas Gleixner <tglx@linutronix.de> wrote:

> --- a/arch/x86/kernel/ioport.c
> +++ b/arch/x86/kernel/ioport.c
> @@ -18,12 +18,15 @@ static atomic64_t io_bitmap_sequence;
>  
>  void io_bitmap_share(struct task_struct *tsk)
>   {
> -	/*
> -	 * Take a refcount on current's bitmap. It can be used by
> -	 * both tasks as long as none of them changes the bitmap.
> -	 */
> -	refcount_inc(&current->thread.io_bitmap->refcnt);
> -	tsk->thread.io_bitmap = current->thread.io_bitmap;
> +	 /* Can be NULL when current->thread.iopl_emul == 3 */
> +	 if (current->thread.io_bitmap) {
> +		 /*
> +		  * Take a refcount on current's bitmap. It can be used by
> +		  * both tasks as long as none of them changes the bitmap.
> +		  */
> +		 refcount_inc(&current->thread.io_bitmap->refcnt);
> +		 tsk->thread.io_bitmap = current->thread.io_bitmap;
> +	 }

Minor side note: whitespace damage managed to slip in that code, see the 
fix below.

Thanks,

	Ingo

 arch/x86/kernel/ioport.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/ioport.c b/arch/x86/kernel/ioport.c
index f87844e22ec9..ee37a1c25ecc 100644
--- a/arch/x86/kernel/ioport.c
+++ b/arch/x86/kernel/ioport.c
@@ -17,16 +17,16 @@
 static atomic64_t io_bitmap_sequence;
 
 void io_bitmap_share(struct task_struct *tsk)
- {
-	 /* Can be NULL when current->thread.iopl_emul == 3 */
-	 if (current->thread.io_bitmap) {
-		 /*
-		  * Take a refcount on current's bitmap. It can be used by
-		  * both tasks as long as none of them changes the bitmap.
-		  */
-		 refcount_inc(&current->thread.io_bitmap->refcnt);
-		 tsk->thread.io_bitmap = current->thread.io_bitmap;
-	 }
+{
+	/* Can be NULL when current->thread.iopl_emul == 3 */
+	if (current->thread.io_bitmap) {
+		/*
+		 * Take a refcount on current's bitmap. It can be used by
+		 * both tasks as long as none of them changes the bitmap.
+		 */
+		refcount_inc(&current->thread.io_bitmap->refcnt);
+		tsk->thread.io_bitmap = current->thread.io_bitmap;
+	}
 	set_tsk_thread_flag(tsk, TIF_IO_BITMAP);
 }
 

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [patch V2 11/16] x86/ioperm: Share I/O bitmap if identical
  2019-11-11 22:03 ` [patch V2 11/16] x86/ioperm: Share I/O bitmap if identical Thomas Gleixner
@ 2019-11-12  7:14   ` Ingo Molnar
  2019-11-12  7:17     ` Thomas Gleixner
  2019-11-12  9:15   ` Peter Zijlstra
  2019-11-12 18:12   ` Andy Lutomirski
  2 siblings, 1 reply; 57+ messages in thread
From: Ingo Molnar @ 2019-11-12  7:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin


* Thomas Gleixner <tglx@linutronix.de> wrote:

> The I/O bitmap is duplicated on fork. That's wasting memory and slows down
> fork. There is no point to do so. As long as the bitmap is not modified it
> can be shared between threads and processes.
> 
> Add a refcount and just share it on fork. If a task modifies the bitmap
> then it has to do the duplication if and only if it is shared.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V2: New patch
> ---
>  arch/x86/include/asm/iobitmap.h |    5 +++++
>  arch/x86/kernel/ioport.c        |   38 ++++++++++++++++++++++++++++++++------
>  arch/x86/kernel/process.c       |   39 ++++++---------------------------------
>  3 files changed, 43 insertions(+), 39 deletions(-)
> 
> --- a/arch/x86/include/asm/iobitmap.h
> +++ b/arch/x86/include/asm/iobitmap.h
> @@ -2,10 +2,12 @@
>  #ifndef _ASM_X86_IOBITMAP_H
>  #define _ASM_X86_IOBITMAP_H
>  
> +#include <linux/refcount.h>
>  #include <asm/processor.h>
>  
>  struct io_bitmap {
>  	u64			sequence;
> +	refcount_t		refcnt;
>  	unsigned int		io_bitmap_max;
>  	union {
>  		unsigned long	bits[IO_BITMAP_LONGS];

> +void io_bitmap_share(struct task_struct *tsk)
> + {
> +	/*
> +	 * Take a refcount on current's bitmap. It can be used by
> +	 * both tasks as long as none of them changes the bitmap.
> +	 */
> +	refcount_inc(&current->thread.io_bitmap->refcnt);
> +	tsk->thread.io_bitmap = current->thread.io_bitmap;
> +	set_tsk_thread_flag(tsk, TIF_IO_BITMAP);
> +}

Ok, this is really neat. I suspect there might be some pathological cases 
on ancient NUMA systems with a really high NUMA factor and bad caching 
where this new sharing might regress performance, but I doubt this 
matters, as both the hardware and this software functionality is legacy.


> +	/*
> +	 * If the bitmap is not shared, then nothing can take a refcount as
> +	 * current can obviously not fork at the same time. If it's shared
> +	 * duplicate it and drop the refcount on the original one.
> +	 */
> +	if (refcount_read(&iobm->refcnt) > 1) {
> +		iobm = kmemdup(iobm, sizeof(*iobm), GFP_KERNEL);
> +		if (!iobm)
> +			return -ENOMEM;
> +		io_bitmap_exit();
>  	}
>  
> +	/* Set the tasks io_bitmap pointer (might be the same) */

speling nit:

s/tasks
 /task's

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 11/16] x86/ioperm: Share I/O bitmap if identical
  2019-11-12  7:14   ` Ingo Molnar
@ 2019-11-12  7:17     ` Thomas Gleixner
  2019-11-12  7:52       ` Ingo Molnar
  0 siblings, 1 reply; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-12  7:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

On Tue, 12 Nov 2019, Ingo Molnar wrote:
> * Thomas Gleixner <tglx@linutronix.de> wrote:
> > +void io_bitmap_share(struct task_struct *tsk)
> > + {
> > +	/*
> > +	 * Take a refcount on current's bitmap. It can be used by
> > +	 * both tasks as long as none of them changes the bitmap.
> > +	 */
> > +	refcount_inc(&current->thread.io_bitmap->refcnt);
> > +	tsk->thread.io_bitmap = current->thread.io_bitmap;
> > +	set_tsk_thread_flag(tsk, TIF_IO_BITMAP);
> > +}
> 
> Ok, this is really neat. I suspect there might be some pathological cases 
> on ancient NUMA systems with a really high NUMA factor and bad caching 
> where this new sharing might regress performance, but I doubt this 
> matters, as both the hardware and this software functionality is legacy.

Definitely.

> > +	/*
> > +	 * If the bitmap is not shared, then nothing can take a refcount as
> > +	 * current can obviously not fork at the same time. If it's shared
> > +	 * duplicate it and drop the refcount on the original one.
> > +	 */
> > +	if (refcount_read(&iobm->refcnt) > 1) {
> > +		iobm = kmemdup(iobm, sizeof(*iobm), GFP_KERNEL);
> > +		if (!iobm)
> > +			return -ENOMEM;
> > +		io_bitmap_exit();
> >  	}
> >  
> > +	/* Set the tasks io_bitmap pointer (might be the same) */
> 
> speling nit:

s/speling/spelling/ :)
 
> s/tasks
>  /task's

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH] x86/iopl: Factor out IO-bitmap related TSS fields into 'struct x86_io_bitmap'
  2019-11-11 22:03 [patch V2 00/16] x86/iopl: Prevent user space from using CLI/STI with iopl(3) Thomas Gleixner
                   ` (15 preceding siblings ...)
  2019-11-11 22:03 ` [patch V2 16/16] selftests/x86/iopl: Extend test to cover IOPL emulation Thomas Gleixner
@ 2019-11-12  7:40 ` Ingo Molnar
  2019-11-12  7:59   ` [PATCH] x86/iopl: Harmonize 'struct io_bitmap' and 'struct x86_io_bitmap' nomenclature Ingo Molnar
                     ` (2 more replies)
  16 siblings, 3 replies; 57+ messages in thread
From: Ingo Molnar @ 2019-11-12  7:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin


* Thomas Gleixner <tglx@linutronix.de> wrote:

> This is the second version of the attempt to confine the unwanted side
> effects of iopl(). The first version of this series can be found here:
> 
>    https://lore.kernel.org/r/20191106193459.581614484@linutronix.de
> 
> The V1 cover letter also contains a longer variant of the
> background. Summary:
> 
> iopl(level = 3) enables aside of access to all 65536 I/O ports also the
> usage of CLI/STI in user space.
> 
> Disabling interrupts in user space can lead to system lockups and breaks
> assumptions in the kernel that userspace always runs with interrupts
> enabled.
> 
> iopl() is often preferred over ioperm() as it avoids the overhead of
> copying the tasks I/O bitmap to the TSS bitmap on context switch. This
> overhead can be avoided by providing a all zeroes bitmap in the TSS and
> switching the TSS bitmap offset to this permit all IO bitmap. It's
> marginally slower than iopl() which is a one time setup, but prevents the
> usage of CLI/STI in user space.
> 
> The changes vs. V1:
> 
>     - Fix the reported fallout on 32bit (0-day/Ingo)
> 
>     - Implement a sequence count based conditional update (Linus)
> 
>     - Drop the copy optimization
> 
>     - Move the bitmap copying out of the context switch into the exit to
>       user mode machinery. The context switch merely invalidates the TSS
>       bitmap offset when a task using an I/O bitmap gets scheduled out.
> 
>     - Move all bitmap information into a data structure to avoid adding
>       more fields to thread_struct.
> 
>     - Add a refcount so the potentially pointless duplication of the bitmap
>       at fork can be avoided. 
> 
>     - Better sharing of update functions (Andy)
> 
>     - More updates to self tests to verify the share/unshare mechanism and
>       the restore of an I/O bitmap when iopl() permissions are dropped.
> 
>     - Pick up a few acked/reviewed-by tags as applicable

>  23 files changed, 614 insertions(+), 459 deletions(-)

Ok, this new version is much easier on the eyes.

There's now a bigger collection of various x86_tss_struct fields related 
to the IO-bitmap, and the organization of those fields is still a bit 
idiosyncratic - for example tss->last_sequence doesn't tell us that it's 
bitmap related.

How about the patch below?

It reorganizes all those fields into a new container structure, 'struct 
x86_io_bitmap', adds it as tss.io_bitmap, and uses the prefix to shorten 
and organize the names of the fields:

   tss.last_bitmap         =>  tss.io_bitmap.last_bitmap
   tss.last_sequence       =>  tss.io_bitmap.last_sequence
   tss.io_bitmap_prev_max  =>  tss.io_bitmap.prev_max
   tss.io_bitmap_bytes     =>  tss.io_bitmap.map_bytes
   tss.io_bitmap_all       =>  tss.io_bitmap.map_all

This makes it far more readable, and the local variable references as 
short and tidy:

   iobm->last_bitmap
   iobm->last_sequence
   iobm->prev_max
   iobm->map_bytes
   iobm->map_all

Only build tested.

Thanks,

	Ingo

 arch/x86/include/asm/processor.h | 38 +++++++++++++++++++++++---------------
 arch/x86/kernel/cpu/common.c     |  6 +++---
 arch/x86/kernel/process.c        | 14 +++++++-------
 3 files changed, 33 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 933f0b9b1cd7..4358ae63c252 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -333,11 +333,11 @@ struct x86_hw_tss {
 #define IO_BITMAP_LONGS			(IO_BITMAP_BYTES / sizeof(long))
 
 #define IO_BITMAP_OFFSET_VALID_MAP			\
-	(offsetof(struct tss_struct, io_bitmap_bytes) -	\
+	(offsetof(struct tss_struct, io_bitmap.map_bytes) -	\
 	 offsetof(struct tss_struct, x86_tss))
 
 #define IO_BITMAP_OFFSET_VALID_ALL			\
-	(offsetof(struct tss_struct, io_bitmap_all) -	\
+	(offsetof(struct tss_struct, io_bitmap.map_all) -	\
 	 offsetof(struct tss_struct, x86_tss))
 
 /*
@@ -361,14 +361,11 @@ struct entry_stack_page {
 	struct entry_stack stack;
 } __aligned(PAGE_SIZE);
 
-struct tss_struct {
-	/*
-	 * The fixed hardware portion.  This must not cross a page boundary
-	 * at risk of violating the SDM's advice and potentially triggering
-	 * errata.
-	 */
-	struct x86_hw_tss	x86_tss;
-
+/*
+ * All IO bitmap related data stored in the TSS:
+ */
+struct x86_io_bitmap
+{
 	/*
 	 * The bitmap pointer and the sequence number of the last active
 	 * bitmap. last_bitmap cannot be dereferenced. It's solely for
@@ -384,7 +381,7 @@ struct tss_struct {
 	 * outside of the TSS limit. So for sane tasks there is no need to
 	 * actually touch the io_bitmap at all.
 	 */
-	unsigned int		io_bitmap_prev_max;
+	unsigned int		prev_max;
 
 	/*
 	 * The extra 1 is there because the CPU will access an
@@ -392,14 +389,25 @@ struct tss_struct {
 	 * bitmap. The extra byte must be all 1 bits, and must
 	 * be within the limit.
 	 */
-	unsigned char		io_bitmap_bytes[IO_BITMAP_BYTES + 1]
-				__aligned(sizeof(unsigned long));
+	unsigned char		map_bytes[IO_BITMAP_BYTES + 1] __aligned(sizeof(unsigned long));
+
 	/*
 	 * Special I/O bitmap to emulate IOPL(3). All bytes zero,
 	 * except the additional byte at the end.
 	 */
-	unsigned char		io_bitmap_all[IO_BITMAP_BYTES + 1]
-				__aligned(sizeof(unsigned long));
+	unsigned char		map_all[IO_BITMAP_BYTES + 1] __aligned(sizeof(unsigned long));
+};
+
+struct tss_struct {
+	/*
+	 * The fixed hardware portion.  This must not cross a page boundary
+	 * at risk of violating the SDM's advice and potentially triggering
+	 * errata.
+	 */
+	struct x86_hw_tss	x86_tss;
+
+	struct x86_io_bitmap	io_bitmap;
+
 } __aligned(PAGE_SIZE);
 
 DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index dfbe6fce04f3..eea0e3170de4 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1861,9 +1861,9 @@ void cpu_init(void)
 	/* Initialize the TSS. */
 	tss_setup_ist(tss);
 	tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID;
-	tss->last_bitmap = NULL;
-	tss->io_bitmap_prev_max = 0;
-	memset(tss->io_bitmap_bytes, 0xff, sizeof(tss->io_bitmap_bytes));
+	tss->io_bitmap.last_bitmap = NULL;
+	tss->io_bitmap.prev_max = 0;
+	memset(tss->io_bitmap.map_bytes, 0xff, sizeof(tss->io_bitmap.map_bytes));
 	set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
 
 	load_TR_desc();
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index ccb48f4dab75..8179f3ee6a55 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -350,16 +350,16 @@ static void tss_copy_io_bitmap(struct tss_struct *tss, struct io_bitmap *iobm)
 	 * permitted, then the copy needs to cover those as well so they
 	 * get turned off.
 	 */
-	memcpy(tss->io_bitmap_bytes, iobm->bitmap_bytes,
-	       max(tss->io_bitmap_prev_max, iobm->io_bitmap_max));
+	memcpy(tss->io_bitmap.map_bytes, iobm->bitmap_bytes,
+	       max(tss->io_bitmap.prev_max, iobm->io_bitmap_max));
 
 	/*
 	 * Store the new max and the sequence number of this bitmap
 	 * and a pointer to the bitmap itself.
 	 */
-	tss->io_bitmap_prev_max = iobm->io_bitmap_max;
-	tss->last_sequence = iobm->sequence;
-	tss->last_bitmap = iobm;
+	tss->io_bitmap.prev_max = iobm->io_bitmap_max;
+	tss->io_bitmap.last_sequence = iobm->sequence;
+	tss->io_bitmap.last_bitmap = iobm;
 }
 
 /**
@@ -388,8 +388,8 @@ void tss_update_io_bitmap(void)
 			 * sequence number differs. The update time is
 			 * accounted to the incoming task.
 			 */
-			if (tss->last_bitmap != iobm ||
-			    tss->last_sequence != iobm->sequence)
+			if (tss->io_bitmap.last_bitmap != iobm ||
+			    tss->io_bitmap.last_sequence != iobm->sequence)
 				tss_copy_io_bitmap(tss, iobm);
 
 			/* Enable the bitmap */

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [patch V2 11/16] x86/ioperm: Share I/O bitmap if identical
  2019-11-12  7:17     ` Thomas Gleixner
@ 2019-11-12  7:52       ` Ingo Molnar
  0 siblings, 0 replies; 57+ messages in thread
From: Ingo Molnar @ 2019-11-12  7:52 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin


* Thomas Gleixner <tglx@linutronix.de> wrote:

> On Tue, 12 Nov 2019, Ingo Molnar wrote:
> > * Thomas Gleixner <tglx@linutronix.de> wrote:
> > > +void io_bitmap_share(struct task_struct *tsk)
> > > + {
> > > +	/*
> > > +	 * Take a refcount on current's bitmap. It can be used by
> > > +	 * both tasks as long as none of them changes the bitmap.
> > > +	 */
> > > +	refcount_inc(&current->thread.io_bitmap->refcnt);
> > > +	tsk->thread.io_bitmap = current->thread.io_bitmap;
> > > +	set_tsk_thread_flag(tsk, TIF_IO_BITMAP);
> > > +}
> > 
> > Ok, this is really neat. I suspect there might be some pathological cases 
> > on ancient NUMA systems with a really high NUMA factor and bad caching 
> > where this new sharing might regress performance, but I doubt this 
> > matters, as both the hardware and this software functionality is legacy.
> 
> Definitely.
> 
> > > +	/*
> > > +	 * If the bitmap is not shared, then nothing can take a refcount as
> > > +	 * current can obviously not fork at the same time. If it's shared
> > > +	 * duplicate it and drop the refcount on the original one.
> > > +	 */
> > > +	if (refcount_read(&iobm->refcnt) > 1) {
> > > +		iobm = kmemdup(iobm, sizeof(*iobm), GFP_KERNEL);
> > > +		if (!iobm)
> > > +			return -ENOMEM;
> > > +		io_bitmap_exit();
> > >  	}
> > >  
> > > +	/* Set the tasks io_bitmap pointer (might be the same) */
> > 
> > speling nit:
> 
> s/speling/spelling/ :)

Was a lame attempt at a self-depreciating joke ;-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH] x86/iopl: Harmonize 'struct io_bitmap' and 'struct x86_io_bitmap' nomenclature
  2019-11-12  7:40 ` [PATCH] x86/iopl: Factor out IO-bitmap related TSS fields into 'struct x86_io_bitmap' Ingo Molnar
@ 2019-11-12  7:59   ` Ingo Molnar
  2019-11-12  8:11   ` [PATCH] x86/iopl: Clear up the role of the two bitmap copying fields Ingo Molnar
  2019-11-12  8:15   ` [PATCH] x86/iopl: Rename <asm/iobitmap.h> to <asm/io_bitmap.h> Ingo Molnar
  2 siblings, 0 replies; 57+ messages in thread
From: Ingo Molnar @ 2019-11-12  7:59 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin


* Ingo Molnar <mingo@kernel.org> wrote:

> How about the patch below?
> 
> It reorganizes all those fields into a new container structure, 'struct 
> x86_io_bitmap', adds it as tss.io_bitmap, and uses the prefix to shorten 
> and organize the names of the fields:
> 
>    tss.last_bitmap         =>  tss.io_bitmap.last_bitmap
>    tss.last_sequence       =>  tss.io_bitmap.last_sequence
>    tss.io_bitmap_prev_max  =>  tss.io_bitmap.prev_max
>    tss.io_bitmap_bytes     =>  tss.io_bitmap.map_bytes
>    tss.io_bitmap_all       =>  tss.io_bitmap.map_all
> 
> This makes it far more readable, and the local variable references as 
> short and tidy:
> 
>    iobm->last_bitmap
>    iobm->last_sequence
>    iobm->prev_max
>    iobm->map_bytes
>    iobm->map_all
> 
> Only build tested.

On top of this we can now do the slight reorganization of field names 
below.

    io_bitmap.io_bitmap_max => bytes_max
    x86_io_bitmap.prev_max => bytes_max


The point is to harmonize the fields in 'struct io_bitmap' with 'struct 
x86_io_bitmap':

    io_bitmap.map_bytes      io_bitmap.bytes_max
         iobm.map_bytes           iobm.bytes_max


which makes following segments of code really tidy:

  memcpy(tss->io_bitmap.map_bytes, iobm->map_bytes,
               max(tss->io_bitmap.bytes_max, iobm->bytes_max));

The other point is to remove the 'prev' notion from the TSS data field - 
there's very little "previous" about this - we have this bitmap loaded 
right now.

Also note the cleanup of the bits/bytes union:

   iobm->bits           =>   iobm->map_bits
   iobm->bitmap_bytes   =>   iobm->map_bytes

Only build tested, but 100% perfect, guaranteed.

Thanks,

	Ingo

---
 arch/x86/include/asm/iobitmap.h  |  6 +++---
 arch/x86/include/asm/processor.h |  2 +-
 arch/x86/kernel/cpu/common.c     |  2 +-
 arch/x86/kernel/ioport.c         | 10 +++++-----
 arch/x86/kernel/process.c        |  6 +++---
 arch/x86/kernel/ptrace.c         |  4 ++--
 6 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/iobitmap.h b/arch/x86/include/asm/iobitmap.h
index a075f0c55fb3..faba42c7d3de 100644
--- a/arch/x86/include/asm/iobitmap.h
+++ b/arch/x86/include/asm/iobitmap.h
@@ -8,10 +8,10 @@
 struct io_bitmap {
 	u64			sequence;
 	refcount_t		refcnt;
-	unsigned int		io_bitmap_max;
+	unsigned int		bytes_max;
 	union {
-		unsigned long	bits[IO_BITMAP_LONGS];
-		unsigned char	bitmap_bytes[IO_BITMAP_BYTES];
+		unsigned long	map_bits[IO_BITMAP_LONGS];
+		unsigned char	map_bytes[IO_BITMAP_BYTES];
 	};
 };
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 4358ae63c252..d1f2c1eb14e9 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -381,7 +381,7 @@ struct x86_io_bitmap
 	 * outside of the TSS limit. So for sane tasks there is no need to
 	 * actually touch the io_bitmap at all.
 	 */
-	unsigned int		prev_max;
+	unsigned int		bytes_max;
 
 	/*
 	 * The extra 1 is there because the CPU will access an
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index eea0e3170de4..a35a557429e7 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1862,7 +1862,7 @@ void cpu_init(void)
 	tss_setup_ist(tss);
 	tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID;
 	tss->io_bitmap.last_bitmap = NULL;
-	tss->io_bitmap.prev_max = 0;
+	tss->io_bitmap.bytes_max = 0;
 	memset(tss->io_bitmap.map_bytes, 0xff, sizeof(tss->io_bitmap.map_bytes));
 	set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
 
diff --git a/arch/x86/kernel/ioport.c b/arch/x86/kernel/ioport.c
index ee37a1c25ecc..203f82383bf6 100644
--- a/arch/x86/kernel/ioport.c
+++ b/arch/x86/kernel/ioport.c
@@ -86,7 +86,7 @@ long ksys_ioperm(unsigned long from, unsigned long num, int turn_on)
 		if (!iobm)
 			return -ENOMEM;
 
-		memset(iobm->bits, 0xff, sizeof(iobm->bits));
+		memset(iobm->map_bits, 0xff, sizeof(iobm->map_bits));
 		refcount_set(&iobm->refcnt, 1);
 	}
 
@@ -112,9 +112,9 @@ long ksys_ioperm(unsigned long from, unsigned long num, int turn_on)
 	 * exit to user mode. So this needs no protection.
 	 */
 	if (turn_on)
-		bitmap_clear(iobm->bits, from, num);
+		bitmap_clear(iobm->map_bits, from, num);
 	else
-		bitmap_set(iobm->bits, from, num);
+		bitmap_set(iobm->map_bits, from, num);
 
 	/*
 	 * Search for a (possibly new) maximum. This is simple and stupid,
@@ -122,7 +122,7 @@ long ksys_ioperm(unsigned long from, unsigned long num, int turn_on)
 	 */
 	max_long = UINT_MAX;
 	for (i = 0; i < IO_BITMAP_LONGS; i++) {
-		if (iobm->bits[i] != ~0UL)
+		if (iobm->map_bits[i] != ~0UL)
 			max_long = i;
 	}
 
@@ -132,7 +132,7 @@ long ksys_ioperm(unsigned long from, unsigned long num, int turn_on)
 		return 0;
 	}
 
-	iobm->io_bitmap_max = (max_long + 1) * sizeof(unsigned long);
+	iobm->bytes_max = (max_long + 1) * sizeof(unsigned long);
 
 	/* Increment the sequence number to force a TSS update */
 	iobm->sequence = atomic64_add_return(1, &io_bitmap_sequence);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 8179f3ee6a55..c4e76a540b51 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -350,14 +350,14 @@ static void tss_copy_io_bitmap(struct tss_struct *tss, struct io_bitmap *iobm)
 	 * permitted, then the copy needs to cover those as well so they
 	 * get turned off.
 	 */
-	memcpy(tss->io_bitmap.map_bytes, iobm->bitmap_bytes,
-	       max(tss->io_bitmap.prev_max, iobm->io_bitmap_max));
+	memcpy(tss->io_bitmap.map_bytes, iobm->map_bytes,
+	       max(tss->io_bitmap.bytes_max, iobm->bytes_max));
 
 	/*
 	 * Store the new max and the sequence number of this bitmap
 	 * and a pointer to the bitmap itself.
 	 */
-	tss->io_bitmap.prev_max = iobm->io_bitmap_max;
+	tss->io_bitmap.bytes_max = iobm->bytes_max;
 	tss->io_bitmap.last_sequence = iobm->sequence;
 	tss->io_bitmap.last_bitmap = iobm;
 }
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 6baa5a1a9f1c..f27c322f1c93 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -700,7 +700,7 @@ static int ioperm_active(struct task_struct *target,
 {
 	struct io_bitmap *iobm = target->thread.io_bitmap;
 
-	return iobm ? DIV_ROUND_UP(iobm->io_bitmap_max, regset->size) : 0;
+	return iobm ? DIV_ROUND_UP(iobm->bytes_max, regset->size) : 0;
 }
 
 static int ioperm_get(struct task_struct *target,
@@ -714,7 +714,7 @@ static int ioperm_get(struct task_struct *target,
 		return -ENXIO;
 
 	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
-				   iobm->bitmap_bytes, 0, IO_BITMAP_BYTES);
+				   iobm->map_bytes, 0, IO_BITMAP_BYTES);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH] x86/iopl: Clear up the role of the two bitmap copying fields
  2019-11-12  7:40 ` [PATCH] x86/iopl: Factor out IO-bitmap related TSS fields into 'struct x86_io_bitmap' Ingo Molnar
  2019-11-12  7:59   ` [PATCH] x86/iopl: Harmonize 'struct io_bitmap' and 'struct x86_io_bitmap' nomenclature Ingo Molnar
@ 2019-11-12  8:11   ` Ingo Molnar
  2019-11-12  8:15   ` [PATCH] x86/iopl: Rename <asm/iobitmap.h> to <asm/io_bitmap.h> Ingo Molnar
  2 siblings, 0 replies; 57+ messages in thread
From: Ingo Molnar @ 2019-11-12  8:11 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin


Here's another one, which makes the pointer-caching logic a bit clearer 
IMO:

Firstly, it lines up the two related fields in the namespace:

   x86_io_bitmap.last_bitmap        =>   bitmap_copied_ptr
   x86_io_bitmap.last_sequence      =>   bitmap_copied_seq

It also constifies the pointer to better signal that it should never 
actually be used for anything but comparison

I think the 'bitmap_copied_' ptr/seq pairing makes it more obvious, and 
removing the 'last' language makes it a tiny bit clearer that this bitmap 
is special because we copied its contents into the TSS.

This makes code like this a tiny bit easier to read IMHO:

+                       if (tss->io_bitmap.bitmap_copied_ptr != iobm ||
+                           tss->io_bitmap.bitmap_copied_seq != iobm->sequence)

I marked it RFC because you might not agree. :-)

Only build tested.

Thanks,

	Ingo

---
 arch/x86/include/asm/processor.h | 10 +++++-----
 arch/x86/kernel/cpu/common.c     |  2 +-
 arch/x86/kernel/process.c        |  8 ++++----
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index d1f2c1eb14e9..b45ff7c1419f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -367,12 +367,12 @@ struct entry_stack_page {
 struct x86_io_bitmap
 {
 	/*
-	 * The bitmap pointer and the sequence number of the last active
-	 * bitmap. last_bitmap cannot be dereferenced. It's solely for
-	 * comparison.
+	 * The bitmap pointer and the sequence number of the last copied
+	 * bitmap. bitmap_copied_ptr must not be dereferenced, it's solely
+	 * for comparison.
 	 */
-	struct io_bitmap	*last_bitmap;
-	u64			last_sequence;
+	const struct io_bitmap	*bitmap_copied_ptr;
+	u64			bitmap_copied_seq;
 
 	/*
 	 * Store the dirty size of the last io bitmap offender. The next
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a35a557429e7..5a74c5b11b1c 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1861,7 +1861,7 @@ void cpu_init(void)
 	/* Initialize the TSS. */
 	tss_setup_ist(tss);
 	tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID;
-	tss->io_bitmap.last_bitmap = NULL;
+	tss->io_bitmap.bitmap_copied_ptr = NULL;
 	tss->io_bitmap.bytes_max = 0;
 	memset(tss->io_bitmap.map_bytes, 0xff, sizeof(tss->io_bitmap.map_bytes));
 	set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index c4e76a540b51..ecf97855ed68 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -358,8 +358,8 @@ static void tss_copy_io_bitmap(struct tss_struct *tss, struct io_bitmap *iobm)
 	 * and a pointer to the bitmap itself.
 	 */
 	tss->io_bitmap.bytes_max = iobm->bytes_max;
-	tss->io_bitmap.last_sequence = iobm->sequence;
-	tss->io_bitmap.last_bitmap = iobm;
+	tss->io_bitmap.bitmap_copied_seq = iobm->sequence;
+	tss->io_bitmap.bitmap_copied_ptr = iobm;
 }
 
 /**
@@ -388,8 +388,8 @@ void tss_update_io_bitmap(void)
 			 * sequence number differs. The update time is
 			 * accounted to the incoming task.
 			 */
-			if (tss->io_bitmap.last_bitmap != iobm ||
-			    tss->io_bitmap.last_sequence != iobm->sequence)
+			if (tss->io_bitmap.bitmap_copied_ptr != iobm ||
+			    tss->io_bitmap.bitmap_copied_seq != iobm->sequence)
 				tss_copy_io_bitmap(tss, iobm);
 
 			/* Enable the bitmap */

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH] x86/iopl: Rename <asm/iobitmap.h> to <asm/io_bitmap.h>
  2019-11-12  7:40 ` [PATCH] x86/iopl: Factor out IO-bitmap related TSS fields into 'struct x86_io_bitmap' Ingo Molnar
  2019-11-12  7:59   ` [PATCH] x86/iopl: Harmonize 'struct io_bitmap' and 'struct x86_io_bitmap' nomenclature Ingo Molnar
  2019-11-12  8:11   ` [PATCH] x86/iopl: Clear up the role of the two bitmap copying fields Ingo Molnar
@ 2019-11-12  8:15   ` Ingo Molnar
  2 siblings, 0 replies; 57+ messages in thread
From: Ingo Molnar @ 2019-11-12  8:15 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin


Rename <asm/iobitmap.h> to <asm/io_bitmap.h>, because IMHO the header 
that defines 'struct io_bitmap' should be called io_bitmap.h. :-)

Build tested.

Thanks,

	Ingo

---
 arch/x86/entry/common.c                          | 2 +-
 arch/x86/include/asm/{iobitmap.h => io_bitmap.h} | 0
 arch/x86/kernel/ioport.c                         | 2 +-
 arch/x86/kernel/process.c                        | 2 +-
 arch/x86/kernel/ptrace.c                         | 2 +-
 5 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 85e8a8d7b380..9747876980b5 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -33,7 +33,7 @@
 #include <asm/cpufeature.h>
 #include <asm/fpu/api.h>
 #include <asm/nospec-branch.h>
-#include <asm/iobitmap.h>
+#include <asm/io_bitmap.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/syscalls.h>
diff --git a/arch/x86/include/asm/iobitmap.h b/arch/x86/include/asm/io_bitmap.h
similarity index 100%
rename from arch/x86/include/asm/iobitmap.h
rename to arch/x86/include/asm/io_bitmap.h
diff --git a/arch/x86/kernel/ioport.c b/arch/x86/kernel/ioport.c
index 203f82383bf6..d1a3a9f5314b 100644
--- a/arch/x86/kernel/ioport.c
+++ b/arch/x86/kernel/ioport.c
@@ -11,7 +11,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 
-#include <asm/iobitmap.h>
+#include <asm/io_bitmap.h>
 #include <asm/desc.h>
 
 static atomic64_t io_bitmap_sequence;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index ecf97855ed68..7bf60741e80a 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -41,7 +41,7 @@
 #include <asm/desc.h>
 #include <asm/prctl.h>
 #include <asm/spec-ctrl.h>
-#include <asm/iobitmap.h>
+#include <asm/io_bitmap.h>
 #include <asm/proto.h>
 
 #include "process.h"
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index f27c322f1c93..72bf005c8208 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -42,7 +42,7 @@
 #include <asm/traps.h>
 #include <asm/syscall.h>
 #include <asm/fsgsbase.h>
-#include <asm/iobitmap.h>
+#include <asm/io_bitmap.h>
 
 #include "tls.h"
 


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [patch V2 14/16] x86/iopl: Restrict iopl() permission scope
  2019-11-11 22:03 ` [patch V2 14/16] x86/iopl: Restrict iopl() permission scope Thomas Gleixner
  2019-11-11 23:03   ` Thomas Gleixner
@ 2019-11-12  8:42   ` Ingo Molnar
  2019-11-12 10:07     ` Thomas Gleixner
  2019-11-12 18:35   ` Andy Lutomirski
  2 siblings, 1 reply; 57+ messages in thread
From: Ingo Molnar @ 2019-11-12  8:42 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin


* Thomas Gleixner <tglx@linutronix.de> wrote:

> +static void task_update_io_bitmap(void)
> +{
> +	struct thread_struct *t = &current->thread;
> +
> +	preempt_disable();
> +	if (t->iopl_emul == 3 || t->io_bitmap) {
> +		/* TSS update is handled on exit to user space */
> +		set_thread_flag(TIF_IO_BITMAP);
> +	} else {
> +		clear_thread_flag(TIF_IO_BITMAP);
> +		/* Invalidate TSS */
> +		tss_update_io_bitmap();
> +	}
> +	preempt_enable();
> +}
> +
>  void io_bitmap_exit(void)
>  {
>  	struct io_bitmap *iobm = current->thread.io_bitmap;
>  
> -	preempt_disable();
>  	current->thread.io_bitmap = NULL;
> -	clear_thread_flag(TIF_IO_BITMAP);
> -	tss_update_io_bitmap();
> -	preempt_enable();
> +	task_update_io_bitmap();

BTW., isn't the preempt_disable()/enable() sequence only needed around 
the tss_update_io_bitmap() call?

->iopl_emul, ->io_bitmap and TIF_IO_BITMAP can only be set by the current 
task AFAICS.

I.e. critical section could be narrowed a bit.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 11/16] x86/ioperm: Share I/O bitmap if identical
  2019-11-11 22:03 ` [patch V2 11/16] x86/ioperm: Share I/O bitmap if identical Thomas Gleixner
  2019-11-12  7:14   ` Ingo Molnar
@ 2019-11-12  9:15   ` Peter Zijlstra
  2019-11-12  9:51     ` Thomas Gleixner
  2019-11-14 11:02     ` David Laight
  2019-11-12 18:12   ` Andy Lutomirski
  2 siblings, 2 replies; 57+ messages in thread
From: Peter Zijlstra @ 2019-11-12  9:15 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

On Mon, Nov 11, 2019 at 11:03:25PM +0100, Thomas Gleixner wrote:
> @@ -59,8 +71,26 @@ long ksys_ioperm(unsigned long from, uns
>  			return -ENOMEM;
>  
>  		memset(iobm->bits, 0xff, sizeof(iobm->bits));
> +		refcount_set(&iobm->refcnt, 1);
> +	}
> +
> +	/*
> +	 * If the bitmap is not shared, then nothing can take a refcount as
> +	 * current can obviously not fork at the same time. If it's shared
> +	 * duplicate it and drop the refcount on the original one.
> +	 */
> +	if (refcount_read(&iobm->refcnt) > 1) {
> +		iobm = kmemdup(iobm, sizeof(*iobm), GFP_KERNEL);
> +		if (!iobm)
> +			return -ENOMEM;
> +		io_bitmap_exit();
		refcount_set(&iobm->refcnd, 1);
>  	}
>  
> +	/* Set the tasks io_bitmap pointer (might be the same) */
> +	t->io_bitmap = iobm;
> +	/* Mark it active for context switching and exit to user mode */
> +	set_thread_flag(TIF_IO_BITMAP);
> +

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 08/16] x86/ioperm: Add bitmap sequence number
  2019-11-11 22:03 ` [patch V2 08/16] x86/ioperm: Add bitmap sequence number Thomas Gleixner
@ 2019-11-12  9:22   ` Peter Zijlstra
  2019-11-12  9:55     ` [patch V2 08/16] x86/ioperm: Add bitmap sequence numberc Thomas Gleixner
  2019-11-12 16:08   ` [patch V2 08/16] x86/ioperm: Add bitmap sequence number Andy Lutomirski
  1 sibling, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2019-11-12  9:22 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin, Linus Torvalds

On Mon, Nov 11, 2019 at 11:03:22PM +0100, Thomas Gleixner wrote:
> Add a globally unique sequence number which is incremented when ioperm() is
> changing the I/O bitmap of a task. Store the new sequence number in the
> io_bitmap structure and compare it along with the actual struct pointer
> with the one which was last loaded on a CPU. Only update the bitmap if
> either of the two changes. That should further reduce the overhead of I/O
> bitmap scheduling when there are only a few I/O bitmap users on the system.

> +	/* Update the sequence number to force an update in switch_to() */
> +	iobm->sequence = atomic64_add_return(1, &io_bitmap_sequence);

> +		if (tss->last_bitmap != iobm ||
> +		    tss->last_sequence != iobm->sequence)
> +			switch_to_update_io_bitmap(tss, iobm);

Initially I wondered why we need a globally unique sequence number if we
already check the struct iobitmap pointer. That ought to make the
sequence number per-object.

However, that breaks for memory re-use. So yes, we need that thing to be
global.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 11/16] x86/ioperm: Share I/O bitmap if identical
  2019-11-12  9:15   ` Peter Zijlstra
@ 2019-11-12  9:51     ` Thomas Gleixner
  2019-11-14 11:02     ` David Laight
  1 sibling, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-12  9:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

On Tue, 12 Nov 2019, Peter Zijlstra wrote:
> On Mon, Nov 11, 2019 at 11:03:25PM +0100, Thomas Gleixner wrote:
> > @@ -59,8 +71,26 @@ long ksys_ioperm(unsigned long from, uns
> >  			return -ENOMEM;
> >  
> >  		memset(iobm->bits, 0xff, sizeof(iobm->bits));
> > +		refcount_set(&iobm->refcnt, 1);
> > +	}
> > +
> > +	/*
> > +	 * If the bitmap is not shared, then nothing can take a refcount as
> > +	 * current can obviously not fork at the same time. If it's shared
> > +	 * duplicate it and drop the refcount on the original one.
> > +	 */
> > +	if (refcount_read(&iobm->refcnt) > 1) {
> > +		iobm = kmemdup(iobm, sizeof(*iobm), GFP_KERNEL);
> > +		if (!iobm)
> > +			return -ENOMEM;
> > +		io_bitmap_exit();
> 		refcount_set(&iobm->refcnd, 1);

Indeed.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 08/16] x86/ioperm: Add bitmap sequence numberc
  2019-11-12  9:22   ` Peter Zijlstra
@ 2019-11-12  9:55     ` Thomas Gleixner
  0 siblings, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-12  9:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin, Linus Torvalds

On Tue, 12 Nov 2019, Peter Zijlstra wrote:
> On Mon, Nov 11, 2019 at 11:03:22PM +0100, Thomas Gleixner wrote:
> > Add a globally unique sequence number which is incremented when ioperm() is
> > changing the I/O bitmap of a task. Store the new sequence number in the
> > io_bitmap structure and compare it along with the actual struct pointer
> > with the one which was last loaded on a CPU. Only update the bitmap if
> > either of the two changes. That should further reduce the overhead of I/O
> > bitmap scheduling when there are only a few I/O bitmap users on the system.
> 
> > +	/* Update the sequence number to force an update in switch_to() */
> > +	iobm->sequence = atomic64_add_return(1, &io_bitmap_sequence);
> 
> > +		if (tss->last_bitmap != iobm ||
> > +		    tss->last_sequence != iobm->sequence)
> > +			switch_to_update_io_bitmap(tss, iobm);
> 
> Initially I wondered why we need a globally unique sequence number if we
> already check the struct iobitmap pointer. That ought to make the
> sequence number per-object.
> 
> However, that breaks for memory re-use. So yes, we need that thing to be
> global.

Actually with a global 64bit wide counter we can just avoid the pointer
comparison. Assumed a ioperm() syscall every nanosecond it takes about 584
years of uptime to wrap around. :)

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 14/16] x86/iopl: Restrict iopl() permission scope
  2019-11-12  8:42   ` Ingo Molnar
@ 2019-11-12 10:07     ` Thomas Gleixner
  0 siblings, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-12 10:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

On Tue, 12 Nov 2019, Ingo Molnar wrote:
> * Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> > +static void task_update_io_bitmap(void)
> > +{
> > +	struct thread_struct *t = &current->thread;
> > +
> > +	preempt_disable();
> > +	if (t->iopl_emul == 3 || t->io_bitmap) {
> > +		/* TSS update is handled on exit to user space */
> > +		set_thread_flag(TIF_IO_BITMAP);
> > +	} else {
> > +		clear_thread_flag(TIF_IO_BITMAP);
> > +		/* Invalidate TSS */
> > +		tss_update_io_bitmap();
> > +	}
> > +	preempt_enable();
> > +}
> > +
> >  void io_bitmap_exit(void)
> >  {
> >  	struct io_bitmap *iobm = current->thread.io_bitmap;
> >  
> > -	preempt_disable();
> >  	current->thread.io_bitmap = NULL;
> > -	clear_thread_flag(TIF_IO_BITMAP);
> > -	tss_update_io_bitmap();
> > -	preempt_enable();
> > +	task_update_io_bitmap();
> 
> BTW., isn't the preempt_disable()/enable() sequence only needed around 
> the tss_update_io_bitmap() call?
> 
> ->iopl_emul, ->io_bitmap and TIF_IO_BITMAP can only be set by the current 
> task AFAICS.
> 
> I.e. critical section could be narrowed a bit.

Yes.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 01/16] x86/ptrace: Prevent truncation of bitmap size
  2019-11-11 22:03 ` [patch V2 01/16] x86/ptrace: Prevent truncation of bitmap size Thomas Gleixner
@ 2019-11-12 15:34   ` Andy Lutomirski
  0 siblings, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2019-11-12 15:34 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin, Ingo Molnar

On Mon, Nov 11, 2019 at 2:36 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> From: Thomas Gleixner <tglx@linutronix.de>
>
> The active() callback of the IO bitmap regset divides the IO bitmap size by
> the word size (32/64 bit). As the I/O bitmap size is in bytes the active
> check fails for bitmap sizes of 1-3 bytes on 32bit and 1-7 bytes on 64bit.
>
> Use DIV_ROUND_UP() instead.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Reviewed-by: Ingo Molnar <mingo@kernel.org>

Reviewed-by: Andy Lutomirski <luto@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 04/16] x86/tss: Fix and move VMX BUILD_BUG_ON()
  2019-11-11 22:03 ` [patch V2 04/16] x86/tss: Fix and move VMX BUILD_BUG_ON() Thomas Gleixner
  2019-11-11 22:44   ` Paolo Bonzini
@ 2019-11-12 15:37   ` Andy Lutomirski
  1 sibling, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2019-11-12 15:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin, Paolo Bonzini

On Mon, Nov 11, 2019 at 2:35 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> The BUILD_BUG_ON(IO_BITMAP_OFFSET - 1 == 0x67) in the VMX code is bogus in
> two aspects:
>
> 1) This wants to be in generic x86 code simply to catch issues even when
>    VMX is disabled in Kconfig.
>
> 2) The IO_BITMAP_OFFSET is not the right thing to check because it makes
>    asssumptions about the layout of tss_struct. Nothing requires that the
>    I/O bitmap is placed right after x86_tss, which is the hardware mandated
>    tss structure. It pointlessly makes restrictions on the struct
>    tss_struct layout.
>
> The proper thing to check is:
>
>     - Offset of x86_tss in tss_struct is 0
>     - Size of x86_tss == 0x68
>
> Move it to the other build time TSS checks and make it do the right thing.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Paolo Bonzini <pbonzini@redhat.com>

Acked-by: Andy Lutomirski <luto@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 05/16] x86/iopl: Cleanup include maze
  2019-11-11 22:03 ` [patch V2 05/16] x86/iopl: Cleanup include maze Thomas Gleixner
@ 2019-11-12 15:37   ` Andy Lutomirski
  0 siblings, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2019-11-12 15:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

On Mon, Nov 11, 2019 at 2:35 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Get rid of superfluous includes.
>

Acked-by: Andy Lutomirski <luto@kernel.org>

> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 06/16] x86/io: Speedup schedule out of I/O bitmap user
  2019-11-11 22:03 ` [patch V2 06/16] x86/io: Speedup schedule out of I/O bitmap user Thomas Gleixner
@ 2019-11-12 16:00   ` Andy Lutomirski
  2019-11-12 17:08     ` Thomas Gleixner
  0 siblings, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2019-11-12 16:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

On Mon, Nov 11, 2019 at 2:35 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> From: Thomas Gleixner <tglx@linutronix.de>
>
> There is no requirement to update the TSS I/O bitmap when a thread using it is
> scheduled out and the incoming thread does not use it.
>
> For the permission check based on the TSS I/O bitmap the CPU calculates the memory
> location of the I/O bitmap by the address of the TSS and the io_bitmap_base member
> of the tss_struct. The easiest way to invalidate the I/O bitmap is to switch the
> offset to an address outside of the TSS limit.
>
> If an I/O instruction is issued from user space the TSS limit causes #GP to be
> raised in the same was as valid I/O bitmap with all bits set to 1 would do.
>
> This removes the extra work when an I/O bitmap using task is scheduled out
> and puts the burden on the rare I/O bitmap users when they are scheduled
> in.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>

> --- a/arch/x86/kernel/ioport.c
> +++ b/arch/x86/kernel/ioport.c

I won't swear this is wrong, but I'm not convinced it's correct
either.  I see two issues:

> @@ -40,8 +40,6 @@ long ksys_ioperm(unsigned long from, uns
>                         return -ENOMEM;
>
>                 memset(bitmap, 0xff, IO_BITMAP_BYTES);
> -               t->io_bitmap_ptr = bitmap;
> -               set_thread_flag(TIF_IO_BITMAP);
>
>                 /*
>                  * Now that we have an IO bitmap, we need our TSS limit to be
> @@ -50,6 +48,11 @@ long ksys_ioperm(unsigned long from, uns
>                  * limit correct.
>                  */
>                 preempt_disable();
> +               t->io_bitmap_ptr = bitmap;
> +               set_thread_flag(TIF_IO_BITMAP);
> +               /* Make the bitmap base in the TSS valid */
> +               tss = this_cpu_ptr(&cpu_tss_rw);
> +               tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_VALID;
>                 refresh_tss_limit();
>                 preempt_enable();
>         }

It's not shown in the diff, but the very next line of code turns
preemption back off.  This means that we might schedule right here
with TIF_IO_BITMAP set, the base set to VALID, but the wrong data in
the bitmap.  I *think* this will actually end up being okay, but it
certainly makes understanding the code harder.  Can you adjust the
code so that preemption stays off?

More importantly, the code below this modifies the TSS copy in place
instead of writing a whole new copy.  But now that you've added your
optimization, the TSS copy might be *someone else's* IO bitmap.  So I
think you might end up with more io ports allowed than you intended.
For example:

Task A uses ioperm() to enable all ports.
Switch to task B.  Now the TSS base is INVALID but all bitmap bits are still 0.
Task B calls ioperm().

The code will set the base to VALID and will correctly set up the
thread's copy of the bitmap, but I think the copy will only update the
bits 0 through whatever ioperm() touched and not the bits above that
in the TSS.

I would believe that this is fixed later in your patch set.  If so,
perhaps you should just memcpy() the whole thing without trying to
optimize in this patch and then let the changes later re-optimize it
as appropriate.  IOW change memcpy(tss->io_bitmap, t->io_bitmap_ptr,
bytes_updated); to memcpy(..., BYTES_PER_LONG * IO_BITMAP_LONGS) or
similar.

--Andy

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 07/16] x86/ioperm: Move iobitmap data into a struct
  2019-11-11 22:03 ` [patch V2 07/16] x86/ioperm: Move iobitmap data into a struct Thomas Gleixner
@ 2019-11-12 16:02   ` Andy Lutomirski
  2019-11-12 17:08     ` Thomas Gleixner
  0 siblings, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2019-11-12 16:02 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

On Mon, Nov 11, 2019 at 2:35 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> No point in having all the data in thread_struct, especially as upcoming
> changes add more.
>
> Make the bitmap in the new struct accessible as array of longs and as array
> of characters via a union, so both the bitmap functions and the update
> logic can avoid type casts.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V2: New patch
> ---
>  arch/x86/include/asm/iobitmap.h  |   15 ++++++++
>  arch/x86/include/asm/processor.h |   21 +++++------
>  arch/x86/kernel/cpu/common.c     |    2 -
>  arch/x86/kernel/ioport.c         |   69 +++++++++++++++++++--------------------
>  arch/x86/kernel/process.c        |   38 +++++++++++----------
>  arch/x86/kernel/ptrace.c         |   12 ++++--
>  6 files changed, 89 insertions(+), 68 deletions(-)
>
> --- /dev/null
> +++ b/arch/x86/include/asm/iobitmap.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_IOBITMAP_H
> +#define _ASM_X86_IOBITMAP_H
> +
> +#include <asm/processor.h>
> +
> +struct io_bitmap {
> +       unsigned int            io_bitmap_max;
> +       union {
> +               unsigned long   bits[IO_BITMAP_LONGS];
> +               unsigned char   bitmap_bytes[IO_BITMAP_BYTES];
> +       };

Now that you have bytes and longs, can you rename io_bitmap_max so
it's obvious which one it refers to?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 08/16] x86/ioperm: Add bitmap sequence number
  2019-11-11 22:03 ` [patch V2 08/16] x86/ioperm: Add bitmap sequence number Thomas Gleixner
  2019-11-12  9:22   ` Peter Zijlstra
@ 2019-11-12 16:08   ` Andy Lutomirski
  2019-11-12 17:10     ` Thomas Gleixner
  1 sibling, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2019-11-12 16:08 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin, Linus Torvalds

On Mon, Nov 11, 2019 at 2:35 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Add a globally unique sequence number which is incremented when ioperm() is
> changing the I/O bitmap of a task. Store the new sequence number in the
> io_bitmap structure and compare it along with the actual struct pointer
> with the one which was last loaded on a CPU. Only update the bitmap if
> either of the two changes. That should further reduce the overhead of I/O
> bitmap scheduling when there are only a few I/O bitmap users on the system.
>
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V2: New patch
> ---
>  arch/x86/include/asm/iobitmap.h  |    1
>  arch/x86/include/asm/processor.h |    8 +++++++
>  arch/x86/kernel/cpu/common.c     |    1
>  arch/x86/kernel/ioport.c         |    9 +++++---
>  arch/x86/kernel/process.c        |   40 +++++++++++++++++++++++++++++----------
>  5 files changed, 46 insertions(+), 13 deletions(-)
>
> --- a/arch/x86/include/asm/iobitmap.h
> +++ b/arch/x86/include/asm/iobitmap.h
> @@ -5,6 +5,7 @@
>  #include <asm/processor.h>
>
>  struct io_bitmap {
> +       u64                     sequence;
>         unsigned int            io_bitmap_max;
>         union {
>                 unsigned long   bits[IO_BITMAP_LONGS];
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -366,6 +366,14 @@ struct tss_struct {
>         struct x86_hw_tss       x86_tss;
>
>         /*
> +        * The bitmap pointer and the sequence number of the last active
> +        * bitmap. last_bitmap cannot be dereferenced. It's solely for
> +        * comparison.
> +        */
> +       struct io_bitmap        *last_bitmap;
> +       u64                     last_sequence;
> +
> +       /*
>          * Store the dirty size of the last io bitmap offender. The next
>          * one will have to do the cleanup as the switch out to a non io
>          * bitmap user will just set x86_tss.io_bitmap_base to a value

Why is all this stuff in the TSS?  Would it make more sense in a
percpu variable tss_state?  By putting it in the TSS, you're putting
it in cpu_entry_area, which is at least a bit odd if not an actual
security problem.

And why do you need a last_bitmap pointer?  I thin that comparing just
last_sequence should be enough.  Keeping last_bitmap around at all is
asking for trouble, since it might point to freed memory.

> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1861,6 +1861,7 @@ void cpu_init(void)
>         /* Initialize the TSS. */
>         tss_setup_ist(tss);
>         tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID;
> +       tss->last_bitmap = NULL;
>         tss->io_bitmap_prev_max = 0;
>         memset(tss->io_bitmap_bytes, 0xff, sizeof(tss->io_bitmap_bytes));
>         set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
> --- a/arch/x86/kernel/ioport.c
> +++ b/arch/x86/kernel/ioport.c
> @@ -14,6 +14,8 @@
>  #include <asm/iobitmap.h>
>  #include <asm/desc.h>
>
> +static atomic64_t io_bitmap_sequence;
> +
>  /*
>   * this changes the io permissions bitmap in the current task.
>   */
> @@ -76,14 +78,15 @@ long ksys_ioperm(unsigned long from, uns
>
>         iobm->io_bitmap_max = bytes;
>
> -       /* Update the TSS: */
> -       memcpy(tss->io_bitmap_bytes, iobm->bitmap_bytes, bytes_updated);
> -
> +       /* Update the sequence number to force an update in switch_to() */
> +       iobm->sequence = atomic64_add_return(1, &io_bitmap_sequence);
>         /* Set the tasks io_bitmap pointer (might be the same) */
>         t->io_bitmap = iobm;
>         /* Mark it active for context switching */
>         set_thread_flag(TIF_IO_BITMAP);
>
> +       /* Update the TSS: */
> +       memcpy(tss->io_bitmap_bytes, iobm->bitmap_bytes, bytes_updated);
>         /* Make the bitmap base in the TSS valid */
>         tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_VALID;
>
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -354,6 +354,29 @@ void arch_setup_new_exec(void)
>         }
>  }
>
> +static void switch_to_update_io_bitmap(struct tss_struct *tss,
> +                                      struct io_bitmap *iobm)
> +{
> +       /*
> +        * Copy at least the byte range of the incoming tasks bitmap which
> +        * covers the permitted I/O ports.
> +        *
> +        * If the previous task which used an I/O bitmap had more bits
> +        * permitted, then the copy needs to cover those as well so they
> +        * get turned off.
> +        */
> +       memcpy(tss->io_bitmap_bytes, iobm->bitmap_bytes,
> +              max(tss->io_bitmap_prev_max, iobm->io_bitmap_max));
> +
> +       /*
> +        * Store the new max and the sequence number of this bitmap
> +        * and a pointer to the bitmap itself.
> +        */
> +       tss->io_bitmap_prev_max = iobm->io_bitmap_max;
> +       tss->last_sequence = iobm->sequence;
> +       tss->last_bitmap = iobm;
> +}
> +
>  static inline void switch_to_bitmap(struct thread_struct *next,
>                                     unsigned long tifp, unsigned long tifn)
>  {
> @@ -363,18 +386,15 @@ static inline void switch_to_bitmap(stru
>                 struct io_bitmap *iobm = next->io_bitmap;
>
>                 /*
> -                * Copy at least the size of the incoming tasks bitmap
> -                * which covers the last permitted I/O port.
> -                *
> -                * If the previous task which used an io bitmap had more
> -                * bits permitted, then the copy needs to cover those as
> -                * well so they get turned off.
> +                * Only copy bitmap data when the bitmap or the sequence
> +                * number differs. The update time is accounted to the
> +                * incoming task.
>                  */
> -               memcpy(tss->io_bitmap_bytes, iobm->bitmap_bytes,
> -                      max(tss->io_bitmap_prev_max, iobm->io_bitmap_max));
> +               if (tss->last_bitmap != iobm ||
> +                   tss->last_sequence != iobm->sequence)

As above, I think this could just be if (tss->last_sequence !=
iobm->sequence) or even if (this_cpu_read(cpu_tss_state.iobm_sequence)
!= iobm->sequence).

--Andy

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 09/16] x86/ioperm: Move TSS bitmap update to exit to user work
  2019-11-11 22:03 ` [patch V2 09/16] x86/ioperm: Move TSS bitmap update to exit to user work Thomas Gleixner
@ 2019-11-12 16:16   ` Andy Lutomirski
  2019-11-12 17:20     ` Thomas Gleixner
  0 siblings, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2019-11-12 16:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

On Mon, Nov 11, 2019 at 2:35 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> There is no point to update the TSS bitmap for tasks which use I/O bitmaps
> on every context switch. It's enough to update it right before exiting to
> user space.
>

+
> +static inline void switch_to_bitmap(unsigned long tifp)
> +{
> +       /*
> +        * Invalidate I/O bitmap if the previous task used it. If the next
> +        * task has an I/O bitmap it will handle it on exit to user mode.
> +        */
> +       if (tifp & _TIF_IO_BITMAP)
> +               tss_invalidate_io_bitmap(this_cpu_ptr(&cpu_tss_rw));
> +}

Shouldn't you be invalidating the io bitmap if the *next* task doesn't
use?  Or is the rule that, when a non-io-bitmap-using task is running,
even in kernel mode, the io bitmap is always invalid.

As it stands, you need exit_thread() to invalidate the bitmap.  I
assume it does, but I can't easily see it in the middle of the series
like this.

IOW your code might be fine, but it could at least use some comments
in appropriate places (exit_to_usermode_loop()?) that we guarantee
that, if the bit is *clear*, then the TSS has the io bitmap marked
invalid.  And add an assertion under CONFIG_DEBUG_ENTRY.

Also, do you need to update EXIT_TO_USERMODE_LOOP_FLAGS?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 06/16] x86/io: Speedup schedule out of I/O bitmap user
  2019-11-12 16:00   ` Andy Lutomirski
@ 2019-11-12 17:08     ` Thomas Gleixner
  0 siblings, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-12 17:08 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, X86 ML, Linus Torvalds, Stephen Hemminger, Willy Tarreau,
	Juergen Gross, Sean Christopherson, H. Peter Anvin

On Tue, 12 Nov 2019, Andy Lutomirski wrote:
> On Mon, Nov 11, 2019 at 2:35 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > @@ -50,6 +48,11 @@ long ksys_ioperm(unsigned long from, uns
> >                  * limit correct.
> >                  */
> >                 preempt_disable();
> > +               t->io_bitmap_ptr = bitmap;
> > +               set_thread_flag(TIF_IO_BITMAP);
> > +               /* Make the bitmap base in the TSS valid */
> > +               tss = this_cpu_ptr(&cpu_tss_rw);
> > +               tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_VALID;
> >                 refresh_tss_limit();
> >                 preempt_enable();
> >         }
> 
> It's not shown in the diff, but the very next line of code turns
> preemption back off.  This means that we might schedule right here
> with TIF_IO_BITMAP set, the base set to VALID, but the wrong data in
> the bitmap.  I *think* this will actually end up being okay, but it
> certainly makes understanding the code harder.  Can you adjust the
> code so that preemption stays off?
> 
> More importantly, the code below this modifies the TSS copy in place
> instead of writing a whole new copy.  But now that you've added your
> optimization, the TSS copy might be *someone else's* IO bitmap.  So I
> think you might end up with more io ports allowed than you intended.
> For example:
> 
> Task A uses ioperm() to enable all ports.
> Switch to task B.  Now the TSS base is INVALID but all bitmap bits are still 0.
> Task B calls ioperm().
> 
> The code will set the base to VALID and will correctly set up the
> thread's copy of the bitmap, but I think the copy will only update the
> bits 0 through whatever ioperm() touched and not the bits above that
> in the TSS.

Yeah, you are right. Did not think about that. Will fix that up.
 
> I would believe that this is fixed later in your patch set.  If so,
> perhaps you should just memcpy() the whole thing without trying to
> optimize in this patch and then let the changes later re-optimize it
> as appropriate.  IOW change memcpy(tss->io_bitmap, t->io_bitmap_ptr,
> bytes_updated); to memcpy(..., BYTES_PER_LONG * IO_BITMAP_LONGS) or
> similar.

Right.

Thanks for spotting that!

       tglx

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 07/16] x86/ioperm: Move iobitmap data into a struct
  2019-11-12 16:02   ` Andy Lutomirski
@ 2019-11-12 17:08     ` Thomas Gleixner
  0 siblings, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-12 17:08 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, X86 ML, Linus Torvalds, Stephen Hemminger, Willy Tarreau,
	Juergen Gross, Sean Christopherson, H. Peter Anvin

On Tue, 12 Nov 2019, Andy Lutomirski wrote:
> On Mon, Nov 11, 2019 at 2:35 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > --- /dev/null
> > +++ b/arch/x86/include/asm/iobitmap.h
> > @@ -0,0 +1,15 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _ASM_X86_IOBITMAP_H
> > +#define _ASM_X86_IOBITMAP_H
> > +
> > +#include <asm/processor.h>
> > +
> > +struct io_bitmap {
> > +       unsigned int            io_bitmap_max;
> > +       union {
> > +               unsigned long   bits[IO_BITMAP_LONGS];
> > +               unsigned char   bitmap_bytes[IO_BITMAP_BYTES];
> > +       };
> 
> Now that you have bytes and longs, can you rename io_bitmap_max so
> it's obvious which one it refers to?

Sure.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 08/16] x86/ioperm: Add bitmap sequence number
  2019-11-12 16:08   ` [patch V2 08/16] x86/ioperm: Add bitmap sequence number Andy Lutomirski
@ 2019-11-12 17:10     ` Thomas Gleixner
  0 siblings, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-12 17:10 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, X86 ML, Linus Torvalds, Stephen Hemminger, Willy Tarreau,
	Juergen Gross, Sean Christopherson, H. Peter Anvin,
	Linus Torvalds

On Tue, 12 Nov 2019, Andy Lutomirski wrote:
> On Mon, Nov 11, 2019 at 2:35 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >         /*
> > +        * The bitmap pointer and the sequence number of the last active
> > +        * bitmap. last_bitmap cannot be dereferenced. It's solely for
> > +        * comparison.
> > +        */
> > +       struct io_bitmap        *last_bitmap;
> > +       u64                     last_sequence;
> > +
> > +       /*
> >          * Store the dirty size of the last io bitmap offender. The next
> >          * one will have to do the cleanup as the switch out to a non io
> >          * bitmap user will just set x86_tss.io_bitmap_base to a value
> 
> Why is all this stuff in the TSS?  Would it make more sense in a
> percpu variable tss_state?  By putting it in the TSS, you're putting
> it in cpu_entry_area, which is at least a bit odd if not an actual
> security problem.
> 
> And why do you need a last_bitmap pointer?  I thin that comparing just
> last_sequence should be enough.  Keeping last_bitmap around at all is
> asking for trouble, since it might point to freed memory.

The bitmap pointer is pointless as I said in an earlier reply to Peter. It
will go away. The sequence number and the dirty size are surely not a
problem leakage wise, but yes, we could put it into a per cpu variable as
well. Not sure whether it buys much.
 
> > -               memcpy(tss->io_bitmap_bytes, iobm->bitmap_bytes,
> > -                      max(tss->io_bitmap_prev_max, iobm->io_bitmap_max));
> > +               if (tss->last_bitmap != iobm ||
> > +                   tss->last_sequence != iobm->sequence)
> 
> As above, I think this could just be if (tss->last_sequence !=
> iobm->sequence) or even if (this_cpu_read(cpu_tss_state.iobm_sequence)
> != iobm->sequence).

Already fixed as per Peter's comments.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 09/16] x86/ioperm: Move TSS bitmap update to exit to user work
  2019-11-12 16:16   ` Andy Lutomirski
@ 2019-11-12 17:20     ` Thomas Gleixner
  2019-11-12 17:41       ` Andy Lutomirski
  0 siblings, 1 reply; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-12 17:20 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, X86 ML, Linus Torvalds, Stephen Hemminger, Willy Tarreau,
	Juergen Gross, Sean Christopherson, H. Peter Anvin

On Tue, 12 Nov 2019, Andy Lutomirski wrote:
> On Mon, Nov 11, 2019 at 2:35 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > There is no point to update the TSS bitmap for tasks which use I/O bitmaps
> > on every context switch. It's enough to update it right before exiting to
> > user space.
> +
> > +static inline void switch_to_bitmap(unsigned long tifp)
> > +{
> > +       /*
> > +        * Invalidate I/O bitmap if the previous task used it. If the next
> > +        * task has an I/O bitmap it will handle it on exit to user mode.
> > +        */
> > +       if (tifp & _TIF_IO_BITMAP)
> > +               tss_invalidate_io_bitmap(this_cpu_ptr(&cpu_tss_rw));
> > +}
> 
> Shouldn't you be invalidating the io bitmap if the *next* task doesn't
> use?  Or is the rule that, when a non-io-bitmap-using task is running,
> even in kernel mode, the io bitmap is always invalid.

Well it does not make much of a difference whether we do the above or
!(tifn & _TIF_IO_BITMAP). We always end up in that code when one of the
involved tasks has TIF_IO_BITMAP set. I decided to use the sched out check
because that makes it clear that this is the end of the valid I/O
bitmap. If the next task has TIF_IO_BITMAP set as well, then it will anyway
end up in the exit to user mode update code. Clearing it here ensures that
even if the exit to user mode malfunctions the bitmap cannot be leaked.

> As it stands, you need exit_thread() to invalidate the bitmap.  I
> assume it does, but I can't easily see it in the middle of the series
> like this.

It does.
 
> IOW your code might be fine, but it could at least use some comments
> in appropriate places (exit_to_usermode_loop()?) that we guarantee
> that, if the bit is *clear*, then the TSS has the io bitmap marked
> invalid.  And add an assertion under CONFIG_DEBUG_ENTRY.
> 
> Also, do you need to update EXIT_TO_USERMODE_LOOP_FLAGS?

No, the TIF_IO_BITMAP check is done once after the loop has run and it
would not make any sense in the loop as TIF_IO_BITMAP cannot be cleared
there and we'd loop forever. The other usermode loop flags are transient.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 09/16] x86/ioperm: Move TSS bitmap update to exit to user work
  2019-11-12 17:20     ` Thomas Gleixner
@ 2019-11-12 17:41       ` Andy Lutomirski
  2019-11-12 17:46         ` Linus Torvalds
  0 siblings, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2019-11-12 17:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, LKML, X86 ML, Linus Torvalds, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

On Tue, Nov 12, 2019 at 9:20 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Tue, 12 Nov 2019, Andy Lutomirski wrote:
> > On Mon, Nov 11, 2019 at 2:35 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > >
> > > There is no point to update the TSS bitmap for tasks which use I/O bitmaps
> > > on every context switch. It's enough to update it right before exiting to
> > > user space.
> > +
> > > +static inline void switch_to_bitmap(unsigned long tifp)
> > > +{
> > > +       /*
> > > +        * Invalidate I/O bitmap if the previous task used it. If the next
> > > +        * task has an I/O bitmap it will handle it on exit to user mode.
> > > +        */
> > > +       if (tifp & _TIF_IO_BITMAP)
> > > +               tss_invalidate_io_bitmap(this_cpu_ptr(&cpu_tss_rw));
> > > +}
> >
> > Shouldn't you be invalidating the io bitmap if the *next* task doesn't
> > use?  Or is the rule that, when a non-io-bitmap-using task is running,
> > even in kernel mode, the io bitmap is always invalid.
>
> Well it does not make much of a difference whether we do the above or
> !(tifn & _TIF_IO_BITMAP). We always end up in that code when one of the
> involved tasks has TIF_IO_BITMAP set. I decided to use the sched out check
> because that makes it clear that this is the end of the valid I/O
> bitmap. If the next task has TIF_IO_BITMAP set as well, then it will anyway
> end up in the exit to user mode update code. Clearing it here ensures that
> even if the exit to user mode malfunctions the bitmap cannot be leaked.
>
> > As it stands, you need exit_thread() to invalidate the bitmap.  I
> > assume it does, but I can't easily see it in the middle of the series
> > like this.
>
> It does.
>
> > IOW your code might be fine, but it could at least use some comments
> > in appropriate places (exit_to_usermode_loop()?) that we guarantee
> > that, if the bit is *clear*, then the TSS has the io bitmap marked
> > invalid.  And add an assertion under CONFIG_DEBUG_ENTRY.
> >
> > Also, do you need to update EXIT_TO_USERMODE_LOOP_FLAGS?
>
> No, the TIF_IO_BITMAP check is done once after the loop has run and it
> would not make any sense in the loop as TIF_IO_BITMAP cannot be cleared
> there and we'd loop forever. The other usermode loop flags are transient.
>

Right.  But your diff tool *said* the diff was in
exit_to_usermode_loop().  Can you look at your .gitconfig and see if
you have something weird going on?

--Andy

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 10/16] x86/ioperm: Remove bitmap if all permissions dropped
  2019-11-11 22:03 ` [patch V2 10/16] x86/ioperm: Remove bitmap if all permissions dropped Thomas Gleixner
@ 2019-11-12 17:43   ` Andy Lutomirski
  0 siblings, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2019-11-12 17:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

On Mon, Nov 11, 2019 at 2:35 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> If ioperm() results in a bitmap with all bits set (no permissions to any
> I/O port), then handling that bitmap on context switch and exit to user
> mode is pointless. Drop it.
>
> Move the bitmap exit handling to the ioport code and reuse it for both the
> thread exit path and dropping it. This allows to reuse this code for the
> upcoming iopl() emulation.

Acked-by: Andy Lutomirski <luto@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 09/16] x86/ioperm: Move TSS bitmap update to exit to user work
  2019-11-12 17:41       ` Andy Lutomirski
@ 2019-11-12 17:46         ` Linus Torvalds
  2019-11-13  8:30           ` Peter Zijlstra
  0 siblings, 1 reply; 57+ messages in thread
From: Linus Torvalds @ 2019-11-12 17:46 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, LKML, X86 ML, Stephen Hemminger, Willy Tarreau,
	Juergen Gross, Sean Christopherson, H. Peter Anvin

On Tue, Nov 12, 2019 at 9:42 AM Andy Lutomirski <luto@kernel.org> wrote:
>
> Right.  But your diff tool *said* the diff was in
> exit_to_usermode_loop().  Can you look at your .gitconfig and see if
> you have something weird going on?

I think it's just that the pattern to find "start of new function" is
confused by the "__visible" or something.

Don't rely too much on the function names in the diff headers. They
can be confused by labels, or just by other things. I think it ends up
being "does the line start with alphabetic character" that is the
heuristic for "this is a function header".

           Linus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 11/16] x86/ioperm: Share I/O bitmap if identical
  2019-11-11 22:03 ` [patch V2 11/16] x86/ioperm: Share I/O bitmap if identical Thomas Gleixner
  2019-11-12  7:14   ` Ingo Molnar
  2019-11-12  9:15   ` Peter Zijlstra
@ 2019-11-12 18:12   ` Andy Lutomirski
  2 siblings, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2019-11-12 18:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

On Mon, Nov 11, 2019 at 2:35 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> The I/O bitmap is duplicated on fork. That's wasting memory and slows down
> fork. There is no point to do so. As long as the bitmap is not modified it
> can be shared between threads and processes.
>
> Add a refcount and just share it on fork. If a task modifies the bitmap
> then it has to do the duplication if and only if it is shared.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V2: New patch
> ---
>  arch/x86/include/asm/iobitmap.h |    5 +++++
>  arch/x86/kernel/ioport.c        |   38 ++++++++++++++++++++++++++++++++------
>  arch/x86/kernel/process.c       |   39 ++++++---------------------------------
>  3 files changed, 43 insertions(+), 39 deletions(-)
>
> --- a/arch/x86/include/asm/iobitmap.h
> +++ b/arch/x86/include/asm/iobitmap.h
> @@ -2,10 +2,12 @@
>  #ifndef _ASM_X86_IOBITMAP_H
>  #define _ASM_X86_IOBITMAP_H
>
> +#include <linux/refcount.h>
>  #include <asm/processor.h>
>
>  struct io_bitmap {
>         u64                     sequence;
> +       refcount_t              refcnt;
>         unsigned int            io_bitmap_max;
>         union {
>                 unsigned long   bits[IO_BITMAP_LONGS];
> @@ -13,6 +15,9 @@ struct io_bitmap {
>         };
>  };
>
> +struct task_struct;
> +
> +void io_bitmap_share(struct task_struct *tsk);
>  void io_bitmap_exit(void);
>
>  void tss_update_io_bitmap(void);
> --- a/arch/x86/kernel/ioport.c
> +++ b/arch/x86/kernel/ioport.c
> @@ -16,6 +16,17 @@
>
>  static atomic64_t io_bitmap_sequence;
>
> +void io_bitmap_share(struct task_struct *tsk)
> + {
> +       /*
> +        * Take a refcount on current's bitmap. It can be used by
> +        * both tasks as long as none of them changes the bitmap.
> +        */
> +       refcount_inc(&current->thread.io_bitmap->refcnt);
> +       tsk->thread.io_bitmap = current->thread.io_bitmap;
> +       set_tsk_thread_flag(tsk, TIF_IO_BITMAP);
> +}
> +
>  void io_bitmap_exit(void)
>  {
>         struct io_bitmap *iobm = current->thread.io_bitmap;
> @@ -25,7 +36,8 @@ void io_bitmap_exit(void)
>         clear_thread_flag(TIF_IO_BITMAP);
>         tss_update_io_bitmap();
>         preempt_enable();
> -       kfree(iobm);
> +       if (iobm && refcount_dec_and_test(&iobm->refcnt))
> +               kfree(iobm);
>  }
>
>  /*
> @@ -59,8 +71,26 @@ long ksys_ioperm(unsigned long from, uns
>                         return -ENOMEM;
>
>                 memset(iobm->bits, 0xff, sizeof(iobm->bits));
> +               refcount_set(&iobm->refcnt, 1);
> +       }
> +
> +       /*
> +        * If the bitmap is not shared, then nothing can take a refcount as
> +        * current can obviously not fork at the same time. If it's shared
> +        * duplicate it and drop the refcount on the original one.
> +        */
> +       if (refcount_read(&iobm->refcnt) > 1) {
> +               iobm = kmemdup(iobm, sizeof(*iobm), GFP_KERNEL);
> +               if (!iobm)
> +                       return -ENOMEM;
> +               io_bitmap_exit();

And change the refcount to 1?

>         }
>

Otherwise:

Acked-by: Andy Lutomirski <luto@kernel.org>

(I'm about to send, and I see PeterZ beat me to the punch.  You can
still have the ack, though.)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 13/16] x86/iopl: Fixup misleading comment
  2019-11-11 22:03 ` [patch V2 13/16] x86/iopl: Fixup misleading comment Thomas Gleixner
@ 2019-11-12 18:14   ` Andy Lutomirski
  0 siblings, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2019-11-12 18:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

On Mon, Nov 11, 2019 at 2:35 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> From: Thomas Gleixner <tglx@linutronix.de>
>
> The comment for the sys_iopl() implementation is outdated and actively
> misleading in some parts. Fix it up.

Acked-by: Andy Lutomirski <luto@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 14/16] x86/iopl: Restrict iopl() permission scope
  2019-11-11 22:03 ` [patch V2 14/16] x86/iopl: Restrict iopl() permission scope Thomas Gleixner
  2019-11-11 23:03   ` Thomas Gleixner
  2019-11-12  8:42   ` Ingo Molnar
@ 2019-11-12 18:35   ` Andy Lutomirski
  2 siblings, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2019-11-12 18:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

On Mon, Nov 11, 2019 at 2:35 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> From: Thomas Gleixner <tglx@linutronix.de>
>
> The access to the full I/O port range can be also provided by the TSS I/O
> bitmap, but that would require to copy 8k of data on scheduling in the
> task. As shown with the sched out optimization TSS.io_bitmap_base can be
> used to switch the incoming task to a preallocated I/O bitmap which has all
> bits zero, i.e. allows access to all I/O ports.
>
> Implementing this allows to provide an iopl() emulation mode which restricts
> the IOPL level 3 permissions to I/O port access but removes the STI/CLI
> permission which is coming with the hardware IOPL mechansim.
>
> Provide a config option to switch IOPL to emulation mode, make it the
> default and while at it also provide an option to disable IOPL completely.

Acked-by: Andy Lutomirski <luto@kernel.org>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 15/16] x86/iopl: Remove legacy IOPL option
  2019-11-11 22:03 ` [patch V2 15/16] x86/iopl: Remove legacy IOPL option Thomas Gleixner
@ 2019-11-12 18:37   ` Andy Lutomirski
  2019-11-12 19:40     ` Thomas Gleixner
  0 siblings, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2019-11-12 18:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

On Mon, Nov 11, 2019 at 2:35 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> From: Thomas Gleixner <tglx@linutronix.de>
>
> The IOPL emulation via the I/O bitmap is sufficient. Remove the legacy
> cruft dealing with the (e)flags based IOPL mechanism.

Acked-by: Andy Lutomirski <luto@kernel.org>

But I think you could simplify a little bit and have a single config
option that controls the iopl() and iopl() syscalls.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 15/16] x86/iopl: Remove legacy IOPL option
  2019-11-12 18:37   ` Andy Lutomirski
@ 2019-11-12 19:40     ` Thomas Gleixner
  0 siblings, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-12 19:40 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, X86 ML, Linus Torvalds, Stephen Hemminger, Willy Tarreau,
	Juergen Gross, Sean Christopherson, H. Peter Anvin

On Tue, 12 Nov 2019, Andy Lutomirski wrote:
> On Mon, Nov 11, 2019 at 2:35 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > From: Thomas Gleixner <tglx@linutronix.de>
> >
> > The IOPL emulation via the I/O bitmap is sufficient. Remove the legacy
> > cruft dealing with the (e)flags based IOPL mechanism.
> 
> Acked-by: Andy Lutomirski <luto@kernel.org>
> 
> But I think you could simplify a little bit and have a single config
> option that controls the iopl() and iopl() syscalls.

You mean turning off both iopl() and ioperm() in one go. Yes, that'd be
possible. Let me look.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 09/16] x86/ioperm: Move TSS bitmap update to exit to user work
  2019-11-12 17:46         ` Linus Torvalds
@ 2019-11-13  8:30           ` Peter Zijlstra
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Zijlstra @ 2019-11-13  8:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Thomas Gleixner, LKML, X86 ML,
	Stephen Hemminger, Willy Tarreau, Juergen Gross,
	Sean Christopherson, H. Peter Anvin

On Tue, Nov 12, 2019 at 09:46:30AM -0800, Linus Torvalds wrote:

> I think it's just that the pattern to find "start of new function" is
> confused by the "__visible" or something.
> 
> Don't rely too much on the function names in the diff headers. They
> can be confused by labels, or just by other things. I think it ends up
> being "does the line start with alphabetic character" that is the
> heuristic for "this is a function header".

Right, so I have this in my .gitconfig:

[diff "default"]
	xfuncname = "^[[:alpha:]$_].*[^:]$"

Which works nice for C in that it will no longer accept labels as
funcname, but it stinks for ASM :-)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [patch V2 11/16] x86/ioperm: Share I/O bitmap if identical
  2019-11-12  9:15   ` Peter Zijlstra
  2019-11-12  9:51     ` Thomas Gleixner
@ 2019-11-14 11:02     ` David Laight
  2019-11-14 12:39       ` Thomas Gleixner
  2019-11-14 13:09       ` Peter Zijlstra
  1 sibling, 2 replies; 57+ messages in thread
From: David Laight @ 2019-11-14 11:02 UTC (permalink / raw)
  To: 'Peter Zijlstra', Thomas Gleixner
  Cc: LKML, x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

From: Peter Zijlstra
> Sent: 12 November 2019 09:15
...
> > +	/*
> > +	 * If the bitmap is not shared, then nothing can take a refcount as
> > +	 * current can obviously not fork at the same time. If it's shared
> > +	 * duplicate it and drop the refcount on the original one.
> > +	 */
> > +	if (refcount_read(&iobm->refcnt) > 1) {
> > +		iobm = kmemdup(iobm, sizeof(*iobm), GFP_KERNEL);
> > +		if (!iobm)
> > +			return -ENOMEM;
> > +		io_bitmap_exit();
> 		refcount_set(&iobm->refcnd, 1);
> >  	}

What happens if two threads of the same process enter the above
at the same time?
(I've not looked for a lock, but since kmemdup() can sleep I'd suspect there isn't one.)

Also can another thread call fork() - eg while the kmemdup() is sleeping?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [patch V2 11/16] x86/ioperm: Share I/O bitmap if identical
  2019-11-14 11:02     ` David Laight
@ 2019-11-14 12:39       ` Thomas Gleixner
  2019-11-14 13:09       ` Peter Zijlstra
  1 sibling, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2019-11-14 12:39 UTC (permalink / raw)
  To: David Laight
  Cc: 'Peter Zijlstra',
	LKML, x86, Linus Torvalds, Andy Lutomirski, Stephen Hemminger,
	Willy Tarreau, Juergen Gross, Sean Christopherson,
	H. Peter Anvin

On Thu, 14 Nov 2019, David Laight wrote:
> From: Peter Zijlstra
> > Sent: 12 November 2019 09:15
> ...
> > > +	/*
> > > +	 * If the bitmap is not shared, then nothing can take a refcount as
> > > +	 * current can obviously not fork at the same time. If it's shared
> > > +	 * duplicate it and drop the refcount on the original one.
> > > +	 */
> > > +	if (refcount_read(&iobm->refcnt) > 1) {
> > > +		iobm = kmemdup(iobm, sizeof(*iobm), GFP_KERNEL);
> > > +		if (!iobm)
> > > +			return -ENOMEM;
> > > +		io_bitmap_exit();
> > 		refcount_set(&iobm->refcnt, 1);
> > >  	}
> 
> What happens if two threads of the same process enter the above
> at the same time?
> (I've not looked for a lock, but since kmemdup() can sleep I'd suspect there isn't one.)
> 
> Also can another thread call fork() - eg while the kmemdup() is sleeping?

That's not a problem at all. The io bitmap which is duplicated can neither
be modified nor freed while the duplication is in progress. That's what the
refcount is for. The threads drop their refcount _after_ duplication.

And fork is not a problem either because that just increments the refcount.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [patch V2 11/16] x86/ioperm: Share I/O bitmap if identical
  2019-11-14 11:02     ` David Laight
  2019-11-14 12:39       ` Thomas Gleixner
@ 2019-11-14 13:09       ` Peter Zijlstra
  2019-11-14 13:22         ` David Laight
  1 sibling, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2019-11-14 13:09 UTC (permalink / raw)
  To: David Laight
  Cc: Thomas Gleixner, LKML, x86, Linus Torvalds, Andy Lutomirski,
	Stephen Hemminger, Willy Tarreau, Juergen Gross,
	Sean Christopherson, H. Peter Anvin

On Thu, Nov 14, 2019 at 11:02:01AM +0000, David Laight wrote:
> From: Peter Zijlstra
> > Sent: 12 November 2019 09:15
> ...
> > > +	/*
> > > +	 * If the bitmap is not shared, then nothing can take a refcount as
> > > +	 * current can obviously not fork at the same time. If it's shared
> > > +	 * duplicate it and drop the refcount on the original one.
> > > +	 */
> > > +	if (refcount_read(&iobm->refcnt) > 1) {
> > > +		iobm = kmemdup(iobm, sizeof(*iobm), GFP_KERNEL);
> > > +		if (!iobm)
> > > +			return -ENOMEM;
> > > +		io_bitmap_exit();
> > 		refcount_set(&iobm->refcnd, 1);
> > >  	}
> 
> What happens if two threads of the same process enter the above
> at the same time?

Suppose there's just the two threads, and both will change it. Then both
do copy-on-write and the original gets freed.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [patch V2 11/16] x86/ioperm: Share I/O bitmap if identical
  2019-11-14 13:09       ` Peter Zijlstra
@ 2019-11-14 13:22         ` David Laight
  0 siblings, 0 replies; 57+ messages in thread
From: David Laight @ 2019-11-14 13:22 UTC (permalink / raw)
  To: 'Peter Zijlstra'
  Cc: Thomas Gleixner, LKML, x86, Linus Torvalds, Andy Lutomirski,
	Stephen Hemminger, Willy Tarreau, Juergen Gross,
	Sean Christopherson, H. Peter Anvin

From: Peter Zijlstra <peterz@infradead.org>
> On Thu, Nov 14, 2019 at 11:02:01AM +0000, David Laight wrote:
> > From: Peter Zijlstra
> > > Sent: 12 November 2019 09:15
> > ...
> > > > +	/*
> > > > +	 * If the bitmap is not shared, then nothing can take a refcount as
> > > > +	 * current can obviously not fork at the same time. If it's shared
> > > > +	 * duplicate it and drop the refcount on the original one.
> > > > +	 */
> > > > +	if (refcount_read(&iobm->refcnt) > 1) {
> > > > +		iobm = kmemdup(iobm, sizeof(*iobm), GFP_KERNEL);
> > > > +		if (!iobm)
> > > > +			return -ENOMEM;
> > > > +		io_bitmap_exit();
> > > 		refcount_set(&iobm->refcnd, 1);
> > > >  	}
> >
> > What happens if two threads of the same process enter the above
> > at the same time?
> 
> Suppose there's just the two threads, and both will change it. Then both
> do copy-on-write and the original gets freed.

I was probably forgetting that the linux kernel uses (more or less) full 'process'
structures for threads, rather than separate 'thread' data areas.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2019-11-14 13:22 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-11 22:03 [patch V2 00/16] x86/iopl: Prevent user space from using CLI/STI with iopl(3) Thomas Gleixner
2019-11-11 22:03 ` [patch V2 01/16] x86/ptrace: Prevent truncation of bitmap size Thomas Gleixner
2019-11-12 15:34   ` Andy Lutomirski
2019-11-11 22:03 ` [patch V2 02/16] x86/process: Unify copy_thread_tls() Thomas Gleixner
2019-11-11 22:03 ` [patch V2 03/16] x86/cpu: Unify cpu_init() Thomas Gleixner
2019-11-11 22:03 ` [patch V2 04/16] x86/tss: Fix and move VMX BUILD_BUG_ON() Thomas Gleixner
2019-11-11 22:44   ` Paolo Bonzini
2019-11-12 15:37   ` Andy Lutomirski
2019-11-11 22:03 ` [patch V2 05/16] x86/iopl: Cleanup include maze Thomas Gleixner
2019-11-12 15:37   ` Andy Lutomirski
2019-11-11 22:03 ` [patch V2 06/16] x86/io: Speedup schedule out of I/O bitmap user Thomas Gleixner
2019-11-12 16:00   ` Andy Lutomirski
2019-11-12 17:08     ` Thomas Gleixner
2019-11-11 22:03 ` [patch V2 07/16] x86/ioperm: Move iobitmap data into a struct Thomas Gleixner
2019-11-12 16:02   ` Andy Lutomirski
2019-11-12 17:08     ` Thomas Gleixner
2019-11-11 22:03 ` [patch V2 08/16] x86/ioperm: Add bitmap sequence number Thomas Gleixner
2019-11-12  9:22   ` Peter Zijlstra
2019-11-12  9:55     ` [patch V2 08/16] x86/ioperm: Add bitmap sequence numberc Thomas Gleixner
2019-11-12 16:08   ` [patch V2 08/16] x86/ioperm: Add bitmap sequence number Andy Lutomirski
2019-11-12 17:10     ` Thomas Gleixner
2019-11-11 22:03 ` [patch V2 09/16] x86/ioperm: Move TSS bitmap update to exit to user work Thomas Gleixner
2019-11-12 16:16   ` Andy Lutomirski
2019-11-12 17:20     ` Thomas Gleixner
2019-11-12 17:41       ` Andy Lutomirski
2019-11-12 17:46         ` Linus Torvalds
2019-11-13  8:30           ` Peter Zijlstra
2019-11-11 22:03 ` [patch V2 10/16] x86/ioperm: Remove bitmap if all permissions dropped Thomas Gleixner
2019-11-12 17:43   ` Andy Lutomirski
2019-11-11 22:03 ` [patch V2 11/16] x86/ioperm: Share I/O bitmap if identical Thomas Gleixner
2019-11-12  7:14   ` Ingo Molnar
2019-11-12  7:17     ` Thomas Gleixner
2019-11-12  7:52       ` Ingo Molnar
2019-11-12  9:15   ` Peter Zijlstra
2019-11-12  9:51     ` Thomas Gleixner
2019-11-14 11:02     ` David Laight
2019-11-14 12:39       ` Thomas Gleixner
2019-11-14 13:09       ` Peter Zijlstra
2019-11-14 13:22         ` David Laight
2019-11-12 18:12   ` Andy Lutomirski
2019-11-11 22:03 ` [patch V2 12/16] selftests/x86/ioperm: Extend testing so the shared bitmap is exercised Thomas Gleixner
2019-11-11 22:03 ` [patch V2 13/16] x86/iopl: Fixup misleading comment Thomas Gleixner
2019-11-12 18:14   ` Andy Lutomirski
2019-11-11 22:03 ` [patch V2 14/16] x86/iopl: Restrict iopl() permission scope Thomas Gleixner
2019-11-11 23:03   ` Thomas Gleixner
2019-11-12  6:32     ` Ingo Molnar
2019-11-12  8:42   ` Ingo Molnar
2019-11-12 10:07     ` Thomas Gleixner
2019-11-12 18:35   ` Andy Lutomirski
2019-11-11 22:03 ` [patch V2 15/16] x86/iopl: Remove legacy IOPL option Thomas Gleixner
2019-11-12 18:37   ` Andy Lutomirski
2019-11-12 19:40     ` Thomas Gleixner
2019-11-11 22:03 ` [patch V2 16/16] selftests/x86/iopl: Extend test to cover IOPL emulation Thomas Gleixner
2019-11-12  7:40 ` [PATCH] x86/iopl: Factor out IO-bitmap related TSS fields into 'struct x86_io_bitmap' Ingo Molnar
2019-11-12  7:59   ` [PATCH] x86/iopl: Harmonize 'struct io_bitmap' and 'struct x86_io_bitmap' nomenclature Ingo Molnar
2019-11-12  8:11   ` [PATCH] x86/iopl: Clear up the role of the two bitmap copying fields Ingo Molnar
2019-11-12  8:15   ` [PATCH] x86/iopl: Rename <asm/iobitmap.h> to <asm/io_bitmap.h> Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).