* [PATCH v10 0/7] x86/arch_prctl Add ARCH_[GET|SET]_CPUID for controlling the CPUID instruction
@ 2016-11-08 18:39 Kyle Huey
  2016-11-08 18:39 ` [PATCH v10 1/7] x86/arch_prctl/64: Use SYSCALL_DEFINE2 to define sys_arch_prctl Kyle Huey
                   ` (6 more replies)
  0 siblings, 7 replies; 29+ messages in thread
From: Kyle Huey @ 2016-11-08 18:39 UTC (permalink / raw)
  To: Robert O'Callahan, Thomas Gleixner, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin, x86, Paolo Bonzini,
	Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Borislav Petkov, Peter Zijlstra, Boris Ostrovsky,
	Len Brown, Rafael J. Wysocki, Dmitry Safonov, David Matlack
  Cc: linux-kernel, user-mode-linux-devel, user-mode-linux-user,
	linux-fsdevel, linux-kselftest, kvm

rr (http://rr-project.org/), a userspace record-and-replay reverse-
execution debugger, would like to trap and emulate the CPUID instruction.
This would allow us to a) mask away certain hardware features that rr does
not support (e.g. RDRAND) and b) enable trace portability across machines
by providing constant results.

Newer Intel CPUs (Ivy Bridge and later) can fault when CPUID is executed at
CPL > 0. Expose this capability to userspace as a new pair of arch_prctls,
ARCH_GET_CPUID and ARCH_SET_CPUID, with two values, ARCH_CPUID_ENABLE and
ARCH_CPUID_SIGSEGV.
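
For context, here is a minimal, untested sketch of how a tracer such as rr
might drive the proposed interface from userspace (x86-64 shown; the
wrapper goes through syscall(2) because the glibc arch_prctl() prototype
does not know the new codes, and cpuid_get_mode()/cpuid_set_mode() are
just illustrative names):

#include <unistd.h>
#include <sys/syscall.h>
#include <asm/prctl.h>		/* ARCH_[GET|SET]_CPUID, ARCH_CPUID_* */

/* Returns ARCH_CPUID_ENABLE or ARCH_CPUID_SIGSEGV; arg2 must be 0. */
static int cpuid_get_mode(void)
{
	return syscall(SYS_arch_prctl, ARCH_GET_CPUID, 0);
}

/* mode is ARCH_CPUID_ENABLE or ARCH_CPUID_SIGSEGV. */
static int cpuid_set_mode(unsigned long mode)
{
	return syscall(SYS_arch_prctl, ARCH_SET_CPUID, mode);
}

After cpuid_set_mode(ARCH_CPUID_SIGSEGV), the tracee's CPUID attempts show
up as SIGSEGVs that the tracer can intercept and emulate.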

Since v9:
Patch 7: KVM: x86: virtualize cpuid faulting
- Fixed wrong condition when testing for a hypervisor disabling cpuid
  faulting while it is active.
- Now stores MSRs as u64 for future extensibility.
- Added cpuid_fault_enabled and supports_cpuid_fault helper functions.
- Style nits.

* [PATCH v10 1/7] x86/arch_prctl/64: Use SYSCALL_DEFINE2 to define sys_arch_prctl
  2016-11-08 18:39 [PATCH v10 0/7] x86/arch_prctl Add ARCH_[GET|SET]_CPUID for controlling the CPUID instruction Kyle Huey
@ 2016-11-08 18:39 ` Kyle Huey
  2016-11-09  9:47   ` Borislav Petkov
  2016-11-08 18:39 ` [PATCH v10 2/7] x86/arch_prctl/64: Rename do_arch_prctl to do_arch_prctl_64 Kyle Huey
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Kyle Huey @ 2016-11-08 18:39 UTC (permalink / raw)
  To: Robert O'Callahan, Thomas Gleixner, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin, x86, Paolo Bonzini,
	Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Borislav Petkov, Peter Zijlstra, Boris Ostrovsky,
	Len Brown, Rafael J. Wysocki, Dmitry Safonov, David Matlack
  Cc: linux-kernel, user-mode-linux-devel, user-mode-linux-user,
	linux-fsdevel, linux-kselftest, kvm

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
---
 arch/x86/kernel/process_64.c | 3 ++-
 arch/x86/um/syscalls_64.c    | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index b3760b3..2718cf9 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -30,16 +30,17 @@
 #include <linux/ptrace.h>
 #include <linux/notifier.h>
 #include <linux/kprobes.h>
 #include <linux/kdebug.h>
 #include <linux/prctl.h>
 #include <linux/uaccess.h>
 #include <linux/io.h>
 #include <linux/ftrace.h>
+#include <linux/syscalls.h>
 
 #include <asm/pgtable.h>
 #include <asm/processor.h>
 #include <asm/fpu/internal.h>
 #include <asm/mmu_context.h>
 #include <asm/prctl.h>
 #include <asm/desc.h>
 #include <asm/proto.h>
@@ -607,17 +608,17 @@ long do_arch_prctl(struct task_struct *task, int code, unsigned long addr)
 	default:
 		ret = -EINVAL;
 		break;
 	}
 
 	return ret;
 }
 
-long sys_arch_prctl(int code, unsigned long addr)
+SYSCALL_DEFINE2(arch_prctl, int, code, unsigned long, addr)
 {
 	return do_arch_prctl(current, code, addr);
 }
 
 unsigned long KSTK_ESP(struct task_struct *task)
 {
 	return task_pt_regs(task)->sp;
 }
diff --git a/arch/x86/um/syscalls_64.c b/arch/x86/um/syscalls_64.c
index e655227..ab3f7f4 100644
--- a/arch/x86/um/syscalls_64.c
+++ b/arch/x86/um/syscalls_64.c
@@ -1,16 +1,17 @@
 /*
  * Copyright (C) 2003 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
  * Copyright 2003 PathScale, Inc.
  *
  * Licensed under the GPL
  */
 
 #include <linux/sched.h>
+#include <linux/syscalls.h>
 #include <linux/uaccess.h>
 #include <asm/prctl.h> /* XXX This should get the constants from libc */
 #include <os.h>
 
 long arch_prctl(struct task_struct *task, int code, unsigned long __user *addr)
 {
 	unsigned long *ptr = addr, tmp;
 	long ret;
@@ -67,17 +68,17 @@ long arch_prctl(struct task_struct *task, int code, unsigned long __user *addr)
 	case ARCH_GET_GS:
 		ret = put_user(tmp, addr);
 		break;
 	}
 
 	return ret;
 }
 
-long sys_arch_prctl(int code, unsigned long addr)
+SYSCALL_DEFINE2(arch_prctl, int, code, unsigned long, addr)
 {
 	return arch_prctl(current, code, (unsigned long __user *) addr);
 }
 
 void arch_switch_to(struct task_struct *to)
 {
 	if ((to->thread.arch.fs == 0) || (to->mm == NULL))
 		return;

base-commit: e3a00f68e426df24a5fb98956a1bd1b23943aa1e
-- 
2.10.2

* [PATCH v10 2/7] x86/arch_prctl/64: Rename do_arch_prctl to do_arch_prctl_64
  2016-11-08 18:39 [PATCH v10 0/7] x86/arch_prctl Add ARCH_[GET|SET]_CPUID for controlling the CPUID instruction Kyle Huey
  2016-11-08 18:39 ` [PATCH v10 1/7] x86/arch_prctl/64: Use SYSCALL_DEFINE2 to define sys_arch_prctl Kyle Huey
@ 2016-11-08 18:39 ` Kyle Huey
  2016-11-09  9:58   ` Borislav Petkov
  2016-11-08 18:39 ` [PATCH v10 3/7] x86/arch_prctl: Add do_arch_prctl_common Kyle Huey
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Kyle Huey @ 2016-11-08 18:39 UTC (permalink / raw)
  To: Robert O'Callahan, Thomas Gleixner, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin, x86, Paolo Bonzini,
	Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Borislav Petkov, Peter Zijlstra, Boris Ostrovsky,
	Len Brown, Rafael J. Wysocki, Dmitry Safonov, David Matlack
  Cc: linux-kernel, user-mode-linux-devel, user-mode-linux-user,
	linux-fsdevel, linux-kselftest, kvm

In order to introduce new arch_prctls that are not 64-bit only, rename the
existing 64-bit implementation to do_arch_prctl_64(). Also rename the
second argument of arch_prctl(), from 'addr' to 'arg2', since it will no
longer always be an address.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/proto.h |  4 +++-
 arch/x86/kernel/process_64.c | 32 +++++++++++++++++---------------
 arch/x86/kernel/ptrace.c     |  8 ++++----
 arch/x86/um/syscalls_64.c    |  4 ++--
 4 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/proto.h b/arch/x86/include/asm/proto.h
index 9b9b30b..95c3e51 100644
--- a/arch/x86/include/asm/proto.h
+++ b/arch/x86/include/asm/proto.h
@@ -25,11 +25,13 @@ void entry_SYSCALL_compat(void);
 void entry_INT80_compat(void);
 #endif
 
 void x86_configure_nx(void);
 void x86_report_nx(void);
 
 extern int reboot_force;
 
-long do_arch_prctl(struct task_struct *task, int code, unsigned long addr);
+#ifdef CONFIG_X86_64
+long do_arch_prctl_64(struct task_struct *task, int code, unsigned long arg2);
+#endif
 
 #endif /* _ASM_X86_PROTO_H */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 2718cf9..611df20 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -193,17 +193,17 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	 */
 	if (clone_flags & CLONE_SETTLS) {
 #ifdef CONFIG_IA32_EMULATION
 		if (in_ia32_syscall())
 			err = do_set_thread_area(p, -1,
 				(struct user_desc __user *)tls, 0);
 		else
 #endif
-			err = do_arch_prctl(p, ARCH_SET_FS, tls);
+			err = do_arch_prctl_64(p, ARCH_SET_FS, tls);
 		if (err)
 			goto out;
 	}
 	err = 0;
 out:
 	if (err && p->thread.io_bitmap_ptr) {
 		kfree(p->thread.io_bitmap_ptr);
 		p->thread.io_bitmap_max = 0;
@@ -534,91 +534,93 @@ static long prctl_map_vdso(const struct vdso_image *image, unsigned long addr)
 	ret = map_vdso_once(image, addr);
 	if (ret)
 		return ret;
 
 	return (long)image->size;
 }
 #endif
 
-long do_arch_prctl(struct task_struct *task, int code, unsigned long addr)
+long do_arch_prctl_64(struct task_struct *task, int code, unsigned long arg2)
 {
 	int ret = 0;
 	int doit = task == current;
 	int cpu;
 
 	switch (code) {
 	case ARCH_SET_GS:
-		if (addr >= TASK_SIZE_MAX)
+		if (arg2 >= TASK_SIZE_MAX)
 			return -EPERM;
 		cpu = get_cpu();
 		task->thread.gsindex = 0;
-		task->thread.gsbase = addr;
+		task->thread.gsbase = arg2;
 		if (doit) {
 			load_gs_index(0);
-			ret = wrmsrl_safe(MSR_KERNEL_GS_BASE, addr);
+			ret = wrmsrl_safe(MSR_KERNEL_GS_BASE, arg2);
 		}
 		put_cpu();
 		break;
 	case ARCH_SET_FS:
 		/* Not strictly needed for fs, but do it for symmetry
 		   with gs */
-		if (addr >= TASK_SIZE_MAX)
+		if (arg2 >= TASK_SIZE_MAX)
 			return -EPERM;
 		cpu = get_cpu();
 		task->thread.fsindex = 0;
-		task->thread.fsbase = addr;
+		task->thread.fsbase = arg2;
 		if (doit) {
 			/* set the selector to 0 to not confuse __switch_to */
 			loadsegment(fs, 0);
-			ret = wrmsrl_safe(MSR_FS_BASE, addr);
+			ret = wrmsrl_safe(MSR_FS_BASE, arg2);
 		}
 		put_cpu();
 		break;
 	case ARCH_GET_FS: {
 		unsigned long base;
+
 		if (doit)
 			rdmsrl(MSR_FS_BASE, base);
 		else
 			base = task->thread.fsbase;
-		ret = put_user(base, (unsigned long __user *)addr);
+		ret = put_user(base, (unsigned long __user *)arg2);
 		break;
 	}
 	case ARCH_GET_GS: {
 		unsigned long base;
+
 		if (doit)
 			rdmsrl(MSR_KERNEL_GS_BASE, base);
 		else
 			base = task->thread.gsbase;
-		ret = put_user(base, (unsigned long __user *)addr);
+		ret = put_user(base, (unsigned long __user *)arg2);
 		break;
 	}
 
 #ifdef CONFIG_CHECKPOINT_RESTORE
 # ifdef CONFIG_X86_X32_ABI
 	case ARCH_MAP_VDSO_X32:
-		return prctl_map_vdso(&vdso_image_x32, addr);
+		return prctl_map_vdso(&vdso_image_x32, arg2);
 # endif
 # if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION
 	case ARCH_MAP_VDSO_32:
-		return prctl_map_vdso(&vdso_image_32, addr);
+		return prctl_map_vdso(&vdso_image_32, arg2);
 # endif
 	case ARCH_MAP_VDSO_64:
-		return prctl_map_vdso(&vdso_image_64, addr);
+		return prctl_map_vdso(&vdso_image_64, arg2);
 #endif
 
 	default:
 		ret = -EINVAL;
 		break;
 	}
 
 	return ret;
 }
 
-SYSCALL_DEFINE2(arch_prctl, int, code, unsigned long, addr)
+SYSCALL_DEFINE2(arch_prctl, int, code, unsigned long, arg2)
 {
-	return do_arch_prctl(current, code, addr);
+	return do_arch_prctl_64(current, code, arg2);
 }
 
 unsigned long KSTK_ESP(struct task_struct *task)
 {
 	return task_pt_regs(task)->sp;
 }
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 0e63c02..5004302 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -390,31 +390,31 @@ static int putreg(struct task_struct *child,
 	case offsetof(struct user_regs_struct, flags):
 		return set_flags(child, value);
 
 #ifdef CONFIG_X86_64
 	case offsetof(struct user_regs_struct,fs_base):
 		if (value >= TASK_SIZE_MAX)
 			return -EIO;
 		/*
-		 * When changing the segment base, use do_arch_prctl
+		 * When changing the segment base, use do_arch_prctl_64
 		 * to set either thread.fs or thread.fsindex and the
 		 * corresponding GDT slot.
 		 */
 		if (child->thread.fsbase != value)
-			return do_arch_prctl(child, ARCH_SET_FS, value);
+			return do_arch_prctl_64(child, ARCH_SET_FS, value);
 		return 0;
 	case offsetof(struct user_regs_struct,gs_base):
 		/*
 		 * Exactly the same here as the %fs handling above.
 		 */
 		if (value >= TASK_SIZE_MAX)
 			return -EIO;
 		if (child->thread.gsbase != value)
-			return do_arch_prctl(child, ARCH_SET_GS, value);
+			return do_arch_prctl_64(child, ARCH_SET_GS, value);
 		return 0;
 #endif
 	}
 
 	*pt_regs_access(task_pt_regs(child), offset) = value;
 	return 0;
 }
 
@@ -863,17 +863,17 @@ long arch_ptrace(struct task_struct *child, long request,
 		break;
 #endif
 
 #ifdef CONFIG_X86_64
 		/* normal 64bit interface to access TLS data.
 		   Works just like arch_prctl, except that the arguments
 		   are reversed. */
 	case PTRACE_ARCH_PRCTL:
-		ret = do_arch_prctl(child, data, addr);
+		ret = do_arch_prctl_64(child, data, addr);
 		break;
 #endif
 
 	default:
 		ret = ptrace_request(child, request, addr, data);
 		break;
 	}
 
diff --git a/arch/x86/um/syscalls_64.c b/arch/x86/um/syscalls_64.c
index ab3f7f4..3362c4e 100644
--- a/arch/x86/um/syscalls_64.c
+++ b/arch/x86/um/syscalls_64.c
@@ -68,19 +68,19 @@ long arch_prctl(struct task_struct *task, int code, unsigned long __user *addr)
 	case ARCH_GET_GS:
 		ret = put_user(tmp, addr);
 		break;
 	}
 
 	return ret;
 }
 
-SYSCALL_DEFINE2(arch_prctl, int, code, unsigned long, addr)
+SYSCALL_DEFINE2(arch_prctl, int, code, unsigned long, arg2)
 {
-	return arch_prctl(current, code, (unsigned long __user *) addr);
+	return arch_prctl(current, code, (unsigned long __user *) arg2);
 }
 
 void arch_switch_to(struct task_struct *to)
 {
 	if ((to->thread.arch.fs == 0) || (to->mm == NULL))
 		return;
 
 	arch_prctl(to, ARCH_SET_FS, (void __user *) to->thread.arch.fs);
-- 
2.10.2

* [PATCH v10 3/7] x86/arch_prctl: Add do_arch_prctl_common
  2016-11-08 18:39 [PATCH v10 0/7] x86/arch_prctl Add ARCH_[GET|SET]_CPUID for controlling the CPUID instruction Kyle Huey
  2016-11-08 18:39 ` [PATCH v10 1/7] x86/arch_prctl/64: Use SYSCALL_DEFINE2 to define sys_arch_prctl Kyle Huey
  2016-11-08 18:39 ` [PATCH v10 2/7] x86/arch_prctl/64: Rename do_arch_prctl to do_arch_prctl_64 Kyle Huey
@ 2016-11-08 18:39 ` Kyle Huey
  2016-11-09 10:31   ` Borislav Petkov
  2016-11-08 18:39 ` [PATCH v10 4/7] x86/syscalls/32: Wire up arch_prctl on x86-32 Kyle Huey
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Kyle Huey @ 2016-11-08 18:39 UTC (permalink / raw)
  To: Robert O'Callahan, Thomas Gleixner, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin, x86, Paolo Bonzini,
	Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Borislav Petkov, Peter Zijlstra, Boris Ostrovsky,
	Len Brown, Rafael J. Wysocki, Dmitry Safonov, David Matlack
  Cc: linux-kernel, user-mode-linux-devel, user-mode-linux-user,
	linux-fsdevel, linux-kselftest, kvm

Add do_arch_prctl_common() to handle arch_prctls that are not specific to
64 bits. Call it from the syscall entry point, but not from any of the
other call sites in the kernel, which all want one of the existing
64-bit-only arch_prctls.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
---
 arch/x86/include/asm/proto.h | 1 +
 arch/x86/kernel/process.c    | 5 +++++
 arch/x86/kernel/process_64.c | 8 +++++++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/proto.h b/arch/x86/include/asm/proto.h
index 95c3e51..f72b551 100644
--- a/arch/x86/include/asm/proto.h
+++ b/arch/x86/include/asm/proto.h
@@ -25,13 +25,14 @@ void entry_SYSCALL_compat(void);
 void entry_INT80_compat(void);
 #endif
 
 void x86_configure_nx(void);
 void x86_report_nx(void);
 
 extern int reboot_force;
 
+long do_arch_prctl_common(struct task_struct *task, int code, unsigned long arg2);
 #ifdef CONFIG_X86_64
 long do_arch_prctl_64(struct task_struct *task, int code, unsigned long arg2);
 #endif
 
 #endif /* _ASM_X86_PROTO_H */
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 0888a87..d0126b2 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -579,8 +579,13 @@ unsigned long get_wchan(struct task_struct *p)
 		}
 		fp = READ_ONCE_NOCHECK(*(unsigned long *)fp);
 	} while (count++ < 16 && p->state != TASK_RUNNING);
 
 out:
 	put_task_stack(p);
 	return ret;
 }
+
+long do_arch_prctl_common(struct task_struct *task, int code, unsigned long arg2)
+{
+	return -EINVAL;
+}
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 611df20..bf75d26 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -612,15 +612,21 @@ long do_arch_prctl_64(struct task_struct *task, int code, unsigned long arg2)
 		break;
 	}
 
 	return ret;
 }
 
 SYSCALL_DEFINE2(arch_prctl, int, code, unsigned long, arg2)
 {
-	return do_arch_prctl_64(current, code, arg2);
+	long ret;
+
+	ret = do_arch_prctl_64(current, code, arg2);
+	if (ret == -EINVAL)
+		ret = do_arch_prctl_common(current, code, arg2);
+
+	return ret;
 }
 
 unsigned long KSTK_ESP(struct task_struct *task)
 {
 	return task_pt_regs(task)->sp;
 }
-- 
2.10.2

* [PATCH v10 4/7] x86/syscalls/32: Wire up arch_prctl on x86-32
  2016-11-08 18:39 [PATCH v10 0/7] x86/arch_prctl Add ARCH_[GET|SET]_CPUID for controlling the CPUID instruction Kyle Huey
                   ` (2 preceding siblings ...)
  2016-11-08 18:39 ` [PATCH v10 3/7] x86/arch_prctl: Add do_arch_prctl_common Kyle Huey
@ 2016-11-08 18:39 ` Kyle Huey
  2016-11-09 11:04   ` Borislav Petkov
  2016-11-08 18:39 ` [PATCH v10 5/7] x86/cpufeature: Detect CPUID faulting support Kyle Huey
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Kyle Huey @ 2016-11-08 18:39 UTC (permalink / raw)
  To: Robert O'Callahan, Thomas Gleixner, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin, x86, Paolo Bonzini,
	Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Borislav Petkov, Peter Zijlstra, Boris Ostrovsky,
	Len Brown, Rafael J. Wysocki, Dmitry Safonov, David Matlack
  Cc: linux-kernel, user-mode-linux-devel, user-mode-linux-user,
	linux-fsdevel, linux-kselftest, kvm

Hook up arch_prctl to call do_arch_prctl_common() on x86-32, and in 32-bit
compat mode on x86-64. This allows us to have arch_prctls that are not
specific to 64 bits.

On UML, simply stub out this syscall.
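
For illustration only (the numbers below come from the existing x86-64
table and the i386 entry added above; arch_prctl32() is a hypothetical
wrapper, not part of the patch), a 32-bit binary built against a libc that
does not yet define SYS_arch_prctl could reach the new entry point like
this:

#include <unistd.h>
#include <sys/syscall.h>

#ifndef SYS_arch_prctl
#define SYS_arch_prctl 385	/* i386 number added above; x86-64 uses 158 */
#endif

static long arch_prctl32(int code, unsigned long arg2)
{
	return syscall(SYS_arch_prctl, code, arg2);
}

In compat mode only the common arch_prctls succeed through this path; the
64-bit-only codes still return -EINVAL from do_arch_prctl_common().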

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
---
 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/kernel/process_32.c           | 7 +++++++
 arch/x86/kernel/process_64.c           | 7 +++++++
 arch/x86/um/Makefile                   | 2 +-
 arch/x86/um/syscalls_32.c              | 7 +++++++
 include/linux/compat.h                 | 2 ++
 6 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/um/syscalls_32.c

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 2b36185..d78c6b5 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -384,8 +384,9 @@
 375	i386	membarrier		sys_membarrier
 376	i386	mlock2			sys_mlock2
 377	i386	copy_file_range		sys_copy_file_range
 378	i386	preadv2			sys_preadv2			compat_sys_preadv2
 379	i386	pwritev2		sys_pwritev2			compat_sys_pwritev2
 380	i386	pkey_mprotect		sys_pkey_mprotect
 381	i386	pkey_alloc		sys_pkey_alloc
 382	i386	pkey_free		sys_pkey_free
+385	i386	arch_prctl		sys_arch_prctl			compat_sys_arch_prctl
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index bd7be8e..95d3adc 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -30,16 +30,17 @@
 #include <linux/ptrace.h>
 #include <linux/personality.h>
 #include <linux/percpu.h>
 #include <linux/prctl.h>
 #include <linux/ftrace.h>
 #include <linux/uaccess.h>
 #include <linux/io.h>
 #include <linux/kdebug.h>
+#include <linux/syscalls.h>
 
 #include <asm/pgtable.h>
 #include <asm/ldt.h>
 #include <asm/processor.h>
 #include <asm/fpu/internal.h>
 #include <asm/desc.h>
 #ifdef CONFIG_MATH_EMULATION
 #include <asm/math_emu.h>
@@ -49,16 +50,17 @@
 
 #include <asm/tlbflush.h>
 #include <asm/cpu.h>
 #include <asm/idle.h>
 #include <asm/syscalls.h>
 #include <asm/debugreg.h>
 #include <asm/switch_to.h>
 #include <asm/vm86.h>
+#include <asm/proto.h>
 
 void __show_regs(struct pt_regs *regs, int all)
 {
 	unsigned long cr0 = 0L, cr2 = 0L, cr3 = 0L, cr4 = 0L;
 	unsigned long d0, d1, d2, d3, d6, d7;
 	unsigned long sp;
 	unsigned short ss, gs;
 
@@ -296,8 +298,13 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 		lazy_load_gs(next->gs);
 
 	switch_fpu_finish(next_fpu, fpu_switch);
 
 	this_cpu_write(current_task, next_p);
 
 	return prev_p;
 }
+
+SYSCALL_DEFINE2(arch_prctl, int, code, unsigned long, arg2)
+{
+	return do_arch_prctl_common(current, code, arg2);
+}
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index bf75d26..3a2a84d 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -621,12 +621,19 @@ SYSCALL_DEFINE2(arch_prctl, int, code, unsigned long, arg2)
 
 	ret = do_arch_prctl_64(current, code, arg2);
 	if (ret == -EINVAL)
 		ret = do_arch_prctl_common(current, code, arg2);
 
 	return ret;
 }
 
+#ifdef CONFIG_IA32_EMULATION
+COMPAT_SYSCALL_DEFINE2(arch_prctl, int, code, unsigned long, arg2)
+{
+	return do_arch_prctl_common(current, code, arg2);
+}
+#endif
+
 unsigned long KSTK_ESP(struct task_struct *task)
 {
 	return task_pt_regs(task)->sp;
 }
diff --git a/arch/x86/um/Makefile b/arch/x86/um/Makefile
index e7e7055..69f0827 100644
--- a/arch/x86/um/Makefile
+++ b/arch/x86/um/Makefile
@@ -11,17 +11,17 @@ endif
 obj-y = bug.o bugs_$(BITS).o delay.o fault.o ldt.o \
 	ptrace_$(BITS).o ptrace_user.o setjmp_$(BITS).o signal.o \
 	stub_$(BITS).o stub_segv.o \
 	sys_call_table_$(BITS).o sysrq_$(BITS).o tls_$(BITS).o \
 	mem_$(BITS).o subarch.o os-$(OS)/
 
 ifeq ($(CONFIG_X86_32),y)
 
-obj-y += checksum_32.o
+obj-y += checksum_32.o syscalls_32.o
 obj-$(CONFIG_ELF_CORE) += elfcore.o
 
 subarch-y = ../lib/string_32.o ../lib/atomic64_32.o ../lib/atomic64_cx8_32.o
 subarch-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += ../lib/rwsem.o
 
 else
 
 obj-y += syscalls_64.o vdso/
diff --git a/arch/x86/um/syscalls_32.c b/arch/x86/um/syscalls_32.c
new file mode 100644
index 0000000..ccf0598
--- /dev/null
+++ b/arch/x86/um/syscalls_32.c
@@ -0,0 +1,7 @@
+#include <linux/syscalls.h>
+#include <os.h>
+
+SYSCALL_DEFINE2(arch_prctl, int, code, unsigned long, arg2)
+{
+	return -EINVAL;
+}
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 6360939..500cdb3 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -716,16 +716,18 @@ int __compat_save_altstack(compat_stack_t __user *, unsigned long);
 } while (0);
 
 asmlinkage long compat_sys_sched_rr_get_interval(compat_pid_t pid,
 						 struct compat_timespec __user *interval);
 
 asmlinkage long compat_sys_fanotify_mark(int, unsigned int, __u32, __u32,
 					    int, const char __user *);
 
+asmlinkage long compat_sys_arch_prctl(int, unsigned long);
+
 /*
  * For most but not all architectures, "am I in a compat syscall?" and
  * "am I a compat task?" are the same question.  For architectures on which
  * they aren't the same question, arch code can override in_compat_syscall.
  */
 
 #ifndef in_compat_syscall
 static inline bool in_compat_syscall(void) { return is_compat_task(); }
-- 
2.10.2

* [PATCH v10 5/7] x86/cpufeature: Detect CPUID faulting support
  2016-11-08 18:39 [PATCH v10 0/7] x86/arch_prctl Add ARCH_[GET|SET]_CPUID for controlling the CPUID instruction Kyle Huey
                   ` (3 preceding siblings ...)
  2016-11-08 18:39 ` [PATCH v10 4/7] x86/syscalls/32: Wire up arch_prctl on x86-32 Kyle Huey
@ 2016-11-08 18:39 ` Kyle Huey
  2016-11-08 19:06   ` Thomas Gleixner
  2016-11-09 11:14   ` Borislav Petkov
  2016-11-08 18:39 ` [PATCH v10 6/7] x86/arch_prctl: Add ARCH_[GET|SET]_CPUID Kyle Huey
  2016-11-08 18:39 ` [PATCH v10 7/7] KVM: x86: virtualize cpuid faulting Kyle Huey
  6 siblings, 2 replies; 29+ messages in thread
From: Kyle Huey @ 2016-11-08 18:39 UTC (permalink / raw)
  To: Robert O'Callahan, Thomas Gleixner, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin, x86, Paolo Bonzini,
	Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Borislav Petkov, Peter Zijlstra, Boris Ostrovsky,
	Len Brown, Rafael J. Wysocki, Dmitry Safonov, David Matlack
  Cc: linux-kernel, user-mode-linux-devel, user-mode-linux-user,
	linux-fsdevel, linux-kselftest, kvm

Intel supports faulting on the CPUID instruction beginning with Ivy Bridge.
When enabled, the processor will fault on attempts to execute the CPUID
instruction with CPL>0. This will allow a ptracer to emulate the CPUID
instruction.

Bit 31 of MSR_PLATFORM_INFO advertises support for this feature. It is
documented in detail in Section 2.3.2 of
http://www.intel.com/content/dam/www/public/us/en/documents/application-notes/virtualization-technology-flexmigration-application-note.pdf

Detect support for this feature and expose it as X86_FEATURE_CPUID_FAULT.
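
Not part of the patch, but as a rough way to poke at the bit by hand from
userspace (assuming the msr driver is loaded, /dev/cpu/0/msr is readable,
and you are running as root), something like this should work:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MSR_PLATFORM_INFO	0xce
#define PLATINFO_CPUID_FAULT	(1ULL << 31)

int main(void)
{
	uint64_t val;
	int fd = open("/dev/cpu/0/msr", O_RDONLY);

	/* pread() at the MSR index returns the 8-byte MSR value */
	if (fd < 0 || pread(fd, &val, sizeof(val), MSR_PLATFORM_INFO) != sizeof(val))
		return 1;

	printf("CPUID faulting %ssupported\n",
	       (val & PLATINFO_CPUID_FAULT) ? "" : "not ");
	close(fd);
	return 0;
}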

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/msr-index.h   |  2 ++
 arch/x86/kernel/cpu/scattered.c    | 22 +++++++++++++++++++++-
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index a396292..62962e8 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -184,16 +184,17 @@
  * Auxiliary flags: Linux defined - For features scattered in various
  * CPUID levels like 0x6, 0xA etc, word 7.
  *
  * Reuse free bits when adding new feature flags!
  */
 
 #define X86_FEATURE_CPB		( 7*32+ 2) /* AMD Core Performance Boost */
 #define X86_FEATURE_EPB		( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS support */
+#define X86_FEATURE_CPUID_FAULT ( 7*32+ 4) /* Intel CPUID faulting */
 
 #define X86_FEATURE_HW_PSTATE	( 7*32+ 8) /* AMD HW-PState */
 #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
 
 #define X86_FEATURE_INTEL_PT	( 7*32+15) /* Intel Processor Trace */
 #define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */
 #define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */
 
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 78f3760..97fb50b 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -36,16 +36,18 @@
 #define EFER_LMSLE		(1<<_EFER_LMSLE)
 #define EFER_FFXSR		(1<<_EFER_FFXSR)
 
 /* Intel MSRs. Some also available on other CPUs */
 #define MSR_IA32_PERFCTR0		0x000000c1
 #define MSR_IA32_PERFCTR1		0x000000c2
 #define MSR_FSB_FREQ			0x000000cd
 #define MSR_PLATFORM_INFO		0x000000ce
+#define PLATINFO_CPUID_FAULT_BIT	31
+#define PLATINFO_CPUID_FAULT		(1ULL << PLATINFO_CPUID_FAULT_BIT)
 
 #define MSR_NHM_SNB_PKG_CST_CFG_CTL	0x000000e2
 #define NHM_C3_AUTO_DEMOTE		(1UL << 25)
 #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
 #define ATM_LNC_C6_AUTO_DEMOTE		(1UL << 25)
 #define SNB_C1_AUTO_UNDEMOTE		(1UL << 27)
 #define SNB_C3_AUTO_UNDEMOTE		(1UL << 28)
 
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 1db8dc4..97a340d 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -19,41 +19,61 @@ struct cpuid_bit {
 
 enum cpuid_regs {
 	CR_EAX = 0,
 	CR_ECX,
 	CR_EDX,
 	CR_EBX
 };
 
+struct msr_bit {
+	u16 feature;
+	u16 msr;
+	u8 bit;
+};
+
 void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
 {
+	const struct cpuid_bit *cb;
+	const struct msr_bit *mb;
 	u32 max_level;
 	u32 regs[4];
-	const struct cpuid_bit *cb;
+	u64 msrval;
 
 	static const struct cpuid_bit cpuid_bits[] = {
 		{ X86_FEATURE_INTEL_PT,		CR_EBX,25, 0x00000007, 0 },
 		{ X86_FEATURE_AVX512_4VNNIW,	CR_EDX, 2, 0x00000007, 0 },
 		{ X86_FEATURE_AVX512_4FMAPS,	CR_EDX, 3, 0x00000007, 0 },
 		{ X86_FEATURE_APERFMPERF,	CR_ECX, 0, 0x00000006, 0 },
 		{ X86_FEATURE_EPB,		CR_ECX, 3, 0x00000006, 0 },
 		{ X86_FEATURE_HW_PSTATE,	CR_EDX, 7, 0x80000007, 0 },
 		{ X86_FEATURE_CPB,		CR_EDX, 9, 0x80000007, 0 },
 		{ X86_FEATURE_PROC_FEEDBACK,	CR_EDX,11, 0x80000007, 0 },
 		{ 0, 0, 0, 0, 0 }
 	};
 
+	static const struct msr_bit msr_bits[] = {
+		{ X86_FEATURE_CPUID_FAULT,	MSR_PLATFORM_INFO, 31 },
+		{ 0, 0, 0 }
+	};
+
 	for (cb = cpuid_bits; cb->feature; cb++) {
 
 		/* Verify that the level is valid */
 		max_level = cpuid_eax(cb->level & 0xffff0000);
 		if (max_level < cb->level ||
 		    max_level > (cb->level | 0xffff))
 			continue;
 
 		cpuid_count(cb->level, cb->sub_leaf, &regs[CR_EAX],
 			    &regs[CR_EBX], &regs[CR_ECX], &regs[CR_EDX]);
 
 		if (regs[cb->reg] & (1 << cb->bit))
 			set_cpu_cap(c, cb->feature);
 	}
+
+	for (mb = msr_bits; mb->feature; mb++) {
+		if (rdmsrl_safe(mb->msr, &msrval))
+			continue;
+		if (msrval & (1ULL << mb->bit))
+			set_cpu_cap(c, mb->feature);
+	}
 }
-- 
2.10.2

* [PATCH v10 6/7] x86/arch_prctl: Add ARCH_[GET|SET]_CPUID
  2016-11-08 18:39 [PATCH v10 0/7] x86/arch_prctl Add ARCH_[GET|SET]_CPUID for controlling the CPUID instruction Kyle Huey
                   ` (4 preceding siblings ...)
  2016-11-08 18:39 ` [PATCH v10 5/7] x86/cpufeature: Detect CPUID faulting support Kyle Huey
@ 2016-11-08 18:39 ` Kyle Huey
  2016-11-08 20:06   ` Thomas Gleixner
                     ` (2 more replies)
  2016-11-08 18:39 ` [PATCH v10 7/7] KVM: x86: virtualize cpuid faulting Kyle Huey
  6 siblings, 3 replies; 29+ messages in thread
From: Kyle Huey @ 2016-11-08 18:39 UTC (permalink / raw)
  To: Robert O'Callahan, Thomas Gleixner, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin, x86, Paolo Bonzini,
	Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Borislav Petkov, Peter Zijlstra, Boris Ostrovsky,
	Len Brown, Rafael J. Wysocki, Dmitry Safonov, David Matlack
  Cc: linux-kernel, user-mode-linux-devel, user-mode-linux-user,
	linux-fsdevel, linux-kselftest, kvm

Intel supports faulting on the CPUID instruction beginning with Ivy Bridge.
When enabled, the processor will fault on attempts to execute the CPUID
instruction with CPL>0. Exposing this feature to userspace will allow a
ptracer to trap and emulate the CPUID instruction.

When supported, this feature is controlled by toggling bit 0 of
MSR_MISC_FEATURES_ENABLES. It is documented in detail in Section 2.3.2 of
http://www.intel.com/content/dam/www/public/us/en/documents/application-notes/virtualization-technology-flexmigration-application-note.pdf

Implement a new pair of arch_prctls, available on both x86-32 and x86-64.

ARCH_GET_CPUID: Returns the current CPUID faulting state, either
  ARCH_CPUID_ENABLE or ARCH_CPUID_SIGSEGV. arg2 must be 0.

ARCH_SET_CPUID: Set the CPUID faulting state to arg2, which must be either
  ARCH_CPUID_ENABLE or ARCH_CPUID_SIGSEGV. Returns -EINVAL if arg2 is any
  other value, and -ENODEV if CPUID faulting is not supported on this
  hardware.

The state of the CPUID faulting flag is propagated across forks, but reset
upon exec.
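
The selftest below exercises the interface itself. Purely as a sketch of
the trap-and-emulate pattern this enables (x86-64 only, untested, and not
the actual rr implementation; cpuid_trap() is an illustrative name), a
SIGSEGV handler could emulate the faulting CPUID in place:

#define _GNU_SOURCE
#include <signal.h>
#include <ucontext.h>
#include <unistd.h>

static void cpuid_trap(int sig, siginfo_t *info, void *ctx)
{
	ucontext_t *uc = ctx;
	unsigned char *ip = (unsigned char *)uc->uc_mcontext.gregs[REG_RIP];

	if (ip[0] != 0x0f || ip[1] != 0xa2)	/* not a CPUID opcode */
		_exit(128 + sig);

	/* Stuff constant results and skip the two opcode bytes */
	uc->uc_mcontext.gregs[REG_RAX] = 0;
	uc->uc_mcontext.gregs[REG_RBX] = 0;
	uc->uc_mcontext.gregs[REG_RCX] = 0;
	uc->uc_mcontext.gregs[REG_RDX] = 0;
	uc->uc_mcontext.gregs[REG_RIP] += 2;
}

The handler would be installed with sigaction() and SA_SIGINFO after
calling arch_prctl(ARCH_SET_CPUID, ARCH_CPUID_SIGSEGV).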

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
---
 arch/x86/include/asm/msr-index.h          |   3 +
 arch/x86/include/asm/processor.h          |   2 +
 arch/x86/include/asm/thread_info.h        |   6 +-
 arch/x86/include/uapi/asm/prctl.h         |   6 +
 arch/x86/kernel/cpu/scattered.c           |   5 +
 arch/x86/kernel/process.c                 |  85 ++++++++++
 fs/exec.c                                 |   1 +
 include/linux/thread_info.h               |   4 +
 tools/testing/selftests/x86/Makefile      |   2 +-
 tools/testing/selftests/x86/cpuid-fault.c | 250 ++++++++++++++++++++++++++++++
 10 files changed, 362 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/x86/cpuid-fault.c

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 97fb50b..cfcf647a 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -49,16 +49,19 @@
 #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
 #define ATM_LNC_C6_AUTO_DEMOTE		(1UL << 25)
 #define SNB_C1_AUTO_UNDEMOTE		(1UL << 27)
 #define SNB_C3_AUTO_UNDEMOTE		(1UL << 28)
 
 #define MSR_MTRRcap			0x000000fe
 #define MSR_IA32_BBL_CR_CTL		0x00000119
 #define MSR_IA32_BBL_CR_CTL3		0x0000011e
+#define MSR_MISC_FEATURES_ENABLES	0x00000140
+#define CPUID_FAULT_ENABLE_BIT		0
+#define CPUID_FAULT_ENABLE		(1UL << CPUID_FAULT_ENABLE_BIT)
 
 #define MSR_IA32_SYSENTER_CS		0x00000174
 #define MSR_IA32_SYSENTER_ESP		0x00000175
 #define MSR_IA32_SYSENTER_EIP		0x00000176
 
 #define MSR_IA32_MCG_CAP		0x00000179
 #define MSR_IA32_MCG_STATUS		0x0000017a
 #define MSR_IA32_MCG_CTL		0x0000017b
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 984a7bf..4c1088c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -803,16 +803,18 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
 
 /* Get/set a process' ability to use the timestamp counter instruction */
 #define GET_TSC_CTL(adr)	get_tsc_mode((adr))
 #define SET_TSC_CTL(val)	set_tsc_mode((val))
 
 extern int get_tsc_mode(unsigned long adr);
 extern int set_tsc_mode(unsigned int val);
 
+DECLARE_PER_CPU(u64, msr_misc_features_enables_shadow);
+
 /* Register/unregister a process' MPX related resource */
 #define MPX_ENABLE_MANAGEMENT()	mpx_enable_management()
 #define MPX_DISABLE_MANAGEMENT()	mpx_disable_management()
 
 #ifdef CONFIG_X86_INTEL_MPX
 extern int mpx_enable_management(void);
 extern int mpx_disable_management(void);
 #else
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index ad6f5eb0..9fc44b9 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -82,16 +82,17 @@ struct thread_info {
 #define TIF_SIGPENDING		2	/* signal pending */
 #define TIF_NEED_RESCHED	3	/* rescheduling necessary */
 #define TIF_SINGLESTEP		4	/* reenable singlestep on user return*/
 #define TIF_SYSCALL_EMU		6	/* syscall emulation active */
 #define TIF_SYSCALL_AUDIT	7	/* syscall auditing active */
 #define TIF_SECCOMP		8	/* secure computing */
 #define TIF_USER_RETURN_NOTIFY	11	/* notify kernel of userspace return */
 #define TIF_UPROBE		12	/* breakpointed or singlestepping */
+#define TIF_NOCPUID		15	/* CPUID is not accessible in userland */
 #define TIF_NOTSC		16	/* TSC is not accessible in userland */
 #define TIF_IA32		17	/* IA32 compatibility process */
 #define TIF_NOHZ		19	/* in adaptive nohz mode */
 #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
 #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
 #define TIF_IO_BITMAP		22	/* uses I/O bitmap */
 #define TIF_FORCED_TF		24	/* true if TF in eflags artificially */
 #define TIF_BLOCKSTEP		25	/* set when we want DEBUGCTLMSR_BTF */
@@ -105,16 +106,17 @@ struct thread_info {
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
 #define _TIF_SINGLESTEP		(1 << TIF_SINGLESTEP)
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
 #define _TIF_SYSCALL_EMU	(1 << TIF_SYSCALL_EMU)
 #define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
 #define _TIF_SECCOMP		(1 << TIF_SECCOMP)
 #define _TIF_USER_RETURN_NOTIFY	(1 << TIF_USER_RETURN_NOTIFY)
 #define _TIF_UPROBE		(1 << TIF_UPROBE)
+#define _TIF_NOCPUID		(1 << TIF_NOCPUID)
 #define _TIF_NOTSC		(1 << TIF_NOTSC)
 #define _TIF_IA32		(1 << TIF_IA32)
 #define _TIF_NOHZ		(1 << TIF_NOHZ)
 #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
 #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
 #define _TIF_FORCED_TF		(1 << TIF_FORCED_TF)
 #define _TIF_BLOCKSTEP		(1 << TIF_BLOCKSTEP)
 #define _TIF_LAZY_MMU_UPDATES	(1 << TIF_LAZY_MMU_UPDATES)
@@ -133,17 +135,17 @@ struct thread_info {
 
 /* work to do on any return to user space */
 #define _TIF_ALLWORK_MASK						\
 	((0x0000FFFF & ~_TIF_SECCOMP) | _TIF_SYSCALL_TRACEPOINT |	\
 	_TIF_NOHZ)
 
 /* flags to check in __switch_to() */
 #define _TIF_WORK_CTXSW							\
-	(_TIF_IO_BITMAP|_TIF_NOTSC|_TIF_BLOCKSTEP)
+	(_TIF_IO_BITMAP|_TIF_NOCPUID|_TIF_NOTSC|_TIF_BLOCKSTEP)
 
 #define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW|_TIF_USER_RETURN_NOTIFY)
 #define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW)
 
 #define STACK_WARN		(THREAD_SIZE/8)
 
 /*
  * macros/functions for gaining access to the thread information structure
@@ -234,11 +236,13 @@ static inline int arch_within_stack_frames(const void * const stack,
  * EFLAGS values that other (fast) syscall return instructions
  * are not able to restore properly.
  */
 #define force_iret() set_thread_flag(TIF_NOTIFY_RESUME)
 
 extern void arch_task_cache_init(void);
 extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src);
 extern void arch_release_task_struct(struct task_struct *tsk);
+extern void arch_setup_new_exec(void);
+#define arch_setup_new_exec arch_setup_new_exec
 #endif	/* !__ASSEMBLY__ */
 
 #endif /* _ASM_X86_THREAD_INFO_H */
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index ae135de..0f6389c 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -1,15 +1,21 @@
 #ifndef _ASM_X86_PRCTL_H
 #define _ASM_X86_PRCTL_H
 
 #define ARCH_SET_GS 0x1001
 #define ARCH_SET_FS 0x1002
 #define ARCH_GET_FS 0x1003
 #define ARCH_GET_GS 0x1004
 
+/* Get/set the process' ability to use the CPUID instruction */
+#define ARCH_GET_CPUID 0x1005
+#define ARCH_SET_CPUID 0x1006
+# define ARCH_CPUID_ENABLE		1	/* allow the use of the CPUID instruction */
+# define ARCH_CPUID_SIGSEGV		2	/* throw a SIGSEGV instead of reading the CPUID */
+
 #ifdef CONFIG_CHECKPOINT_RESTORE
 # define ARCH_MAP_VDSO_X32	0x2001
 # define ARCH_MAP_VDSO_32	0x2002
 # define ARCH_MAP_VDSO_64	0x2003
 #endif
 
 #endif /* _ASM_X86_PRCTL_H */
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 97a340d..7d364e4 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -71,9 +71,14 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
 	}
 
 	for (mb = msr_bits; mb->feature; mb++) {
 		if (rdmsrl_safe(mb->msr, &msrval))
 			continue;
 		if (msrval & (1ULL << mb->bit))
 			set_cpu_cap(c, mb->feature);
 	}
+
+	if (cpu_has(c, X86_FEATURE_CPUID_FAULT)) {
+		rdmsrl(MSR_MISC_FEATURES_ENABLES, msrval);
+		this_cpu_write(msr_misc_features_enables_shadow, msrval);
+	}
 }
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index d0126b2..bd96746 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -28,16 +28,17 @@
 #include <asm/mwait.h>
 #include <asm/fpu/internal.h>
 #include <asm/debugreg.h>
 #include <asm/nmi.h>
 #include <asm/tlbflush.h>
 #include <asm/mce.h>
 #include <asm/vm86.h>
 #include <asm/switch_to.h>
+#include <asm/prctl.h>
 
 /*
  * per-CPU TSS segments. Threads are completely 'soft' on Linux,
  * no more per-task TSS's. The TSS size is kept cacheline-aligned
  * so they are allowed to end up in the .data..cacheline_aligned
  * section. Since TSS's are completely CPU-local, we want them
  * on exact cacheline boundaries, to eliminate cacheline ping-pong.
  */
@@ -187,16 +188,88 @@ int set_tsc_mode(unsigned int val)
 	else if (val == PR_TSC_ENABLE)
 		enable_TSC();
 	else
 		return -EINVAL;
 
 	return 0;
 }
 
+DEFINE_PER_CPU(u64, msr_misc_features_enables_shadow);
+
+static void set_cpuid_faulting(bool on)
+{
+	u64 msrval;
+
+	DEBUG_LOCKS_WARN_ON(!irqs_disabled());
+
+	msrval = this_cpu_read(msr_misc_features_enables_shadow);
+	msrval &= ~CPUID_FAULT_ENABLE;
+	msrval |= (on << CPUID_FAULT_ENABLE_BIT);
+	this_cpu_write(msr_misc_features_enables_shadow, msrval);
+	wrmsrl(MSR_MISC_FEATURES_ENABLES, msrval);
+}
+
+static void disable_cpuid(void)
+{
+	preempt_disable();
+	if (!test_and_set_thread_flag(TIF_NOCPUID)) {
+		/*
+		 * Must flip the CPU state synchronously with
+		 * TIF_NOCPUID in the current running context.
+		 */
+		set_cpuid_faulting(true);
+	}
+	preempt_enable();
+}
+
+static void enable_cpuid(void)
+{
+	preempt_disable();
+	if (test_and_clear_thread_flag(TIF_NOCPUID)) {
+		/*
+		 * Must flip the CPU state synchronously with
+		 * TIF_NOCPUID in the current running context.
+		 */
+		set_cpuid_faulting(false);
+	}
+	preempt_enable();
+}
+
+static int get_cpuid_mode(void)
+{
+	return test_thread_flag(TIF_NOCPUID) ? ARCH_CPUID_SIGSEGV : ARCH_CPUID_ENABLE;
+}
+
+static int set_cpuid_mode(struct task_struct *task, unsigned long val)
+{
+	/* Only disable_cpuid() if it is supported on this hardware. */
+	if (!static_cpu_has(X86_FEATURE_CPUID_FAULT))
+		return -ENODEV;
+
+	if (val == ARCH_CPUID_ENABLE)
+		enable_cpuid();
+	else if (val == ARCH_CPUID_SIGSEGV)
+		disable_cpuid();
+	else
+		return -EINVAL;
+
+	return 0;
+}
+
+/*
+ * Called immediately after a successful exec.
+ */
+void arch_setup_new_exec(void)
+{
+	/* If cpuid was previously disabled for this task, re-enable it. */
+	if (test_thread_flag(TIF_NOCPUID))
+		enable_cpuid();
+}
+
 void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 		      struct tss_struct *tss)
 {
 	struct thread_struct *prev, *next;
 
 	prev = &prev_p->thread;
 	next = &next_p->thread;
 
@@ -206,16 +279,21 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 
 		debugctl &= ~DEBUGCTLMSR_BTF;
 		if (test_tsk_thread_flag(next_p, TIF_BLOCKSTEP))
 			debugctl |= DEBUGCTLMSR_BTF;
 
 		update_debugctlmsr(debugctl);
 	}
 
+	if (test_tsk_thread_flag(prev_p, TIF_NOCPUID) ^
+	    test_tsk_thread_flag(next_p, TIF_NOCPUID)) {
+		set_cpuid_faulting(test_tsk_thread_flag(next_p, TIF_NOCPUID));
+	}
+
 	if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^
 	    test_tsk_thread_flag(next_p, TIF_NOTSC)) {
 		/* prev and next are different */
 		if (test_tsk_thread_flag(next_p, TIF_NOTSC))
 			hard_disable_TSC();
 		else
 			hard_enable_TSC();
 	}
@@ -582,10 +660,17 @@ unsigned long get_wchan(struct task_struct *p)
 
 out:
 	put_task_stack(p);
 	return ret;
 }
 
 long do_arch_prctl_common(struct task_struct *task, int code, unsigned long arg2)
 {
+	switch (code) {
+	case ARCH_GET_CPUID:
+		return arg2 ? -EINVAL : get_cpuid_mode();
+	case ARCH_SET_CPUID:
+		return set_cpuid_mode(task, arg2);
+	}
+
 	return -EINVAL;
 }
diff --git a/fs/exec.c b/fs/exec.c
index 4e497b9..3e6872c 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1287,16 +1287,17 @@ void setup_new_exec(struct linux_binprm * bprm)
 	/* This is the point of no return */
 	current->sas_ss_sp = current->sas_ss_size = 0;
 
 	if (uid_eq(current_euid(), current_uid()) && gid_eq(current_egid(), current_gid()))
 		set_dumpable(current->mm, SUID_DUMP_USER);
 	else
 		set_dumpable(current->mm, suid_dumpable);
 
+	arch_setup_new_exec();
 	perf_event_exec();
 	__set_task_comm(current, kbasename(bprm->filename), true);
 
 	/* Set the new mm task size. We have to do that late because it may
 	 * depend on TIF_32BIT which is only updated in flush_thread() on
 	 * some architectures like powerpc
 	 */
 	current->mm->task_size = TASK_SIZE;
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 2873baf..d46f50d 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -129,11 +129,15 @@ static __always_inline void check_object_size(const void *ptr, unsigned long n,
 		__check_object_size(ptr, n, to_user);
 }
 #else
 static inline void check_object_size(const void *ptr, unsigned long n,
 				     bool to_user)
 { }
 #endif /* CONFIG_HARDENED_USERCOPY */
 
+#ifndef arch_setup_new_exec
+static inline void arch_setup_new_exec(void) {}
+#endif
+
 #endif	/* __KERNEL__ */
 
 #endif /* _LINUX_THREAD_INFO_H */
diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index a89f80a..3744f28 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -1,17 +1,17 @@
 all:
 
 include ../lib.mk
 
 .PHONY: all all_32 all_64 warn_32bit_failure clean
 
 TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt ptrace_syscall test_mremap_vdso \
 			check_initial_reg_state sigreturn ldt_gdt iopl \
-			protection_keys
+			protection_keys cpuid-fault
 TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			vdso_restorer
 TARGETS_C_64BIT_ONLY := fsgsbase
 
 TARGETS_C_32BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_32BIT_ONLY)
 TARGETS_C_64BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_64BIT_ONLY)
 BINARIES_32 := $(TARGETS_C_32BIT_ALL:%=%_32)
diff --git a/tools/testing/selftests/x86/cpuid-fault.c b/tools/testing/selftests/x86/cpuid-fault.c
new file mode 100644
index 0000000..65419de
--- /dev/null
+++ b/tools/testing/selftests/x86/cpuid-fault.c
@@ -0,0 +1,250 @@
+
+/*
+ * Tests for arch_prctl(ARCH_GET_CPUID, ...) / arch_prctl(ARCH_SET_CPUID, ...)
+ *
+ * Basic test to test behaviour of ARCH_GET_CPUID and ARCH_SET_CPUID
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <signal.h>
+#include <inttypes.h>
+#include <cpuid.h>
+#include <err.h>
+#include <errno.h>
+#include <sys/wait.h>
+
+#include <sys/prctl.h>
+#include <linux/prctl.h>
+
+/*
+#define ARCH_GET_CPUID 0x1005
+#define ARCH_SET_CPUID 0x1006
+#define ARCH_CPUID_ENABLE 1
+#define ARCH_CPUID_SIGSEGV 2
+#ifdef __x86_64__
+#define SYS_arch_prctl 158
+#else
+#define SYS_arch_prctl 385
+#endif
+*/
+
+const char *cpuid_names[] = {
+	[0] = "[not set]",
+	[ARCH_CPUID_ENABLE] = "ARCH_CPUID_ENABLE",
+	[ARCH_CPUID_SIGSEGV] = "ARCH_CPUID_SIGSEGV",
+};
+
+int arch_prctl(int code, unsigned long arg2)
+{
+	return syscall(SYS_arch_prctl, code, arg2);
+}
+
+int cpuid(unsigned int *eax, unsigned int *ebx, unsigned int *ecx,
+	  unsigned int *edx)
+{
+	return __get_cpuid(0, eax, ebx, ecx, edx);
+}
+
+int do_child_exec_test(int eax, int ebx, int ecx, int edx)
+{
+	int cpuid_val = 0, child = 0, status = 0;
+
+	printf("arch_prctl(ARCH_GET_CPUID); ");
+
+	cpuid_val = arch_prctl(ARCH_GET_CPUID, 0);
+	if (cpuid_val < 0)
+		errx(1, "ARCH_GET_CPUID fails now, but not before?");
+
+	printf("cpuid_val == %s\n", cpuid_names[cpuid_val]);
+	if (cpuid_val != ARCH_CPUID_SIGSEGV)
+		errx(1, "How did cpuid get re-enabled on fork?");
+
+	if ((child = fork()) == 0) {
+		cpuid_val = arch_prctl(ARCH_GET_CPUID, 0);
+		if (cpuid_val < 0)
+			errx(1, "ARCH_GET_CPUID fails now, but not before?");
+
+		printf("cpuid_val == %s\n", cpuid_names[cpuid_val]);
+		if (cpuid_val != ARCH_CPUID_SIGSEGV)
+			errx(1, "How did cpuid get re-enabled on fork?");
+
+		printf("exec\n");
+		execl("/proc/self/exe", "cpuid-fault", "-early-return", NULL);
+	}
+
+	if (child != waitpid(child, &status, 0))
+		errx(1, "waitpid failed!?");
+
+	if (WEXITSTATUS(status) != 0)
+		errx(1, "Execed child exited abnormally");
+
+	return 0;
+}
+
+int child_received_signal;
+
+void child_sigsegv_cb(int sig)
+{
+	int cpuid_val = 0;
+
+	child_received_signal = 1;
+	printf("[ SIG_SEGV ]\n");
+	printf("arch_prctl(ARCH_GET_CPUID); ");
+
+	cpuid_val = arch_prctl(ARCH_GET_CPUID, 0);
+	if (cpuid_val < 0)
+		errx(1, "ARCH_GET_CPUID fails now, but not before?");
+
+	printf("cpuid_val == %s\n", cpuid_names[cpuid_val]);
+	printf("arch_prctl(ARCH_SET_CPUID, ARCH_CPUID_ENABLE)\n");
+	if (arch_prctl(ARCH_SET_CPUID, ARCH_CPUID_ENABLE) != 0)
+		exit(errno);
+
+	printf("cpuid() == ");
+}
+
+int do_child_test(void)
+{
+	unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
+
+	signal(SIGSEGV, child_sigsegv_cb);
+
+	/* the child starts out with cpuid disabled, the signal handler
+	 * attempts to enable and retry
+	 */
+	printf("cpuid() == ");
+	cpuid(&eax, &ebx, &ecx, &edx);
+	printf("{%x, %x, %x, %x}\n", eax, ebx, ecx, edx);
+	return child_received_signal ? 0 : 42;
+}
+
+int signal_count;
+
+void sigsegv_cb(int sig)
+{
+	int cpuid_val = 0;
+
+	signal_count++;
+	printf("[ SIG_SEGV ]\n");
+	printf("arch_prctl(ARCH_GET_CPUID); ");
+
+	cpuid_val = arch_prctl(ARCH_GET_CPUID, 0);
+	if (cpuid_val < 0)
+		errx(1, "ARCH_GET_CPUID fails now, but not before?");
+
+	printf("cpuid_val == %s\n", cpuid_names[cpuid_val]);
+	printf("arch_prctl(ARC_SET_CPUID, ARCH_CPUID_ENABLE)\n");
+	if (arch_prctl(ARCH_SET_CPUID, ARCH_CPUID_ENABLE) != 0)
+		errx(1, "ARCH_SET_CPUID failed!");
+
+	printf("cpuid() == ");
+}
+
+int main(int argc, char **argv)
+{
+	int cpuid_val = 0, child = 0, status = 0;
+	unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
+
+	signal(SIGSEGV, sigsegv_cb);
+	setvbuf(stdout, NULL, _IONBF, 0);
+
+	cpuid(&eax, &ebx, &ecx, &edx);
+	printf("cpuid() == {%x, %x, %x, %x}\n", eax, ebx, ecx, edx);
+	printf("arch_prctl(ARCH_GET_CPUID); ");
+
+	cpuid_val = arch_prctl(ARCH_GET_CPUID, 0);
+	if (cpuid_val < 0) {
+		if (errno == EINVAL) {
+			printf("ARCH_GET_CPUID is unsupported on this kernel.\n");
+			fflush(stdout);
+			exit(0); /* no ARCH_GET_CPUID on this system */
+		} else if (errno == ENODEV) {
+			printf("ARCH_GET_CPUID is unsupported on this hardware.\n");
+			fflush(stdout);
+			exit(0); /* no ARCH_GET_CPUID on this system */
+		} else {
+			errx(errno, "ARCH_GET_CPUID failed unexpectedly!");
+		}
+	}
+
+	printf("cpuid_val == %s\n", cpuid_names[cpuid_val]);
+	cpuid(&eax, &ebx, &ecx, &edx);
+	printf("cpuid() == {%x, %x, %x, %x}\n", eax, ebx, ecx, edx);
+	printf("arch_prctl(ARCH_SET_CPUID, ARCH_CPUID_ENABLE)\n");
+
+	if (arch_prctl(ARCH_SET_CPUID, ARCH_CPUID_ENABLE) != 0) {
+		if (errno == EINVAL) {
+			printf("ARCH_SET_CPUID is unsupported on this kernel.");
+			exit(0); /* no ARCH_SET_CPUID on this system */
+		} else if (errno == ENODEV) {
+			printf("ARCH_SET_CPUID is unsupported on this hardware.");
+			exit(0); /* no ARCH_SET_CPUID on this system */
+		} else {
+			errx(errno, "ARCH_SET_CPUID failed unexpectedly!");
+		}
+	}
+
+
+	cpuid(&eax, &ebx, &ecx, &edx);
+	printf("cpuid() == {%x, %x, %x, %x}\n", eax, ebx, ecx, edx);
+	printf("arch_prctl(ARCH_SET_CPUID, ARCH_CPUID_SIGSEGV)\n");
+	fflush(stdout);
+
+	if (arch_prctl(ARCH_SET_CPUID, ARCH_CPUID_SIGSEGV) == -1)
+		errx(1, "ARCH_SET_CPUID failed!");
+
+	printf("cpuid() == ");
+	eax = ebx = ecx = edx = 0;
+	cpuid(&eax, &ebx, &ecx, &edx);
+	printf("{%x, %x, %x, %x}\n", eax, ebx, ecx, edx);
+	printf("arch_prctl(ARCH_SET_CPUID, ARCH_CPUID_SIGSEGV)\n");
+
+	if (signal_count != 1)
+		errx(1, "cpuid didn't fault!");
+
+	if (arch_prctl(ARCH_SET_CPUID, ARCH_CPUID_SIGSEGV) == -1)
+		errx(1, "ARCH_SET_CPUID failed!");
+
+	if (argc > 1)
+		exit(0); /* Don't run the whole test again if we were execed */
+
+	printf("do_child_test\n");
+	if ((child = fork()) == 0)
+		return do_child_test();
+
+	if (child != waitpid(child, &status, 0))
+		errx(1, "waitpid failed!?");
+
+	if (WEXITSTATUS(status) != 0)
+		errx(1, "Child exited abnormally!");
+
+	/* The child enabling cpuid should not have affected us */
+	printf("cpuid() == ");
+	eax = ebx = ecx = edx = 0;
+	cpuid(&eax, &ebx, &ecx, &edx);
+	printf("{%x, %x, %x, %x}\n", eax, ebx, ecx, edx);
+	printf("arch_prctl(ARCH_SET_CPUID, ARCH_CPUID_SIGSEGV)\n");
+
+	if (signal_count != 2)
+		errx(1, "cpuid didn't fault!");
+
+	if (arch_prctl(ARCH_SET_CPUID, ARCH_CPUID_SIGSEGV) == -1)
+		errx(1, "ARCH_SET_CPUID failed!");
+
+	/* Our ARCH_CPUID_SIGSEGV should not propagate through exec */
+	printf("do_child_exec_test\n");
+	fflush(stdout);
+	if ((child = fork()) == 0)
+		return do_child_exec_test(eax, ebx, ecx, edx);
+
+	if (child != waitpid(child, &status, 0))
+		errx(1, "waitpid failed!?");
+
+	if (WEXITSTATUS(status) != 0)
+		errx(1, "Child exited abnormally!");
+
+	printf("All tests passed!\n");
+	exit(EXIT_SUCCESS);
+}
-- 
2.10.2

* [PATCH v10 7/7] KVM: x86: virtualize cpuid faulting
  2016-11-08 18:39 [PATCH v10 0/7] x86/arch_prctl Add ARCH_[GET|SET]_CPUID for controlling the CPUID instruction Kyle Huey
                   ` (5 preceding siblings ...)
  2016-11-08 18:39 ` [PATCH v10 6/7] x86/arch_prctl: Add ARCH_[GET|SET]_CPUID Kyle Huey
@ 2016-11-08 18:39 ` Kyle Huey
  2016-11-08 22:12   ` David Matlack
  6 siblings, 1 reply; 29+ messages in thread
From: Kyle Huey @ 2016-11-08 18:39 UTC (permalink / raw)
  To: Robert O'Callahan, Thomas Gleixner, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin, x86, Paolo Bonzini,
	Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Borislav Petkov, Peter Zijlstra, Boris Ostrovsky,
	Len Brown, Rafael J. Wysocki, Dmitry Safonov, David Matlack
  Cc: linux-kernel, user-mode-linux-devel, user-mode-linux-user,
	linux-fsdevel, linux-kselftest, kvm

Hardware support for faulting on the CPUID instruction is not required to
emulate it, because CPUID triggers a VM exit anyway. KVM handles the
relevant MSRs (MSR_PLATFORM_INFO and MSR_MISC_FEATURES_ENABLES) and, upon a
CPUID-induced VM exit, checks the CPUID faulting state and the CPL.
kvm_require_cpl is even kind enough to inject the GP fault for us.
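
For illustration only (hide_cpuid_fault() is a hypothetical helper, not
taken from any real VMM): because the MSR_PLATFORM_INFO write must be
host-initiated, userspace that wants to hide the capability from its guest
(e.g. to keep migration to hosts without this patch possible) can clear
the bit with KVM_SET_MSRS before the guest enables faulting:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int hide_cpuid_fault(int vcpu_fd)
{
	struct {
		struct kvm_msrs hdr;
		struct kvm_msr_entry entry;
	} msrs;

	memset(&msrs, 0, sizeof(msrs));
	msrs.hdr.nmsrs = 1;
	msrs.entry.index = 0xce;	/* MSR_PLATFORM_INFO */
	msrs.entry.data = 0;		/* PLATINFO_CPUID_FAULT cleared */

	/* KVM_SET_MSRS returns the number of MSRs processed (1 here) */
	return ioctl(vcpu_fd, KVM_SET_MSRS, &msrs);
}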

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/cpuid.c            |  3 +++
 arch/x86/kvm/cpuid.h            | 10 ++++++++++
 arch/x86/kvm/x86.c              | 25 +++++++++++++++++++++++++
 4 files changed, 40 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bdde807..954f37c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -592,16 +592,18 @@ struct kvm_vcpu_arch {
 	u64 pat;
 
 	unsigned switch_db_regs;
 	unsigned long db[KVM_NR_DB_REGS];
 	unsigned long dr6;
 	unsigned long dr7;
 	unsigned long eff_db[KVM_NR_DB_REGS];
 	unsigned long guest_debug_dr7;
+	u64 msr_platform_info;
+	u64 msr_misc_features_enables;
 
 	u64 mcg_cap;
 	u64 mcg_status;
 	u64 mcg_ctl;
 	u64 mcg_ext_ctl;
 	u64 *mce_banks;
 
 	/* Cache MMIO info */
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index afa7bbb..0109bc0 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -862,16 +862,19 @@ void kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
 	trace_kvm_cpuid(function, *eax, *ebx, *ecx, *edx);
 }
 EXPORT_SYMBOL_GPL(kvm_cpuid);
 
 void kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
 {
 	u32 function, eax, ebx, ecx, edx;
 
+	if (cpuid_fault_enabled(vcpu) && !kvm_require_cpl(vcpu, 0))
+		return;
+
 	function = eax = kvm_register_read(vcpu, VCPU_REGS_RAX);
 	ecx = kvm_register_read(vcpu, VCPU_REGS_RCX);
 	kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx);
 	kvm_register_write(vcpu, VCPU_REGS_RAX, eax);
 	kvm_register_write(vcpu, VCPU_REGS_RBX, ebx);
 	kvm_register_write(vcpu, VCPU_REGS_RCX, ecx);
 	kvm_register_write(vcpu, VCPU_REGS_RDX, edx);
 	kvm_x86_ops->skip_emulated_instruction(vcpu);
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 35058c2..994aa01 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -200,9 +200,19 @@ static inline int guest_cpuid_stepping(struct kvm_vcpu *vcpu)
 
 	best = kvm_find_cpuid_entry(vcpu, 0x1, 0);
 	if (!best)
 		return -1;
 
 	return x86_stepping(best->eax);
 }
 
+static inline bool supports_cpuid_fault(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.msr_platform_info & PLATINFO_CPUID_FAULT;
+}
+
+static inline bool cpuid_fault_enabled(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.msr_misc_features_enables & CPUID_FAULT_ENABLE;
+}
+
 #endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3017de0..797d0b0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -986,16 +986,18 @@ static u32 emulated_msrs[] = {
 
 	MSR_IA32_TSC_ADJUST,
 	MSR_IA32_TSCDEADLINE,
 	MSR_IA32_MISC_ENABLE,
 	MSR_IA32_MCG_STATUS,
 	MSR_IA32_MCG_CTL,
 	MSR_IA32_MCG_EXT_CTL,
 	MSR_IA32_SMBASE,
+	MSR_PLATFORM_INFO,
+	MSR_MISC_FEATURES_ENABLES,
 };
 
 static unsigned num_emulated_msrs;
 
 bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
 {
 	if (efer & efer_reserved_bits)
 		return false;
@@ -2269,16 +2271,30 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		vcpu->arch.osvw.length = data;
 		break;
 	case MSR_AMD64_OSVW_STATUS:
 		if (!guest_cpuid_has_osvw(vcpu))
 			return 1;
 		vcpu->arch.osvw.status = data;
 		break;
+	case MSR_PLATFORM_INFO:
+		if (!msr_info->host_initiated ||
+		    data & ~PLATINFO_CPUID_FAULT ||
+		    (!(data & PLATINFO_CPUID_FAULT) &&
+		     cpuid_fault_enabled(vcpu)))
+			return 1;
+		vcpu->arch.msr_platform_info = data;
+		break;
+	case MSR_MISC_FEATURES_ENABLES:
+		if (data & ~CPUID_FAULT_ENABLE ||
+		    (data & CPUID_FAULT_ENABLE && !supports_cpuid_fault(vcpu)))
+			return 1;
+		vcpu->arch.msr_misc_features_enables = data;
+		break;
 	default:
 		if (msr && (msr == vcpu->kvm->arch.xen_hvm_config.msr))
 			return xen_hvm_config(vcpu, data);
 		if (kvm_pmu_is_valid_msr(vcpu, msr))
 			return kvm_pmu_set_msr(vcpu, msr_info);
 		if (!ignore_msrs) {
 			vcpu_unimpl(vcpu, "unhandled wrmsr: 0x%x data 0x%llx\n",
 				    msr, data);
@@ -2483,16 +2499,22 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		msr_info->data = vcpu->arch.osvw.length;
 		break;
 	case MSR_AMD64_OSVW_STATUS:
 		if (!guest_cpuid_has_osvw(vcpu))
 			return 1;
 		msr_info->data = vcpu->arch.osvw.status;
 		break;
+	case MSR_PLATFORM_INFO:
+		msr_info->data = vcpu->arch.msr_platform_info;
+		break;
+	case MSR_MISC_FEATURES_ENABLES:
+		msr_info->data = vcpu->arch.msr_misc_features_enables;
+		break;
 	default:
 		if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
 			return kvm_pmu_get_msr(vcpu, msr_info->index, &msr_info->data);
 		if (!ignore_msrs) {
 			vcpu_unimpl(vcpu, "unhandled rdmsr: 0x%x\n", msr_info->index);
 			return 1;
 		} else {
 			vcpu_unimpl(vcpu, "ignored rdmsr: 0x%x\n", msr_info->index);
@@ -7508,16 +7530,19 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 
 	kvm_clear_async_pf_completion_queue(vcpu);
 	kvm_async_pf_hash_reset(vcpu);
 	vcpu->arch.apf.halted = false;
 
 	if (!init_event) {
 		kvm_pmu_reset(vcpu);
 		vcpu->arch.smbase = 0x30000;
+
+		vcpu->arch.msr_platform_info = PLATINFO_CPUID_FAULT;
+		vcpu->arch.msr_misc_features_enables = 0;
 	}
 
 	memset(vcpu->arch.regs, 0, sizeof(vcpu->arch.regs));
 	vcpu->arch.regs_avail = ~0;
 	vcpu->arch.regs_dirty = ~0;
 
 	kvm_x86_ops->vcpu_reset(vcpu, init_event);
 }
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 5/7] x86/cpufeature: Detect CPUID faulting support
  2016-11-08 18:39 ` [PATCH v10 5/7] x86/cpufeature: Detect CPUID faulting support Kyle Huey
@ 2016-11-08 19:06   ` Thomas Gleixner
  2016-11-08 19:38     ` Kyle Huey
  2016-11-09 11:14   ` Borislav Petkov
  1 sibling, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2016-11-08 19:06 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Robert O'Callahan, Andy Lutomirski, Ingo Molnar,
	H. Peter Anvin, x86, Paolo Bonzini, Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Borislav Petkov, Peter Zijlstra, Boris Ostrovsky,
	Len Brown, Rafael J. Wysocki, Dmitry Safonov, David Matlack,
	linux-kernel, user-mode-linux-devel, user-mode-linux-user,
	linux-fsdevel, linux-kselftest, kvm

On Tue, 8 Nov 2016, Kyle Huey wrote:

> Intel supports faulting on the CPUID instruction beginning with Ivy Bridge.
> When enabled, the processor will fault on attempts to execute the CPUID
> instruction with CPL>0. This will allow a ptracer to emulate the CPUID
> instruction.
> 
> Bit 31 of MSR_PLATFORM_INFO advertises support for this feature. It is
> documented in detail in Section 2.3.2 of
> http://www.intel.com/content/dam/www/public/us/en/documents/application-notes/virtualization-technology-flexmigration-application-note.pdf

Can you please stick that document into the kernel bugzilla, as it's going
to be in a different place before this gets merged into Linus' tree?

See: http://lkml.kernel.org/r/1478631281-5061-1-git-send-email-kan.liang@intel.com

> +	static const struct msr_bit msr_bits[] = {
> +		{ X86_FEATURE_CPUID_FAULT,	MSR_PLATFORM_INFO, 31 },

Can you please make that PLATINFO_CPUID_FAULT_BIT instead of 31?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 5/7] x86/cpufeature: Detect CPUID faulting support
  2016-11-08 19:06   ` Thomas Gleixner
@ 2016-11-08 19:38     ` Kyle Huey
  0 siblings, 0 replies; 29+ messages in thread
From: Kyle Huey @ 2016-11-08 19:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Robert O'Callahan, Andy Lutomirski, Ingo Molnar,
	H. Peter Anvin, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Paolo Bonzini, Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Borislav Petkov, Peter Zijlstra, Boris Ostrovsky,
	Len Brown, Rafael J. Wysocki, Dmitry Safonov, David Matlack,
	open list, open list:USER-MODE LINUX (UML),
	open list:USER-MODE LINUX (UML),
	open list:FILESYSTEMS (VFS and infrastructure),
	open list:KERNEL SELFTEST FRAMEWORK, kvm list

On Tue, Nov 8, 2016 at 11:06 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 8 Nov 2016, Kyle Huey wrote:
>
>> Intel supports faulting on the CPUID instruction beginning with Ivy Bridge.
>> When enabled, the processor will fault on attempts to execute the CPUID
>> instruction with CPL>0. This will allow a ptracer to emulate the CPUID
>> instruction.
>>
>> Bit 31 of MSR_PLATFORM_INFO advertises support for this feature. It is
>> documented in detail in Section 2.3.2 of
>> http://www.intel.com/content/dam/www/public/us/en/documents/application-notes/virtualization-technology-flexmigration-application-note.pdf

> Can you please stick that document into the kernel bugzilla, as it's going
> to be in a different place before this gets merged into Linus' tree?
>
> See: http://lkml.kernel.org/r/1478631281-5061-1-git-send-email-kan.liang@intel.com

Done. https://bugzilla.kernel.org/attachment.cgi?id=243991

>> +     static const struct msr_bit msr_bits[] = {
>> +             { X86_FEATURE_CPUID_FAULT,      MSR_PLATFORM_INFO, 31 },
>
> Can you please make that PLATINFO_CPUID_FAULT_BIT instead of 31?

Sure.

- Kyle

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 6/7] x86/arch_prctl: Add ARCH_[GET|SET]_CPUID
  2016-11-08 18:39 ` [PATCH v10 6/7] x86/arch_prctl: Add ARCH_[GET|SET]_CPUID Kyle Huey
@ 2016-11-08 20:06   ` Thomas Gleixner
  2016-11-09 13:21     ` Borislav Petkov
  2016-11-09 13:12   ` Borislav Petkov
  2017-03-14 19:01   ` H. Peter Anvin
  2 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2016-11-08 20:06 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Robert O'Callahan, Andy Lutomirski, Ingo Molnar,
	H. Peter Anvin, x86, Paolo Bonzini, Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Borislav Petkov, Peter Zijlstra, Boris Ostrovsky,
	Len Brown, Rafael J. Wysocki, Dmitry Safonov, David Matlack,
	linux-kernel, user-mode-linux-devel, user-mode-linux-user,
	linux-fsdevel, linux-kselftest, kvm

On Tue, 8 Nov 2016, Kyle Huey wrote:
> Intel supports faulting on the CPUID instruction beginning with Ivy Bridge.
> When enabled, the processor will fault on attempts to execute the CPUID
> instruction with CPL>0. Exposing this feature to userspace will allow a
> ptracer to trap and emulate the CPUID instruction.
> 
> When supported, this feature is controlled by toggling bit 0 of
> MSR_MISC_FEATURES_ENABLES. It is documented in detail in Section 2.3.2 of
> http://www.intel.com/content/dam/www/public/us/en/documents/application-notes/virtualization-technology-flexmigration-application-note.pdf

See previous mail.

> +DECLARE_PER_CPU(u64, msr_misc_features_enables_shadow);
> +
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index 97a340d..7d364e4 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -71,9 +71,14 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
>  	}
>  
>  	for (mb = msr_bits; mb->feature; mb++) {
>  		if (rdmsrl_safe(mb->msr, &msrval))
>  			continue;
>  		if (msrval & (1ULL << mb->bit))
>  			set_cpu_cap(c, mb->feature);
>  	}
> +
> +	if (cpu_has(c, X86_FEATURE_CPUID_FAULT)) {
> +		rdmsrl(MSR_MISC_FEATURES_ENABLES, msrval);
> +		this_cpu_write(msr_misc_features_enables_shadow, msrval);
> +	}

I'm not really happy about this placement. There is more stuff coming up
which affects that MSR, so we should have a central place to handle it.

The most obvious place is here:

> +DEFINE_PER_CPU(u64, msr_misc_features_enables_shadow);

void msr_misc_features_enable_init(struct cpuinfo_x86 *c)
{
	u64 val;

	if (rdmsrl_safe(MSR_MISC_FEATURES_ENABLES, &val))
		return;

	this_cpu_write(msr_misc_features_enables_shadow, val);
}

The upcoming ring3 mwait stuff can add its magic to tweak that MSR into
this function.

Stick the call at the end of init_scattered_cpuid_features() for now. I
still need to figure out a proper place for it.

> +static int set_cpuid_mode(struct task_struct *task, unsigned long val)
> +{
> +	/* Only disable_cpuid() if it is supported on this hardware. */

That comment makes no sense.

> +	if (!static_cpu_has(X86_FEATURE_CPUID_FAULT))
> +		return -ENODEV;

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 7/7] KVM: x86: virtualize cpuid faulting
  2016-11-08 18:39 ` [PATCH v10 7/7] KVM: x86: virtualize cpuid faulting Kyle Huey
@ 2016-11-08 22:12   ` David Matlack
  0 siblings, 0 replies; 29+ messages in thread
From: David Matlack @ 2016-11-08 22:12 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Robert O'Callahan, Thomas Gleixner, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin, X86 ML, Paolo Bonzini,
	Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Borislav Petkov, Peter Zijlstra, Boris Ostrovsky,
	Len Brown, Rafael J. Wysocki, Dmitry Safonov, linux-kernel,
	user-mode-linux-devel, user-mode-linux-user, linux-fsdevel,
	linux-kselftest, kvm list

On Tue, Nov 8, 2016 at 10:39 AM, Kyle Huey <me@kylehuey.com> wrote:
> Hardware support for faulting on the cpuid instruction is not required to
> emulate it, because cpuid triggers a VM exit anyway. KVM handles the relevant
> MSRs (MSR_PLATFORM_INFO and MSR_MISC_FEATURES_ENABLES) and upon a
> cpuid-induced VM exit checks the cpuid faulting state and the CPL.
> kvm_require_cpl is even kind enough to inject the GP fault for us.
>
> Signed-off-by: Kyle Huey <khuey@kylehuey.com>

Reviewed-by: David Matlack <dmatlack@google.com>

(v10)
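
For anyone wanting to poke at this from the VMM side, here is a minimal
sketch of how userspace would opt a guest into the virtualized feature via
the host-initiated MSR path added above. The MSR index and the bit-31 mask
are taken from the patch context, the wrapper-struct idiom from the KVM
selftests; treat the exact values as assumptions to be checked against
msr-index.h rather than as part of this series:

#include <linux/kvm.h>
#include <sys/ioctl.h>

#define MSR_PLATFORM_INFO	0x000000ce	/* assumed MSR address */
#define PLATINFO_CPUID_FAULT	(1ULL << 31)

/*
 * KVM_SET_MSRS issued by the VMM is host_initiated, so it is allowed to
 * write MSR_PLATFORM_INFO and thereby advertise cpuid faulting to the guest.
 */
int advertise_cpuid_fault(int vcpu_fd)
{
	struct {
		struct kvm_msrs hdr;
		struct kvm_msr_entry entry;
	} msrs = {
		.hdr.nmsrs   = 1,
		.entry.index = MSR_PLATFORM_INFO,
		.entry.data  = PLATINFO_CPUID_FAULT,
	};

	/* returns the number of MSRs written, i.e. 1 on success */
	return ioctl(vcpu_fd, KVM_SET_MSRS, &msrs);
}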

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 1/7] x86/arch_prctl/64: Use SYSCALL_DEFINE2 to define sys_arch_prctl
  2016-11-08 18:39 ` [PATCH v10 1/7] x86/arch_prctl/64: Use SYSCALL_DEFINE2 to define sys_arch_prctl Kyle Huey
@ 2016-11-09  9:47   ` Borislav Petkov
  2016-11-10 18:17     ` Kyle Huey
  0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2016-11-09  9:47 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Robert O'Callahan, Thomas Gleixner, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin, x86, Paolo Bonzini,
	Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Peter Zijlstra, Boris Ostrovsky, Len Brown,
	Rafael J. Wysocki, Dmitry Safonov, David Matlack, linux-kernel,
	user-mode-linux-devel, user-mode-linux-user, linux-fsdevel,
	linux-kselftest, kvm

On Tue, Nov 08, 2016 at 10:39:50AM -0800, Kyle Huey wrote:

<--- Add commit message here.

> Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> ---
>  arch/x86/kernel/process_64.c | 3 ++-
>  arch/x86/um/syscalls_64.c    | 3 ++-
>  2 files changed, 4 insertions(+), 2 deletions(-)

...

>  void arch_switch_to(struct task_struct *to)
>  {
>  	if ((to->thread.arch.fs == 0) || (to->mm == NULL))
>  		return;
> 
> base-commit: e3a00f68e426df24a5fb98956a1bd1b23943aa1e

This looks like some tracking thing. It gets ignored by tools though...

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 2/7] x86/arch_prctl/64: Rename do_arch_prctl to do_arch_prctl_64
  2016-11-08 18:39 ` [PATCH v10 2/7] x86/arch_prctl/64: Rename do_arch_prctl to do_arch_prctl_64 Kyle Huey
@ 2016-11-09  9:58   ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2016-11-09  9:58 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Robert O'Callahan, Thomas Gleixner, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin, x86, Paolo Bonzini,
	Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Peter Zijlstra, Boris Ostrovsky, Len Brown,
	Rafael J. Wysocki, Dmitry Safonov, David Matlack, linux-kernel,
	user-mode-linux-devel, user-mode-linux-user, linux-fsdevel,
	linux-kselftest, kvm

On Tue, Nov 08, 2016 at 10:39:51AM -0800, Kyle Huey wrote:
> In order to introduce new arch_prctls that are not 64 bit only, rename the
> existing 64 bit implementation to do_arch_prctl_64. Also rename the second

				    do_arch_prctl_64()

> argument to arch_prctl, which will no longer always be an address.

	      arch_prctl()

so that we know they're functions.

> Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> Reviewed-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/include/asm/proto.h |  4 +++-
>  arch/x86/kernel/process_64.c | 32 +++++++++++++++++---------------
>  arch/x86/kernel/ptrace.c     |  8 ++++----
>  arch/x86/um/syscalls_64.c    |  4 ++--
>  4 files changed, 26 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/x86/include/asm/proto.h b/arch/x86/include/asm/proto.h
> index 9b9b30b..95c3e51 100644
> --- a/arch/x86/include/asm/proto.h
> +++ b/arch/x86/include/asm/proto.h
> @@ -25,11 +25,13 @@ void entry_SYSCALL_compat(void);
>  void entry_INT80_compat(void);
>  #endif
>  
>  void x86_configure_nx(void);
>  void x86_report_nx(void);
>  
>  extern int reboot_force;
>  
> -long do_arch_prctl(struct task_struct *task, int code, unsigned long addr);
> +#ifdef CONFIG_X86_64
> +long do_arch_prctl_64(struct task_struct *task, int code, unsigned long arg2);
> +#endif

There's already an #ifdef CONFIG_X86_64 in that file; please move this
one there too.

...

> @@ -863,17 +863,17 @@ long arch_ptrace(struct task_struct *child, long request,
>  		break;
>  #endif
>  
>  #ifdef CONFIG_X86_64
>  		/* normal 64bit interface to access TLS data.
>  		   Works just like arch_prctl, except that the arguments
>  		   are reversed. */
>  	case PTRACE_ARCH_PRCTL:
> -		ret = do_arch_prctl(child, data, addr);
> +		ret = do_arch_prctl_64(child, data, addr);
>  		break;
>  #endif
>  
>  	default:
>  		ret = ptrace_request(child, request, addr, data);
>  		break;
>  	}
>  
> diff --git a/arch/x86/um/syscalls_64.c b/arch/x86/um/syscalls_64.c
> index ab3f7f4..3362c4e 100644
> --- a/arch/x86/um/syscalls_64.c
> +++ b/arch/x86/um/syscalls_64.c
> @@ -68,19 +68,19 @@ long arch_prctl(struct task_struct *task, int code, unsigned long __user *addr)
												^^^^^
You missed one here.

Actually I see a couple:

$ git grep -E arch_prctl.*addr
arch/um/include/shared/os.h:306:extern int os_arch_prctl(int pid, int code, unsigned long *addr);
arch/x86/kernel/ptrace.c:871:           ret = do_arch_prctl_64(child, data, addr);
arch/x86/um/os-Linux/prctl.c:9:int os_arch_prctl(int pid, int code, unsigned long *addr)
arch/x86/um/ptrace_64.c:272:            ret = arch_prctl(child, data, (void __user *) addr);
arch/x86/um/syscalls_64.c:14:long arch_prctl(struct task_struct *task, int code, unsigned long __user *addr)

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 3/7] x86/arch_prctl: Add do_arch_prctl_common
  2016-11-08 18:39 ` [PATCH v10 3/7] x86/arch_prctl: Add do_arch_prctl_common Kyle Huey
@ 2016-11-09 10:31   ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2016-11-09 10:31 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Robert O'Callahan, Thomas Gleixner, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin, x86, Paolo Bonzini,
	Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Peter Zijlstra, Boris Ostrovsky, Len Brown,
	Rafael J. Wysocki, Dmitry Safonov, David Matlack, linux-kernel,
	user-mode-linux-devel, user-mode-linux-user, linux-fsdevel,
	linux-kselftest, kvm

On Tue, Nov 08, 2016 at 10:39:52AM -0800, Kyle Huey wrote:
> Add do_arch_prctl_common to handle arch_prctls that are not specific to 64

      do_arch_prctl_common()

> bits. Call it from the syscall entry point, but not any of the other

"... to 64-bit mode." Or something like that...

> callsites in the kernel, which all want one of the existing 64 bit only

							      64-bit

> arch_prctls.
> 
> Signed-off-by: Kyle Huey <khuey@kylehuey.com>

...

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 4/7] x86/syscalls/32: Wire up arch_prctl on x86-32
  2016-11-08 18:39 ` [PATCH v10 4/7] x86/syscalls/32: Wire up arch_prctl on x86-32 Kyle Huey
@ 2016-11-09 11:04   ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2016-11-09 11:04 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Robert O'Callahan, Thomas Gleixner, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin, x86, Paolo Bonzini,
	Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Peter Zijlstra, Boris Ostrovsky, Len Brown,
	Rafael J. Wysocki, Dmitry Safonov, David Matlack, linux-kernel,
	user-mode-linux-devel, user-mode-linux-user, linux-fsdevel,
	linux-kselftest, kvm

On Tue, Nov 08, 2016 at 10:39:53AM -0800, Kyle Huey wrote:
> Hook up arch_prctl to call do_arch_prctl on x86-32, and in 32 bit compat
> mode on x86-64. This allows us to have arch_prctls that are not specific to

function_name()

> 64 bits.
> 
> On UML, simply stub out this syscall.
> 
> Signed-off-by: Kyle Huey <khuey@kylehuey.com>

...

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 5/7] x86/cpufeature: Detect CPUID faulting support
  2016-11-08 18:39 ` [PATCH v10 5/7] x86/cpufeature: Detect CPUID faulting support Kyle Huey
  2016-11-08 19:06   ` Thomas Gleixner
@ 2016-11-09 11:14   ` Borislav Petkov
  1 sibling, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2016-11-09 11:14 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Robert O'Callahan, Thomas Gleixner, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin, x86, Paolo Bonzini,
	Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Peter Zijlstra, Boris Ostrovsky, Len Brown,
	Rafael J. Wysocki, Dmitry Safonov, David Matlack, linux-kernel,
	user-mode-linux-devel, user-mode-linux-user, linux-fsdevel,
	linux-kselftest, kvm

On Tue, Nov 08, 2016 at 10:39:54AM -0800, Kyle Huey wrote:
> Intel supports faulting on the CPUID instruction beginning with Ivy Bridge.
> When enabled, the processor will fault on attempts to execute the CPUID
> instruction with CPL>0. This will allow a ptracer to emulate the CPUID
> instruction.
> 
> Bit 31 of MSR_PLATFORM_INFO advertises support for this feature. It is
> documented in detail in Section 2.3.2 of
> http://www.intel.com/content/dam/www/public/us/en/documents/application-notes/virtualization-technology-flexmigration-application-note.pdf
> 
> Detect support for this feature and expose it as X86_FEATURE_CPUID_FAULT.
> 
> Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> Reviewed-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/include/asm/cpufeatures.h |  1 +
>  arch/x86/include/asm/msr-index.h   |  2 ++
>  arch/x86/kernel/cpu/scattered.c    | 22 +++++++++++++++++++++-
>  3 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index a396292..62962e8 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -184,16 +184,17 @@
>   * Auxiliary flags: Linux defined - For features scattered in various
>   * CPUID levels like 0x6, 0xA etc, word 7.
>   *
>   * Reuse free bits when adding new feature flags!
>   */
>  
>  #define X86_FEATURE_CPB		( 7*32+ 2) /* AMD Core Performance Boost */
>  #define X86_FEATURE_EPB		( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS support */
> +#define X86_FEATURE_CPUID_FAULT ( 7*32+ 4) /* Intel CPUID faulting */

Bit 0 in that leaf is free. Also, bit 4 is already claimed by RDT/CAT/...
whatever that thing is going to be called, so please do:

#define X86_FEATURE_CPUID_FAULT ( 7*32+ 0) /* Intel CPUID faulting */

>  
>  #define X86_FEATURE_HW_PSTATE	( 7*32+ 8) /* AMD HW-PState */
>  #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
>  
>  #define X86_FEATURE_INTEL_PT	( 7*32+15) /* Intel Processor Trace */
>  #define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */
>  #define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */

...

>  	for (cb = cpuid_bits; cb->feature; cb++) {
>  
>  		/* Verify that the level is valid */
>  		max_level = cpuid_eax(cb->level & 0xffff0000);
>  		if (max_level < cb->level ||
>  		    max_level > (cb->level | 0xffff))
>  			continue;
>  
>  		cpuid_count(cb->level, cb->sub_leaf, &regs[CR_EAX],
>  			    &regs[CR_EBX], &regs[CR_ECX], &regs[CR_EDX]);
>  
>  		if (regs[cb->reg] & (1 << cb->bit))
>  			set_cpu_cap(c, cb->feature);
>  	}
> +
> +	for (mb = msr_bits; mb->feature; mb++) {
> +		if (rdmsrl_safe(mb->msr, &msrval))
> +			continue;

<--- newline here.

> +		if (msrval & (1ULL << mb->bit))

		if (msrval & BIT_ULL(mb->bit))


> +			set_cpu_cap(c, mb->feature);
> +	}
>  }
> -- 
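
IOW, with the PLATINFO_CPUID_FAULT_BIT rename tglx asked for folded in, the
scattered.c detection would look roughly like this (a sketch only — the
declarations and the terminating entry are assumed, and later in this thread
the detection moves to init_intel() anyway):

	static const struct msr_bit msr_bits[] = {
		{ X86_FEATURE_CPUID_FAULT, MSR_PLATFORM_INFO, PLATINFO_CPUID_FAULT_BIT },
		{ 0, 0, 0 }
	};

	for (mb = msr_bits; mb->feature; mb++) {
		if (rdmsrl_safe(mb->msr, &msrval))
			continue;

		if (msrval & BIT_ULL(mb->bit))
			set_cpu_cap(c, mb->feature);
	}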

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 6/7] x86/arch_prctl: Add ARCH_[GET|SET]_CPUID
  2016-11-08 18:39 ` [PATCH v10 6/7] x86/arch_prctl: Add ARCH_[GET|SET]_CPUID Kyle Huey
  2016-11-08 20:06   ` Thomas Gleixner
@ 2016-11-09 13:12   ` Borislav Petkov
  2017-03-14 19:01   ` H. Peter Anvin
  2 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2016-11-09 13:12 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Robert O'Callahan, Thomas Gleixner, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin, x86, Paolo Bonzini,
	Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Peter Zijlstra, Boris Ostrovsky, Len Brown,
	Rafael J. Wysocki, Dmitry Safonov, David Matlack, linux-kernel,
	user-mode-linux-devel, user-mode-linux-user, linux-fsdevel,
	linux-kselftest, kvm

On Tue, Nov 08, 2016 at 10:39:55AM -0800, Kyle Huey wrote:
> Intel supports faulting on the CPUID instruction beginning with Ivy Bridge.
> When enabled, the processor will fault on attempts to execute the CPUID
> instruction with CPL>0. Exposing this feature to userspace will allow a
> ptracer to trap and emulate the CPUID instruction.
> 
> When supported, this feature is controlled by toggling bit 0 of
> MSR_MISC_FEATURES_ENABLES. It is documented in detail in Section 2.3.2 of
> http://www.intel.com/content/dam/www/public/us/en/documents/application-notes/virtualization-technology-flexmigration-application-note.pdf
> 
> Implement a new pair of arch_prctls, available on both x86-32 and x86-64.
> 
> ARCH_GET_CPUID: Returns the current CPUID faulting state, either
>   ARCH_CPUID_ENABLE or ARCH_CPUID_SIGSEGV. arg2 must be 0.
> 
> ARCH_SET_CPUID: Set the CPUID faulting state to arg2, which must be either
>   ARCH_CPUID_ENABLE or ARCH_CPUID_SIGSEGV. Returns EINVAL if arg2 is
>   another value or CPUID faulting is not supported on this system.
> 
> The state of the CPUID faulting flag is propagated across forks, but reset
> upon exec.
> 
> Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> ---

...

> diff --git a/tools/testing/selftests/x86/cpuid-fault.c b/tools/testing/selftests/x86/cpuid-fault.c
> new file mode 100644
> index 0000000..65419de
> --- /dev/null
> +++ b/tools/testing/selftests/x86/cpuid-fault.c
> @@ -0,0 +1,250 @@
> +
> +/*
> + * Tests for arch_prctl(ARCH_GET_CPUID, ...) / arch_prctl(ARCH_SET_CPUID, ...)
> + *
> + * Basic test to test behaviour of ARCH_GET_CPUID and ARCH_SET_CPUID
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <unistd.h>
> +#include <signal.h>
> +#include <inttypes.h>
> +#include <cpuid.h>
> +#include <err.h>
> +#include <errno.h>
> +#include <sys/wait.h>
> +
> +#include <sys/prctl.h>
> +#include <linux/prctl.h>
> +
> +/*
> +#define ARCH_GET_CPUID 0x1005
> +#define ARCH_SET_CPUID 0x1006
> +#define ARCH_CPUID_ENABLE 1
> +#define ARCH_CPUID_SIGSEGV 2
> +#ifdef __x86_64__
> +#define SYS_arch_prctl 158
> +#else
> +#define SYS_arch_prctl 385
> +#endif
> +*/
> +
> +const char *cpuid_names[] = {
> +	[0] = "[not set]",
> +	[ARCH_CPUID_ENABLE] = "ARCH_CPUID_ENABLE",
> +	[ARCH_CPUID_SIGSEGV] = "ARCH_CPUID_SIGSEGV",
> +};
> +
> +int arch_prctl(int code, unsigned long arg2)
> +{
> +	return syscall(SYS_arch_prctl, code, arg2);
> +}
> +
> +int cpuid(unsigned int *eax, unsigned int *ebx, unsigned int *ecx,
> +	  unsigned int *edx)
> +{
> +	return __get_cpuid(0, eax, ebx, ecx, edx);
> +}
> +
> +int do_child_exec_test(int eax, int ebx, int ecx, int edx)
> +{
> +	int cpuid_val = 0, child = 0, status = 0;
> +
> +	printf("arch_prctl(ARCH_GET_CPUID); ");
> +
> +	cpuid_val = arch_prctl(ARCH_GET_CPUID, 0);
> +	if (cpuid_val < 0)
> +		errx(1, "ARCH_GET_CPUID fails now, but not before?");
> +
> +	printf("cpuid_val == %s\n", cpuid_names[cpuid_val]);
> +	if (cpuid_val != ARCH_CPUID_SIGSEGV)
> +		errx(1, "How did cpuid get re-enabled on fork?");
> +
> +	if ((child = fork()) == 0) {

ERROR: do not use assignment in if condition
#513: FILE: tools/testing/selftests/x86/cpuid-fault.c:64:
+       if ((child = fork()) == 0) {

There are more in that file.
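
For reference, a small standalone sketch of the checkpatch-clean form, using
the constants from the commented-out block in the test above (the
xarch_prctl() wrapper name is mine; behaviour follows the quoted commit
message, i.e. the faulting state is inherited by the fork child; x86-64 only):

#include <err.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#define ARCH_GET_CPUID		0x1005
#define ARCH_SET_CPUID		0x1006
#define ARCH_CPUID_ENABLE	1
#define ARCH_CPUID_SIGSEGV	2

static long xarch_prctl(int code, unsigned long arg2)
{
	return syscall(SYS_arch_prctl, code, arg2);
}

int main(void)
{
	pid_t child;
	int status;

	if (xarch_prctl(ARCH_SET_CPUID, ARCH_CPUID_SIGSEGV) != 0)
		err(1, "ARCH_SET_CPUID");

	/* assignment kept out of the condition, as checkpatch wants */
	child = fork();
	if (child == 0) {
		unsigned int a, b, c, d;

		/* the faulting state was inherited, so this raises SIGSEGV */
		__asm__ volatile ("cpuid"
				  : "=a" (a), "=b" (b), "=c" (c), "=d" (d)
				  : "a" (0));
		_exit(0);
	}

	waitpid(child, &status, 0);
	printf("child %s\n", WIFSIGNALED(status) ? "got SIGSEGV" : "ran CPUID");
	return 0;
}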

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 6/7] x86/arch_prctl: Add ARCH_[GET|SET]_CPUID
  2016-11-08 20:06   ` Thomas Gleixner
@ 2016-11-09 13:21     ` Borislav Petkov
  2016-11-09 13:34       ` Thomas Gleixner
  2016-11-10 23:26       ` Kyle Huey
  0 siblings, 2 replies; 29+ messages in thread
From: Borislav Petkov @ 2016-11-09 13:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kyle Huey, Robert O'Callahan, Andy Lutomirski, Ingo Molnar,
	H. Peter Anvin, x86, Paolo Bonzini, Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Peter Zijlstra, Boris Ostrovsky, Len Brown,
	Rafael J. Wysocki, Dmitry Safonov, David Matlack, linux-kernel,
	user-mode-linux-devel, user-mode-linux-user, linux-fsdevel,
	linux-kselftest, kvm

On Tue, Nov 08, 2016 at 09:06:31PM +0100, Thomas Gleixner wrote:
> The upcoming ring3 mwait stuff can add its magic to tweak that MSR into
> this function.
> 
> Stick the call at the end of init_scattered_cpuid_features() for now. I
> still need to figure out a proper place for it.

So Thomas and I discussed this more on IRC and I think we can get rid
of the MSR iterating in scattered.c and integrate both the R3 MWAIT and
CPUID faulting like this:

---
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index fcd484d2bb03..5c38a85af2e7 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -452,6 +457,39 @@ static void intel_bsp_resume(struct cpuinfo_x86 *c)
 	init_intel_energy_perf(c);
 }
 
+static void init_misc_enables(struct cpuinfo_x86 *c)
+{
+	u64 val, misc_en;
+
+	if (rdmsrl_safe(MSR_MISC_FEATURES_ENABLES, &misc_en))
+		return;
+
+	misc_en &= ~MSR_MISC_ENABLES_CPUID_FAULT_ENABLE;
+
+	if (!rdmsrl_safe(MSR_PLATFORM_INFO, &val)) {
+		if (val & BIT_ULL(PLATINFO_CPUID_FAULT_BIT))
+			set_cpu_cap(c, X86_FEATURE_CPUID_FAULT);
+	}
+
+	wrmsrl(MSR_MISC_FEATURES_ENABLES, misc_en);
+	this_cpu_write(msr_misc_features_enables_shadow, misc_en);
+}
+
 static void init_intel(struct cpuinfo_x86 *c)
 {
 	unsigned int l2 = 0;
@@ -565,6 +603,8 @@ static void init_intel(struct cpuinfo_x86 *c)
 		detect_vmx_virtcap(c);
 
 	init_intel_energy_perf(c);
+
+	init_misc_enables(c);
 }
 
 #ifdef CONFIG_X86_32
---

Please redo your patchset and add the detection to init_intel() like above.

Also, let's call that MSR mask MSR_MISC_ENABLES_CPUID_FAULT_ENABLE like
the rest of the bits in msr-index.h

Thanks.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 6/7] x86/arch_prctl: Add ARCH_[GET|SET]_CPUID
  2016-11-09 13:21     ` Borislav Petkov
@ 2016-11-09 13:34       ` Thomas Gleixner
  2016-11-10 23:38         ` Kyle Huey
  2016-11-10 23:26       ` Kyle Huey
  1 sibling, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2016-11-09 13:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kyle Huey, Robert O'Callahan, Andy Lutomirski, Ingo Molnar,
	H. Peter Anvin, x86, Paolo Bonzini, Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Peter Zijlstra, Boris Ostrovsky, Len Brown,
	Rafael J. Wysocki, Dmitry Safonov, David Matlack, linux-kernel,
	user-mode-linux-devel, user-mode-linux-user, linux-fsdevel,
	linux-kselftest, kvm

On Wed, 9 Nov 2016, Borislav Petkov wrote:

> On Tue, Nov 08, 2016 at 09:06:31PM +0100, Thomas Gleixner wrote:
> > The upcoming ring3 mwait stuff can add its magic to tweak that MSR into
> > this function.
> > 
> > Stick the call at the end of init_scattered_cpuid_features() for now. I
> > still need to figure out a proper place for it.
> 
> So Thomas and I discussed this more on IRC and I think we can get rid
> of the MSR iterating in scattered.c and integrate both the R3 MWAIT and
> CPUID faulting like this:

I agree mostly.
 
> ---
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index fcd484d2bb03..5c38a85af2e7 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -452,6 +457,39 @@ static void intel_bsp_resume(struct cpuinfo_x86 *c)
>  	init_intel_energy_perf(c);
>  }
>  
> +static void init_misc_enables(struct cpuinfo_x86 *c)
> +{
> +	u64 val, misc_en;
> +
> +	if (rdmsrl_safe(MSR_MISC_FEATURES_ENABLES, &misc_en))
> +		return;
> +
> +	misc_en &= ~MSR_MISC_ENABLES_CPUID_FAULT_ENABLE;

I'd rather write this MSR to 0 right away and just enable the bits
which we really support.

Whatever Intel comes up with next, e.g. faulting of random other
instructions or whatever (mis)feature they think is valuable, will lead to
a debugging nightmare if some incompetent BIOS writer sets the bit and the
kernel does not know about it.

Yes, I know that there might be bits forced to 1 at some point in the
future, but let's deal with that when it happens.

Right now I can enable the CPUID FAULT bit on my Broadwell and watch user
space programs die unexpectedly without a hint as to why, simply because it's
not documented in the SDM. So we'd rather be safe than surprised.

@hpa: Thoughts?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 1/7] x86/arch_prctl/64: Use SYSCALL_DEFINE2 to define sys_arch_prctl
  2016-11-09  9:47   ` Borislav Petkov
@ 2016-11-10 18:17     ` Kyle Huey
  0 siblings, 0 replies; 29+ messages in thread
From: Kyle Huey @ 2016-11-10 18:17 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Robert O'Callahan, Thomas Gleixner, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Paolo Bonzini, Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Peter Zijlstra, Boris Ostrovsky, Len Brown,
	Rafael J. Wysocki, Dmitry Safonov, David Matlack, open list,
	open list:USER-MODE LINUX (UML), open list:USER-MODE LINUX (UML),
	open list:FILESYSTEMS (VFS and infrastructure),
	open list:KERNEL SELFTEST FRAMEWORK, kvm list

On Wed, Nov 9, 2016 at 1:47 AM, Borislav Petkov <bp@suse.de> wrote:
> On Tue, Nov 08, 2016 at 10:39:50AM -0800, Kyle Huey wrote:
>
> <--- Add commit message here.
>
>> Signed-off-by: Kyle Huey <khuey@kylehuey.com>
>> ---
>>  arch/x86/kernel/process_64.c | 3 ++-
>>  arch/x86/um/syscalls_64.c    | 3 ++-
>>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> ...
>
>>  void arch_switch_to(struct task_struct *to)
>>  {
>>       if ((to->thread.arch.fs == 0) || (to->mm == NULL))
>>               return;
>>
>> base-commit: e3a00f68e426df24a5fb98956a1bd1b23943aa1e
>
> This looks like some tracking thing. It gets ignored by tools though...

Yeah, despite git format-patch generating it, git doesn't automatically
honor it when applying patches.

- Kyle

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 6/7] x86/arch_prctl: Add ARCH_[GET|SET]_CPUID
  2016-11-09 13:21     ` Borislav Petkov
  2016-11-09 13:34       ` Thomas Gleixner
@ 2016-11-10 23:26       ` Kyle Huey
  1 sibling, 0 replies; 29+ messages in thread
From: Kyle Huey @ 2016-11-10 23:26 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Robert O'Callahan, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Paolo Bonzini, Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Peter Zijlstra, Boris Ostrovsky, Len Brown,
	Rafael J. Wysocki, Dmitry Safonov, David Matlack, open list,
	open list:USER-MODE LINUX (UML), open list:USER-MODE LINUX (UML),
	open list:FILESYSTEMS (VFS and infrastructure),
	open list:KERNEL SELFTEST FRAMEWORK, kvm list

On Wed, Nov 9, 2016 at 5:21 AM, Borislav Petkov <bp@suse.de> wrote:
> On Tue, Nov 08, 2016 at 09:06:31PM +0100, Thomas Gleixner wrote:
>> The upcoming ring3 mwait stuff can add its magic to tweak that MSR into
>> this function.
>>
>> Stick the call at the end of init_scattered_cpuid_features() for now. I
>> still need to figure out a proper place for it.
>
> So Thomas and I discussed this more on IRC and I think we can get rid
> of the MSR iterating in scattered.c and integrate both the R3 MWAIT and
> CPUID faulting like this:
>
> ---
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index fcd484d2bb03..5c38a85af2e7 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -452,6 +457,39 @@ static void intel_bsp_resume(struct cpuinfo_x86 *c)
>         init_intel_energy_perf(c);
>  }
>
> +static void init_misc_enables(struct cpuinfo_x86 *c)
> +{
> +       u64 val, misc_en;
> +
> +       if (rdmsrl_safe(MSR_MISC_FEATURES_ENABLES, &misc_en))
> +               return;
> +
> +       misc_en &= ~MSR_MISC_ENABLES_CPUID_FAULT_ENABLE;
> +
> +       if (!rdmsrl_safe(MSR_PLATFORM_INFO, &val)) {
> +               if (val & BIT_ULL(PLATINFO_CPUID_FAULT_BIT))
> +                       set_cpu_cap(c, X86_FEATURE_CPUID_FAULT);
> +       }
> +
> +       wrmsrl(MSR_MISC_FEATURES_ENABLES, misc_en);
> +       this_cpu_write(msr_misc_features_enables_shadow, misc_en);
> +}
> +
>  static void init_intel(struct cpuinfo_x86 *c)
>  {
>         unsigned int l2 = 0;
> @@ -565,6 +603,8 @@ static void init_intel(struct cpuinfo_x86 *c)
>                 detect_vmx_virtcap(c);
>
>         init_intel_energy_perf(c);
> +
> +       init_misc_enables(c);
>  }
>
>  #ifdef CONFIG_X86_32
> ---
>
> Please redo your patchset and add the detection to init_intel() like above.
>
> Also, let's call that MSR mask MSR_MISC_ENABLES_CPUID_FAULT_ENABLE like
> the rest of the bits in msr-index.h

There's already an IA32_MISC_ENABLE, so I've made this
MSR_MISC_FEATURES_ENABLES_CPUID_FAULT instead.
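
i.e. roughly this in msr-index.h — the bit positions are the ones from the
commit messages in this series, and the MSR addresses are what I believe the
SDM documents, so double-check them:

#define MSR_PLATFORM_INFO				0x000000ce
#define PLATINFO_CPUID_FAULT_BIT			31
#define PLATINFO_CPUID_FAULT				BIT_ULL(PLATINFO_CPUID_FAULT_BIT)

#define MSR_MISC_FEATURES_ENABLES			0x00000140
#define MSR_MISC_FEATURES_ENABLES_CPUID_FAULT_BIT	0
#define MSR_MISC_FEATURES_ENABLES_CPUID_FAULT		\
	BIT_ULL(MSR_MISC_FEATURES_ENABLES_CPUID_FAULT_BIT)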

- Kyle

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 6/7] x86/arch_prctl: Add ARCH_[GET|SET]_CPUID
  2016-11-09 13:34       ` Thomas Gleixner
@ 2016-11-10 23:38         ` Kyle Huey
  0 siblings, 0 replies; 29+ messages in thread
From: Kyle Huey @ 2016-11-10 23:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Borislav Petkov, Robert O'Callahan, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Paolo Bonzini, Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Peter Zijlstra, Boris Ostrovsky, Len Brown,
	Rafael J. Wysocki, Dmitry Safonov, David Matlack, open list,
	open list:USER-MODE LINUX (UML), open list:USER-MODE LINUX (UML),
	open list:FILESYSTEMS (VFS and infrastructure),
	open list:KERNEL SELFTEST FRAMEWORK, kvm list

On Wed, Nov 9, 2016 at 5:34 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 9 Nov 2016, Borislav Petkov wrote:
>> +static void init_misc_enables(struct cpuinfo_x86 *c)
>> +{
>> +     u64 val, misc_en;
>> +
>> +     if (rdmsrl_safe(MSR_MISC_FEATURES_ENABLES, &misc_en))
>> +             return;
>> +
>> +     misc_en &= ~MSR_MISC_ENABLES_CPUID_FAULT_ENABLE;
>
> I'd rather write this MSR to 0 right away and just enable the bits
> which we really support.

I've gone ahead and done this.
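
Roughly like this (a sketch only; the function names are illustrative, not
necessarily what v11 will use):

static void init_cpuid_fault(struct cpuinfo_x86 *c)
{
	u64 msrval;

	if (!rdmsrl_safe(MSR_PLATFORM_INFO, &msrval)) {
		if (msrval & BIT_ULL(PLATINFO_CPUID_FAULT_BIT))
			set_cpu_cap(c, X86_FEATURE_CPUID_FAULT);
	}
}

static void init_intel_misc_features(struct cpuinfo_x86 *c)
{
	u64 msrval;

	if (rdmsrl_safe(MSR_MISC_FEATURES_ENABLES, &msrval))
		return;

	init_cpuid_fault(c);

	/*
	 * Clear the bits we do not know about instead of trusting whatever
	 * the BIOS left in the MSR; supported bits get re-enabled explicitly
	 * as the ring3 mwait work and friends land.
	 */
	msrval = 0;
	this_cpu_write(msr_misc_features_enables_shadow, msrval);
	wrmsrl(MSR_MISC_FEATURES_ENABLES, msrval);
}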

- Kyle

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 6/7] x86/arch_prctl: Add ARCH_[GET|SET]_CPUID
  2016-11-08 18:39 ` [PATCH v10 6/7] x86/arch_prctl: Add ARCH_[GET|SET]_CPUID Kyle Huey
  2016-11-08 20:06   ` Thomas Gleixner
  2016-11-09 13:12   ` Borislav Petkov
@ 2017-03-14 19:01   ` H. Peter Anvin
  2017-03-14 19:08     ` Kyle Huey
                       ` (2 more replies)
  2 siblings, 3 replies; 29+ messages in thread
From: H. Peter Anvin @ 2017-03-14 19:01 UTC (permalink / raw)
  To: Kyle Huey, Robert O'Callahan, Thomas Gleixner,
	Andy Lutomirski, Ingo Molnar, x86, Paolo Bonzini,
	Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Borislav Petkov, Peter Zijlstra, Boris Ostrovsky,
	Len Brown, Rafael J. Wysocki, Dmitry Safonov, David Matlack
  Cc: linux-kernel, user-mode-linux-devel, user-mode-linux-user,
	linux-fsdevel, linux-kselftest, kvm

On 11/08/16 10:39, Kyle Huey wrote:
>  	}
>  
> +	if (test_tsk_thread_flag(prev_p, TIF_NOCPUID) ^
> +	    test_tsk_thread_flag(next_p, TIF_NOCPUID)) {
> +		set_cpuid_faulting(test_tsk_thread_flag(next_p, TIF_NOCPUID));
> +	}
> +
>  	if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^
>  	    test_tsk_thread_flag(next_p, TIF_NOTSC)) {
>  		/* prev and next are different */
>  		if (test_tsk_thread_flag(next_p, TIF_NOTSC))
>  			hard_disable_TSC();
>  		else
>  			hard_enable_TSC();
>  	}

I'm unhappy about this part: we already do two XORs on these after bit
extraction, which is quite inefficient; and at least theoretically we
could be indirecting through the ->stack pointer for every one if gcc
can't tell it won't have changed (we really need to get thread_info
moved into the task_struct allocation and away from the kernel stack,
especially since on x86 the pointer is the same size as the vestigial
structure it points to.)

It would be so much saner to do one xor and then go onto a common slow path:

	struct thread_info *prev_ti = task_thread_info(prev_p);
	struct thread_info *next_ti = task_thread_info(next_p);

	tif_flipped = prev_ti->flags ^ next_ti->flags;

	if (unlikely(tif_flipped &
		(_TIF_BLOCKSTEP | _TIF_NOTSC | _TIF_NOCPUID))) {
		if (tif_flipped & _TIF_BLOCKSTEP) {
			...
		}
		if (tif_flipped & _TIF_NOTSC) {
			...
		}
		if (tif_flipped & _TIF_NOCPUID) {
			...
		}
	}

Then we can also replace test_tsk_thread_flag() with
test_ti_thread_flag() in other places in this function.

	-hpa

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 6/7] x86/arch_prctl: Add ARCH_[GET|SET]_CPUID
  2017-03-14 19:01   ` H. Peter Anvin
@ 2017-03-14 19:08     ` Kyle Huey
  2017-03-14 20:06       ` H. Peter Anvin
  2017-03-14 19:17     ` H. Peter Anvin
  2017-03-14 19:23     ` Andy Lutomirski
  2 siblings, 1 reply; 29+ messages in thread
From: Kyle Huey @ 2017-03-14 19:08 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Robert O'Callahan, Thomas Gleixner, Andy Lutomirski,
	Ingo Molnar, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Paolo Bonzini, Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Borislav Petkov, Peter Zijlstra, Boris Ostrovsky,
	Len Brown, Rafael J. Wysocki, Dmitry Safonov, David Matlack,
	open list, open list:USER-MODE LINUX (UML),
	open list:USER-MODE LINUX (UML),
	open list:FILESYSTEMS (VFS and infrastructure),
	open list:KERNEL SELFTEST FRAMEWORK, kvm list

On Tue, Mar 14, 2017 at 12:01 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 11/08/16 10:39, Kyle Huey wrote:
>>       }
>>
>> +     if (test_tsk_thread_flag(prev_p, TIF_NOCPUID) ^
>> +         test_tsk_thread_flag(next_p, TIF_NOCPUID)) {
>> +             set_cpuid_faulting(test_tsk_thread_flag(next_p, TIF_NOCPUID));
>> +     }
>> +
>>       if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^
>>           test_tsk_thread_flag(next_p, TIF_NOTSC)) {
>>               /* prev and next are different */
>>               if (test_tsk_thread_flag(next_p, TIF_NOTSC))
>>                       hard_disable_TSC();
>>               else
>>                       hard_enable_TSC();
>>       }
>
> I'm unhappy about this part: we already do two XORs on these after bit
> extraction, which is quite inefficient; and at least theoretically we
> could be indirecting through the ->stack pointer for every one if gcc
> can't tell it won't have changed (we really need to get thread_info
> moved into the task_struct allocation and away from the kernel stack,
> especially since on x86 the pointer is the same size as the vestigial
> structure it points to.)
>
> It would be so much saner to do one xor and then go onto a common slow path:
>
>         struct thread_info *prev_ti = task_thread_info(prev_p);
>         struct thread_info *next_ti = task_thread_info(next_p);
>
>         tif_flipped = prev_ti->flags ^ next_ti->flags;
>
>         if (unlikely(tif_flipped &
>                 (_TIF_BLOCKSTEP | _TIF_NOTSC | _TIF_NOCPUID))) {
>                 if (tif_flipped & _TIF_BLOCKSTEP) {
>                         ...
>                 }
>                 if (tif_flipped & _TIF_NOTSC) {
>                         ...
>                 }
>                 if (tif_flipped & _TIF_NOCPUID) {
>                         ...
>                 }
>         }
>
> Then we can also replace test_tsk_thread_flag() with
> test_ti_thread_flag() in other places in this function.

That's largely what we ended up doing.  See
https://lkml.org/lkml/2017/2/14/80 and the latest version of this
patch, https://lkml.org/lkml/2017/3/11/197.

- Kyle

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 6/7] x86/arch_prctl: Add ARCH_[GET|SET]_CPUID
  2017-03-14 19:01   ` H. Peter Anvin
  2017-03-14 19:08     ` Kyle Huey
@ 2017-03-14 19:17     ` H. Peter Anvin
  2017-03-14 19:23     ` Andy Lutomirski
  2 siblings, 0 replies; 29+ messages in thread
From: H. Peter Anvin @ 2017-03-14 19:17 UTC (permalink / raw)
  To: Kyle Huey, Robert O'Callahan, Thomas Gleixner,
	Andy Lutomirski, Ingo Molnar, x86, Paolo Bonzini,
	Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Borislav Petkov, Peter Zijlstra, Boris Ostrovsky,
	Len Brown, Rafael J. Wysocki, Dmitry Safonov, David Matlack
  Cc: linux-kernel, user-mode-linux-devel, user-mode-linux-user,
	linux-fsdevel, linux-kselftest, kvm

On 03/14/17 12:01, H. Peter Anvin wrote:
> On 11/08/16 10:39, Kyle Huey wrote:
>>  	}
>>  
>> +	if (test_tsk_thread_flag(prev_p, TIF_NOCPUID) ^
>> +	    test_tsk_thread_flag(next_p, TIF_NOCPUID)) {
>> +		set_cpuid_faulting(test_tsk_thread_flag(next_p, TIF_NOCPUID));
>> +	}
>> +
>>  	if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^
>>  	    test_tsk_thread_flag(next_p, TIF_NOTSC)) {
>>  		/* prev and next are different */
>>  		if (test_tsk_thread_flag(next_p, TIF_NOTSC))
>>  			hard_disable_TSC();
>>  		else
>>  			hard_enable_TSC();
>>  	}
> 
> I'm unhappy about this part: we already do two XORs on these after bit
> extraction, which is quite inefficient; and at least theoretically we
> could be indirecting through the ->stack pointer for every one if gcc
> can't tell it won't have changed (we really need to get thread_info
> moved into the task_struct allocation and away from the kernel stack,
> especially since on x86 the pointer is the same size as the vestigial
> structure it points to.)
> 

Nevermind, I was accidentally looking at v10 not v15 of this patchset.
My bad.

	-hpa

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 6/7] x86/arch_prctl: Add ARCH_[GET|SET]_CPUID
  2017-03-14 19:01   ` H. Peter Anvin
  2017-03-14 19:08     ` Kyle Huey
  2017-03-14 19:17     ` H. Peter Anvin
@ 2017-03-14 19:23     ` Andy Lutomirski
  2017-03-15  9:11       ` H. Peter Anvin
  2 siblings, 1 reply; 29+ messages in thread
From: Andy Lutomirski @ 2017-03-14 19:23 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Kyle Huey, Robert O'Callahan, Thomas Gleixner,
	Andy Lutomirski, Ingo Molnar, X86 ML, Paolo Bonzini,
	Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Borislav Petkov, Peter Zijlstra, Boris Ostrovsky,
	Len Brown, Rafael J. Wysocki, Dmitry Safonov, David Matlack,
	linux-kernel, user-mode-linux-devel,
	open list:USER-MODE LINUX (UML),
	Linux FS Devel, open list:KERNEL SELFTEST FRAMEWORK, kvm list

On Tue, Mar 14, 2017 at 12:01 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> and at least theoretically we
> could be indirecting through the ->stack pointer for every one if gcc
> can't tell it won't have changed (we really need to get thread_info
> moved into the task_struct allocation and away from the kernel stack,
> especially since on x86 the pointer is the same size as the vestigial
> structure it points to.)

Solved by use of time machine:

commit 15f4eae70d365bba26854c90b6002aaabb18c8aa
Author: Andy Lutomirski <luto@kernel.org>
Date:   Tue Sep 13 14:29:25 2016 -0700

    x86: Move thread_info into task_struct


:)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 6/7] x86/arch_prctl: Add ARCH_[GET|SET]_CPUID
  2017-03-14 19:08     ` Kyle Huey
@ 2017-03-14 20:06       ` H. Peter Anvin
  0 siblings, 0 replies; 29+ messages in thread
From: H. Peter Anvin @ 2017-03-14 20:06 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Robert O'Callahan, Thomas Gleixner, Andy Lutomirski,
	Ingo Molnar, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Paolo Bonzini, Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Borislav Petkov, Peter Zijlstra, Boris Ostrovsky,
	Len Brown, Rafael J. Wysocki, Dmitry Safonov, David Matlack,
	open list, open list:USER-MODE LINUX (UML),
	open list:USER-MODE LINUX (UML),
	open list:FILESYSTEMS (VFS and infrastructure),
	open list:KERNEL SELFTEST FRAMEWORK, kvm list

On 03/14/17 12:08, Kyle Huey wrote:
> 
> That's largely what we ended up doing.  See
> https://lkml.org/lkml/2017/2/14/80 and the latest version of this
> patch, https://lkml.org/lkml/2017/3/11/197.
> 

Yes, as I said, my mistake.

I would still like to see an early-out when none of these flags are set
(I just discussed this with tglx on IRC):

if (likely(!((tifp|tifn) &
	(_TIF_BLOCKSTEP|_TIF_NOTSC|_TIF_IO_BITMAP|
         _TIF_NOCPUID|_TIF_USER_RETURN_NOTIFY))))
	return;

The USER_RETURN_NOTIFY could really use some sanity: it is a notifier
chain with a single in-kernel user, which is KVM on x86 only, but we
most likely will need to propagate the flag even if it ends up getting
specialized.

	-hpa

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 6/7] x86/arch_prctl: Add ARCH_[GET|SET]_CPUID
  2017-03-14 19:23     ` Andy Lutomirski
@ 2017-03-15  9:11       ` H. Peter Anvin
  0 siblings, 0 replies; 29+ messages in thread
From: H. Peter Anvin @ 2017-03-15  9:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kyle Huey, Robert O'Callahan, Thomas Gleixner,
	Andy Lutomirski, Ingo Molnar, X86 ML, Paolo Bonzini,
	Radim Krčmář,
	Jeff Dike, Richard Weinberger, Alexander Viro, Shuah Khan,
	Dave Hansen, Borislav Petkov, Peter Zijlstra, Boris Ostrovsky,
	Len Brown, Rafael J. Wysocki, Dmitry Safonov, David Matlack,
	linux-kernel, user-mode-linux-devel,
	open list:USER-MODE LINUX (UML),
	Linux FS Devel, open list:KERNEL SELFTEST FRAMEWORK, kvm list

On March 14, 2017 12:23:40 PM PDT, Andy Lutomirski <luto@amacapital.net> wrote:
>On Tue, Mar 14, 2017 at 12:01 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> and at least theoretically we
>> could be indirecting though the ->stack pointer for every one if gcc
>> can't tell it won't have changed (we really need to get thread_info
>> moved into the task_struct allocation and away from the kernel stack,
>> especially since on x86 the pointer is the same size as the vestigial
>> structure it points to.)
>
>Solved by use of time machine:
>
>commit 15f4eae70d365bba26854c90b6002aaabb18c8aa
>Author: Andy Lutomirski <luto@kernel.org>
>Date:   Tue Sep 13 14:29:25 2016 -0700
>
>    x86: Move thread_info into task_struct
>
>
>:)

My apologies, -ESTALEBRAINCACHE...
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-03-15  9:13 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-08 18:39 [PATCH v10 0/7] x86/arch_prctl Add ARCH_[GET|SET]_CPUID for controlling the CPUID instruction Kyle Huey
2016-11-08 18:39 ` [PATCH v10 1/7] x86/arch_prctl/64: Use SYSCALL_DEFINE2 to define sys_arch_prctl Kyle Huey
2016-11-09  9:47   ` Borislav Petkov
2016-11-10 18:17     ` Kyle Huey
2016-11-08 18:39 ` [PATCH v10 2/7] x86/arch_prctl/64: Rename do_arch_prctl to do_arch_prctl_64 Kyle Huey
2016-11-09  9:58   ` Borislav Petkov
2016-11-08 18:39 ` [PATCH v10 3/7] x86/arch_prctl: Add do_arch_prctl_common Kyle Huey
2016-11-09 10:31   ` Borislav Petkov
2016-11-08 18:39 ` [PATCH v10 4/7] x86/syscalls/32: Wire up arch_prctl on x86-32 Kyle Huey
2016-11-09 11:04   ` Borislav Petkov
2016-11-08 18:39 ` [PATCH v10 5/7] x86/cpufeature: Detect CPUID faulting support Kyle Huey
2016-11-08 19:06   ` Thomas Gleixner
2016-11-08 19:38     ` Kyle Huey
2016-11-09 11:14   ` Borislav Petkov
2016-11-08 18:39 ` [PATCH v10 6/7] x86/arch_prctl: Add ARCH_[GET|SET]_CPUID Kyle Huey
2016-11-08 20:06   ` Thomas Gleixner
2016-11-09 13:21     ` Borislav Petkov
2016-11-09 13:34       ` Thomas Gleixner
2016-11-10 23:38         ` Kyle Huey
2016-11-10 23:26       ` Kyle Huey
2016-11-09 13:12   ` Borislav Petkov
2017-03-14 19:01   ` H. Peter Anvin
2017-03-14 19:08     ` Kyle Huey
2017-03-14 20:06       ` H. Peter Anvin
2017-03-14 19:17     ` H. Peter Anvin
2017-03-14 19:23     ` Andy Lutomirski
2017-03-15  9:11       ` H. Peter Anvin
2016-11-08 18:39 ` [PATCH v10 7/7] KVM: x86: virtualize cpuid faulting Kyle Huey
2016-11-08 22:12   ` David Matlack

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).