linux-kernel.vger.kernel.org archive mirror
* [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention
@ 2017-05-05 18:16 Ricardo Neri
  2017-05-05 18:16 ` [PATCH v7 01/26] ptrace,x86: Make user_64bit_mode() available to 32-bit builds Ricardo Neri
                   ` (26 more replies)
  0 siblings, 27 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:16 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri

This is v7 of this series. The six previous submissions can be found
here [1], here [2], here[3], here[4], here[5] and here[6]. This version
addresses the comments received in v6 plus improvements of the handling
of exceptions unrelated to UMIP as well as corner cases in virtual-8086
mode. Please see details in the change log.

=== What is UMIP?

User-Mode Instruction Prevention (UMIP) is a security feature present in
new Intel Processors. If enabled, it prevents the execution of certain
instructions if the Current Privilege Level (CPL) is greater than 0. If
these instructions were executed while in CPL > 0, user space applications
could gain access to system-wide settings such as the global, local and
interrupt descriptor tables, the segment selector of the current task
state segment and the machine status word. Hiding these system resources
reduces the tools available to craft privilege escalation attacks such
as [7].

These are the instructions covered by UMIP:
* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register

When UMIP is enabled, executing any of these instructions with CPL > 0
causes a general protection exception.

=== How does it impact applications?

We want to have UMIP enabled by default. However, UMIP will change the
behavior that certain applications expect from the operating system.
For instance, programs running on WineHQ and DOSEMU2 rely on some of these
instructions to function. Stas Sergeev found that Microsoft Windows 3.1
and dos4gw use the instruction SMSW when running in virtual-8086 mode[8].
SGDT and SIDT can also be executed in virtual-8086 mode.

In order not to change the behavior of the system, this patchset emulates
SGDT, SIDT and SMSW. This should be sufficient to avoid breaking the
applications mentioned above. Regarding the two remaining instructions, STR
and SLDT, the WineHQ team has shown interest in catching the general
protection fault and using it as a vehicle to fix broken
applications[9]. Furthermore,
STR and SLDT can only run in protected and long modes.

DOSEMU2 emulates virtual-8086 mode via KVM. No applications will be broken
unless DOSEMU2 decides to enable the CR4.UMIP bit in platforms that support
it. Also, this should not pose a security risk, as no system resources
would be revealed. Instead, code running inside the KVM guest would only
see the guest's GDT, IDT and MSW.

Please note that UMIP is always enabled for both 64-bit and 32-bit Linux
builds. However, emulation of the UMIP-protected instructions is not done
for 64-bit processes. 64-bit user space applications will receive a
SIGSEGV signal when a UMIP-protected instruction causes a general
protection fault.

=== How are UMIP-protected instructions emulated?

This version keeps UMIP enabled at all times and by default. If a general
protection fault caused by the instructions protected by UMIP is
detected, the fault is fixed up by returning dummy values as follows:
 
 * SGDT and SIDT return hard-coded dummy values as the base of the global
   descriptor and interrupt descriptor tables. These hard-coded values
   correspond to memory addresses that are near the end of the kernel
   memory map. This is also the case for virtual-8086 mode tasks. In all
   my experiments in x86_32, the base of GDT and IDT was always a 4-byte
   address, even for 16-bit operands. Thus, my emulation code does the
   same. In all cases, the limit of the table is set to 0.
 * SMSW returns the value with which the CR0 register is programmed in
   head_32/64.S at boot time. That is, the following bits are set:
   CR0.PE (bit 0, Protection Enable), CR0.MP (bit 1, Monitor Coprocessor),
   CR0.ET (bit 4, Extension Type, which is always 1 in recent processors
   with UMIP), CR0.NE (bit 5, Numeric Error), CR0.WP (bit 16, Write
   Protect) and CR0.AM (bit 18, Alignment Mask). As per the Intel 64 and
   IA-32 Architectures Software Developer's Manual, SMSW returns a 16-bit
   result for memory operands. However, when the operand is a register,
   the result can be up to CR0[63:0]. Since the emulation code only kicks
   in on x86_32, we return up to CR0[31:0].
 * The proposed emulation code handles faults that happen in both
   protected and virtual-8086 mode.
 * Again, STR and SLDT are not emulated.

=== How is this series laid out?

++ Preparatory work
As per suggestions from Andy Lutomirski and Borislav Petkov, I moved
the x86 page fault error codes to a header. Also, I made user_64bit_mode
available to x86_32 builds. This helps to reuse code and reduce the number
of #ifdef's in these patches.

++ Fix bugs in MPX address evaluator
The code that Intel MPX (Memory Protection Extensions) uses to parse
opcodes and the memory locations contained in the general purpose
registers when used as operands proved very useful. I put some of this
code in
a separate library file that both MPX and UMIP can access and avoid code
duplication. Before creating the new library, I fixed a couple of bugs
that I found in how MPX determines the address contained in the
instruction and operands.

++ Provide a new x86 instruction evaluating library
With bugs fixed, the MPX evaluating code is relocated into a new insn-eval.c
library. The basic functionality of this library is extended to obtain the
segment descriptor selected by either segment override prefixes or the
default segment by the involved registers in the calculation of the
effective address. It was also extended to obtain the default address and
operand sizes as well as the segment base address. Support to process
16-bit address encodings was also added. Armed with this arsenal, it is now
possible to determine the linear address onto which the emulated results
shall be copied.

This code supports normal 32-bit and 64-bit protected mode (i.e.,
__USER32_CS and/or __USER_CS), virtual-8086 mode, and 16-bit protected
mode with a 32-bit base address.

++ Emulate UMIP instructions
A new fixup_umip_exception() function inspects the instruction at the
instruction pointer. If it is a UMIP-protected instruction, it executes
the emulation code. This uses all the address-computing code of the
previous section.

++ Add self-tests
Lastly, self-tests are added to entry_from_vm86.c to exercise the most
typical use cases of UMIP-protected instructions in virtual-8086 mode.

++ Extensive tests
Extensive tests were performed to test all the combinations of ModRM,
SIB and displacements for 16-bit and 32-bit encodings for the ss, ds,
es, fs and gs segments. Tests also include a 64-bit program that uses
segmentation via fs and gs. For this purpose, I temporarily enabled UMIP
support for 64-bit processes. This change is not part of this patchset.
The intention is to test the computations of linear addresses in 64-bit
mode, including the extra R8-R15 registers. Extensive tests were also
implemented for virtual-8086 tasks. The code for these tests can be found
here
[10] and here [11].

++ Merging this series?
Am I any closer to seeing these patches merged? :)
 
[1]. https://lwn.net/Articles/705877/
[2]. https://lkml.org/lkml/2016/12/23/265
[3]. https://lkml.org/lkml/2017/1/25/622
[4]. https://lkml.org/lkml/2017/2/23/40
[5]. https://lkml.org/lkml/2017/3/3/678
[6]. https://lkml.org/lkml/2017/3/7/866
[7]. http://timetobleed.com/a-closer-look-at-a-recent-privilege-escalation-bug-in-linux-cve-2013-2094/
[8]. https://www.winehq.org/pipermail/wine-devel/2017-April/117159.html
[10]. https://github.com/01org/luv-yocto/tree/rneri/umip/meta-luv/recipes-core/umip/files
[11]. https://github.com/01org/luv-yocto/commit/a72a7fe7d68693c0f4100ad86de6ecabde57334f#diff-3860c136a63add269bce4ea50222c248R1

Thanks and BR,
Ricardo

Changes since V6:
*Reworded and added more details on the special cases of ModRM and SIB
 bytes. To avoid confusion, I omitted mentioning the involved registers
 (EBP and ESP).
*Replaced BUG() with printk_ratelimited in function get_reg_offset of
 insn-eval.c
*Removed unused utility functions that obtain a register value from pt_regs
 given a SIB base and index.
*Clarified nomenclature to call CS, DS, ES, FS, GS and SS segment registers
 and their values segment selectors.
*Reworked function resolve_seg_register to issue an error when more than
 one segment override prefix is used in the instruction.
*Added logic in resolve_seg_register to ignore segment register when in
 long mode and not using FS or GS.
*Added logic to ensure the effective address is within the limits of the
 segment in protected mode.
*Added logic to ensure segment override prefixes are ignored when resolving
 the segment of EIP and EDI with string instructions.
*Added code to make user_64bit_mode() available in CONFIG_X86_32... and
 make it return false, of course.
*Merged the two functions that obtain the default address and operand size
 of a code segment into one as they are always used together.
*Corrected logic of displacement-only addressing in long mode to make the
 displacement relative to the RIP of the next instruction.
*Reworked logic to sign-extend 32-bit memory offsets into 64-bit signed
 memory offsets. This includes more checks and putting it all together in
 a utility function.
*Removed the 'unlikely' of conditional statements as we are not in a
 critical path.
*In virtual-8086 mode, ensure that effective addresses are always less
 than 0x10000, even when address override prefixes are used. Also, ensure
 that linear addresses have a size of 20 bits.

Changes since V5:
* Relocate the page fault error code enumerations to traps.h

Changes since V4:
* Audited patches to use braces in all the branches of conditional
  statements, except those in which the conditional action only takes one
  line.
* Implemented support in 64-bit builds for both 32-bit and 64-bit tasks in
  the
  instruction evaluating library.
* Split segment selector function in the instruction evaluating library
  into two functions to resolve the segment type by instruction override
  or default and a separate function to actually read the segment selector.
* Fixed a bug when evaluating 32-bit effective addresses with 64-bit
  kernels.
* Split patches further for easier review.
* Use signed variables for computation of effective address.
* Fixed issue with a spurious static modifier in function insn_get_addr_ref
  found by kbuild test bot.
* Removed comparison between true and fixup_umip_exception.
* Reworked check logic when identifying erroneous vs invalid values of the
  SIB base and index.

Changes since V3:
* Limited emulation to 32-bit and 16-bit modes. For 64-bit mode, a general
  protection fault is still issued when UMIP-protected instructions are
  executed with CPL > 0.
* Expanded instruction-evaluating code to obtain segment descriptor along
  with their attributes such as base address and default address and
  operand sizes. Also, support for 16-bit encodings in protected mode was
  implemented.
* When getting a segment descriptor, support was added to also obtain
  descriptors from a local descriptor table.
* Now the instruction-evaluating code returns -EDOM when the value of
  registers should not be used in calculating the effective address. The
  value -EINVAL is left for errors.
* Incorporate the value of the segment base address in the computation of
  linear addresses.
* Renamed new instruction evaluation library from insn-kernel.c to
  insn-eval.c
* Exported functions insn_get_reg_offset_* to obtain the register offset
  by ModRM r/m, SIB base and SIB index.
* Improved documentation of functions.
* Split patches further for easier review.

Changes since V2:
* Added new utility functions to decode the memory addresses contained in
  registers when the 16-bit addressing encodings are used. This includes
  code to obtain and compute memory addresses using segment selectors for
  real-mode address translation.
* Added support to emulate UMIP-protected instructions for virtual-8086
  tasks.
* Added self-tests for virtual-8086 mode that contains representative
  use cases: address represented as a displacement, address in registers
  and registers as operands.
* Instead of maintaining a static variable for the dummy base addresses
  of the IDT and GDT, a hard-coded value is used.
* The emulated SMSW instructions now return the value with which the CR0
  register is programmed in head_32/64.S This is: PE | MP | ET | NE | WP
  | AM. For x86_64, PG is also enabled.
* The new file arch/x86/lib/insn-utils.c is now renamed as arch/x86/lib/
  insn-kernel.c. It also has its own header. This helps keep the kernel
  and objtool instruction decoders in sync. Also, the new insn-kernel.c
  contains utility functions that are only relevant in a kernel context.
* Removed printed warnings for errors that occur when decoding instructions
  with invalid operands.
* Added more comments on fixes in the instruction-decoding MPX functions.
* Now user_64bit_mode(regs) is used instead of test_thread_flag(TIF_IA32)
  to determine if the task is 32-bit or 64-bit.
* Found and fixed a bug in insn-decoder in which X86_MODRM_RM was
  incorrectly used to obtain the mod part of the ModRM byte.
* Added more explanatory code in emulation and instruction decoding code.
  This includes a comment regarding that copy_from_user could fail if there
  exists a memory protection key in place.
* Tested code with CONFIG_X86_DECODER_SELFTEST=y and everything passes now.
* Prefixed get_reg_offset_rm with insn_ as this function is exposed
  via a header file. For clarity, this function was added in a separate
  patch.

Changes since V1:
* Virtual-8086 mode tasks are not treated in a special manner. All code
  for this purpose was removed.
* Instead of attempting to disable UMIP during a context switch or when
  entering virtual-8086 mode, UMIP remains enabled all the time. General
  protection faults that occur are fixed-up by returning dummy values as
  detailed above.
* Removed umip= kernel parameter in favor of using clearcpuid=514 to
  disable UMIP.
* Removed selftests designed to detect the absence of SIGSEGV signals when
  running in virtual-8086 mode.
* Reused code from MPX to decode instructions operands. For this purpose
  code was put in a common location.
* Fixed two bugs in MPX code that decodes operands.

Ricardo Neri (26):
  ptrace,x86: Make user_64bit_mode() available to 32-bit builds
  x86/mm: Relocate page fault error codes to traps.h
  x86/mpx: Use signed variables to compute effective addresses
  x86/mpx: Do not use SIB.index if its value is 100b and ModRM.mod is
    not 11b
  x86/mpx: Do not use SIB.base if its value is 101b and ModRM.mod = 0
  x86/mpx, x86/insn: Relocate insn util functions to a new insn-eval
    file
  x86/insn-eval: Do not BUG on invalid register type
  x86/insn-eval: Add a utility function to get register offsets
  x86/insn-eval: Add utility function to identify string instructions
  x86/insn-eval: Add utility functions to get segment selector
  x86/insn-eval: Add utility function to get segment descriptor
  x86/insn-eval: Add utility functions to get segment descriptor base
    address and limit
  x86/insn-eval: Add function to get default params of code segment
  x86/insn-eval: Indicate a 32-bit displacement if ModRM.mod is 0 and
    ModRM.rm is 5
  x86/insn-eval: Incorporate segment base and limit in linear address
    computation
  x86/insn-eval: Support both signed 32-bit and 64-bit effective
    addresses
  x86/insn-eval: Handle 32-bit address encodings in virtual-8086 mode
  x86/insn-eval: Add support to resolve 16-bit addressing encodings
  x86/insn-eval: Add wrapper function for 16-bit and 32-bit address
    encodings
  x86/cpufeature: Add User-Mode Instruction Prevention definitions
  x86: Add emulation code for UMIP instructions
  x86/umip: Force a page fault when unable to copy emulated result to
    user
  x86/traps: Fixup general protection faults caused by UMIP
  x86: Enable User-Mode Instruction Prevention
  selftests/x86: Add tests for User-Mode Instruction Prevention
  selftests/x86: Add tests for instruction str and sldt

 arch/x86/Kconfig                              |   10 +
 arch/x86/include/asm/cpufeatures.h            |    1 +
 arch/x86/include/asm/disabled-features.h      |    8 +-
 arch/x86/include/asm/insn-eval.h              |   25 +
 arch/x86/include/asm/ptrace.h                 |    6 +-
 arch/x86/include/asm/traps.h                  |   18 +
 arch/x86/include/asm/umip.h                   |   15 +
 arch/x86/include/uapi/asm/processor-flags.h   |    2 +
 arch/x86/kernel/Makefile                      |    1 +
 arch/x86/kernel/cpu/common.c                  |   16 +-
 arch/x86/kernel/traps.c                       |    4 +
 arch/x86/kernel/umip.c                        |  286 +++++++
 arch/x86/lib/Makefile                         |    2 +-
 arch/x86/lib/insn-eval.c                      | 1066 +++++++++++++++++++++++++
 arch/x86/mm/fault.c                           |   88 +-
 arch/x86/mm/mpx.c                             |  120 +--
 tools/testing/selftests/x86/entry_from_vm86.c |   89 ++-
 17 files changed, 1580 insertions(+), 177 deletions(-)
 create mode 100644 arch/x86/include/asm/insn-eval.h
 create mode 100644 arch/x86/include/asm/umip.h
 create mode 100644 arch/x86/kernel/umip.c
 create mode 100644 arch/x86/lib/insn-eval.c

-- 
2.9.3


* [PATCH v7 01/26] ptrace,x86: Make user_64bit_mode() available to 32-bit builds
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
@ 2017-05-05 18:16 ` Ricardo Neri
  2017-05-21 14:19   ` Borislav Petkov
  2017-05-05 18:17 ` [PATCH v7 02/26] x86/mm: Relocate page fault error codes to traps.h Ricardo Neri
                   ` (25 subsequent siblings)
  26 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:16 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

In its current form, user_64bit_mode() can only be used when CONFIG_X86_64
is selected. This implies that code built with CONFIG_X86_64=n cannot use
it. If a piece of code needs to be built for both CONFIG_X86_64=y and
CONFIG_X86_64=n and wants to use this function, it needs to wrap it in
an #ifdef/#endif; potentially, in multiple places.

This can be easily avoided with a single #ifdef/#endif pair within
user_64bit_mode() itself.

Suggested-by: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/ptrace.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 2b5d686..ea78a84 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -115,9 +115,9 @@ static inline int v8086_mode(struct pt_regs *regs)
 #endif
 }
 
-#ifdef CONFIG_X86_64
 static inline bool user_64bit_mode(struct pt_regs *regs)
 {
+#ifdef CONFIG_X86_64
 #ifndef CONFIG_PARAVIRT
 	/*
 	 * On non-paravirt systems, this is the only long mode CPL 3
@@ -128,8 +128,12 @@ static inline bool user_64bit_mode(struct pt_regs *regs)
 	/* Headers are too twisted for this to go in paravirt.h. */
 	return regs->cs == __USER_CS || regs->cs == pv_info.extra_user_64bit_cs;
 #endif
+#else /* !CONFIG_X86_64 */
+	return false;
+#endif
 }
 
+#ifdef CONFIG_X86_64
 #define current_user_stack_pointer()	current_pt_regs()->sp
 #define compat_user_stack_pointer()	current_pt_regs()->sp
 #endif
-- 
2.9.3


* [PATCH v7 02/26] x86/mm: Relocate page fault error codes to traps.h
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
  2017-05-05 18:16 ` [PATCH v7 01/26] ptrace,x86: Make user_64bit_mode() available to 32-bit builds Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-05-21 14:23   ` Borislav Petkov
  2017-05-05 18:17 ` [PATCH v7 03/26] x86/mpx: Use signed variables to compute effective addresses Ricardo Neri
                   ` (24 subsequent siblings)
  26 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Kirill A. Shutemov, Josh Poimboeuf

Up to this point, only fault.c used the definitions of the page fault error
codes. Thus, it made sense to keep them within that file. Other portions
of the code might be interested in those definitions too. For instance,
the User-
Mode Instruction Prevention emulation code will use such definitions to
emulate a page fault when it is unable to successfully copy the results
of the emulated instructions to user space.

While relocating the error code enumeration, the prefix X86_ is used to
make it consistent with the rest of the definitions in traps.h. Of course,
code using the enumeration had to be updated as well. No functional changes
were performed.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: x86@kernel.org
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/traps.h | 18 +++++++++
 arch/x86/mm/fault.c          | 88 +++++++++++++++++---------------------------
 2 files changed, 52 insertions(+), 54 deletions(-)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 01fd0a7..4a2e585 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -148,4 +148,22 @@ enum {
 	X86_TRAP_IRET = 32,	/* 32, IRET Exception */
 };
 
+/*
+ * Page fault error code bits:
+ *
+ *   bit 0 ==	 0: no page found	1: protection fault
+ *   bit 1 ==	 0: read access		1: write access
+ *   bit 2 ==	 0: kernel-mode access	1: user-mode access
+ *   bit 3 ==				1: use of reserved bit detected
+ *   bit 4 ==				1: fault was an instruction fetch
+ *   bit 5 ==				1: protection keys block access
+ */
+enum x86_pf_error_code {
+	X86_PF_PROT	=		1 << 0,
+	X86_PF_WRITE	=		1 << 1,
+	X86_PF_USER	=		1 << 2,
+	X86_PF_RSVD	=		1 << 3,
+	X86_PF_INSTR	=		1 << 4,
+	X86_PF_PK	=		1 << 5,
+};
 #endif /* _ASM_X86_TRAPS_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 8ad91a0..32f3070 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -29,26 +29,6 @@
 #include <asm/trace/exceptions.h>
 
 /*
- * Page fault error code bits:
- *
- *   bit 0 ==	 0: no page found	1: protection fault
- *   bit 1 ==	 0: read access		1: write access
- *   bit 2 ==	 0: kernel-mode access	1: user-mode access
- *   bit 3 ==				1: use of reserved bit detected
- *   bit 4 ==				1: fault was an instruction fetch
- *   bit 5 ==				1: protection keys block access
- */
-enum x86_pf_error_code {
-
-	PF_PROT		=		1 << 0,
-	PF_WRITE	=		1 << 1,
-	PF_USER		=		1 << 2,
-	PF_RSVD		=		1 << 3,
-	PF_INSTR	=		1 << 4,
-	PF_PK		=		1 << 5,
-};
-
-/*
  * Returns 0 if mmiotrace is disabled, or if the fault is not
  * handled by mmiotrace:
  */
@@ -149,7 +129,7 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr)
 	 * If it was a exec (instruction fetch) fault on NX page, then
 	 * do not ignore the fault:
 	 */
-	if (error_code & PF_INSTR)
+	if (error_code & X86_PF_INSTR)
 		return 0;
 
 	instr = (void *)convert_ip_to_linear(current, regs);
@@ -179,7 +159,7 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr)
  * siginfo so userspace can discover which protection key was set
  * on the PTE.
  *
- * If we get here, we know that the hardware signaled a PF_PK
+ * If we get here, we know that the hardware signaled a X86_PF_PK
  * fault and that there was a VMA once we got in the fault
  * handler.  It does *not* guarantee that the VMA we find here
  * was the one that we faulted on.
@@ -205,7 +185,7 @@ static void fill_sig_info_pkey(int si_code, siginfo_t *info,
 	/*
 	 * force_sig_info_fault() is called from a number of
 	 * contexts, some of which have a VMA and some of which
-	 * do not.  The PF_PK handing happens after we have a
+	 * do not.  The X86_PF_PK handing happens after we have a
 	 * valid VMA, so we should never reach this without a
 	 * valid VMA.
 	 */
@@ -695,7 +675,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code,
 	if (!oops_may_print())
 		return;
 
-	if (error_code & PF_INSTR) {
+	if (error_code & X86_PF_INSTR) {
 		unsigned int level;
 		pgd_t *pgd;
 		pte_t *pte;
@@ -779,7 +759,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 		 */
 		if (current->thread.sig_on_uaccess_err && signal) {
 			tsk->thread.trap_nr = X86_TRAP_PF;
-			tsk->thread.error_code = error_code | PF_USER;
+			tsk->thread.error_code = error_code | X86_PF_USER;
 			tsk->thread.cr2 = address;
 
 			/* XXX: hwpoison faults will set the wrong code. */
@@ -899,7 +879,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 	struct task_struct *tsk = current;
 
 	/* User mode accesses just cause a SIGSEGV */
-	if (error_code & PF_USER) {
+	if (error_code & X86_PF_USER) {
 		/*
 		 * It's possible to have interrupts off here:
 		 */
@@ -920,7 +900,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 		 * Instruction fetch faults in the vsyscall page might need
 		 * emulation.
 		 */
-		if (unlikely((error_code & PF_INSTR) &&
+		if (unlikely((error_code & X86_PF_INSTR) &&
 			     ((address & ~0xfff) == VSYSCALL_ADDR))) {
 			if (emulate_vsyscall(regs, address))
 				return;
@@ -933,7 +913,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 		 * are always protection faults.
 		 */
 		if (address >= TASK_SIZE_MAX)
-			error_code |= PF_PROT;
+			error_code |= X86_PF_PROT;
 
 		if (likely(show_unhandled_signals))
 			show_signal_msg(regs, error_code, address, tsk);
@@ -989,11 +969,11 @@ static inline bool bad_area_access_from_pkeys(unsigned long error_code,
 
 	if (!boot_cpu_has(X86_FEATURE_OSPKE))
 		return false;
-	if (error_code & PF_PK)
+	if (error_code & X86_PF_PK)
 		return true;
 	/* this checks permission keys on the VMA: */
-	if (!arch_vma_access_permitted(vma, (error_code & PF_WRITE),
-				(error_code & PF_INSTR), foreign))
+	if (!arch_vma_access_permitted(vma, (error_code & X86_PF_WRITE),
+				       (error_code & X86_PF_INSTR), foreign))
 		return true;
 	return false;
 }
@@ -1021,7 +1001,7 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
 	int code = BUS_ADRERR;
 
 	/* Kernel mode? Handle exceptions or die: */
-	if (!(error_code & PF_USER)) {
+	if (!(error_code & X86_PF_USER)) {
 		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
 		return;
 	}
@@ -1050,14 +1030,14 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 	       unsigned long address, struct vm_area_struct *vma,
 	       unsigned int fault)
 {
-	if (fatal_signal_pending(current) && !(error_code & PF_USER)) {
+	if (fatal_signal_pending(current) && !(error_code & X86_PF_USER)) {
 		no_context(regs, error_code, address, 0, 0);
 		return;
 	}
 
 	if (fault & VM_FAULT_OOM) {
 		/* Kernel mode? Handle exceptions or die: */
-		if (!(error_code & PF_USER)) {
+		if (!(error_code & X86_PF_USER)) {
 			no_context(regs, error_code, address,
 				   SIGSEGV, SEGV_MAPERR);
 			return;
@@ -1082,16 +1062,16 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 
 static int spurious_fault_check(unsigned long error_code, pte_t *pte)
 {
-	if ((error_code & PF_WRITE) && !pte_write(*pte))
+	if ((error_code & X86_PF_WRITE) && !pte_write(*pte))
 		return 0;
 
-	if ((error_code & PF_INSTR) && !pte_exec(*pte))
+	if ((error_code & X86_PF_INSTR) && !pte_exec(*pte))
 		return 0;
 	/*
 	 * Note: We do not do lazy flushing on protection key
-	 * changes, so no spurious fault will ever set PF_PK.
+	 * changes, so no spurious fault will ever set X86_PF_PK.
 	 */
-	if ((error_code & PF_PK))
+	if ((error_code & X86_PF_PK))
 		return 1;
 
 	return 1;
@@ -1137,8 +1117,8 @@ spurious_fault(unsigned long error_code, unsigned long address)
 	 * change, so user accesses are not expected to cause spurious
 	 * faults.
 	 */
-	if (error_code != (PF_WRITE | PF_PROT)
-	    && error_code != (PF_INSTR | PF_PROT))
+	if (error_code != (X86_PF_WRITE | X86_PF_PROT) &&
+	    error_code != (X86_PF_INSTR | X86_PF_PROT))
 		return 0;
 
 	pgd = init_mm.pgd + pgd_index(address);
@@ -1198,19 +1178,19 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
 	 * always an unconditional error and can never result in
 	 * a follow-up action to resolve the fault, like a COW.
 	 */
-	if (error_code & PF_PK)
+	if (error_code & X86_PF_PK)
 		return 1;
 
 	/*
 	 * Make sure to check the VMA so that we do not perform
-	 * faults just to hit a PF_PK as soon as we fill in a
+	 * faults just to hit a X86_PF_PK as soon as we fill in a
 	 * page.
 	 */
-	if (!arch_vma_access_permitted(vma, (error_code & PF_WRITE),
-				(error_code & PF_INSTR), foreign))
+	if (!arch_vma_access_permitted(vma, (error_code & X86_PF_WRITE),
+				       (error_code & X86_PF_INSTR), foreign))
 		return 1;
 
-	if (error_code & PF_WRITE) {
+	if (error_code & X86_PF_WRITE) {
 		/* write, present and write, not present: */
 		if (unlikely(!(vma->vm_flags & VM_WRITE)))
 			return 1;
@@ -1218,7 +1198,7 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
 	}
 
 	/* read, present: */
-	if (unlikely(error_code & PF_PROT))
+	if (unlikely(error_code & X86_PF_PROT))
 		return 1;
 
 	/* read, not present: */
@@ -1241,7 +1221,7 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs)
 	if (!static_cpu_has(X86_FEATURE_SMAP))
 		return false;
 
-	if (error_code & PF_USER)
+	if (error_code & X86_PF_USER)
 		return false;
 
 	if (!user_mode(regs) && (regs->flags & X86_EFLAGS_AC))
@@ -1297,7 +1277,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	 * protection error (error_code & 9) == 0.
 	 */
 	if (unlikely(fault_in_kernel_space(address))) {
-		if (!(error_code & (PF_RSVD | PF_USER | PF_PROT))) {
+		if (!(error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) {
 			if (vmalloc_fault(address) >= 0)
 				return;
 
@@ -1325,7 +1305,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	if (unlikely(kprobes_fault(regs)))
 		return;
 
-	if (unlikely(error_code & PF_RSVD))
+	if (unlikely(error_code & X86_PF_RSVD))
 		pgtable_bad(regs, error_code, address);
 
 	if (unlikely(smap_violation(error_code, regs))) {
@@ -1351,7 +1331,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	 */
 	if (user_mode(regs)) {
 		local_irq_enable();
-		error_code |= PF_USER;
+		error_code |= X86_PF_USER;
 		flags |= FAULT_FLAG_USER;
 	} else {
 		if (regs->flags & X86_EFLAGS_IF)
@@ -1360,9 +1340,9 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 
-	if (error_code & PF_WRITE)
+	if (error_code & X86_PF_WRITE)
 		flags |= FAULT_FLAG_WRITE;
-	if (error_code & PF_INSTR)
+	if (error_code & X86_PF_INSTR)
 		flags |= FAULT_FLAG_INSTRUCTION;
 
 	/*
@@ -1382,7 +1362,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	 * space check, thus avoiding the deadlock:
 	 */
 	if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
-		if ((error_code & PF_USER) == 0 &&
+		if ((error_code & X86_PF_USER) == 0 &&
 		    !search_exception_tables(regs->ip)) {
 			bad_area_nosemaphore(regs, error_code, address, NULL);
 			return;
@@ -1409,7 +1389,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 		bad_area(regs, error_code, address);
 		return;
 	}
-	if (error_code & PF_USER) {
+	if (error_code & X86_PF_USER) {
 		/*
 		 * Accessing the stack below %sp is always a bug.
 		 * The large cushion allows instructions like enter
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 03/26] x86/mpx: Use signed variables to compute effective addresses
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
  2017-05-05 18:16 ` [PATCH v7 01/26] ptrace,x86: Make user_64bit_mode() available to 32-bit builds Ricardo Neri
  2017-05-05 18:17 ` [PATCH v7 02/26] x86/mm: Relocate page fault error codes to traps.h Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-05-05 18:17 ` [PATCH v7 04/26] x86/mpx: Do not use SIB.index if its value is 100b and ModRM.mod is not 11b Ricardo Neri
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Nathan Howard, Adan Hawthorn, Joe Perches

Even though memory addresses are unsigned, the operands used to compute the
effective address do have a sign. This is true for ModRM.rm, SIB.base and
SIB.index, as well as for the displacement bytes. Thus, signed variables
must be used when computing the effective address from these operands. Once
the signed effective address has been computed, it is cast to an unsigned
long to determine the linear address.

Variables are renamed to better reflect the type of address being
computed.

Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nathan Howard <liverlint@gmail.com>
Cc: Adan Hawthorn <adanhawthorn@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/mm/mpx.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 1c34b76..ebdead8 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -138,7 +138,8 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
  */
 static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 {
-	unsigned long addr, base, indx;
+	unsigned long linear_addr;
+	long eff_addr, base, indx;
 	int addr_offset, base_offset, indx_offset;
 	insn_byte_t sib;
 
@@ -150,7 +151,7 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 		addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
 		if (addr_offset < 0)
 			goto out_err;
-		addr = regs_get_register(regs, addr_offset);
+		eff_addr = regs_get_register(regs, addr_offset);
 	} else {
 		if (insn->sib.nbytes) {
 			base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
@@ -163,16 +164,18 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 
 			base = regs_get_register(regs, base_offset);
 			indx = regs_get_register(regs, indx_offset);
-			addr = base + indx * (1 << X86_SIB_SCALE(sib));
+			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
 		} else {
 			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
 			if (addr_offset < 0)
 				goto out_err;
-			addr = regs_get_register(regs, addr_offset);
+			eff_addr = regs_get_register(regs, addr_offset);
 		}
-		addr += insn->displacement.value;
+		eff_addr += insn->displacement.value;
 	}
-	return (void __user *)addr;
+	linear_addr = (unsigned long)eff_addr;
+
+	return (void __user *)linear_addr;
 out_err:
 	return (void __user *)-1;
 }
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 04/26] x86/mpx: Do not use SIB.index if its value is 100b and ModRM.mod is not 11b
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (2 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 03/26] x86/mpx: Use signed variables to compute effective addresses Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-05-24 13:37   ` Borislav Petkov
  2017-05-05 18:17 ` [PATCH v7 05/26] x86/mpx: Do not use SIB.base if its value is 101b and ModRM.mod = 0 Ricardo Neri
                   ` (22 subsequent siblings)
  26 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Nathan Howard, Adan Hawthorn, Joe Perches

Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
Developer's Manual volume 2A states that when ModRM.mod != 11b and
ModRM.rm = 100b, indexed register-indirect addressing is used. In other
words, a SIB byte follows the ModRM byte. In the specific case of
SIB.index = 100b, the scale*index portion of the computation of the
effective address is null. To signal callers of this particular situation,
get_reg_offset() can return -EDOM (-EINVAL continues to indicate an
error when decoding the SIB byte).

An example of this situation can be the following instruction:

   8b 4c 23 80       mov -0x80(%rbx,%riz,1),%rcx
   ModRM:            0x4c [mod:1b][reg:1b][rm:100b]
   SIB:              0x23 [scale:0b][index:100b][base:11b]
   Displacement:     0x80  (1-byte, as per ModRM.mod = 1b)

The %riz 'register' indicates a null index.

In long mode, a REX prefix may be used. When a REX prefix is present,
REX.X adds a fourth bit to the register selection of SIB.index. This gives
the ability to refer to all the 16 general purpose registers. When REX.X is
1b and SIB.index is 100b, the index is indicated in %r12. In our example,
this would look like:

   42 8b 4c 23 80    mov -0x80(%rbx,%r12,1),%rcx
   REX:              0x42 [W:0b][R:0b][X:1b][B:0b]
   ModRM:            0x4c [mod:1b][reg:1b][rm:100b]
   SIB:              0x23 [scale:0b][.X: 1b, index:100b][.B:0b, base:11b]
   Displacement:     0x80  (1-byte, as per ModRM.mod = 1b)

Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nathan Howard <liverlint@gmail.com>
Cc: Adan Hawthorn <adanhawthorn@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/mm/mpx.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index ebdead8..7397b81 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -110,6 +110,14 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 		regno = X86_SIB_INDEX(insn->sib.value);
 		if (X86_REX_X(insn->rex_prefix.value))
 			regno += 8;
+		/*
+		 * If ModRM.mod !=3 and SIB.index (regno=4) the scale*index
+		 * portion of the address computation is null. This is
+		 * true only if REX.X is 0. In such a case, the SIB index
+		 * is used in the address computation.
+		 */
+		if (X86_MODRM_MOD(insn->modrm.value) != 3 && regno == 4)
+			return -EDOM;
 		break;
 
 	case REG_TYPE_BASE:
@@ -159,11 +167,19 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 				goto out_err;
 
 			indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
-			if (indx_offset < 0)
+			/*
+			 * A negative offset generally means a error, except
+			 * -EDOM, which means that the contents of the register
+			 * should not be used as index.
+			 */
+			if (indx_offset == -EDOM)
+				indx = 0;
+			else if (indx_offset < 0)
 				goto out_err;
+			else
+				indx = regs_get_register(regs, indx_offset);
 
 			base = regs_get_register(regs, base_offset);
-			indx = regs_get_register(regs, indx_offset);
 			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
 		} else {
 			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 05/26] x86/mpx: Do not use SIB.base if its value is 101b and ModRM.mod = 0
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (3 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 04/26] x86/mpx: Do not use SIB.index if its value is 100b and ModRM.mod is not 11b Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-05-29 13:07   ` Borislav Petkov
  2017-05-05 18:17 ` [PATCH v7 06/26] x86/mpx, x86/insn: Relocate insn util functions to a new insn-eval file Ricardo Neri
                   ` (21 subsequent siblings)
  26 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Nathan Howard, Adan Hawthorn, Joe Perches

Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
Developer's Manual volume 2A states that when a SIB byte is used and
SIB.base is 101b and the mod part of the ModRM byte is zero, the base
portion of the effective address computation is null. In this case, a
32-bit displacement follows the SIB byte. The displacement is obtained
when the instruction decoder parses the operands.

To signal this scenario, a -EDOM error is returned to indicate to callers
that they should ignore the base.

Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nathan Howard <liverlint@gmail.com>
Cc: Adan Hawthorn <adanhawthorn@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/mm/mpx.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 7397b81..30aef92 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -122,6 +122,15 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 
 	case REG_TYPE_BASE:
 		regno = X86_SIB_BASE(insn->sib.value);
+		/*
+		 * If ModRM.mod is 0 and SIB.base == 5, the base of the
+		 * register-indirect addressing is 0. In this case, a
+		 * 32-bit displacement is expected in this case; the
+		 * instruction decoder finds such displacement for us.
+		 */
+		if (!X86_MODRM_MOD(insn->modrm.value) && regno == 5)
+			return -EDOM;
+
 		if (X86_REX_B(insn->rex_prefix.value))
 			regno += 8;
 		break;
@@ -162,16 +171,21 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 		eff_addr = regs_get_register(regs, addr_offset);
 	} else {
 		if (insn->sib.nbytes) {
+			/*
+			 * Negative values in the base and index offset means
+			 * an error when decoding the SIB byte. Except -EDOM,
+			 * which means that the registers should not be used
+			 * in the address computation.
+			 */
 			base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
-			if (base_offset < 0)
+			if (base_offset == -EDOM)
+				base = 0;
+			else if (base_offset < 0)
 				goto out_err;
+			else
+				base = regs_get_register(regs, base_offset);
 
 			indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
-			/*
-			 * A negative offset generally means a error, except
-			 * -EDOM, which means that the contents of the register
-			 * should not be used as index.
-			 */
 			if (indx_offset == -EDOM)
 				indx = 0;
 			else if (indx_offset < 0)
@@ -179,7 +193,6 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 			else
 				indx = regs_get_register(regs, indx_offset);
 
-			base = regs_get_register(regs, base_offset);
 			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
 		} else {
 			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 06/26] x86/mpx, x86/insn: Relocate insn util functions to a new insn-eval file
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (4 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 05/26] x86/mpx: Do not use SIB.base if its value is 101b and ModRM.mod = 0 Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-05-05 18:17 ` [PATCH v7 07/26] x86/insn-eval: Do not BUG on invalid register type Ricardo Neri
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

Other kernel submodules can benefit from using the utility functions
defined in mpx.c to obtain the addresses and values of operands contained
in the general purpose registers. An instance of this is the emulation code
used for instructions protected by the Intel User-Mode Instruction
Prevention feature.

Thus, these functions are relocated to a new insn-eval.c file. The reason
not to relocate these utilities into insn.c is that the latter solely
analyses instructions given by a struct insn without any knowledge of the
meaning of the values of instruction operands. This new utility,
insn-eval.c, aims to be used to resolve userspace linear addresses based on
the contents of the instruction operands as well as the contents of the
pt_regs structure.

These utilities come with a separate header. This is to avoid taking insn.c
out of sync from the instruction decoders under tools/obj and tools/perf.
This also avoids adding cumbersome #ifdef's for the #include'd files
required to decode instructions in a kernel context.

Functions are simply relocated. There are no functional or indentation
changes. The checkpatch script issues the following warning with this
commit:

WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code
rather than BUG() or BUG_ON()
+               BUG();

This warning will be fixed in a subsequent patch.

Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/insn-eval.h |  16 ++++
 arch/x86/lib/Makefile            |   2 +-
 arch/x86/lib/insn-eval.c         | 159 +++++++++++++++++++++++++++++++++++++++
 arch/x86/mm/mpx.c                | 152 +------------------------------------
 4 files changed, 178 insertions(+), 151 deletions(-)
 create mode 100644 arch/x86/include/asm/insn-eval.h
 create mode 100644 arch/x86/lib/insn-eval.c

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
new file mode 100644
index 0000000..5cab1b1
--- /dev/null
+++ b/arch/x86/include/asm/insn-eval.h
@@ -0,0 +1,16 @@
+#ifndef _ASM_X86_INSN_EVAL_H
+#define _ASM_X86_INSN_EVAL_H
+/*
+ * A collection of utility functions for x86 instruction analysis to be
+ * used in a kernel context. Useful when, for instance, making sense
+ * of the registers indicated by operands.
+ */
+
+#include <linux/compiler.h>
+#include <linux/bug.h>
+#include <linux/err.h>
+#include <asm/ptrace.h>
+
+void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
+
+#endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 34a7413..675d7b0 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -23,7 +23,7 @@ lib-y := delay.o misc.o cmdline.o cpu.o
 lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
 lib-y += memcpy_$(BITS).o
 lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
-lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
+lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o
 lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
 
 obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
new file mode 100644
index 0000000..e746a6f
--- /dev/null
+++ b/arch/x86/lib/insn-eval.c
@@ -0,0 +1,159 @@
+/*
+ * Utility functions for x86 operand and address decoding
+ *
+ * Copyright (C) Intel Corporation 2017
+ */
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <asm/inat.h>
+#include <asm/insn.h>
+#include <asm/insn-eval.h>
+
+enum reg_type {
+	REG_TYPE_RM = 0,
+	REG_TYPE_INDEX,
+	REG_TYPE_BASE,
+};
+
+static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
+			  enum reg_type type)
+{
+	int regno = 0;
+
+	static const int regoff[] = {
+		offsetof(struct pt_regs, ax),
+		offsetof(struct pt_regs, cx),
+		offsetof(struct pt_regs, dx),
+		offsetof(struct pt_regs, bx),
+		offsetof(struct pt_regs, sp),
+		offsetof(struct pt_regs, bp),
+		offsetof(struct pt_regs, si),
+		offsetof(struct pt_regs, di),
+#ifdef CONFIG_X86_64
+		offsetof(struct pt_regs, r8),
+		offsetof(struct pt_regs, r9),
+		offsetof(struct pt_regs, r10),
+		offsetof(struct pt_regs, r11),
+		offsetof(struct pt_regs, r12),
+		offsetof(struct pt_regs, r13),
+		offsetof(struct pt_regs, r14),
+		offsetof(struct pt_regs, r15),
+#endif
+	};
+	int nr_registers = ARRAY_SIZE(regoff);
+	/*
+	 * Don't possibly decode a 32-bit instructions as
+	 * reading a 64-bit-only register.
+	 */
+	if (IS_ENABLED(CONFIG_X86_64) && !insn->x86_64)
+		nr_registers -= 8;
+
+	switch (type) {
+	case REG_TYPE_RM:
+		regno = X86_MODRM_RM(insn->modrm.value);
+		if (X86_REX_B(insn->rex_prefix.value))
+			regno += 8;
+		break;
+
+	case REG_TYPE_INDEX:
+		regno = X86_SIB_INDEX(insn->sib.value);
+		if (X86_REX_X(insn->rex_prefix.value))
+			regno += 8;
+		/*
+		 * If ModRM.mod !=3 and SIB.index (regno=4) the scale*index
+		 * portion of the address computation is null. This is
+		 * true only if REX.X is 0. In such a case, the SIB index
+		 * is used in the address computation.
+		 */
+		if (X86_MODRM_MOD(insn->modrm.value) != 3 && regno == 4)
+			return -EDOM;
+		break;
+
+	case REG_TYPE_BASE:
+		regno = X86_SIB_BASE(insn->sib.value);
+		/*
+		 * If ModRM.mod is 0 and SIB.base == 5, the base of the
+		 * register-indirect addressing is 0. In this case, a
+		 * 32-bit displacement is expected in this case; the
+		 * instruction decoder finds such displacement for us.
+		 */
+		if (!X86_MODRM_MOD(insn->modrm.value) && regno == 5)
+			return -EDOM;
+
+		if (X86_REX_B(insn->rex_prefix.value))
+			regno += 8;
+		break;
+
+	default:
+		pr_err("invalid register type");
+		BUG();
+		break;
+	}
+
+	if (regno >= nr_registers) {
+		WARN_ONCE(1, "decoded an instruction with an invalid register");
+		return -EINVAL;
+	}
+	return regoff[regno];
+}
+
+/*
+ * return the address being referenced be instruction
+ * for rm=3 returning the content of the rm reg
+ * for rm!=3 calculates the address using SIB and Disp
+ */
+void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
+{
+	unsigned long linear_addr;
+	long eff_addr, base, indx;
+	int addr_offset, base_offset, indx_offset;
+	insn_byte_t sib;
+
+	insn_get_modrm(insn);
+	insn_get_sib(insn);
+	sib = insn->sib.value;
+
+	if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+		addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
+		if (addr_offset < 0)
+			goto out_err;
+		eff_addr = regs_get_register(regs, addr_offset);
+	} else {
+		if (insn->sib.nbytes) {
+			/*
+			 * Negative values in the base and index offset means
+			 * an error when decoding the SIB byte. Except -EDOM,
+			 * which means that the registers should not be used
+			 * in the address computation.
+			 */
+			base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
+			if (base_offset == -EDOM)
+				base = 0;
+			else if (base_offset < 0)
+				goto out_err;
+			else
+				base = regs_get_register(regs, base_offset);
+
+			indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
+			if (indx_offset == -EDOM)
+				indx = 0;
+			else if (indx_offset < 0)
+				goto out_err;
+			else
+				indx = regs_get_register(regs, indx_offset);
+
+			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
+		} else {
+			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
+			if (addr_offset < 0)
+				goto out_err;
+			eff_addr = regs_get_register(regs, addr_offset);
+		}
+		eff_addr += insn->displacement.value;
+	}
+	linear_addr = (unsigned long)eff_addr;
+
+	return (void __user *)linear_addr;
+out_err:
+	return (void __user *)-1;
+}
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 30aef92..c3f02be 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -12,6 +12,7 @@
 #include <linux/sched/sysctl.h>
 
 #include <asm/insn.h>
+#include <asm/insn-eval.h>
 #include <asm/mman.h>
 #include <asm/mmu_context.h>
 #include <asm/mpx.h>
@@ -60,155 +61,6 @@ static unsigned long mpx_mmap(unsigned long len)
 	return addr;
 }
 
-enum reg_type {
-	REG_TYPE_RM = 0,
-	REG_TYPE_INDEX,
-	REG_TYPE_BASE,
-};
-
-static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
-			  enum reg_type type)
-{
-	int regno = 0;
-
-	static const int regoff[] = {
-		offsetof(struct pt_regs, ax),
-		offsetof(struct pt_regs, cx),
-		offsetof(struct pt_regs, dx),
-		offsetof(struct pt_regs, bx),
-		offsetof(struct pt_regs, sp),
-		offsetof(struct pt_regs, bp),
-		offsetof(struct pt_regs, si),
-		offsetof(struct pt_regs, di),
-#ifdef CONFIG_X86_64
-		offsetof(struct pt_regs, r8),
-		offsetof(struct pt_regs, r9),
-		offsetof(struct pt_regs, r10),
-		offsetof(struct pt_regs, r11),
-		offsetof(struct pt_regs, r12),
-		offsetof(struct pt_regs, r13),
-		offsetof(struct pt_regs, r14),
-		offsetof(struct pt_regs, r15),
-#endif
-	};
-	int nr_registers = ARRAY_SIZE(regoff);
-	/*
-	 * Don't possibly decode a 32-bit instructions as
-	 * reading a 64-bit-only register.
-	 */
-	if (IS_ENABLED(CONFIG_X86_64) && !insn->x86_64)
-		nr_registers -= 8;
-
-	switch (type) {
-	case REG_TYPE_RM:
-		regno = X86_MODRM_RM(insn->modrm.value);
-		if (X86_REX_B(insn->rex_prefix.value))
-			regno += 8;
-		break;
-
-	case REG_TYPE_INDEX:
-		regno = X86_SIB_INDEX(insn->sib.value);
-		if (X86_REX_X(insn->rex_prefix.value))
-			regno += 8;
-		/*
-		 * If ModRM.mod !=3 and SIB.index (regno=4) the scale*index
-		 * portion of the address computation is null. This is
-		 * true only if REX.X is 0. In such a case, the SIB index
-		 * is used in the address computation.
-		 */
-		if (X86_MODRM_MOD(insn->modrm.value) != 3 && regno == 4)
-			return -EDOM;
-		break;
-
-	case REG_TYPE_BASE:
-		regno = X86_SIB_BASE(insn->sib.value);
-		/*
-		 * If ModRM.mod is 0 and SIB.base == 5, the base of the
-		 * register-indirect addressing is 0. In this case, a
-		 * 32-bit displacement is expected in this case; the
-		 * instruction decoder finds such displacement for us.
-		 */
-		if (!X86_MODRM_MOD(insn->modrm.value) && regno == 5)
-			return -EDOM;
-
-		if (X86_REX_B(insn->rex_prefix.value))
-			regno += 8;
-		break;
-
-	default:
-		pr_err("invalid register type");
-		BUG();
-		break;
-	}
-
-	if (regno >= nr_registers) {
-		WARN_ONCE(1, "decoded an instruction with an invalid register");
-		return -EINVAL;
-	}
-	return regoff[regno];
-}
-
-/*
- * return the address being referenced be instruction
- * for rm=3 returning the content of the rm reg
- * for rm!=3 calculates the address using SIB and Disp
- */
-static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
-{
-	unsigned long linear_addr;
-	long eff_addr, base, indx;
-	int addr_offset, base_offset, indx_offset;
-	insn_byte_t sib;
-
-	insn_get_modrm(insn);
-	insn_get_sib(insn);
-	sib = insn->sib.value;
-
-	if (X86_MODRM_MOD(insn->modrm.value) == 3) {
-		addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
-		if (addr_offset < 0)
-			goto out_err;
-		eff_addr = regs_get_register(regs, addr_offset);
-	} else {
-		if (insn->sib.nbytes) {
-			/*
-			 * Negative values in the base and index offset means
-			 * an error when decoding the SIB byte. Except -EDOM,
-			 * which means that the registers should not be used
-			 * in the address computation.
-			 */
-			base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
-			if (base_offset == -EDOM)
-				base = 0;
-			else if (base_offset < 0)
-				goto out_err;
-			else
-				base = regs_get_register(regs, base_offset);
-
-			indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
-			if (indx_offset == -EDOM)
-				indx = 0;
-			else if (indx_offset < 0)
-				goto out_err;
-			else
-				indx = regs_get_register(regs, indx_offset);
-
-			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
-		} else {
-			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
-			if (addr_offset < 0)
-				goto out_err;
-			eff_addr = regs_get_register(regs, addr_offset);
-		}
-		eff_addr += insn->displacement.value;
-	}
-	linear_addr = (unsigned long)eff_addr;
-
-	return (void __user *)linear_addr;
-out_err:
-	return (void __user *)-1;
-}
-
 static int mpx_insn_decode(struct insn *insn,
 			   struct pt_regs *regs)
 {
@@ -321,7 +173,7 @@ siginfo_t *mpx_generate_siginfo(struct pt_regs *regs)
 	info->si_signo = SIGSEGV;
 	info->si_errno = 0;
 	info->si_code = SEGV_BNDERR;
-	info->si_addr = mpx_get_addr_ref(&insn, regs);
+	info->si_addr = insn_get_addr_ref(&insn, regs);
 	/*
 	 * We were not able to extract an address from the instruction,
 	 * probably because there was something invalid in it.
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 07/26] x86/insn-eval: Do not BUG on invalid register type
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (5 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 06/26] x86/mpx, x86/insn: Relocate insn util functions to a new insn-eval file Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-05-29 16:37   ` Borislav Petkov
  2017-05-05 18:17 ` [PATCH v7 08/26] x86/insn-eval: Add a utility function to get register offsets Ricardo Neri
                   ` (19 subsequent siblings)
  26 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

We are not in a critical failure path. The invalid register type is caused
by trying to decode invalid instruction bytes from a user-space program.
Thus, simply print an error message. To prevent this warning from being
abused by user-space programs, use the rate-limited variant of printk.

Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/lib/insn-eval.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index e746a6f..182e2ae 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -5,6 +5,7 @@
  */
 #include <linux/kernel.h>
 #include <linux/string.h>
+#include <linux/ratelimit.h>
 #include <asm/inat.h>
 #include <asm/insn.h>
 #include <asm/insn-eval.h>
@@ -85,9 +86,8 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 		break;
 
 	default:
-		pr_err("invalid register type");
-		BUG();
-		break;
+		printk_ratelimited(KERN_ERR "insn-eval: x86: invalid register type");
+		return -EINVAL;
 	}
 
 	if (regno >= nr_registers) {
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 08/26] x86/insn-eval: Add a utility function to get register offsets
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (6 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 07/26] x86/insn-eval: Do not BUG on invalid register type Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-05-29 17:16   ` Borislav Petkov
  2017-05-05 18:17 ` [PATCH v7 09/26] x86/insn-eval: Add utility function to identify string instructions Ricardo Neri
                   ` (18 subsequent siblings)
  26 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

The function get_reg_offset() returns the offset of the register specified
in its arguments, as indicated in an enumeration of type offset. Callers
of this function would need the definition of such an enumeration. This is
not needed. Instead, add helper functions for this purpose. These functions
are useful in cases when, for instance, the caller needs to decide whether
the operand is a register or a memory location by looking at the r/m part
of the ModRM byte. As of now, this is the only helper function that is
needed.

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/insn-eval.h |  1 +
 arch/x86/lib/insn-eval.c         | 15 +++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 5cab1b1..7e8c963 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -12,5 +12,6 @@
 #include <asm/ptrace.h>
 
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
+int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
 
 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 182e2ae..8b16761 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -97,6 +97,21 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 	return regoff[regno];
 }
 
+/**
+ * insn_get_modrm_rm_off() - Obtain register in r/m part of ModRM byte
+ * @insn:	Instruction structure containing the ModRM byte
+ * @regs:	Structure with register values as seen when entering kernel mode
+ *
+ * Return: The register indicated by the r/m part of the ModRM byte. The
+ * register is obtained as an offset from the base of pt_regs. In specific
+ * cases, the returned value can be -EDOM to indicate that the particular value
+ * of ModRM does not refer to a register and shall be ignored.
+ */
+int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs)
+{
+	return get_reg_offset(insn, regs, REG_TYPE_RM);
+}
+
 /*
  * return the address being referenced by the instruction
  * for rm=3 returning the content of the rm reg
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 09/26] x86/insn-eval: Add utility function to identify string instructions
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (7 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 08/26] x86/insn-eval: Add a utility function to get register offsets Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-05-29 21:48   ` Borislav Petkov
  2017-05-05 18:17 ` [PATCH v7 10/26] x86/insn-eval: Add utility functions to get segment selector Ricardo Neri
                   ` (17 subsequent siblings)
  26 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

String instructions are special because, in protected mode, the linear
address is always obtained via the ES segment register for operands that
use the (E)DI register; segment override prefixes are ignored. Non-string
instructions use DS as the default segment register, which can be
overridden with a segment override prefix.

This function will be used in a subsequent commit that introduces a
function to determine the segment register to use given the instruction,
operands and segment override prefixes.

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/lib/insn-eval.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 8b16761..1634762 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -16,6 +16,73 @@ enum reg_type {
 	REG_TYPE_BASE,
 };
 
+enum string_instruction {
+	INSB		= 0x6c,
+	INSW_INSD	= 0x6d,
+	OUTSB		= 0x6e,
+	OUTSW_OUTSD	= 0x6f,
+	MOVSB		= 0xa4,
+	MOVSW_MOVSD	= 0xa5,
+	CMPSB		= 0xa6,
+	CMPSW_CMPSD	= 0xa7,
+	STOSB		= 0xaa,
+	STOSW_STOSD	= 0xab,
+	LODSB		= 0xac,
+	LODSW_LODSD	= 0xad,
+	SCASB		= 0xae,
+	SCASW_SCASD	= 0xaf,
+};
+
+/**
+ * is_string_instruction - Determine if instruction is a string instruction
+ * @insn:	Instruction structure containing the opcode
+ *
+ * Return: true if the instruction, determined by the opcode, is any of the
+ * string instructions as defined in the Intel Software Development manual.
+ * False otherwise.
+ */
+static bool is_string_instruction(struct insn *insn)
+{
+	insn_get_opcode(insn);
+
+	/* all string instructions have a 1-byte opcode */
+	if (insn->opcode.nbytes != 1)
+		return false;
+
+	switch (insn->opcode.bytes[0]) {
+	case INSB:
+		/* fall through */
+	case INSW_INSD:
+		/* fall through */
+	case OUTSB:
+		/* fall through */
+	case OUTSW_OUTSD:
+		/* fall through */
+	case MOVSB:
+		/* fall through */
+	case MOVSW_MOVSD:
+		/* fall through */
+	case CMPSB:
+		/* fall through */
+	case CMPSW_CMPSD:
+		/* fall through */
+	case STOSB:
+		/* fall through */
+	case STOSW_STOSD:
+		/* fall through */
+	case LODSB:
+		/* fall through */
+	case LODSW_LODSD:
+		/* fall through */
+	case SCASB:
+		/* fall through */
+	case SCASW_SCASD:
+		return true;
+	default:
+		return false;
+	}
+}
+
 static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 			  enum reg_type type)
 {
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 10/26] x86/insn-eval: Add utility functions to get segment selector
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (8 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 09/26] x86/insn-eval: Add utility function to identify string instructions Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-05-30 10:35   ` Borislav Petkov
  2017-05-05 18:17 ` [PATCH v7 11/26] x86/insn-eval: Add utility function to get segment descriptor Ricardo Neri
                   ` (16 subsequent siblings)
  26 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

When computing a linear address and segmentation is used, we need to know
the base address of the segment involved in the computation. In most
cases, the segment base address will be zero, as in USER_DS/USER32_DS.
However, a user-space program may define its own segments via a local
descriptor table. In such a case, the segment base address may not be
zero. Thus, the segment base address is needed to correctly calculate the
linear address.

The segment selector to be used when computing a linear address is
determined either by any segment override prefixes in the instruction or
inferred from the registers involved in the computation of the effective
address; in that order. Also, there are cases in which the overrides shall
be ignored (code segments are always selected by the CS segment register;
string instructions always use the ES segment register along with the
(E)DI register).

For clarity, this process can be split into two steps: resolving the
relevant segment register to use and, once known, reading its value to
obtain the segment selector.

The method to obtain the segment selector depends on several factors. In
32-bit builds, segment selectors are saved into the pt_regs structure
when switching to kernel mode. The same is also true for virtual-8086
mode. In 64-bit builds, segmentation is mostly ignored, except when
running a program in 32-bit legacy mode. In this case, CS and SS can be
obtained from pt_regs. DS, ES, FS and GS can be read directly from
the respective segment registers.

Lastly, the only two segment registers that are not ignored in long mode
are FS and GS. In these two cases, base addresses are obtained from the
respective MSRs.

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/lib/insn-eval.c | 256 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 256 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 1634762..0a496f4 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -9,6 +9,7 @@
 #include <asm/inat.h>
 #include <asm/insn.h>
 #include <asm/insn-eval.h>
+#include <asm/vm86.h>
 
 enum reg_type {
 	REG_TYPE_RM = 0,
@@ -33,6 +34,17 @@ enum string_instruction {
 	SCASW_SCASD	= 0xaf,
 };
 
+enum segment_register {
+	SEG_REG_INVAL = -1,
+	SEG_REG_IGNORE = 0,
+	SEG_REG_CS = 0x23,
+	SEG_REG_SS = 0x36,
+	SEG_REG_DS = 0x3e,
+	SEG_REG_ES = 0x26,
+	SEG_REG_FS = 0x64,
+	SEG_REG_GS = 0x65,
+};
+
 /**
  * is_string_instruction - Determine if instruction is a string instruction
  * @insn:	Instruction structure containing the opcode
@@ -83,6 +95,250 @@ static bool is_string_instruction(struct insn *insn)
 	}
 }
 
+/**
+ * resolve_seg_register() - obtain segment register
+ * @insn:	Instruction structure with segment override prefixes
+ * @regs:	Structure with register values as seen when entering kernel mode
+ * @regoff:	Operand offset, in pt_regs, used to determine segment register
+ *
+ * The segment register to which an effective address refers depends on
+ * a) whether segment override prefixes must be ignored: always use CS when
+ * the register is (R|E)IP; always use ES when the operand register is (E)DI
+ * with string instructions, as defined in the Intel documentation; b) any
+ * segment override prefixes present in the instruction; c) otherwise, the
+ * default segment register associated with the operand register.
+ *
+ * The operand register, regoff, is represented as the offset from the base of
+ * pt_regs. Also, regoff can be -EDOM for cases in which registers are not
+ * used as operands (e.g., displacement-only memory addressing).
+ *
+ * This function returns the segment register as value from an enumeration
+ * as per the conditions described above. Please note that this function
+ * does not return the value in the segment register (i.e., the segment
+ * selector). The segment selector needs to be obtained using
+ * get_segment_selector() and passing the segment register resolved by
+ * this function.
+ *
+ * Return: Enumerated segment register to use, among CS, SS, DS, ES, FS, GS,
+ * ignore (in 64-bit mode as applicable), or SEG_REG_INVAL in case of error.
+ */
+static enum segment_register resolve_seg_register(struct insn *insn,
+						  struct pt_regs *regs,
+						  int regoff)
+{
+	int i;
+	int sel_overrides = 0;
+	int seg_register = SEG_REG_IGNORE;
+
+	if (!insn)
+		return SEG_REG_INVAL;
+
+	/* First handle cases when segment override prefixes must be ignored */
+	if (regoff == offsetof(struct pt_regs, ip)) {
+		if (user_64bit_mode(regs))
+			return SEG_REG_IGNORE;
+		else
+			return SEG_REG_CS;
+	}
+
+	/*
+	 * If the (E)DI register is used with string instructions, the ES
+	 * segment register is always used.
+	 */
+	if ((regoff == offsetof(struct pt_regs, di)) &&
+	    is_string_instruction(insn)) {
+		if (user_64bit_mode(regs))
+			return SEG_REG_IGNORE;
+		else
+			return SEG_REG_ES;
+	}
+
+	/* Then check if segment override prefixes are present */
+	for (i = 0; i < insn->prefixes.nbytes; i++) {
+		switch (insn->prefixes.bytes[i]) {
+		case SEG_REG_CS:
+			seg_register = SEG_REG_CS;
+			sel_overrides++;
+			break;
+		case SEG_REG_SS:
+			seg_register = SEG_REG_SS;
+			sel_overrides++;
+			break;
+		case SEG_REG_DS:
+			seg_register = SEG_REG_DS;
+			sel_overrides++;
+			break;
+		case SEG_REG_ES:
+			seg_register = SEG_REG_ES;
+			sel_overrides++;
+			break;
+		case SEG_REG_FS:
+			seg_register = SEG_REG_FS;
+			sel_overrides++;
+			break;
+		case SEG_REG_GS:
+			seg_register = SEG_REG_GS;
+			sel_overrides++;
+			break;
+		default:
+			return SEG_REG_INVAL;
+		}
+	}
+
+	/*
+	 * Having more than one segment override prefix leads to undefined
+	 * behavior. If this is the case, return with error.
+	 */
+	if (sel_overrides > 1)
+		return SEG_REG_INVAL;
+
+	if (sel_overrides == 1) {
+		/*
+		 * If in long mode all segment registers but FS and GS are
+		 * ignored.
+		 */
+		if (user_64bit_mode(regs) && !(seg_register == SEG_REG_FS ||
+					       seg_register == SEG_REG_GS))
+			return SEG_REG_IGNORE;
+
+		return seg_register;
+	}
+
+	/* In long mode, all segment registers except FS and GS are ignored */
+	if (user_64bit_mode(regs))
+		return SEG_REG_IGNORE;
+
+	/*
+	 * Lastly, if no segment overrides were found, determine the default
+	 * segment register as described in the Intel documentation: SS for
+	 * (E)SP or (E)BP; DS for all other data references. AX, CX and DX are
+	 * not valid register operands in 16-bit address encodings.
+	 * -EDOM identifies cases in which no register is used to compute the
+	 * effective address (displacement-only addressing); the default
+	 * segment register in these cases is DS.
+	 */
+
+	switch (regoff) {
+	case offsetof(struct pt_regs, ax):
+		/* fall through */
+	case offsetof(struct pt_regs, cx):
+		/* fall through */
+	case offsetof(struct pt_regs, dx):
+		if (insn && insn->addr_bytes == 2)
+			return SEG_REG_INVAL;
+		/* fall through */
+	case offsetof(struct pt_regs, di):
+		/* fall through */
+	case -EDOM:
+		/* fall through */
+	case offsetof(struct pt_regs, bx):
+		/* fall through */
+	case offsetof(struct pt_regs, si):
+		return SEG_REG_DS;
+	case offsetof(struct pt_regs, bp):
+		/* fall through */
+	case offsetof(struct pt_regs, sp):
+		return SEG_REG_SS;
+	case offsetof(struct pt_regs, ip):
+		return SEG_REG_CS;
+	default:
+		return SEG_REG_INVAL;
+	}
+}
+
+/**
+ * get_segment_selector() - obtain segment selector
+ * @regs:	Structure with register values as seen when entering kernel mode
+ * @seg_reg:	Segment register to use
+ *
+ * Obtain the segment selector from any of the CS, SS, DS, ES, FS, GS segment
+ * registers. In CONFIG_X86_32, the segment is obtained from either pt_regs or
+ * kernel_vm86_regs as applicable. In CONFIG_X86_64, CS and SS are obtained
+ * from pt_regs. DS, ES, FS and GS are obtained by reading the actual CPU
+ * registers. This is done only for completeness, as in CONFIG_X86_64 segment
+ * registers are ignored.
+ *
+ * Return: Value of the segment selector, including null when running in
+ * long mode. -1 on error.
+ */
+static unsigned short get_segment_selector(struct pt_regs *regs,
+					   enum segment_register seg_reg)
+{
+#ifdef CONFIG_X86_64
+	unsigned short sel;
+
+	switch (seg_reg) {
+	case SEG_REG_IGNORE:
+		return 0;
+	case SEG_REG_CS:
+		return (unsigned short)(regs->cs & 0xffff);
+	case SEG_REG_SS:
+		return (unsigned short)(regs->ss & 0xffff);
+	case SEG_REG_DS:
+		savesegment(ds, sel);
+		return sel;
+	case SEG_REG_ES:
+		savesegment(es, sel);
+		return sel;
+	case SEG_REG_FS:
+		savesegment(fs, sel);
+		return sel;
+	case SEG_REG_GS:
+		savesegment(gs, sel);
+		return sel;
+	default:
+		return -1;
+	}
+#else /* CONFIG_X86_32 */
+	struct kernel_vm86_regs *vm86regs = (struct kernel_vm86_regs *)regs;
+
+	if (v8086_mode(regs)) {
+		switch (seg_reg) {
+		case SEG_REG_CS:
+			return (unsigned short)(regs->cs & 0xffff);
+		case SEG_REG_SS:
+			return (unsigned short)(regs->ss & 0xffff);
+		case SEG_REG_DS:
+			return vm86regs->ds;
+		case SEG_REG_ES:
+			return vm86regs->es;
+		case SEG_REG_FS:
+			return vm86regs->fs;
+		case SEG_REG_GS:
+			return vm86regs->gs;
+		case SEG_REG_IGNORE:
+			/* fall through */
+		default:
+			return -1;
+		}
+	}
+
+	switch (seg_reg) {
+	case SEG_REG_CS:
+		return (unsigned short)(regs->cs & 0xffff);
+	case SEG_REG_SS:
+		return (unsigned short)(regs->ss & 0xffff);
+	case SEG_REG_DS:
+		return (unsigned short)(regs->ds & 0xffff);
+	case SEG_REG_ES:
+		return (unsigned short)(regs->es & 0xffff);
+	case SEG_REG_FS:
+		return (unsigned short)(regs->fs & 0xffff);
+	case SEG_REG_GS:
+		/*
+		 * GS may or may not be in regs as per CONFIG_X86_32_LAZY_GS.
+		 * The macro below takes care of both cases.
+		 */
+		return get_user_gs(regs);
+	case SEG_REG_IGNORE:
+		/* fall through */
+	default:
+		return -1;
+	}
+#endif /* CONFIG_X86_64 */
+}
+
 static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 			  enum reg_type type)
 {
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 11/26] x86/insn-eval: Add utility function to get segment descriptor
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (9 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 10/26] x86/insn-eval: Add utility functions to get segment selector Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-05-05 18:17 ` [PATCH v7 12/26] x86/insn-eval: Add utility functions to get segment descriptor base address and limit Ricardo Neri
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

The segment descriptor contains information that is relevant to how linear
addresses need to be computed. It contains the default size of addresses
as well as the base address of the segment. Thus, given a segment
selector, we ought to look at the segment descriptor to correctly
calculate the linear address.

In protected mode, the segment selector might indicate a segment
descriptor from either the global descriptor table or a local descriptor
table. Both cases are considered in this function.

This function is the initial implementation for subsequent functions that
will obtain the aforementioned attributes of the segment descriptor.

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/lib/insn-eval.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 0a496f4..f46cb31 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -6,9 +6,13 @@
 #include <linux/kernel.h>
 #include <linux/string.h>
 #include <linux/ratelimit.h>
+#include <linux/mmu_context.h>
+#include <asm/desc_defs.h>
+#include <asm/desc.h>
 #include <asm/inat.h>
 #include <asm/insn.h>
 #include <asm/insn-eval.h>
+#include <asm/ldt.h>
 #include <asm/vm86.h>
 
 enum reg_type {
@@ -421,6 +425,57 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 }
 
 /**
+ * get_desc() - Obtain address of segment descriptor
+ * @sel:	Segment selector
+ *
+ * Given a segment selector, obtain a pointer to the segment descriptor.
+ * Both global and local descriptor tables are supported.
+ *
+ * Return: pointer to segment descriptor on success. NULL on failure.
+ */
+static struct desc_struct *get_desc(unsigned short sel)
+{
+	struct desc_ptr gdt_desc = {0, 0};
+	struct desc_struct *desc = NULL;
+	unsigned long desc_base;
+
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+	if ((sel & SEGMENT_TI_MASK) == SEGMENT_LDT) {
+		/* Bits [15:3] contain the index of the desired entry. */
+		sel >>= 3;
+
+		mutex_lock(&current->active_mm->context.lock);
+		/* The size of the LDT refers to the number of entries. */
+		if (!current->active_mm->context.ldt ||
+		    sel >= current->active_mm->context.ldt->size) {
+			mutex_unlock(&current->active_mm->context.lock);
+			return NULL;
+		}
+
+		desc = &current->active_mm->context.ldt->entries[sel];
+		mutex_unlock(&current->active_mm->context.lock);
+		return desc;
+	}
+#endif
+	native_store_gdt(&gdt_desc);
+
+	/*
+	 * Segment descriptors have a size of 8 bytes. Thus, the index is
+	 * multiplied by 8 to obtain the offset of the desired descriptor from
+	 * the start of the GDT. As bits [15:3] of the segment selector contain
+	 * the index, it can be regarded as already multiplied by 8. All that
+	 * remains is to clear bits [2:0].
+	 */
+	desc_base = sel & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK);
+
+	if (desc_base > gdt_desc.size)
+		return NULL;
+
+	desc = (struct desc_struct *)(gdt_desc.address + desc_base);
+	return desc;
+}
+
+/**
  * insn_get_modrm_rm_off() - Obtain register in r/m part of ModRM byte
  * @insn:	Instruction structure containing the ModRM byte
  * @regs:	Structure with register values as seen when entering kernel mode
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 12/26] x86/insn-eval: Add utility functions to get segment descriptor base address and limit
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (10 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 11/26] x86/insn-eval: Add utility function to get segment descriptor Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-05-31 16:58   ` Borislav Petkov
  2017-05-05 18:17 ` [PATCH v7 13/26] x86/insn-eval: Add function to get default params of code segment Ricardo Neri
                   ` (14 subsequent siblings)
  26 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

With segmentation, the base address of the segment descriptor is needed
to compute a linear address. The segment descriptor used in the address
computation depends on either any segment override prefixes in the
instruction or the default segment determined by the registers involved
in the address computation. Thus, both the instruction as well as the
register (specified as the offset from the base of pt_regs) are given as
inputs, along with a boolean variable to select between override and
default.

The segment selector is determined by get_segment_selector() with the inputs
described above. Once the selector is known, the base address is
determined. In protected mode, the selector is used to obtain the segment
descriptor and then its base address. If in 64-bit user mode, the segment
base address is zero except when FS or GS are used. In virtual-8086 mode,
the base address is computed as the value of the segment selector shifted 4
positions to the left.

In protected mode, segment limits are enforced. Thus, a function to
determine the limit of the segment is added. Segment limits are not
enforced in long or virtual-8086 mode. For the latter, addresses are limited
to 20 bits; address size will be handled when computing the linear
address.

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/insn-eval.h |   2 +
 arch/x86/lib/insn-eval.c         | 127 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 129 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 7e8c963..7f3c7fe 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -13,5 +13,7 @@
 
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
 int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
+unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
+				int regoff);
 
 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index f46cb31..c77ed80 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -476,6 +476,133 @@ static struct desc_struct *get_desc(unsigned short sel)
 }
 
 /**
+ * insn_get_seg_base() - Obtain base address of segment descriptor.
+ * @regs:	Structure with register values as seen when entering kernel mode
+ * @insn:	Instruction structure with selector override prefixes
+ * @regoff:	Operand offset, in pt_regs, of which the selector is needed
+ *
+ * Obtain the base address of the segment descriptor as indicated by either
+ * any segment override prefixes contained in insn or the default segment
+ * applicable to the register indicated by regoff. regoff is specified as the
+ * offset in bytes from the base of pt_regs.
+ *
+ * Return: In protected mode, base address of the segment. Zero in long
+ * mode, except when FS or GS are used. In virtual-8086 mode, the segment
+ * selector shifted 4 positions to the left. -1L in case of error.
+ */
+unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
+				int regoff)
+{
+	struct desc_struct *desc;
+	unsigned short sel;
+	enum segment_register seg_reg;
+
+	seg_reg = resolve_seg_register(insn, regs, regoff);
+	if (seg_reg == SEG_REG_INVAL)
+		return -1L;
+
+	sel = get_segment_selector(regs, seg_reg);
+	if ((short)sel < 0)
+		return -1L;
+
+	if (v8086_mode(regs))
+		/*
+		 * Base is simply the segment selector shifted 4
+		 * positions to the left.
+		 */
+		return (unsigned long)(sel << 4);
+
+	if (user_64bit_mode(regs)) {
+		/*
+		 * Only FS or GS will have a base address, the rest of
+		 * the segments' bases are forced to 0.
+		 */
+		unsigned long base;
+
+		if (seg_reg == SEG_REG_FS)
+			rdmsrl(MSR_FS_BASE, base);
+		else if (seg_reg == SEG_REG_GS)
+			/*
+			 * swapgs was called at the kernel entry point. Thus,
+			 * MSR_KERNEL_GS_BASE will have the user-space GS base.
+			 */
+			rdmsrl(MSR_KERNEL_GS_BASE, base);
+		else if (seg_reg != SEG_REG_IGNORE)
+			/* We should ignore the rest of segment registers */
+			base = -1L;
+		else
+			base = 0;
+		return base;
+	}
+
+	/* In protected mode the segment selector cannot be null */
+	if (!sel)
+		return -1L;
+
+	desc = get_desc(sel);
+	if (!desc)
+		return -1L;
+
+	return get_desc_base(desc);
+}
+
+/**
+ * get_seg_limit() - Obtain the limit of a segment descriptor
+ * @regs:	Structure with register values as seen when entering kernel mode
+ * @insn:	Instruction structure with selector override prefixes
+ * @regoff:	Offset in pt_regs of the register whose segment selector is needed
+ *
+ * Obtain the limit of the segment descriptor. The segment selector is obtained
+ * by inspecting any segment override prefixes or the default selector
+ * inferred by regoff. regoff is specified as the offset in bytes from the base
+ * of pt_regs.
+ *
+ * Return: In protected mode, the limit of the segment descriptor in bytes.
+ * In long mode and virtual-8086 mode, segment limits are not enforced. Thus,
+ * limit is returned as -1L to imply a limit-less segment. Zero is returned on
+ * error.
+ */
+static unsigned long get_seg_limit(struct pt_regs *regs, struct insn *insn,
+				   int regoff)
+{
+	struct desc_struct *desc;
+	unsigned short sel;
+	unsigned long limit;
+	enum segment_register seg_reg;
+
+	seg_reg = resolve_seg_register(insn, regs, regoff);
+	if (seg_reg == SEG_REG_INVAL)
+		return 0;
+
+	sel = get_segment_selector(regs, seg_reg);
+	if ((short)sel < 0)
+		return 0;
+
+	if (user_64bit_mode(regs) || v8086_mode(regs))
+		return -1L;
+
+	if (!sel)
+		return 0;
+
+	desc = get_desc(sel);
+	if (!desc)
+		return 0;
+
+	/*
+	 * If the granularity bit is set, the limit is given in multiples
+	 * of 4096. In that case, the 12 least significant bits are not
+	 * tested when checking the segment limits. In practice, this means
+	 * that the segment ends at (limit << 12) + 0xfff.
+	 */
+	limit = get_desc_limit(desc);
+	if (desc->g)
+		limit = (limit << 12) + 0xfff;
+
+	return limit;
+}
+
+/**
  * insn_get_reg_offset_modrm_rm() - Obtain register in r/m part of ModRM byte
  * @insn:	Instruction structure containing the ModRM byte
  * @regs:	Structure with register values as seen when entering kernel mode
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 13/26] x86/insn-eval: Add function to get default params of code segment
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (11 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 12/26] x86/insn-eval: Add utility functions to get segment descriptor base address and limit Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-06-07 12:59   ` Borislav Petkov
  2017-05-05 18:17 ` [PATCH v7 14/26] x86/insn-eval: Indicate a 32-bit displacement if ModRM.mod is 0 and ModRM.rm is 5 Ricardo Neri
                   ` (13 subsequent siblings)
  26 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

This function returns the default values of the address and operand sizes
as specified in the segment descriptor. This information is determined
from the D and L bits. Hence, it can be used for both IA-32e 64-bit and
32-bit legacy modes. For virtual-8086 mode, the default address and
operand sizes are always 2 bytes.

The D bit is only meaningful for code segments. Thus, this function
always uses the code segment selector contained in regs.

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/insn-eval.h |  6 ++++
 arch/x86/lib/insn-eval.c         | 65 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 7f3c7fe..9ed1c88 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -11,9 +11,15 @@
 #include <linux/err.h>
 #include <asm/ptrace.h>
 
+struct insn_code_seg_defaults {
+	unsigned char address_bytes;
+	unsigned char operand_bytes;
+};
+
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
 int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
 unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
 				int regoff);
+struct insn_code_seg_defaults insn_get_code_seg_defaults(struct pt_regs *regs);
 
 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index c77ed80..693e5a8 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -603,6 +603,71 @@ static unsigned long get_seg_limit(struct pt_regs *regs, struct insn *insn,
 }
 
 /**
+ * insn_get_code_seg_defaults() - Obtain code segment default parameters
+ * @regs:	Structure with register values as seen when entering kernel mode
+ *
+ * Obtain the default parameters of the code segment: address and operand sizes.
+ * The code segment is obtained from the selector contained in the CS register
+ * in regs. In protected mode, the default address and operand sizes are
+ * determined by inspecting the L and D bits of the segment descriptor. In
+ * virtual-8086 mode, the default is always two bytes for both sizes.
+ *
+ * Return: A populated insn_code_seg_defaults structure on success. The
+ * structure contains only zeros on failure.
+ */
+struct insn_code_seg_defaults insn_get_code_seg_defaults(struct pt_regs *regs)
+{
+	struct desc_struct *desc;
+	struct insn_code_seg_defaults defs;
+	unsigned short sel;
+	/*
+	 * The most significant bit of the AR type field determines
+	 * whether a segment contains data (clear) or code (set).
+	 */
+	unsigned int type_mask = AR_TYPE_MASK & (1 << 11);
+
+	memset(&defs, 0, sizeof(defs));
+
+	if (v8086_mode(regs)) {
+		defs.address_bytes = 2;
+		defs.operand_bytes = 2;
+		return defs;
+	}
+
+	sel = (unsigned short)regs->cs;
+
+	desc = get_desc(sel);
+	if (!desc)
+		return defs;
+
+	/* If this is a data segment, return the zeroed defaults */
+	if (!(desc->b & type_mask))
+		return defs;
+
+	switch ((desc->l << 1) | desc->d) {
+	case 0: /* Legacy mode. CS.L=0, CS.D=0 */
+		defs.address_bytes = 2;
+		defs.operand_bytes = 2;
+		break;
+	case 1: /* Legacy mode. CS.L=0, CS.D=1 */
+		defs.address_bytes = 4;
+		defs.operand_bytes = 4;
+		break;
+	case 2: /* IA-32e 64-bit mode. CS.L=1, CS.D=0 */
+		defs.address_bytes = 8;
+		defs.operand_bytes = 4;
+		break;
+	case 3: /* Invalid setting. CS.L=1, CS.D=1 */
+		/* fall through */
+	default:
+		defs.address_bytes = 0;
+		defs.operand_bytes = 0;
+	}
+
+	return defs;
+}
+
+/**
  * insn_get_reg_offset_modrm_rm() - Obtain register in r/m part of ModRM byte
  * @insn:	Instruction structure containing the ModRM byte
  * @regs:	Structure with register values as seen when entering kernel mode
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 14/26] x86/insn-eval: Indicate a 32-bit displacement if ModRM.mod is 0 and ModRM.rm is 5
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (12 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 13/26] x86/insn-eval: Add function to get default params of code segment Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-06-07 13:15   ` Borislav Petkov
  2017-05-05 18:17 ` [PATCH v7 15/26] x86/insn-eval: Incorporate segment base and limit in linear address computation Ricardo Neri
                   ` (12 subsequent siblings)
  26 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

Section 2.2.1.3 of the Intel 64 and IA-32 Architectures Software
Developer's Manual volume 2A states that when ModRM.mod is zero and
ModRM.rm is 101b, a 32-bit displacement follows the ModRM byte. This means
that none of the registers are used in the computation of the effective
address. A return value of -EDOM signals callers that they should not use
the value of registers when computing the effective address for the
instruction.

In IA-32e 64-bit mode (long mode), the effective address is given by the
32-bit displacement plus the value of RIP of the next instruction.
In IA-32e compatibility mode (protected mode), only the displacement is
used.

The instruction decoder takes care of obtaining the displacement.

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/lib/insn-eval.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 693e5a8..4f600de 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -379,6 +379,12 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 	switch (type) {
 	case REG_TYPE_RM:
 		regno = X86_MODRM_RM(insn->modrm.value);
+		/*
+		 * ModRM.mod == 0 and ModRM.rm == 5 means a 32-bit displacement
+		 * follows the ModRM byte.
+		 */
+		if (!X86_MODRM_MOD(insn->modrm.value) && regno == 5)
+			return -EDOM;
 		if (X86_REX_B(insn->rex_prefix.value))
 			regno += 8;
 		break;
@@ -730,9 +736,21 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
 		} else {
 			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
-			if (addr_offset < 0)
+			/*
+			 * -EDOM means that we must ignore the address_offset.
+			 * In such a case, in 64-bit mode the effective address
+			 * is relative to the RIP of the following instruction.
+			 */
+			if (addr_offset == -EDOM) {
+				eff_addr = 0;
+				if (user_64bit_mode(regs))
+					eff_addr = (long)regs->ip +
+						   insn->length;
+			} else if (addr_offset < 0) {
 				goto out_err;
-			eff_addr = regs_get_register(regs, addr_offset);
+			} else {
+				eff_addr = regs_get_register(regs, addr_offset);
+			}
 		}
 		eff_addr += insn->displacement.value;
 	}
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 15/26] x86/insn-eval: Incorporate segment base and limit in linear address computation
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (13 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 14/26] x86/insn-eval: Indicate a 32-bit displacement if ModRM.mod is 0 and ModRM.rm is 5 Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-05-05 18:17 ` [PATCH v7 16/26] x86/insn-eval: Support both signed 32-bit and 64-bit effective addresses Ricardo Neri
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

insn_get_addr_ref() returns the effective address as defined in Section
3.7.5.1 of Vol 1 of the Intel 64 and IA-32 Architectures Software
Developer's Manual. In order to compute the linear address, we must add
to the effective address the segment base address as set in the segment
descriptor. Furthermore, the segment descriptor to use depends on the
register used as the base of the effective address, which varies
depending on whether the operand is a register or a memory address and
on whether a SIB byte is used.

In most cases, the segment base address will be 0 if the USER_DS/USER32_DS
segment is used or if segmentation is not used. However, the base address
is not necessarily zero if a user program defines its own segments. This
is possible by using a local descriptor table.

Since the effective address is a signed quantity, the unsigned segment
base address is saved in a separate variable and added to the final
effective address.

Before returning the linear address, we check that the computed effective
address is within the segment limit. In long mode, segment limits are not
enforced; the check still works there because get_seg_limit() returns -1L
in that case.

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/lib/insn-eval.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 4f600de..1a5f5a6 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -695,7 +695,7 @@ int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs)
  */
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 {
-	unsigned long linear_addr;
+	unsigned long linear_addr, seg_base_addr, seg_limit;
 	long eff_addr, base, indx;
 	int addr_offset, base_offset, indx_offset;
 	insn_byte_t sib;
@@ -709,6 +709,10 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 		if (addr_offset < 0)
 			goto out_err;
 		eff_addr = regs_get_register(regs, addr_offset);
+		seg_base_addr = insn_get_seg_base(regs, insn, addr_offset);
+		if (seg_base_addr == -1L)
+			goto out_err;
+		seg_limit = get_seg_limit(regs, insn, addr_offset);
 	} else {
 		if (insn->sib.nbytes) {
 			/*
@@ -734,6 +738,11 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 				indx = regs_get_register(regs, indx_offset);
 
 			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
+			seg_base_addr = insn_get_seg_base(regs, insn,
+							  base_offset);
+			if (seg_base_addr == -1L)
+				goto out_err;
+			seg_limit = get_seg_limit(regs, insn, base_offset);
 		} else {
 			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
 			/*
@@ -751,10 +760,25 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 			} else {
 				eff_addr = regs_get_register(regs, addr_offset);
 			}
+			seg_base_addr = insn_get_seg_base(regs, insn,
+							  addr_offset);
+			if (seg_base_addr == -1L)
+				goto out_err;
+			seg_limit = get_seg_limit(regs, insn, addr_offset);
 		}
 		eff_addr += insn->displacement.value;
 	}
+
 	linear_addr = (unsigned long)eff_addr;
+	/*
+	 * Make sure the effective address is within the limits of the
+	 * segment. In long mode, the limit is -1L. Thus, the second part
+	 * of the check always succeeds.
+	 */
+	if (linear_addr > seg_limit)
+		goto out_err;
+
+	linear_addr += seg_base_addr;
 
 	return (void __user *)linear_addr;
 out_err:
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 16/26] x86/insn-eval: Support both signed 32-bit and 64-bit effective addresses
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (14 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 15/26] x86/insn-eval: Incorporate segment base and limit in linear address computation Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-06-07 15:48   ` Borislav Petkov
  2017-06-07 15:49   ` Borislav Petkov
  2017-05-05 18:17 ` [PATCH v7 17/26] x86/insn-eval: Handle 32-bit address encodings in virtual-8086 mode Ricardo Neri
                   ` (10 subsequent siblings)
  26 siblings, 2 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

The 32-bit and 64-bit address encodings are identical. This means that we
can use the same function in both cases. In order to reuse the function
for 32-bit address encodings, we must sign-extend our 32-bit signed
operands to 64-bit signed variables (only for 64-bit builds). To decide on
whether sign extension is needed, we rely on the address size as given by
the instruction structure.

Once the effective address has been computed, a special verification is
needed for 32-bit processes. If running on a 64-bit kernel, such processes
can address up to 4GB of memory. Hence, for instance, an effective
address of 0xffff1234 would be misinterpreted as 0xffffffffffff1234 due to
the sign extension mentioned above. For this reason, the 4 most significant
bytes must be truncated to obtain the true effective address.

Lastly, before computing the linear address, we verify that the effective
address is within the limits of the segment. The check is kept for long
mode because in that case the limit is set to -1L, the largest possible
unsigned value, which is equivalent to a limit-less segment.

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/lib/insn-eval.c | 99 ++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 88 insertions(+), 11 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 1a5f5a6..c7c1239 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -688,6 +688,62 @@ int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs)
 	return get_reg_offset(insn, regs, REG_TYPE_RM);
 }
 
+/**
+ * _to_signed_long() - Cast an unsigned long into a signed long
+ * @val:	A 32-bit or 64-bit unsigned long
+ * @long_bytes:	The number of bytes used to represent a long number
+ * @out:	The resulting signed long
+ *
+ * Return: 0 on success, -EINVAL on failure. On success, the signed long,
+ * sign-extended to the width of the kernel's native long if needed, is
+ * written to @out.
+ */
+static int _to_signed_long(unsigned long val, int long_bytes, long *out)
+{
+	if (!out)
+		return -EINVAL;
+
+#ifdef CONFIG_X86_64
+	if (long_bytes == 4) {
+		/* higher bytes should all be zero */
+		if (val & ~0xffffffff)
+			return -EINVAL;
+
+		/* sign-extend to a 64-bit long */
+		*out = (long)((int)(val));
+		return 0;
+	} else if (long_bytes == 8) {
+		*out = (long)val;
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+#else
+	*out = (long)val;
+	return 0;
+#endif
+}
+
+/**
+ * get_mem_offset() - Obtain the memory offset indicated in operand register
+ * @regs:	Structure with register values as seen when entering kernel mode
+ * @reg_offset:	Offset from the base of pt_regs of the operand register
+ * @addr_size:	Address size of the code segment in use
+ *
+ * Obtain the offset (a signed number with size as specified in addr_size)
+ * indicated in the register used for register-indirect memory addressing.
+ *
+ * Return: A memory offset to be used in the computation of the effective
+ * address; -1L on error.
+ */
+long get_mem_offset(struct pt_regs *regs, int reg_offset, int addr_size)
+{
+	int ret;
+	long offset = -1L;
+	unsigned long uoffset = regs_get_register(regs, reg_offset);
+
+	ret = _to_signed_long(uoffset, addr_size, &offset);
+	if (ret)
+		return -1L;
+	return offset;
+}
 /*
  * return the address being referenced by the instruction
  * for rm=3 returning the content of the rm reg
@@ -697,18 +753,21 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 {
 	unsigned long linear_addr, seg_base_addr, seg_limit;
 	long eff_addr, base, indx;
-	int addr_offset, base_offset, indx_offset;
+	int addr_offset, base_offset, indx_offset, addr_bytes;
 	insn_byte_t sib;
 
 	insn_get_modrm(insn);
 	insn_get_sib(insn);
 	sib = insn->sib.value;
+	addr_bytes = insn->addr_bytes;
 
 	if (X86_MODRM_MOD(insn->modrm.value) == 3) {
 		addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
 		if (addr_offset < 0)
 			goto out_err;
-		eff_addr = regs_get_register(regs, addr_offset);
+		eff_addr = get_mem_offset(regs, addr_offset, addr_bytes);
+		if (eff_addr == -1L)
+			goto out_err;
 		seg_base_addr = insn_get_seg_base(regs, insn, addr_offset);
 		if (seg_base_addr == -1L)
 			goto out_err;
@@ -722,20 +781,28 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 			 * in the address computation.
 			 */
 			base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
-			if (base_offset == -EDOM)
+			if (base_offset == -EDOM) {
 				base = 0;
-			else if (base_offset < 0)
+			} else if (base_offset < 0) {
 				goto out_err;
-			else
-				base = regs_get_register(regs, base_offset);
+			} else {
+				base = get_mem_offset(regs, base_offset,
+						      addr_bytes);
+				if (base == -1L)
+					goto out_err;
+			}
 
 			indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
-			if (indx_offset == -EDOM)
+			if (indx_offset == -EDOM) {
 				indx = 0;
-			else if (indx_offset < 0)
+			} else if (indx_offset < 0) {
 				goto out_err;
-			else
-				indx = regs_get_register(regs, indx_offset);
+			} else {
+				indx = get_mem_offset(regs, indx_offset,
+						      addr_bytes);
+				if (indx == -1L)
+					goto out_err;
+			}
 
 			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
 			seg_base_addr = insn_get_seg_base(regs, insn,
@@ -758,7 +825,10 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 			} else if (addr_offset < 0) {
 				goto out_err;
 			} else {
-				eff_addr = regs_get_register(regs, addr_offset);
+				eff_addr = get_mem_offset(regs, addr_offset,
+							  addr_bytes);
+				if (eff_addr == -1L)
+					goto out_err;
 			}
 			seg_base_addr = insn_get_seg_base(regs, insn,
 							  addr_offset);
@@ -771,6 +841,13 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 
 	linear_addr = (unsigned long)eff_addr;
 	/*
+	 * If address size is 32-bit, truncate the 4 most significant bytes.
+	 * This is to avoid phony negative offsets.
+	 */
+	if (addr_bytes == 4)
+		linear_addr &= 0xffffffff;
+
+	/*
 	 * Make sure the effective address is within the limits of the
 	 * segment. In long mode, the limit is -1L. Thus, the second part
 	 * of the check always succeeds.
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 17/26] x86/insn-eval: Handle 32-bit address encodings in virtual-8086 mode
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (15 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 16/26] x86/insn-eval: Support both signed 32-bit and 64-bit effective addresses Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-05-05 18:17 ` [PATCH v7 18/26] x86/insn-eval: Add support to resolve 16-bit addressing encodings Ricardo Neri
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

It is possible to utilize 32-bit address encodings in virtual-8086 mode via
an address override instruction prefix. However, the address range is still
limited to [0x0-0xffff]. If the computed address exceeds this range, return
an error.

Also, linear addresses in virtual-8086 mode are limited to 20 bits. Enforce
this limit by masking the computed linear address to its 20 least
significant bits.

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/lib/insn-eval.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index c7c1239..9822061 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -848,6 +848,12 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 		linear_addr &= 0xffffffff;
 
 	/*
+	 * Even though 32-bit address encodings are allowed in virtual-8086
+	 * mode, the address range is still limited to [0x-0xffff].
+	 */
+	if (v8086_mode(regs) && (linear_addr & ~0xffff))
+		goto out_err;
+	/*
 	 * Make sure the effective address is within the limits of the
 	 * segment. In long mode, the limit is -1L. Thus, the second part
 	 * of the check always succeeds.
@@ -857,6 +863,10 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 
 	linear_addr += seg_base_addr;
 
+	/* Limit linear address to 20 bits */
+	if (v8086_mode(regs))
+		linear_addr &= 0xfffff;
+
 	return (void __user *)linear_addr;
 out_err:
 	return (void __user *)-1;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 18/26] x86/insn-eval: Add support to resolve 16-bit addressing encodings
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (16 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 17/26] x86/insn-eval: Handle 32-bit address encodings in virtual-8086 mode Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-06-07 16:28   ` Borislav Petkov
  2017-05-05 18:17 ` [PATCH v7 19/26] x86/insn-eval: Add wrapper function for 16-bit and 32-bit address encodings Ricardo Neri
                   ` (8 subsequent siblings)
  26 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

Tasks running in virtual-8086 mode or in protected mode with code
segment descriptors that specify 16-bit default address sizes via the
D bit will use 16-bit addressing form encodings as described in the Intel
64 and IA-32 Architecture Software Developer's Manual Volume 2A Section
2.1.5. 16-bit addressing encodings differ in several ways from the
32-bit/64-bit addressing form encodings: ModRM.rm points to different
registers and, in some cases, effective addresses are computed by the
addition of the values of two registers. Also, there is no support for SIB
bytes. Thus, a separate function is needed to parse this form of
addressing.

A couple of functions are introduced. get_reg_offset_16() obtains the
offset from the base of pt_regs of the registers indicated by the ModRM
byte of the address encoding. get_addr_ref_16() computes the linear
address indicated by the instruction using the values of the registers
given by ModRM as well as the base address of the segment.

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/lib/insn-eval.c | 155 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 155 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 9822061..928a662 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -431,6 +431,73 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 }
 
 /**
+ * get_reg_offset_16() - Obtain offsets of registers indicated by instruction
+ * @insn:	Instruction structure containing the ModRM byte
+ * @regs:	Structure with register values as seen when entering kernel mode
+ * @offs1:	Offset of the first operand register
+ * @offs2:	Offset of the second operand register, if applicable
+ *
+ * Obtain the offset, in pt_regs, of the registers indicated by the ModRM byte
+ * within insn. This function is to be used with 16-bit address encodings. The
+ * offs1 and offs2 will be written with the offset of the two registers
+ * indicated by the instruction. In cases where any of the registers is not
+ * referenced by the instruction, the value will be set to -EDOM.
+ *
+ * Return: 0 on success, -EINVAL on failure.
+ */
+static int get_reg_offset_16(struct insn *insn, struct pt_regs *regs,
+			     int *offs1, int *offs2)
+{
+	/* 16-bit addressing can use one or two registers */
+	static const int regoff1[] = {
+		offsetof(struct pt_regs, bx),
+		offsetof(struct pt_regs, bx),
+		offsetof(struct pt_regs, bp),
+		offsetof(struct pt_regs, bp),
+		offsetof(struct pt_regs, si),
+		offsetof(struct pt_regs, di),
+		offsetof(struct pt_regs, bp),
+		offsetof(struct pt_regs, bx),
+	};
+
+	static const int regoff2[] = {
+		offsetof(struct pt_regs, si),
+		offsetof(struct pt_regs, di),
+		offsetof(struct pt_regs, si),
+		offsetof(struct pt_regs, di),
+		-EDOM,
+		-EDOM,
+		-EDOM,
+		-EDOM,
+	};
+
+	if (!offs1 || !offs2)
+		return -EINVAL;
+
+	/* operand is a register, use the generic function */
+	if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+		*offs1 = insn_get_modrm_rm_off(insn, regs);
+		*offs2 = -EDOM;
+		return 0;
+	}
+
+	*offs1 = regoff1[X86_MODRM_RM(insn->modrm.value)];
+	*offs2 = regoff2[X86_MODRM_RM(insn->modrm.value)];
+
+	/*
+	 * If the mod part of the ModRM byte is 0 and the r/m part is 6,
+	 * the effective address is simply a 16-bit displacement and no
+	 * register is used in its computation. An r/m part of 6 also
+	 * means that the second register offset is already invalid.
+	 */
+	if ((X86_MODRM_MOD(insn->modrm.value) == 0) &&
+	    (X86_MODRM_RM(insn->modrm.value) == 6))
+		*offs1 = -EDOM;
+
+	return 0;
+}
+
+/**
  * get_desc() - Obtain address of segment descriptor
  * @sel:	Segment selector
  *
@@ -689,6 +756,94 @@ int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs)
 }
 
 /**
+ * get_addr_ref_16() - Obtain the 16-bit address referred by instruction
+ * @insn:	Instruction structure containing ModRM byte and displacement
+ * @regs:	Structure with register values as seen when entering kernel mode
+ *
+ * This function is to be used with 16-bit address encodings. Obtain the memory
+ * address referred by the instruction's ModRM bytes and displacement. Also, the
+ * segment used as base is determined by either any segment override prefixes in
+ * insn or the default segment of the registers involved in the address
+ * computation. In protected mode, segment limits are enforced.
+ *
+ * Return: linear address referenced by instruction and registers on success.
+ * -1L on failure.
+ */
+static void __user *get_addr_ref_16(struct insn *insn, struct pt_regs *regs)
+{
+	unsigned long linear_addr, seg_base_addr, seg_limit;
+	short eff_addr, addr1 = 0, addr2 = 0;
+	int addr_offset1, addr_offset2;
+	int ret;
+
+	insn_get_modrm(insn);
+	insn_get_displacement(insn);
+
+	/*
+	 * If operand is a register, the layout is the same as in
+	 * 32-bit and 64-bit addressing.
+	 */
+	if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+		addr_offset1 = get_reg_offset(insn, regs, REG_TYPE_RM);
+		if (addr_offset1 < 0)
+			goto out_err;
+		eff_addr = regs_get_register(regs, addr_offset1);
+		seg_base_addr = insn_get_seg_base(regs, insn, addr_offset1);
+		if (seg_base_addr == -1L)
+			goto out_err;
+		seg_limit = get_seg_limit(regs, insn, addr_offset1);
+	} else {
+		ret = get_reg_offset_16(insn, regs, &addr_offset1,
+					&addr_offset2);
+		if (ret < 0)
+			goto out_err;
+		/*
+		 * Don't fail on invalid offset values. They might be invalid
+		 * because they cannot be used for this particular value of
+		 * the ModRM byte. Instead, use them in the address
+		 * computation only if they contain a valid value.
+		 */
+		if (addr_offset1 != -EDOM)
+			addr1 = 0xffff & regs_get_register(regs, addr_offset1);
+		if (addr_offset2 != -EDOM)
+			addr2 = 0xffff & regs_get_register(regs, addr_offset2);
+		eff_addr = addr1 + addr2;
+		/*
+		 * The first register of the operand may imply either the SS
+		 * or the DS segment selector, whereas the second register can
+		 * only imply DS. Thus, use the first register to obtain the
+		 * segment selector.
+		 */
+		seg_base_addr = insn_get_seg_base(regs, insn, addr_offset1);
+		if (seg_base_addr == -1L)
+			goto out_err;
+		seg_limit = get_seg_limit(regs, insn, addr_offset1);
+
+		eff_addr += (insn->displacement.value & 0xffff);
+	}
+
+	linear_addr = (unsigned long)(eff_addr & 0xffff);
+
+	/*
+	 * Make sure the effective address is within the limits of the
+	 * segment. In long mode the limit is -1L and this check would
+	 * always succeed; however, 16-bit addressing is not used in long mode.
+	 */
+	if (linear_addr > seg_limit)
+		goto out_err;
+
+	linear_addr += seg_base_addr;
+
+	/* Limit linear address to 20 bits */
+	if (v8086_mode(regs))
+		linear_addr &= 0xfffff;
+
+	return (void __user *)linear_addr;
+out_err:
+	return (void __user *)-1;
+}
+
+/**
  * _to_signed_long() - Cast an unsigned long into signed long
  * @val		A 32-bit or 64-bit unsigned long
  * @long_bytes	The number of bytes used to represent a long number
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 19/26] x86/insn-eval: Add wrapper function for 16-bit and 32-bit address encodings
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (17 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 18/26] x86/insn-eval: Add support to resolve 16-bit addressing encodings Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-05-05 18:17 ` [PATCH v7 20/26] x86/cpufeature: Add User-Mode Instruction Prevention definitions Ricardo Neri
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

Convert the function insn_get_addr_ref() into a wrapper function that calls
the correct static address-decoding function depending on the address size.
In this way, callers do not need to worry about calling the correct
function, and the number of functions that need to be exposed decreases.

To this end, the function insn_get_addr_ref() used to obtain linear
addresses from the 32/64-bit encodings is renamed as get_addr_ref_32_64()
to reflect the type of address encodings that it handles.

Documentation is added to the new wrapper function and the documentation
for the 32/64-bit address decoding function is improved.

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/lib/insn-eval.c | 48 +++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 928a662..8914884 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -899,12 +899,22 @@ long get_mem_offset(struct pt_regs *regs, int reg_offset, int addr_size)
 		return -1L;
 	return offset;
 }
-/*
- * return the address being referenced be instruction
- * for rm=3 returning the content of the rm reg
- * for rm!=3 calculates the address using SIB and Disp
+
+/**
+ * get_addr_ref_32_64() - Obtain a 32/64-bit linear address
+ * @insn:	Instruction struct with ModRM and SIB bytes and displacement
+ * @regs:	Structure with register values as seen when entering kernel mode
+ *
+ * This function is to be used with 32-bit and 64-bit address encodings to
+ * obtain the effective memory address referred by the instruction's ModRM,
+ * SIB, and displacement bytes, as applicable. Also, the segment base is used
+ * to compute the linear address. In protected mode, segment limits are
+ * enforced.
+ *
+ * Return: linear address referenced by instruction and registers on success.
+ * -1L on failure.
  */
-void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
+static void __user *get_addr_ref_32_64(struct insn *insn, struct pt_regs *regs)
 {
 	unsigned long linear_addr, seg_base_addr, seg_limit;
 	long eff_addr, base, indx;
@@ -1026,3 +1036,31 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 out_err:
 	return (void __user *)-1;
 }
+
+/**
+ * insn_get_addr_ref() - Obtain the linear address referred by instruction
+ * @insn:	Instruction structure containing ModRM byte and displacement
+ * @regs:	Structure with register values as seen when entering kernel mode
+ *
+ * Obtain the memory address referred by the instruction's ModRM bytes and
+ * displacement. Also, the segment used as base is determined by either any
+ * segment override prefixes in insn or the default segment of the registers
+ * involved in the address computation. In protected mode, segment limits
+ * are enforced.
+ *
+ * Return: linear address referenced by instruction and registers on success.
+ * -1L on failure.
+ */
+void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
+{
+	switch (insn->addr_bytes) {
+	case 2:
+		return get_addr_ref_16(insn, regs);
+	case 4:
+		/* fall through */
+	case 8:
+		return get_addr_ref_32_64(insn, regs);
+	default:
+		return (void __user *)-1;
+	}
+}
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 20/26] x86/cpufeature: Add User-Mode Instruction Prevention definitions
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (18 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 19/26] x86/insn-eval: Add wrapper function for 16-bit and 32-bit address encodings Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-05-06  9:04   ` Paolo Bonzini
  2017-06-07 18:24   ` Borislav Petkov
  2017-05-05 18:17 ` [PATCH v7 21/26] x86: Add emulation code for UMIP instructions Ricardo Neri
                   ` (6 subsequent siblings)
  26 siblings, 2 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Tony Luck

User-Mode Instruction Prevention is a security feature present in new
Intel processors that, when enabled, prevents the execution of a subset
of instructions in user mode (CPL > 0). Attempting to execute such
instructions causes a general protection exception.

The subset of instructions comprises:

 * SGDT - Store Global Descriptor Table
 * SIDT - Store Interrupt Descriptor Table
 * SLDT - Store Local Descriptor Table
 * SMSW - Store Machine Status Word
 * STR  - Store Task Register

This feature is also added to the list of disabled-features to allow
a cleaner handling of build-time configuration.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Liang Z. Li <liang.z.li@intel.com>
Cc: Alexandre Julliard <julliard@winehq.org>
Cc: Stas Sergeev <stsp@list.ru>
Cc: x86@kernel.org
Cc: linux-msdos@vger.kernel.org

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/cpufeatures.h          | 1 +
 arch/x86/include/asm/disabled-features.h    | 8 +++++++-
 arch/x86/include/uapi/asm/processor-flags.h | 2 ++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 2701e5f..f1d61d2 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -289,6 +289,7 @@
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 16 */
 #define X86_FEATURE_AVX512VBMI  (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/
+#define X86_FEATURE_UMIP	(16*32+ 2) /* User Mode Instruction Protection */
 #define X86_FEATURE_PKU		(16*32+ 3) /* Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE	(16*32+ 4) /* OS Protection Keys Enable */
 #define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 5dff775..7adaef7 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -16,6 +16,12 @@
 # define DISABLE_MPX	(1<<(X86_FEATURE_MPX & 31))
 #endif
 
+#ifdef CONFIG_X86_INTEL_UMIP
+# define DISABLE_UMIP	0
+#else
+# define DISABLE_UMIP	(1<<(X86_FEATURE_UMIP & 31))
+#endif
+
 #ifdef CONFIG_X86_64
 # define DISABLE_VME		(1<<(X86_FEATURE_VME & 31))
 # define DISABLE_K6_MTRR	(1<<(X86_FEATURE_K6_MTRR & 31))
@@ -61,7 +67,7 @@
 #define DISABLED_MASK13	0
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
-#define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57)
+#define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18)
 
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index 567de50..d2c2af8 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -104,6 +104,8 @@
 #define X86_CR4_OSFXSR		_BITUL(X86_CR4_OSFXSR_BIT)
 #define X86_CR4_OSXMMEXCPT_BIT	10 /* enable unmasked SSE exceptions */
 #define X86_CR4_OSXMMEXCPT	_BITUL(X86_CR4_OSXMMEXCPT_BIT)
+#define X86_CR4_UMIP_BIT	11 /* enable UMIP support */
+#define X86_CR4_UMIP		_BITUL(X86_CR4_UMIP_BIT)
 #define X86_CR4_VMXE_BIT	13 /* enable VMX virtualization */
 #define X86_CR4_VMXE		_BITUL(X86_CR4_VMXE_BIT)
 #define X86_CR4_SMXE_BIT	14 /* enable safer mode (TXT) */
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 21/26] x86: Add emulation code for UMIP instructions
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (19 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 20/26] x86/cpufeature: Add User-Mode Instruction Prevention definitions Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-06-08 18:38   ` Borislav Petkov
  2017-05-05 18:17 ` [PATCH v7 22/26] x86/umip: Force a page fault when unable to copy emulated result to user Ricardo Neri
                   ` (5 subsequent siblings)
  26 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Tony Luck

The User-Mode Instruction Prevention feature present in recent Intel
processors prevents a group of instructions from being executed with
CPL > 0. If execution is attempted, a general protection fault is issued.

Rather than relaying this fault to user space (in the form of a SIGSEGV
signal), the instructions protected by UMIP can be emulated to provide
dummy results. This preserves the current kernel behavior without
revealing the system resources that UMIP intends to protect (the global
descriptor and interrupt descriptor tables, the segment selectors of the
local descriptor table, the task state and the machine status word).

This emulation is needed because certain applications (e.g., WineHQ and
DOSEMU2) rely on this subset of instructions to function.

The instructions protected by UMIP can be split into two groups: those
that return a kernel memory address (sgdt and sidt) and those that
return a value (sldt, str and smsw).

For the instructions that return a kernel memory address, applications
such as WineHQ rely on the result being located in the kernel memory space.
The result is emulated as a hard-coded value that lies close to the top
of the kernel memory. The limits for the GDT and the IDT are set to zero.

Given that sldt and str are not commonly used in programs supported by
WineHQ and DOSEMU2, they are not emulated.

The instruction smsw is emulated to return the value that the register
CR0 has at boot time, as set in head_32.S.

Care is taken to appropriately emulate the results when segmentation is
used. That is, rather than relying on USER_DS and USER_CS, the function
insn_get_addr_ref() inspects the segment descriptor pointed to by the
registers in pt_regs. This ensures that we correctly obtain the segment
base address and the address and operand sizes even if the user space
application uses a local descriptor table.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Liang Z. Li <liang.z.li@intel.com>
Cc: Alexandre Julliard <julliard@winehq.org>
Cc: Stas Sergeev <stsp@list.ru>
Cc: x86@kernel.org
Cc: linux-msdos@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/umip.h |  15 +++
 arch/x86/kernel/Makefile    |   1 +
 arch/x86/kernel/umip.c      | 245 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 261 insertions(+)
 create mode 100644 arch/x86/include/asm/umip.h
 create mode 100644 arch/x86/kernel/umip.c

diff --git a/arch/x86/include/asm/umip.h b/arch/x86/include/asm/umip.h
new file mode 100644
index 0000000..077b236
--- /dev/null
+++ b/arch/x86/include/asm/umip.h
@@ -0,0 +1,15 @@
+#ifndef _ASM_X86_UMIP_H
+#define _ASM_X86_UMIP_H
+
+#include <linux/types.h>
+#include <asm/ptrace.h>
+
+#ifdef CONFIG_X86_INTEL_UMIP
+bool fixup_umip_exception(struct pt_regs *regs);
+#else
+static inline bool fixup_umip_exception(struct pt_regs *regs)
+{
+	return false;
+}
+#endif  /* CONFIG_X86_INTEL_UMIP */
+#endif  /* _ASM_X86_UMIP_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 4b99423..cc1b7cc 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -123,6 +123,7 @@ obj-$(CONFIG_EFI)			+= sysfb_efi.o
 obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
 obj-$(CONFIG_TRACING)			+= tracepoint.o
 obj-$(CONFIG_SCHED_MC_PRIO)		+= itmt.o
+obj-$(CONFIG_X86_INTEL_UMIP)		+= umip.o
 
 ifdef CONFIG_FRAME_POINTER
 obj-y					+= unwind_frame.o
diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
new file mode 100644
index 0000000..c7c5795
--- /dev/null
+++ b/arch/x86/kernel/umip.c
@@ -0,0 +1,245 @@
+/*
+ * umip.c Emulation for instructions protected by the Intel User-Mode
+ * Instruction Prevention. The instructions are:
+ *    sgdt
+ *    sldt
+ *    sidt
+ *    str
+ *    smsw
+ *
+ * Copyright (c) 2017, Intel Corporation.
+ * Ricardo Neri <ricardo.neri@linux.intel.com>
+ */
+
+#include <linux/uaccess.h>
+#include <asm/umip.h>
+#include <asm/traps.h>
+#include <asm/insn.h>
+#include <asm/insn-eval.h>
+#include <linux/ratelimit.h>
+
+/*
+ * == Base addresses of GDT and IDT
+ * Some applications rely on finding the global descriptor table (GDT) and
+ * the interrupt descriptor table (IDT) in kernel memory in order to function.
+ * For x86_32, the selected values do not match any particular hole, but it
+ * suffices to provide a memory location within kernel memory.
+ *
+ * == CR0 flags for SMSW
+ * Use the flags given when booting, as found in head_32.S
+ */
+
+#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | X86_CR0_NE | \
+		   X86_CR0_WP | X86_CR0_AM)
+#define UMIP_DUMMY_GDT_BASE 0xfffe0000
+#define UMIP_DUMMY_IDT_BASE 0xffff0000
+
+enum umip_insn {
+	UMIP_SGDT = 0,	/* opcode 0f 01 ModR/M reg 0 */
+	UMIP_SIDT,	/* opcode 0f 01 ModR/M reg 1 */
+	UMIP_SLDT,	/* opcode 0f 00 ModR/M reg 0 */
+	UMIP_SMSW,	/* opcode 0f 01 ModR/M reg 4 */
+	UMIP_STR,	/* opcode 0f 00 ModR/M reg 1 */
+};
+
+/**
+ * __identify_insn() - Identify a UMIP-protected instruction
+ * @insn:	Instruction structure with opcode and ModRM byte.
+ *
+ * From the instruction opcode and the reg part of the ModRM byte, identify,
+ * if any, a UMIP-protected instruction.
+ *
+ * Return: an enumeration of a UMIP-protected instruction; -EINVAL on failure.
+ */
+static int __identify_insn(struct insn *insn)
+{
+	/* By getting modrm we also get the opcode. */
+	insn_get_modrm(insn);
+
+	/* All the instructions of interest start with 0x0f. */
+	if (insn->opcode.bytes[0] != 0xf)
+		return -EINVAL;
+
+	if (insn->opcode.bytes[1] == 0x1) {
+		switch (X86_MODRM_REG(insn->modrm.value)) {
+		case 0:
+			return UMIP_SGDT;
+		case 1:
+			return UMIP_SIDT;
+		case 4:
+			return UMIP_SMSW;
+		default:
+			return -EINVAL;
+		}
+	}
+	/* SLDT and STR are not emulated */
+	return -EINVAL;
+}
+
+/**
+ * __emulate_umip_insn() - Emulate UMIP instructions with dummy values
+ * @insn:	Instruction structure with ModRM byte
+ * @umip_inst:	Instruction to emulate
+ * @data:	Buffer onto which the dummy values will be copied
+ * @data_size:	Size of the emulated result
+ *
+ * Emulate an instruction protected by UMIP. The result of the emulation
+ * is saved in the provided buffer. The size of the results depends on both
+ * the instruction and type of operand (register vs memory address). Thus,
+ * the size of the result needs to be updated.
+ *
+ * Result: 0 if success, -EINVAL on failure to emulate
+ */
+static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst,
+			       unsigned char *data, int *data_size)
+{
+	unsigned long dummy_base_addr;
+	unsigned short dummy_limit = 0;
+	unsigned int dummy_value = 0;
+
+	switch (umip_inst) {
+	/*
+	 * These two instructions return the base address and limit of the
+	 * global and interrupt descriptor table. The base address can be
+	 * 24-bit, 32-bit or 64-bit. Limit is always 16-bit. If the operand
+	 * size is 16-bit the returned value of the base address is supposed
+	 * to be a zero-extended 24-bit number. However, it seems that a
+	 * 32-bit number is always returned in legacy protected mode
+	 * irrespective of the operand size.
+	 */
+	case UMIP_SGDT:
+		/* fall through */
+	case UMIP_SIDT:
+		if (umip_inst == UMIP_SGDT)
+			dummy_base_addr = UMIP_DUMMY_GDT_BASE;
+		else
+			dummy_base_addr = UMIP_DUMMY_IDT_BASE;
+		if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+			/* SGDT and SIDT do not take register as argument. */
+			return -EINVAL;
+		}
+
+		memcpy(data + 2, &dummy_base_addr, sizeof(dummy_base_addr));
+		memcpy(data, &dummy_limit, sizeof(dummy_limit));
+		*data_size = sizeof(dummy_base_addr) + sizeof(dummy_limit);
+		break;
+	case UMIP_SMSW:
+		dummy_value = CR0_STATE;
+		/*
+		 * Even though CR0_STATE contains 4 bytes, the number
+		 * of bytes to be copied in the result buffer is determined
+		 * by whether the operand is a register or a memory location.
+		 * If the operand is a register, copy as many bytes as the
+		 * operand size; if it is memory, copy only the two least
+		 * significant bytes of CR0.
+		 */
+		if (X86_MODRM_MOD(insn->modrm.value) == 3)
+			*data_size = insn->opnd_bytes;
+		else
+			*data_size = 2;
+		memcpy(data, &dummy_value, *data_size);
+		break;
+	/*
+	 * SLDT and STR return a 16-bit value; they are not emulated.
+	 * A dummy result of all zeros would be equivalent to a null
+	 * descriptor.
+	 */
+	case UMIP_SLDT:
+	case UMIP_STR:
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/**
+ * fixup_umip_exception() - Fixup #GP faults caused by UMIP
+ * @regs:	Registers as saved when entering the #GP trap
+ *
+ * The instructions sgdt, sidt, str, smsw and sldt cause a general protection
+ * fault if executed with CPL > 0 (i.e., from user space). This function can be
+ * used to emulate the results of the aforementioned instructions with
+ * dummy values. Results are copied to user-space memory as indicated by
+ * the instruction pointed by EIP using the registers indicated in the
+ * instruction operands. This function also takes care of determining
+ * the address to which the results must be copied.
+ */
+bool fixup_umip_exception(struct pt_regs *regs)
+{
+	struct insn insn;
+	unsigned char buf[MAX_INSN_SIZE];
+	/* 10 bytes is the maximum size of the result of UMIP instructions */
+	unsigned char dummy_data[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
+	unsigned long seg_base;
+	int not_copied, nr_copied, reg_offset, dummy_data_size;
+	void __user *uaddr;
+	unsigned long *reg_addr;
+	enum umip_insn umip_inst;
+	struct insn_code_seg_defaults seg_defs;
+
+	/*
+	 * Use the segment base in case user space used a different code
+	 * segment, either in protected (e.g., from an LDT) or virtual-8086
+	 * modes. In most cases seg_base will be zero, as it is with USER_CS.
+	 */
+	seg_base = insn_get_seg_base(regs, &insn,
+				     offsetof(struct pt_regs, ip));
+	not_copied = copy_from_user(buf, (void __user *)(seg_base + regs->ip),
+				    sizeof(buf));
+	nr_copied = sizeof(buf) - not_copied;
+	/*
+	 * The copy_from_user above could have failed if user code is protected
+	 * by a memory protection key. Give up on emulation in such a case.
+	 * Should we issue a page fault?
+	 */
+	if (!nr_copied)
+		return false;
+
+	insn_init(&insn, buf, nr_copied, user_64bit_mode(regs));
+
+	/*
+	 * Override the default operand and address sizes to what is specified
+	 * in the code segment descriptor. The instruction decoder only sets
+	 * the address size to either 4 or 8 address bytes and does nothing
+	 * for the operand bytes. This is OK for most of the cases, but we could
+	 * have special cases where, for instance, a 16-bit code segment
+	 * descriptor is used.
+	 * If there are overrides, the instruction decoder correctly updates
+	 * these values, even for 16-bit defaults.
+	 */
+	seg_defs = insn_get_code_seg_defaults(regs);
+	insn.addr_bytes = seg_defs.address_bytes;
+	insn.opnd_bytes = seg_defs.operand_bytes;
+
+	if (!insn.addr_bytes || !insn.opnd_bytes)
+		return false;
+
+	if (user_64bit_mode(regs))
+		return false;
+
+	insn_get_length(&insn);
+	if (nr_copied < insn.length)
+		return false;
+
+	umip_inst = __identify_insn(&insn);
+	/* Check if we found an instruction protected by UMIP */
+	if (umip_inst < 0)
+		return false;
+
+	if (__emulate_umip_insn(&insn, umip_inst, dummy_data, &dummy_data_size))
+		return false;
+
+	/* If operand is a register, write directly to it */
+	if (X86_MODRM_MOD(insn.modrm.value) == 3) {
+		reg_offset = insn_get_modrm_rm_off(&insn, regs);
+		reg_addr = (unsigned long *)((unsigned long)regs + reg_offset);
+		memcpy(reg_addr, dummy_data, dummy_data_size);
+	} else {
+		uaddr = insn_get_addr_ref(&insn, regs);
+		/* user address could not be determined, abort emulation */
+		if ((unsigned long)uaddr == -1L)
+			return false;
+		nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
+		if (nr_copied > 0)
+			return false;
+	}
+
+	/* increase IP to let the program keep going */
+	regs->ip += insn.length;
+	return true;
+}
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 22/26] x86/umip: Force a page fault when unable to copy emulated result to user
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (20 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 21/26] x86: Add emulation code for UMIP instructions Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-06-09 11:02   ` Borislav Petkov
  2017-05-05 18:17 ` [PATCH v7 23/26] x86/traps: Fixup general protection faults caused by UMIP Ricardo Neri
                   ` (4 subsequent siblings)
  26 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Tony Luck

fixup_umip_exception() will be called from do_general_protection(). If the
former returns false, the latter will issue a SIGSEGV with SEND_SIG_PRIV.
However, when emulation is successful but the emulated result cannot be
copied to user space memory, it is more accurate to issue a SIGSEGV with
SEGV_MAPERR and the offending address. A new function, inspired by
force_sig_info_fault(), is introduced to model the page fault.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Liang Z. Li <liang.z.li@intel.com>
Cc: Alexandre Julliard <julliard@winehq.org>
Cc: Stas Sergeev <stsp@list.ru>
Cc: x86@kernel.org
Cc: linux-msdos@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/kernel/umip.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index c7c5795..ff7366a 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -148,6 +148,41 @@ static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst,
 }
 
 /**
+ * __force_sig_info_umip_fault() - Force a SIGSEGV with SEGV_MAPERR
+ * @address:	Address that caused the signal
+ * @regs:	Register set containing the instruction pointer
+ *
+ * Force a SIGSEGV signal with SEGV_MAPERR as the error code. This function is
+ * intended to be used to provide a segmentation fault when the result of the
+ * UMIP emulation could not be copied to the user space memory.
+ *
+ * Return: none
+ */
+static void __force_sig_info_umip_fault(void __user *address,
+					struct pt_regs *regs)
+{
+	siginfo_t info;
+	struct task_struct *tsk = current;
+
+	if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV)) {
+		printk_ratelimited("%s[%d] umip emulation segfault ip:%lx sp:%lx error:%x in %lx\n",
+				   tsk->comm, task_pid_nr(tsk), regs->ip,
+				   regs->sp, X86_PF_USER | X86_PF_WRITE,
+				   regs->ip);
+	}
+
+	tsk->thread.cr2		= (unsigned long)address;
+	tsk->thread.error_code	= X86_PF_USER | X86_PF_WRITE;
+	tsk->thread.trap_nr	= X86_TRAP_PF;
+
+	info.si_signo	= SIGSEGV;
+	info.si_errno	= 0;
+	info.si_code	= SEGV_MAPERR;
+	info.si_addr	= address;
+	force_sig_info(SIGSEGV, &info, tsk);
+}
+
+/**
  * fixup_umip_exception() - Fixup #GP faults caused by UMIP
  * @regs:	Registers as saved when entering the #GP trap
  *
@@ -235,8 +270,14 @@ bool fixup_umip_exception(struct pt_regs *regs)
 		if ((unsigned long)uaddr == -1L)
 			return false;
 		nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
-		if (nr_copied  > 0)
-			return false;
+		if (nr_copied  > 0) {
+			/*
+			 * If copy fails, send a signal and tell caller that
+			 * fault was fixed up
+			 */
+			__force_sig_info_umip_fault(uaddr, regs);
+			return true;
+		}
 	}
 
 	/* increase IP to let the program keep going */
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 23/26] x86/traps: Fixup general protection faults caused by UMIP
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (21 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 22/26] x86/umip: Force a page fault when unable to copy emulated result to user Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-06-09 13:02   ` Borislav Petkov
  2017-05-05 18:17 ` [PATCH v7 24/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (3 subsequent siblings)
  26 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Tony Luck

If the User-Mode Instruction Prevention CPU feature is available and
enabled, a general protection fault will be issued if the instructions
sgdt, sldt, sidt, str or smsw are executed from a user-mode context
(CPL > 0). If the fault was caused by any of the instructions protected
by UMIP, fixup_umip_exception() will emulate dummy results for these
instructions. If emulation is successful, the result is passed to the
user space program and no SIGSEGV signal is emitted.

Please note that fixup_umip_exception() also covers the case in which
the fault originated while running in virtual-8086 mode.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Liang Z. Li <liang.z.li@intel.com>
Cc: Alexandre Julliard <julliard@winehq.org>
Cc: Stas Sergeev <stsp@list.ru>
Cc: x86@kernel.org
Cc: linux-msdos@vger.kernel.org
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/kernel/traps.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 3995d3a..cec548d 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -65,6 +65,7 @@
 #include <asm/trace/mpx.h>
 #include <asm/mpx.h>
 #include <asm/vm86.h>
+#include <asm/umip.h>
 
 #ifdef CONFIG_X86_64
 #include <asm/x86_init.h>
@@ -526,6 +527,9 @@ do_general_protection(struct pt_regs *regs, long error_code)
 	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
 	cond_local_irq_enable(regs);
 
+	if (user_mode(regs) && fixup_umip_exception(regs))
+		return;
+
 	if (v8086_mode(regs)) {
 		local_irq_enable();
 		handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 24/26] x86: Enable User-Mode Instruction Prevention
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (22 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 23/26] x86/traps: Fixup general protection faults caused by UMIP Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-06-09 16:10   ` Borislav Petkov
  2017-05-05 18:17 ` [PATCH v7 25/26] selftests/x86: Add tests for " Ricardo Neri
                   ` (2 subsequent siblings)
  26 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Tony Luck

User-Mode Instruction Prevention (UMIP) is enabled by setting/clearing a
bit in %cr4.

It makes sense to enable UMIP at some point while booting, before user
space comes up. Like SMAP and SMEP, it is not critical to have it enabled
very early during boot, because UMIP is relevant only once there is a
user space to protect against. Given this similarity in relevance, it
makes sense to enable UMIP along with SMAP and SMEP.

UMIP is enabled by default. It can be disabled by adding clearcpuid=514
to the kernel parameters.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Liang Z. Li <liang.z.li@intel.com>
Cc: Alexandre Julliard <julliard@winehq.org>
Cc: Stas Sergeev <stsp@list.ru>
Cc: x86@kernel.org
Cc: linux-msdos@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/Kconfig             | 10 ++++++++++
 arch/x86/kernel/cpu/common.c | 16 +++++++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 702002b..1b1bbeb 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1745,6 +1745,16 @@ config X86_SMAP
 
 	  If unsure, say Y.
 
+config X86_INTEL_UMIP
+	def_bool y
+	depends on CPU_SUP_INTEL
+	prompt "Intel User Mode Instruction Prevention" if EXPERT
+	---help---
+	  The User Mode Instruction Prevention (UMIP) is a security
+	  feature in newer Intel processors. If enabled, a general
+	  protection fault is issued if the instructions SGDT, SLDT,
+	  SIDT, SMSW and STR are executed in user mode.
+
 config X86_INTEL_MPX
 	prompt "Intel MPX (Memory Protection Extensions)"
 	def_bool n
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8ee3211..66ebded 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -311,6 +311,19 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c)
 	}
 }
 
+static __always_inline void setup_umip(struct cpuinfo_x86 *c)
+{
+	if (cpu_feature_enabled(X86_FEATURE_UMIP) &&
+	    cpu_has(c, X86_FEATURE_UMIP))
+		cr4_set_bits(X86_CR4_UMIP);
+	else
+		/*
+		 * Make sure UMIP is disabled in case it was enabled in a
+		 * previous boot (e.g., via kexec).
+		 */
+		cr4_clear_bits(X86_CR4_UMIP);
+}
+
 /*
  * Protection Keys are not available in 32-bit mode.
  */
@@ -1121,9 +1134,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	/* Disable the PN if appropriate */
 	squash_the_stupid_serial_number(c);
 
-	/* Set up SMEP/SMAP */
+	/* Set up SMEP/SMAP/UMIP */
 	setup_smep(c);
 	setup_smap(c);
+	setup_umip(c);
 
 	/*
 	 * The vendor-specific functions might have changed features.
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 25/26] selftests/x86: Add tests for User-Mode Instruction Prevention
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (23 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 24/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-05-05 18:17 ` [PATCH v7 26/26] selftests/x86: Add tests for instruction str and sldt Ricardo Neri
  2017-05-17 18:42 ` [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
  26 siblings, 0 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri

Certain user space programs that run in virtual-8086 mode may utilize
instructions protected by the User-Mode Instruction Prevention (UMIP)
security feature present in new Intel processors: SGDT, SIDT and SMSW. In
such a case, a general protection fault is issued if UMIP is enabled. When
such a fault happens, the kernel traps it and emulates the results of
these instructions with dummy values. The purpose of this new
test is to verify whether the impacted instructions can be executed
without causing such a #GP. If no #GP exceptions occur, we expect to exit
virtual-8086 mode via INT3.

The instructions protected by UMIP are executed in representative use
cases:
 a) displacement-only memory addressing
 b) register-indirect memory addressing
 c) results stored directly in operands

Unfortunately, it is not possible to check the results against a set of
expected values because no emulation will occur in systems that do not
have the UMIP feature. Instead, results are printed for verification. A
simple check is done to ensure that the results of all tests are
identical.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 tools/testing/selftests/x86/entry_from_vm86.c | 73 ++++++++++++++++++++++++++-
 1 file changed, 72 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/entry_from_vm86.c b/tools/testing/selftests/x86/entry_from_vm86.c
index d075ea0..130e8ad 100644
--- a/tools/testing/selftests/x86/entry_from_vm86.c
+++ b/tools/testing/selftests/x86/entry_from_vm86.c
@@ -95,6 +95,22 @@ asm (
 	"int3\n\t"
 	"vmcode_int80:\n\t"
 	"int $0x80\n\t"
+	"vmcode_umip:\n\t"
+	/* addressing via displacements */
+	"smsw (2052)\n\t"
+	"sidt (2054)\n\t"
+	"sgdt (2060)\n\t"
+	/* addressing via registers */
+	"mov $2066, %bx\n\t"
+	"smsw (%bx)\n\t"
+	"mov $2068, %bx\n\t"
+	"sidt (%bx)\n\t"
+	"mov $2074, %bx\n\t"
+	"sgdt (%bx)\n\t"
+	/* register operands, only for smsw */
+	"smsw %ax\n\t"
+	"mov %ax, (2080)\n\t"
+	"int3\n\t"
 	".size vmcode, . - vmcode\n\t"
 	"end_vmcode:\n\t"
 	".code32\n\t"
@@ -103,7 +119,7 @@ asm (
 
 extern unsigned char vmcode[], end_vmcode[];
 extern unsigned char vmcode_bound[], vmcode_sysenter[], vmcode_syscall[],
-	vmcode_sti[], vmcode_int3[], vmcode_int80[];
+	vmcode_sti[], vmcode_int3[], vmcode_int80[], vmcode_umip[];
 
 /* Returns false if the test was skipped. */
 static bool do_test(struct vm86plus_struct *v86, unsigned long eip,
@@ -160,6 +176,58 @@ static bool do_test(struct vm86plus_struct *v86, unsigned long eip,
 	return true;
 }
 
+void do_umip_tests(struct vm86plus_struct *vm86, unsigned char *test_mem)
+{
+	struct table_desc {
+		unsigned short limit;
+		unsigned long base;
+	} __attribute__((packed));
+
+	/* Initialize variables with arbitrary values */
+	struct table_desc gdt1 = { .base = 0x3c3c3c3c, .limit = 0x9999 };
+	struct table_desc gdt2 = { .base = 0x1a1a1a1a, .limit = 0xaeae };
+	struct table_desc idt1 = { .base = 0x7b7b7b7b, .limit = 0xf1f1 };
+	struct table_desc idt2 = { .base = 0x89898989, .limit = 0x1313 };
+	unsigned short msw1 = 0x1414, msw2 = 0x2525, msw3 = 3737;
+
+	/* UMIP -- exit with INT3 unless the kernel fails to emulate the #GP */
+	do_test(vm86, vmcode_umip - vmcode, VM86_TRAP, 3, "UMIP tests");
+
+	/* Results from displacement-only addressing */
+	msw1 = *(unsigned short *)(test_mem + 2052);
+	memcpy(&idt1, test_mem + 2054, sizeof(idt1));
+	memcpy(&gdt1, test_mem + 2060, sizeof(gdt1));
+
+	/* Results from register-indirect addressing */
+	msw2 = *(unsigned short *)(test_mem + 2066);
+	memcpy(&idt2, test_mem + 2068, sizeof(idt2));
+	memcpy(&gdt2, test_mem + 2074, sizeof(gdt2));
+
+	/* Results when using register operands */
+	msw3 = *(unsigned short *)(test_mem + 2080);
+
+	printf("[INFO]\tResult from SMSW:[0x%04x]\n", msw1);
+	printf("[INFO]\tResult from SIDT: limit[0x%04x]base[0x%08lx]\n",
+	       idt1.limit, idt1.base);
+	printf("[INFO]\tResult from SGDT: limit[0x%04x]base[0x%08lx]\n",
+	       gdt1.limit, gdt1.base);
+
+	if ((msw1 != msw2) || (msw1 != msw3))
+		printf("[FAIL]\tAll the results of SMSW should be the same.\n");
+	else
+		printf("[PASS]\tAll the results from SMSW are identical.\n");
+
+	if (memcmp(&gdt1, &gdt2, sizeof(gdt1)))
+		printf("[FAIL]\tAll the results of SGDT should be the same.\n");
+	else
+		printf("[PASS]\tAll the results from SGDT are identical.\n");
+
+	if (memcmp(&idt1, &idt2, sizeof(idt1)))
+		printf("[FAIL]\tAll the results of SIDT should be the same.\n");
+	else
+		printf("[PASS]\tAll the results from SIDT are identical.\n");
+}
+
 int main(void)
 {
 	struct vm86plus_struct v86;
@@ -218,6 +286,9 @@ int main(void)
 	v86.regs.eax = (unsigned int)-1;
 	do_test(&v86, vmcode_int80 - vmcode, VM86_INTx, 0x80, "int80");
 
+	/* UMIP -- the tests should exit with INT3 unless emulation fails */
+	do_umip_tests(&v86, addr);
+
 	/* Execute a null pointer */
 	v86.regs.cs = 0;
 	v86.regs.ss = 0;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH v7 26/26] selftests/x86: Add tests for instruction str and sldt
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (24 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 25/26] selftests/x86: Add tests for " Ricardo Neri
@ 2017-05-05 18:17 ` Ricardo Neri
  2017-05-17 18:42 ` [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
  26 siblings, 0 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:17 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri

The instructions str and sldt are not recognized in virtual-8086 mode
and generate an invalid-opcode exception. These two instructions are
protected by the Intel User-Mode Instruction Prevention (UMIP) security
feature. In protected mode, if UMIP is enabled, these instructions
generate a general protection fault if executed with CPL > 0. Linux
traps the general protection fault and emulates the results with dummy
values.

These tests verify that the emulation code does not emulate these two
instructions but instead lets the expected invalid-opcode exception be
delivered.

The tests fall back to exiting with INT3 in case emulation does happen.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 tools/testing/selftests/x86/entry_from_vm86.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/entry_from_vm86.c b/tools/testing/selftests/x86/entry_from_vm86.c
index 130e8ad..b7a0c90 100644
--- a/tools/testing/selftests/x86/entry_from_vm86.c
+++ b/tools/testing/selftests/x86/entry_from_vm86.c
@@ -111,6 +111,11 @@ asm (
 	"smsw %ax\n\t"
 	"mov %ax, (2080)\n\t"
 	"int3\n\t"
+	"vmcode_umip_str:\n\t"
+	"str %eax\n\t"
+	"vmcode_umip_sldt:\n\t"
+	"sldt %eax\n\t"
+	"int3\n\t"
 	".size vmcode, . - vmcode\n\t"
 	"end_vmcode:\n\t"
 	".code32\n\t"
@@ -119,7 +124,8 @@ asm (
 
 extern unsigned char vmcode[], end_vmcode[];
 extern unsigned char vmcode_bound[], vmcode_sysenter[], vmcode_syscall[],
-	vmcode_sti[], vmcode_int3[], vmcode_int80[], vmcode_umip[];
+	vmcode_sti[], vmcode_int3[], vmcode_int80[], vmcode_umip[],
+	vmcode_umip_str[], vmcode_umip_sldt[];
 
 /* Returns false if the test was skipped. */
 static bool do_test(struct vm86plus_struct *v86, unsigned long eip,
@@ -226,6 +232,16 @@ void do_umip_tests(struct vm86plus_struct *vm86, unsigned char *test_mem)
 		printf("[FAIL]\tAll the results of SIDT should be the same.\n");
 	else
 		printf("[PASS]\tAll the results from SIDT are identical.\n");
+
+	sethandler(SIGILL, sighandler, 0);
+	do_test(vm86, vmcode_umip_str - vmcode, VM86_SIGNAL, 0,
+		"STR instruction");
+	clearhandler(SIGILL);
+
+	sethandler(SIGILL, sighandler, 0);
+	do_test(vm86, vmcode_umip_sldt - vmcode, VM86_SIGNAL, 0,
+		"SLDT instruction");
+	clearhandler(SIGILL);
 }
 
 int main(void)
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: [PATCH v7 20/26] x86/cpufeature: Add User-Mode Instruction Prevention definitions
  2017-05-05 18:17 ` [PATCH v7 20/26] x86/cpufeature: Add User-Mode Instruction Prevention definitions Ricardo Neri
@ 2017-05-06  9:04   ` Paolo Bonzini
  2017-05-11  3:23     ` Ricardo Neri
  2017-06-07 18:24   ` Borislav Petkov
  1 sibling, 1 reply; 81+ messages in thread
From: Paolo Bonzini @ 2017-05-06  9:04 UTC (permalink / raw)
  To: Ricardo Neri, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Tony Luck



On 05/05/2017 20:17, Ricardo Neri wrote:
> User-Mode Instruction Prevention is a security feature present in new
> Intel processors that, when set, prevents the execution of a subset of
> instructions if such instructions are executed in user mode (CPL > 0).
> Attempting to execute such instructions causes a general protection
> exception.
> 
> The subset of instructions comprises:
> 
>  * SGDT - Store Global Descriptor Table
>  * SIDT - Store Interrupt Descriptor Table
>  * SLDT - Store Local Descriptor Table
>  * SMSW - Store Machine Status Word
>  * STR  - Store Task Register
> 
> This feature is also added to the list of disabled-features to allow
> a cleaner handling of build-time configuration.
> 
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Brian Gerst <brgerst@gmail.com>
> Cc: Chen Yucong <slaoub@gmail.com>
> Cc: Chris Metcalf <cmetcalf@mellanox.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Huang Rui <ray.huang@amd.com>
> Cc: Jiri Slaby <jslaby@suse.cz>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Liang Z. Li <liang.z.li@intel.com>
> Cc: Alexandre Julliard <julliard@winehq.org>
> Cc: Stas Sergeev <stsp@list.ru>
> Cc: x86@kernel.org
> Cc: linux-msdos@vger.kernel.org
> 
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

Would it be possible to have this patch in a topic branch for KVM's
consumption?

Thanks,

Paolo

> ---
>  arch/x86/include/asm/cpufeatures.h          | 1 +
>  arch/x86/include/asm/disabled-features.h    | 8 +++++++-
>  arch/x86/include/uapi/asm/processor-flags.h | 2 ++
>  3 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 2701e5f..f1d61d2 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -289,6 +289,7 @@
>  
>  /* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 16 */
>  #define X86_FEATURE_AVX512VBMI  (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/
> +#define X86_FEATURE_UMIP	(16*32+ 2) /* User Mode Instruction Protection */
>  #define X86_FEATURE_PKU		(16*32+ 3) /* Protection Keys for Userspace */
>  #define X86_FEATURE_OSPKE	(16*32+ 4) /* OS Protection Keys Enable */
>  #define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */
> diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
> index 5dff775..7adaef7 100644
> --- a/arch/x86/include/asm/disabled-features.h
> +++ b/arch/x86/include/asm/disabled-features.h
> @@ -16,6 +16,12 @@
>  # define DISABLE_MPX	(1<<(X86_FEATURE_MPX & 31))
>  #endif
>  
> +#ifdef CONFIG_X86_INTEL_UMIP
> +# define DISABLE_UMIP	0
> +#else
> +# define DISABLE_UMIP	(1<<(X86_FEATURE_UMIP & 31))
> +#endif
> +
>  #ifdef CONFIG_X86_64
>  # define DISABLE_VME		(1<<(X86_FEATURE_VME & 31))
>  # define DISABLE_K6_MTRR	(1<<(X86_FEATURE_K6_MTRR & 31))
> @@ -61,7 +67,7 @@
>  #define DISABLED_MASK13	0
>  #define DISABLED_MASK14	0
>  #define DISABLED_MASK15	0
> -#define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57)
> +#define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP)
>  #define DISABLED_MASK17	0
>  #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18)
>  
> diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
> index 567de50..d2c2af8 100644
> --- a/arch/x86/include/uapi/asm/processor-flags.h
> +++ b/arch/x86/include/uapi/asm/processor-flags.h
> @@ -104,6 +104,8 @@
>  #define X86_CR4_OSFXSR		_BITUL(X86_CR4_OSFXSR_BIT)
>  #define X86_CR4_OSXMMEXCPT_BIT	10 /* enable unmasked SSE exceptions */
>  #define X86_CR4_OSXMMEXCPT	_BITUL(X86_CR4_OSXMMEXCPT_BIT)
> +#define X86_CR4_UMIP_BIT	11 /* enable UMIP support */
> +#define X86_CR4_UMIP		_BITUL(X86_CR4_UMIP_BIT)
>  #define X86_CR4_VMXE_BIT	13 /* enable VMX virtualization */
>  #define X86_CR4_VMXE		_BITUL(X86_CR4_VMXE_BIT)
>  #define X86_CR4_SMXE_BIT	14 /* enable safer mode (TXT) */
> 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v7 20/26] x86/cpufeature: Add User-Mode Instruction Prevention definitions
  2017-05-06  9:04   ` Paolo Bonzini
@ 2017-05-11  3:23     ` Ricardo Neri
  0 siblings, 0 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-05-11  3:23 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov, Peter Zijlstra, Andrew Morton, Brian Gerst,
	Chris Metcalf, Dave Hansen, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Tony Luck

On Sat, 2017-05-06 at 11:04 +0200, Paolo Bonzini wrote:
> 
> 
> On 05/05/2017 20:17, Ricardo Neri wrote:
> > User-Mode Instruction Prevention is a security feature present in
> new
> > Intel processors that, when set, prevents the execution of a subset
> of
> > instructions if such instructions are executed in user mode (CPL >
> 0).
> > Attempting to execute such instructions causes a general protection
> > exception.
> > 
> > The subset of instructions comprises:
> > 
> >  * SGDT - Store Global Descriptor Table
> >  * SIDT - Store Interrupt Descriptor Table
> >  * SLDT - Store Local Descriptor Table
> >  * SMSW - Store Machine Status Word
> >  * STR  - Store Task Register
> > 
> > This feature is also added to the list of disabled-features to allow
> > a cleaner handling of build-time configuration.
> > 
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: H. Peter Anvin <hpa@zytor.com>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Brian Gerst <brgerst@gmail.com>
> > Cc: Chen Yucong <slaoub@gmail.com>
> > Cc: Chris Metcalf <cmetcalf@mellanox.com>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Fenghua Yu <fenghua.yu@intel.com>
> > Cc: Huang Rui <ray.huang@amd.com>
> > Cc: Jiri Slaby <jslaby@suse.cz>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: Shuah Khan <shuah@kernel.org>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Tony Luck <tony.luck@intel.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Liang Z. Li <liang.z.li@intel.com>
> > Cc: Alexandre Julliard <julliard@winehq.org>
> > Cc: Stas Sergeev <stsp@list.ru>
> > Cc: x86@kernel.org
> > Cc: linux-msdos@vger.kernel.org
> > 
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> 
> Would it be possible to have this patch in a topic branch for KVM's
> consumption?
> 
I have put a branch here with this single patch:

https://github.com/ricardon/tip.git rneri/umip_for_kvm

This is based on Linux v4.11. Please let me know if this works for you
or if you'd prefer it to be based on a different branch/commit/repo.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention
  2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (25 preceding siblings ...)
  2017-05-05 18:17 ` [PATCH v7 26/26] selftests/x86: Add tests for instruction str and sldt Ricardo Neri
@ 2017-05-17 18:42 ` Ricardo Neri
  2017-05-27  3:49   ` Neri, Ricardo
  26 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-05-17 18:42 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: H. Peter Anvin, Andy Lutomirski, Borislav Petkov, Peter Zijlstra,
	Andrew Morton, Brian Gerst, Chris Metcalf, Dave Hansen,
	Paolo Bonzini, Liang Z Li, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel

Hi Ingo, Thomas,

On Fri, 2017-05-05 at 11:16 -0700, Ricardo Neri wrote:
> This is v7 of this series. The six previous submissions can be found
> here [1], here [2], here[3], here[4], here[5] and here[6]. This
> version
> addresses the comments received in v6 plus improvements of the
> handling
> of exceptions unrelated to UMIP as well as corner cases in
> virtual-8086
> mode. Please see details in the change log.

Since there have been no further comments on this version, and if this
series looks good to you, could it be considered for merging into the
tip tree?

The only remaining item is a cleanup patch that Borislav Petkov
suggested [1]. I could work on it incrementally on top of this series.

Thanks and BR,
Ricardo

[1]. https://lkml.org/lkml/2017/5/4/244


* Re: [PATCH v7 01/26] ptrace,x86: Make user_64bit_mode() available to 32-bit builds
  2017-05-05 18:16 ` [PATCH v7 01/26] ptrace,x86: Make user_64bit_mode() available to 32-bit builds Ricardo Neri
@ 2017-05-21 14:19   ` Borislav Petkov
  0 siblings, 0 replies; 81+ messages in thread
From: Borislav Petkov @ 2017-05-21 14:19 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Adam Buchbinder,
	Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

On Fri, May 05, 2017 at 11:16:59AM -0700, Ricardo Neri wrote:
> In its current form, user_64bit_mode() can only be used when CONFIG_X86_64
> is selected. This implies that code built with CONFIG_X86_64=n cannot use
> it. If a piece of code needs to be built for both CONFIG_X86_64=y and
> CONFIG_X86_64=n and wants to use this function, it needs to wrap the call
> in an #ifdef/#endif pair, potentially in multiple places.
> 
> This can be easily avoided with a single #ifdef/#endif pair within
> user_64bit_mode() itself.
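
[For readers following along: the single-#ifdef pattern described above can be
sketched in plain userspace C. The struct and the selector value below are
hypothetical stand-ins for illustration, not the kernel's definitions.]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct pt_regs; the real kernel type differs. */
struct pt_regs_sketch {
	unsigned long cs;
};

/* Assumed 64-bit user code segment selector, for illustration only. */
#define USER_CS_64 0x33

/*
 * The #ifdef lives inside the helper exactly once, so code built with
 * CONFIG_X86_64=n can still call it without wrapping each call site.
 */
static inline bool user_64bit_mode_sketch(struct pt_regs_sketch *regs)
{
#ifdef CONFIG_X86_64
	return regs->cs == USER_CS_64;
#else
	/* A 32-bit kernel never runs 64-bit user code. */
	return false;
#endif
}
```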
> 
> Suggested-by: Borislav Petkov <bp@suse.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> Cc: Colin Ian King <colin.king@canonical.com>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Thomas Garnier <thgarnie@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: x86@kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/include/asm/ptrace.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v7 02/26] x86/mm: Relocate page fault error codes to traps.h
  2017-05-05 18:17 ` [PATCH v7 02/26] x86/mm: Relocate page fault error codes to traps.h Ricardo Neri
@ 2017-05-21 14:23   ` Borislav Petkov
  2017-05-27  3:40     ` Ricardo Neri
  0 siblings, 1 reply; 81+ messages in thread
From: Borislav Petkov @ 2017-05-21 14:23 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Kirill A. Shutemov,
	Josh Poimboeuf

On Fri, May 05, 2017 at 11:17:00AM -0700, Ricardo Neri wrote:
> Up to this point, only fault.c used the definitions of the page fault error
> codes. Thus, it made sense to keep them within such file. Other portions of
> code might be interested in those definitions too. For instance, the User-
> Mode Instruction Prevention emulation code will use such definitions to
> emulate a page fault when it is unable to successfully copy the results
> of the emulated instructions to user space.
> 
> While relocating the error code enumeration, the prefix X86_ is used to
> make it consistent with the rest of the definitions in traps.h. Of course,
> code using the enumeration had to be updated as well. No functional changes
> were performed.
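
[A userspace sketch of the renamed constants, for readers following along.
The bit positions follow the architectural page fault error code layout; the
authoritative definitions are the ones this patch moves into traps.h.]

```c
#include <assert.h>

/* Sketch of the relocated page fault error code bits with the X86_ prefix. */
enum x86_pf_error_code_sketch {
	X86_PF_PROT	= 1 << 0,	/* fault on a present page */
	X86_PF_WRITE	= 1 << 1,	/* fault was a write access */
	X86_PF_USER	= 1 << 2,	/* fault originated in user mode */
	X86_PF_RSVD	= 1 << 3,	/* reserved bit set in a page table */
	X86_PF_INSTR	= 1 << 4,	/* fault on an instruction fetch */
};

/* Testing the bit directly reads better than comparing "== 0". */
static int is_kernel_fault(unsigned long error_code)
{
	return !(error_code & X86_PF_USER);
}
```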
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> Cc: x86@kernel.org
> Reviewed-by: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/include/asm/traps.h | 18 +++++++++
>  arch/x86/mm/fault.c          | 88 +++++++++++++++++---------------------------
>  2 files changed, 52 insertions(+), 54 deletions(-)

...

> @@ -1382,7 +1362,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
>  	 * space check, thus avoiding the deadlock:
>  	 */
>  	if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
> -		if ((error_code & PF_USER) == 0 &&
> +		if ((error_code & X86_PF_USER) == 0 &&

	if (!(error_code & X86_PF_USER))

With that fixed:

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v7 04/26] x86/mpx: Do not use SIB.index if its value is 100b and ModRM.mod is not 11b
  2017-05-05 18:17 ` [PATCH v7 04/26] x86/mpx: Do not use SIB.index if its value is 100b and ModRM.mod is not 11b Ricardo Neri
@ 2017-05-24 13:37   ` Borislav Petkov
  2017-05-27  3:36     ` Ricardo Neri
  0 siblings, 1 reply; 81+ messages in thread
From: Borislav Petkov @ 2017-05-24 13:37 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Adam Buchbinder,
	Colin Ian King, Lorenzo Stoakes, Qiaowei Ren, Nathan Howard,
	Adan Hawthorn, Joe Perches

On Fri, May 05, 2017 at 11:17:02AM -0700, Ricardo Neri wrote:
> Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
> Developer's Manual volume 2A states that when ModRM.mod != 11b and
> ModRM.rm = 100b, indexed register-indirect addressing is used. In other
> words, a SIB byte follows the ModRM byte. In the specific case of
> SIB.index = 100b, the scale*index portion of the computation of the
> effective address is null. To signal callers of this particular situation,
> get_reg_offset() can return -EDOM (-EINVAL continues to indicate an
> error when decoding the SIB byte).
> 
> An example of this situation can be the following instruction:
> 
>    8b 4c 23 80       mov -0x80(%rbx,%riz,1),%rcx
>    ModRM:            0x4c [mod:1b][reg:1b][rm:100b]
>    SIB:              0x23 [scale:0b][index:100b][base:11b]
>    Displacement:     0x80  (1-byte, as per ModRM.mod = 1b)
> 
> The %riz 'register' indicates a null index.
> 
> In long mode, a REX prefix may be used. When a REX prefix is present,
> REX.X adds a fourth bit to the register selection of SIB.index. This gives
> the ability to refer to all the 16 general purpose registers. When REX.X is
> 1b and SIB.index is 100b, the index is indicated in %r12. In our example,
> this would look like:
> 
>    42 8b 4c 23 80    mov -0x80(%rbx,%r12,1),%rcx
>    REX:              0x42 [W:0b][R:0b][X:1b][B:0b]
>    ModRM:            0x4c [mod:1b][reg:1b][rm:100b]
>    SIB:              0x23 [scale:0b][.X: 1b, index:100b][.B:0b, base:11b]
>    Displacement:     0x80  (1-byte, as per ModRM.mod = 1b)
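
[The index selection for these two examples can be sketched outside the
kernel as follows; the macros mirror, but are not, the kernel's X86_* field
helpers, and the function is an illustrative stand-in for part of
get_reg_offset().]

```c
#include <assert.h>
#include <errno.h>

/* Field extraction mirroring the kernel's X86_* macros. */
#define MODRM_MOD(m)	(((m) >> 6) & 0x3)
#define SIB_INDEX(s)	(((s) >> 3) & 0x7)
#define REX_X(r)	(((r) >> 1) & 0x1)

/*
 * SIB.index extended by REX.X selects one of 16 registers. With
 * ModRM.mod != 11b, index 100b and REX.X = 0, there is no index
 * register (%riz), which is signalled with -EDOM.
 */
static int sib_index_regno(unsigned char rex, unsigned char modrm,
			   unsigned char sib)
{
	int regno = SIB_INDEX(sib);

	if (REX_X(rex))
		regno += 8;

	/* regno == 4 here implies REX.X was 0: null index. */
	if (MODRM_MOD(modrm) != 3 && regno == 4)
		return -EDOM;

	return regno;
}
```

With the bytes from the examples above: no REX prefix gives the null-index
case, while REX.X = 1b selects %r12 (register number 12).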
> 
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> Cc: Colin Ian King <colin.king@canonical.com>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Nathan Howard <liverlint@gmail.com>
> Cc: Adan Hawthorn <adanhawthorn@gmail.com>
> Cc: Joe Perches <joe@perches.com>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: x86@kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/mm/mpx.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
> index ebdead8..7397b81 100644
> --- a/arch/x86/mm/mpx.c
> +++ b/arch/x86/mm/mpx.c
> @@ -110,6 +110,14 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
>  		regno = X86_SIB_INDEX(insn->sib.value);
>  		if (X86_REX_X(insn->rex_prefix.value))
>  			regno += 8;

<--- newline.

> +		/*
> +		 * If ModRM.mod !=3 and SIB.index (regno=4) the scale*index
> +		 * portion of the address computation is null. This is
> +		 * true only if REX.X is 0. In such a case, the SIB index
> +		 * is used in the address computation.
> +		 */
> +		if (X86_MODRM_MOD(insn->modrm.value) != 3 && regno == 4)
> +			return -EDOM;
>  		break;
>  
>  	case REG_TYPE_BASE:
> @@ -159,11 +167,19 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
>  				goto out_err;
>  
>  			indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
> -			if (indx_offset < 0)

<--- newline.

> +			/*
> +			 * A negative offset generally means a error, except

							     an

> +			 * -EDOM, which means that the contents of the register
> +			 * should not be used as index.
> +			 */
> +			if (indx_offset == -EDOM)
> +				indx = 0;
> +			else if (indx_offset < 0)
>  				goto out_err;
> +			else
> +				indx = regs_get_register(regs, indx_offset);
>  
>  			base = regs_get_register(regs, base_offset);
> -			indx = regs_get_register(regs, indx_offset);
>  			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
>  		} else {
>  			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
> -- 
> 2.9.3
> 

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v7 04/26] x86/mpx: Do not use SIB.index if its value is 100b and ModRM.mod is not 11b
  2017-05-24 13:37   ` Borislav Petkov
@ 2017-05-27  3:36     ` Ricardo Neri
  0 siblings, 0 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-05-27  3:36 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Adam Buchbinder,
	Colin Ian King, Lorenzo Stoakes, Qiaowei Ren, Nathan Howard,
	Adan Hawthorn, Joe Perches

On Wed, 2017-05-24 at 15:37 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:02AM -0700, Ricardo Neri wrote:
> > Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
> > Developer's Manual volume 2A states that when ModRM.mod != 11b and
> > ModRM.rm = 100b, indexed register-indirect addressing is used. In other
> > words, a SIB byte follows the ModRM byte. In the specific case of
> > SIB.index = 100b, the scale*index portion of the computation of the
> > effective address is null. To signal callers of this particular situation,
> > get_reg_offset() can return -EDOM (-EINVAL continues to indicate an
> > error when decoding the SIB byte).
> > 
> > An example of this situation can be the following instruction:
> > 
> >    8b 4c 23 80       mov -0x80(%rbx,%riz,1),%rcx
> >    ModRM:            0x4c [mod:1b][reg:1b][rm:100b]
> >    SIB:              0x23 [scale:0b][index:100b][base:11b]
> >    Displacement:     0x80  (1-byte, as per ModRM.mod = 1b)
> > 
> > The %riz 'register' indicates a null index.
> > 
> > In long mode, a REX prefix may be used. When a REX prefix is present,
> > REX.X adds a fourth bit to the register selection of SIB.index. This gives
> > the ability to refer to all the 16 general purpose registers. When REX.X is
> > 1b and SIB.index is 100b, the index is indicated in %r12. In our example,
> > this would look like:
> > 
> >    42 8b 4c 23 80    mov -0x80(%rbx,%r12,1),%rcx
> >    REX:              0x42 [W:0b][R:0b][X:1b][B:0b]
> >    ModRM:            0x4c [mod:1b][reg:1b][rm:100b]
> >    SIB:              0x23 [scale:0b][.X: 1b, index:100b][.B:0b, base:11b]
> >    Displacement:     0x80  (1-byte, as per ModRM.mod = 1b)
> > 
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> > Cc: Colin Ian King <colin.king@canonical.com>
> > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Nathan Howard <liverlint@gmail.com>
> > Cc: Adan Hawthorn <adanhawthorn@gmail.com>
> > Cc: Joe Perches <joe@perches.com>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: x86@kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/mm/mpx.c | 20 ++++++++++++++++++--
> >  1 file changed, 18 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
> > index ebdead8..7397b81 100644
> > --- a/arch/x86/mm/mpx.c
> > +++ b/arch/x86/mm/mpx.c
> > @@ -110,6 +110,14 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
> >  		regno = X86_SIB_INDEX(insn->sib.value);
> >  		if (X86_REX_X(insn->rex_prefix.value))
> >  			regno += 8;
> 
> <--- newline.
I will add a new line here.

> 
> > +		/*
> > +		 * If ModRM.mod !=3 and SIB.index (regno=4) the scale*index
> > +		 * portion of the address computation is null. This is
> > +		 * true only if REX.X is 0. In such a case, the SIB index
> > +		 * is used in the address computation.
> > +		 */
> > +		if (X86_MODRM_MOD(insn->modrm.value) != 3 && regno == 4)
> > +			return -EDOM;
> >  		break;
> >  
> >  	case REG_TYPE_BASE:
> > @@ -159,11 +167,19 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
> >  				goto out_err;
> >  
> >  			indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
> > -			if (indx_offset < 0)
> 
> <--- newline.
I will add a new line here.

> 
> > +			/*
> > +			 * A negative offset generally means a error, except
> 
> 							     an
> 
> > +			 * -EDOM, which means that the contents of the register
> > +			 * should not be used as index.
> > +			 */
> > +			if (indx_offset == -EDOM)
> > +				indx = 0;
> > +			else if (indx_offset < 0)
> >  				goto out_err;
> > +			else
> > +				indx = regs_get_register(regs, indx_offset);
> >  
> >  			base = regs_get_register(regs, base_offset);
> > -			indx = regs_get_register(regs, indx_offset);
> >  			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
> >  		} else {
> >  			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
> > -- 
> > 2.9.3
> > 
> 
> -- 
> Regards/Gruss,
>     Boris.

Thanks for reviewing!

BR,
Ricardo
> 
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)


* Re: [PATCH v7 02/26] x86/mm: Relocate page fault error codes to traps.h
  2017-05-21 14:23   ` Borislav Petkov
@ 2017-05-27  3:40     ` Ricardo Neri
  2017-05-27 10:13       ` Borislav Petkov
  0 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-05-27  3:40 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Kirill A. Shutemov,
	Josh Poimboeuf

On Sun, 2017-05-21 at 16:23 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:00AM -0700, Ricardo Neri wrote:
> > Up to this point, only fault.c used the definitions of the page fault error
> > codes. Thus, it made sense to keep them within such file. Other portions of
> > code might be interested in those definitions too. For instance, the User-
> > Mode Instruction Prevention emulation code will use such definitions to
> > emulate a page fault when it is unable to successfully copy the results
> > of the emulated instructions to user space.
> > 
> > While relocating the error code enumeration, the prefix X86_ is used to
> > make it consistent with the rest of the definitions in traps.h. Of course,
> > code using the enumeration had to be updated as well. No functional changes
> > were performed.
> > 
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> > Cc: x86@kernel.org
> > Reviewed-by: Andy Lutomirski <luto@kernel.org>
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/include/asm/traps.h | 18 +++++++++
> >  arch/x86/mm/fault.c          | 88 +++++++++++++++++---------------------------
> >  2 files changed, 52 insertions(+), 54 deletions(-)
> 
> ...
> 
> > @@ -1382,7 +1362,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
> >  	 * space check, thus avoiding the deadlock:
> >  	 */
> >  	if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
> > -		if ((error_code & PF_USER) == 0 &&
> > +		if ((error_code & X86_PF_USER) == 0 &&
> 
> 	if (!(error_code & X86_PF_USER))

This change was initially intended to only rename the error codes,
without functional changes. Would making this change be considered a change
in functionality? The behavior would be preserved, though.

Thanks and BR,
Ricardo


> 
> With that fixed:
> 
> Reviewed-by: Borislav Petkov <bp@suse.de>

Thank you for your review!

BR,
Ricardo
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)


* Re: [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention
  2017-05-17 18:42 ` [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
@ 2017-05-27  3:49   ` Neri, Ricardo
  0 siblings, 0 replies; 81+ messages in thread
From: Neri, Ricardo @ 2017-05-27  3:49 UTC (permalink / raw)
  To: tglx, mingo
  Cc: corbet, liang.z.li, peterz, linux-kernel, bp, akpm, ray.huang,
	linux-msdos, Yu, Fenghua, dave.hansen, vbabka, x86, mst, hpa,
	wine-devel, brgerst, mhiramat, shuah, Shankar, Ravi V, pbonzini,
	Gortmaker, Paul (Wind River),
	jslaby, stsp, luto, cmetcalf, julliard, slaoub

Hi again Ingo, Thomas,
On Wed, 2017-05-17 at 11:42 -0700, Ricardo Neri wrote:
> Hi Ingo, Thomas,
> 
> On Fri, 2017-05-05 at 11:16 -0700, Ricardo Neri wrote:
> > This is v7 of this series. The six previous submissions can be found
> > here [1], here [2], here[3], here[4], here[5] and here[6]. This
> > version
> > addresses the comments received in v6 plus improvements of the
> > handling
> > of exceptions unrelated to UMIP as well as corner cases in
> > virtual-8086
> > mode. Please see details in the change log.
> 
> Since there have been no further comments on this version, and if this
> series looks good to you, could it be considered for merging into the
> tip tree?
> 
> The only remaining item is a cleanup patch that Borislav Petkov
> suggested [1]. I could work on it incrementally on top of this series.

More items have accumulated from the latest review from Borislav Petkov.
These items are preparatory changes; they are mostly minimal and would
not impact functionality. There have been no comments on other parts of the
implementation. If I spin a v8 of the series, would it be considered
sufficiently mature to be included in v4.13?

Thanks and BR,
Ricardo


> 
> Thanks and BR,
> Ricardo
> 
> [1]. https://lkml.org/lkml/2017/5/4/244
> 


* Re: [PATCH v7 02/26] x86/mm: Relocate page fault error codes to traps.h
  2017-05-27  3:40     ` Ricardo Neri
@ 2017-05-27 10:13       ` Borislav Petkov
  2017-06-01  3:09         ` Ricardo Neri
  0 siblings, 1 reply; 81+ messages in thread
From: Borislav Petkov @ 2017-05-27 10:13 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Kirill A. Shutemov, Josh Poimboeuf

On Fri, May 26, 2017 at 08:40:26PM -0700, Ricardo Neri wrote:
> This change was initially intended to only rename the error codes,
> without functional changes. Would making this change be considered a change
> in functionality?

How?

The before-and-after asm should be the identical.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v7 05/26] x86/mpx: Do not use SIB.base if its value is 101b and ModRM.mod = 0
  2017-05-05 18:17 ` [PATCH v7 05/26] x86/mpx: Do not use SIB.base if its value is 101b and ModRM.mod = 0 Ricardo Neri
@ 2017-05-29 13:07   ` Borislav Petkov
  2017-06-06  6:08     ` Ricardo Neri
  0 siblings, 1 reply; 81+ messages in thread
From: Borislav Petkov @ 2017-05-29 13:07 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov, Peter Zijlstra, Andrew Morton, Brian Gerst,
	Chris Metcalf, Dave Hansen, Paolo Bonzini, Liang Z Li,
	Masami Hiramatsu, Huang Rui, Jiri Slaby, Jonathan Corbet,
	Michael S. Tsirkin, Paul Gortmaker, Vlastimil Babka, Chen Yucong,
	Alexandre Julliard, Stas Sergeev, Fenghua Yu, Ravi V. Shankar,
	Shuah Khan, linux-kernel, x86, linux-msdos, wine-devel,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Nathan Howard, Adan Hawthorn, Joe Perches

On Fri, May 05, 2017 at 11:17:03AM -0700, Ricardo Neri wrote:
> Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
> Developer's Manual volume 2A states that when a SIB byte is used with
> SIB.base = 101b and the mod part
> of the ModRM byte is zero, the base portion of the effective address
> computation is null. In this case, a 32-bit displacement follows the SIB
> byte. This is obtained when the instruction decoder parses the operands.
> 
> To signal this scenario, a -EDOM error is returned to indicate to callers
> that they should ignore the base.
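
[As with the SIB.index case in patch 04, the base selection can be sketched
in userspace; the macros mirror, but are not, the kernel's X86_* helpers,
and the function is an illustrative stand-in for part of get_reg_offset().]

```c
#include <assert.h>
#include <errno.h>

/* Field extraction mirroring the kernel's X86_* macros. */
#define MODRM_MOD(m)	(((m) >> 6) & 0x3)
#define SIB_BASE(s)	((s) & 0x7)
#define REX_B(r)	((r) & 0x1)

/*
 * With ModRM.mod == 0 and SIB.base == 101b there is no base register,
 * only a 32-bit displacement; -EDOM tells the caller to ignore the base.
 * Otherwise REX.B extends the base register selection to 16 registers.
 */
static int sib_base_regno(unsigned char rex, unsigned char modrm,
			  unsigned char sib)
{
	int regno = SIB_BASE(sib);

	if (MODRM_MOD(modrm) == 0 && regno == 5)
		return -EDOM;

	if (REX_B(rex))
		regno += 8;

	return regno;
}
```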
> 
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> Cc: Colin Ian King <colin.king@canonical.com>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Nathan Howard <liverlint@gmail.com>
> Cc: Adan Hawthorn <adanhawthorn@gmail.com>
> Cc: Joe Perches <joe@perches.com>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: x86@kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/mm/mpx.c | 27 ++++++++++++++++++++-------
>  1 file changed, 20 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
> index 7397b81..30aef92 100644
> --- a/arch/x86/mm/mpx.c
> +++ b/arch/x86/mm/mpx.c
> @@ -122,6 +122,15 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
>  
>  	case REG_TYPE_BASE:
>  		regno = X86_SIB_BASE(insn->sib.value);
> +		/*
> +		 * If ModRM.mod is 0 and SIB.base == 5, the base of the
> +		 * register-indirect addressing is 0. In this case, a
> +		 * 32-bit displacement is expected in this case; the
> +		 * instruction decoder finds such displacement for us.

That last sentence reads funny. Just say:

"In this case, a 32-bit displacement follows the SIB byte."

> +		 */
> +		if (!X86_MODRM_MOD(insn->modrm.value) && regno == 5)
> +			return -EDOM;
> +
>  		if (X86_REX_B(insn->rex_prefix.value))
>  			regno += 8;
>  		break;

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH v7 07/26] x86/insn-eval: Do not BUG on invalid register type
  2017-05-05 18:17 ` [PATCH v7 07/26] x86/insn-eval: Do not BUG on invalid register type Ricardo Neri
@ 2017-05-29 16:37   ` Borislav Petkov
  2017-06-06  6:06     ` Ricardo Neri
  0 siblings, 1 reply; 81+ messages in thread
From: Borislav Petkov @ 2017-05-29 16:37 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Fri, May 05, 2017 at 11:17:05AM -0700, Ricardo Neri wrote:
> We are not in a critical failure path. The invalid register type is caused
> when trying to decode invalid instruction bytes from a user-space program.
> Thus, simply print an error message. To prevent this warning from being
> abused by user-space programs, use the rate-limited variant of printk.
> 
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> Cc: Colin Ian King <colin.king@canonical.com>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Thomas Garnier <thgarnie@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: x86@kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/lib/insn-eval.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index e746a6f..182e2ae 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -5,6 +5,7 @@
>   */
>  #include <linux/kernel.h>
>  #include <linux/string.h>
> +#include <linux/ratelimit.h>
>  #include <asm/inat.h>
>  #include <asm/insn.h>
>  #include <asm/insn-eval.h>
> @@ -85,9 +86,8 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
>  		break;
>  
>  	default:
> -		pr_err("invalid register type");
> -		BUG();
> -		break;
> +		printk_ratelimited(KERN_ERR "insn-eval: x86: invalid register type");

You can use pr_err_ratelimited() and define "insn-eval" with pr_fmt.
Look for examples in the tree.
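
[A userspace sketch of the pr_fmt pattern suggested here: the subsystem
prefix is glued onto every format string at compile time, so call sites
stay short. The log buffer and the snprintf-based pr_err below are
illustrative stand-ins; the kernel's pr_err_ratelimited() additionally
ratelimits, which is omitted here.]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Must be defined before any printing helper that expands it. */
#define pr_fmt(fmt) "insn-eval: " fmt

static char log_buf[128];	/* stand-in for the kernel log */

/* Every message automatically carries the "insn-eval: " prefix. */
#define pr_err(fmt, ...) \
	snprintf(log_buf, sizeof(log_buf), pr_fmt(fmt), ##__VA_ARGS__)
```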

Btw, "insn-eval" is perhaps not the right name - since we're building
an instruction decoder, maybe it should be called "insn-dec" or so. I'm
looking at those other arch/x86/lib/insn.c, arch/x86/include/asm/inat.h
things and how they're starting to morph into one decoding facility,
AFAICT.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v7 08/26] x86/insn-eval: Add a utility function to get register offsets
  2017-05-05 18:17 ` [PATCH v7 08/26] x86/insn-eval: Add a utility function to get register offsets Ricardo Neri
@ 2017-05-29 17:16   ` Borislav Petkov
  2017-06-06  6:02     ` Ricardo Neri
  0 siblings, 1 reply; 81+ messages in thread
From: Borislav Petkov @ 2017-05-29 17:16 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Fri, May 05, 2017 at 11:17:06AM -0700, Ricardo Neri wrote:
> The function get_reg_offset() returns the offset to the register the
> argument specifies, as indicated in an enumeration of type offset. Callers
> of this function would need the definition of such an enumeration; this
> is undesirable. Instead, add helper functions for this purpose. These functions
> are useful in cases when, for instance, the caller needs to decide whether
> the operand is a register or a memory location by looking at the rm part
> of the ModRM byte. As of now, this is the only helper function that is
> needed.
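
[The register-vs-memory decision the commit message describes can be
sketched as follows; the macro mirrors, but is not, the kernel's
X86_MODRM_MOD helper, and the function is an illustrative stand-in for
what a caller of insn_get_modrm_rm_off() would check.]

```c
#include <assert.h>
#include <stdbool.h>

#define MODRM_MOD(m)	(((m) >> 6) & 0x3)

/*
 * ModRM.mod == 11b means the r/m field names a register operand; any
 * other mod value means r/m selects a memory addressing form.
 */
static bool modrm_rm_is_register(unsigned char modrm)
{
	return MODRM_MOD(modrm) == 3;
}
```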
> 
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> Cc: Colin Ian King <colin.king@canonical.com>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Thomas Garnier <thgarnie@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: x86@kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/include/asm/insn-eval.h |  1 +
>  arch/x86/lib/insn-eval.c         | 15 +++++++++++++++
>  2 files changed, 16 insertions(+)
> 
> diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
> index 5cab1b1..7e8c963 100644
> --- a/arch/x86/include/asm/insn-eval.h
> +++ b/arch/x86/include/asm/insn-eval.h
> @@ -12,5 +12,6 @@
>  #include <asm/ptrace.h>
>  
>  void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
> +int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
>  
>  #endif /* _ASM_X86_INSN_EVAL_H */
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index 182e2ae..8b16761 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -97,6 +97,21 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
>  	return regoff[regno];
>  }
>  
> +/**
> + * insn_get_reg_offset_modrm_rm() - Obtain register in r/m part of ModRM byte

That name needs to be synced with the function name below.

> + * @insn:	Instruction structure containing the ModRM byte
> + * @regs:	Structure with register values as seen when entering kernel mode
> + *
> + * Return: The register indicated by the r/m part of the ModRM byte. The
> + * register is obtained as an offset from the base of pt_regs. In specific
> + * cases, the returned value can be -EDOM to indicate that the particular value
> + * of ModRM does not refer to a register and shall be ignored.
> + */
> +int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs)
	^^^^^^^^^^^^^^^^^^^^

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v7 09/26] x86/insn-eval: Add utility function to identify string instructions
  2017-05-05 18:17 ` [PATCH v7 09/26] x86/insn-eval: Add utility function to identify string instructions Ricardo Neri
@ 2017-05-29 21:48   ` Borislav Petkov
  2017-06-06  6:01     ` Ricardo Neri
  0 siblings, 1 reply; 81+ messages in thread
From: Borislav Petkov @ 2017-05-29 21:48 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Fri, May 05, 2017 at 11:17:07AM -0700, Ricardo Neri wrote:
> String instructions are special because in protected mode, the linear
> address is always obtained via the ES segment register in operands that
> use the (E)DI register.

 ... and DS for rSI, if we're going to account for both operands of
two-operand string instructions.

Btw, LODS and OUTS use only DS:rSI as a source operand. So we have to be
careful with the generalization here. So if ES:rDI is the only seg. reg
we want, then we don't need to look at those insns... (we assume DS by
default).

...

> +/**
> + * is_string_instruction - Determine if instruction is a string instruction
> + * @insn:	Instruction structure containing the opcode
> + *
> + * Return: true if the instruction, determined by the opcode, is any of the
> + * string instructions as defined in the Intel Software Development manual.
> + * False otherwise.
> + */
> +static bool is_string_instruction(struct insn *insn)
> +{
> +	insn_get_opcode(insn);
> +
> +	/* all string instructions have a 1-byte opcode */
> +	if (insn->opcode.nbytes != 1)
> +		return false;
> +
> +	switch (insn->opcode.bytes[0]) {
> +	case INSB:
> +		/* fall through */
> +	case INSW_INSD:
> +		/* fall through */
> +	case OUTSB:
> +		/* fall through */
> +	case OUTSW_OUTSD:
> +		/* fall through */
> +	case MOVSB:
> +		/* fall through */
> +	case MOVSW_MOVSD:
> +		/* fall through */
> +	case CMPSB:
> +		/* fall through */
> +	case CMPSW_CMPSD:
> +		/* fall through */
> +	case STOSB:
> +		/* fall through */
> +	case STOSW_STOSD:
> +		/* fall through */
> +	case LODSB:
> +		/* fall through */
> +	case LODSW_LODSD:
> +		/* fall through */
> +	case SCASB:
> +		/* fall through */

That "fall through" for every opcode is just too much. Also, you can use
the regularity of the x86 opcode space and do:

	case 0x6c ... 0x6f:	/* INS/OUTS */
	case 0xa4 ... 0xa7:	/* MOVS/CMPS */
	case 0xaa ... 0xaf:	/* STOS/LODS/SCAS */
		return true;
	default:
		return false;
}

And voila, there's your compact is_string_insn() function! :^)

(Modulo the exact list, as I mentioned above).
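For illustration, a self-contained sketch of that compact function (the
case ranges are a GCC extension, which kernel code already relies on; the
exact opcode list is per the SDM one-byte opcode map, modulo the caveat
above):

```c
#include <stdbool.h>

/* Sketch only: compact string-instruction check over 1-byte opcodes.
 * The "..." case ranges are a GCC extension.
 */
static bool is_string_insn(unsigned char opcode)
{
	switch (opcode) {
	case 0x6c ... 0x6f:	/* INS/OUTS */
	case 0xa4 ... 0xa7:	/* MOVS/CMPS */
	case 0xaa ... 0xaf:	/* STOS/LODS/SCAS */
		return true;
	default:
		return false;
	}
}
```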

Thanks.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v7 10/26] x86/insn-eval: Add utility functions to get segment selector
  2017-05-05 18:17 ` [PATCH v7 10/26] x86/insn-eval: Add utility functions to get segment selector Ricardo Neri
@ 2017-05-30 10:35   ` Borislav Petkov
  2017-06-15 18:37     ` Ricardo Neri
  0 siblings, 1 reply; 81+ messages in thread
From: Borislav Petkov @ 2017-05-30 10:35 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Fri, May 05, 2017 at 11:17:08AM -0700, Ricardo Neri wrote:
> When computing a linear address and segmentation is used, we need to know
> the base address of the segment involved in the computation. In most of
> the cases, the segment base address will be zero as in USER_DS/USER32_DS.
> However, it may be possible that a user space program defines its own
> segments via a local descriptor table. In such a case, the segment base
> address may not be zero .Thus, the segment base address is needed to
> calculate correctly the linear address.
> 
> The segment selector to be used when computing a linear address is
> determined by either any of segment override prefixes in the
> instruction or inferred from the registers involved in the computation of
> the effective address; in that order. Also, there are cases when the
> overrides shall be ignored (code segments are always selected by the CS
> segment register; string instructions always use the ES segment register
> along with the EDI register).
> 
> For clarity, this process can be split into two steps: resolving the
> relevant segment register to use and, once known, read its value to
> obtain the segment selector.
> 
> The method to obtain the segment selector depends on several factors. In
> 32-bit builds, segment selectors are saved into the pt_regs structure
> when switching to kernel mode. The same is also true for virtual-8086
> mode. In 64-bit builds, segmentation is mostly ignored, except when
> running a program in 32-bit legacy mode. In this case, CS and SS can be
> obtained from pt_regs. DS, ES, FS and GS can be read directly from
> the respective segment registers.
> 
> Lastly, the only two segment registers that are not ignored in long mode
> are FS and GS. In these two cases, base addresses are obtained from the
> respective MSRs.
> 
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> Cc: Colin Ian King <colin.king@canonical.com>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Thomas Garnier <thgarnie@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: x86@kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/lib/insn-eval.c | 256 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 256 insertions(+)
> 
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index 1634762..0a496f4 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -9,6 +9,7 @@
>  #include <asm/inat.h>
>  #include <asm/insn.h>
>  #include <asm/insn-eval.h>
> +#include <asm/vm86.h>
>  
>  enum reg_type {
>  	REG_TYPE_RM = 0,
> @@ -33,6 +34,17 @@ enum string_instruction {
>  	SCASW_SCASD	= 0xaf,
>  };
>  
> +enum segment_register {
> +	SEG_REG_INVAL = -1,
> +	SEG_REG_IGNORE = 0,
> +	SEG_REG_CS = 0x23,
> +	SEG_REG_SS = 0x36,
> +	SEG_REG_DS = 0x3e,
> +	SEG_REG_ES = 0x26,
> +	SEG_REG_FS = 0x64,
> +	SEG_REG_GS = 0x65,
> +};

Yuck, didn't we talk about this already?

Those are segment override prefixes so call them as such.

#define SEG_OVR_PFX_CS	0x23
#define SEG_OVR_PFX_SS	0x36
...

and we already have those!

arch/x86/include/asm/inat.h:
...
#define INAT_PFX_CS     5       /* 0x2E */
#define INAT_PFX_DS     6       /* 0x3E */
#define INAT_PFX_ES     7       /* 0x26 */
#define INAT_PFX_FS     8       /* 0x64 */
#define INAT_PFX_GS     9       /* 0x65 */
#define INAT_PFX_SS     10      /* 0x36 */

well, kinda, they're numbers there and not the actual prefix values.

And then there's:

arch/x86/kernel/uprobes.c::is_prefix_bad() which looks at some of those.

Please add your defines to inat.h and make that function is_prefix_bad()
use them instead of naked numbers. We need to pay attention to all those
different things needing to look at insn opcodes and not let them go
unwieldy by each defining and duplicating stuff.
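For instance, such defines could look like this (a sketch only: the
SEG_OVR_PFX_* names and the lookup helper are hypothetical, but the byte
values are the architectural segment override prefixes):

```c
#include <stddef.h>

/* Hypothetical names; the values are the architectural override prefixes. */
#define SEG_OVR_PFX_ES	0x26
#define SEG_OVR_PFX_CS	0x2e
#define SEG_OVR_PFX_SS	0x36
#define SEG_OVR_PFX_DS	0x3e
#define SEG_OVR_PFX_FS	0x64
#define SEG_OVR_PFX_GS	0x65

/* Map a prefix byte to its segment register name, NULL if not an override. */
static const char *seg_ovr_pfx_name(unsigned char pfx)
{
	switch (pfx) {
	case SEG_OVR_PFX_ES: return "ES";
	case SEG_OVR_PFX_CS: return "CS";
	case SEG_OVR_PFX_SS: return "SS";
	case SEG_OVR_PFX_DS: return "DS";
	case SEG_OVR_PFX_FS: return "FS";
	case SEG_OVR_PFX_GS: return "GS";
	default:	     return NULL;
	}
}
```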

>  /**
>   * is_string_instruction - Determine if instruction is a string instruction
>   * @insn:	Instruction structure containing the opcode
> @@ -83,6 +95,250 @@ static bool is_string_instruction(struct insn *insn)
>  	}
>  }
>  
> +/**
> + * resolve_seg_register() - obtain segment register

That function is still returning the segment override prefix and we use
*that* to determine the segment register.

> + * @insn:	Instruction structure with segment override prefixes
> + * @regs:	Structure with register values as seen when entering kernel mode
> + * @regoff:	Operand offset, in pt_regs, used to deterimine segment register
> + *
> + * The segment register to which an effective address refers depends on
> + * a) whether segment override prefixes must be ignored: always use CS when
> + * the register is (R|E)IP; always use ES when operand register is (E)DI with
> + * string instructions as defined in the Intel documentation. b) If segment
> + * overrides prefixes are used in the instruction instruction prefixes. C) Use
> + * the default segment register associated with the operand register.
> + *
> + * The operand register, regoff, is represented as the offset from the base of
> + * pt_regs. Also, regoff can be -EDOM for cases in which registers are not
> + * used as operands (e.g., displacement-only memory addressing).
> + *
> + * This function returns the segment register as value from an enumeration
> + * as per the conditions described above. Please note that this function
> + * does not return the value in the segment register (i.e., the segment
> + * selector). The segment selector needs to be obtained using
> + * get_segment_selector() and passing the segment register resolved by
> + * this function.
> + *
> + * Return: Enumerated segment register to use, among CS, SS, DS, ES, FS, GS,
> + * ignore (in 64-bit mode as applicable), or -EINVAL in case of error.
> + */
> +static enum segment_register resolve_seg_register(struct insn *insn,
> +						  struct pt_regs *regs,
> +						  int regoff)
> +{
> +	int i;
> +	int sel_overrides = 0;
> +	int seg_register = SEG_REG_IGNORE;
> +
> +	if (!insn)
> +		return SEG_REG_INVAL;
> +
> +	/* First handle cases when segment override prefixes must be ignored */
> +	if (regoff == offsetof(struct pt_regs, ip)) {
> +		if (user_64bit_mode(regs))
> +			return SEG_REG_IGNORE;
> +		else
> +			return SEG_REG_CS;
> +		return SEG_REG_CS;

Simplify:

		if (user_64bit_mode(regs))
			return SEG_REG_IGNORE;

		return SEG_REG_CS;

> +	}
> +
> +	/*
> +	 * If the (E)DI register is used with string instructions, the ES
> +	 * segment register is always used.
> +	 */
> +	if ((regoff == offsetof(struct pt_regs, di)) &&
> +	    is_string_instruction(insn)) {
> +		if (user_64bit_mode(regs))
> +			return SEG_REG_IGNORE;
> +		else
> +			return SEG_REG_ES;
> +		return SEG_REG_CS;

What is that second return actually supposed to do?

> +	}
> +
> +	/* Then check if we have segment overrides prefixes*/

Missing space and fullstop: "... overrides prefixes. */"

> +	for (i = 0; i < insn->prefixes.nbytes; i++) {
> +		switch (insn->prefixes.bytes[i]) {
> +		case SEG_REG_CS:
> +			seg_register = SEG_REG_CS;
> +			sel_overrides++;
> +			break;
> +		case SEG_REG_SS:
> +			seg_register = SEG_REG_SS;
> +			sel_overrides++;
> +			break;
> +		case SEG_REG_DS:
> +			seg_register = SEG_REG_DS;
> +			sel_overrides++;
> +			break;
> +		case SEG_REG_ES:
> +			seg_register = SEG_REG_ES;
> +			sel_overrides++;
> +			break;
> +		case SEG_REG_FS:
> +			seg_register = SEG_REG_FS;
> +			sel_overrides++;
> +			break;
> +		case SEG_REG_GS:
> +			seg_register = SEG_REG_GS;
> +			sel_overrides++;
> +			break;
> +		default:
> +			return SEG_REG_INVAL;

So SEG_REG_NONE or so? It is not invalid if it is not a segment override
prefix.

> +	/*
> +	 * Having more than one segment override prefix leads to undefined
> +	 * behavior. If this is the case, return with error.
> +	 */
> +	if (sel_overrides > 1)
> +		return SEG_REG_INVAL;

Yuck, wrapping of -E value in a SEG_REG enum. Just return -EINVAL here
and make the function return an int, not that ugly enum.

And the return convention should be straight-forward: default segment if
no prefix or ignored, -EINVAL if error and the actual override prefix if
present.

Also, that test should be *after* the user_64bit_mode() because in long
mode, segment overrides get ignored. IOW, those three if-tests around here
should be combined into a single one, i.e., something like this:

	if (64-bit) {
		if (!FS || !GS)
			ignore
		else
			return seg_override_pfx;	<--- Yes, that variable should be called seg_override_pfx to denote what it is.
	} else if (sel_overrides > 1)
		-EINVAL
	else if (sel_overrides)
		return seg_override_pfx;

> +
> +	if (sel_overrides == 1) {
> +		/*
> +		 * If in long mode all segment registers but FS and GS are
> +		 * ignored.
> +		 */
> +		if (user_64bit_mode(regs) && !(seg_register == SEG_REG_FS ||
> +					       seg_register == SEG_REG_GS))
> +			return SEG_REG_IGNORE;
> +
> +		return seg_register;
> +	}
> +
> +	/* In long mode, all segment registers except FS and GS are ignored */
> +	if (user_64bit_mode(regs))
> +		return SEG_REG_IGNORE;
> +
> +	/*
> +	 * Lastly, if no segment overrides were found, determine the default
> +	 * segment register as described in the Intel documentation: SS for
> +	 * (E)SP or (E)BP. DS for all data references, AX, CX and DX are not
> +	 * valid register operands in 16-bit address encodings.
> +	 * -EDOM is reserved to identify for cases in which no register is used
> +	 * the default segment register (displacement-only addressing). The
> +	 * default segment register used in these cases is DS.
> +	 */
> +
> +	switch (regoff) {
> +	case offsetof(struct pt_regs, ax):
> +		/* fall through */
> +	case offsetof(struct pt_regs, cx):
> +		/* fall through */
> +	case offsetof(struct pt_regs, dx):
> +		if (insn && insn->addr_bytes == 2)
> +			return SEG_REG_INVAL;
> +	case offsetof(struct pt_regs, di):
> +		/* fall through */
> +	case -EDOM:
> +		/* fall through */
> +	case offsetof(struct pt_regs, bx):
> +		/* fall through */
> +	case offsetof(struct pt_regs, si):
> +		return SEG_REG_DS;
> +	case offsetof(struct pt_regs, bp):
> +		/* fall through */
> +	case offsetof(struct pt_regs, sp):
> +		return SEG_REG_SS;
> +	case offsetof(struct pt_regs, ip):
> +		return SEG_REG_CS;
> +	default:
> +		return SEG_REG_INVAL;
> +	}

So group all the fall through cases together so that you don't have this
dense block of code with "/* fall through */" on every other line.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v7 12/26] x86/insn-eval: Add utility functions to get segment descriptor base address and limit
  2017-05-05 18:17 ` [PATCH v7 12/26] x86/insn-eval: Add utility functions to get segment descriptor base address and limit Ricardo Neri
@ 2017-05-31 16:58   ` Borislav Petkov
  2017-06-03 17:23     ` Ricardo Neri
  0 siblings, 1 reply; 81+ messages in thread
From: Borislav Petkov @ 2017-05-31 16:58 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Fri, May 05, 2017 at 11:17:10AM -0700, Ricardo Neri wrote:
> With segmentation, the base address of the segment descriptor is needed
> to compute a linear address. The segment descriptor used in the address
> computation depends on either any segment override prefixes in the
> instruction or the default segment determined by the registers involved
> in the address computation. Thus, both the instruction as well as the
> register (specified as the offset from the base of pt_regs) are given as
> inputs, along with a boolean variable to select between override and
> default.

...

> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index f46cb31..c77ed80 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -476,6 +476,133 @@ static struct desc_struct *get_desc(unsigned short sel)
>  }
>  
>  /**
> + * insn_get_seg_base() - Obtain base address of segment descriptor.
> + * @regs:	Structure with register values as seen when entering kernel mode
> + * @insn:	Instruction structure with selector override prefixes
> + * @regoff:	Operand offset, in pt_regs, of which the selector is needed
> + *
> + * Obtain the base address of the segment descriptor as indicated by either
> + * any segment override prefixes contained in insn or the default segment
> + * applicable to the register indicated by regoff. regoff is specified as the
> + * offset in bytes from the base of pt_regs.
> + *
> + * Return: In protected mode, base address of the segment. Zero in for long
> + * mode, except when FS or GS are used. In virtual-8086 mode, the segment
> + * selector shifted 4 positions to the right. -1L in case of
> + * error.
> + */
> +unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
> +				int regoff)
> +{
> +	struct desc_struct *desc;
> +	unsigned short sel;
> +	enum segment_register seg_reg;
> +
> +	seg_reg = resolve_seg_register(insn, regs, regoff);
> +	if (seg_reg == SEG_REG_INVAL)
> +		return -1L;
> +
> +	sel = get_segment_selector(regs, seg_reg);
> +	if ((short)sel < 0)

I guess it would be better if that function returned a signed short so
you don't have to cast it here. (You're casting it to an unsigned long
below anyway.)

> +		return -1L;
> +
> +	if (v8086_mode(regs))
> +		/*
> +		 * Base is simply the segment selector shifted 4
> +		 * positions to the right.
> +		 */
> +		return (unsigned long)(sel << 4);
> +

...

> +static unsigned long get_seg_limit(struct pt_regs *regs, struct insn *insn,
> +				   int regoff)
> +{
> +	struct desc_struct *desc;
> +	unsigned short sel;
> +	unsigned long limit;
> +	enum segment_register seg_reg;
> +
> +	seg_reg = resolve_seg_register(insn, regs, regoff);
> +	if (seg_reg == SEG_REG_INVAL)
> +		return 0;
> +
> +	sel = get_segment_selector(regs, seg_reg);
> +	if ((short)sel < 0)

Ditto.

> +		return 0;
> +
> +	if (user_64bit_mode(regs) || v8086_mode(regs))
> +		return -1L;
> +
> +	if (!sel)
> +		return 0;
> +
> +	desc = get_desc(sel);
> +	if (!desc)
> +		return 0;
> +
> +	/*
> +	 * If the granularity bit is set, the limit is given in multiples
> +	 * of 4096. When the granularity bit is set, the least 12 significant

						     the 12 least significant bits

> +	 * bits are not tested when checking the segment limits. In practice,
> +	 * this means that the segment ends in (limit << 12) + 0xfff.
> +	 */
> +	limit = get_desc_limit(desc);
> +	if (desc->g)
> +		limit <<= 12 | 0x7;

That 0x7 doesn't look like 0xfff - it shifts limit by 15 instead. You
can simply write it like you mean it:

	limit = (limit << 12) + 0xfff;
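I.e., as a self-contained sketch (the helper name is made up; it assumes
the usual 20-bit raw limit taken from the descriptor):

```c
/* Effective segment limit: with G=1 the limit is in 4 KiB units and the
 * 12 least significant bits are not checked, so the segment ends at
 * (limit << 12) + 0xfff.
 */
static unsigned long effective_seg_limit(unsigned long raw_limit, int g_bit)
{
	if (g_bit)
		return (raw_limit << 12) + 0xfff;
	return raw_limit;
}
```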


-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v7 02/26] x86/mm: Relocate page fault error codes to traps.h
  2017-05-27 10:13       ` Borislav Petkov
@ 2017-06-01  3:09         ` Ricardo Neri
  0 siblings, 0 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-06-01  3:09 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Kirill A. Shutemov, Josh Poimboeuf

On Sat, 2017-05-27 at 12:13 +0200, Borislav Petkov wrote:
> On Fri, May 26, 2017 at 08:40:26PM -0700, Ricardo Neri wrote:
> > This change was initially intended to only rename the error codes,
> > without functional changes. Would making change be considered a
> change
> > in functionality?
> 
> How?
> 
> The before-and-after asm should be the identical.

Yes, but it reads differently; I just wanted to double-check. I will make
this change, which keeps the functionality but is written differently.

Thanks and BR,
Ricardo


* Re: [PATCH v7 12/26] x86/insn-eval: Add utility functions to get segment descriptor base address and limit
  2017-05-31 16:58   ` Borislav Petkov
@ 2017-06-03 17:23     ` Ricardo Neri
  0 siblings, 0 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-06-03 17:23 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Wed, 2017-05-31 at 18:58 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:10AM -0700, Ricardo Neri wrote:
> > With segmentation, the base address of the segment descriptor is needed
> > to compute a linear address. The segment descriptor used in the address
> > computation depends on either any segment override prefixes in the
> > instruction or the default segment determined by the registers involved
> > in the address computation. Thus, both the instruction as well as the
> > register (specified as the offset from the base of pt_regs) are given as
> > inputs, along with a boolean variable to select between override and
> > default.
> 
> ...
> 
> > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> > index f46cb31..c77ed80 100644
> > --- a/arch/x86/lib/insn-eval.c
> > +++ b/arch/x86/lib/insn-eval.c
> > @@ -476,6 +476,133 @@ static struct desc_struct *get_desc(unsigned short sel)
> >  }
> >  
> >  /**
> > + * insn_get_seg_base() - Obtain base address of segment descriptor.
> > + * @regs:	Structure with register values as seen when entering kernel mode
> > + * @insn:	Instruction structure with selector override prefixes
> > + * @regoff:	Operand offset, in pt_regs, of which the selector is needed
> > + *
> > + * Obtain the base address of the segment descriptor as indicated by either
> > + * any segment override prefixes contained in insn or the default segment
> > + * applicable to the register indicated by regoff. regoff is specified as the
> > + * offset in bytes from the base of pt_regs.
> > + *
> > + * Return: In protected mode, base address of the segment. Zero in for long
> > + * mode, except when FS or GS are used. In virtual-8086 mode, the segment
> > + * selector shifted 4 positions to the right. -1L in case of
> > + * error.
> > + */
> > +unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
> > +				int regoff)
> > +{
> > +	struct desc_struct *desc;
> > +	unsigned short sel;
> > +	enum segment_register seg_reg;
> > +
> > +	seg_reg = resolve_seg_register(insn, regs, regoff);
> > +	if (seg_reg == SEG_REG_INVAL)
> > +		return -1L;
> > +
> > +	sel = get_segment_selector(regs, seg_reg);
> > +	if ((short)sel < 0)
> 
> I guess it would be better if that function returned a signed short so
> you don't have to cast it here. (You're casting it to an unsigned long
> below anyway.)

Yes, this makes sense. I will make this change.
> 
> > +		return -1L;
> > +
> > +	if (v8086_mode(regs))
> > +		/*
> > +		 * Base is simply the segment selector shifted 4
> > +		 * positions to the right.
> > +		 */
> > +		return (unsigned long)(sel << 4);
> > +
> 
> ...
> 
> > +static unsigned long get_seg_limit(struct pt_regs *regs, struct insn *insn,
> > +				   int regoff)
> > +{
> > +	struct desc_struct *desc;
> > +	unsigned short sel;
> > +	unsigned long limit;
> > +	enum segment_register seg_reg;
> > +
> > +	seg_reg = resolve_seg_register(insn, regs, regoff);
> > +	if (seg_reg == SEG_REG_INVAL)
> > +		return 0;
> > +
> > +	sel = get_segment_selector(regs, seg_reg);
> > +	if ((short)sel < 0)
> 
> Ditto.

Here as well.

> 
> > +		return 0;
> > +
> > +	if (user_64bit_mode(regs) || v8086_mode(regs))
> > +		return -1L;
> > +
> > +	if (!sel)
> > +		return 0;
> > +
> > +	desc = get_desc(sel);
> > +	if (!desc)
> > +		return 0;
> > +
> > +	/*
> > +	 * If the granularity bit is set, the limit is given in multiples
> > +	 * of 4096. When the granularity bit is set, the least 12 significant
> 
> 						     the 12 least significant bits
> 
> > +	 * bits are not tested when checking the segment limits. In practice,
> > +	 * this means that the segment ends in (limit << 12) + 0xfff.
> > +	 */
> > +	limit = get_desc_limit(desc);
> > +	if (desc->g)
> > +		limit <<= 12 | 0x7;
> 
> That 0x7 doesn't look like 0xfff - it shifts limit by 15 instead. You
> can simply write it like you mean it:
> 
> 	limit = (limit << 12) + 0xfff;

You are right, this is wrong. I will implement it as you suggest.

Thanks and BR,
Ricardo


* Re: [PATCH v7 09/26] x86/insn-eval: Add utility function to identify string instructions
  2017-05-29 21:48   ` Borislav Petkov
@ 2017-06-06  6:01     ` Ricardo Neri
  2017-06-06 12:04       ` Borislav Petkov
  0 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-06-06  6:01 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Mon, 2017-05-29 at 23:48 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:07AM -0700, Ricardo Neri wrote:
> > String instructions are special because in protected mode, the linear
> > address is always obtained via the ES segment register in operands that
> > use the (E)DI register.
> 
>  ... and DS for rSI.

Right, I omitted this in the commit message.
> 
> If we're going to account for both operands of string instructions with
> two operands.
> 
> Btw, LODS and OUTS use only DS:rSI as a source operand. So we have to be
> careful with the generalization here. So if ES:rDI is the only seg. reg
> we want, then we don't need to look at those insns... (we assume DS by
> default).

My intention is for this function to do only one thing: identify string
instructions, irrespective of the operands they use. A separate function,
resolve_seg_register, will have the logic to decide which segment register
to use based on the registers used as operands, whether we are looking at
a string instruction, whether we have segment override prefixes and
whether such overrides should be ignored.

If I were to leave string instructions such as LODS and OUTS out of this
function, it would have to be renamed to something like
is_string_instruction_non_lods_outs. In my opinion this separation makes
the code clearer, and otherwise I would end up having the logic to decide
which segment register to use in two places. Does it make sense to you?

> 
> ...
> 
> > +/**
> > + * is_string_instruction - Determine if instruction is a string instruction
> > + * @insn:	Instruction structure containing the opcode
> > + *
> > + * Return: true if the instruction, determined by the opcode, is any of the
> > + * string instructions as defined in the Intel Software Development manual.
> > + * False otherwise.
> > + */
> > +static bool is_string_instruction(struct insn *insn)
> > +{
> > +	insn_get_opcode(insn);
> > +
> > +	/* all string instructions have a 1-byte opcode */
> > +	if (insn->opcode.nbytes != 1)
> > +		return false;
> > +
> > +	switch (insn->opcode.bytes[0]) {
> > +	case INSB:
> > +		/* fall through */
> > +	case INSW_INSD:
> > +		/* fall through */
> > +	case OUTSB:
> > +		/* fall through */
> > +	case OUTSW_OUTSD:
> > +		/* fall through */
> > +	case MOVSB:
> > +		/* fall through */
> > +	case MOVSW_MOVSD:
> > +		/* fall through */
> > +	case CMPSB:
> > +		/* fall through */
> > +	case CMPSW_CMPSD:
> > +		/* fall through */
> > +	case STOSB:
> > +		/* fall through */
> > +	case STOSW_STOSD:
> > +		/* fall through */
> > +	case LODSB:
> > +		/* fall through */
> > +	case LODSW_LODSD:
> > +		/* fall through */
> > +	case SCASB:
> > +		/* fall through */
> 
> That "fall through" for every opcode is just too much. Also, you can use
> the regularity of the x86 opcode space and do:
> 
> 	case 0x6c ... 0x6f:	/* INS/OUTS */
> 	case 0xa4 ... 0xa7:	/* MOVS/CMPS */
> 	case 0xaa ... 0xaf:	/* STOS/LODS/SCAS */
> 		return true;
> 	default:
> 		return false;
> }
> 
> And voila, there's your compact is_string_insn() function! :^)

Thanks for the suggestion! It looks really nice. I will implement
accordingly.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v7 08/26] x86/insn-eval: Add a utility function to get register offsets
  2017-05-29 17:16   ` Borislav Petkov
@ 2017-06-06  6:02     ` Ricardo Neri
  0 siblings, 0 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-06-06  6:02 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Mon, 2017-05-29 at 19:16 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:06AM -0700, Ricardo Neri wrote:
> > The function get_reg_offset() returns the offset to the register the
> > argument specifies as indicated in an enumeration of type offset. Callers
> > of this function would need the definition of such enumeration. This is
> > not needed. Instead, add helper functions for this purpose. These functions
> > are useful in cases when, for instance, the caller needs to decide whether
> > the operand is a register or a memory location by looking at the rm part
> > of the ModRM byte. As of now, this is the only helper function that is
> > needed.
> > 
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> > Cc: Colin Ian King <colin.king@canonical.com>
> > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Thomas Garnier <thgarnie@google.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Dmitry Vyukov <dvyukov@google.com>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: x86@kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/include/asm/insn-eval.h |  1 +
> >  arch/x86/lib/insn-eval.c         | 15 +++++++++++++++
> >  2 files changed, 16 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
> > index 5cab1b1..7e8c963 100644
> > --- a/arch/x86/include/asm/insn-eval.h
> > +++ b/arch/x86/include/asm/insn-eval.h
> > @@ -12,5 +12,6 @@
> >  #include <asm/ptrace.h>
> >  
> >  void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
> > +int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
> >  
> >  #endif /* _ASM_X86_INSN_EVAL_H */
> > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> > index 182e2ae..8b16761 100644
> > --- a/arch/x86/lib/insn-eval.c
> > +++ b/arch/x86/lib/insn-eval.c
> > @@ -97,6 +97,21 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
> >  	return regoff[regno];
> >  }
> >  
> > +/**
> > + * insn_get_reg_offset_modrm_rm() - Obtain register in r/m part of ModRM byte
> 
> That name needs to be synced with the function name below.

Ugh! I missed this. I will update accordingly. Thanks for the detailed
review.

BR,
Ricardo

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v7 07/26] x86/insn-eval: Do not BUG on invalid register type
  2017-05-29 16:37   ` Borislav Petkov
@ 2017-06-06  6:06     ` Ricardo Neri
  2017-06-06 11:58       ` Borislav Petkov
  0 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-06-06  6:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Mon, 2017-05-29 at 18:37 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:05AM -0700, Ricardo Neri wrote:
> > We are not in a critical failure path. The invalid register type is caused
> > when trying to decode invalid instruction bytes from a user-space program.
> > Thus, simply print an error message. To prevent this warning from being
> > abused from user space programs, use the rate-limited variant of printk.
> > 
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> > Cc: Colin Ian King <colin.king@canonical.com>
> > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Thomas Garnier <thgarnie@google.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Dmitry Vyukov <dvyukov@google.com>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: x86@kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/lib/insn-eval.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> > index e746a6f..182e2ae 100644
> > --- a/arch/x86/lib/insn-eval.c
> > +++ b/arch/x86/lib/insn-eval.c
> > @@ -5,6 +5,7 @@
> >   */
> >  #include <linux/kernel.h>
> >  #include <linux/string.h>
> > +#include <linux/ratelimit.h>
> >  #include <asm/inat.h>
> >  #include <asm/insn.h>
> >  #include <asm/insn-eval.h>
> > @@ -85,9 +86,8 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
> >  		break;
> >  
> >  	default:
> > -		pr_err("invalid register type");
> > -		BUG();
> > -		break;
> > +		printk_ratelimited(KERN_ERR "insn-eval: x86: invalid register type");
> 
> You can use pr_err_ratelimited() and define "insn-eval" with pr_fmt.
> Look for examples in the tree.

Will do. I have looked at the examples.
> 
> Btw, "insn-eval" is perhaps not the right name - since we're building
> an instruction decoder, maybe it should be called "insn-dec" or so. I'm
> looking at those other arch/x86/lib/insn.c, arch/x86/include/asm/inat.h
> things and how they're starting to morph into one decoding facility,
> AFAICT.

I agree that insn-eval reads somewhat funny. I did not want to go with
insn-dec.c as insn.c, in my opinion, already decodes the instruction
(i.e., it finds prefixes, opcodes, ModRM, SIB and displacement bytes).
In insn-eval.c I simply take those decoded parameters and evaluate them
to obtain the values they contain (e.g., a specific memory location).
Perhaps insn-resolve.c could be a better name? Or maybe insn-operands?

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v7 05/26] x86/mpx: Do not use SIB.base if its value is 101b and ModRM.mod = 0
  2017-05-29 13:07   ` Borislav Petkov
@ 2017-06-06  6:08     ` Ricardo Neri
  0 siblings, 0 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-06-06  6:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov, Peter Zijlstra, Andrew Morton, Brian Gerst,
	Chris Metcalf, Dave Hansen, Paolo Bonzini, Liang Z Li,
	Masami Hiramatsu, Huang Rui, Jiri Slaby, Jonathan Corbet,
	Michael S. Tsirkin, Paul Gortmaker, Vlastimil Babka, Chen Yucong,
	Alexandre Julliard, Stas Sergeev, Fenghua Yu, Ravi V. Shankar,
	Shuah Khan, linux-kernel, x86, linux-msdos, wine-devel,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Nathan Howard, Adan Hawthorn, Joe Perches

On Mon, 2017-05-29 at 15:07 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:03AM -0700, Ricardo Neri wrote:
> > Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
> > Developer's Manual volume 2A states that when a SIB byte is used,
> > SIB.base is 101b and the mod part of the ModRM byte is zero, the base
> > part of the effective address computation is null. In this case, a
> > 32-bit displacement follows the SIB byte. This is obtained when the
> > instruction decoder parses the operands.
> > 
> > To signal this scenario, a -EDOM error is returned to indicate callers that
> > they should ignore the base.
> > 
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> > Cc: Colin Ian King <colin.king@canonical.com>
> > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Nathan Howard <liverlint@gmail.com>
> > Cc: Adan Hawthorn <adanhawthorn@gmail.com>
> > Cc: Joe Perches <joe@perches.com>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: x86@kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/mm/mpx.c | 27 ++++++++++++++++++++-------
> >  1 file changed, 20 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
> > index 7397b81..30aef92 100644
> > --- a/arch/x86/mm/mpx.c
> > +++ b/arch/x86/mm/mpx.c
> > @@ -122,6 +122,15 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
> >  
> >  	case REG_TYPE_BASE:
> >  		regno = X86_SIB_BASE(insn->sib.value);
> > +		/*
> > +		 * If ModRM.mod is 0 and SIB.base == 5, the base of the
> > +		 * register-indirect addressing is 0. In this case, a
> > +		 * 32-bit displacement is expected in this case; the
> > +		 * instruction decoder finds such displacement for us.
> 
> That last sentence reads funny. Just say:
> 
> "In this case, a 32-bit displacement follows the SIB byte."

Agreed. I will update the comment to make more sense.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v7 07/26] x86/insn-eval: Do not BUG on invalid register type
  2017-06-06  6:06     ` Ricardo Neri
@ 2017-06-06 11:58       ` Borislav Petkov
  2017-06-07  0:28         ` Ricardo Neri
  0 siblings, 1 reply; 81+ messages in thread
From: Borislav Petkov @ 2017-06-06 11:58 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Mon, Jun 05, 2017 at 11:06:58PM -0700, Ricardo Neri wrote:
> I agree that insn-eval reads somewhat funny. I did not want to go with
> insn-dec.c as insn.c, in my opinion, already decodes the instruction
> (i.e., it finds prefixes, opcodes, ModRM, SIB and displacement bytes).
> In insn-eval.c I simply take those decoded parameters and evaluate them
> to obtain the values they contain (e.g., a specific memory location).
> Perhaps insn-resolve.c could be a better name? Or maybe insn-operands?

So actually I'm gravitating towards calling all that instruction
"massaging" code with a single prefix to denote this comes from the insn
decoder/handler/whatever...

I.e.,

	"insn-decoder: x86: invalid register type"

or

	"inat: x86: invalid register type"

or something to that effect.

I mean, if we're going to grow our own - as we do, apparently - maybe it
all should be a separate entity with its proper name.

Hmm.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v7 09/26] x86/insn-eval: Add utility function to identify string instructions
  2017-06-06  6:01     ` Ricardo Neri
@ 2017-06-06 12:04       ` Borislav Petkov
  0 siblings, 0 replies; 81+ messages in thread
From: Borislav Petkov @ 2017-06-06 12:04 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Mon, Jun 05, 2017 at 11:01:21PM -0700, Ricardo Neri wrote:
> If I were to leave out string instructions from this function, it would
> need to be renamed is_string_instruction_non_lods_outs. In my opinion,
> the current separation makes the code clearer; otherwise I would end up
> having the logic to decide which segment register to use in two places.
> Does it make sense to you?

Ok, sure.

Thanks.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v7 07/26] x86/insn-eval: Do not BUG on invalid register type
  2017-06-06 11:58       ` Borislav Petkov
@ 2017-06-07  0:28         ` Ricardo Neri
  2017-06-07 12:21           ` Borislav Petkov
  0 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-06-07  0:28 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Tue, 2017-06-06 at 13:58 +0200, Borislav Petkov wrote:
> On Mon, Jun 05, 2017 at 11:06:58PM -0700, Ricardo Neri wrote:
> > I agree that insn-eval reads somewhat funny. I did not want to go with
> > insn-dec.c as insn.c, in my opinion, already decodes the instruction
> > (i.e., it finds prefixes, opcodes, ModRM, SIB and displacement bytes).
> > In insn-eval.c I simply take those decoded parameters and evaluate them
> > to obtain the values they contain (e.g., a specific memory location).
> > Perhaps insn-resolve.c could be a better name? Or maybe insn-operands?
> 
> So actually I'm gravitating towards calling all that instruction
> "massaging" code with a single prefix to denote this comes from the insn
> decoder/handler/whatever...
> 
> I.e.,
> 
> 	"insn-decoder: x86: invalid register type"
> 
> or
> 
> 	"inat: x86: invalid register type"
> 
> or something to that effect.
> 
> I mean, if we're going to grow our own - as we do, apparently - maybe it
> all should be a separate entity with its proper name.

I see. You were more concerned about the naming of the coding artifacts
(e.g., function names, error prints, etc.) than the actual filenames. I
think I have aligned with the function naming of insn.c in all the
functions that are exposed via a header by using the insn_ prefix. For
static functions I don't use that prefix. Perhaps I can use the __
prefix, as insn.c does.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v7 07/26] x86/insn-eval: Do not BUG on invalid register type
  2017-06-07  0:28         ` Ricardo Neri
@ 2017-06-07 12:21           ` Borislav Petkov
  0 siblings, 0 replies; 81+ messages in thread
From: Borislav Petkov @ 2017-06-07 12:21 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Tue, Jun 06, 2017 at 05:28:52PM -0700, Ricardo Neri wrote:
> I see. You were more concerned about the naming of the coding artifacts
> (e.g., function names, error prints, etc) than the actual filenames.

Well, I'm not sure here. We could either have a generalized prefix or
put the function name in there - __func__ - for easier debuggability,
i.e., find the origin of the error message faster.

But I'm sensing that we're already well inside the bikeshed so let's not
change anything now. :)

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v7 13/26] x86/insn-eval: Add function to get default params of code segment
  2017-05-05 18:17 ` [PATCH v7 13/26] x86/insn-eval: Add function to get default params of code segment Ricardo Neri
@ 2017-06-07 12:59   ` Borislav Petkov
  2017-06-15 19:24     ` Ricardo Neri
  0 siblings, 1 reply; 81+ messages in thread
From: Borislav Petkov @ 2017-06-07 12:59 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Fri, May 05, 2017 at 11:17:11AM -0700, Ricardo Neri wrote:
> This function returns the default values of the address and operand sizes
> as specified in the segment descriptor. This information is determined
> from the D and L bits. Hence, it can be used for both IA-32e 64-bit and
> 32-bit legacy modes. For virtual-8086 mode, the default address and
> operand sizes are always 2 bytes.
> 
> The D bit is only meaningful for code segments. Thus, these functions
> always use the code segment selector contained in regs.
> 
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> Cc: Colin Ian King <colin.king@canonical.com>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Thomas Garnier <thgarnie@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: x86@kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/include/asm/insn-eval.h |  6 ++++
>  arch/x86/lib/insn-eval.c         | 65 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 71 insertions(+)
> 
> diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
> index 7f3c7fe..9ed1c88 100644
> --- a/arch/x86/include/asm/insn-eval.h
> +++ b/arch/x86/include/asm/insn-eval.h
> @@ -11,9 +11,15 @@
>  #include <linux/err.h>
>  #include <asm/ptrace.h>
>  
> +struct insn_code_seg_defaults {

A whole struct for a function which gets called only once?

Bah, that's a bit too much, if you ask me.

So you're returning two small unsigned integers - i.e., you can just as
well return a single u8 and put address and operand sizes in there:

	ret = oper_sz | addr_sz << 4;

No need for special structs for that.

> +	unsigned char address_bytes;
> +	unsigned char operand_bytes;
> +};
> +
>  void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
>  int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
>  unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
>  				int regoff);
> +struct insn_code_seg_defaults insn_get_code_seg_defaults(struct pt_regs *regs);
>  
>  #endif /* _ASM_X86_INSN_EVAL_H */
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index c77ed80..693e5a8 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -603,6 +603,71 @@ static unsigned long get_seg_limit(struct pt_regs *regs, struct insn *insn,
>  }
>  
>  /**
> + * insn_get_code_seg_defaults() - Obtain code segment default parameters
> + * @regs:	Structure with register values as seen when entering kernel mode
> + *
> + * Obtain the default parameters of the code segment: address and operand sizes.
> + * The code segment is obtained from the selector contained in the CS register
> + * in regs. In protected mode, the default address is determined by inspecting
> + * the L and D bits of the segment descriptor. In virtual-8086 mode, the default
> + * is always two bytes for both address and operand sizes.
> + *
> + * Return: A populated insn_code_seg_defaults structure on success. The
> + * structure contains only zeros on failure.

s/failure/error/

> + */
> +struct insn_code_seg_defaults insn_get_code_seg_defaults(struct pt_regs *regs)
> +{
> +	struct desc_struct *desc;
> +	struct insn_code_seg_defaults defs;
> +	unsigned short sel;
> +	/*
> +	 * The most significant byte of AR_TYPE_MASK determines whether a
> +	 * segment contains data or code.
> +	 */
> +	unsigned int type_mask = AR_TYPE_MASK & (1 << 11);
> +
> +	memset(&defs, 0, sizeof(defs));
> +
> +	if (v8086_mode(regs)) {
> +		defs.address_bytes = 2;
> +		defs.operand_bytes = 2;
> +		return defs;
> +	}
> +
> +	sel = (unsigned short)regs->cs;
> +
> +	desc = get_desc(sel);
> +	if (!desc)
> +		return defs;
> +
> +	/* if data segment, return */
> +	if (!(desc->b & type_mask))
> +		return defs;

So you can simplify that into:

	/* A code segment? */
	if (!(desc->b & BIT(11)))
		return defs;

and remove that type_mask thing.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v7 14/26] x86/insn-eval: Indicate a 32-bit displacement if ModRM.mod is 0 and ModRM.rm is 5
  2017-05-05 18:17 ` [PATCH v7 14/26] x86/insn-eval: Indicate a 32-bit displacement if ModRM.mod is 0 and ModRM.rm is 5 Ricardo Neri
@ 2017-06-07 13:15   ` Borislav Petkov
  2017-06-15 19:36     ` Ricardo Neri
  0 siblings, 1 reply; 81+ messages in thread
From: Borislav Petkov @ 2017-06-07 13:15 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Fri, May 05, 2017 at 11:17:12AM -0700, Ricardo Neri wrote:
> Section 2.2.1.3 of the Intel 64 and IA-32 Architectures Software
> Developer's Manual volume 2A states that when ModRM.mod is zero and
> ModRM.rm is 101b, a 32-bit displacement follows the ModRM byte. This means
> that none of the registers are used in the computation of the effective
> address. A return value of -EDOM signals callers that they should not use
> the value of registers when computing the effective address for the
> instruction.
> 
> In IA-32e 64-bit mode (long mode), the effective address is given by the
> 32-bit displacement plus the value of RIP of the next instruction.
> In IA-32e compatibility mode (protected mode), only the displacement is
> used.
> 
> The instruction decoder takes care of obtaining the displacement.

...

> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index 693e5a8..4f600de 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -379,6 +379,12 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
>  	switch (type) {
>  	case REG_TYPE_RM:
>  		regno = X86_MODRM_RM(insn->modrm.value);


<---- newline here.

> +		/*
> +		 * ModRM.mod == 0 and ModRM.rm == 5 means a 32-bit displacement
> +		 * follows the ModRM byte.
> +		 */
> +		if (!X86_MODRM_MOD(insn->modrm.value) && regno == 5)
> +			return -EDOM;
>  		if (X86_REX_B(insn->rex_prefix.value))
>  			regno += 8;
>  		break;
> @@ -730,9 +736,21 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
>  			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
>  		} else {
>  			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
> -			if (addr_offset < 0)

ditto.

> +			/*
> +			 * -EDOM means that we must ignore the address_offset.
> +			 * In such a case, in 64-bit mode the effective address
> +			 * is relative to the RIP of the following instruction.
> +			 */
> +			if (addr_offset == -EDOM) {
> +				eff_addr = 0;
> +				if (user_64bit_mode(regs))
> +					eff_addr = (long)regs->ip +
> +						   insn->length;

Let that line stick out and write it balanced:

                        if (addr_offset == -EDOM) {
                                if (user_64bit_mode(regs))
                                        eff_addr = (long)regs->ip + insn->length;
                                else
                                        eff_addr = 0;

should be easier parseable this way.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v7 16/26] x86/insn-eval: Support both signed 32-bit and 64-bit effective addresses
  2017-05-05 18:17 ` [PATCH v7 16/26] x86/insn-eval: Support both signed 32-bit and 64-bit effective addresses Ricardo Neri
@ 2017-06-07 15:48   ` Borislav Petkov
  2017-07-25 23:48     ` Ricardo Neri
  2017-06-07 15:49   ` Borislav Petkov
  1 sibling, 1 reply; 81+ messages in thread
From: Borislav Petkov @ 2017-06-07 15:48 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Fri, May 05, 2017 at 11:17:14AM -0700, Ricardo Neri wrote:
> The 32-bit and 64-bit address encodings are identical. This means that we
> can use the same function in both cases. In order to reuse the function
> for 32-bit address encodings, we must sign-extend our 32-bit signed
> operands to 64-bit signed variables (only for 64-bit builds). To decide on
> whether sign extension is needed, we rely on the address size as given by
> the instruction structure.
> 
> Once the effective address has been computed, a special verification is
> needed for 32-bit processes. If running on a 64-bit kernel, such processes
> can address up to 4GB of memory. Hence, for instance, an effective
> address of 0xffff1234 would be misinterpreted as 0xffffffffffff1234 due to
> the sign extension mentioned above. For this reason, the 4 must be

Which 4?

> truncated to obtain the true effective address.
> 
> Lastly, before computing the linear address, we verify that the effective
> address is within the limits of the segment. The check is kept for long
> mode because in such a case the limit is set to -1L. This is the largest
> unsigned number possible. This is equivalent to a limit-less segment.
> 
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> Cc: Colin Ian King <colin.king@canonical.com>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Thomas Garnier <thgarnie@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: x86@kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/lib/insn-eval.c | 99 ++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 88 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index 1a5f5a6..c7c1239 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -688,6 +688,62 @@ int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs)
>  	return get_reg_offset(insn, regs, REG_TYPE_RM);
>  }
>  
> +/**
> + * _to_signed_long() - Cast an unsigned long into signed long
> + * @val		A 32-bit or 64-bit unsigned long
> + * @long_bytes	The number of bytes used to represent a long number
> + * @out		The casted signed long
> + *
> + * Return: A signed long of either 32 or 64 bits, as per the build configuration
> + * of the kernel.
> + */
> +static int _to_signed_long(unsigned long val, int long_bytes, long *out)
> +{
> +	if (!out)
> +		return -EINVAL;
> +
> +#ifdef CONFIG_X86_64
> +	if (long_bytes == 4) {
> +		/* higher bytes should all be zero */
> +		if (val & ~0xffffffff)
> +			return -EINVAL;
> +
> +		/* sign-extend to a 64-bit long */

So this is a 32-bit userspace on a 64-bit kernel, right?

If so, how can a memory offset be > 32-bits and we have to extend it to
a 64-bit long?!?

I *think* you want to say that you want to convert it to long so that
you can do the calculation in longs.

However!

If you're a 64-bit kernel running a 32-bit userspace, you need to do
the calculation in 32-bits only so that it overflows, as it would do
on 32-bit hardware. IOW, the clamping to 32-bits at the end is not
something you wanna do but actually let it wrap if it overflows.

Or am I missing something?

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v7 16/26] x86/insn-eval: Support both signed 32-bit and 64-bit effective addresses
  2017-05-05 18:17 ` [PATCH v7 16/26] x86/insn-eval: Support both signed 32-bit and 64-bit effective addresses Ricardo Neri
  2017-06-07 15:48   ` Borislav Petkov
@ 2017-06-07 15:49   ` Borislav Petkov
  2017-06-15 19:58     ` Ricardo Neri
  1 sibling, 1 reply; 81+ messages in thread
From: Borislav Petkov @ 2017-06-07 15:49 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Fri, May 05, 2017 at 11:17:14AM -0700, Ricardo Neri wrote:
> @@ -697,18 +753,21 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
>  {
>  	unsigned long linear_addr, seg_base_addr, seg_limit;
>  	long eff_addr, base, indx;
> -	int addr_offset, base_offset, indx_offset;
> +	int addr_offset, base_offset, indx_offset, addr_bytes;
>  	insn_byte_t sib;
>  
>  	insn_get_modrm(insn);
>  	insn_get_sib(insn);
>  	sib = insn->sib.value;
> +	addr_bytes = insn->addr_bytes;
>  
>  	if (X86_MODRM_MOD(insn->modrm.value) == 3) {
>  		addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
>  		if (addr_offset < 0)
>  			goto out_err;
> -		eff_addr = regs_get_register(regs, addr_offset);
> +		eff_addr = get_mem_offset(regs, addr_offset, addr_bytes);
> +		if (eff_addr == -1L)
> +			goto out_err;
>  		seg_base_addr = insn_get_seg_base(regs, insn, addr_offset);
>  		if (seg_base_addr == -1L)
>  			goto out_err;

This code here is too dense, it needs spacing for better readability.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v7 18/26] x86/insn-eval: Add support to resolve 16-bit addressing encodings
  2017-05-05 18:17 ` [PATCH v7 18/26] x86/insn-eval: Add support to resolve 16-bit addressing encodings Ricardo Neri
@ 2017-06-07 16:28   ` Borislav Petkov
  2017-06-15 21:50     ` Ricardo Neri
  0 siblings, 1 reply; 81+ messages in thread
From: Borislav Petkov @ 2017-06-07 16:28 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Fri, May 05, 2017 at 11:17:16AM -0700, Ricardo Neri wrote:
> Tasks running in virtual-8086 mode or in protected mode with code
> segment descriptors that specify 16-bit default address sizes via the
> D bit will use 16-bit addressing form encodings as described in the Intel
> 64 and IA-32 Architecture Software Developer's Manual Volume 2A Section
> 2.1.5. 16-bit addressing encodings differ in several ways from the
> 32-bit/64-bit addressing form encodings: ModRM.rm points to different
> registers and, in some cases, effective addresses are indicated by the
> addition of the value of two registers. Also, there is no support for SIB
> bytes. Thus, a separate function is needed to parse this form of
> addressing.
> 
> A couple of functions are introduced. get_reg_offset_16() obtains the
> offset from the base of pt_regs of the registers indicated by the ModRM
> byte of the address encoding. get_addr_ref_16() computes the linear
> address indicated by the instructions using the value of the registers
> given by ModRM as well as the base address of the segment.
> 
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> Cc: Colin Ian King <colin.king@canonical.com>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Thomas Garnier <thgarnie@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: x86@kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/lib/insn-eval.c | 155 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 155 insertions(+)
> 
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index 9822061..928a662 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -431,6 +431,73 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
>  }
>  
>  /**
> + * get_reg_offset_16 - Obtain offset of register indicated by instruction

Please end function names with parentheses.

> + * @insn:	Instruction structure containing ModRM and SiB bytes

s/SiB/SIB/g

> + * @regs:	Structure with register values as seen when entering kernel mode
> + * @offs1:	Offset of the first operand register
> + * @offs2:	Offset of the second operand register, if applicable.
> + *
> + * Obtain the offset, in pt_regs, of the registers indicated by the ModRM byte
> + * within insn. This function is to be used with 16-bit address encodings. The
> + * offs1 and offs2 will be written with the offset of the two registers
> + * indicated by the instruction. In cases where any of the registers is not
> + * referenced by the instruction, the value will be set to -EDOM.
> + *
> + * Return: 0 on success, -EINVAL on failure.
> + */
> +static int get_reg_offset_16(struct insn *insn, struct pt_regs *regs,
> +			     int *offs1, int *offs2)
> +{
> +	/* 16-bit addressing can use one or two registers */
> +	static const int regoff1[] = {
> +		offsetof(struct pt_regs, bx),
> +		offsetof(struct pt_regs, bx),
> +		offsetof(struct pt_regs, bp),
> +		offsetof(struct pt_regs, bp),
> +		offsetof(struct pt_regs, si),
> +		offsetof(struct pt_regs, di),
> +		offsetof(struct pt_regs, bp),
> +		offsetof(struct pt_regs, bx),
> +	};
> +
> +	static const int regoff2[] = {
> +		offsetof(struct pt_regs, si),
> +		offsetof(struct pt_regs, di),
> +		offsetof(struct pt_regs, si),
> +		offsetof(struct pt_regs, di),
> +		-EDOM,
> +		-EDOM,
> +		-EDOM,
> +		-EDOM,
> +	};

You mean "Table 2-1. 16-Bit Addressing Forms with the ModR/M Byte" in
the SDM, right?

Please add a comment pointing to it here because it is not trivial to
map that code to the documentation.

> +
> +	if (!offs1 || !offs2)
> +		return -EINVAL;
> +
> +	/* operand is a register, use the generic function */
> +	if (X86_MODRM_MOD(insn->modrm.value) == 3) {
> +		*offs1 = insn_get_modrm_rm_off(insn, regs);
> +		*offs2 = -EDOM;
> +		return 0;
> +	}
> +
> +	*offs1 = regoff1[X86_MODRM_RM(insn->modrm.value)];
> +	*offs2 = regoff2[X86_MODRM_RM(insn->modrm.value)];
> +
> +	/*
> +	 * If no displacement is indicated in the mod part of the ModRM byte,

s/"no "//

> +	 * (mod part is 0) and the r/m part of the same byte is 6, no register
> +	 * is used to calculate the operand address. An r/m part of 6 means that
> +	 * the second register offset is already invalid.
> +	 */
> +	if ((X86_MODRM_MOD(insn->modrm.value) == 0) &&
> +	    (X86_MODRM_RM(insn->modrm.value) == 6))
> +		*offs1 = -EDOM;
> +
> +	return 0;
> +}
> +
> +/**
>   * get_desc() - Obtain address of segment descriptor
>   * @sel:	Segment selector
>   *
> @@ -689,6 +756,94 @@ int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs)
>  }
>  
>  /**
> + * get_addr_ref_16() - Obtain the 16-bit address referred by instruction
> + * @insn:	Instruction structure containing ModRM byte and displacement
> + * @regs:	Structure with register values as seen when entering kernel mode
> + *
> + * This function is to be used with 16-bit address encodings. Obtain the memory
> + * address referred by the instruction's ModRM bytes and displacement. Also, the
> + * segment used as base is determined by either any segment override prefixes in
> + * insn or the default segment of the registers involved in the address
> + * computation. In protected mode, segment limits are enforced.
> + *
> + * Return: linear address referenced by instruction and registers on success.
> + * -1L on failure.
> + */
> +static void __user *get_addr_ref_16(struct insn *insn, struct pt_regs *regs)
> +{
> +	unsigned long linear_addr, seg_base_addr, seg_limit;
> +	short eff_addr, addr1 = 0, addr2 = 0;
> +	int addr_offset1, addr_offset2;
> +	int ret;
> +
> +	insn_get_modrm(insn);
> +	insn_get_displacement(insn);
> +
> +	/*
> +	 * If operand is a register, the layout is the same as in
> +	 * 32-bit and 64-bit addressing.
> +	 */
> +	if (X86_MODRM_MOD(insn->modrm.value) == 3) {
> +		addr_offset1 = get_reg_offset(insn, regs, REG_TYPE_RM);
> +		if (addr_offset1 < 0)
> +			goto out_err;

<---- newline here.

> +		eff_addr = regs_get_register(regs, addr_offset1);
> +		seg_base_addr = insn_get_seg_base(regs, insn, addr_offset1);
> +		if (seg_base_addr == -1L)
> +			goto out_err;

ditto.

> +		seg_limit = get_seg_limit(regs, insn, addr_offset1);
> +	} else {
> +		ret = get_reg_offset_16(insn, regs, &addr_offset1,
> +					&addr_offset2);
> +		if (ret < 0)
> +			goto out_err;

ditto.

> +		/*
> +		 * Don't fail on invalid offset values. They might be invalid
> +		 * because they cannot be used for this particular value of
> +		 * the ModRM. Instead, use them in the computation only if
> +		 * they contain a valid value.
> +		 */
> +		if (addr_offset1 != -EDOM)
> +			addr1 = 0xffff & regs_get_register(regs, addr_offset1);
> +		if (addr_offset2 != -EDOM)
> +			addr2 = 0xffff & regs_get_register(regs, addr_offset2);
> +		eff_addr = addr1 + addr2;

ditto.

Space those codelines out, we want to be able to read that code again at
some point :-)))

> +		/*
> +		 * The first register in the operand implies the SS or DS
> +		 * segment selectors, the second register in the operand can
> +		 * only imply DS. Thus, use the first register to obtain
> +		 * the segment selector.
> +		 */
> +		seg_base_addr = insn_get_seg_base(regs, insn, addr_offset1);
> +		if (seg_base_addr == -1L)
> +			goto out_err;
> +		seg_limit = get_seg_limit(regs, insn, addr_offset1);
> +
> +		eff_addr += (insn->displacement.value & 0xffff);
> +	}
> +
> +	linear_addr = (unsigned long)(eff_addr & 0xffff);
> +
> +	/*
> +	 * Make sure the effective address is within the limits of the
> +	 * segment. In long mode, the limit is -1L. Thus, the second part

Long mode in a 16-bit handling function?

> +	 * of the check always succeeds.
> +	 */
> +	if (linear_addr > seg_limit)
> +		goto out_err;
> +
> +	linear_addr += seg_base_addr;
> +
> +	/* Limit linear address to 20 bits */
> +	if (v8086_mode(regs))
> +		linear_addr &= 0xfffff;
> +
> +	return (void __user *)linear_addr;
> +out_err:
> +	return (void __user *)-1;
> +}
> +
> +/**
>   * _to_signed_long() - Cast an unsigned long into signed long
>   * @val		A 32-bit or 64-bit unsigned long
>   * @long_bytes	The number of bytes used to represent a long number
> -- 
> 2.9.3
> 

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v7 20/26] x86/cpufeature: Add User-Mode Instruction Prevention definitions
  2017-05-05 18:17 ` [PATCH v7 20/26] x86/cpufeature: Add User-Mode Instruction Prevention definitions Ricardo Neri
  2017-05-06  9:04   ` Paolo Bonzini
@ 2017-06-07 18:24   ` Borislav Petkov
  1 sibling, 0 replies; 81+ messages in thread
From: Borislav Petkov @ 2017-06-07 18:24 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Tony Luck

On Fri, May 05, 2017 at 11:17:18AM -0700, Ricardo Neri wrote:
> User-Mode Instruction Prevention is a security feature present in new
> Intel processors that, when set, prevents the execution of a subset of
> instructions if such instructions are executed in user mode (CPL > 0).
> Attempting to execute such instructions causes a general protection
> exception.
> 
> The subset of instructions comprises:
> 
>  * SGDT - Store Global Descriptor Table
>  * SIDT - Store Interrupt Descriptor Table
>  * SLDT - Store Local Descriptor Table
>  * SMSW - Store Machine Status Word
>  * STR  - Store Task Register
> 
> This feature is also added to the list of disabled-features to allow
> a cleaner handling of build-time configuration.
> 
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Brian Gerst <brgerst@gmail.com>
> Cc: Chen Yucong <slaoub@gmail.com>
> Cc: Chris Metcalf <cmetcalf@mellanox.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Huang Rui <ray.huang@amd.com>
> Cc: Jiri Slaby <jslaby@suse.cz>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Liang Z. Li <liang.z.li@intel.com>
> Cc: Alexandre Julliard <julliard@winehq.org>
> Cc: Stas Sergeev <stsp@list.ru>
> Cc: x86@kernel.org
> Cc: linux-msdos@vger.kernel.org
> 
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/include/asm/cpufeatures.h          | 1 +
>  arch/x86/include/asm/disabled-features.h    | 8 +++++++-
>  arch/x86/include/uapi/asm/processor-flags.h | 2 ++
>  3 files changed, 10 insertions(+), 1 deletion(-)

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v7 21/26] x86: Add emulation code for UMIP instructions
  2017-05-05 18:17 ` [PATCH v7 21/26] x86: Add emulation code for UMIP instructions Ricardo Neri
@ 2017-06-08 18:38   ` Borislav Petkov
  2017-06-17  1:34     ` Ricardo Neri
  0 siblings, 1 reply; 81+ messages in thread
From: Borislav Petkov @ 2017-06-08 18:38 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Tony Luck

On Fri, May 05, 2017 at 11:17:19AM -0700, Ricardo Neri wrote:
> The feature User-Mode Instruction Prevention present in recent Intel
> processor prevents a group of instructions from being executed with
> CPL > 0. Otherwise, a general protection fault is issued.

This is one of the best opening paragraphs of a commit message I've
read this year! This is how you open: short, succinct, to the point, no
marketing bullshit. Good!

> Rather than relaying this fault to the user space (in the form of a SIGSEGV
> signal), the instructions protected by UMIP can be emulated to provide
> dummy results. This allows to conserve the current kernel behavior and not
> reveal the system resources that UMIP intends to protect (the global
> descriptor and interrupt descriptor tables, the segment selectors of the
> local descriptor table and the task state and the machine status word).
> 
> This emulation is needed because certain applications (e.g., WineHQ and
> DOSEMU2) rely on this subset of instructions to function.
> 
> The instructions protected by UMIP can be split in two groups. Those who

s/who/which/

> return a kernel memory address (sgdt and sidt) and those who return a

ditto.

> value (sldt, str and smsw).
>
> For the instructions that return a kernel memory address, applications
> such as WineHQ rely on the result being located in the kernel memory space.
> The result is emulated as a hard-coded value that lies close to the top
> of the kernel memory. The limit for the GDT and the IDT are set to zero.

Nice.

> Given that sldt and str are not used in common in programs supported by

You wanna say "in common programs" here? Or "not commonly used in programs" ?

> WineHQ and DOSEMU2, they are not emulated.
> 
> The instruction smsw is emulated to return the value that the register CR0
> has at boot time as set in the head_32.
> 
> Care is taken to appropriately emulate the results when segmentation is
> used. This is, rather than relying on USER_DS and USER_CS, the function

	"That is,... "

> insn_get_addr_ref() inspects the segment descriptor pointed by the
> registers in pt_regs. This ensures that we correctly obtain the segment
> base address and the address and operand sizes even if the user space
> application uses local descriptor table.

Btw, I could very well use all that nice explanation in umip.c too so
that the high-level behavior is documented.

> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Brian Gerst <brgerst@gmail.com>
> Cc: Chen Yucong <slaoub@gmail.com>
> Cc: Chris Metcalf <cmetcalf@mellanox.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Huang Rui <ray.huang@amd.com>
> Cc: Jiri Slaby <jslaby@suse.cz>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Liang Z. Li <liang.z.li@intel.com>
> Cc: Alexandre Julliard <julliard@winehq.org>
> Cc: Stas Sergeev <stsp@list.ru>
> Cc: x86@kernel.org
> Cc: linux-msdos@vger.kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/include/asm/umip.h |  15 +++
>  arch/x86/kernel/Makefile    |   1 +
>  arch/x86/kernel/umip.c      | 245 ++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 261 insertions(+)
>  create mode 100644 arch/x86/include/asm/umip.h
>  create mode 100644 arch/x86/kernel/umip.c
> 
> diff --git a/arch/x86/include/asm/umip.h b/arch/x86/include/asm/umip.h
> new file mode 100644
> index 0000000..077b236
> --- /dev/null
> +++ b/arch/x86/include/asm/umip.h
> @@ -0,0 +1,15 @@
> +#ifndef _ASM_X86_UMIP_H
> +#define _ASM_X86_UMIP_H
> +
> +#include <linux/types.h>
> +#include <asm/ptrace.h>
> +
> +#ifdef CONFIG_X86_INTEL_UMIP
> +bool fixup_umip_exception(struct pt_regs *regs);
> +#else
> +static inline bool fixup_umip_exception(struct pt_regs *regs)
> +{
> +	return false;
> +}

Let's save some header lines:

static inline bool fixup_umip_exception(struct pt_regs *regs) 	{ return false; }

those trunks take too much space as it is.

> +#endif  /* CONFIG_X86_INTEL_UMIP */
> +#endif  /* _ASM_X86_UMIP_H */
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 4b99423..cc1b7cc 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -123,6 +123,7 @@ obj-$(CONFIG_EFI)			+= sysfb_efi.o
>  obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
>  obj-$(CONFIG_TRACING)			+= tracepoint.o
>  obj-$(CONFIG_SCHED_MC_PRIO)		+= itmt.o
> +obj-$(CONFIG_X86_INTEL_UMIP)		+= umip.o
>  
>  ifdef CONFIG_FRAME_POINTER
>  obj-y					+= unwind_frame.o
> diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
> new file mode 100644
> index 0000000..c7c5795
> --- /dev/null
> +++ b/arch/x86/kernel/umip.c
> @@ -0,0 +1,245 @@
> +/*
> + * umip.c Emulation for instruction protected by the Intel User-Mode
> + * Instruction Prevention. The instructions are:
> + *    sgdt
> + *    sldt
> + *    sidt
> + *    str
> + *    smsw
> + *
> + * Copyright (c) 2017, Intel Corporation.
> + * Ricardo Neri <ricardo.neri@linux.intel.com>
> + */
> +
> +#include <linux/uaccess.h>
> +#include <asm/umip.h>
> +#include <asm/traps.h>
> +#include <asm/insn.h>
> +#include <asm/insn-eval.h>
> +#include <linux/ratelimit.h>
> +
> +/*
> + * == Base addresses of GDT and IDT
> + * Some applications to function rely finding the global descriptor table (GDT)

That formulation reads funny.

> + * and the interrupt descriptor table (IDT) in kernel memory.
> + * For x86_32, the selected values do not match any particular hole, but it
> + * suffices to provide a memory location within kernel memory.
> + *
> + * == CR0 flags for SMSW
> + * Use the flags given when booting, as found in head_32.S
> + */
> +
> +#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | X86_CR0_NE | \
> +		   X86_CR0_WP | X86_CR0_AM)

Why not pull those up in asm/processor-flags.h or so and share the
definition instead of duplicating it?

> +#define UMIP_DUMMY_GDT_BASE 0xfffe0000
> +#define UMIP_DUMMY_IDT_BASE 0xffff0000
> +
> +enum umip_insn {
> +	UMIP_SGDT = 0,	/* opcode 0f 01 ModR/M reg 0 */
> +	UMIP_SIDT,	/* opcode 0f 01 ModR/M reg 1 */
> +	UMIP_SLDT,	/* opcode 0f 00 ModR/M reg 0 */
> +	UMIP_SMSW,	/* opcode 0f 01 ModR/M reg 4 */
> +	UMIP_STR,	/* opcode 0f 00 ModR/M reg 1 */

Let's stick to a single spelling: ModRM.reg=0, etc.

Better yet, use the SDM format:

	UMIP_SGDT = 0,		/* 0F 01 /0 */
	UMIP_SIDT,		/* 0F 01 /1 */
	...

> +};
> +
> +/**
> + * __identify_insn() - Identify a UMIP-protected instruction
> + * @insn:	Instruction structure with opcode and ModRM byte.
> + *
> + * From the instruction opcode and the reg part of the ModRM byte, identify,
> + * if any, a UMIP-protected instruction.
> + *
> + * Return: an enumeration of a UMIP-protected instruction; -EINVAL on failure.
> + */
> +static int __identify_insn(struct insn *insn)

static enum umip_insn __identify_insn(...

But frankly, that enum looks pointless to me - it is used locally only
and you can just as well use plain ints.

> +{
> +	/* By getting modrm we also get the opcode. */
> +	insn_get_modrm(insn);
> +
> +	/* All the instructions of interest start with 0x0f. */
> +	if (insn->opcode.bytes[0] != 0xf)
> +		return -EINVAL;
> +
> +	if (insn->opcode.bytes[1] == 0x1) {
> +		switch (X86_MODRM_REG(insn->modrm.value)) {
> +		case 0:
> +			return UMIP_SGDT;
> +		case 1:
> +			return UMIP_SIDT;
> +		case 4:
> +			return UMIP_SMSW;
> +		default:
> +			return -EINVAL;
> +		}
> +	}
> +	/* SLDT AND STR are not emulated */
> +	return -EINVAL;
> +}
> +
> +/**
> + * __emulate_umip_insn() - Emulate UMIP instructions with dummy values
> + * @insn:	Instruction structure with ModRM byte
> + * @umip_inst:	Instruction to emulate
> + * @data:	Buffer onto which the dummy values will be copied
> + * @data_size:	Size of the emulated result
> + *
> + * Emulate an instruction protected by UMIP. The result of the emulation
> + * is saved in the provided buffer. The size of the results depends on both
> + * the instruction and type of operand (register vs memory address). Thus,
> + * the size of the result needs to be updated.
> + *
> + * Result: 0 if success, -EINVAL on failure to emulate
> + */
> +static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst,
> +			       unsigned char *data, int *data_size)
> +{
> +	unsigned long dummy_base_addr;
> +	unsigned short dummy_limit = 0;
> +	unsigned int dummy_value = 0;
> +
> +	switch (umip_inst) {
> +	/*
> +	 * These two instructions return the base address and limit of the
> +	 * global and interrupt descriptor table. The base address can be
> +	 * 24-bit, 32-bit or 64-bit. Limit is always 16-bit. If the operand
> +	 * size is 16-bit the returned value of the base address is supposed
> +	 * to be a zero-extended 24-bit number. However, it seems that a
> +	 * 32-bit number is always returned in legacy protected mode
> +	 * irrespective of the operand size.
> +	 */
> +	case UMIP_SGDT:
> +		/* fall through */
> +	case UMIP_SIDT:
> +		if (umip_inst == UMIP_SGDT)
> +			dummy_base_addr = UMIP_DUMMY_GDT_BASE;
> +		else
> +			dummy_base_addr = UMIP_DUMMY_IDT_BASE;
> +		if (X86_MODRM_MOD(insn->modrm.value) == 3) {
> +			/* SGDT and SIDT do not take register as argument. */

Comment above the if.

> +			return -EINVAL;
> +		}

So that check needs to go first, then the dummy_base_addr assignment.

> +
> +		memcpy(data + 2, &dummy_base_addr, sizeof(dummy_base_addr));
> +		memcpy(data, &dummy_limit, sizeof(dummy_limit));
> +		*data_size = sizeof(dummy_base_addr) + sizeof(dummy_limit);

Huh, that value will always be the same - why do you have a specific
variable? It could be a define, once for 32-bit and once for 64-bit.

> +		break;
> +	case UMIP_SMSW:
> +		/*
> +		 * Even though CR0_STATE contains 4 bytes, the number
> +		 * of bytes to be copied in the result buffer is determined
> +		 * by whether the operand is a register or a memory location.
> +		 */
> +		dummy_value = CR0_STATE;

Something's wrong here: how does that local, write-only variable have
any effect?

> +		/*
> +		 * These two instructions return a 16-bit value. We return
> +		 * all zeros. This is equivalent to a null descriptor for
> +		 * str and sldt.
> +		 */
> +		/* SLDT and STR are not emulated */
> +		/* fall through */
> +	case UMIP_SLDT:
> +		/* fall through */
> +	case UMIP_STR:
> +		/* fall through */
> +	default:
> +		return -EINVAL;

That switch-case has a majority of fall-throughs. So make it an if-else
instead.

> +	}
> +	return 0;
> +}
> +
> +/**
> + * fixup_umip_exception() - Fixup #GP faults caused by UMIP
> + * @regs:	Registers as saved when entering the #GP trap
> + *
> + * The instructions sgdt, sidt, str, smsw, sldt cause a general protection
> + * fault if with CPL > 0 (i.e., from user space). This function can be
> + * used to emulate the results of the aforementioned instructions with
> + * dummy values. Results are copied to user-space memory as indicated by
> + * the instruction pointed by EIP using the registers indicated in the
> + * instruction operands. This function also takes care of determining
> + * the address to which the results must be copied.
> + */
> +bool fixup_umip_exception(struct pt_regs *regs)
> +{
> +	struct insn insn;
> +	unsigned char buf[MAX_INSN_SIZE];
> +	/* 10 bytes is the maximum size of the result of UMIP instructions */
> +	unsigned char dummy_data[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

	unsigned char dummy_data[10] = { 0 };

One 0 should be enough :)

> +	unsigned long seg_base;
> +	int not_copied, nr_copied, reg_offset, dummy_data_size;
> +	void __user *uaddr;
> +	unsigned long *reg_addr;
> +	enum umip_insn umip_inst;
> +	struct insn_code_seg_defaults seg_defs;

Please sort function local variables declaration in a reverse christmas
tree order:

	<type> longest_variable_name;
	<type> shorter_var_name;
	<type> even_shorter;
	<type> i;

> +
> +	/*
> +	 * Use the segment base in case user space used a different code
> +	 * segment, either in protected (e.g., from an LDT) or virtual-8086
> +	 * modes. In most of the cases seg_base will be zero as in USER_CS.
> +	 */
> +	seg_base = insn_get_seg_base(regs, &insn,
> +				     offsetof(struct pt_regs, ip));

Oh boy, where's the error handling?! That can return -1L.

> +	not_copied = copy_from_user(buf, (void __user *)(seg_base + regs->ip),

-1L + regs->ip is then your pwnage.

> +				    sizeof(buf));

Just let them stick out.

> +	nr_copied = sizeof(buf) - not_copied;

<---- newline here.

> +	/*
> +	 * The copy_from_user above could have failed if user code is protected
			    ()

> +	 * by a memory protection key. Give up on emulation in such a case.
> +	 * Should we issue a page fault?

Why? AFAICT, you're in the #GP handler. Simply you return unhandled.

> +	 */
> +	if (!nr_copied)
> +		return false;
> +
> +	insn_init(&insn, buf, nr_copied, user_64bit_mode(regs));
> +
> +	/*
> +	 * Override the default operand and address sizes to what is specified
> +	 * in the code segment descriptor. The instruction decoder only sets
> +	 * the address size to either 4 or 8 address bytes and does nothing
> +	 * for the operand bytes. This is OK for most of the cases, but we could
> +	 * have special cases where, for instance, a 16-bit code segment
> +	 * descriptor is used.
> +	 * If there are overrides, the instruction decoder correctly updates
> +	 * these values, even for 16-bit defaults.
> +	 */
> +	seg_defs = insn_get_code_seg_defaults(regs);
> +	insn.addr_bytes = seg_defs.address_bytes;
> +	insn.opnd_bytes = seg_defs.operand_bytes;
> +
> +	if (!insn.addr_bytes || !insn.opnd_bytes)
> +		return false;
> +
> +	if (user_64bit_mode(regs))
> +		return false;
> +
> +	insn_get_length(&insn);
> +	if (nr_copied < insn.length)
> +		return false;
> +
> +	umip_inst = __identify_insn(&insn);
> +	/* Check if we found an instruction protected by UMIP */

Put comment above the function call.

> +	if (umip_inst < 0)
> +		return false;
> +
> +	if (__emulate_umip_insn(&insn, umip_inst, dummy_data, &dummy_data_size))
> +		return false;
> +
> +	/* If operand is a register, write directly to it */
> +	if (X86_MODRM_MOD(insn.modrm.value) == 3) {
> +		reg_offset = insn_get_modrm_rm_off(&insn, regs);

Grr, error handling!! That reg_offset can be -E<something>.

> +		reg_addr = (unsigned long *)((unsigned long)regs + reg_offset);
> +		memcpy(reg_addr, dummy_data, dummy_data_size);
> +	} else {
> +		uaddr = insn_get_addr_ref(&insn, regs);
> +		/* user address could not be determined, abort emulation */

That comment is kinda obvious. But yes, this has error handling.

> +		if ((unsigned long)uaddr == -1L)
> +			return false;
> +		nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
> +		if (nr_copied  > 0)
> +			return false;
> +	}
> +
> +	/* increase IP to let the program keep going */
> +	regs->ip += insn.length;
> +	return true;
> +}
> -- 
> 2.9.3
> 

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v7 22/26] x86/umip: Force a page fault when unable to copy emulated result to user
  2017-05-05 18:17 ` [PATCH v7 22/26] x86/umip: Force a page fault when unable to copy emulated result to user Ricardo Neri
@ 2017-06-09 11:02   ` Borislav Petkov
  2017-07-25 23:50     ` Ricardo Neri
  0 siblings, 1 reply; 81+ messages in thread
From: Borislav Petkov @ 2017-06-09 11:02 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Tony Luck

On Fri, May 05, 2017 at 11:17:20AM -0700, Ricardo Neri wrote:
> fixup_umip_exception() will be called from do_general_protection. If the
								  ^
								  |
Please end function names with parentheses.		       ---+

> former returns false, the latter will issue a SIGSEGV with SEND_SIG_PRIV.
> However, when emulation is successful but the emulated result cannot be
> copied to user space memory, it is more accurate to issue a SIGSEGV with
> SEGV_MAPERR with the offending address.
> A new function is inspired in

That reads funny.

> force_sig_info_fault is introduced to model the page fault.
> 
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Brian Gerst <brgerst@gmail.com>
> Cc: Chen Yucong <slaoub@gmail.com>
> Cc: Chris Metcalf <cmetcalf@mellanox.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Huang Rui <ray.huang@amd.com>
> Cc: Jiri Slaby <jslaby@suse.cz>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Liang Z. Li <liang.z.li@intel.com>
> Cc: Alexandre Julliard <julliard@winehq.org>
> Cc: Stas Sergeev <stsp@list.ru>
> Cc: x86@kernel.org
> Cc: linux-msdos@vger.kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/kernel/umip.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 43 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
> index c7c5795..ff7366a 100644
> --- a/arch/x86/kernel/umip.c
> +++ b/arch/x86/kernel/umip.c
> @@ -148,6 +148,41 @@ static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst,
>  }
>  
>  /**
> + * __force_sig_info_umip_fault() - Force a SIGSEGV with SEGV_MAPERR
> + * @address:	Address that caused the signal
> + * @regs:	Register set containing the instruction pointer
> + *
> + * Force a SIGSEGV signal with SEGV_MAPERR as the error code. This function is
> + * intended to be used to provide a segmentation fault when the result of the
> + * UMIP emulation could not be copied to the user space memory.
> + *
> + * Return: none
> + */
> +static void __force_sig_info_umip_fault(void __user *address,
> +					struct pt_regs *regs)
> +{
> +	siginfo_t info;
> +	struct task_struct *tsk = current;
> +
> +	if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV)) {

Save an indentation level:

	if (!(show_unhandled_signals && unhandled_signal(tsk, SIGSEGV)))
		return;

	printk...



> +		printk_ratelimited("%s[%d] umip emulation segfault ip:%lx sp:%lx error:%x in %lx\n",
> +				   tsk->comm, task_pid_nr(tsk), regs->ip,
> +				   regs->sp, X86_PF_USER | X86_PF_WRITE,
> +				   regs->ip);
> +	}
> +
> +	tsk->thread.cr2		= (unsigned long)address;
> +	tsk->thread.error_code	= X86_PF_USER | X86_PF_WRITE;
> +	tsk->thread.trap_nr	= X86_TRAP_PF;
> +
> +	info.si_signo	= SIGSEGV;
> +	info.si_errno	= 0;
> +	info.si_code	= SEGV_MAPERR;
> +	info.si_addr	= address;
> +	force_sig_info(SIGSEGV, &info, tsk);
> +}
> +
> +/**
>   * fixup_umip_exception() - Fixup #GP faults caused by UMIP
>   * @regs:	Registers as saved when entering the #GP trap
>   *
> @@ -235,8 +270,14 @@ bool fixup_umip_exception(struct pt_regs *regs)
>  		if ((unsigned long)uaddr == -1L)
>  			return false;
>  		nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
> -		if (nr_copied  > 0)
> -			return false;
> +		if (nr_copied  > 0) {
> +			/*
> +			 * If copy fails, send a signal and tell caller that
> +			 * fault was fixed up

Pls end sentences in the comments with a fullstop.

> +			 */
> +			__force_sig_info_umip_fault(uaddr, regs);
> +			return true;
> +		}
>  	}
>  
>  	/* increase IP to let the program keep going */
> -- 
> 2.9.3
> 

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v7 23/26] x86/traps: Fixup general protection faults caused by UMIP
  2017-05-05 18:17 ` [PATCH v7 23/26] x86/traps: Fixup general protection faults caused by UMIP Ricardo Neri
@ 2017-06-09 13:02   ` Borislav Petkov
  2017-07-25 23:51     ` Ricardo Neri
  0 siblings, 1 reply; 81+ messages in thread
From: Borislav Petkov @ 2017-06-09 13:02 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Tony Luck

On Fri, May 05, 2017 at 11:17:21AM -0700, Ricardo Neri wrote:
> If the User-Mode Instruction Prevention CPU feature is available and
> enabled, a general protection fault will be issued if the instructions
> sgdt, sldt, sidt, str or smsw are executed from user-mode context
> (CPL > 0). If the fault was caused by any of the instructions protected
> by UMIP, fixup_umip_exception will emulate dummy results for these

Please end function names with parentheses.

> instructions. If emulation is successful, the result is passed to the
> user space program and no SIGSEGV signal is emitted.
> 
> Please note that fixup_umip_exception also caters for the case when
> the fault originated while running in virtual-8086 mode.
> 
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Brian Gerst <brgerst@gmail.com>
> Cc: Chen Yucong <slaoub@gmail.com>
> Cc: Chris Metcalf <cmetcalf@mellanox.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Huang Rui <ray.huang@amd.com>
> Cc: Jiri Slaby <jslaby@suse.cz>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Liang Z. Li <liang.z.li@intel.com>
> Cc: Alexandre Julliard <julliard@winehq.org>
> Cc: Stas Sergeev <stsp@list.ru>
> Cc: x86@kernel.org
> Cc: linux-msdos@vger.kernel.org
> Reviewed-by: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/kernel/traps.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 3995d3a..cec548d 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -65,6 +65,7 @@
>  #include <asm/trace/mpx.h>
>  #include <asm/mpx.h>
>  #include <asm/vm86.h>
> +#include <asm/umip.h>
>  
>  #ifdef CONFIG_X86_64
>  #include <asm/x86_init.h>
> @@ -526,6 +527,9 @@ do_general_protection(struct pt_regs *regs, long error_code)
>  	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
>  	cond_local_irq_enable(regs);
>  

Almost definitely:

	if (static_cpu_has(X86_FEATURE_UMIP)) {
		if (...

> +	if (user_mode(regs) && fixup_umip_exception(regs))
> +		return;

We don't want to punish !UMIP machines.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v7 24/26] x86: Enable User-Mode Instruction Prevention
  2017-05-05 18:17 ` [PATCH v7 24/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
@ 2017-06-09 16:10   ` Borislav Petkov
  2017-07-26  0:44     ` Ricardo Neri
  0 siblings, 1 reply; 81+ messages in thread
From: Borislav Petkov @ 2017-06-09 16:10 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Tony Luck

On Fri, May 05, 2017 at 11:17:22AM -0700, Ricardo Neri wrote:
> User-Mode Instruction Prevention (UMIP) is enabled by setting/clearing a
> bit in %cr4.
> 
> It makes sense to enable UMIP at some point while booting, before user
> space comes up. Like SMAP and SMEP, it is not critical to have it enabled
> very early during boot. This is because UMIP is relevant only when there is
> a userspace to be protected from. Given the similarities in relevance, it
> makes sense to enable UMIP along with SMAP and SMEP.
> 
> UMIP is enabled by default. It can be disabled by adding clearcpuid=514
> to the kernel parameters.
> 
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Brian Gerst <brgerst@gmail.com>
> Cc: Chen Yucong <slaoub@gmail.com>
> Cc: Chris Metcalf <cmetcalf@mellanox.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Huang Rui <ray.huang@amd.com>
> Cc: Jiri Slaby <jslaby@suse.cz>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Liang Z. Li <liang.z.li@intel.com>
> Cc: Alexandre Julliard <julliard@winehq.org>
> Cc: Stas Sergeev <stsp@list.ru>
> Cc: x86@kernel.org
> Cc: linux-msdos@vger.kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/Kconfig             | 10 ++++++++++
>  arch/x86/kernel/cpu/common.c | 16 +++++++++++++++-
>  2 files changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 702002b..1b1bbeb 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1745,6 +1745,16 @@ config X86_SMAP
>  
>  	  If unsure, say Y.
>  
> +config X86_INTEL_UMIP
> +	def_bool y

That's a bit too much. It makes sense on distro kernels but how many
machines out there actually have UMIP?

> +	depends on CPU_SUP_INTEL
> +	prompt "Intel User Mode Instruction Prevention" if EXPERT
> +	---help---
> +	  The User Mode Instruction Prevention (UMIP) is a security
> +	  feature in newer Intel processors. If enabled, a general
> +	  protection fault is issued if the instructions SGDT, SLDT,
> +	  SIDT, SMSW and STR are executed in user mode.
> +
>  config X86_INTEL_MPX
>  	prompt "Intel MPX (Memory Protection Extensions)"
>  	def_bool n
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 8ee3211..66ebded 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -311,6 +311,19 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c)
>  	}
>  }
>  
> +static __always_inline void setup_umip(struct cpuinfo_x86 *c)
> +{
> +	if (cpu_feature_enabled(X86_FEATURE_UMIP) &&
> +	    cpu_has(c, X86_FEATURE_UMIP))

Hmm, so if UMIP is not build-time disabled, the cpu_feature_enabled()
will call static_cpu_has().

Looks like you want to call cpu_has() too because alternatives haven't
run yet and static_cpu_has() will reply wrong. Please state that in a
comment.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v7 10/26] x86/insn-eval: Add utility functions to get segment selector
  2017-05-30 10:35   ` Borislav Petkov
@ 2017-06-15 18:37     ` Ricardo Neri
  2017-06-15 19:04       ` Ricardo Neri
  2017-06-19 15:37       ` Borislav Petkov
  0 siblings, 2 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-06-15 18:37 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Tue, 2017-05-30 at 12:35 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:08AM -0700, Ricardo Neri wrote:
> > When computing a linear address and segmentation is used, we need to know
> > the base address of the segment involved in the computation. In most of
> > the cases, the segment base address will be zero as in USER_DS/USER32_DS.
> > However, it may be possible that a user space program defines its own
> > segments via a local descriptor table. In such a case, the segment base
> > address may not be zero. Thus, the segment base address is needed to
> > calculate correctly the linear address.
> > 
> > The segment selector to be used when computing a linear address is
> > determined by either any of segment override prefixes in the
> > instruction or inferred from the registers involved in the computation of
> > the effective address; in that order. Also, there are cases when the
> > overrides shall be ignored (code segments are always selected by the CS
> > segment register; string instructions always use the ES segment register
> > along with the EDI register).
> > 
> > For clarity, this process can be split into two steps: resolving the
> > relevant segment register to use and, once known, read its value to
> > obtain the segment selector.
> > 
> > The method to obtain the segment selector depends on several factors. In
> > 32-bit builds, segment selectors are saved into the pt_regs structure
> > when switching to kernel mode. The same is also true for virtual-8086
> > mode. In 64-bit builds, segmentation is mostly ignored, except when
> > running a program in 32-bit legacy mode. In this case, CS and SS can be
> > obtained from pt_regs. DS, ES, FS and GS can be read directly from
> > the respective segment registers.
> > 
> > Lastly, the only two segment registers that are not ignored in long mode
> > are FS and GS. In these two cases, base addresses are obtained from the
> > respective MSRs.
> > 
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> > Cc: Colin Ian King <colin.king@canonical.com>
> > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Thomas Garnier <thgarnie@google.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Dmitry Vyukov <dvyukov@google.com>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: x86@kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/lib/insn-eval.c | 256 +++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 256 insertions(+)
> > 
> > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> > index 1634762..0a496f4 100644
> > --- a/arch/x86/lib/insn-eval.c
> > +++ b/arch/x86/lib/insn-eval.c
> > @@ -9,6 +9,7 @@
> >  #include <asm/inat.h>
> >  #include <asm/insn.h>
> >  #include <asm/insn-eval.h>
> > +#include <asm/vm86.h>
> >  
> >  enum reg_type {
> >  	REG_TYPE_RM = 0,
> > @@ -33,6 +34,17 @@ enum string_instruction {
> >  	SCASW_SCASD	= 0xaf,
> >  };
> >  
> > +enum segment_register {
> > +	SEG_REG_INVAL = -1,
> > +	SEG_REG_IGNORE = 0,
> > +	SEG_REG_CS = 0x23,
> > +	SEG_REG_SS = 0x36,
> > +	SEG_REG_DS = 0x3e,
> > +	SEG_REG_ES = 0x26,
> > +	SEG_REG_FS = 0x64,
> > +	SEG_REG_GS = 0x65,
> > +};
> 
> Yuck, didn't we talk about this already?

I am sorry Borislav. I thought you agreed that I could use the values of
the segment override prefixes to identify the segment registers [1].
> 
> Those are segment override prefixes so call them as such.
> 
> #define SEG_OVR_PFX_CS	0x23
> #define SEG_OVR_PFX_SS	0x36
> ...
> 
> and we already have those!
> 
> arch/x86/include/asm/inat.h:
> ...
> #define INAT_PFX_CS     5       /* 0x2E */
> #define INAT_PFX_DS     6       /* 0x3E */
> #define INAT_PFX_ES     7       /* 0x26 */
> #define INAT_PFX_FS     8       /* 0x64 */
> #define INAT_PFX_GS     9       /* 0x65 */
> #define INAT_PFX_SS     10      /* 0x36 */
> 
> well, kinda, they're numbers there and not the actual prefix values.

These numbers can be 'translated' to the actual value of the prefixes via
inat_get_opcode_attribute(). In my next version I am planning to use
this function and reuse the aforementioned definitions.

> 
> And then there's:
> 
> arch/x86/kernel/uprobes.c::is_prefix_bad() which looks at some of those.
> 
> Please add your defines to inat.h

Will do.

> and make that function is_prefix_bad()
> use them instead of naked numbers. We need to pay attention to all those
> different things needing to look at insn opcodes and not let them go
> unwieldy by each defining and duplicating stuff.

I have implemented this change and will be part of my next version.
> 
> >  /**
> >   * is_string_instruction - Determine if instruction is a string instruction
> >   * @insn:	Instruction structure containing the opcode
> > @@ -83,6 +95,250 @@ static bool is_string_instruction(struct insn *insn)
> >  	}
> >  }
> >  
> > +/**
> > + * resolve_seg_register() - obtain segment register
> 
> That function is still returning the segment override prefix and we use
> *that* to determine the segment register.

Once I add new definitions for the segment registers and reuse the
existing definitions of the segment override prefixes this problem will
be fixed.

> 
> > + * @insn:	Instruction structure with segment override prefixes
> > + * @regs:	Structure with register values as seen when entering kernel mode
> > + * @regoff:	Operand offset, in pt_regs, used to determine segment register
> > + *
> > + * The segment register to which an effective address refers depends on
> > + * a) whether segment override prefixes must be ignored: always use CS when
> > + * the register is (R|E)IP; always use ES when operand register is (E)DI with
> > + * string instructions as defined in the Intel documentation. b) If segment
> > + * override prefixes are used in the instruction. c) Use
> > + * the default segment register associated with the operand register.
> > + *
> > + * The operand register, regoff, is represented as the offset from the base of
> > + * pt_regs. Also, regoff can be -EDOM for cases in which registers are not
> > + * used as operands (e.g., displacement-only memory addressing).
> > + *
> > + * This function returns the segment register as value from an enumeration
> > + * as per the conditions described above. Please note that this function
> > + * does not return the value in the segment register (i.e., the segment
> > + * selector). The segment selector needs to be obtained using
> > + * get_segment_selector() and passing the segment register resolved by
> > + * this function.
> > + *
> > + * Return: Enumerated segment register to use, among CS, SS, DS, ES, FS, GS,
> > + * ignore (in 64-bit mode as applicable), or -EINVAL in case of error.
> > + */
> > +static enum segment_register resolve_seg_register(struct insn *insn,
> > +						  struct pt_regs *regs,
> > +						  int regoff)
> > +{
> > +	int i;
> > +	int sel_overrides = 0;
> > +	int seg_register = SEG_REG_IGNORE;
> > +
> > +	if (!insn)
> > +		return SEG_REG_INVAL;
> > +
> > +	/* First handle cases when segment override prefixes must be ignored */
> > +	if (regoff == offsetof(struct pt_regs, ip)) {
> > +		if (user_64bit_mode(regs))
> > +			return SEG_REG_IGNORE;
> > +		else
> > +			return SEG_REG_CS;
> > +		return SEG_REG_CS;
> 
> Simplify:
> 
> 		if (user_64bit_mode(regs))
> 			return SEG_REG_IGNORE;
> 
> 		return SEG_REG_CS;

Will do.
> 
> > +	}
> > +
> > +	/*
> > +	 * If the (E)DI register is used with string instructions, the ES
> > +	 * segment register is always used.
> > +	 */
> > +	if ((regoff == offsetof(struct pt_regs, di)) &&
> > +	    is_string_instruction(insn)) {
> > +		if (user_64bit_mode(regs))
> > +			return SEG_REG_IGNORE;
> > +		else
> > +			return SEG_REG_ES;
> > +		return SEG_REG_CS;
> 
> What is that second return actually supposed to do?

This is not correct and I will remove it. Actually, it will never run due
to the if/else above it. Thanks for noticing it.
> 
> > +	}
> > +
> > +	/* Then check if we have segment overrides prefixes*/
> 
> Missing space and fullstop: "... overrides prefixes. */"

Will fix.

> 
> > +	for (i = 0; i < insn->prefixes.nbytes; i++) {
> > +		switch (insn->prefixes.bytes[i]) {
> > +		case SEG_REG_CS:
> > +			seg_register = SEG_REG_CS;
> > +			sel_overrides++;
> > +			break;
> > +		case SEG_REG_SS:
> > +			seg_register = SEG_REG_SS;
> > +			sel_overrides++;
> > +			break;
> > +		case SEG_REG_DS:
> > +			seg_register = SEG_REG_DS;
> > +			sel_overrides++;
> > +			break;
> > +		case SEG_REG_ES:
> > +			seg_register = SEG_REG_ES;
> > +			sel_overrides++;
> > +			break;
> > +		case SEG_REG_FS:
> > +			seg_register = SEG_REG_FS;
> > +			sel_overrides++;
> > +			break;
> > +		case SEG_REG_GS:
> > +			seg_register = SEG_REG_GS;
> > +			sel_overrides++;
> > +			break;
> > +		default:
> > +			return SEG_REG_INVAL;
> 
> So SEG_REG_NONE or so? It is not invalid if it is not a segment override
> prefix.

Right, we can have more prefixes. The default case should not return an
error, as we are only looking for the segment override prefixes, as you
mention.
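[Archive annotation: a userspace sketch of the scan being discussed. The prefix byte values are the ones defined in the Intel SDM; the function name is made up for illustration. It returns 0 for no override, the override byte itself, or -1 when two or more overrides are present, which the patch treats as undefined behavior.]

```c
#include <assert.h>
#include <stddef.h>

/* Segment override prefix bytes, per the Intel SDM. */
#define PFX_CS 0x2e
#define PFX_SS 0x36
#define PFX_DS 0x3e
#define PFX_ES 0x26
#define PFX_FS 0x64
#define PFX_GS 0x65

/* Scan an instruction's prefix bytes. Non-segment prefixes (0x66, 0x67,
 * REP, LOCK, ...) are simply skipped rather than treated as errors. */
static int scan_seg_overrides(const unsigned char *prefixes, size_t nbytes)
{
	int found = 0, count = 0;
	size_t i;

	for (i = 0; i < nbytes; i++) {
		switch (prefixes[i]) {
		case PFX_CS: case PFX_SS: case PFX_DS:
		case PFX_ES: case PFX_FS: case PFX_GS:
			found = prefixes[i];
			count++;
			break;
		default:
			break;	/* not a segment override; ignore */
		}
	}
	return count > 1 ? -1 : found;
}
```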

> 
> > +	/*
> > +	 * Having more than one segment override prefix leads to undefined
> > +	 * behavior. If this is the case, return with error.
> > +	 */
> > +	if (sel_overrides > 1)
> > +		return SEG_REG_INVAL;
> 
> Yuck, wrapping of -E value in a SEG_REG enum. Just return -EINVAL here
> and make the function return an int, not that ugly enum.

Will do.
> 
> And the return convention should be straight-forward: default segment if
> no prefix or ignored, -EINVAL if error and the actual override prefix if
> present.

Wouldn't this end up mixing the actual segment registers and
segment register overrides? I plan to have a function that parses the
segment override prefixes and returns SEG_REG_CS/DS/ES/FS/GS or
SEG_REG_IGNORE for long mode or SEG_REG_DEFAULT when the default segment
register needs to be used. A separate function will determine what such
default segment register is. Does this make sense?

> 
> Also, that test should be *after* the user_64bit_mode() because in long
> mode, segment overrides get ignored. IOW, those three if-tests around here
> should be combined into a single one, i.e., something like this:
> 
> 	if (64-bit) {
> 		if (!FS || !GS)
> 			ignore
> 		else
> 			return seg_override_pfx;	<--- Yes, that variable should be called seg_override_pfx to denote what it is.

Perhaps it can return what I have described above?

> 	} else if (sel_overrides > 1)
> 		-EINVAL
> 	else if (sel_overrides)
> 		return seg_override_pfx;
> 

Will re-do these tests as you mention.
> > +
> > +	if (sel_overrides == 1) {
> > +		/*
> > +		 * If in long mode all segment registers but FS and GS are
> > +		 * ignored.
> > +		 */
> > +		if (user_64bit_mode(regs) && !(seg_register == SEG_REG_FS ||
> > +					       seg_register == SEG_REG_GS))
> > +			return SEG_REG_IGNORE;
> > +
> > +		return seg_register;
> > +	}
> > +
> > +	/* In long mode, all segment registers except FS and GS are ignored */
> > +	if (user_64bit_mode(regs))
> > +		return SEG_REG_IGNORE;
> > +
> > +	/*
> > +	 * Lastly, if no segment overrides were found, determine the default
> > +	 * segment register as described in the Intel documentation: SS for
> > +	 * (E)SP or (E)BP. DS for all data references, AX, CX and DX are not
> > +	 * valid register operands in 16-bit address encodings.
> > +	 * -EDOM is reserved to identify for cases in which no register is used
> > +	 * the default segment register (displacement-only addressing). The
> > +	 * default segment register used in these cases is DS.
> > +	 */
> > +
> > +	switch (regoff) {
> > +	case offsetof(struct pt_regs, ax):
> > +		/* fall through */
> > +	case offsetof(struct pt_regs, cx):
> > +		/* fall through */
> > +	case offsetof(struct pt_regs, dx):
> > +		if (insn && insn->addr_bytes == 2)
> > +			return SEG_REG_INVAL;
> > +	case offsetof(struct pt_regs, di):
> > +		/* fall through */
> > +	case -EDOM:
> > +		/* fall through */
> > +	case offsetof(struct pt_regs, bx):
> > +		/* fall through */
> > +	case offsetof(struct pt_regs, si):
> > +		return SEG_REG_DS;
> > +	case offsetof(struct pt_regs, bp):
> > +		/* fall through */
> > +	case offsetof(struct pt_regs, sp):
> > +		return SEG_REG_SS;
> > +	case offsetof(struct pt_regs, ip):
> > +		return SEG_REG_CS;
> > +	default:
> > +		return SEG_REG_INVAL;
> > +	}
> 
> So group all the fall through cases together so that you don't have this
> dense block of code with "/* fall through */" on every other line.

Will do.
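[Archive annotation: the grouped switch could look like the sketch below. This is a userspace mock: the struct, enum and function name are invented for illustration, the real code uses offsetof(struct pt_regs, ...), and the 16-bit address-size special case for AX/CX/DX is omitted for brevity. -EDOM marks displacement-only addressing, which defaults to DS.]

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct pt_regs. */
struct mock_pt_regs {
	unsigned long bx, cx, dx, si, di, bp, ax, sp, ip;
};

enum { SEG_DS = 1, SEG_SS, SEG_CS, SEG_INVAL = -1 };

/* Default segment register per operand-register offset, with the
 * fall-through cases grouped under shared labels instead of one
 * "/* fall through *" comment per line. */
static int default_seg_reg(int regoff)
{
	switch (regoff) {
	case offsetof(struct mock_pt_regs, ax):
	case offsetof(struct mock_pt_regs, cx):
	case offsetof(struct mock_pt_regs, dx):
	case offsetof(struct mock_pt_regs, bx):
	case offsetof(struct mock_pt_regs, si):
	case offsetof(struct mock_pt_regs, di):
	case -EDOM:	/* displacement-only addressing */
		return SEG_DS;
	case offsetof(struct mock_pt_regs, bp):
	case offsetof(struct mock_pt_regs, sp):
		return SEG_SS;
	case offsetof(struct mock_pt_regs, ip):
		return SEG_CS;
	default:
		return SEG_INVAL;
	}
}
```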

Thanks and BR,
Ricardo


* Re: [PATCH v7 10/26] x86/insn-eval: Add utility functions to get segment selector
  2017-06-15 18:37     ` Ricardo Neri
@ 2017-06-15 19:04       ` Ricardo Neri
  2017-06-19 15:29         ` Borislav Petkov
  2017-06-19 15:37       ` Borislav Petkov
  1 sibling, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-06-15 19:04 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Thu, 2017-06-15 at 11:37 -0700, Ricardo Neri wrote:
> > Yuck, didn't we talk about this already?
> 
> I am sorry Borislav. I thought you agreed that I could use the values of
> the segment override prefixes to identify the segment registers [1].

This time with the reference:
[1]. https://lkml.org/lkml/2017/5/5/377


* Re: [PATCH v7 13/26] x86/insn-eval: Add function to get default params of code segment
  2017-06-07 12:59   ` Borislav Petkov
@ 2017-06-15 19:24     ` Ricardo Neri
  2017-06-19 17:11       ` Borislav Petkov
  0 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-06-15 19:24 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Wed, 2017-06-07 at 14:59 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:11AM -0700, Ricardo Neri wrote:
> > This function returns the default values of the address and operand sizes
> > as specified in the segment descriptor. This information is determined
> > from the D and L bits. Hence, it can be used for both IA-32e 64-bit and
> > 32-bit legacy modes. For virtual-8086 mode, the default address and
> > operand sizes are always 2 bytes.
> > 
> > The D bit is only meaningful for code segments. Thus, these functions
> > always use the code segment selector contained in regs.
> > 
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> > Cc: Colin Ian King <colin.king@canonical.com>
> > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Thomas Garnier <thgarnie@google.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Dmitry Vyukov <dvyukov@google.com>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: x86@kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/include/asm/insn-eval.h |  6 ++++
> >  arch/x86/lib/insn-eval.c         | 65 ++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 71 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
> > index 7f3c7fe..9ed1c88 100644
> > --- a/arch/x86/include/asm/insn-eval.h
> > +++ b/arch/x86/include/asm/insn-eval.h
> > @@ -11,9 +11,15 @@
> >  #include <linux/err.h>
> >  #include <asm/ptrace.h>
> >  
> > +struct insn_code_seg_defaults {
> 
> A whole struct for a function which gets called only once?
> 
> Bah, that's a bit too much, if you ask me.
> 
> So you're returning two small unsigned integers - i.e., you can just as
> well return a single u8 and put address and operand sizes in there:
> 
> 	ret = oper_sz | addr_sz << 4;
> 
> No need for special structs for that.

OK. This makes sense. Perhaps I can use a couple of #define's to set and
get the address and operand sizes in a single u8. This would make
the code more readable.
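[Archive annotation: such a packed-u8 scheme could look like the sketch below; the macro names are made up for illustration. Operand size goes in the low nibble, address size in the high nibble, so sizes of 2, 4 and 8 bytes all fit.]

```c
#include <assert.h>

/* Hypothetical helpers packing both code-segment defaults into one byte:
 * operand size in bits 0-3, address size in bits 4-7. */
#define CODE_SEG_PARAMS(oper_sz, addr_sz)  (((addr_sz) << 4) | (oper_sz))
#define CODE_SEG_OPND_SZ(params)           ((params) & 0xf)
#define CODE_SEG_ADDR_SZ(params)           (((params) >> 4) & 0xf)
```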

> 
> > +	unsigned char address_bytes;
> > +	unsigned char operand_bytes;
> > +};
> > +
> >  void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
> >  int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
> >  unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
> >  				int regoff);
> > +struct insn_code_seg_defaults insn_get_code_seg_defaults(struct pt_regs *regs);
> >  
> >  #endif /* _ASM_X86_INSN_EVAL_H */
> > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> > index c77ed80..693e5a8 100644
> > --- a/arch/x86/lib/insn-eval.c
> > +++ b/arch/x86/lib/insn-eval.c
> > @@ -603,6 +603,71 @@ static unsigned long get_seg_limit(struct pt_regs *regs, struct insn *insn,
> >  }
> >  
> >  /**
> > + * insn_get_code_seg_defaults() - Obtain code segment default parameters
> > + * @regs:	Structure with register values as seen when entering kernel mode
> > + *
> > + * Obtain the default parameters of the code segment: address and operand sizes.
> > + * The code segment is obtained from the selector contained in the CS register
> > + * in regs. In protected mode, the default address is determined by inspecting
> > + * the L and D bits of the segment descriptor. In virtual-8086 mode, the default
> > + * is always two bytes for both address and operand sizes.
> > + *
> > + * Return: A populated insn_code_seg_defaults structure on success. The
> > + * structure contains only zeros on failure.
> 
> s/failure/error/

Will correct.
> 
> > + */
> > +struct insn_code_seg_defaults insn_get_code_seg_defaults(struct pt_regs *regs)
> > +{
> > +	struct desc_struct *desc;
> > +	struct insn_code_seg_defaults defs;
> > +	unsigned short sel;
> > +	/*
> > +	 * The most significant byte of AR_TYPE_MASK determines whether a
> > +	 * segment contains data or code.
> > +	 */
> > +	unsigned int type_mask = AR_TYPE_MASK & (1 << 11);
> > +
> > +	memset(&defs, 0, sizeof(defs));
> > +
> > +	if (v8086_mode(regs)) {
> > +		defs.address_bytes = 2;
> > +		defs.operand_bytes = 2;
> > +		return defs;
> > +	}
> > +
> > +	sel = (unsigned short)regs->cs;
> > +
> > +	desc = get_desc(sel);
> > +	if (!desc)
> > +		return defs;
> > +
> > +	/* if data segment, return */
> > +	if (!(desc->b & type_mask))
> > +		return defs;
> 
> So you can simplify that into:
> 
> 	/* A code segment? */
> 	if (!(desc->b & BIT(11)))
> 		return defs;
> 
> and remove that type_mask thing.

Alternatively, I can do desc->type & BIT(3) to avoid using desc->b, which
is less elegant.
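For reference, both checks land on the same descriptor bit. A minimal sketch, using a simplified stand-in for struct desc_struct (the real layout lives in the kernel's desc_defs.h):

```c
#include <assert.h>
#include <stdbool.h>

#define BIT(n) (1u << (n))

/* Simplified stand-in for the two 32-bit halves of an x86 segment
 * descriptor: the 4-bit type field occupies bits 8-11 of the high
 * word 'b', so bit 11 of 'b' is bit 3 of the type. */
struct desc_struct {
	unsigned int a;
	unsigned int b;
};

/* Check via the raw high word, as Borislav suggests... */
static bool seg_is_code(const struct desc_struct *desc)
{
	return desc->b & BIT(11);
}

/* ...or via the extracted type field, as in desc->type & BIT(3). */
static unsigned int seg_type(const struct desc_struct *desc)
{
	return (desc->b >> 8) & 0xf;
}
```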

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v7 14/26] x86/insn-eval: Indicate a 32-bit displacement if ModRM.mod is 0 and ModRM.rm is 5
  2017-06-07 13:15   ` Borislav Petkov
@ 2017-06-15 19:36     ` Ricardo Neri
  0 siblings, 0 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-06-15 19:36 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Wed, 2017-06-07 at 15:15 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:12AM -0700, Ricardo Neri wrote:
> > Section 2.2.1.3 of the Intel 64 and IA-32 Architectures Software
> > Developer's Manual volume 2A states that when ModRM.mod is zero and
> > ModRM.rm is 101b, a 32-bit displacement follows the ModRM byte. This means
> > that none of the registers are used in the computation of the effective
> > address. A return value of -EDOM signals callers that they should not use
> > the value of registers when computing the effective address for the
> > instruction.
> > 
> > In IA-32e 64-bit mode (long mode), the effective address is given by the
> > 32-bit displacement plus the value of RIP of the next instruction.
> > In IA-32e compatibility mode (protected mode), only the displacement is
> > used.
> > 
> > The instruction decoder takes care of obtaining the displacement.
> 
> ...
> 
> > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> > index 693e5a8..4f600de 100644
> > --- a/arch/x86/lib/insn-eval.c
> > +++ b/arch/x86/lib/insn-eval.c
> > @@ -379,6 +379,12 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
> >  	switch (type) {
> >  	case REG_TYPE_RM:
> >  		regno = X86_MODRM_RM(insn->modrm.value);
> 
> 
> <---- newline here.

Will add the new line.
> 
> > +		/*
> > +		 * ModRM.mod == 0 and ModRM.rm == 5 means a 32-bit displacement
> > +		 * follows the ModRM byte.
> > +		 */
> > +		if (!X86_MODRM_MOD(insn->modrm.value) && regno == 5)
> > +			return -EDOM;
> >  		if (X86_REX_B(insn->rex_prefix.value))
> >  			regno += 8;
> >  		break;
> > @@ -730,9 +736,21 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
> >  			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
> >  		} else {
> >  			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
> > -			if (addr_offset < 0)
> 
> ditto.

Will add the new line.
> 
> > +			/*
> > +			 * -EDOM means that we must ignore the address_offset.
> > +			 * In such a case, in 64-bit mode the effective address
> > +			 * relative to the RIP of the following instruction.
> > +			 */
> > +			if (addr_offset == -EDOM) {
> > +				eff_addr = 0;
> > +				if (user_64bit_mode(regs))
> > +					eff_addr = (long)regs->ip +
> > +						   insn->length;
> 
> Let that line stick out and write it balanced:
> 
>                         if (addr_offset == -EDOM) {
>                                 if (user_64bit_mode(regs))
>                                         eff_addr = (long)regs->ip + insn->length;
>                                 else
>                                         eff_addr = 0;
> 
> should be easier parseable this way.

Will rewrite as you suggest.
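A minimal sketch of that -EDOM case written the balanced way (names are illustrative, not the patch's): with ModRM.mod == 0 and ModRM.rm == 101b, 64-bit mode uses the RIP of the *next* instruction as the base, while compatibility mode uses the 32-bit displacement alone.

```c
#include <assert.h>

/* Base of the effective address when get_reg_offset() returned -EDOM:
 * RIP-relative in 64-bit mode, zero (displacement only) otherwise. */
static long rip_relative_base(unsigned long ip, int insn_len, int user_64bit)
{
	if (user_64bit)
		return (long)ip + insn_len;

	return 0;
}
```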

Thanks and BR,
Ricardo


* Re: [PATCH v7 16/26] x86/insn-eval: Support both signed 32-bit and 64-bit effective addresses
  2017-06-07 15:49   ` Borislav Petkov
@ 2017-06-15 19:58     ` Ricardo Neri
  0 siblings, 0 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-06-15 19:58 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Wed, 2017-06-07 at 17:49 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:14AM -0700, Ricardo Neri wrote:
> > @@ -697,18 +753,21 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
> >  {
> >  	unsigned long linear_addr, seg_base_addr, seg_limit;
> >  	long eff_addr, base, indx;
> > -	int addr_offset, base_offset, indx_offset;
> > +	int addr_offset, base_offset, indx_offset, addr_bytes;
> >  	insn_byte_t sib;
> >  
> >  	insn_get_modrm(insn);
> >  	insn_get_sib(insn);
> >  	sib = insn->sib.value;
> > +	addr_bytes = insn->addr_bytes;
> >  
> >  	if (X86_MODRM_MOD(insn->modrm.value) == 3) {
> >  		addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
> >  		if (addr_offset < 0)
> >  			goto out_err;
> > -		eff_addr = regs_get_register(regs, addr_offset);
> > +		eff_addr = get_mem_offset(regs, addr_offset, addr_bytes);
> > +		if (eff_addr == -1L)
> > +			goto out_err;
> >  		seg_base_addr = insn_get_seg_base(regs, insn, addr_offset);
> >  		if (seg_base_addr == -1L)
> >  			goto out_err;
> 
> This code here is too dense, it needs spacing for better readability.

I have spaced it out in my upcoming version.

Thanks and BR,
Ricardo


* Re: [PATCH v7 18/26] x86/insn-eval: Add support to resolve 16-bit addressing encodings
  2017-06-07 16:28   ` Borislav Petkov
@ 2017-06-15 21:50     ` Ricardo Neri
  0 siblings, 0 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-06-15 21:50 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Wed, 2017-06-07 at 18:28 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:16AM -0700, Ricardo Neri wrote:
> > Tasks running in virtual-8086 mode or in protected mode with code
> > segment descriptors that specify 16-bit default address sizes via the
> > D bit will use 16-bit addressing form encodings as described in the Intel
> > 64 and IA-32 Architecture Software Developer's Manual Volume 2A Section
> > 2.1.5. 16-bit addressing encodings differ in several ways from the
> > 32-bit/64-bit addressing form encodings: ModRM.rm points to different
> > registers and, in some cases, effective addresses are indicated by the
> > addition of the value of two registers. Also, there is no support for SIB
> > bytes. Thus, a separate function is needed to parse this form of
> > addressing.
> > 
> > A couple of functions are introduced. get_reg_offset_16() obtains the
> > offset from the base of pt_regs of the registers indicated by the ModRM
> > byte of the address encoding. get_addr_ref_16() computes the linear
> > address indicated by the instructions using the value of the registers
> > given by ModRM as well as the base address of the segment.
> > 
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> > Cc: Colin Ian King <colin.king@canonical.com>
> > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Thomas Garnier <thgarnie@google.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Dmitry Vyukov <dvyukov@google.com>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: x86@kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/lib/insn-eval.c | 155 +++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 155 insertions(+)
> > 
> > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> > index 9822061..928a662 100644
> > --- a/arch/x86/lib/insn-eval.c
> > +++ b/arch/x86/lib/insn-eval.c
> > @@ -431,6 +431,73 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
> >  }
> >  
> >  /**
> > + * get_reg_offset_16 - Obtain offset of register indicated by instruction
> 
> Please end function names with parentheses.

I will correct.
> 
> > + * @insn:	Instruction structure containing ModRM and SiB bytes
> 
> s/SiB/SIB/g

I will correct.
> 
> > + * @regs:	Structure with register values as seen when entering kernel mode
> > + * @offs1:	Offset of the first operand register
> > + * @offs2:	Offset of the second opeand register, if applicable.
> > + *
> > + * Obtain the offset, in pt_regs, of the registers indicated by the ModRM byte
> > + * within insn. This function is to be used with 16-bit address encodings. The
> > + * offs1 and offs2 will be written with the offset of the two registers
> > + * indicated by the instruction. In cases where any of the registers is not
> > + * referenced by the instruction, the value will be set to -EDOM.
> > + *
> > + * Return: 0 on success, -EINVAL on failure.
> > + */
> > +static int get_reg_offset_16(struct insn *insn, struct pt_regs *regs,
> > +			     int *offs1, int *offs2)
> > +{
> > +	/* 16-bit addressing can use one or two registers */
> > +	static const int regoff1[] = {
> > +		offsetof(struct pt_regs, bx),
> > +		offsetof(struct pt_regs, bx),
> > +		offsetof(struct pt_regs, bp),
> > +		offsetof(struct pt_regs, bp),
> > +		offsetof(struct pt_regs, si),
> > +		offsetof(struct pt_regs, di),
> > +		offsetof(struct pt_regs, bp),
> > +		offsetof(struct pt_regs, bx),
> > +	};
> > +
> > +	static const int regoff2[] = {
> > +		offsetof(struct pt_regs, si),
> > +		offsetof(struct pt_regs, di),
> > +		offsetof(struct pt_regs, si),
> > +		offsetof(struct pt_regs, di),
> > +		-EDOM,
> > +		-EDOM,
> > +		-EDOM,
> > +		-EDOM,
> > +	};
> 
> You mean "Table 2-1. 16-Bit Addressing Forms with the ModR/M Byte" in
> the SDM, right?

Yes.
> 
> Please add a comment pointing to it here because it is not trivial to
> map that code to the documentation.

Sure, I will add a comment pointing to this table.

> 
> > +
> > +	if (!offs1 || !offs2)
> > +		return -EINVAL;
> > +
> > +	/* operand is a register, use the generic function */
> > +	if (X86_MODRM_MOD(insn->modrm.value) == 3) {
> > +		*offs1 = insn_get_modrm_rm_off(insn, regs);
> > +		*offs2 = -EDOM;
> > +		return 0;
> > +	}
> > +
> > +	*offs1 = regoff1[X86_MODRM_RM(insn->modrm.value)];
> > +	*offs2 = regoff2[X86_MODRM_RM(insn->modrm.value)];
> > +
> > +	/*
> > +	 * If no displacement is indicated in the mod part of the ModRM byte,
> 
> s/"no "//
> 
> > +	 * (mod part is 0) and the r/m part of the same byte is 6, no register
> > +	 * is used caculate the operand address. An r/m part of 6 means that
> > +	 * the second register offset is already invalid.

Perhaps my comment was misleading. When ModRM.mod is 0, no displacement
is used except when ModRM.rm = 110b, in which case we have
displacement-only addressing. I will reword the comment to reflect this
fact.
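That special case can be sketched as a small predicate over the ModRM byte, per Table 2-1 ("16-Bit Addressing Forms with the ModR/M Byte") of the SDM (helper name is illustrative):

```c
#include <assert.h>
#include <stdbool.h>

/* In 16-bit addressing, mod == 0 normally means no displacement; the
 * single combination mod == 0, rm == 110b instead selects
 * displacement-only addressing, with no registers involved. */
static bool is_disp_only_16bit(unsigned char modrm)
{
	unsigned char mod = (modrm >> 6) & 0x3;
	unsigned char rm = modrm & 0x7;

	return mod == 0 && rm == 6;
}
```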

> > +	 */
> > +	if ((X86_MODRM_MOD(insn->modrm.value) == 0) &&
> > +	    (X86_MODRM_RM(insn->modrm.value) == 6))
> > +		*offs1 = -EDOM;
> > +
> > +	return 0;
> > +}
> > +
> > +/**
> >   * get_desc() - Obtain address of segment descriptor
> >   * @sel:	Segment selector
> >   *
> > @@ -689,6 +756,94 @@ int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs)
> >  }
> >  
> >  /**
> > + * get_addr_ref_16() - Obtain the 16-bit address referred by instruction
> > + * @insn:	Instruction structure containing ModRM byte and displacement
> > + * @regs:	Structure with register values as seen when entering kernel mode
> > + *
> > + * This function is to be used with 16-bit address encodings. Obtain the memory
> > + * address referred by the instruction's ModRM bytes and displacement. Also, the
> > + * segment used as base is determined by either any segment override prefixes in
> > + * insn or the default segment of the registers involved in the address
> > + * computation. In protected mode, segment limits are enforced.
> > + *
> > + * Return: linear address referenced by instruction and registers on success.
> > + * -1L on failure.
> > + */
> > +static void __user *get_addr_ref_16(struct insn *insn, struct pt_regs *regs)
> > +{
> > +	unsigned long linear_addr, seg_base_addr, seg_limit;
> > +	short eff_addr, addr1 = 0, addr2 = 0;
> > +	int addr_offset1, addr_offset2;
> > +	int ret;
> > +
> > +	insn_get_modrm(insn);
> > +	insn_get_displacement(insn);
> > +
> > +	/*
> > +	 * If operand is a register, the layout is the same as in
> > +	 * 32-bit and 64-bit addressing.
> > +	 */
> > +	if (X86_MODRM_MOD(insn->modrm.value) == 3) {
> > +		addr_offset1 = get_reg_offset(insn, regs, REG_TYPE_RM);
> > +		if (addr_offset1 < 0)
> > +			goto out_err;
> 
> <---- newline here.

Will add newline.

> 
> > +		eff_addr = regs_get_register(regs, addr_offset1);
> > +		seg_base_addr = insn_get_seg_base(regs, insn, addr_offset1);
> > +		if (seg_base_addr == -1L)
> > +			goto out_err;
> 
> ditto.

Will add newline.
> 
> > +		seg_limit = get_seg_limit(regs, insn, addr_offset1);
> > +	} else {
> > +		ret = get_reg_offset_16(insn, regs, &addr_offset1,
> > +					&addr_offset2);
> > +		if (ret < 0)
> > +			goto out_err;
> 
> ditto.

Will add newline.
> 
> > +		/*
> > +		 * Don't fail on invalid offset values. They might be invalid
> > +		 * because they cannot be used for this particular value of
> > +		 * the ModRM. Instead, use them in the computation only if
> > +		 * they contain a valid value.
> > +		 */
> > +		if (addr_offset1 != -EDOM)
> > +			addr1 = 0xffff & regs_get_register(regs, addr_offset1);
> > +		if (addr_offset2 != -EDOM)
> > +			addr2 = 0xffff & regs_get_register(regs, addr_offset2);
> > +		eff_addr = addr1 + addr2;
> 
> ditto.

Will add newline.
> 
> Space those codelines out, we want to be able to read that code again at
> some point :-)))

Sure! I have gone through all this code adding newlines as necessary.

> 
> > +		/*
> > +		 * The first register is in the operand implies the SS or DS
> > +		 * segment selectors, the second register in the operand can
> > +		 * only imply DS. Thus, use the first register to obtain
> > +		 * the segment selector.
> > +		 */
> > +		seg_base_addr = insn_get_seg_base(regs, insn, addr_offset1);
> > +		if (seg_base_addr == -1L)
> > +			goto out_err;
> > +		seg_limit = get_seg_limit(regs, insn, addr_offset1);
> > +
> > +		eff_addr += (insn->displacement.value & 0xffff);
> > +	}
> > +
> > +	linear_addr = (unsigned long)(eff_addr & 0xffff);
> > +
> > +	/*
> > +	 * Make sure the effective address is within the limits of the
> > +	 * segment. In long mode, the limit is -1L. Thus, the second part
> 
> Long mode in a 16-bit handling function?

Yes, this is not correct. However, it is true for virtual-8086 mode. I
will update the comment accordingly.
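The truncation being discussed can be sketched as follows (names illustrative): in 16-bit addressing the sum of the two registers and the displacement wraps to 16 bits before the segment base is added.

```c
#include <assert.h>

/* 16-bit effective-address computation: base register + index register
 * + displacement, truncated to 16 bits, then offset by the segment base. */
static unsigned long ea_16bit(unsigned short r1, unsigned short r2,
			      short disp, unsigned long seg_base)
{
	unsigned short eff = (unsigned short)(r1 + r2 + disp);

	return seg_base + eff;
}
```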

Thanks and BR,
Ricardo


* Re: [PATCH v7 21/26] x86: Add emulation code for UMIP instructions
  2017-06-08 18:38   ` Borislav Petkov
@ 2017-06-17  1:34     ` Ricardo Neri
  0 siblings, 0 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-06-17  1:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Tony Luck

On Thu, 2017-06-08 at 20:38 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:19AM -0700, Ricardo Neri wrote:
> > The feature User-Mode Instruction Prevention present in recent Intel
> > processor prevents a group of instructions from being executed with
> > CPL > 0. Otherwise, a general protection fault is issued.
> 
> This is one of the best opening paragraphs of a commit message I've
> read this year! This is how you open: short, succinct, to the point, no
> marketing bullshit. Good!

Thank you!
> 
> > Rather than relaying this fault to the user space (in the form of a SIGSEGV
> > signal), the instructions protected by UMIP can be emulated to provide
> > dummy results. This allows to conserve the current kernel behavior and not
> > reveal the system resources that UMIP intends to protect (the global
> > descriptor and interrupt descriptor tables, the segment selectors of the
> > local descriptor table and the task state and the machine status word).
> > 
> > This emulation is needed because certain applications (e.g., WineHQ and
> > DOSEMU2) rely on this subset of instructions to function.
> > 
> > The instructions protected by UMIP can be split in two groups. Those who
> 
> s/who/which/

I will correct.
> 
> > return a kernel memory address (sgdt and sidt) and those who return a
> 
> ditto.

I will correct here also.
> 
> > value (sldt, str and smsw).
> >
> > For the instructions that return a kernel memory address, applications
> > such as WineHQ rely on the result being located in the kernel memory space.
> > The result is emulated as a hard-coded value that, lies close to the top
> > of the kernel memory. The limit for the GDT and the IDT are set to zero.
> 
> Nice.
> 
> > Given that sldt and str are not used in common in programs supported by
> 
> You wanna say "in common programs" here? Or "not commonly used in programs" ?

I will rephrase this comment.
> 
> > WineHQ and DOSEMU2, they are not emulated.
> > 
> > The instruction smsw is emulated to return the value that the register CR0
> > has at boot time as set in the head_32.
> > 
> > Care is taken to appropriately emulate the results when segmentation is
> > used. This is, rather than relying on USER_DS and USER_CS, the function
> 
> 	"That is,... "

I will correct it.
> 
> > insn_get_addr_ref() inspects the segment descriptor pointed by the
> > registers in pt_regs. This ensures that we correctly obtain the segment
> > base address and the address and operand sizes even if the user space
> > application uses local descriptor table.
> 
> Btw, I could very well use all that nice explanation in umip.c too so
> that the high-level behavior is documented.

Sure, I will include a high-level description in the file itself.

> 
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: H. Peter Anvin <hpa@zytor.com>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Brian Gerst <brgerst@gmail.com>
> > Cc: Chen Yucong <slaoub@gmail.com>
> > Cc: Chris Metcalf <cmetcalf@mellanox.com>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Fenghua Yu <fenghua.yu@intel.com>
> > Cc: Huang Rui <ray.huang@amd.com>
> > Cc: Jiri Slaby <jslaby@suse.cz>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: Shuah Khan <shuah@kernel.org>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Tony Luck <tony.luck@intel.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Liang Z. Li <liang.z.li@intel.com>
> > Cc: Alexandre Julliard <julliard@winehq.org>
> > Cc: Stas Sergeev <stsp@list.ru>
> > Cc: x86@kernel.org
> > Cc: linux-msdos@vger.kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/include/asm/umip.h |  15 +++
> >  arch/x86/kernel/Makefile    |   1 +
> >  arch/x86/kernel/umip.c      | 245 ++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 261 insertions(+)
> >  create mode 100644 arch/x86/include/asm/umip.h
> >  create mode 100644 arch/x86/kernel/umip.c
> > 
> > diff --git a/arch/x86/include/asm/umip.h b/arch/x86/include/asm/umip.h
> > new file mode 100644
> > index 0000000..077b236
> > --- /dev/null
> > +++ b/arch/x86/include/asm/umip.h
> > @@ -0,0 +1,15 @@
> > +#ifndef _ASM_X86_UMIP_H
> > +#define _ASM_X86_UMIP_H
> > +
> > +#include <linux/types.h>
> > +#include <asm/ptrace.h>
> > +
> > +#ifdef CONFIG_X86_INTEL_UMIP
> > +bool fixup_umip_exception(struct pt_regs *regs);
> > +#else
> > +static inline bool fixup_umip_exception(struct pt_regs *regs)
> > +{
> > +	return false;
> > +}
> 
> Let's save some header lines:
> 
> static inline bool fixup_umip_exception(struct pt_regs *regs) 	{ return false; }
> 
> those trunks take too much space as it is.

I will correct.
> 
> > +#endif  /* CONFIG_X86_INTEL_UMIP */
> > +#endif  /* _ASM_X86_UMIP_H */
> > diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> > index 4b99423..cc1b7cc 100644
> > --- a/arch/x86/kernel/Makefile
> > +++ b/arch/x86/kernel/Makefile
> > @@ -123,6 +123,7 @@ obj-$(CONFIG_EFI)			+= sysfb_efi.o
> >  obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
> >  obj-$(CONFIG_TRACING)			+= tracepoint.o
> >  obj-$(CONFIG_SCHED_MC_PRIO)		+= itmt.o
> > +obj-$(CONFIG_X86_INTEL_UMIP)		+= umip.o
> >  
> >  ifdef CONFIG_FRAME_POINTER
> >  obj-y					+= unwind_frame.o
> > diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
> > new file mode 100644
> > index 0000000..c7c5795
> > --- /dev/null
> > +++ b/arch/x86/kernel/umip.c
> > @@ -0,0 +1,245 @@
> > +/*
> > + * umip.c Emulation for instruction protected by the Intel User-Mode
> > + * Instruction Prevention. The instructions are:
> > + *    sgdt
> > + *    sldt
> > + *    sidt
> > + *    str
> > + *    smsw
> > + *
> > + * Copyright (c) 2017, Intel Corporation.
> > + * Ricardo Neri <ricardo.neri@linux.intel.com>
> > + */
> > +
> > +#include <linux/uaccess.h>
> > +#include <asm/umip.h>
> > +#include <asm/traps.h>
> > +#include <asm/insn.h>
> > +#include <asm/insn-eval.h>
> > +#include <linux/ratelimit.h>
> > +
> > +/*
> > + * == Base addresses of GDT and IDT
> > + * Some applications to function rely finding the global descriptor table (GDT)
> 
> That formulation reads funny.

I will correct.

> 
> > + * and the interrupt descriptor table (IDT) in kernel memory.
> > + * For x86_32, the selected values do not match any particular hole, but it
> > + * suffices to provide a memory location within kernel memory.
> > + *
> > + * == CRO flags for SMSW
> > + * Use the flags given when booting, as found in head_32.S
> > + */
> > +
> > +#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | X86_CR0_NE | \
> > +		   X86_CR0_WP | X86_CR0_AM)
> 
> Why not pull those up in asm/processor-flags.h or so and share the
> definition instead of duplicating it?

Sure, I will relocate this definition.
> 
> > +#define UMIP_DUMMY_GDT_BASE 0xfffe0000
> > +#define UMIP_DUMMY_IDT_BASE 0xffff0000
> > +
> > +enum umip_insn {
> > +	UMIP_SGDT = 0,	/* opcode 0f 01 ModR/M reg 0 */
> > +	UMIP_SIDT,	/* opcode 0f 01 ModR/M reg 1 */
> > +	UMIP_SLDT,	/* opcode 0f 00 ModR/M reg 0 */
> > +	UMIP_SMSW,	/* opcode 0f 01 ModR/M reg 4 */
> > +	UMIP_STR,	/* opcode 0f 00 ModR/M reg 1 */
> 
> Let's stick to a single spelling: ModRM.reg=0, etc.
> 
> Better yet, use the SDM format:
> 
> 	UMIP_SGDT = 0,		/* 0F 01 /0 */
> 	UMIP_SIDT,		/* 0F 01 /1 */
> 	...
> 

I will update accordingly.

> > +};
> > +
> > +/**
> > + * __identify_insn() - Identify a UMIP-protected instruction
> > + * @insn:	Instruction structure with opcode and ModRM byte.
> > + *
> > + * From the instruction opcode and the reg part of the ModRM byte, identify,
> > + * if any, a UMIP-protected instruction.
> > + *
> > + * Return: an enumeration of a UMIP-protected instruction; -EINVAL on failure.
> > + */
> > +static int __identify_insn(struct insn *insn)
> 
> static enum umip_insn __identify_insn(...
> 
> But frankly, that enum looks pointless to me - it is used locally only
> and you can just as well use plain ints.

I will change to plain ints.
> 
> > +{
> > +	/* By getting modrm we also get the opcode. */
> > +	insn_get_modrm(insn);
> > +
> > +	/* All the instructions of interest start with 0x0f. */
> > +	if (insn->opcode.bytes[0] != 0xf)
> > +		return -EINVAL;
> > +
> > +	if (insn->opcode.bytes[1] == 0x1) {
> > +		switch (X86_MODRM_REG(insn->modrm.value)) {
> > +		case 0:
> > +			return UMIP_SGDT;
> > +		case 1:
> > +			return UMIP_SIDT;
> > +		case 4:
> > +			return UMIP_SMSW;
> > +		default:
> > +			return -EINVAL;
> > +		}
> > +	}
> > +	/* SLDT AND STR are not emulated */
> > +	return -EINVAL;
> > +}
> > +
> > +/**
> > + * __emulate_umip_insn() - Emulate UMIP instructions with dummy values
> > + * @insn:	Instruction structure with ModRM byte
> > + * @umip_inst:	Instruction to emulate
> > + * @data:	Buffer onto which the dummy values will be copied
> > + * @data_size:	Size of the emulated result
> > + *
> > + * Emulate an instruction protected by UMIP. The result of the emulation
> > + * is saved in the provided buffer. The size of the results depends on both
> > + * the instruction and type of operand (register vs memory address). Thus,
> > + * the size of the result needs to be updated.
> > + *
> > + * Result: 0 if success, -EINVAL on failure to emulate
> > + */
> > +static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst,
> > +			       unsigned char *data, int *data_size)
> > +{
> > +	unsigned long dummy_base_addr;
> > +	unsigned short dummy_limit = 0;
> > +	unsigned int dummy_value = 0;
> > +
> > +	switch (umip_inst) {
> > +	/*
> > +	 * These two instructions return the base address and limit of the
> > +	 * global and interrupt descriptor table. The base address can be
> > +	 * 24-bit, 32-bit or 64-bit. Limit is always 16-bit. If the operand
> > +	 * size is 16-bit the returned value of the base address is supposed
> > +	 * to be a zero-extended 24-byte number. However, it seems that a
> > +	 * 32-byte number is always returned in legacy protected mode
> > +	 * irrespective of the operand size.
> > +	 */
> > +	case UMIP_SGDT:
> > +		/* fall through */
> > +	case UMIP_SIDT:
> > +		if (umip_inst == UMIP_SGDT)
> > +			dummy_base_addr = UMIP_DUMMY_GDT_BASE;
> > +		else
> > +			dummy_base_addr = UMIP_DUMMY_IDT_BASE;
> > +		if (X86_MODRM_MOD(insn->modrm.value) == 3) {
> > +			/* SGDT and SIDT do not take register as argument. */
> 
> Comment above the if.

I will correct.

> 
> > +			return -EINVAL;
> > +		}
> 
> So that check needs to go first, then the dummy_base_addr assignment.

I will rearrange.

> 
> > +
> > +		memcpy(data + 2, &dummy_base_addr, sizeof(dummy_base_addr));
> > +		memcpy(data, &dummy_limit, sizeof(dummy_limit));
> > +		*data_size = sizeof(dummy_base_addr) + sizeof(dummy_limit);
> 
> Huh, that value will always be the same - why do you have a specific
> variable? It could be a define, once for 32-bit and once for 64-bit.

Sure. I will use #define's.

> > +		break;
> > +	case UMIP_SMSW:
> > +		/*
> > +		 * Even though CR0_STATE contain 4 bytes, the number
> > +		 * of bytes to be copied in the result buffer is determined
> > +		 * by whether the operand is a register or a memory location.
> > +		 */
> > +		dummy_value = CR0_STATE;
> 
> Something's wrong here: how does that local, write-only variable have
> any effect?

Ah yes, initially SMSW, SLDT and STR were handled equally. Since I
removed support for the last two, I inadvertently removed the code that
copies the result of SMSW. I will re-add it.
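A hedged sketch of the re-added copy (helper name and exact shape are illustrative, not the final patch): the boot-time CR0 value from head_32.S is written into the result buffer, with the length chosen by operand size.

```c
#include <assert.h>
#include <string.h>

/* Boot-time CR0 flags as set in head_32.S:
 * PE | MP | ET | NE | WP | AM. The numeric value is spelled out here
 * only to keep the sketch self-contained. */
#define CR0_STATE 0x00050033u

/* Copy the dummy CR0 value into the result buffer: 2 bytes for a
 * 16-bit operand, the full 4 bytes otherwise. Returns bytes written. */
static int emulate_smsw(unsigned char *data, int opnd_bytes)
{
	unsigned int dummy_value = CR0_STATE;
	int len = (opnd_bytes == 2) ? 2 : (int)sizeof(dummy_value);

	memcpy(data, &dummy_value, len);
	return len;
}
```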
 
> 
> > +		/*
> > +		 * These two instructions return a 16-bit value. We return
> > +		 * all zeros. This is equivalent to a null descriptor for
> > +		 * str and sldt.
> > +		 */
> > +		/* SLDT and STR are not emulated */
> > +		/* fall through */
> > +	case UMIP_SLDT:
> > +		/* fall through */
> > +	case UMIP_STR:
> > +		/* fall through */
> > +	default:
> > +		return -EINVAL;
> 
> That switch-case has a majority of fall-throughs. So make it an if-else
> instead.

Sure, I will update.

> 
> > +	}
> > +	return 0;
> > +}
> > +
> > +/**
> > + * fixup_umip_exception() - Fixup #GP faults caused by UMIP
> > + * @regs:	Registers as saved when entering the #GP trap
> > + *
> > + * The instructions sgdt, sidt, str, smsw, sldt cause a general protection
> > + * fault if with CPL > 0 (i.e., from user space). This function can be
> > + * used to emulate the results of the aforementioned instructions with
> > + * dummy values. Results are copied to user-space memory as indicated by
> > + * the instruction pointed by EIP using the registers indicated in the
> > + * instruction operands. This function also takes care of determining
> > + * the address to which the results must be copied.
> > + */
> > +bool fixup_umip_exception(struct pt_regs *regs)
> > +{
> > +	struct insn insn;
> > +	unsigned char buf[MAX_INSN_SIZE];
> > +	/* 10 bytes is the maximum size of the result of UMIP instructions */
> > +	unsigned char dummy_data[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
> 
> 	unsigned char dummy_data[10] = { 0 };
> 
> One 0 should be enough :)

Right. I will update.

> 
> > +	unsigned long seg_base;
> > +	int not_copied, nr_copied, reg_offset, dummy_data_size;
> > +	void __user *uaddr;
> > +	unsigned long *reg_addr;
> > +	enum umip_insn umip_inst;
> > +	struct insn_code_seg_defaults seg_defs;
> 
> Please sort function local variables declaration in a reverse christmas
> tree order:
> 
> 	<type> longest_variable_name;
> 	<type> shorter_var_name;
> 	<type> even_shorter;
> 	<type> i;
> 
I will rearrange my variables.

> > +
> > +	/*
> > +	 * Use the segment base in case user space used a different code
> > +	 * segment, either in protected (e.g., from an LDT) or virtual-8086
> > +	 * modes. In most of the cases seg_base will be zero as in USER_CS.
> > +	 */
> > +	seg_base = insn_get_seg_base(regs, &insn,
> > +				     offsetof(struct pt_regs, ip));
> 
> Oh boy, where's the error handling?! That can return -1L.
> 
> > +	not_copied = copy_from_user(buf, (void __user *)(seg_base + regs->ip),
> 
> -1L + regs->ip is then your pwnage.

I will add the error handling code.
> 
> > +				    sizeof(buf));
> 
> Just let them stick out.

Sure.

> 
> > +	nr_copied = sizeof(buf) - not_copied;
> 
> <---- newline here.

I will add the new line.
> 
> > +	/*
> > +	 * The copy_from_user above could have failed if user code is protected
> 			    ()
> 
> > +	 * by a memory protection key. Give up on emulation in such a case.
> > +	 * Should we issue a page fault?
> 
> Why? AFAICT, you're in the #GP handler. Simply you return unhandled.

If I returned unhandled, a SIGSEGV would be sent to the user space
application, but its siginfo would look like that of a #GP. However, memory
protection keys cause page faults, and their siginfo is filled in differently.

> 
> > +	 */
> > +	if (!nr_copied)
> > +		return false;
> > +
> > +	insn_init(&insn, buf, nr_copied, user_64bit_mode(regs));
> > +
> > +	/*
> > +	 * Override the default operand and address sizes to what is specified
> > +	 * in the code segment descriptor. The instruction decoder only sets
> > +	 * the address size it to either 4 or 8 address bytes and does nothing
> > +	 * for the operand bytes. This OK for most of the cases, but we could
> > +	 * have special cases where, for instance, a 16-bit code segment
> > +	 * descriptor is used.
> > +	 * If there are overrides, the instruction decoder correctly updates
> > +	 * these values, even for 16-bit defaults.
> > +	 */
> > +	seg_defs = insn_get_code_seg_defaults(regs);
> > +	insn.addr_bytes = seg_defs.address_bytes;
> > +	insn.opnd_bytes = seg_defs.operand_bytes;
> > +
> > +	if (!insn.addr_bytes || !insn.opnd_bytes)
> > +		return false;
> > +
> > +	if (user_64bit_mode(regs))
> > +		return false;
> > +
> > +	insn_get_length(&insn);
> > +	if (nr_copied < insn.length)
> > +		return false;
> > +
> > +	umip_inst = __identify_insn(&insn);
> > +	/* Check if we found an instruction protected by UMIP */
> 
> Put comment above the function call.

Will do.

> 
> > +	if (umip_inst < 0)
> > +		return false;
> > +
> > +	if (__emulate_umip_insn(&insn, umip_inst, dummy_data, &dummy_data_size))
> > +		return false;
> > +
> > +	/* If operand is a register, write directly to it */
> > +	if (X86_MODRM_MOD(insn.modrm.value) == 3) {
> > +		reg_offset = insn_get_modrm_rm_off(&insn, regs);
> 
> Grr, error handling!! That reg_offset can be -E<something>.

I will add the error handling code.

> 
> > +		reg_addr = (unsigned long *)((unsigned long)regs + reg_offset);
> > +		memcpy(reg_addr, dummy_data, dummy_data_size);
> > +	} else {
> > +		uaddr = insn_get_addr_ref(&insn, regs);
> > +		/* user address could not be determined, abort emulation */
> 
> That comment is kinda obvious. But yes, this has error handling.

OK, I will remove this comment.

Many thanks for your detailed review!

BR,
Ricardo

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v7 10/26] x86/insn-eval: Add utility functions to get segment selector
  2017-06-15 19:04       ` Ricardo Neri
@ 2017-06-19 15:29         ` Borislav Petkov
  0 siblings, 0 replies; 81+ messages in thread
From: Borislav Petkov @ 2017-06-19 15:29 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Thu, Jun 15, 2017 at 12:04:21PM -0700, Ricardo Neri wrote:
> On Thu, 2017-06-15 at 11:37 -0700, Ricardo Neri wrote:
> > > Yuck, didn't we talk about this already?
> > 
> > I am sorry Borislav. I thought you agreed that I could use the values
> > of
> > the segment override prefixes to identify the segment registers [1].

Yes, I agreed with that but...

> This time with the reference:
> [1]. https://lkml.org/lkml/2017/5/5/377

... this says it already: "... but you should call them what they are:
"enum seg_override_pfxs" or "enum seg_ovr_pfx" or..." IOW, those are
segment *override* prefixes and should be called such and not "enum
segment_register" as this way is misleading.

IOW, here's what I think you should do:

/* Segment override prefixes: */
#define SEG_CS_OVERRIDE		0x2e
#define SEG_SS_OVERRIDE		0x36
#define SEG_DS_OVERRIDE		0x3e

... and so on...

and use the defines directly. The enum is fine and dandy but then you
need to return an error value too so you can just as well have the
function return an int simply and make sure you check the retval.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)

* Re: [PATCH v7 10/26] x86/insn-eval: Add utility functions to get segment selector
  2017-06-15 18:37     ` Ricardo Neri
  2017-06-15 19:04       ` Ricardo Neri
@ 2017-06-19 15:37       ` Borislav Petkov
  1 sibling, 0 replies; 81+ messages in thread
From: Borislav Petkov @ 2017-06-19 15:37 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Thu, Jun 15, 2017 at 11:37:51AM -0700, Ricardo Neri wrote:
> Wouldn't this be ending up mixing the actual segment register and
> segment register overrides? I plan to have a function that parses the
> segment override prefixes and returns SEG_REG_CS/DS/ES/FS/GS or
> SEG_REG_IGNORE for long mode or SEG_REG_DEFAULT when the default segment
> register needs to be used. A separate function will determine what such
> default segment register is. Does this make sense?

Yes.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)

* Re: [PATCH v7 13/26] x86/insn-eval: Add function to get default params of code segment
  2017-06-15 19:24     ` Ricardo Neri
@ 2017-06-19 17:11       ` Borislav Petkov
  0 siblings, 0 replies; 81+ messages in thread
From: Borislav Petkov @ 2017-06-19 17:11 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Thu, Jun 15, 2017 at 12:24:35PM -0700, Ricardo Neri wrote:
> OK. This makes sense. Perhaps I can use a couple of #define's to set and
> get the address and operand sizes in a single u8. This would make
> the code more readable.

Sure but don't get too tangled in defines if it is going to be used
in one place only. Sometimes a clear comment and the naked bitwise
operations are already clear enough.

> Alternatively, I can do desc->type & BIT(3) to avoid using desc-b, which
> is less elegant.

Sure.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)

* Re: [PATCH v7 16/26] x86/insn-eval: Support both signed 32-bit and 64-bit effective addresses
  2017-06-07 15:48   ` Borislav Petkov
@ 2017-07-25 23:48     ` Ricardo Neri
  2017-07-27 13:26       ` Borislav Petkov
  0 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-07-25 23:48 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

I am sorry Boris, while working on this series I missed a few of your
feedback comments.

On Wed, 2017-06-07 at 17:48 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:14AM -0700, Ricardo Neri wrote:
> > The 32-bit and 64-bit address encodings are identical. This means that we
> > can use the same function in both cases. In order to reuse the function
> > for 32-bit address encodings, we must sign-extend our 32-bit signed
> > operands to 64-bit signed variables (only for 64-bit builds). To decide on
> > whether sign extension is needed, we rely on the address size as given by
> > the instruction structure.
> > 
> > Once the effective address has been computed, a special verification is
> > needed for 32-bit processes. If running on a 64-bit kernel, such processes
> > can address up to 4GB of memory. Hence, for instance, an effective
> > address of 0xffff1234 would be misinterpreted as 0xffffffffffff1234 due to
> > the sign extension mentioned above. For this reason, the 4 must be
> 
> Which 4?

I meant to say the 4 most significant bytes. In this case, the 64-bit
address 0xffffffffffff1234 would lie in kernel memory while 0xffff1234
would correctly be in user space memory.
> 
> > truncated to obtain the true effective address.
> > 
> > Lastly, before computing the linear address, we verify that the effective
> > address is within the limits of the segment. The check is kept for long
> > mode because in such a case the limit is set to -1L. This is the largest
> > unsigned number possible. This is equivalent to a limit-less segment.
> > 
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> > Cc: Colin Ian King <colin.king@canonical.com>
> > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Thomas Garnier <thgarnie@google.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Dmitry Vyukov <dvyukov@google.com>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: x86@kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/lib/insn-eval.c | 99 ++++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 88 insertions(+), 11 deletions(-)
> > 
> > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> > index 1a5f5a6..c7c1239 100644
> > --- a/arch/x86/lib/insn-eval.c
> > +++ b/arch/x86/lib/insn-eval.c
> > @@ -688,6 +688,62 @@ int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs)
> >  	return get_reg_offset(insn, regs, REG_TYPE_RM);
> >  }
> >  
> > +/**
> > + * _to_signed_long() - Cast an unsigned long into signed long
> > + * @val		A 32-bit or 64-bit unsigned long
> > + * @long_bytes	The number of bytes used to represent a long number
> > + * @out		The casted signed long
> > + *
> > + * Return: A signed long of either 32 or 64 bits, as per the build configuration
> > + * of the kernel.
> > + */
> > +static int _to_signed_long(unsigned long val, int long_bytes, long *out)
> > +{
> > +	if (!out)
> > +		return -EINVAL;
> > +
> > +#ifdef CONFIG_X86_64
> > +	if (long_bytes == 4) {
> > +		/* higher bytes should all be zero */
> > +		if (val & ~0xffffffff)
> > +			return -EINVAL;
> > +
> > +		/* sign-extend to a 64-bit long */
> 
> So this is a 32-bit userspace on a 64-bit kernel, right?

Yes.
> 
> If so, how can a memory offset be > 32-bits and we have to extend it to
> a 64-bit long?!?

Yes, perhaps the check above is not needed. I included that check as
part of my argument validation: in a 64-bit kernel, this function could
be called with a val whose most significant bytes are non-zero.
> 
> I *think* you want to say that you want to convert it to long so that
> you can do the calculation in longs.

That is exactly what I meant. More specifically, I want to convert my
32-bit variables into 64-bit signed longs; this is the reason I need the
sign extension.
> 
> However!
> 
> If you're a 64-bit kernel running a 32-bit userspace, you need to do
> the calculation in 32-bits only so that it overflows, as it would do
> on 32-bit hardware. IOW, the clamping to 32-bits at the end is not
> something you wanna do but actually let it wrap if it overflows.

I have looked into this closely and as far as I can see, the 4 least
significant bytes will wrap around when using 64-bit signed numbers as
they would when using 32-bit signed numbers. For instance, for two
positive numbers we have:

7fff:ffff + 7000:0000 = efff:ffff.

The addition above overflows. When sign-extended to 64-bit numbers we
would have:

0000:0000:7fff:ffff + 0000:0000:7000:0000 = 0000:0000:efff:ffff.

The addition above does not overflow. However, the 4 least significant
bytes overflow as we expect. We can clamp the 4 most significant bytes.

For a two's complement negative numbers we can have:

ffff:ffff + 8000:0000 = 7fff:ffff with a carry flag.

The addition above overflows.

When sign-extending to 64-bit numbers we would have:

ffff:ffff:ffff:ffff + ffff:ffff:8000:0000 = ffff:ffff:7fff:ffff with a
carry flag.

The addition above does not overflow. However, the 4 least significant
bytes overflowed and wrapped around as they would when using 32-bit signed
numbers.

> Or am I missing something?

Now, am I missing something?

Thanks and BR,
Ricardo


* Re: [PATCH v7 22/26] x86/umip: Force a page fault when unable to copy emulated result to user
  2017-06-09 11:02   ` Borislav Petkov
@ 2017-07-25 23:50     ` Ricardo Neri
  0 siblings, 0 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-07-25 23:50 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Tony Luck

On Fri, 2017-06-09 at 13:02 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:20AM -0700, Ricardo Neri wrote:
> > fixup_umip_exception() will be called from do_general_protection. If the
> 								  ^
> 								  |
> Please end function names with parentheses.		       ---+
> 
> > former returns false, the latter will issue a SIGSEGV with SEND_SIG_PRIV.
> > However, when emulation is successful but the emulated result cannot be
> > copied to user space memory, it is more accurate to issue a SIGSEGV with
> > SEGV_MAPERR with the offending address.
> > A new function is inspired in
> 
> That reads funny.

I will correct this.
> 
> > force_sig_info_fault is introduced to model the page fault.
> > 
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: H. Peter Anvin <hpa@zytor.com>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Brian Gerst <brgerst@gmail.com>
> > Cc: Chen Yucong <slaoub@gmail.com>
> > Cc: Chris Metcalf <cmetcalf@mellanox.com>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Fenghua Yu <fenghua.yu@intel.com>
> > Cc: Huang Rui <ray.huang@amd.com>
> > Cc: Jiri Slaby <jslaby@suse.cz>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: Shuah Khan <shuah@kernel.org>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Tony Luck <tony.luck@intel.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Liang Z. Li <liang.z.li@intel.com>
> > Cc: Alexandre Julliard <julliard@winehq.org>
> > Cc: Stas Sergeev <stsp@list.ru>
> > Cc: x86@kernel.org
> > Cc: linux-msdos@vger.kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/kernel/umip.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 43 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
> > index c7c5795..ff7366a 100644
> > --- a/arch/x86/kernel/umip.c
> > +++ b/arch/x86/kernel/umip.c
> > @@ -148,6 +148,41 @@ static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst,
> >  }
> >  
> >  /**
> > + * __force_sig_info_umip_fault() - Force a SIGSEGV with SEGV_MAPERR
> > + * @address:	Address that caused the signal
> > + * @regs:	Register set containing the instruction pointer
> > + *
> > + * Force a SIGSEGV signal with SEGV_MAPERR as the error code. This function is
> > + * intended to be used to provide a segmentation fault when the result of the
> > + * UMIP emulation could not be copied to the user space memory.
> > + *
> > + * Return: none
> > + */
> > +static void __force_sig_info_umip_fault(void __user *address,
> > +					struct pt_regs *regs)
> > +{
> > +	siginfo_t info;
> > +	struct task_struct *tsk = current;
> > +
> > +	if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV)) {
> 
> Save an indentation level:
> 
> 	if (!(show_unhandled_signals && unhandled_signal(tsk, SIGSEGV)))
> 		return;
> 
> 	printk...
> 
I will implement like this.
> 
> 
> > +		printk_ratelimited("%s[%d] umip emulation segfault ip:%lx sp:%lx error:%x in %lx\n",
> > +				   tsk->comm, task_pid_nr(tsk), regs->ip,
> > +				   regs->sp, X86_PF_USER | X86_PF_WRITE,
> > +				   regs->ip);
> > +	}
> > +
> > +	tsk->thread.cr2		= (unsigned long)address;
> > +	tsk->thread.error_code	= X86_PF_USER | X86_PF_WRITE;
> > +	tsk->thread.trap_nr	= X86_TRAP_PF;
> > +
> > +	info.si_signo	= SIGSEGV;
> > +	info.si_errno	= 0;
> > +	info.si_code	= SEGV_MAPERR;
> > +	info.si_addr	= address;
> > +	force_sig_info(SIGSEGV, &info, tsk);
> > +}
> > +
> > +/**
> >   * fixup_umip_exception() - Fixup #GP faults caused by UMIP
> >   * @regs:	Registers as saved when entering the #GP trap
> >   *
> > @@ -235,8 +270,14 @@ bool fixup_umip_exception(struct pt_regs *regs)
> >  		if ((unsigned long)uaddr == -1L)
> >  			return false;
> >  		nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
> > -		if (nr_copied  > 0)
> > -			return false;
> > +		if (nr_copied  > 0) {
> > +			/*
> > +			 * If copy fails, send a signal and tell caller that
> > +			 * fault was fixed up
> 
> Pls end sentences in the comments with a fullstop.

I will correct this.

Thanks and BR,
Ricardo


* Re: [PATCH v7 23/26] x86/traps: Fixup general protection faults caused by UMIP
  2017-06-09 13:02   ` Borislav Petkov
@ 2017-07-25 23:51     ` Ricardo Neri
  0 siblings, 0 replies; 81+ messages in thread
From: Ricardo Neri @ 2017-07-25 23:51 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Tony Luck

I am sorry Boris, I also missed this feedback.

On Fri, 2017-06-09 at 15:02 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:21AM -0700, Ricardo Neri wrote:
> > If the User-Mode Instruction Prevention CPU feature is available and
> > enabled, a general protection fault will be issued if the instructions
> > sgdt, sldt, sidt, str or smsw are executed from user-mode context
> > (CPL > 0). If the fault was caused by any of the instructions protected
> > by UMIP, fixup_umip_exception will emulate dummy results for these
> 
> Please end function names with parentheses.

I have audited my commit messages to remove all instances of this error.
> 
> > instructions. If emulation is successful, the result is passed to the
> > user space program and no SIGSEGV signal is emitted.
> > 
> > Please note that fixup_umip_exception also caters for the case when
> > the fault originated while running in virtual-8086 mode.
> > 
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: H. Peter Anvin <hpa@zytor.com>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Brian Gerst <brgerst@gmail.com>
> > Cc: Chen Yucong <slaoub@gmail.com>
> > Cc: Chris Metcalf <cmetcalf@mellanox.com>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Fenghua Yu <fenghua.yu@intel.com>
> > Cc: Huang Rui <ray.huang@amd.com>
> > Cc: Jiri Slaby <jslaby@suse.cz>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: Shuah Khan <shuah@kernel.org>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Tony Luck <tony.luck@intel.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Liang Z. Li <liang.z.li@intel.com>
> > Cc: Alexandre Julliard <julliard@winehq.org>
> > Cc: Stas Sergeev <stsp@list.ru>
> > Cc: x86@kernel.org
> > Cc: linux-msdos@vger.kernel.org
> > Reviewed-by: Andy Lutomirski <luto@kernel.org>
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/kernel/traps.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> > index 3995d3a..cec548d 100644
> > --- a/arch/x86/kernel/traps.c
> > +++ b/arch/x86/kernel/traps.c
> > @@ -65,6 +65,7 @@
> >  #include <asm/trace/mpx.h>
> >  #include <asm/mpx.h>
> >  #include <asm/vm86.h>
> > +#include <asm/umip.h>
> >  
> >  #ifdef CONFIG_X86_64
> >  #include <asm/x86_init.h>
> > @@ -526,6 +527,9 @@ do_general_protection(struct pt_regs *regs, long error_code)
> >  	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> >  	cond_local_irq_enable(regs);
> >  
> 
> Almost definitely:
> 
> 	if (static_cpu_has(X86_FEATURE_UMIP)) {
> 		if (...

I will make this update.

> 
> > +	if (user_mode(regs) && fixup_umip_exception(regs))
> > +		return;
> 
> We don't want to punish !UMIP machines.

I will add this check.

Thanks and BR,
Ricardo


* Re: [PATCH v7 24/26] x86: Enable User-Mode Instruction Prevention
  2017-06-09 16:10   ` Borislav Petkov
@ 2017-07-26  0:44     ` Ricardo Neri
  2017-07-27 13:57       ` Borislav Petkov
  0 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-07-26  0:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Tony Luck

On Fri, 2017-06-09 at 18:10 +0200, Borislav Petkov wrote:
> On Fri, May 05, 2017 at 11:17:22AM -0700, Ricardo Neri wrote:
> > User_mode Instruction Prevention (UMIP) is enabled by setting/clearing a
> > bit in %cr4.
> > 
> > It makes sense to enable UMIP at some point while booting, before user
> > spaces come up. Like SMAP and SMEP, is not critical to have it enabled
> > very early during boot. This is because UMIP is relevant only when there is
> > a userspace to be protected from. Given the similarities in relevance, it
> > makes sense to enable UMIP along with SMAP and SMEP.
> > 
> > UMIP is enabled by default. It can be disabled by adding clearcpuid=514
> > to the kernel parameters.
> > 
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: H. Peter Anvin <hpa@zytor.com>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Brian Gerst <brgerst@gmail.com>
> > Cc: Chen Yucong <slaoub@gmail.com>
> > Cc: Chris Metcalf <cmetcalf@mellanox.com>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Fenghua Yu <fenghua.yu@intel.com>
> > Cc: Huang Rui <ray.huang@amd.com>
> > Cc: Jiri Slaby <jslaby@suse.cz>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: Shuah Khan <shuah@kernel.org>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Tony Luck <tony.luck@intel.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Liang Z. Li <liang.z.li@intel.com>
> > Cc: Alexandre Julliard <julliard@winehq.org>
> > Cc: Stas Sergeev <stsp@list.ru>
> > Cc: x86@kernel.org
> > Cc: linux-msdos@vger.kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/Kconfig             | 10 ++++++++++
> >  arch/x86/kernel/cpu/common.c | 16 +++++++++++++++-
> >  2 files changed, 25 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 702002b..1b1bbeb 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1745,6 +1745,16 @@ config X86_SMAP
> >  
> >  	  If unsure, say Y.
> >  
> > +config X86_INTEL_UMIP
> > +	def_bool y
> 
> That's a bit too much. It makes sense on distro kernels but how many
> machines out there actually have UMIP?

So would this become a y when more machines have UMIP?
> 
> > +	depends on CPU_SUP_INTEL
> > +	prompt "Intel User Mode Instruction Prevention" if EXPERT
> > +	---help---
> > +	  The User Mode Instruction Prevention (UMIP) is a security
> > +	  feature in newer Intel processors. If enabled, a general
> > +	  protection fault is issued if the instructions SGDT, SLDT,
> > +	  SIDT, SMSW and STR are executed in user mode.
> > +
> >  config X86_INTEL_MPX
> >  	prompt "Intel MPX (Memory Protection Extensions)"
> >  	def_bool n
> > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> > index 8ee3211..66ebded 100644
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -311,6 +311,19 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c)
> >  	}
> >  }
> >  
> > +static __always_inline void setup_umip(struct cpuinfo_x86 *c)
> > +{
> > +	if (cpu_feature_enabled(X86_FEATURE_UMIP) &&
> > +	    cpu_has(c, X86_FEATURE_UMIP))
> 
> Hmm, so if UMIP is not build-time disabled, the cpu_feature_enabled()
> will call static_cpu_has().
> 
> Looks like you want to call cpu_has() too because alternatives haven't
> run yet and static_cpu_has() will reply wrong. Please state that in a
> comment.

Why would static_cpu_has() reply wrong if alternatives are not in place?
Because it uses the boot CPU data? When it calls _static_cpu_has() it
would do something equivalent to

   testb test_bit, boot_cpu_data.x86_capability[bit].

I am calling cpu_has() because cpu_feature_enabled(), via
static_cpu_has(), will use the boot CPU data, while cpu_has() will use the
local CPU data. Is this what you meant?

I can definitely add a comment with this explanation, if it makes sense.

Thanks and BR,
Ricardo


* Re: [PATCH v7 16/26] x86/insn-eval: Support both signed 32-bit and 64-bit effective addresses
  2017-07-25 23:48     ` Ricardo Neri
@ 2017-07-27 13:26       ` Borislav Petkov
  2017-07-28  2:04         ` Ricardo Neri
  0 siblings, 1 reply; 81+ messages in thread
From: Borislav Petkov @ 2017-07-27 13:26 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Tue, Jul 25, 2017 at 04:48:13PM -0700, Ricardo Neri wrote:
> I meant to say the 4 most significant bytes. In this case, the 64-bit
> address 0xffffffffffff1234 would lie in kernel memory while 0xffff1234
> would correctly be in user space memory.

That explanation is better.

> Yes, perhaps the check above is not needed. I included that check as
> part of my argument validation: in a 64-bit kernel, this function could
> be called with a val whose most significant bytes are non-zero.

So say that in the comment so that it is obvious *why*.

> I have looked into this closely and as far as I can see, the 4 least
> significant bytes will wrap around when using 64-bit signed numbers as
> they would when using 32-bit signed numbers. For instance, for two
> positive numbers we have:
> 
> 7fff:ffff + 7000:0000 = efff:ffff.
> 
> The addition above overflows.

Yes, MSB changes.

> When sign-extended to 64-bit numbers we would have:
> 
> 0000:0000:7fff:ffff + 0000:0000:7000:0000 = 0000:0000:efff:ffff.
> 
> The addition above does not overflow. However, the 4 least significant
> bytes overflow as we expect.

No they don't - you are simply using 64-bit regs:

   0x00005555555546b8 <+8>:     movq   $0x7fffffff,-0x8(%rbp)
   0x00005555555546c0 <+16>:    movq   $0x70000000,-0x10(%rbp)
   0x00005555555546c8 <+24>:    mov    -0x8(%rbp),%rdx
   0x00005555555546cc <+28>:    mov    -0x10(%rbp),%rax
=> 0x00005555555546d0 <+32>:    add    %rdx,%rax

rax            0xefffffff       4026531839
rbx            0x0      0
rcx            0x0      0
rdx            0x7fffffff       2147483647

...

eflags         0x206    [ PF IF ]

(OF flag is not set).

> We can clamp the 4 most significant bytes.
> 
> For a two's complement negative numbers we can have:
> 
> ffff:ffff + 8000:0000 = 7fff:ffff with a carry flag.
> 
> The addition above overflows.

Yes.

> When sign-extending to 64-bit numbers we would have:
> 
> ffff:ffff:ffff:ffff + ffff:ffff:8000:0000 = ffff:ffff:7fff:ffff with a
> carry flag.
> 
> The addition above does not overflow. However, the 4 least significant
> bytes overflowed and wrapped around as they would when using 32-bit signed
> numbers.

Right. Ok.

And come to think of it now, I'm wondering, whether it would be
better/easier/simpler/more straight-forward, to do the 32-bit operations
with 32-bit types and separate 32-bit functions and have the hardware do
that for you.

This way you can save yourself all that ugly and possibly error-prone
casting back and forth and have the code much more readable too.

Hmmm.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)

* Re: [PATCH v7 24/26] x86: Enable User-Mode Instruction Prevention
  2017-07-26  0:44     ` Ricardo Neri
@ 2017-07-27 13:57       ` Borislav Petkov
  0 siblings, 0 replies; 81+ messages in thread
From: Borislav Petkov @ 2017-07-27 13:57 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Tony Luck

On Tue, Jul 25, 2017 at 05:44:08PM -0700, Ricardo Neri wrote:
> On Fri, 2017-06-09 at 18:10 +0200, Borislav Petkov wrote:
> > On Fri, May 05, 2017 at 11:17:22AM -0700, Ricardo Neri wrote:
> > > User-Mode Instruction Prevention (UMIP) is enabled by setting/clearing a
> > > bit in %cr4.
> > > 
> > > It makes sense to enable UMIP at some point while booting, before user
> > > space comes up. Like SMAP and SMEP, it is not critical to have UMIP
> > > enabled very early during boot, because UMIP is relevant only when
> > > there is a userspace to protect. Given this similar relevance, it makes
> > > sense to enable UMIP along with SMAP and SMEP.
> > > 
> > > UMIP is enabled by default. It can be disabled by adding clearcpuid=514
> > > to the kernel parameters.

...

> So would this become a y when more machines have UMIP?

I guess. Stuff which proves reliable and widespread gets automatically
enabled with time, in most cases. IMHO, of course.

> Why would static_cpu_has() give the wrong answer if alternatives are not in place?
> Because it uses the boot CPU data? When it calls _static_cpu_has() it
> would do something equivalent to

Never mind - I forgot that static_cpu_has() now falls back to a dynamic
check before alternatives are applied.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v7 16/26] x86/insn-eval: Support both signed 32-bit and 64-bit effective addresses
  2017-07-27 13:26       ` Borislav Petkov
@ 2017-07-28  2:04         ` Ricardo Neri
  2017-07-28  6:50           ` Borislav Petkov
  0 siblings, 1 reply; 81+ messages in thread
From: Ricardo Neri @ 2017-07-28  2:04 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Thu, 2017-07-27 at 15:26 +0200, Borislav Petkov wrote:
> On Tue, Jul 25, 2017 at 04:48:13PM -0700, Ricardo Neri wrote:
> > I meant to say the 4 most significant bytes. In this case, the
> > 64-bit address 0xffffffffffff1234 would lie in kernel memory while
> > 0xffff1234 would correctly be in the user space memory.
> 
> That explanation is better.
> 
> > Yes, perhaps the check above is not needed. I included that check as
> > part of my argument validation. In a 64-bit kernel, this function could
> > be called with a val whose most significant bytes are non-zero.
> 
> So say that in the comment so that it is obvious *why*.
> 
> > I have looked into this closely and as far as I can see, the 4 least
> > significant bytes will wrap around when using 64-bit signed numbers as
> > they would when using 32-bit signed numbers. For instance, for two
> > positive numbers we have:
> > 
> > 7fff:ffff + 7000:0000 = efff:ffff.
> > 
> > The addition above overflows.
> 
> Yes, MSB changes.
> 
> > When sign-extended to 64-bit numbers we would have:
> > 
> > 0000:0000:7fff:ffff + 0000:0000:7000:0000 = 0000:0000:efff:ffff.
> > 
> > The addition above does not overflow. However, the 4 least significant
> > bytes overflow as we expect.
> 
> No they don't - you are simply using 64-bit regs:
> 
>    0x00005555555546b8 <+8>:     movq   $0x7fffffff,-0x8(%rbp)
>    0x00005555555546c0 <+16>:    movq   $0x70000000,-0x10(%rbp)
>    0x00005555555546c8 <+24>:    mov    -0x8(%rbp),%rdx
>    0x00005555555546cc <+28>:    mov    -0x10(%rbp),%rax
> => 0x00005555555546d0 <+32>:    add    %rdx,%rax
> 
> rax            0xefffffff       4026531839
> rbx            0x0      0
> rcx            0x0      0
> rdx            0x7fffffff       2147483647
> 
> ...
> 
> eflags         0x206    [ PF IF ]
> 
> (OF flag is not set).

True, the OF flag is not set. However, the 4 least significant bytes
wrapped around, which is what I needed.
> 
> > We can clamp the 4 most significant bytes.
> > 
> > For two's-complement negative numbers we can have:
> > 
> > ffff:ffff + 8000:0000 = 7fff:ffff with a carry flag.
> > 
> > The addition above overflows.
> 
> Yes.
> 
> > When sign-extending to 64-bit numbers we would have:
> > 
> > ffff:ffff:ffff:ffff + ffff:ffff:8000:0000 = ffff:ffff:7fff:ffff with a
> > carry flag.
> > 
> > The addition above does not overflow. However, the 4 least significant
> > bytes overflowed and wrapped around as they would when using 32-bit signed
> > numbers.
> 
> Right. Ok.
> 
> And come to think of it now, I'm wondering whether it would be
> better/easier/simpler/more straightforward to do the 32-bit operations
> with 32-bit types and separate 32-bit functions, and have the hardware do
> the wrapping for you.
> 
> This way you can save yourself all that ugly and possibly error-prone
> casting back and forth and have the code much more readable too.

That sounds fair. I had to explain this code a lot, and it is probably
not worth it. I can definitely use 32-bit variable types for the 32-bit
case and drop all these casts.

The 32-bit and 64-bit functions would look identical except for the
variables used to compute the effective address. Perhaps I could use a
union:

union eff_addr {
#ifdef CONFIG_X86_64
	long	addr64;
#endif
	int	addr32;
};

And use one or the other based on the address size given by the CS.L and
CS.D bits of the segment descriptor, or by address-size overrides.

However using the union could be less readable than having two almost
identical functions.

Thanks and BR,
Ricardo


* Re: [PATCH v7 16/26] x86/insn-eval: Support both signed 32-bit and 64-bit effective addresses
  2017-07-28  2:04         ` Ricardo Neri
@ 2017-07-28  6:50           ` Borislav Petkov
  0 siblings, 0 replies; 81+ messages in thread
From: Borislav Petkov @ 2017-07-28  6:50 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Thu, Jul 27, 2017 at 07:04:52PM -0700, Ricardo Neri wrote:
> However using the union could be less readable than having two almost
> identical functions.

So having some small duplication for the sake of clarity and readability
is much better, if you ask me. And it's not like you're duplicating a
lot of code - it is only a handful of functions.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


end of thread, other threads:[~2017-07-28  6:51 UTC | newest]

Thread overview: 81+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-05 18:16 [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
2017-05-05 18:16 ` [PATCH v7 01/26] ptrace,x86: Make user_64bit_mode() available to 32-bit builds Ricardo Neri
2017-05-21 14:19   ` Borislav Petkov
2017-05-05 18:17 ` [PATCH v7 02/26] x86/mm: Relocate page fault error codes to traps.h Ricardo Neri
2017-05-21 14:23   ` Borislav Petkov
2017-05-27  3:40     ` Ricardo Neri
2017-05-27 10:13       ` Borislav Petkov
2017-06-01  3:09         ` Ricardo Neri
2017-05-05 18:17 ` [PATCH v7 03/26] x86/mpx: Use signed variables to compute effective addresses Ricardo Neri
2017-05-05 18:17 ` [PATCH v7 04/26] x86/mpx: Do not use SIB.index if its value is 100b and ModRM.mod is not 11b Ricardo Neri
2017-05-24 13:37   ` Borislav Petkov
2017-05-27  3:36     ` Ricardo Neri
2017-05-05 18:17 ` [PATCH v7 05/26] x86/mpx: Do not use SIB.base if its value is 101b and ModRM.mod = 0 Ricardo Neri
2017-05-29 13:07   ` Borislav Petkov
2017-06-06  6:08     ` Ricardo Neri
2017-05-05 18:17 ` [PATCH v7 06/26] x86/mpx, x86/insn: Relocate insn util functions to a new insn-eval file Ricardo Neri
2017-05-05 18:17 ` [PATCH v7 07/26] x86/insn-eval: Do not BUG on invalid register type Ricardo Neri
2017-05-29 16:37   ` Borislav Petkov
2017-06-06  6:06     ` Ricardo Neri
2017-06-06 11:58       ` Borislav Petkov
2017-06-07  0:28         ` Ricardo Neri
2017-06-07 12:21           ` Borislav Petkov
2017-05-05 18:17 ` [PATCH v7 08/26] x86/insn-eval: Add a utility function to get register offsets Ricardo Neri
2017-05-29 17:16   ` Borislav Petkov
2017-06-06  6:02     ` Ricardo Neri
2017-05-05 18:17 ` [PATCH v7 09/26] x86/insn-eval: Add utility function to identify string instructions Ricardo Neri
2017-05-29 21:48   ` Borislav Petkov
2017-06-06  6:01     ` Ricardo Neri
2017-06-06 12:04       ` Borislav Petkov
2017-05-05 18:17 ` [PATCH v7 10/26] x86/insn-eval: Add utility functions to get segment selector Ricardo Neri
2017-05-30 10:35   ` Borislav Petkov
2017-06-15 18:37     ` Ricardo Neri
2017-06-15 19:04       ` Ricardo Neri
2017-06-19 15:29         ` Borislav Petkov
2017-06-19 15:37       ` Borislav Petkov
2017-05-05 18:17 ` [PATCH v7 11/26] x86/insn-eval: Add utility function to get segment descriptor Ricardo Neri
2017-05-05 18:17 ` [PATCH v7 12/26] x86/insn-eval: Add utility functions to get segment descriptor base address and limit Ricardo Neri
2017-05-31 16:58   ` Borislav Petkov
2017-06-03 17:23     ` Ricardo Neri
2017-05-05 18:17 ` [PATCH v7 13/26] x86/insn-eval: Add function to get default params of code segment Ricardo Neri
2017-06-07 12:59   ` Borislav Petkov
2017-06-15 19:24     ` Ricardo Neri
2017-06-19 17:11       ` Borislav Petkov
2017-05-05 18:17 ` [PATCH v7 14/26] x86/insn-eval: Indicate a 32-bit displacement if ModRM.mod is 0 and ModRM.rm is 5 Ricardo Neri
2017-06-07 13:15   ` Borislav Petkov
2017-06-15 19:36     ` Ricardo Neri
2017-05-05 18:17 ` [PATCH v7 15/26] x86/insn-eval: Incorporate segment base and limit in linear address computation Ricardo Neri
2017-05-05 18:17 ` [PATCH v7 16/26] x86/insn-eval: Support both signed 32-bit and 64-bit effective addresses Ricardo Neri
2017-06-07 15:48   ` Borislav Petkov
2017-07-25 23:48     ` Ricardo Neri
2017-07-27 13:26       ` Borislav Petkov
2017-07-28  2:04         ` Ricardo Neri
2017-07-28  6:50           ` Borislav Petkov
2017-06-07 15:49   ` Borislav Petkov
2017-06-15 19:58     ` Ricardo Neri
2017-05-05 18:17 ` [PATCH v7 17/26] x86/insn-eval: Handle 32-bit address encodings in virtual-8086 mode Ricardo Neri
2017-05-05 18:17 ` [PATCH v7 18/26] x86/insn-eval: Add support to resolve 16-bit addressing encodings Ricardo Neri
2017-06-07 16:28   ` Borislav Petkov
2017-06-15 21:50     ` Ricardo Neri
2017-05-05 18:17 ` [PATCH v7 19/26] x86/insn-eval: Add wrapper function for 16-bit and 32-bit address encodings Ricardo Neri
2017-05-05 18:17 ` [PATCH v7 20/26] x86/cpufeature: Add User-Mode Instruction Prevention definitions Ricardo Neri
2017-05-06  9:04   ` Paolo Bonzini
2017-05-11  3:23     ` Ricardo Neri
2017-06-07 18:24   ` Borislav Petkov
2017-05-05 18:17 ` [PATCH v7 21/26] x86: Add emulation code for UMIP instructions Ricardo Neri
2017-06-08 18:38   ` Borislav Petkov
2017-06-17  1:34     ` Ricardo Neri
2017-05-05 18:17 ` [PATCH v7 22/26] x86/umip: Force a page fault when unable to copy emulated result to user Ricardo Neri
2017-06-09 11:02   ` Borislav Petkov
2017-07-25 23:50     ` Ricardo Neri
2017-05-05 18:17 ` [PATCH v7 23/26] x86/traps: Fixup general protection faults caused by UMIP Ricardo Neri
2017-06-09 13:02   ` Borislav Petkov
2017-07-25 23:51     ` Ricardo Neri
2017-05-05 18:17 ` [PATCH v7 24/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
2017-06-09 16:10   ` Borislav Petkov
2017-07-26  0:44     ` Ricardo Neri
2017-07-27 13:57       ` Borislav Petkov
2017-05-05 18:17 ` [PATCH v7 25/26] selftests/x86: Add tests for " Ricardo Neri
2017-05-05 18:17 ` [PATCH v7 26/26] selftests/x86: Add tests for instruction str and sldt Ricardo Neri
2017-05-17 18:42 ` [PATCH v7 00/26] x86: Enable User-Mode Instruction Prevention Ricardo Neri
2017-05-27  3:49   ` Neri, Ricardo
