linux-kernel.vger.kernel.org archive mirror
* [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
@ 2017-03-08  0:32 Ricardo Neri
  2017-03-08  0:32 ` [v6 PATCH 01/21] x86/mpx: Use signed variables to compute effective addresses Ricardo Neri
                   ` (22 more replies)
  0 siblings, 23 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-03-08  0:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri

This is v6 of this series. The five previous submissions can be found
here [1], here [2], here [3], here [4], and here [5]. This version addresses
the comments received on v4 and improves the handling of emulation in
64-bit builds. Please see the details in the change log.

=== What is UMIP?

User-Mode Instruction Prevention (UMIP) is a security feature present in
new Intel Processors. If enabled, it prevents the execution of certain
instructions if the Current Privilege Level (CPL) is greater than 0. If
these instructions were executed while in CPL > 0, user space applications
could have access to system-wide settings such as the global and local
descriptor tables, the segment selectors to the current task state and the
local descriptor table.

These are the instructions covered by UMIP:
* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register

If any of these instructions is executed with CPL > 0, a general protection
exception is issued when UMIP is enabled.

=== How does it impact applications?

There is a caveat, however. Certain applications rely on some of these
instructions to function. Applications that run on WineHQ[6] are one
example. For instance, these applications rely on sidt returning a
non-accessible memory location[8]. During the discussions, it was proposed
that the fault could be relayed to user space and the emulation performed
in user mode. However, this would break existing applications until, for
instance, they update to a new WineHQ version. Moreover, this approach
would require UMIP to be disabled by default. The consensus in this forum
is to always enable it.

This patchset initially treated tasks running in virtual-8086 mode as a
special case. However, I received clarification that DOSEMU[8] does not
support applications that use these instructions. It relies on WineHQ for
this [9]. Furthermore, the applications for which the concern was raised
run in protected mode [8].

Please note that UMIP is always enabled for both 64-bit and 32-bit Linux
builds. However, emulation of the UMIP-protected instructions is not done
for 64-bit processes. 64-bit user space applications will receive the
SIGSEGV signal when a UMIP-protected instruction causes a general
protection fault.

=== How are UMIP-protected instructions emulated?

This version keeps UMIP enabled at all times and by default. If a general
protection fault caused by the instructions protected by UMIP is
detected, such fault will be fixed-up by returning dummy values as follows:
 
 * SGDT and SIDT return hard-coded dummy values as the base of the global
   descriptor and interrupt descriptor tables. These hard-coded values
   correspond to memory addresses that are near the end of the kernel
   memory map. This is also the case for virtual-8086 mode tasks. In all
   my experiments in x86_32, the base of GDT and IDT was always a 4-byte
   address, even for 16-bit operands. Thus, my emulation code does the
   same. In all cases, the limit of the table is set to 0.
 * STR and SLDT return 0 as the segment selector. This looks appropriate
   since we are providing a dummy value as the base address of the global
   descriptor table.
 * SMSW returns the value with which the CR0 register is programmed in
   head_32/64.S at boot time. That is, the following bits are enabled:
   CR0.0 for Protection Enable, CR0.1 for Monitor Coprocessor, CR0.4 for
   Extension Type, which will always be 1 in recent processors with UMIP;
   CR0.5 for Numeric Error, CR0.16 for Write Protect, CR0.18 for Alignment
   Mask. As per the Intel 64 and IA-32 Architectures Software Developer's
   Manual, SMSW returns a 16-bit result for memory operands. However, when
   the operand is a register, the result can be up to CR0[63:0]. Since
   the emulation code only kicks in on x86_32, we return up to CR0[31:0].
 * The proposed emulation code handles faults that happen in both
   protected and virtual-8086 mode.
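
As a rough illustration of the SMSW bullet above, the dummy CR0 value can
be assembled from the listed bit positions. This is only a sketch: the
macro and function names are made up for this example and are not taken
from the patches; only the bit positions come from the text.

```c
#include <stdint.h>

/* CR0 bit positions as listed above (hypothetical macro names). */
#define EX_CR0_PE (UINT32_C(1) << 0)	/* Protection Enable */
#define EX_CR0_MP (UINT32_C(1) << 1)	/* Monitor Coprocessor */
#define EX_CR0_ET (UINT32_C(1) << 4)	/* Extension Type */
#define EX_CR0_NE (UINT32_C(1) << 5)	/* Numeric Error */
#define EX_CR0_WP (UINT32_C(1) << 16)	/* Write Protect */
#define EX_CR0_AM (UINT32_C(1) << 18)	/* Alignment Mask */

/* Dummy CR0 value reported to 32-bit and 16-bit tasks. */
static inline uint32_t ex_dummy_cr0(void)
{
	return EX_CR0_PE | EX_CR0_MP | EX_CR0_ET |
	       EX_CR0_NE | EX_CR0_WP | EX_CR0_AM;
}

/*
 * Value seen by the task: memory operands get only the low 16 bits,
 * register operands get up to CR0[31:0].
 */
static inline uint32_t ex_emulated_smsw(int memory_operand)
{
	uint32_t cr0 = ex_dummy_cr0();

	return memory_operand ? (cr0 & 0xffff) : cr0;
}
```

With these bits the 32-bit value works out to 0x50033, and a memory
operand would see 0x0033.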

=== How is this series laid out?

++ Fix bugs in MPX address evaluator
The code that Intel MPX (Memory Protection Extensions) uses to parse
opcodes and the memory locations contained in the general purpose
registers used as operands proved very useful. I put some of this code in
a separate library file that both MPX and UMIP can access, which avoids
code duplication. Before creating the new library, I fixed a couple of
bugs that I found in how MPX determines the address contained in the
instruction and operands.

++ Provide a new x86 instruction evaluating library
With the bugs fixed, the MPX evaluating code is relocated to a new
insn-eval.c library. The basic functionality of this library is extended
to obtain the segment descriptor selected either by a segment override
prefix or by the default segment implied by the registers involved in the
calculation of the effective address. It was also extended to obtain the
default address and operand sizes as well as the segment base address.
Support for processing 16-bit address encodings was added as well. Armed
with this arsenal, it is now possible to determine the linear address onto
which the emulated results shall be copied.

This code supports normal 32-bit and 64-bit (i.e., __USER32_CS and/or
__USER_CS) protected mode, virtual-8086 mode, and 16-bit protected mode
with a 32-bit base address.
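
To make the 16-bit case more concrete, here is a minimal sketch of how an
effective address is formed from a 16-bit ModRM encoding and how a linear
address follows from it in virtual-8086 mode, where the segment base is
the selector shifted left by four. The struct and function names are
hypothetical and the code is not taken from insn-eval.c. It assumes the
caller already extracted the displacement (0 if the encoding has none)
and picked the segment; segment overrides and defaults are left out.

```c
#include <stdint.h>

/* Registers that 16-bit ModRM addressing can reference. */
struct ex_regs16 {
	uint16_t bx, bp, si, di;
};

/*
 * Effective address for a 16-bit ModRM encoding with mod != 3.
 * disp is the decoded displacement, already sign-extended.
 */
static inline uint16_t ex_eff_addr16(uint8_t modrm,
				     const struct ex_regs16 *r, int16_t disp)
{
	uint8_t mod = modrm >> 6;
	uint8_t rm = modrm & 0x7;
	uint16_t ea = 0;

	switch (rm) {
	case 0: ea = r->bx + r->si; break;
	case 1: ea = r->bx + r->di; break;
	case 2: ea = r->bp + r->si; break;
	case 3: ea = r->bp + r->di; break;
	case 4: ea = r->si; break;
	case 5: ea = r->di; break;
	/* mod == 0 with r/m == 6 means "disp16 only", BP is not used. */
	case 6: ea = (mod == 0) ? 0 : r->bp; break;
	case 7: ea = r->bx; break;
	}

	/* 16-bit effective addresses wrap around at 64 KiB. */
	return (uint16_t)(ea + (uint16_t)disp);
}

/* Linear address in virtual-8086 mode: segment base is selector << 4. */
static inline uint32_t ex_linear_v86(uint16_t selector, uint16_t eff_addr)
{
	return ((uint32_t)selector << 4) + eff_addr;
}
```

For example, with BX = 0x1000, SI = 0x20, a displacement of -0x10 and a
selector of 0x2000, the effective address is 0x1010 and the linear address
is 0x21010.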

++ Emulate UMIP instructions
A new function, fixup_umip_exception, inspects the instruction at the
instruction pointer. If it is a UMIP-protected instruction, it executes
the emulation code. This uses all the address-computing code of the
previous section.
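
The identification step can be pictured with the sketch below. It is not
the code from umip.c: the enum and function names are invented for this
example, and it assumes the bytes start at the two-byte opcode with no
prefixes in front. The opcode encodings themselves (0F 00 /0 and /1,
0F 01 /0, /1 and /4) are the architectural ones.

```c
#include <stddef.h>
#include <stdint.h>

enum ex_umip_insn {
	EX_UMIP_NONE = 0,
	EX_UMIP_SGDT,
	EX_UMIP_SIDT,
	EX_UMIP_SLDT,
	EX_UMIP_SMSW,
	EX_UMIP_STR,
};

/*
 * Classify an instruction from its opcode bytes and the reg field of
 * the ModRM byte. Assumes bytes[0..2] are 0x0f, the second opcode byte
 * and ModRM, with no prefixes.
 */
static inline enum ex_umip_insn ex_umip_identify(const uint8_t *bytes,
						 size_t len)
{
	uint8_t mod, reg;

	if (len < 3 || bytes[0] != 0x0f)
		return EX_UMIP_NONE;

	mod = bytes[2] >> 6;
	reg = (bytes[2] >> 3) & 0x7;

	if (bytes[1] == 0x00) {
		if (reg == 0)
			return EX_UMIP_SLDT;
		if (reg == 1)
			return EX_UMIP_STR;
	} else if (bytes[1] == 0x01) {
		/* SGDT and SIDT only take memory operands (mod != 3). */
		if (reg == 0 && mod != 3)
			return EX_UMIP_SGDT;
		if (reg == 1 && mod != 3)
			return EX_UMIP_SIDT;
		if (reg == 4)
			return EX_UMIP_SMSW;
	}

	return EX_UMIP_NONE;
}
```

The mod != 3 check matters because 0F 01 with a register-form ModRM byte
and reg 0 or 1 encodes unrelated instructions.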

++ Add self-tests
Lastly, self-tests are added to entry_from_vm86.c to exercise the most
typical use cases of UMIP-protected instructions in virtual-8086 mode.

++ Extensive tests
Extensive tests were performed to test all the combinations of ModRM,
SiB and displacements for 16-bit and 32-bit encodings for the ss, ds,
es, fs and gs segments. Tests also include a 64-bit program that uses
segmentation via fs and gs. For this purpose, I temporarily, and not
as part of this patchset, enabled UMIP support for 64-bit processes with
the intention to test the computations of linear addresses in 64-bit
mode, including the extra R8-R15 registers. Extensive tests were also
implemented for virtual-8086 tasks. The code of these tests can be found
here [10] and here [11].
 
[1]. https://lwn.net/Articles/705877/
[2]. https://lkml.org/lkml/2016/12/23/265
[3]. https://lkml.org/lkml/2017/1/25/622
[4]. https://lkml.org/lkml/2017/2/23/40
[5]. https://lkml.org/lkml/2017/3/3/678
[7]. https://www.winehq.org/
[8]. https://www.winehq.org/pipermail/wine-devel/2016-November/115320.html
[9]. http://www.dosemu.org/
[9]. http://marc.info/?l=linux-kernel&m=147876798717927&w=2
[10]. https://github.com/01org/luv-yocto/tree/rneri/umip/meta-luv/recipes-core/umip/files
[11]. https://github.com/01org/luv-yocto/commit/a72a7fe7d68693c0f4100ad86de6ecabde57334f#diff-3860c136a63add269bce4ea50222c248R1

Thanks and BR,
Ricardo

Changes since V5:
* Relocate the page fault error code enumerations to traps.h

Changes since V4:
* Audited patches to use braces in all the branches of conditional
  statements, except those in which the conditional action only takes one
  line.
* Implemented support in 64-bit builds for both 32-bit and 64-bit tasks in
  the instruction evaluating library.
* Split segment selector function in the instruction evaluating library
  into two functions to resolve the segment type by instruction override
  or default and a separate function to actually read the segment selector.
* Fixed a bug when evaluating 32-bit effective addresses with 64-bit
  kernels.
* Split patches further for easier review.
* Use signed variables for computation of effective address.
* Fixed issue with a spurious static modifier in function insn_get_addr_ref
  found by kbuild test bot.
* Removed comparison between true and fixup_umip_exception.
* Reworked check logic when identifying erroneous vs invalid values of the
  SiB base and index.

Changes since V3:
* Limited emulation to 32-bit and 16-bit modes. For 64-bit mode, a general
  protection fault is still issued when UMIP-protected instructions are
  executed with CPL > 0.
* Expanded instruction-evaluating code to obtain segment descriptor along
  with their attributes such as base address and default address and
  operand sizes. Also, support for 16-bit encodings in protected mode was
  implemented.
* Getting a segment descriptor now includes support for descriptors
  located in a local descriptor table.
* Now the instruction-evaluating code returns -EDOM when the value of
  registers should not be used in calculating the effective address. The
  value -EINVAL is left for errors.
* Incorporate the value of the segment base address in the computation of
  linear addresses.
* Renamed new instruction evaluation library from insn-kernel.c to
  insn-eval.c
* Exported functions insn_get_reg_offset_* to obtain the register offset
  by ModRM r/m, SiB base and SiB index.
* Improved documentation of functions.
* Split patches further for easier review.

Changes since V2:
* Added new utility functions to decode the memory addresses contained in
  registers when the 16-bit addressing encodings are used. This includes
  code to obtain and compute memory addresses using segment selectors for
  real-mode address translation.
* Added support to emulate UMIP-protected instructions for virtual-8086
  tasks.
* Added self-tests for virtual-8086 mode that contains representative
  use cases: address represented as a displacement, address in registers
  and registers as operands.
* Instead of maintaining a static variable for the dummy base addresses
  of the IDT and GDT, a hard-coded value is used.
* The emulated SMSW instructions now return the value with which the CR0
  register is programmed in head_32/64.S. That is: PE | MP | ET | NE | WP
  | AM. For x86_64, PG is also enabled.
* The new file arch/x86/lib/insn-utils.c is now renamed to
  arch/x86/lib/insn-kernel.c. It also has its own header. This helps keep
  the kernel and objtool instruction decoders in sync. Also, the new
  insn-kernel.c contains utility functions that are only relevant in a
  kernel context.
* Removed printed warnings for errors that occur when decoding instructions
  with invalid operands.
* Added more comments on fixes in the instruction-decoding MPX functions.
* Now user_64bit_mode(regs) is used instead of test_thread_flag(TIF_IA32)
  to determine if the task is 32-bit or 64-bit.
* Found and fixed a bug in insn-decoder in which X86_MODRM_RM was
  incorrectly used to obtain the mod part of the ModRM byte.
* Added more explanatory comments in the emulation and instruction
  decoding code. This includes a comment noting that copy_from_user could
  fail if a memory protection key is in place.
* Tested code with CONFIG_X86_DECODER_SELFTEST=y and everything passes now.
* Prefixed get_reg_offset_rm with insn_ as this function is exposed
  via a header file. For clarity, this function was added in a separate
  patch.

Changes since V1:
* Virtual-8086 mode tasks are not treated in a special manner. All code
  for this purpose was removed.
* Instead of attempting to disable UMIP during a context switch or when
  entering virtual-8086 mode, UMIP remains enabled all the time. General
  protection faults that occur are fixed-up by returning dummy values as
  detailed above.
* Removed umip= kernel parameter in favor of using clearcpuid=514 to
  disable UMIP.
* Removed selftests designed to detect the absence of SIGSEGV signals when
  running in virtual-8086 mode.
* Reused code from MPX to decode instructions operands. For this purpose
  code was put in a common location.
* Fixed two bugs in MPX code that decodes operands.
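
As a side note on clearcpuid=514 in the V1 change list above: the number
can be read as a feature word and a bit within that word, assuming the
word * 32 + bit encoding used for x86 feature flags. The helper names
below are hypothetical and serve only to show the arithmetic.

```c
/* Feature numbers are assumed to be encoded as word * 32 + bit. */
static inline unsigned int ex_feature_word(unsigned int feature)
{
	return feature / 32;
}

static inline unsigned int ex_feature_bit(unsigned int feature)
{
	return feature % 32;
}
```

For 514 this gives word 16, bit 2, which I understand to correspond to
bit 2 of ECX in CPUID leaf 7, the bit that enumerates UMIP.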

Ricardo Neri (21):
  x86/mpx: Use signed variables to compute effective addresses
  x86/mpx: Do not use SIB index if index points to R/ESP
  x86/mpx: Do not use R/EBP as base in the SIB byte with Mod = 0
  x86/mpx, x86/insn: Relocate insn util functions to a new insn-kernel
  x86/insn-eval: Add utility functions to get register offsets
  x86/insn-eval: Add utility functions to get segment selector
  x86/insn-eval: Add utility function to get segment descriptor
  x86/insn-eval: Add utility function to get segment descriptor base
    address
  x86/insn-eval: Add functions to get default operand and address sizes
  x86/insn-eval: Do not use R/EBP as base if mod in ModRM is zero
  insn/eval: Incorporate segment base in address computation
  x86/insn: Support both signed 32-bit and 64-bit effective addresses
  x86/insn-eval: Add support to resolve 16-bit addressing encodings
  x86/insn-eval: Add wrapper function for 16-bit and 32-bit address
    encodings
  x86/mm: Relocate page fault error codes to traps.h
  x86/cpufeature: Add User-Mode Instruction Prevention definitions
  x86: Add emulation code for UMIP instructions
  x86/umip: Force a page fault when unable to copy emulated result to
    user
  x86/traps: Fixup general protection faults caused by UMIP
  x86: Enable User-Mode Instruction Prevention
  selftests/x86: Add tests for User-Mode Instruction Prevention

 arch/x86/Kconfig                              |  10 +
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/disabled-features.h      |   8 +-
 arch/x86/include/asm/insn-eval.h              |  23 +
 arch/x86/include/asm/traps.h                  |  18 +
 arch/x86/include/asm/umip.h                   |  15 +
 arch/x86/include/uapi/asm/processor-flags.h   |   2 +
 arch/x86/kernel/Makefile                      |   1 +
 arch/x86/kernel/cpu/common.c                  |  16 +-
 arch/x86/kernel/traps.c                       |   4 +
 arch/x86/kernel/umip.c                        | 298 +++++++++
 arch/x86/lib/Makefile                         |   2 +-
 arch/x86/lib/insn-eval.c                      | 832 ++++++++++++++++++++++++++
 arch/x86/mm/fault.c                           |  88 ++-
 arch/x86/mm/mpx.c                             | 120 +---
 tools/testing/selftests/x86/entry_from_vm86.c |  39 +-
 16 files changed, 1301 insertions(+), 176 deletions(-)
 create mode 100644 arch/x86/include/asm/insn-eval.h
 create mode 100644 arch/x86/include/asm/umip.h
 create mode 100644 arch/x86/kernel/umip.c
 create mode 100644 arch/x86/lib/insn-eval.c

-- 
2.9.3


* [v6 PATCH 01/21] x86/mpx: Use signed variables to compute effective addresses
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
@ 2017-03-08  0:32 ` Ricardo Neri
  2017-04-11 21:56   ` Borislav Petkov
  2017-03-08  0:32 ` [v6 PATCH 02/21] x86/mpx: Do not use SIB index if index points to R/ESP Ricardo Neri
                   ` (21 subsequent siblings)
  22 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-03-08  0:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Nathan Howard, Adan Hawthorn, Joe Perches

Even though memory addresses are unsigned, the operands used to compute the
effective address do have a sign. This is true for the r/m part of the
ModRM byte, the base and index parts of the SiB byte as well as the
displacement. Thus, signed variables shall be used when computing the
effective address from these operands. Once the signed effective address
has been computed, it is cast to an unsigned long to determine the
linear address.

Variables are renamed to better reflect the type of address being
computed.
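
A small stand-alone sketch of the computation described above, outside of
any kernel context. The function name and parameters are hypothetical;
the sketch only mirrors the idea of doing the arithmetic on signed
operands and converting to an unsigned linear address at the end.

```c
/*
 * Compute base + index * 2^scale + displacement with signed operands,
 * then convert the signed effective address to an unsigned linear
 * address. No segment base is considered here.
 */
static inline unsigned long ex_linear_addr(long base, long indx,
					   unsigned int scale, long disp)
{
	long eff_addr = base + indx * (1L << scale) + disp;

	return (unsigned long)eff_addr;
}
```

For instance, a base of 0x1000 with a displacement of -0x80 and no index
yields 0xf80.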

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nathan Howard <liverlint@gmail.com>
Cc: Adan Hawthorn <adanhawthorn@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/mm/mpx.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 5126dfd..ff112e3 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -138,7 +138,8 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
  */
 static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 {
-	unsigned long addr, base, indx;
+	unsigned long linear_addr;
+	long eff_addr, base, indx;
 	int addr_offset, base_offset, indx_offset;
 	insn_byte_t sib;
 
@@ -150,7 +151,7 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 		addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
 		if (addr_offset < 0)
 			goto out_err;
-		addr = regs_get_register(regs, addr_offset);
+		eff_addr = regs_get_register(regs, addr_offset);
 	} else {
 		if (insn->sib.nbytes) {
 			base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
@@ -163,16 +164,18 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 
 			base = regs_get_register(regs, base_offset);
 			indx = regs_get_register(regs, indx_offset);
-			addr = base + indx * (1 << X86_SIB_SCALE(sib));
+			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
 		} else {
 			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
 			if (addr_offset < 0)
 				goto out_err;
-			addr = regs_get_register(regs, addr_offset);
+			eff_addr = regs_get_register(regs, addr_offset);
 		}
-		addr += insn->displacement.value;
+		eff_addr += insn->displacement.value;
 	}
-	return (void __user *)addr;
+	linear_addr = (unsigned long)eff_addr;
+
+	return (void __user *)linear_addr;
 out_err:
 	return (void __user *)-1;
 }
-- 
2.9.3


* [v6 PATCH 02/21] x86/mpx: Do not use SIB index if index points to R/ESP
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
  2017-03-08  0:32 ` [v6 PATCH 01/21] x86/mpx: Use signed variables to compute effective addresses Ricardo Neri
@ 2017-03-08  0:32 ` Ricardo Neri
  2017-04-11 11:31   ` Borislav Petkov
  2017-03-08  0:32 ` [v6 PATCH 03/21] x86/mpx: Do not use R/EBP as base in the SIB byte with Mod = 0 Ricardo Neri
                   ` (20 subsequent siblings)
  22 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-03-08  0:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Nathan Howard, Adan Hawthorn, Joe Perches

Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
Developer's Manual volume 2A states that when memory addressing is used
(i.e., mod part of ModR/M is not 3), a SIB byte is used and the index of
the SIB byte points to the R/ESP (i.e., index = 4), the index should not be
used in the computation of the memory address.

In these cases the address is simply the value present in the register
pointed to by the base part of the SIB byte plus the displacement byte.

An example of such an instruction could be

    insn -0x80(%rbx)

This is represented as:

     [opcode] 4c 23 80

      ModR/M=0x4c: mod: 01b, reg: 001b, r/m: 100b (R/ESP)
      SIB=0x23: scale: 00b, index: 100b (R/ESP), base: 011b (R/EBX)
      Displacement: -0x80

The correct address is (base) + displacement; no index is used.

We can achieve the desired effect of not using the index by making
get_reg_offset return -EDOM in this particular case. This value indicates
to callers that they should not use the index to calculate the address.
-EINVAL continues to indicate an error when decoding the SIB byte.

Care is taken to allow R12 to be used as index, which is a valid scenario.
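
The rule can be tried out in user space with the sketch below. It follows
the same order of checks as the hunk in this patch (apply REX.X first,
then test for register number 4), but the function name is hypothetical
and the bit extraction is written out by hand rather than using the
kernel's macros.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Return the register number selected by the SIB index field, or -EDOM
 * when the encoding means "no index" (index = 4 without REX.X, and
 * mod != 3). rex_x is the REX.X bit (0 or 1).
 */
static inline int ex_sib_index_regno(uint8_t modrm, uint8_t sib, int rex_x)
{
	int regno = (sib >> 3) & 0x7;

	if (rex_x)
		regno += 8;

	/* Checked after REX.X so that R12 (regno 12) remains usable. */
	if (regno == 4 && (modrm >> 6) != 3)
		return -EDOM;

	return regno;
}
```

With ModR/M = 0x4c and SIB = 0x23 the function reports -EDOM, while the
same bytes with REX.X set select R12.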

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nathan Howard <liverlint@gmail.com>
Cc: Adan Hawthorn <adanhawthorn@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/mm/mpx.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index ff112e3..d9e92d6 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -110,6 +110,13 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 		regno = X86_SIB_INDEX(insn->sib.value);
 		if (X86_REX_X(insn->rex_prefix.value))
 			regno += 8;
+		/*
+		 * If mod !=3, register R/ESP (regno=4) is not used as index in
+		 * the address computation. Check is done after looking at REX.X
+		 * This is because R12 (regno=12) can be used as an index.
+		 */
+		if (regno == 4 && X86_MODRM_MOD(insn->modrm.value) != 3)
+			return -EDOM;
 		break;
 
 	case REG_TYPE_BASE:
@@ -159,11 +166,19 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 				goto out_err;
 
 			indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
-			if (indx_offset < 0)
+			/*
+			 * A negative offset generally means a error, except
+			 * -EDOM, which means that the contents of the register
+			 * should not be used as index.
+			 */
+			if (unlikely(indx_offset == -EDOM))
+				indx = 0;
+			else if (unlikely(indx_offset < 0))
 				goto out_err;
+			else
+				indx = regs_get_register(regs, indx_offset);
 
 			base = regs_get_register(regs, base_offset);
-			indx = regs_get_register(regs, indx_offset);
 			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
 		} else {
 			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
-- 
2.9.3


* [v6 PATCH 03/21] x86/mpx: Do not use R/EBP as base in the SIB byte with Mod = 0
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
  2017-03-08  0:32 ` [v6 PATCH 01/21] x86/mpx: Use signed variables to compute effective addresses Ricardo Neri
  2017-03-08  0:32 ` [v6 PATCH 02/21] x86/mpx: Do not use SIB index if index points to R/ESP Ricardo Neri
@ 2017-03-08  0:32 ` Ricardo Neri
  2017-04-11 22:08   ` Borislav Petkov
  2017-03-08  0:32 ` [v6 PATCH 04/21] x86/mpx, x86/insn: Relocate insn util functions to a new insn-kernel Ricardo Neri
                   ` (19 subsequent siblings)
  22 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-03-08  0:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Nathan Howard, Adan Hawthorn, Joe Perches

Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
Developer's Manual volume 2A states that when a SIB byte is used and the
base of the SIB byte points to R/EBP (i.e., base = 5) and the mod part
of the ModRM byte is zero, the value of such register will not be used
as part of the address computation. To signal this, -EDOM is
returned to indicate to callers that they should ignore the value.

Also, for this particular case, a 32-bit displacement should follow
the SIB byte if the mod part of ModRM is equal to zero. The instruction
decoder ensures that this is the case.
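
Putting the base and index rules together, a simplified user-space sketch
of the SIB address computation could look as follows. It is not the
kernel implementation: the function name is hypothetical, REX prefixes
are ignored, and the register file is a plain array ordered ax, cx, dx,
bx, sp, bp, si, di. The caller is assumed to pass the displacement that
the decoder extracted (disp32 in the mod = 0, base = 5 case).

```c
#include <stdint.h>

/*
 * Effective address for a ModRM + SIB encoding with mod != 3.
 * regs[] holds ax, cx, dx, bx, sp, bp, si, di in that order.
 */
static inline long ex_sib_eff_addr(uint8_t modrm, uint8_t sib,
				   const long regs[8], long disp)
{
	unsigned int mod = modrm >> 6;
	unsigned int scale = sib >> 6;
	unsigned int index_no = (sib >> 3) & 0x7;
	unsigned int base_no = sib & 0x7;
	long base, indx;

	/* base = 5 with mod = 0: no base register, only a displacement. */
	base = (base_no == 5 && mod == 0) ? 0 : regs[base_no];

	/* index = 4: no index register. */
	indx = (index_no == 4) ? 0 : regs[index_no];

	return base + indx * (1L << scale) + disp;
}
```

For example, with mod = 0 and SIB = 0x0d (index cx, base field 5) the
result is cx plus the displacement, and the value of bp plays no part.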

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nathan Howard <liverlint@gmail.com>
Cc: Adan Hawthorn <adanhawthorn@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/mm/mpx.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index d9e92d6..ef7eb67 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -121,6 +121,17 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 
 	case REG_TYPE_BASE:
 		regno = X86_SIB_BASE(insn->sib.value);
+		/*
+		 * If mod is 0 and register R/EBP (regno=5) is indicated in the
+		 * base part of the SIB byte, the value of such register should
+		 * not be used in the address computation. Also, a 32-bit
+		 * displacement is expected in this case; the instruction
+		 * decoder takes care of it. This is true for both R13 and
+		 * R/EBP as REX.B will not be decoded.
+		 */
+		if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0)
+			return -EDOM;
+
 		if (X86_REX_B(insn->rex_prefix.value))
 			regno += 8;
 		break;
@@ -161,16 +172,21 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 		eff_addr = regs_get_register(regs, addr_offset);
 	} else {
 		if (insn->sib.nbytes) {
+			/*
+			 * Negative values in the base and index offset means
+			 * an error when decoding the SIB byte. Except -EDOM,
+			 * which means that the registers should not be used
+			 * in the address computation.
+			 */
 			base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
-			if (base_offset < 0)
+			if (unlikely(base_offset == -EDOM))
+				base = 0;
+			else if (unlikely(base_offset < 0))
 				goto out_err;
+			else
+				base = regs_get_register(regs, base_offset);
 
 			indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
-			/*
-			 * A negative offset generally means a error, except
-			 * -EDOM, which means that the contents of the register
-			 * should not be used as index.
-			 */
 			if (unlikely(indx_offset == -EDOM))
 				indx = 0;
 			else if (unlikely(indx_offset < 0))
@@ -178,7 +194,6 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 			else
 				indx = regs_get_register(regs, indx_offset);
 
-			base = regs_get_register(regs, base_offset);
 			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
 		} else {
 			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
-- 
2.9.3


* [v6 PATCH 04/21] x86/mpx, x86/insn: Relocate insn util functions to a new insn-kernel
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (2 preceding siblings ...)
  2017-03-08  0:32 ` [v6 PATCH 03/21] x86/mpx: Do not use R/EBP as base in the SIB byte with Mod = 0 Ricardo Neri
@ 2017-03-08  0:32 ` Ricardo Neri
  2017-04-12 10:03   ` Borislav Petkov
  2017-03-08  0:32 ` [v6 PATCH 05/21] x86/insn-eval: Add utility functions to get register offsets Ricardo Neri
                   ` (18 subsequent siblings)
  22 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-03-08  0:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

Other kernel submodules can benefit from using the utility functions
defined in mpx.c to obtain the addresses and values of operands contained
in the general purpose registers. An instance of this is the emulation code
used for instructions protected by the Intel User-Mode Instruction
Prevention feature.

Thus, these functions are relocated to a new insn-eval.c file. The reason
to not relocate these utilities into insn.c is that the latter solely
analyses instructions given by a struct insn without any knowledge of the
meaning of the values of instruction operands. The new insn-eval.c aims
to be used to resolve effective and userspace linear addresses based on
the contents of the instruction operands as well as the contents of the
pt_regs structure.

These utilities come with a separate header. This is to avoid taking insn.c
out of sync from the instruction decoders under tools/objtool and
tools/perf. This also avoids adding cumbersome #ifdef's for the #include'd
files required to decode instructions in a kernel context.

Functions are simply relocated. There are no functional or indentation
changes.

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/insn-eval.h |  16 ++++
 arch/x86/lib/Makefile            |   2 +-
 arch/x86/lib/insn-eval.c         | 160 +++++++++++++++++++++++++++++++++++++++
 arch/x86/mm/mpx.c                | 153 +------------------------------------
 4 files changed, 179 insertions(+), 152 deletions(-)
 create mode 100644 arch/x86/include/asm/insn-eval.h
 create mode 100644 arch/x86/lib/insn-eval.c

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
new file mode 100644
index 0000000..5cab1b1
--- /dev/null
+++ b/arch/x86/include/asm/insn-eval.h
@@ -0,0 +1,16 @@
+#ifndef _ASM_X86_INSN_EVAL_H
+#define _ASM_X86_INSN_EVAL_H
+/*
+ * A collection of utility functions for x86 instruction analysis to be
+ * used in a kernel context. Useful when, for instance, making sense
+ * of the registers indicated by operands.
+ */
+
+#include <linux/compiler.h>
+#include <linux/bug.h>
+#include <linux/err.h>
+#include <asm/ptrace.h>
+
+void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
+
+#endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 34a7413..675d7b0 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -23,7 +23,7 @@ lib-y := delay.o misc.o cmdline.o cpu.o
 lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
 lib-y += memcpy_$(BITS).o
 lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
-lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
+lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o
 lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
 
 obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
new file mode 100644
index 0000000..23cf010
--- /dev/null
+++ b/arch/x86/lib/insn-eval.c
@@ -0,0 +1,160 @@
+/*
+ * Utility functions for x86 operand and address decoding
+ *
+ * Copyright (C) Intel Corporation 2017
+ */
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <asm/inat.h>
+#include <asm/insn.h>
+#include <asm/insn-eval.h>
+
+enum reg_type {
+	REG_TYPE_RM = 0,
+	REG_TYPE_INDEX,
+	REG_TYPE_BASE,
+};
+
+static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
+			  enum reg_type type)
+{
+	int regno = 0;
+
+	static const int regoff[] = {
+		offsetof(struct pt_regs, ax),
+		offsetof(struct pt_regs, cx),
+		offsetof(struct pt_regs, dx),
+		offsetof(struct pt_regs, bx),
+		offsetof(struct pt_regs, sp),
+		offsetof(struct pt_regs, bp),
+		offsetof(struct pt_regs, si),
+		offsetof(struct pt_regs, di),
+#ifdef CONFIG_X86_64
+		offsetof(struct pt_regs, r8),
+		offsetof(struct pt_regs, r9),
+		offsetof(struct pt_regs, r10),
+		offsetof(struct pt_regs, r11),
+		offsetof(struct pt_regs, r12),
+		offsetof(struct pt_regs, r13),
+		offsetof(struct pt_regs, r14),
+		offsetof(struct pt_regs, r15),
+#endif
+	};
+	int nr_registers = ARRAY_SIZE(regoff);
+	/*
+	 * Don't possibly decode a 32-bit instructions as
+	 * reading a 64-bit-only register.
+	 */
+	if (IS_ENABLED(CONFIG_X86_64) && !insn->x86_64)
+		nr_registers -= 8;
+
+	switch (type) {
+	case REG_TYPE_RM:
+		regno = X86_MODRM_RM(insn->modrm.value);
+		if (X86_REX_B(insn->rex_prefix.value))
+			regno += 8;
+		break;
+
+	case REG_TYPE_INDEX:
+		regno = X86_SIB_INDEX(insn->sib.value);
+		if (X86_REX_X(insn->rex_prefix.value))
+			regno += 8;
+		/*
+		 * If mod !=3, register R/ESP (regno=4) is not used as index in
+		 * the address computation. Check is done after looking at REX.X
+		 * This is because R12 (regno=12) can be used as an index.
+		 */
+		if (regno == 4 && X86_MODRM_MOD(insn->modrm.value) != 3)
+			return -EDOM;
+		break;
+
+	case REG_TYPE_BASE:
+		regno = X86_SIB_BASE(insn->sib.value);
+		/*
+		 * If mod is 0 and register R/EBP (regno=5) is indicated in the
+		 * base part of the SIB byte, the value of such register should
+		 * not be used in the address computation. Also, a 32-bit
+		 * displacement is expected in this case; the instruction
+		 * decoder takes care of it. This is true for both R13 and
+		 * R/EBP as REX.B will not be decoded.
+		 */
+		if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0)
+			return -EDOM;
+
+		if (X86_REX_B(insn->rex_prefix.value))
+			regno += 8;
+		break;
+
+	default:
+		pr_err("invalid register type");
+		BUG();
+		break;
+	}
+
+	if (regno >= nr_registers) {
+		WARN_ONCE(1, "decoded an instruction with an invalid register");
+		return -EINVAL;
+	}
+	return regoff[regno];
+}
+
+/*
+ * return the address being referenced be instruction
+ * for rm=3 returning the content of the rm reg
+ * for rm!=3 calculates the address using SIB and Disp
+ */
+void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
+{
+	unsigned long linear_addr;
+	long eff_addr, base, indx;
+	int addr_offset, base_offset, indx_offset;
+	insn_byte_t sib;
+
+	insn_get_modrm(insn);
+	insn_get_sib(insn);
+	sib = insn->sib.value;
+
+	if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+		addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
+		if (addr_offset < 0)
+			goto out_err;
+		eff_addr = regs_get_register(regs, addr_offset);
+	} else {
+		if (insn->sib.nbytes) {
+			/*
+			 * Negative values in the base and index offset means
+			 * an error when decoding the SIB byte. Except -EDOM,
+			 * which means that the registers should not be used
+			 * in the address computation.
+			 */
+			base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
+			if (unlikely(base_offset == -EDOM))
+				base = 0;
+			else if (unlikely(base_offset < 0))
+				goto out_err;
+			else
+				base = regs_get_register(regs, base_offset);
+
+			indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
+			if (unlikely(indx_offset == -EDOM))
+				indx = 0;
+			else if (unlikely(indx_offset < 0))
+				goto out_err;
+			else
+				indx = regs_get_register(regs, indx_offset);
+
+			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
+		} else {
+			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
+			if (addr_offset < 0)
+				goto out_err;
+			eff_addr = regs_get_register(regs, addr_offset);
+		}
+		eff_addr += insn->displacement.value;
+	}
+	linear_addr = (unsigned long)eff_addr;
+
+	return (void __user *)linear_addr;
+out_err:
+	return (void __user *)-1;
+}
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index ef7eb67..4c3efd6 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -12,6 +12,7 @@
 #include <linux/sched/sysctl.h>
 
 #include <asm/insn.h>
+#include <asm/insn-eval.h>
 #include <asm/mman.h>
 #include <asm/mmu_context.h>
 #include <asm/mpx.h>
@@ -60,156 +61,6 @@ static unsigned long mpx_mmap(unsigned long len)
 	return addr;
 }
 
-enum reg_type {
-	REG_TYPE_RM = 0,
-	REG_TYPE_INDEX,
-	REG_TYPE_BASE,
-};
-
-static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
-			  enum reg_type type)
-{
-	int regno = 0;
-
-	static const int regoff[] = {
-		offsetof(struct pt_regs, ax),
-		offsetof(struct pt_regs, cx),
-		offsetof(struct pt_regs, dx),
-		offsetof(struct pt_regs, bx),
-		offsetof(struct pt_regs, sp),
-		offsetof(struct pt_regs, bp),
-		offsetof(struct pt_regs, si),
-		offsetof(struct pt_regs, di),
-#ifdef CONFIG_X86_64
-		offsetof(struct pt_regs, r8),
-		offsetof(struct pt_regs, r9),
-		offsetof(struct pt_regs, r10),
-		offsetof(struct pt_regs, r11),
-		offsetof(struct pt_regs, r12),
-		offsetof(struct pt_regs, r13),
-		offsetof(struct pt_regs, r14),
-		offsetof(struct pt_regs, r15),
-#endif
-	};
-	int nr_registers = ARRAY_SIZE(regoff);
-	/*
-	 * Don't possibly decode a 32-bit instructions as
-	 * reading a 64-bit-only register.
-	 */
-	if (IS_ENABLED(CONFIG_X86_64) && !insn->x86_64)
-		nr_registers -= 8;
-
-	switch (type) {
-	case REG_TYPE_RM:
-		regno = X86_MODRM_RM(insn->modrm.value);
-		if (X86_REX_B(insn->rex_prefix.value))
-			regno += 8;
-		break;
-
-	case REG_TYPE_INDEX:
-		regno = X86_SIB_INDEX(insn->sib.value);
-		if (X86_REX_X(insn->rex_prefix.value))
-			regno += 8;
-		/*
-		 * If mod !=3, register R/ESP (regno=4) is not used as index in
-		 * the address computation. Check is done after looking at REX.X
-		 * This is because R12 (regno=12) can be used as an index.
-		 */
-		if (regno == 4 && X86_MODRM_MOD(insn->modrm.value) != 3)
-			return -EDOM;
-		break;
-
-	case REG_TYPE_BASE:
-		regno = X86_SIB_BASE(insn->sib.value);
-		/*
-		 * If mod is 0 and register R/EBP (regno=5) is indicated in the
-		 * base part of the SIB byte, the value of such register should
-		 * not be used in the address computation. Also, a 32-bit
-		 * displacement is expected in this case; the instruction
-		 * decoder takes care of it. This is true for both R13 and
-		 * R/EBP as REX.B will not be decoded.
-		 */
-		if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0)
-			return -EDOM;
-
-		if (X86_REX_B(insn->rex_prefix.value))
-			regno += 8;
-		break;
-
-	default:
-		pr_err("invalid register type");
-		BUG();
-		break;
-	}
-
-	if (regno >= nr_registers) {
-		WARN_ONCE(1, "decoded an instruction with an invalid register");
-		return -EINVAL;
-	}
-	return regoff[regno];
-}
-
-/*
- * return the address being referenced be instruction
- * for rm=3 returning the content of the rm reg
- * for rm!=3 calculates the address using SIB and Disp
- */
-static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
-{
-	unsigned long linear_addr;
-	long eff_addr, base, indx;
-	int addr_offset, base_offset, indx_offset;
-	insn_byte_t sib;
-
-	insn_get_modrm(insn);
-	insn_get_sib(insn);
-	sib = insn->sib.value;
-
-	if (X86_MODRM_MOD(insn->modrm.value) == 3) {
-		addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
-		if (addr_offset < 0)
-			goto out_err;
-		eff_addr = regs_get_register(regs, addr_offset);
-	} else {
-		if (insn->sib.nbytes) {
-			/*
-			 * Negative values in the base and index offset means
-			 * an error when decoding the SIB byte. Except -EDOM,
-			 * which means that the registers should not be used
-			 * in the address computation.
-			 */
-			base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
-			if (unlikely(base_offset == -EDOM))
-				base = 0;
-			else if (unlikely(base_offset < 0))
-				goto out_err;
-			else
-				base = regs_get_register(regs, base_offset);
-
-			indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
-			if (unlikely(indx_offset == -EDOM))
-				indx = 0;
-			else if (unlikely(indx_offset < 0))
-				goto out_err;
-			else
-				indx = regs_get_register(regs, indx_offset);
-
-			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
-		} else {
-			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
-			if (addr_offset < 0)
-				goto out_err;
-			eff_addr = regs_get_register(regs, addr_offset);
-		}
-		eff_addr += insn->displacement.value;
-	}
-	linear_addr = (unsigned long)eff_addr;
-
-	return (void __user *)linear_addr;
-out_err:
-	return (void __user *)-1;
-}
-
 static int mpx_insn_decode(struct insn *insn,
 			   struct pt_regs *regs)
 {
@@ -322,7 +173,7 @@ siginfo_t *mpx_generate_siginfo(struct pt_regs *regs)
 	info->si_signo = SIGSEGV;
 	info->si_errno = 0;
 	info->si_code = SEGV_BNDERR;
-	info->si_addr = mpx_get_addr_ref(&insn, regs);
+	info->si_addr = insn_get_addr_ref(&insn, regs);
 	/*
 	 * We were not able to extract an address from the instruction,
 	 * probably because there was something invalid in it.
-- 
2.9.3

^ permalink raw reply	[flat|nested] 112+ messages in thread

* [v6 PATCH 05/21] x86/insn-eval: Add utility functions to get register offsets
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (3 preceding siblings ...)
  2017-03-08  0:32 ` [v6 PATCH 04/21] x86/mpx, x86/insn: Relocate insn util functions to a new insn-kernel Ricardo Neri
@ 2017-03-08  0:32 ` Ricardo Neri
  2017-04-12 16:28   ` Borislav Petkov
  2017-03-08  0:32 ` [v6 PATCH 06/21] x86/insn-eval: Add utility functions to get segment selector Ricardo Neri
                   ` (17 subsequent siblings)
  22 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-03-08  0:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

The function get_reg_offset() takes as argument an enumeration that
indicates the type of offset that is returned: the R/M part of the ModRM
byte, the index of the SIB byte or the base of the SIB byte. Callers of
this function would need the definition of such enumeration. This is not
needed. Instead, helper functions can be added for this purpose. These
functions are useful in cases when, for instance, the caller needs to
decide whether the operand is a register or a memory location by looking
at the mod part of the ModRM byte.
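
The wrapper pattern described above can be sketched in user space as
follows. This is only an illustration under stated assumptions: struct
demo_insn and all demo_* names are hypothetical, the functions return plain
register numbers instead of pt_regs offsets, and the -EDOM special cases of
get_reg_offset() are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Private to the implementation file; callers never see it. */
enum demo_reg_type { DEMO_REG_RM, DEMO_REG_INDEX, DEMO_REG_BASE };

/* Hypothetical, trimmed-down stand-in for the decoded instruction. */
struct demo_insn {
	uint8_t modrm;	/* mod in bits 7:6, reg in bits 5:3, r/m in bits 2:0 */
	uint8_t sib;	/* scale in bits 7:6, index in bits 5:3, base in bits 2:0 */
	bool rex_b;	/* REX.B extends r/m and SIB base */
	bool rex_x;	/* REX.X extends SIB index */
};

static int demo_get_regno(const struct demo_insn *insn, enum demo_reg_type type)
{
	switch (type) {
	case DEMO_REG_RM:
		return (insn->modrm & 0x7) + (insn->rex_b ? 8 : 0);
	case DEMO_REG_INDEX:
		return ((insn->sib >> 3) & 0x7) + (insn->rex_x ? 8 : 0);
	case DEMO_REG_BASE:
		return (insn->sib & 0x7) + (insn->rex_b ? 8 : 0);
	}
	assert(!"invalid register type");
	return -1;
}

/* Public wrappers: one per field, no enumeration in the interface. */
static int demo_regno_modrm_rm(const struct demo_insn *insn)
{
	return demo_get_regno(insn, DEMO_REG_RM);
}

static int demo_regno_sib_base(const struct demo_insn *insn)
{
	return demo_get_regno(insn, DEMO_REG_BASE);
}

static int demo_regno_sib_index(const struct demo_insn *insn)
{
	return demo_get_regno(insn, DEMO_REG_INDEX);
}

/* mod == 3 selects a register operand; any other value selects memory. */
static bool demo_operand_is_register(const struct demo_insn *insn)
{
	return ((insn->modrm >> 6) & 0x3) == 3;
}
```

A caller would first test demo_operand_is_register() and only then decide
which of the three wrappers is meaningful for the instruction at hand.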

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/insn-eval.h |  3 +++
 arch/x86/lib/insn-eval.c         | 51 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 5cab1b1..754211b 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -12,5 +12,8 @@
 #include <asm/ptrace.h>
 
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
+int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
+int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
+int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs);
 
 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 23cf010..78df1c9 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -98,6 +98,57 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 	return regoff[regno];
 }
 
+/**
+ * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
+ * @insn:	Instruction structure containing the ModRM byte
+ * @regs:	Set of registers indicated by the ModRM byte
+ *
+ * Obtain the register indicated by the r/m part of the ModRM byte. The
+ * register is obtained as an offset from the base of pt_regs. In specific
+ * cases, the returned value can be -EDOM to indicate that the particular value
+ * of ModRM does not refer to a register.
+ *
+ * Return: Register indicated by r/m, as an offset within struct pt_regs
+ */
+int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs)
+{
+	return get_reg_offset(insn, regs, REG_TYPE_RM);
+}
+
+/**
+ * insn_get_reg_offset_sib_base - Obtain register in base part of SiB byte
+ * @insn:	Instruction structure containing the SiB byte
+ * @regs:	Set of registers indicated by the SiB byte
+ *
+ * Obtain the register indicated by the base part of the SiB byte. The
+ * register is obtained as an offset from the base of pt_regs. In specific
+ * cases, the returned value can be -EDOM to indicate that the particular value
+ * of SiB does not refer to a register.
+ *
+ * Return: Register indicated by SiB's base, as an offset within struct pt_regs
+ */
+int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs)
+{
+	return get_reg_offset(insn, regs, REG_TYPE_BASE);
+}
+
+/**
+ * insn_get_reg_offset_sib_index - Obtain register in index part of SiB byte
+ * @insn:	Instruction structure containing the SiB byte
+ * @regs:	Set of registers indicated by the SiB byte
+ *
+ * Obtain the register indicated by the index part of the SiB byte. The
+ * register is obtained as an offset from the base of pt_regs. In specific
+ * cases, the returned value can be -EDOM to indicate that the particular value
+ * of SiB does not refer to a register.
+ *
+ * Return: Register indicated by SiB's index, as an offset within struct pt_regs
+ */
+int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs)
+{
+	return get_reg_offset(insn, regs, REG_TYPE_INDEX);
+}
+
 /*
  * return the address being referenced be instruction
  * for rm=3 returning the content of the rm reg
-- 
2.9.3

^ permalink raw reply	[flat|nested] 112+ messages in thread

* [v6 PATCH 06/21] x86/insn-eval: Add utility functions to get segment selector
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (4 preceding siblings ...)
  2017-03-08  0:32 ` [v6 PATCH 05/21] x86/insn-eval: Add utility functions to get register offsets Ricardo Neri
@ 2017-03-08  0:32 ` Ricardo Neri
  2017-04-18  9:42   ` Borislav Petkov
  2017-03-08  0:32 ` [v6 PATCH 07/21] x86/insn-eval: Add utility function to get segment descriptor Ricardo Neri
                   ` (16 subsequent siblings)
  22 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-03-08  0:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

When computing a linear address and segmentation is used, we need to know
the base address of the segment involved in the computation. In most
cases, the segment base address will be zero, as in USER_DS/USER32_DS.
However, a user space program may define its own segments via a local
descriptor table. In such a case, the segment base address may not be
zero. Thus, the segment base address is needed to correctly calculate the
linear address.

The segment selector to be used when computing a linear address is
determined either by a segment override prefix in the instruction or
inferred from the registers involved in the computation of the effective
address, in that order. Also, there are cases when the overrides shall be
ignored.
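
The resolution order described above can be sketched as a small user-space
routine. This is a hedged illustration, not the kernel implementation: the
demo_* names are hypothetical, the base register is given as a hardware
register number (4 = SP, 5 = BP) rather than a pt_regs offset, and the
sketch assumes that prefixes which are not segment overrides (for example
the operand-size prefix 0x66) are simply skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Enumerator values are the x86 segment override prefix bytes. */
enum demo_segment {
	DEMO_SEG_NONE = 0,
	DEMO_SEG_ES = 0x26,
	DEMO_SEG_CS = 0x2e,
	DEMO_SEG_SS = 0x36,
	DEMO_SEG_DS = 0x3e,
	DEMO_SEG_FS = 0x64,
	DEMO_SEG_GS = 0x65,
};

/* Return the first segment override among the prefix bytes, if any. */
static enum demo_segment demo_find_override(const uint8_t *prefixes, size_t n)
{
	size_t i;

	assert(prefixes || n == 0);

	for (i = 0; i < n; i++) {
		switch (prefixes[i]) {
		case DEMO_SEG_ES:
		case DEMO_SEG_CS:
		case DEMO_SEG_SS:
		case DEMO_SEG_DS:
		case DEMO_SEG_FS:
		case DEMO_SEG_GS:
			return (enum demo_segment)prefixes[i];
		default:
			break;	/* not a segment override; keep scanning */
		}
	}
	return DEMO_SEG_NONE;
}

/*
 * Step 1: honor an override prefix if present.
 * Step 2: otherwise infer the segment from the base register: SS for
 * stack-relative references (SP or BP), DS for everything else.
 */
static enum demo_segment demo_resolve_segment(const uint8_t *prefixes,
					      size_t n, int base_regno)
{
	enum demo_segment seg = demo_find_override(prefixes, n);

	if (seg != DEMO_SEG_NONE)
		return seg;

	return (base_regno == 4 || base_regno == 5) ? DEMO_SEG_SS : DEMO_SEG_DS;
}
```

Cases in which overrides must be ignored (such as the ES destination of
string instructions) are not modeled here and would remain the caller's
responsibility.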

For clarity, this process can be split into two steps: resolving the
relevant segment and, once known, reading the applicable segment selector.
The method to obtain the segment selector depends on several factors. In
32-bit builds, segment selectors are saved into the pt_regs structure
when switching to kernel mode. The same is also true for virtual-8086
mode. In 64-bit builds, segmentation is mostly ignored, except when
running a program in 32-bit legacy mode. In this case, CS and SS can be
obtained from pt_regs. DS, ES, FS and GS can be read directly from
registers. Lastly, segmentation is possible in 64-bit mode via FS and GS.
In these two cases, base addresses are obtained from the relevant MSRs.

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/lib/insn-eval.c | 195 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 195 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 78df1c9..8d45df8 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -8,6 +8,7 @@
 #include <asm/inat.h>
 #include <asm/insn.h>
 #include <asm/insn-eval.h>
+#include <asm/vm86.h>
 
 enum reg_type {
 	REG_TYPE_RM = 0,
@@ -15,6 +16,200 @@ enum reg_type {
 	REG_TYPE_BASE,
 };
 
+enum segment {
+	SEG_CS = 0x2e,
+	SEG_SS = 0x36,
+	SEG_DS = 0x3e,
+	SEG_ES = 0x26,
+	SEG_FS = 0x64,
+	SEG_GS = 0x65
+};
+
+/**
+ * resolve_seg_selector() - obtain segment selector
+ * @insn:	Instruction structure with selector override prefixes
+ * @regoff:	Operand offset, in pt_regs, for which the selector is needed
+ * @get_default:	Resolve default segment selector (i.e., ignore
+ *		overrides)
+ *
+ * The segment selector to which an effective address refers depends on
+ * a) segment selector overrides instruction prefixes or b) the operand
+ * register indicated in the ModRM or SiB byte.
+ *
+ * For case a), the function inspects any prefixes in the insn instruction;
+ * insn can be null to indicate that selector override prefixes shall be
+ * ignored. This is useful when the use of prefixes is forbidden (e.g.,
+ * obtaining the code selector). For case b), the operand register shall be
+ * represented as the offset from the base address of pt_regs. Also, regoff
+ * can be -EDOM for cases in which registers are not used as operands (e.g.,
+ * when the mod and r/m parts of the ModRM byte are 0 and 5, respectively).
+ *
+ * This function returns the segment selector to utilize as per the conditions
+ * described above. Please note that this function does not return the value
+ * of the segment selector. The value of the segment selector needs to be
+ * obtained using get_segment_selector and passing the segment selector type
+ * resolved by this function.
+ *
+ * Return: Segment selector to use, among CS, SS, DS, ES, FS or GS.
+ */
+static int resolve_seg_selector(struct insn *insn, int regoff, bool get_default)
+{
+	int i;
+
+	if (!insn)
+		return -EINVAL;
+
+	if (get_default)
+		goto default_seg;
+	/*
+	 * Check first if we have selector overrides. Having more than
+	 * one selector override leads to undefined behavior. We
+	 * only use the first one and return
+	 */
+	for (i = 0; i < insn->prefixes.nbytes; i++) {
+		switch (insn->prefixes.bytes[i]) {
+		case SEG_CS:
+			return SEG_CS;
+		case SEG_SS:
+			return SEG_SS;
+		case SEG_DS:
+			return SEG_DS;
+		case SEG_ES:
+			return SEG_ES;
+		case SEG_FS:
+			return SEG_FS;
+		case SEG_GS:
+			return SEG_GS;
+		default:
+			break;	/* not a segment override; keep scanning */
+		}
+	}
+
+default_seg:
+	/*
+	 * If no overrides, use default selectors as described in the
+	 * Intel documentation: SS for ESP or EBP. DS for all data references,
+	 * except when relative to stack or string destination.
+	 * Also, AX, CX and DX are not valid register operands in 16-bit
+	 * address encodings.
+	 * Callers must interpret the result correctly according to the type
+	 * of instructions (e.g., use ES for string instructions).
+	 * Also, some values of modrm and sib might seem to indicate the use
+	 * of EBP and ESP (e.g., modrm_mod = 0, modrm_rm = 5) but actually
+	 * they refer to cases in which only a displacement is used. These
+	 * cases should be identified by the caller and not by this function.
+	 */
+	switch (regoff) {
+	case offsetof(struct pt_regs, ax):
+		/* fall through */
+	case offsetof(struct pt_regs, cx):
+		/* fall through */
+	case offsetof(struct pt_regs, dx):
+		if (insn && insn->addr_bytes == 2)
+			return -EINVAL;
+	case -EDOM: /* no register involved in address computation */
+	case offsetof(struct pt_regs, bx):
+		/* fall through */
+	case offsetof(struct pt_regs, di):
+		/* fall through */
+	case offsetof(struct pt_regs, si):
+		return SEG_DS;
+	case offsetof(struct pt_regs, bp):
+		/* fall through */
+	case offsetof(struct pt_regs, sp):
+		return SEG_SS;
+	case offsetof(struct pt_regs, ip):
+		return SEG_CS;
+	default:
+		return -EINVAL;
+	}
+}
+
+/**
+ * get_segment_selector() - obtain segment selector
+ * @regs:	Set of registers containing the segment selector
+ * @seg_type:	Type of segment selector to obtain
+ *
+ * Obtain the segment selector for any of CS, SS, DS, ES, FS, GS. In
+ * CONFIG_X86_32, the segment is obtained from either pt_regs or
+ * kernel_vm86_regs as applicable. In CONFIG_X86_64, CS and SS are
+ * obtained from pt_regs. DS, ES, FS and GS are obtained by reading
+ * the corresponding segment registers directly with
+ * savesegment().
+ *
+ * Return: Value of the segment selector
+ */
+static unsigned short get_segment_selector(struct pt_regs *regs,
+					   enum segment seg_type)
+{
+#ifdef CONFIG_X86_64
+	unsigned short seg_sel;
+
+	switch (seg_type) {
+	case SEG_CS:
+		return (unsigned short)(regs->cs & 0xffff);
+	case SEG_SS:
+		return (unsigned short)(regs->ss & 0xffff);
+	case SEG_DS:
+		savesegment(ds, seg_sel);
+		return seg_sel;
+	case SEG_ES:
+		savesegment(es, seg_sel);
+		return seg_sel;
+	case SEG_FS:
+		savesegment(fs, seg_sel);
+		return seg_sel;
+	case SEG_GS:
+		savesegment(gs, seg_sel);
+		return seg_sel;
+	default:
+		return -1;
+	}
+#else /* CONFIG_X86_32 */
+	struct kernel_vm86_regs *vm86regs = (struct kernel_vm86_regs *)regs;
+
+	if (v8086_mode(regs)) {
+		switch (seg_type) {
+		case SEG_CS:
+			return (unsigned short)(regs->cs & 0xffff);
+		case SEG_SS:
+			return (unsigned short)(regs->ss & 0xffff);
+		case SEG_DS:
+			return vm86regs->ds;
+		case SEG_ES:
+			return vm86regs->es;
+		case SEG_FS:
+			return vm86regs->fs;
+		case SEG_GS:
+			return vm86regs->gs;
+		default:
+			return -1;
+		}
+	}
+
+	switch (seg_type) {
+	case SEG_CS:
+		return (unsigned short)(regs->cs & 0xffff);
+	case SEG_SS:
+		return (unsigned short)(regs->ss & 0xffff);
+	case SEG_DS:
+		return (unsigned short)(regs->ds & 0xffff);
+	case SEG_ES:
+		return (unsigned short)(regs->es & 0xffff);
+	case SEG_FS:
+		return (unsigned short)(regs->fs & 0xffff);
+	case SEG_GS:
+		/*
+		 * GS may or may not be in regs as per CONFIG_X86_32_LAZY_GS.
+		 * The macro below takes care of both cases.
+		 */
+		return get_user_gs(regs);
+	default:
+		return -1;
+	}
+#endif /* CONFIG_X86_64 */
+}
+
 static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 			  enum reg_type type)
 {
-- 
2.9.3

^ permalink raw reply	[flat|nested] 112+ messages in thread

* [v6 PATCH 07/21] x86/insn-eval: Add utility function to get segment descriptor
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (5 preceding siblings ...)
  2017-03-08  0:32 ` [v6 PATCH 06/21] x86/insn-eval: Add utility functions to get segment selector Ricardo Neri
@ 2017-03-08  0:32 ` Ricardo Neri
  2017-04-19 10:26   ` Borislav Petkov
  2017-03-08  0:32 ` [v6 PATCH 08/21] x86/insn-eval: Add utility function to get segment descriptor base address Ricardo Neri
                   ` (15 subsequent siblings)
  22 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-03-08  0:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

The segment descriptor contains information that is relevant to how linear
addresses need to be computed. It contains the default size of addresses as
well as the base address of the segment. Thus, given a segment selector,
we ought to look at the segment descriptor to correctly calculate the
linear address.

In protected mode, the segment selector might indicate a segment
descriptor from either the global descriptor table or a local descriptor
table. Both cases are considered in this function.

This function is the initial implementation for subsequent functions that
will obtain the aforementioned attributes of the segment descriptor.
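
For readers unfamiliar with the selector layout, the lookup can be sketched
in user space as below. This is an illustration only, with hypothetical
demo_* names: bits 15:3 of a selector hold the table index, bit 2 (TI)
chooses between the GDT (0) and the LDT (1), and bits 1:0 hold the requested
privilege level. Each descriptor is 8 bytes wide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_RPL_MASK	0x3u	/* bits 1:0, requested privilege level */
#define DEMO_TI_MASK	0x4u	/* bit 2, table indicator: 0 = GDT, 1 = LDT */
#define DEMO_DESC_SIZE	8u	/* size of one descriptor in bytes */

static unsigned int demo_sel_index(uint16_t sel)
{
	return sel >> 3;
}

static bool demo_sel_is_ldt(uint16_t sel)
{
	return (sel & DEMO_TI_MASK) != 0;
}

static unsigned int demo_sel_rpl(uint16_t sel)
{
	return sel & DEMO_RPL_MASK;
}

/*
 * Byte offset of the descriptor within its table. Because the index
 * already starts at bit 3, masking off TI and RPL yields index * 8
 * without an explicit multiplication.
 */
static unsigned int demo_sel_table_offset(uint16_t sel)
{
	unsigned int offset = sel & ~(DEMO_RPL_MASK | DEMO_TI_MASK);

	assert(offset == demo_sel_index(sel) * DEMO_DESC_SIZE);
	return offset;
}

/* table_limit is the offset of the last valid byte (table size - 1). */
static bool demo_sel_in_table(uint16_t sel, unsigned int table_limit)
{
	return demo_sel_table_offset(sel) + DEMO_DESC_SIZE - 1 <= table_limit;
}
```

The bounds check in this sketch requires the whole 8-byte descriptor to fit
within the table limit, which is one possible choice and an assumption of
the example.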

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/lib/insn-eval.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 8d45df8..8608adf 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -5,9 +5,13 @@
  */
 #include <linux/kernel.h>
 #include <linux/string.h>
+#include <asm/desc_defs.h>
+#include <asm/desc.h>
 #include <asm/inat.h>
 #include <asm/insn.h>
 #include <asm/insn-eval.h>
+#include <asm/ldt.h>
+#include <linux/mmu_context.h>
 #include <asm/vm86.h>
 
 enum reg_type {
@@ -294,6 +298,63 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 }
 
 /**
+ * get_desc() - Obtain address of segment descriptor
+ * @seg:	Segment selector
+ * @desc:	Pointer to the selected segment descriptor
+ *
+ * Given a segment selector, obtain a memory pointer to the segment
+ * descriptor. Both global and local descriptor tables are supported.
+ * desc will contain the address of the descriptor.
+ *
+ * Return: 0 if success, -EINVAL if failure
+ */
+static int get_desc(unsigned short seg, struct desc_struct **desc)
+{
+	struct desc_ptr gdt_desc = {0, 0};
+	unsigned long desc_base;
+
+	if (!desc)
+		return -EINVAL;
+
+	desc_base = seg & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK);
+
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+	if ((seg & SEGMENT_TI_MASK) == SEGMENT_LDT) {
+		seg >>= 3;
+
+		mutex_lock(&current->active_mm->context.lock);
+		if (unlikely(!current->active_mm->context.ldt ||
+			     seg >= current->active_mm->context.ldt->size)) {
+			*desc = NULL;
+			mutex_unlock(&current->active_mm->context.lock);
+			return -EINVAL;
+		}
+
+		*desc = &current->active_mm->context.ldt->entries[seg];
+		mutex_unlock(&current->active_mm->context.lock);
+		return 0;
+	}
+#endif
+	native_store_gdt(&gdt_desc);
+
+	/*
+	 * Bits [15:3] of the segment selector contain the index. Such
+	 * index needs to be multiplied by 8. However, as the index
+	 * least significant bit is already in bit 3, we don't have
+	 * to perform the multiplication.
+	 */
+	desc_base = seg & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK);
+
+	if (desc_base > gdt_desc.size) {
+		*desc = NULL;
+		return -EINVAL;
+	}
+
+	*desc = (struct desc_struct *)(gdt_desc.address + desc_base);
+	return 0;
+}
+
+/**
  * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
  * @insn:	Instruction structure containing the ModRM byte
  * @regs:	Set of registers indicated by the ModRM byte
-- 
2.9.3

^ permalink raw reply	[flat|nested] 112+ messages in thread

* [v6 PATCH 08/21] x86/insn-eval: Add utility function to get segment descriptor base address
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (6 preceding siblings ...)
  2017-03-08  0:32 ` [v6 PATCH 07/21] x86/insn-eval: Add utility function to get segment descriptor Ricardo Neri
@ 2017-03-08  0:32 ` Ricardo Neri
  2017-04-20  8:25   ` Borislav Petkov
  2017-03-08  0:32 ` [v6 PATCH 09/21] x86/insn-eval: Add functions to get default operand and address sizes Ricardo Neri
                   ` (14 subsequent siblings)
  22 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-03-08  0:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

With segmentation, the base address of the segment descriptor is needed
to compute a linear address. The segment descriptor used in the address
computation depends on either any segment override prefixes in the
instruction or the default segment determined by the registers involved
in the address computation. Thus, both the instruction as well as the
register (specified as the offset from the base of pt_regs) are given as
inputs, along with a boolean variable to select between override and
default.

The segment selector is determined by resolve_seg_selector and
get_segment_selector with the inputs described above. Once the selector
is known the base address is
determined. In protected mode, the selector is used to obtain the segment
descriptor and then its base address. If in 64-bit user mode, the segment
base address is zero except when FS or GS are used. In virtual-8086 mode,
the base address is computed as the value of the segment selector shifted 4
positions to the left.
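
The two non-trivial cases above can be sketched in plain user-space C. This is only an illustration under stated assumptions: struct demo_desc and the demo_* helpers are hypothetical names, not part of this patch, and the layout shown is the classic 8-byte legacy descriptor with the base split across three fields.

```c
#include <stdint.h>

/* Hypothetical, simplified 8-byte legacy segment descriptor layout. */
struct demo_desc {
	uint16_t limit0;	/* limit bits 15:0 */
	uint16_t base0;		/* base bits 15:0 */
	uint8_t  base1;		/* base bits 23:16 */
	uint8_t  type_attr;	/* type, S, DPL, P */
	uint8_t  limit1_flags;	/* limit bits 19:16, AVL, L, D, G */
	uint8_t  base2;		/* base bits 31:24 */
};

/* Virtual-8086 mode: the base is the selector shifted left by 4. */
static unsigned long demo_v8086_base(uint16_t sel)
{
	return (unsigned long)sel << 4;
}

/* Protected mode: reassemble the base from its three descriptor fields. */
static unsigned long demo_desc_base(const struct demo_desc *d)
{
	return (unsigned long)d->base0 |
	       ((unsigned long)d->base1 << 16) |
	       ((unsigned long)d->base2 << 24);
}
```

For example, a virtual-8086 selector of 0x1234 yields a base of 0x12340. The 64-bit user mode case is not sketched here since it depends on reading MSRs.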

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/insn-eval.h |  2 ++
 arch/x86/lib/insn-eval.c         | 66 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 754211b..b201742 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -15,5 +15,7 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
+unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
+				int regoff, bool use_default_seg);
 
 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 8608adf..383ca83 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -355,6 +355,72 @@ static int get_desc(unsigned short seg, struct desc_struct **desc)
 }
 
 /**
+ * insn_get_seg_base() - Obtain base address contained in descriptor
+ * @regs:	Set of registers containing the segment selector
+ * @insn:	Instruction structure with selector override prefixes
+ * @regoff:	Operand offset, in pt_regs, of which the selector is needed
+ * @use_default_seg: Use the default segment instead of prefix overrides
+ *
+ * Obtain the base address of the segment descriptor as indicated by either
+ * any segment override prefixes contained in insn or the default segment
+ * applicable to the register indicated by regoff. regoff is specified as the
+ * offset in bytes from the base of pt_regs.
+ *
+ * Return: In protected mode, base address of the segment. It may be zero in
+ * certain cases for 64-bit builds and/or 64-bit applications. In virtual-8086
+ * mode, the segment selector shifted 4 positions to the left. -1L in case of
+ * error.
+ */
+unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
+				int regoff, bool use_default_seg)
+{
+	struct desc_struct *desc;
+	unsigned short seg;
+	enum segment seg_type;
+	int ret;
+
+	seg_type = resolve_seg_selector(insn, regoff, use_default_seg);
+
+	seg = get_segment_selector(regs, seg_type);
+	if (seg < 0)
+		return -1L;
+
+	if (v8086_mode(regs))
+		/*
+		 * Base is simply the segment selector shifted 4
+		 * positions to the left.
+		 */
+		return (unsigned long)(seg << 4);
+
+#ifdef CONFIG_X86_64
+	if (user_64bit_mode(regs)) {
+		/*
+		 * Only FS or GS will have a base address, the rest of
+		 * the segments' bases are forced to 0.
+		 */
+		unsigned long base;
+
+		if (seg_type == SEG_FS)
+			rdmsrl(MSR_FS_BASE, base);
+		else if (seg_type == SEG_GS)
+			/*
+			 * swapgs was called at the kernel entry point. Thus,
+			 * MSR_KERNEL_GS_BASE will have the user-space GS base.
+			 */
+			rdmsrl(MSR_KERNEL_GS_BASE, base);
+		else
+			base = 0;
+		return base;
+	}
+#endif
+	ret = get_desc(seg, &desc);
+	if (ret)
+		return -1L;
+
+	return get_desc_base(desc);
+}
+
+/**
  * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
  * @insn:	Instruction structure containing the ModRM byte
  * @regs:	Set of registers indicated by the ModRM byte
-- 
2.9.3

^ permalink raw reply	[flat|nested] 112+ messages in thread

* [v6 PATCH 09/21] x86/insn-eval: Add functions to get default operand and address sizes
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (7 preceding siblings ...)
  2017-03-08  0:32 ` [v6 PATCH 08/21] x86/insn-eval: Add utility function to get segment descriptor base address Ricardo Neri
@ 2017-03-08  0:32 ` Ricardo Neri
  2017-04-20 13:06   ` Borislav Petkov
  2017-03-08  0:32 ` [v6 PATCH 10/21] x86/insn-eval: Do not use R/EBP as base if mod in ModRM is zero Ricardo Neri
                   ` (13 subsequent siblings)
  22 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-03-08  0:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

These functions read the default values of the address and operand sizes
as specified in the segment descriptor. This information is determined
from the D and L bits. Hence, it can be used for both IA-32e 64-bit and
32-bit legacy modes. For virtual-8086 mode, the default address and
operand sizes are always 2 bytes.

The D bit is only meaningful for code segments. Thus, these functions
always use the code segment selector contained in regs.
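
As a rough illustration of the mapping described above, the following user-space sketch takes the L and D bits directly as arguments instead of reading them from a descriptor. The demo_* names are hypothetical and only mirror the intent of the functions in this patch; a return value of 0 stands for the invalid L=1, D=1 combination.

```c
/* Default address size in bytes for a code segment with bits L and D. */
static unsigned char demo_default_address_bytes(unsigned int l, unsigned int d)
{
	switch (((l & 1) << 1) | (d & 1)) {
	case 0:			/* L=0, D=0: 16-bit addresses */
		return 2;
	case 1:			/* L=0, D=1: 32-bit addresses */
		return 4;
	case 2:			/* L=1, D=0: 64-bit addresses */
		return 8;
	default:		/* L=1, D=1: invalid setting */
		return 0;
	}
}

/* Default operand size in bytes for a code segment with bits L and D. */
static unsigned char demo_default_operand_bytes(unsigned int l, unsigned int d)
{
	switch (((l & 1) << 1) | (d & 1)) {
	case 0:			/* L=0, D=0: 16-bit operands */
		return 2;
	case 1:			/* L=0, D=1: 32-bit operands */
	case 2:			/* L=1, D=0: operands still default to 32 bits */
		return 4;
	default:		/* L=1, D=1: invalid setting */
		return 0;
	}
}
```

Note the asymmetry in 64-bit mode: the default address size is 8 bytes while the default operand size stays at 4 bytes.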

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/insn-eval.h |  2 +
 arch/x86/lib/insn-eval.c         | 80 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index b201742..a0d81fc 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -15,6 +15,8 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
+unsigned char insn_get_seg_default_address_bytes(struct pt_regs *regs);
+unsigned char insn_get_seg_default_operand_bytes(struct pt_regs *regs);
 unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
 				int regoff, bool use_default_seg);
 
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 383ca83..cda6c71 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -421,6 +421,86 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
 }
 
 /**
+ * insn_get_seg_default_address_bytes - Obtain default address size of segment
+ * @regs:	Set of registers containing the segment selector
+ *
+ * Obtain the default address size as indicated in the segment descriptor
+ * selected in regs' code segment selector. In protected mode, the default
+ * address size is determined by inspecting the L and D bits of the segment
+ * descriptor. In virtual-8086 mode, the default is always two bytes.
+ *
+ * Return: Default address size of segment
+ */
+unsigned char insn_get_seg_default_address_bytes(struct pt_regs *regs)
+{
+	struct desc_struct *desc;
+	unsigned short seg;
+	int ret;
+
+	if (v8086_mode(regs))
+		return 2;
+
+	seg = (unsigned short)regs->cs;
+
+	ret = get_desc(seg, &desc);
+	if (ret)
+		return 0;
+
+	switch ((desc->l << 1) | desc->d) {
+	case 0: /* Legacy mode. 16-bit addresses. CS.L=0, CS.D=0 */
+		return 2;
+	case 1: /* Legacy mode. 32-bit addresses. CS.L=0, CS.D=1 */
+		return 4;
+	case 2: /* IA-32e 64-bit mode. 64-bit addresses. CS.L=1, CS.D=0 */
+		return 8;
+	case 3: /* Invalid setting. CS.L=1, CS.D=1 */
+		/* fall through */
+	default:
+		return 0;
+	}
+}
+
+/**
+ * insn_get_seg_default_operand_bytes - Obtain default operand size of segment
+ * @regs:	Set of registers containing the segment selector
+ *
+ * Obtain the default operand size as indicated in the segment descriptor
+ * selected in regs' code segment selector. In protected mode, the default
+ * operand size is determined by inspecting the L and D bits of the segment
+ * descriptor. In virtual-8086 mode, the default is always two bytes.
+ *
+ * Return: Default operand size of segment
+ */
+unsigned char insn_get_seg_default_operand_bytes(struct pt_regs *regs)
+{
+	struct desc_struct *desc;
+	unsigned short seg;
+	int ret;
+
+	if (v8086_mode(regs))
+		return 2;
+
+	seg = (unsigned short)regs->cs;
+
+	ret = get_desc(seg, &desc);
+	if (ret)
+		return 0;
+
+	switch ((desc->l << 1) | desc->d) {
+	case 0: /* Legacy mode. 16-bit or 8-bit operands CS.L=0, CS.D=0 */
+		return 2;
+	case 1: /* Legacy mode. 32- or 8-bit operands CS.L=0, CS.D=1 */
+		/* fall through */
+	case 2: /* IA-32e 64-bit mode. 32- or 8-bit opnds. CS.L=1, CS.D=0 */
+		return 4;
+	case 3: /* Invalid setting. CS.L=1, CS.D=1 */
+		/* fall through */
+	default:
+		return 0;
+	}
+}
+
+/**
  * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
  * @insn:	Instruction structure containing the ModRM byte
  * @regs:	Set of registers indicated by the ModRM byte
-- 
2.9.3

^ permalink raw reply	[flat|nested] 112+ messages in thread

* [v6 PATCH 10/21] x86/insn-eval: Do not use R/EBP as base if mod in ModRM is zero
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (8 preceding siblings ...)
  2017-03-08  0:32 ` [v6 PATCH 09/21] x86/insn-eval: Add functions to get default operand and address sizes Ricardo Neri
@ 2017-03-08  0:32 ` Ricardo Neri
  2017-04-21 10:52   ` Borislav Petkov
  2017-03-08  0:32 ` [v6 PATCH 11/21] insn/eval: Incorporate segment base in address computation Ricardo Neri
                   ` (12 subsequent siblings)
  22 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-03-08  0:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

Section 2.2.1.3 of the Intel 64 and IA-32 Architectures Software
Developer's Manual volume 2A states that when the mod part of the ModRM
byte is zero and R/EBP is specified in the R/M part of such byte, the value
of the aforementioned register should not be used in the address
computation. Instead, a 32-bit displacement is expected. The instruction
decoder takes care of setting the displacement to the expected value.
Returning -EDOM signals callers that they should ignore the value of such
register when computing the address encoded in the instruction operands.

Also, callers should exercise care to correctly interpret this particular
case. In IA-32e 64-bit mode, the address is given by the displacement plus
the value of the RIP. In IA-32e compatibility mode, the value of EIP is
ignored. This correction is done for our insn_get_addr_ref.
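
The special case can be sketched in user-space C as below. The demo_* names are hypothetical and the displacement is assumed to have been decoded already. One assumption is made explicit in the parameter name: architecturally, a RIP-relative displacement is taken relative to the address of the next instruction, so the sketch expects the caller to pass that value.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MODRM_MOD(b)	(((b) >> 6) & 0x3)
#define DEMO_MODRM_RM(b)	((b) & 0x7)

/* True when ModRM encodes "disp32 only": mod == 0 and r/m == 5. */
static bool demo_modrm_is_disp32_only(uint8_t modrm)
{
	return DEMO_MODRM_MOD(modrm) == 0 && DEMO_MODRM_RM(modrm) == 5;
}

/*
 * Effective address for the disp32-only form. In 64-bit mode the
 * displacement is RIP-relative; otherwise the instruction pointer is
 * ignored and the displacement is used alone.
 */
static long demo_disp32_eff_addr(int32_t disp, long next_ip, bool long_mode)
{
	long base = long_mode ? next_ip : 0;

	return base + (long)disp;
}
```

With mod == 1 or mod == 2 the same r/m value of 5 does select R/EBP as the base, which is why only the mod == 0 case needs the -EDOM signal.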

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/lib/insn-eval.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index cda6c71..ea10b03 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -250,6 +250,14 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 	switch (type) {
 	case REG_TYPE_RM:
 		regno = X86_MODRM_RM(insn->modrm.value);
+		/* if mod=0, register R/EBP is not used in the address
+		 * computation. Instead, a 32-bit displacement is expected;
+		 * the instruction decoder takes care of reading such
+		 * displacement. This is true for both R/EBP and R13, as the
+		 * REX.B bit is not decoded.
+		 */
+		if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0)
+			return -EDOM;
 		if (X86_REX_B(insn->rex_prefix.value))
 			regno += 8;
 		break;
@@ -599,9 +607,22 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
 		} else {
 			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
-			if (addr_offset < 0)
+			/* -EDOM means that we must ignore the address_offset.
+			 * The only case in which we see this value is when
+			 * R/M points to R/EBP. In such a case, in 64-bit mode
+			 * the effective address is relative to the RIP.
+			 */
+			if (addr_offset == -EDOM) {
+				eff_addr = 0;
+#ifdef CONFIG_X86_64
+				if (user_64bit_mode(regs))
+					eff_addr = (long)regs->ip;
+#endif
+			} else if (addr_offset < 0) {
 				goto out_err;
-			eff_addr = regs_get_register(regs, addr_offset);
+			} else {
+				eff_addr = regs_get_register(regs, addr_offset);
+			}
 		}
 		eff_addr += insn->displacement.value;
 	}
-- 
2.9.3

^ permalink raw reply	[flat|nested] 112+ messages in thread

* [v6 PATCH 11/21] insn/eval: Incorporate segment base in address computation
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (9 preceding siblings ...)
  2017-03-08  0:32 ` [v6 PATCH 10/21] x86/insn-eval: Do not use R/EBP as base if mod in ModRM is zero Ricardo Neri
@ 2017-03-08  0:32 ` Ricardo Neri
  2017-04-21 14:55   ` Borislav Petkov
  2017-03-08  0:32 ` [v6 PATCH 12/21] x86/insn: Support both signed 32-bit and 64-bit effective addresses Ricardo Neri
                   ` (11 subsequent siblings)
  22 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-03-08  0:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

insn_get_addr_ref returns the effective address as defined by the
section 3.7.5.1 Vol 1 of the Intel 64 and IA-32 Architectures Software
Developer's Manual. In order to compute the linear address, we must add
to the effective address the segment base address as set in the segment
descriptor. Furthermore, the segment descriptor to use depends on the
register that is used as the base of the effective address. The effective
base address varies depending on whether the operand is a register or a
memory address and on whether a SiB byte is used.

In most cases, the segment base address will be 0 if the USER_DS/USER32_DS
segment is used or if segmentation is not used. However, the base address
is not necessarily zero if a user program defines its own segments. This
is possible by using a local descriptor table.

Since the effective address is a signed quantity, the unsigned segment
base address is saved in a separate variable and added to the final
effective address.

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/lib/insn-eval.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index ea10b03..edb360f 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -566,7 +566,7 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs)
  */
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 {
-	unsigned long linear_addr;
+	unsigned long linear_addr, seg_base_addr;
 	long eff_addr, base, indx;
 	int addr_offset, base_offset, indx_offset;
 	insn_byte_t sib;
@@ -580,6 +580,8 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 		if (addr_offset < 0)
 			goto out_err;
 		eff_addr = regs_get_register(regs, addr_offset);
+		seg_base_addr = insn_get_seg_base(regs, insn, addr_offset,
+						  false);
 	} else {
 		if (insn->sib.nbytes) {
 			/*
@@ -605,6 +607,8 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 				indx = regs_get_register(regs, indx_offset);
 
 			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
+			seg_base_addr = insn_get_seg_base(regs, insn,
+							  base_offset, false);
 		} else {
 			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
 			/* -EDOM means that we must ignore the address_offset.
@@ -623,10 +627,12 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 			} else {
 				eff_addr = regs_get_register(regs, addr_offset);
 			}
+			seg_base_addr = insn_get_seg_base(regs, insn,
+							  addr_offset, false);
 		}
 		eff_addr += insn->displacement.value;
 	}
-	linear_addr = (unsigned long)eff_addr;
+	linear_addr = (unsigned long)eff_addr + seg_base_addr;
 
 	return (void __user *)linear_addr;
 out_err:
-- 
2.9.3

^ permalink raw reply	[flat|nested] 112+ messages in thread

* [v6 PATCH 12/21] x86/insn: Support both signed 32-bit and 64-bit effective addresses
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (10 preceding siblings ...)
  2017-03-08  0:32 ` [v6 PATCH 11/21] insn/eval: Incorporate segment base in address computation Ricardo Neri
@ 2017-03-08  0:32 ` Ricardo Neri
  2017-04-25 13:51   ` Borislav Petkov
  2017-03-08  0:32 ` [v6 PATCH 13/21] x86/insn-eval: Add support to resolve 16-bit addressing encodings Ricardo Neri
                   ` (10 subsequent siblings)
  22 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-03-08  0:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

The 32-bit and 64-bit address encodings are identical. This means that we
can use the same function in both cases. In order to reuse the function for
32-bit address encodings, we must sign-extend our 32-bit signed operands to
64-bit signed variables (only for 64-bit builds). To decide on whether sign
extension is needed, we rely on the address size as given by the
instruction structure.

Lastly, before computing the linear address, we must truncate our signed
64-bit effective address if the address size is 32-bit.
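
The sign-extend, add, truncate sequence can be sketched in user-space C as follows. This is an illustration only: the demo_* names are hypothetical, fixed-width types replace long so the behavior does not depend on the build, and a single base register plus displacement stands in for the full ModRM/SiB computation.

```c
#include <stdint.h>

/* Interpret a register value as signed, honoring the address size. */
static int64_t demo_to_signed(uint64_t val, int addr_bytes)
{
	if (addr_bytes == 4)
		return (int64_t)(int32_t)(uint32_t)val;	/* sign-extend bit 31 */

	return (int64_t)val;
}

/* Linear address from one base register, a displacement and a segment base. */
static uint64_t demo_linear_addr(uint64_t base_reg, int32_t disp,
				 uint64_t seg_base, int addr_bytes)
{
	int64_t eff = demo_to_signed(base_reg, addr_bytes) + disp;

	/* A 32-bit effective address wraps around at 4 GiB. */
	if (addr_bytes == 4)
		eff &= 0xffffffff;

	return (uint64_t)eff + seg_base;
}
```

For instance, with a 32-bit address size, a base register of 0xfffffff0 is treated as -16, so adding a displacement of 0x20 gives an effective address of 0x10 rather than 0x100000010.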

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/lib/insn-eval.c | 44 ++++++++++++++++++++++++++++++++------------
 1 file changed, 32 insertions(+), 12 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index edb360f..a9a1704 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -559,6 +559,15 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs)
 	return get_reg_offset(insn, regs, REG_TYPE_INDEX);
 }
 
+static inline long __to_signed_long(unsigned long val, int long_bytes)
+{
+#ifdef CONFIG_X86_64
+	return long_bytes == 4 ? (long)((int)((val) & 0xffffffff)) : (long)val;
+#else
+	return (long)val;
+#endif
+}
+
 /*
  * return the address being referenced be instruction
  * for rm=3 returning the content of the rm reg
@@ -567,19 +576,21 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs)
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 {
 	unsigned long linear_addr, seg_base_addr;
-	long eff_addr, base, indx;
-	int addr_offset, base_offset, indx_offset;
+	long eff_addr, base, indx, tmp;
+	int addr_offset, base_offset, indx_offset, addr_bytes;
 	insn_byte_t sib;
 
 	insn_get_modrm(insn);
 	insn_get_sib(insn);
 	sib = insn->sib.value;
+	addr_bytes = insn->addr_bytes;
 
 	if (X86_MODRM_MOD(insn->modrm.value) == 3) {
 		addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
 		if (addr_offset < 0)
 			goto out_err;
-		eff_addr = regs_get_register(regs, addr_offset);
+		tmp = regs_get_register(regs, addr_offset);
+		eff_addr = __to_signed_long(tmp, addr_bytes);
 		seg_base_addr = insn_get_seg_base(regs, insn, addr_offset,
 						  false);
 	} else {
@@ -591,20 +602,24 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 			 * in the address computation.
 			 */
 			base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
-			if (unlikely(base_offset == -EDOM))
+			if (unlikely(base_offset == -EDOM)) {
 				base = 0;
-			else if (unlikely(base_offset < 0))
+			} else if (unlikely(base_offset < 0)) {
 				goto out_err;
-			else
-				base = regs_get_register(regs, base_offset);
+			} else {
+				tmp = regs_get_register(regs, base_offset);
+				base = __to_signed_long(tmp, addr_bytes);
+			}
 
 			indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
-			if (unlikely(indx_offset == -EDOM))
+			if (unlikely(indx_offset == -EDOM)) {
 				indx = 0;
-			else if (unlikely(indx_offset < 0))
+			} else if (unlikely(indx_offset < 0)) {
 				goto out_err;
-			else
-				indx = regs_get_register(regs, indx_offset);
+			} else {
+				tmp = regs_get_register(regs, indx_offset);
+				indx = __to_signed_long(tmp, addr_bytes);
+			}
 
 			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
 			seg_base_addr = insn_get_seg_base(regs, insn,
@@ -625,13 +640,18 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 			} else if (addr_offset < 0) {
 				goto out_err;
 			} else {
-				eff_addr = regs_get_register(regs, addr_offset);
+				tmp = regs_get_register(regs, addr_offset);
+				eff_addr = __to_signed_long(tmp, addr_bytes);
 			}
 			seg_base_addr = insn_get_seg_base(regs, insn,
 							  addr_offset, false);
 		}
 		eff_addr += insn->displacement.value;
 	}
+	/* truncate to 4 bytes for 32-bit effective addresses */
+	if (addr_bytes == 4)
+		eff_addr &= 0xffffffff;
+
 	linear_addr = (unsigned long)eff_addr + seg_base_addr;
 
 	return (void __user *)linear_addr;
-- 
2.9.3

^ permalink raw reply	[flat|nested] 112+ messages in thread

* [v6 PATCH 13/21] x86/insn-eval: Add support to resolve 16-bit addressing encodings
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (11 preceding siblings ...)
  2017-03-08  0:32 ` [v6 PATCH 12/21] x86/insn: Support both signed 32-bit and 64-bit effective addresses Ricardo Neri
@ 2017-03-08  0:32 ` Ricardo Neri
  2017-03-08  0:32 ` [v6 PATCH 14/21] x86/insn-eval: Add wrapper function for 16-bit and 32-bit address encodings Ricardo Neri
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-03-08  0:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

Tasks running in virtual-8086 mode or in protected mode with code
segment descriptors that specify 16-bit default address sizes via the
D bit will use 16-bit addressing form encodings as described in the Intel
64 and IA-32 Architecture Software Developer's Manual Volume 2A Section
2.1.5. 16-bit addressing encodings differ in several ways from the
32-bit/64-bit addressing form encodings: the r/m part of the ModRM byte
points to different registers and, in some cases, addresses can be
indicated by the addition of the value of two registers. Also, there is
no support for SiB bytes. Thus, a separate function is needed to parse
this form of addressing.

A couple of functions are introduced. get_reg_offset_16 obtains the
offset from the base of pt_regs of the registers indicated by the ModRM
byte of the address encoding. insn_get_addr_ref_16 computes the linear
address indicated by the instructions using the value of the registers
given by ModRM as well as the base address of the segment.
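
To make the 16-bit forms concrete, here is a small user-space sketch of the effective address computation. It is not the code in this patch: struct demo_regs16 and demo_eff_addr_16 are hypothetical, the displacement is assumed to be already decoded (0 when the encoding has none), and mod == 3 (register operand) is assumed to be handled elsewhere.

```c
#include <stdint.h>

/* Hypothetical subset of the registers usable in 16-bit addressing. */
struct demo_regs16 {
	uint16_t bx, bp, si, di;
};

/* 16-bit effective address for a memory operand (mod != 3). */
static uint16_t demo_eff_addr_16(const struct demo_regs16 *r, uint8_t modrm,
				 int16_t disp)
{
	unsigned int mod = (modrm >> 6) & 0x3;
	unsigned int rm = modrm & 0x7;
	uint16_t sum;

	switch (rm) {
	case 0: sum = r->bx + r->si; break;
	case 1: sum = r->bx + r->di; break;
	case 2: sum = r->bp + r->si; break;
	case 3: sum = r->bp + r->di; break;
	case 4: sum = r->si; break;
	case 5: sum = r->di; break;
	case 6: sum = (mod == 0) ? 0 : r->bp; break;	/* mod 0: disp16 only */
	default: sum = r->bx; break;			/* rm == 7 */
	}

	/* The result wraps around at 64 KiB. */
	return (uint16_t)(sum + (uint16_t)disp);
}
```

The linear address would then be the segment base plus this 16-bit value.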

Lastly, the original function insn_get_addr_ref is renamed as
insn_get_addr_ref_32_64. A new insn_get_addr_ref function decides what
type of address decoding must be done based on the number of address bytes
given by the instruction. Documentation for insn_get_addr_ref_32_64 is
also improved.

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/lib/insn-eval.c | 137 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 137 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index a9a1704..cb1076d 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -306,6 +306,73 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 }
 
 /**
+ * get_reg_offset_16 - Obtain offset of register indicated by instruction
+ * @insn:	Instruction structure containing ModRM and SiB bytes
+ * @regs:	Set of registers referred by the instruction
+ * @offs1:	Offset of the first operand register
+ * @offs2:	Offset of the second operand register, if applicable.
+ *
+ * Obtain the offset, in pt_regs, of the registers indicated by the ModRM byte
+ * within insn. This function is to be used with 16-bit address encodings. The
+ * offs1 and offs2 will be written with the offset of the two registers
+ * indicated by the instruction. In cases where any of the registers is not
+ * referenced by the instruction, the value will be set to -EDOM.
+ *
+ * Return: 0 on success, -EINVAL on failure.
+ */
+static int get_reg_offset_16(struct insn *insn, struct pt_regs *regs,
+			     int *offs1, int *offs2)
+{
+	/* 16-bit addressing can use one or two registers */
+	static const int regoff1[] = {
+		offsetof(struct pt_regs, bx),
+		offsetof(struct pt_regs, bx),
+		offsetof(struct pt_regs, bp),
+		offsetof(struct pt_regs, bp),
+		offsetof(struct pt_regs, si),
+		offsetof(struct pt_regs, di),
+		offsetof(struct pt_regs, bp),
+		offsetof(struct pt_regs, bx),
+	};
+
+	static const int regoff2[] = {
+		offsetof(struct pt_regs, si),
+		offsetof(struct pt_regs, di),
+		offsetof(struct pt_regs, si),
+		offsetof(struct pt_regs, di),
+		-EDOM,
+		-EDOM,
+		-EDOM,
+		-EDOM,
+	};
+
+	if (!offs1 || !offs2)
+		return -EINVAL;
+
+	/* operand is a register, use the generic function */
+	if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+		*offs1 = insn_get_reg_offset_modrm_rm(insn, regs);
+		*offs2 = -EDOM;
+		return 0;
+	}
+
+	*offs1 = regoff1[X86_MODRM_RM(insn->modrm.value)];
+	*offs2 = regoff2[X86_MODRM_RM(insn->modrm.value)];
+
+	/*
+	 * If no displacement is indicated in the mod part of the ModRM byte,
+	 * (mod part is 0) and the r/m part of the same byte is 6, no register
+	 * is used to calculate the operand address. An r/m part of 6 means
+	 * that the second register offset is already invalid.
+	 */
+	if ((X86_MODRM_MOD(insn->modrm.value) == 0) &&
+	    (X86_MODRM_RM(insn->modrm.value) == 6))
+		*offs1 = -EDOM;
+
+	return 0;
+}
+
+/**
  * get_desc() - Obtain address of segment descriptor
  * @seg:	Segment selector
  * @desc:	Pointer to the selected segment descriptor
@@ -559,6 +626,76 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs)
 	return get_reg_offset(insn, regs, REG_TYPE_INDEX);
 }
 
+/**
+ * insn_get_addr_ref_16 - Obtain the 16-bit address referred by instruction
+ * @insn:	Instruction structure containing ModRM byte and displacement
+ * @regs:	Set of registers referred by the instruction
+ *
+ * This function is to be used with 16-bit address encodings. Obtain the memory
+ * address referred by the instruction's ModRM bytes and displacement. Also, the
+ * segment used as base is determined by either any segment override prefixes in
+ * insn or the default segment of the registers involved in the address
+ * computation.
+ * the ModRM byte
+ *
+ * Return: linear address referenced by instruction and registers
+ */
+static void __user *insn_get_addr_ref_16(struct insn *insn,
+					 struct pt_regs *regs)
+{
+	unsigned long linear_addr, seg_base_addr;
+	short eff_addr, addr1 = 0, addr2 = 0;
+	int addr_offset1, addr_offset2;
+	int ret;
+
+	insn_get_modrm(insn);
+	insn_get_displacement(insn);
+
+	/*
+	 * If operand is a register, the layout is the same as in
+	 * 32-bit and 64-bit addressing.
+	 */
+	if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+		addr_offset1 = get_reg_offset(insn, regs, REG_TYPE_RM);
+		if (addr_offset1 < 0)
+			goto out_err;
+		eff_addr = regs_get_register(regs, addr_offset1);
+		seg_base_addr = insn_get_seg_base(regs, insn, addr_offset1,
+						  false);
+	} else {
+		ret = get_reg_offset_16(insn, regs, &addr_offset1,
+					&addr_offset2);
+		if (ret < 0)
+			goto out_err;
+		/*
+		 * Don't fail on invalid offset values. They might be invalid
+		 * because they cannot be used for this particular value of
+		 * the ModRM. Instead, use them in the computation only if
+		 * they contain a valid value.
+		 */
+		if (addr_offset1 != -EDOM)
+			addr1 = 0xffff & regs_get_register(regs, addr_offset1);
+		if (addr_offset2 != -EDOM)
+			addr2 = 0xffff & regs_get_register(regs, addr_offset2);
+		eff_addr = addr1 + addr2;
+		/*
+		 * The first register in the operand implies the SS or DS
+		 * segment selectors, the second register in the operand can
+		 * only imply DS. Thus, use the first register to obtain
+		 * the segment selector.
+		 */
+		seg_base_addr = insn_get_seg_base(regs, insn, addr_offset1,
+						  false);
+
+		eff_addr += (insn->displacement.value & 0xffff);
+	}
+	linear_addr = (unsigned short)eff_addr + seg_base_addr;
+
+	return (void __user *)linear_addr;
+out_err:
+	return (void __user *)-1;
+}
+
 static inline long __to_signed_long(unsigned long val, int long_bytes)
 {
 #ifdef CONFIG_X86_64
-- 
2.9.3


* [v6 PATCH 14/21] x86/insn-eval: Add wrapper function for 16-bit and 32-bit address encodings
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (12 preceding siblings ...)
  2017-03-08  0:32 ` [v6 PATCH 13/21] x86/insn-eval: Add support to resolve 16-bit addressing encodings Ricardo Neri
@ 2017-03-08  0:32 ` Ricardo Neri
  2017-03-08  0:32 ` [v6 PATCH 15/21] x86/mm: Relocate page fault error codes to traps.h Ricardo Neri
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-03-08  0:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Adam Buchbinder, Colin Ian King, Lorenzo Stoakes, Qiaowei Ren,
	Arnaldo Carvalho de Melo, Adrian Hunter, Kees Cook,
	Thomas Garnier, Dmitry Vyukov

Convert the function insn_get_addr_ref into a wrapper function that calls
the correct static address-decoding function depending on the size of the
address. This way, callers do not need to worry about calling the correct
function, and fewer functions need to be exposed.

To this end, the original 32/64-bit insn_get_addr_ref is renamed as
insn_get_addr_ref_32_64 to reflect the type of address encodings that it
handles.

Documentation is added to the new wrapper function and the documentation
for the 32/64-bit address decoding function is improved.

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/lib/insn-eval.c | 45 ++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 40 insertions(+), 5 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index cb1076d..e633588 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -705,12 +705,21 @@ static inline long __to_signed_long(unsigned long val, int long_bytes)
 #endif
 }
 
-/*
- * return the address being referenced be instruction
- * for rm=3 returning the content of the rm reg
- * for rm!=3 calculates the address using SIB and Disp
+/**
+ * insn_get_addr_ref_32_64 - Obtain a 32/64-bit address referred by instruction
+ * @insn:	Instruction struct with ModRM and SiB bytes and displacement
+ * @regs:	Set of registers referred by the instruction
+ *
+ * This function is to be used with 32-bit and 64-bit address encodings. Obtain
+ * the memory address referred by the instruction's ModRM bytes and
+ * displacement. Also, the segment used as base is determined by either any
+ * segment override prefixes in insn or the default segment of the registers
+ * involved in the linear address computation.
+ *
+ * Return: linear address referenced by instruction and registers
  */
-void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
+static void __user *insn_get_addr_ref_32_64(struct insn *insn,
+					    struct pt_regs *regs)
 {
 	unsigned long linear_addr, seg_base_addr;
 	long eff_addr, base, indx, tmp;
@@ -795,3 +804,29 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 out_err:
 	return (void __user *)-1;
 }
+
+/**
+ * insn_get_addr_ref - Obtain the linear address referred by instruction
+ * @insn:	Instruction structure containing ModRM byte and displacement
+ * @regs:	Set of registers referred by the instruction
+ *
+ * Obtain the memory address referred by the instruction's ModRM bytes and
+ * displacement. Also, the segment used as base is determined by either any
+ * segment override prefixes in insn or the default segment of the registers
+ * involved in the address computation.
+ *
+ * Return: linear address referenced by instruction and registers
+ */
+void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
+{
+	switch (insn->addr_bytes) {
+	case 2:
+		return insn_get_addr_ref_16(insn, regs);
+	case 4:
+		/* fall through */
+	case 8:
+		return insn_get_addr_ref_32_64(insn, regs);
+	default:
+		return (void __user *)-1;
+	}
+}
-- 
2.9.3


* [v6 PATCH 15/21] x86/mm: Relocate page fault error codes to traps.h
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (13 preceding siblings ...)
  2017-03-08  0:32 ` [v6 PATCH 14/21] x86/insn-eval: Add wrapper function for 16-bit and 32-bit address encodings Ricardo Neri
@ 2017-03-08  0:32 ` Ricardo Neri
  2017-03-08 16:08   ` Andy Lutomirski
  2017-03-08  0:32 ` [v6 PATCH 16/21] x86/cpufeature: Add User-Mode Instruction Prevention definitions Ricardo Neri
                   ` (7 subsequent siblings)
  22 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-03-08  0:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Kirill A. Shutemov, Josh Poimboeuf

Up to this point, only fault.c used the definitions of the page fault error
codes, so it made sense to keep them within that file. Other portions of
code might be interested in those definitions too. For instance, the
User-Mode Instruction Prevention emulation code will use them to emulate a
page fault when it is unable to copy the results of the emulated
instructions to user space.

While relocating the error code enumeration, the prefix X86_ is used to
make it consistent with the rest of the definitions in traps.h. Of course,
code using the enumeration had to be updated as well. No functional changes
were performed.
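
As an illustration of how the relocated bits combine, here is a minimal
user-space sketch. The enumeration mirrors the one this patch moves to
traps.h; pf_describe() is a hypothetical helper that exists nowhere in the
kernel. It is an assumption, not something stated in this patch, that the
emulation code would report a failed copy to user space as
X86_PF_USER | X86_PF_WRITE.

```c
#include <stdio.h>

/* Same bit layout as the enumeration relocated by this patch. */
enum x86_pf_error_code {
	X86_PF_PROT	= 1 << 0,
	X86_PF_WRITE	= 1 << 1,
	X86_PF_USER	= 1 << 2,
	X86_PF_RSVD	= 1 << 3,
	X86_PF_INSTR	= 1 << 4,
	X86_PF_PK	= 1 << 5,
};

/* Hypothetical helper: render an error code as a short description. */
int pf_describe(unsigned long error_code, char *buf, size_t len)
{
	return snprintf(buf, len, "%s %s %s%s%s%s",
			error_code & X86_PF_PROT ? "protection" : "not-present",
			error_code & X86_PF_WRITE ? "write" : "read",
			error_code & X86_PF_USER ? "user" : "kernel",
			error_code & X86_PF_RSVD ? " rsvd" : "",
			error_code & X86_PF_INSTR ? " ifetch" : "",
			error_code & X86_PF_PK ? " pkey" : "");
}
```

Bits 0 to 2 always carry meaning (each has a defined reading for both 0 and
1), while bits 3 to 5 only add information when set, which is why the sketch
prints them conditionally.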

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/traps.h | 18 +++++++++
 arch/x86/mm/fault.c          | 88 +++++++++++++++++---------------------------
 2 files changed, 52 insertions(+), 54 deletions(-)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 01fd0a7..4a2e585 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -148,4 +148,22 @@ enum {
 	X86_TRAP_IRET = 32,	/* 32, IRET Exception */
 };
 
+/*
+ * Page fault error code bits:
+ *
+ *   bit 0 ==	 0: no page found	1: protection fault
+ *   bit 1 ==	 0: read access		1: write access
+ *   bit 2 ==	 0: kernel-mode access	1: user-mode access
+ *   bit 3 ==				1: use of reserved bit detected
+ *   bit 4 ==				1: fault was an instruction fetch
+ *   bit 5 ==				1: protection keys block access
+ */
+enum x86_pf_error_code {
+	X86_PF_PROT	=		1 << 0,
+	X86_PF_WRITE	=		1 << 1,
+	X86_PF_USER	=		1 << 2,
+	X86_PF_RSVD	=		1 << 3,
+	X86_PF_INSTR	=		1 << 4,
+	X86_PF_PK	=		1 << 5,
+};
 #endif /* _ASM_X86_TRAPS_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 428e3176..e859a9c 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -29,26 +29,6 @@
 #include <asm/trace/exceptions.h>
 
 /*
- * Page fault error code bits:
- *
- *   bit 0 ==	 0: no page found	1: protection fault
- *   bit 1 ==	 0: read access		1: write access
- *   bit 2 ==	 0: kernel-mode access	1: user-mode access
- *   bit 3 ==				1: use of reserved bit detected
- *   bit 4 ==				1: fault was an instruction fetch
- *   bit 5 ==				1: protection keys block access
- */
-enum x86_pf_error_code {
-
-	PF_PROT		=		1 << 0,
-	PF_WRITE	=		1 << 1,
-	PF_USER		=		1 << 2,
-	PF_RSVD		=		1 << 3,
-	PF_INSTR	=		1 << 4,
-	PF_PK		=		1 << 5,
-};
-
-/*
  * Returns 0 if mmiotrace is disabled, or if the fault is not
  * handled by mmiotrace:
  */
@@ -149,7 +129,7 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr)
 	 * If it was a exec (instruction fetch) fault on NX page, then
 	 * do not ignore the fault:
 	 */
-	if (error_code & PF_INSTR)
+	if (error_code & X86_PF_INSTR)
 		return 0;
 
 	instr = (void *)convert_ip_to_linear(current, regs);
@@ -179,7 +159,7 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr)
  * siginfo so userspace can discover which protection key was set
  * on the PTE.
  *
- * If we get here, we know that the hardware signaled a PF_PK
+ * If we get here, we know that the hardware signaled a X86_PF_PK
  * fault and that there was a VMA once we got in the fault
  * handler.  It does *not* guarantee that the VMA we find here
  * was the one that we faulted on.
@@ -205,7 +185,7 @@ static void fill_sig_info_pkey(int si_code, siginfo_t *info,
 	/*
 	 * force_sig_info_fault() is called from a number of
 	 * contexts, some of which have a VMA and some of which
-	 * do not.  The PF_PK handing happens after we have a
+	 * do not.  The X86_PF_PK handing happens after we have a
 	 * valid VMA, so we should never reach this without a
 	 * valid VMA.
 	 */
@@ -655,7 +635,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code,
 	if (!oops_may_print())
 		return;
 
-	if (error_code & PF_INSTR) {
+	if (error_code & X86_PF_INSTR) {
 		unsigned int level;
 		pgd_t *pgd;
 		pte_t *pte;
@@ -739,7 +719,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 		 */
 		if (current->thread.sig_on_uaccess_err && signal) {
 			tsk->thread.trap_nr = X86_TRAP_PF;
-			tsk->thread.error_code = error_code | PF_USER;
+			tsk->thread.error_code = error_code | X86_PF_USER;
 			tsk->thread.cr2 = address;
 
 			/* XXX: hwpoison faults will set the wrong code. */
@@ -859,7 +839,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 	struct task_struct *tsk = current;
 
 	/* User mode accesses just cause a SIGSEGV */
-	if (error_code & PF_USER) {
+	if (error_code & X86_PF_USER) {
 		/*
 		 * It's possible to have interrupts off here:
 		 */
@@ -880,7 +860,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 		 * Instruction fetch faults in the vsyscall page might need
 		 * emulation.
 		 */
-		if (unlikely((error_code & PF_INSTR) &&
+		if (unlikely((error_code & X86_PF_INSTR) &&
 			     ((address & ~0xfff) == VSYSCALL_ADDR))) {
 			if (emulate_vsyscall(regs, address))
 				return;
@@ -893,7 +873,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 		 * are always protection faults.
 		 */
 		if (address >= TASK_SIZE_MAX)
-			error_code |= PF_PROT;
+			error_code |= X86_PF_PROT;
 
 		if (likely(show_unhandled_signals))
 			show_signal_msg(regs, error_code, address, tsk);
@@ -949,11 +929,11 @@ static inline bool bad_area_access_from_pkeys(unsigned long error_code,
 
 	if (!boot_cpu_has(X86_FEATURE_OSPKE))
 		return false;
-	if (error_code & PF_PK)
+	if (error_code & X86_PF_PK)
 		return true;
 	/* this checks permission keys on the VMA: */
-	if (!arch_vma_access_permitted(vma, (error_code & PF_WRITE),
-				(error_code & PF_INSTR), foreign))
+	if (!arch_vma_access_permitted(vma, (error_code & X86_PF_WRITE),
+				       (error_code & X86_PF_INSTR), foreign))
 		return true;
 	return false;
 }
@@ -981,7 +961,7 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
 	int code = BUS_ADRERR;
 
 	/* Kernel mode? Handle exceptions or die: */
-	if (!(error_code & PF_USER)) {
+	if (!(error_code & X86_PF_USER)) {
 		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
 		return;
 	}
@@ -1010,14 +990,14 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 	       unsigned long address, struct vm_area_struct *vma,
 	       unsigned int fault)
 {
-	if (fatal_signal_pending(current) && !(error_code & PF_USER)) {
+	if (fatal_signal_pending(current) && !(error_code & X86_PF_USER)) {
 		no_context(regs, error_code, address, 0, 0);
 		return;
 	}
 
 	if (fault & VM_FAULT_OOM) {
 		/* Kernel mode? Handle exceptions or die: */
-		if (!(error_code & PF_USER)) {
+		if (!(error_code & X86_PF_USER)) {
 			no_context(regs, error_code, address,
 				   SIGSEGV, SEGV_MAPERR);
 			return;
@@ -1042,16 +1022,16 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 
 static int spurious_fault_check(unsigned long error_code, pte_t *pte)
 {
-	if ((error_code & PF_WRITE) && !pte_write(*pte))
+	if ((error_code & X86_PF_WRITE) && !pte_write(*pte))
 		return 0;
 
-	if ((error_code & PF_INSTR) && !pte_exec(*pte))
+	if ((error_code & X86_PF_INSTR) && !pte_exec(*pte))
 		return 0;
 	/*
 	 * Note: We do not do lazy flushing on protection key
-	 * changes, so no spurious fault will ever set PF_PK.
+	 * changes, so no spurious fault will ever set X86_PF_PK.
 	 */
-	if ((error_code & PF_PK))
+	if ((error_code & X86_PF_PK))
 		return 1;
 
 	return 1;
@@ -1096,8 +1076,8 @@ spurious_fault(unsigned long error_code, unsigned long address)
 	 * change, so user accesses are not expected to cause spurious
 	 * faults.
 	 */
-	if (error_code != (PF_WRITE | PF_PROT)
-	    && error_code != (PF_INSTR | PF_PROT))
+	if (error_code != (X86_PF_WRITE | X86_PF_PROT) &&
+	    error_code != (X86_PF_INSTR | X86_PF_PROT))
 		return 0;
 
 	pgd = init_mm.pgd + pgd_index(address);
@@ -1150,19 +1130,19 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
 	 * always an unconditional error and can never result in
 	 * a follow-up action to resolve the fault, like a COW.
 	 */
-	if (error_code & PF_PK)
+	if (error_code & X86_PF_PK)
 		return 1;
 
 	/*
 	 * Make sure to check the VMA so that we do not perform
-	 * faults just to hit a PF_PK as soon as we fill in a
+	 * faults just to hit a X86_PF_PK as soon as we fill in a
 	 * page.
 	 */
-	if (!arch_vma_access_permitted(vma, (error_code & PF_WRITE),
-				(error_code & PF_INSTR), foreign))
+	if (!arch_vma_access_permitted(vma, (error_code & X86_PF_WRITE),
+				       (error_code & X86_PF_INSTR), foreign))
 		return 1;
 
-	if (error_code & PF_WRITE) {
+	if (error_code & X86_PF_WRITE) {
 		/* write, present and write, not present: */
 		if (unlikely(!(vma->vm_flags & VM_WRITE)))
 			return 1;
@@ -1170,7 +1150,7 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
 	}
 
 	/* read, present: */
-	if (unlikely(error_code & PF_PROT))
+	if (unlikely(error_code & X86_PF_PROT))
 		return 1;
 
 	/* read, not present: */
@@ -1193,7 +1173,7 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs)
 	if (!static_cpu_has(X86_FEATURE_SMAP))
 		return false;
 
-	if (error_code & PF_USER)
+	if (error_code & X86_PF_USER)
 		return false;
 
 	if (!user_mode(regs) && (regs->flags & X86_EFLAGS_AC))
@@ -1249,7 +1229,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	 * protection error (error_code & 9) == 0.
 	 */
 	if (unlikely(fault_in_kernel_space(address))) {
-		if (!(error_code & (PF_RSVD | PF_USER | PF_PROT))) {
+		if (!(error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) {
 			if (vmalloc_fault(address) >= 0)
 				return;
 
@@ -1277,7 +1257,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	if (unlikely(kprobes_fault(regs)))
 		return;
 
-	if (unlikely(error_code & PF_RSVD))
+	if (unlikely(error_code & X86_PF_RSVD))
 		pgtable_bad(regs, error_code, address);
 
 	if (unlikely(smap_violation(error_code, regs))) {
@@ -1303,7 +1283,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	 */
 	if (user_mode(regs)) {
 		local_irq_enable();
-		error_code |= PF_USER;
+		error_code |= X86_PF_USER;
 		flags |= FAULT_FLAG_USER;
 	} else {
 		if (regs->flags & X86_EFLAGS_IF)
@@ -1312,9 +1292,9 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 
-	if (error_code & PF_WRITE)
+	if (error_code & X86_PF_WRITE)
 		flags |= FAULT_FLAG_WRITE;
-	if (error_code & PF_INSTR)
+	if (error_code & X86_PF_INSTR)
 		flags |= FAULT_FLAG_INSTRUCTION;
 
 	/*
@@ -1334,7 +1314,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	 * space check, thus avoiding the deadlock:
 	 */
 	if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
-		if ((error_code & PF_USER) == 0 &&
+		if ((error_code & X86_PF_USER) == 0 &&
 		    !search_exception_tables(regs->ip)) {
 			bad_area_nosemaphore(regs, error_code, address, NULL);
 			return;
@@ -1361,7 +1341,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 		bad_area(regs, error_code, address);
 		return;
 	}
-	if (error_code & PF_USER) {
+	if (error_code & X86_PF_USER) {
 		/*
 		 * Accessing the stack below %sp is always a bug.
 		 * The large cushion allows instructions like enter
-- 
2.9.3


* [v6 PATCH 16/21] x86/cpufeature: Add User-Mode Instruction Prevention definitions
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (14 preceding siblings ...)
  2017-03-08  0:32 ` [v6 PATCH 15/21] x86/mm: Relocate page fault error codes to traps.h Ricardo Neri
@ 2017-03-08  0:32 ` Ricardo Neri
  2017-03-08  0:32 ` [v6 PATCH 17/21] x86: Add emulation code for UMIP instructions Ricardo Neri
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-03-08  0:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Tony Luck

User-Mode Instruction Prevention is a security feature present in new
Intel processors that, when set, prevents the execution of a subset of
instructions if such instructions are executed in user mode (CPL > 0).
Attempting to execute such instructions causes a general protection
exception.

The subset of instructions comprises:

 * SGDT - Store Global Descriptor Table
 * SIDT - Store Interrupt Descriptor Table
 * SLDT - Store Local Descriptor Table
 * SMSW - Store Machine Status Word
 * STR  - Store Task Register

This feature is also added to the list of disabled-features to allow
a cleaner handling of build-time configuration.
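
The effect of a disabled-features mask can be sketched in plain C as follows.
This is a simplified model, not the kernel's implementation: feature_word(),
feature_mask() and feature_enabled() are hypothetical helpers, and the two
arrays stand in for the CPUID capability words and the DISABLED_MASKn
constants.

```c
#include <stdint.h>

/* Feature numbers are encoded as word * 32 + bit, as in cpufeatures.h. */
#define X86_FEATURE_UMIP	(16 * 32 + 2)

/* Index of the capability word that holds the feature. */
unsigned int feature_word(unsigned int feature)
{
	return feature / 32;
}

/* Bit mask of the feature within its word. */
uint32_t feature_mask(unsigned int feature)
{
	return UINT32_C(1) << (feature & 31);
}

/*
 * A feature whose bit is set in the disabled mask of its word is treated
 * as absent, no matter what the CPU reports.
 */
int feature_enabled(const uint32_t *cpu_words, const uint32_t *disabled_masks,
		    unsigned int feature)
{
	unsigned int word = feature_word(feature);
	uint32_t mask = feature_mask(feature);

	if (disabled_masks[word] & mask)
		return 0;

	return (cpu_words[word] & mask) != 0;
}
```

With CONFIG_X86_INTEL_UMIP unset, DISABLE_UMIP contributes bit 2 to
DISABLED_MASK16, so a check of this kind is false even on hardware that
reports the feature.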

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Liang Z. Li <liang.z.li@intel.com>
Cc: Alexandre Julliard <julliard@winehq.org>
Cc: Stas Sergeev <stsp@list.ru>
Cc: x86@kernel.org
Cc: linux-msdos@vger.kernel.org

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/cpufeatures.h          | 1 +
 arch/x86/include/asm/disabled-features.h    | 8 +++++++-
 arch/x86/include/uapi/asm/processor-flags.h | 2 ++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 4e77723..0739f1e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -286,6 +286,7 @@
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 16 */
 #define X86_FEATURE_AVX512VBMI  (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/
+#define X86_FEATURE_UMIP	(16*32+ 2) /* User Mode Instruction Prevention */
 #define X86_FEATURE_PKU		(16*32+ 3) /* Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE	(16*32+ 4) /* OS Protection Keys Enable */
 #define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 85599ad..4707445 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -16,6 +16,12 @@
 # define DISABLE_MPX	(1<<(X86_FEATURE_MPX & 31))
 #endif
 
+#ifdef CONFIG_X86_INTEL_UMIP
+# define DISABLE_UMIP	0
+#else
+# define DISABLE_UMIP	(1<<(X86_FEATURE_UMIP & 31))
+#endif
+
 #ifdef CONFIG_X86_64
 # define DISABLE_VME		(1<<(X86_FEATURE_VME & 31))
 # define DISABLE_K6_MTRR	(1<<(X86_FEATURE_K6_MTRR & 31))
@@ -55,7 +61,7 @@
 #define DISABLED_MASK13	0
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
-#define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE)
+#define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_UMIP)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18)
 
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index 567de50..d2c2af8 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -104,6 +104,8 @@
 #define X86_CR4_OSFXSR		_BITUL(X86_CR4_OSFXSR_BIT)
 #define X86_CR4_OSXMMEXCPT_BIT	10 /* enable unmasked SSE exceptions */
 #define X86_CR4_OSXMMEXCPT	_BITUL(X86_CR4_OSXMMEXCPT_BIT)
+#define X86_CR4_UMIP_BIT	11 /* enable UMIP support */
+#define X86_CR4_UMIP		_BITUL(X86_CR4_UMIP_BIT)
 #define X86_CR4_VMXE_BIT	13 /* enable VMX virtualization */
 #define X86_CR4_VMXE		_BITUL(X86_CR4_VMXE_BIT)
 #define X86_CR4_SMXE_BIT	14 /* enable safer mode (TXT) */
-- 
2.9.3


* [v6 PATCH 17/21] x86: Add emulation code for UMIP instructions
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (15 preceding siblings ...)
  2017-03-08  0:32 ` [v6 PATCH 16/21] x86/cpufeature: Add User-Mode Instruction Prevention definitions Ricardo Neri
@ 2017-03-08  0:32 ` Ricardo Neri
  2017-03-08  0:32 ` [v6 PATCH 18/21] x86/umip: Force a page fault when unable to copy emulated result to user Ricardo Neri
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-03-08  0:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Tony Luck

The User-Mode Instruction Prevention feature present in recent Intel
processors prevents a group of instructions from being executed with
CPL > 0. Attempting to execute them causes a general protection fault.

Rather than relaying this fault to user space (in the form of a SIGSEGV
signal), the instructions protected by UMIP can be emulated to provide
dummy results. This preserves the current kernel behavior without
revealing the system resources that UMIP intends to protect (the global
descriptor and interrupt descriptor tables, the segment selectors of the
local descriptor table and the task state, and the machine status word).

This emulation is needed because certain applications (e.g., WineHQ) rely
on this subset of instructions to function.

The instructions protected by UMIP can be split into two groups: those that
return a kernel memory address (sgdt and sidt) and those that return a
value (sldt, str and smsw).

For the instructions that return a kernel memory address, applications
such as WineHQ rely on the result being located in the kernel memory space.
The result is emulated as a hard-coded value that lies close to the top
of the kernel memory. The limits for the GDT and the IDT are set to zero.

The instructions sldt and str return a segment selector relative to the
base address of the global descriptor table. Since the actual address of
such table is not revealed, it makes sense to emulate the result as zero.

The instruction smsw is emulated to return the value that the register CR0
has at boot time, as set in head_32.S.

Care is taken to appropriately emulate the results when segmentation is
used. That is, rather than relying on USER_DS and USER_CS, the function
insn_get_addr_ref inspects the segment descriptor pointed to by the
registers in pt_regs. This ensures that we correctly obtain the segment base
address and the address and operand sizes even if the user space application
uses a local descriptor table.
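
The dummy results described above can be sketched as follows. This is a
user-space illustration under several assumptions: umip_dummy_result() and
the umip_kind enumeration are hypothetical, the sgdt/sidt image is assumed to
be a 16-bit limit followed by a 32-bit base, and the value instructions are
assumed to store 16 bits. The base addresses and the CR0 flag set are the
ones defined in umip.c by this patch.

```c
#include <stdint.h>
#include <string.h>

/* CR0 bits used for the dummy smsw value (architectural bit positions). */
#define CR0_PE	(UINT32_C(1) << 0)
#define CR0_MP	(UINT32_C(1) << 1)
#define CR0_ET	(UINT32_C(1) << 4)
#define CR0_NE	(UINT32_C(1) << 5)
#define CR0_WP	(UINT32_C(1) << 16)
#define CR0_AM	(UINT32_C(1) << 18)
#define CR0_STATE (CR0_PE | CR0_MP | CR0_ET | CR0_NE | CR0_WP | CR0_AM)

#define DUMMY_GDT_BASE	UINT32_C(0xfffe0000)
#define DUMMY_IDT_BASE	UINT32_C(0xffff0000)

/* Hypothetical instruction identifiers for this sketch only. */
enum umip_kind { KIND_SGDT, KIND_SIDT, KIND_SLDT, KIND_STR, KIND_SMSW };

/*
 * Fill buf with the dummy result for the given instruction.
 * Returns the number of bytes written, or -1 if buf is too small.
 */
int umip_dummy_result(enum umip_kind kind, uint8_t *buf, size_t len)
{
	switch (kind) {
	case KIND_SGDT:
	case KIND_SIDT: {
		/* Table instructions: zero limit, hard-coded base. */
		uint16_t limit = 0;
		uint32_t base = (kind == KIND_SGDT) ? DUMMY_GDT_BASE
						    : DUMMY_IDT_BASE;

		if (len < sizeof(limit) + sizeof(base))
			return -1;
		memcpy(buf, &limit, sizeof(limit));
		memcpy(buf + sizeof(limit), &base, sizeof(base));
		return sizeof(limit) + sizeof(base);
	}
	case KIND_SLDT:
	case KIND_STR:
	case KIND_SMSW: {
		/* Value instructions: zero selector, or low word of CR0. */
		uint16_t val = (kind == KIND_SMSW) ? (uint16_t)CR0_STATE : 0;

		if (len < sizeof(val))
			return -1;
		memcpy(buf, &val, sizeof(val));
		return sizeof(val);
	}
	}

	return -1;
}
```

Note that only the low 16 bits of the CR0 value fit in the 16-bit store
assumed here, so the WP and AM flags would be visible only to a wider
register destination.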

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Liang Z. Li <liang.z.li@intel.com>
Cc: Alexandre Julliard <julliard@winehq.org>
Cc: Stas Sergeev <stsp@list.ru>
Cc: x86@kernel.org
Cc: linux-msdos@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/umip.h |  15 +++
 arch/x86/kernel/Makefile    |   1 +
 arch/x86/kernel/umip.c      | 257 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 273 insertions(+)
 create mode 100644 arch/x86/include/asm/umip.h
 create mode 100644 arch/x86/kernel/umip.c

diff --git a/arch/x86/include/asm/umip.h b/arch/x86/include/asm/umip.h
new file mode 100644
index 0000000..077b236
--- /dev/null
+++ b/arch/x86/include/asm/umip.h
@@ -0,0 +1,15 @@
+#ifndef _ASM_X86_UMIP_H
+#define _ASM_X86_UMIP_H
+
+#include <linux/types.h>
+#include <asm/ptrace.h>
+
+#ifdef CONFIG_X86_INTEL_UMIP
+bool fixup_umip_exception(struct pt_regs *regs);
+#else
+static inline bool fixup_umip_exception(struct pt_regs *regs)
+{
+	return false;
+}
+#endif  /* CONFIG_X86_INTEL_UMIP */
+#endif  /* _ASM_X86_UMIP_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 84c0059..0ded7b1 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -122,6 +122,7 @@ obj-$(CONFIG_EFI)			+= sysfb_efi.o
 obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
 obj-$(CONFIG_TRACING)			+= tracepoint.o
 obj-$(CONFIG_SCHED_MC_PRIO)		+= itmt.o
+obj-$(CONFIG_X86_INTEL_UMIP)		+= umip.o
 
 ifdef CONFIG_FRAME_POINTER
 obj-y					+= unwind_frame.o
diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
new file mode 100644
index 0000000..e64d8e5
--- /dev/null
+++ b/arch/x86/kernel/umip.c
@@ -0,0 +1,257 @@
+/*
+ * umip.c Emulation for instructions protected by the Intel User-Mode
+ * Instruction Prevention. The instructions are:
+ *    sgdt
+ *    sldt
+ *    sidt
+ *    str
+ *    smsw
+ *
+ * Copyright (c) 2017, Intel Corporation.
+ * Ricardo Neri <ricardo.neri@linux.intel.com>
+ */
+
+#include <linux/uaccess.h>
+#include <asm/umip.h>
+#include <asm/traps.h>
+#include <asm/insn.h>
+#include <asm/insn-eval.h>
+#include <linux/ratelimit.h>
+
+/*
+ * == Base addresses of GDT and IDT
+ * Some applications rely on finding the global descriptor table (GDT) and
+ * the interrupt descriptor table (IDT) in kernel memory in order to function.
+ * For x86_32, the selected values do not match any particular hole, but it
+ * suffices to provide a memory location within kernel memory.
+ *
+ * == CR0 flags for SMSW
+ * Use the flags given when booting, as found in head_32.S
+ */
+
+#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | X86_CR0_NE | \
+		   X86_CR0_WP | X86_CR0_AM)
+#define UMIP_DUMMY_GDT_BASE 0xfffe0000
+#define UMIP_DUMMY_IDT_BASE 0xffff0000
+
+enum umip_insn {
+	UMIP_SGDT = 0,	/* opcode 0f 01 ModR/M reg 0 */
+	UMIP_SIDT,	/* opcode 0f 01 ModR/M reg 1 */
+	UMIP_SLDT,	/* opcode 0f 00 ModR/M reg 0 */
+	UMIP_SMSW,	/* opcode 0f 01 ModR/M reg 4 */
+	UMIP_STR,	/* opcode 0f 00 ModR/M reg 1 */
+};
+
+/**
+ * __identify_insn - Identify a UMIP-protected instruction
+ * @insn:	Instruction structure with opcode and ModRM byte.
+ *
+ * From the instruction opcode and the reg part of the ModRM byte, identify,
+ * if any, a UMIP-protected instruction.
+ *
+ * Return: an enumeration of a UMIP-protected instruction; -EINVAL on failure.
+ */
+static int __identify_insn(struct insn *insn)
+{
+	/* By getting modrm we also get the opcode. */
+	insn_get_modrm(insn);
+
+	/* All the instructions of interest start with 0x0f. */
+	if (insn->opcode.bytes[0] != 0xf)
+		return -EINVAL;
+
+	if (insn->opcode.bytes[1] == 0x1) {
+		switch (X86_MODRM_REG(insn->modrm.value)) {
+		case 0:
+			return UMIP_SGDT;
+		case 1:
+			return UMIP_SIDT;
+		case 4:
+			return UMIP_SMSW;
+		default:
+			return -EINVAL;
+		}
+	} else if (insn->opcode.bytes[1] == 0x0) {
+		if (X86_MODRM_REG(insn->modrm.value) == 0)
+			return UMIP_SLDT;
+		else if (X86_MODRM_REG(insn->modrm.value) == 1)
+			return UMIP_STR;
+		else
+			return -EINVAL;
+	} else {
+		return -EINVAL;
+	}
+}
+
+/**
+ * __emulate_umip_insn - Emulate UMIP instructions with dummy values
+ * @insn:	Instruction structure with ModRM byte
+ * @umip_inst:	Instruction to emulate
+ * @data:	Buffer onto which the dummy values will be copied
+ * @data_size:	Size of the emulated result
+ *
+ * Emulate an instruction protected by UMIP. The result of the emulation
+ * is saved in the provided buffer. The size of the results depends on both
+ * the instruction and type of operand (register vs memory address). Thus,
+ * the size of the result needs to be updated.
+ *
+ * Return: 0 on success, -EINVAL on failure to emulate
+ */
+static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst,
+			       unsigned char *data, int *data_size)
+{
+	unsigned long dummy_base_addr;
+	unsigned short dummy_limit = 0;
+	unsigned int dummy_value = 0;
+
+	switch (umip_inst) {
+	/*
+	 * These two instructions return the base address and limit of the
+	 * global and interrupt descriptor table. The base address can be
+	 * 24-bit, 32-bit or 64-bit. Limit is always 16-bit. If the operand
+	 * size is 16-bit the returned value of the base address is supposed
+	 * to be a zero-extended 24-bit number. However, it seems that a
+	 * 32-bit number is always returned in legacy protected mode
+	 * irrespective of the operand size.
+	 */
+	case UMIP_SGDT:
+		/* fall through */
+	case UMIP_SIDT:
+		if (umip_inst == UMIP_SGDT)
+			dummy_base_addr = UMIP_DUMMY_GDT_BASE;
+		else
+			dummy_base_addr = UMIP_DUMMY_IDT_BASE;
+		if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+			/* SGDT and SIDT do not take register as argument. */
+			return -EINVAL;
+		}
+
+		memcpy(data + 2, &dummy_base_addr, sizeof(dummy_base_addr));
+		memcpy(data, &dummy_limit, sizeof(dummy_limit));
+		*data_size = sizeof(dummy_base_addr) + sizeof(dummy_limit);
+		break;
+	case UMIP_SMSW:
+		/*
+		 * Even though CR0_STATE contains 4 bytes, the number
+		 * of bytes to be copied in the result buffer is determined
+		 * by whether the operand is a register or a memory location.
+		 */
+		dummy_value = CR0_STATE;
+		/*
+		 * These two instructions return a 16-bit value. We return
+		 * all zeros. This is equivalent to a null descriptor for
+		 * str and sldt.
+		 */
+		/* fall through */
+	case UMIP_SLDT:
+		/* fall through */
+	case UMIP_STR:
+		/* if operand is a register, it is zero-extended */
+		if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+			memset(data, 0, insn->opnd_bytes);
+			*data_size = insn->opnd_bytes;
+		/* if not, only the two least significant bytes are copied */
+		} else {
+			*data_size = 2;
+		}
+		memcpy(data, &dummy_value, sizeof(dummy_value));
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/**
+ * fixup_umip_exception - Fixup #GP faults caused by UMIP
+ * @regs:	Registers as saved when entering the #GP trap
+ *
+ * The instructions sgdt, sidt, str, smsw, sldt cause a general protection
+ * fault if executed with CPL > 0 (i.e., from user space). This function
+ * can be used to emulate the results of the aforementioned instructions
+ * with dummy values. Results are copied to user-space memory as indicated
+ * by the instruction pointed to by EIP using the registers indicated in
+ * the instruction operands. This function also takes care of determining
+ * the address to which the results must be copied.
+ */
+bool fixup_umip_exception(struct pt_regs *regs)
+{
+	struct insn insn;
+	unsigned char buf[MAX_INSN_SIZE];
+	/* 10 bytes is the maximum size of the result of UMIP instructions */
+	unsigned char dummy_data[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
+	unsigned long seg_base;
+	int not_copied, nr_copied, reg_offset, dummy_data_size;
+	void __user *uaddr;
+	unsigned long *reg_addr;
+	int umip_inst;
+
+	/*
+	 * Use the segment base in case user space used a different code
+	 * segment, either in protected (e.g., from an LDT) or virtual-8086
+	 * modes. In most of the cases seg_base will be zero as in USER_CS.
+	 */
+	seg_base = insn_get_seg_base(regs, &insn, offsetof(struct pt_regs, ip),
+				     true);
+	not_copied = copy_from_user(buf, (void __user *)(seg_base + regs->ip),
+				    sizeof(buf));
+	nr_copied = sizeof(buf) - not_copied;
+	/*
+	 * The copy_from_user above could have failed if user code is protected
+	 * by a memory protection key. Give up on emulation in such a case.
+	 * Should we issue a page fault?
+	 */
+	if (!nr_copied)
+		return false;
+
+	insn_init(&insn, buf, nr_copied, 0);
+
+	/*
+	 * Override the default operand and address sizes to what is specified
+	 * in the code segment descriptor. The instruction decoder only sets
+	 * the address size to either 4 or 8 bytes and does nothing for
+	 * the operand bytes. This is OK for most of the cases, but we could
+	 * have special cases where, for instance, a 16-bit code segment
+	 * descriptor is used.
+	 * If there are overrides, the instruction decoder correctly updates
+	 * these values, even for 16-bit defaults.
+	 */
+	insn.addr_bytes = insn_get_seg_default_address_bytes(regs);
+	insn.opnd_bytes = insn_get_seg_default_operand_bytes(regs);
+
+	if (!insn.addr_bytes || !insn.opnd_bytes)
+		return false;
+
+#ifdef CONFIG_X86_64
+	if (user_64bit_mode(regs))
+		return false;
+#endif
+
+	insn_get_length(&insn);
+	if (nr_copied < insn.length)
+		return false;
+
+	umip_inst = __identify_insn(&insn);
+	/* Check if we found an instruction protected by UMIP */
+	if (umip_inst < 0)
+		return false;
+
+	if (__emulate_umip_insn(&insn, umip_inst, dummy_data, &dummy_data_size))
+		return false;
+
+	/* If operand is a register, write directly to it */
+	if (X86_MODRM_MOD(insn.modrm.value) == 3) {
+		reg_offset = insn_get_reg_offset_modrm_rm(&insn, regs);
+		reg_addr = (unsigned long *)((unsigned long)regs + reg_offset);
+		memcpy(reg_addr, dummy_data, dummy_data_size);
+	} else {
+		uaddr = insn_get_addr_ref(&insn, regs);
+		nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
+		if (nr_copied  > 0)
+			return false;
+	}
+
+	/* increase IP to let the program keep going */
+	regs->ip += insn.length;
+	return true;
+}
-- 
2.9.3

* [v6 PATCH 18/21] x86/umip: Force a page fault when unable to copy emulated result to user
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (16 preceding siblings ...)
  2017-03-08  0:32 ` [v6 PATCH 17/21] x86: Add emulation code for UMIP instructions Ricardo Neri
@ 2017-03-08  0:32 ` Ricardo Neri
  2017-03-08  0:32 ` [v6 PATCH 19/21] x86/traps: Fixup general protection faults caused by UMIP Ricardo Neri
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-03-08  0:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri

fixup_umip_exception will be called from do_general_protection. If the
former returns false, the latter will issue a SIGSEGV with SEND_SIG_PRIV.
However, when emulation is successful but the emulated result cannot be
copied to user space memory, it is more accurate to issue a SIGSEGV with
SEGV_MAPERR with the offending address. A new function, inspired by
force_sig_info_fault, is introduced to model the page fault.
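
To show what a user space program would observe in that case, here is a
minimal sketch that installs a SIGSEGV handler and inspects si_code and
si_addr. It does not execute any UMIP-protected instruction: a plain write
to an unmapped page stands in for the failed copy of the emulated result.
The helper names are hypothetical, and a Linux process with 4 KiB pages is
assumed.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

static sigjmp_buf fault_env;
static volatile sig_atomic_t fault_code;
static void *volatile fault_addr;

/* Record the siginfo fields of interest and unwind out of the fault. */
static void on_segv(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;
	fault_code = info->si_code;
	fault_addr = info->si_addr;
	siglongjmp(fault_env, 1);
}

/* Hypothetical helper: return an address that is known to be unmapped. */
static void *unmapped_page(void)
{
	void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	munmap(p, 4096);
	return p;
}

/* Return 1 if writing to addr raised SIGSEGV with SEGV_MAPERR at addr. */
static int write_hits_maperr(volatile char *addr)
{
	struct sigaction sa, old;

	assert(addr != NULL);
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_segv;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, &old))
		return 0;

	if (sigsetjmp(fault_env, 1) == 0) {
		*addr = 0;	/* faults if addr is unmapped */
		sigaction(SIGSEGV, &old, NULL);
		return 0;	/* no fault: the address was mapped */
	}
	sigaction(SIGSEGV, &old, NULL);
	return fault_code == SEGV_MAPERR && fault_addr == (void *)addr;
}
```

A program that passes an unmapped destination to one of the emulated
instructions would be expected to see the same si_code and si_addr pair.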

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/kernel/umip.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index e64d8e5..bd06e26 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -163,6 +163,41 @@ static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst,
 }
 
 /**
+ * __force_sig_info_umip_fault - Force a SIGSEGV with SEGV_MAPERR
+ * @address:	Address that caused the signal
+ * @regs:	Register set containing the instruction pointer
+ *
+ * Force a SIGSEGV signal with SEGV_MAPERR as the error code. This function is
+ * intended to be used to provide a segmentation fault when the result of the
+ * UMIP emulation could not be copied to the user space memory.
+ *
+ * Return: none
+ */
+static void __force_sig_info_umip_fault(void __user *address,
+					struct pt_regs *regs)
+{
+	siginfo_t info;
+	struct task_struct *tsk = current;
+
+	if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV)) {
+		printk_ratelimited("%s[%d] umip emulation segfault ip:%lx sp:%lx error:%x in %lx\n",
+				   tsk->comm, task_pid_nr(tsk), regs->ip,
+				   regs->sp, X86_PF_USER | X86_PF_WRITE,
+				   regs->ip);
+	}
+
+	tsk->thread.cr2		= (unsigned long)address;
+	tsk->thread.error_code	= X86_PF_USER | X86_PF_WRITE;
+	tsk->thread.trap_nr	= X86_TRAP_PF;
+
+	info.si_signo	= SIGSEGV;
+	info.si_errno	= 0;
+	info.si_code	= SEGV_MAPERR;
+	info.si_addr	= address;
+	force_sig_info(SIGSEGV, &info, tsk);
+}
+
+/**
  * fixup_umip_exception - Fixup #GP faults caused by UMIP
  * @regs:	Registers as saved when entering the #GP trap
  *
@@ -247,8 +282,14 @@ bool fixup_umip_exception(struct pt_regs *regs)
 	} else {
 		uaddr = insn_get_addr_ref(&insn, regs);
 		nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
-		if (nr_copied  > 0)
-			return false;
+		if (nr_copied  > 0) {
+			/*
+			 * If copy fails, send a signal and tell caller that
+			 * fault was fixed up
+			 */
+			__force_sig_info_umip_fault(uaddr, regs);
+			return true;
+		}
 	}
 
 	/* increase IP to let the program keep going */
-- 
2.9.3

* [v6 PATCH 19/21] x86/traps: Fixup general protection faults caused by UMIP
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (17 preceding siblings ...)
  2017-03-08  0:32 ` [v6 PATCH 18/21] x86/umip: Force a page fault when unable to copy emulated result to user Ricardo Neri
@ 2017-03-08  0:32 ` Ricardo Neri
  2017-03-08 15:54   ` Andy Lutomirski
  2017-03-08  0:32 ` [v6 PATCH 20/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (3 subsequent siblings)
  22 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-03-08  0:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Tony Luck

If the User-Mode Instruction Prevention CPU feature is available and
enabled, a general protection fault will be issued if the instructions
sgdt, sldt, sidt, str or smsw are executed from user-mode context
(CPL > 0). If the fault was caused by any of the instructions protected
by UMIP, fixup_umip_exception will emulate dummy results for these
instructions. If emulation is successful, the result is passed to the
user space program and no SIGSEGV signal is emitted.

Please note that fixup_umip_exception also caters for the case when
the fault originated while running in virtual-8086 mode.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Liang Z. Li <liang.z.li@intel.com>
Cc: Alexandre Julliard <julliard@winehq.org>
Cc: Stas Sergeev <stsp@list.ru>
Cc: x86@kernel.org
Cc: linux-msdos@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/kernel/traps.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 948443e..86efbcb 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -65,6 +65,7 @@
 #include <asm/trace/mpx.h>
 #include <asm/mpx.h>
 #include <asm/vm86.h>
+#include <asm/umip.h>
 
 #ifdef CONFIG_X86_64
 #include <asm/x86_init.h>
@@ -492,6 +493,9 @@ do_general_protection(struct pt_regs *regs, long error_code)
 	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
 	cond_local_irq_enable(regs);
 
+	if (user_mode(regs) && fixup_umip_exception(regs))
+		return;
+
 	if (v8086_mode(regs)) {
 		local_irq_enable();
 		handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
-- 
2.9.3

* [v6 PATCH 20/21] x86: Enable User-Mode Instruction Prevention
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (18 preceding siblings ...)
  2017-03-08  0:32 ` [v6 PATCH 19/21] x86/traps: Fixup general protection faults caused by UMIP Ricardo Neri
@ 2017-03-08  0:32 ` Ricardo Neri
  2017-03-08  0:32 ` [v6 PATCH 21/21] selftests/x86: Add tests for " Ricardo Neri
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-03-08  0:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri,
	Tony Luck

User-Mode Instruction Prevention (UMIP) is enabled or disabled by setting
or clearing a bit in %cr4.

It makes sense to enable UMIP at some point while booting, before user
space comes up. Like SMAP and SMEP, it is not critical to have it enabled
very early during boot. This is because UMIP is relevant only when there is
a user space to be protected from. Given the similarities in relevance, it
makes sense to enable UMIP along with SMAP and SMEP.

UMIP is enabled by default. It can be disabled by adding clearcpuid=514
to the kernel parameters.
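
The value 514 follows from how feature numbers are encoded. The sketch
below assumes that X86_FEATURE_UMIP is bit 2 of feature word 16 (the word
that mirrors CPUID.(EAX=7,ECX=0):ECX), as defined elsewhere in this series;
the helper names are hypothetical.

```c
#include <assert.h>

/* Feature numbers are encoded as word * 32 + bit. */
static unsigned int feature_nr(unsigned int word, unsigned int bit)
{
	assert(bit < 32);
	return word * 32 + bit;
}

/* Recover the word and bit from the number given to clearcpuid=. */
static unsigned int feature_word(unsigned int nr)
{
	return nr / 32;
}

static unsigned int feature_bit(unsigned int nr)
{
	return nr % 32;
}
```

Under that assumption, feature_nr(16, 2) gives the 514 used above.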

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Liang Z. Li <liang.z.li@intel.com>
Cc: Alexandre Julliard <julliard@winehq.org>
Cc: Stas Sergeev <stsp@list.ru>
Cc: x86@kernel.org
Cc: linux-msdos@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/Kconfig             | 10 ++++++++++
 arch/x86/kernel/cpu/common.c | 16 +++++++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cc98d5a..b7f1226 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1735,6 +1735,16 @@ config X86_SMAP
 
 	  If unsure, say Y.
 
+config X86_INTEL_UMIP
+	def_bool y
+	depends on CPU_SUP_INTEL
+	prompt "Intel User Mode Instruction Prevention" if EXPERT
+	---help---
+	  The User Mode Instruction Prevention (UMIP) is a security
+	  feature in newer Intel processors. If enabled, a general
+	  protection fault is issued if any of the instructions SGDT,
+	  SLDT, SIDT, SMSW or STR is executed in user mode.
+
 config X86_INTEL_MPX
 	prompt "Intel MPX (Memory Protection Extensions)"
 	def_bool n
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 58094a1..9f59eb5 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -311,6 +311,19 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c)
 	}
 }
 
+static __always_inline void setup_umip(struct cpuinfo_x86 *c)
+{
+	if (cpu_feature_enabled(X86_FEATURE_UMIP) &&
+	    cpu_has(c, X86_FEATURE_UMIP))
+		cr4_set_bits(X86_CR4_UMIP);
+	else
+		/*
+		 * Make sure UMIP is disabled in case it was enabled in a
+		 * previous boot (e.g., via kexec).
+		 */
+		cr4_clear_bits(X86_CR4_UMIP);
+}
+
 /*
  * Protection Keys are not available in 32-bit mode.
  */
@@ -1080,9 +1093,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	/* Disable the PN if appropriate */
 	squash_the_stupid_serial_number(c);
 
-	/* Set up SMEP/SMAP */
+	/* Set up SMEP/SMAP/UMIP */
 	setup_smep(c);
 	setup_smap(c);
+	setup_umip(c);
 
 	/*
 	 * The vendor-specific functions might have changed features.
-- 
2.9.3

* [v6 PATCH 21/21] selftests/x86: Add tests for User-Mode Instruction Prevention
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (19 preceding siblings ...)
  2017-03-08  0:32 ` [v6 PATCH 20/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
@ 2017-03-08  0:32 ` Ricardo Neri
  2017-03-08 15:56   ` Andy Lutomirski
  2017-03-08 14:08 ` [v6 PATCH 00/21] x86: Enable " Stas Sergeev
  2017-03-08 16:07 ` Andy Lutomirski
  22 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-03-08  0:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Ricardo Neri

Certain user space programs that run in virtual-8086 mode may utilize
instructions protected by the User-Mode Instruction Prevention (UMIP)
security feature present in new Intel processors: SGDT, SIDT and SMSW. In
such a case, a general protection fault is issued if UMIP is enabled. When
such a fault happens, the kernel catches it and emulates the results of
these instructions with dummy values. The purpose of this new
test is to verify whether the impacted instructions can be executed without
causing such #GP. If no #GP exceptions occur, we expect to exit virtual-
8086 mode from INT 0x80.

The instructions protected by UMIP are executed in representative use
cases:
 a) the memory address of the result is given in the form of a displacement
    from the base of the data segment
 b) the memory address of the result is given in a general purpose register
 c) the result is stored directly in a general purpose register.

Unfortunately, it is not possible to check the results against a set of
expected values because no emulation will occur in systems that do not have
the UMIP feature. Instead, results are printed for verification.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 tools/testing/selftests/x86/entry_from_vm86.c | 39 ++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/entry_from_vm86.c b/tools/testing/selftests/x86/entry_from_vm86.c
index d075ea0..377b773 100644
--- a/tools/testing/selftests/x86/entry_from_vm86.c
+++ b/tools/testing/selftests/x86/entry_from_vm86.c
@@ -95,6 +95,22 @@ asm (
 	"int3\n\t"
 	"vmcode_int80:\n\t"
 	"int $0x80\n\t"
+	"umip:\n\t"
+	/* addressing via displacements */
+	"smsw (2052)\n\t"
+	"sidt (2054)\n\t"
+	"sgdt (2060)\n\t"
+	/* addressing via registers */
+	"mov $2066, %bx\n\t"
+	"smsw (%bx)\n\t"
+	"mov $2068, %bx\n\t"
+	"sidt (%bx)\n\t"
+	"mov $2074, %bx\n\t"
+	"sgdt (%bx)\n\t"
+	/* register operands, only for smsw */
+	"smsw %ax\n\t"
+	"mov %ax, (2080)\n\t"
+	"int $0x80\n\t"
 	".size vmcode, . - vmcode\n\t"
 	"end_vmcode:\n\t"
 	".code32\n\t"
@@ -103,7 +119,7 @@ asm (
 
 extern unsigned char vmcode[], end_vmcode[];
 extern unsigned char vmcode_bound[], vmcode_sysenter[], vmcode_syscall[],
-	vmcode_sti[], vmcode_int3[], vmcode_int80[];
+	vmcode_sti[], vmcode_int3[], vmcode_int80[], umip[];
 
 /* Returns false if the test was skipped. */
 static bool do_test(struct vm86plus_struct *v86, unsigned long eip,
@@ -218,6 +234,27 @@ int main(void)
 	v86.regs.eax = (unsigned int)-1;
 	do_test(&v86, vmcode_int80 - vmcode, VM86_INTx, 0x80, "int80");
 
+	/* UMIP -- should exit with INTx 0x80 unless emulation failed */
+	do_test(&v86, umip - vmcode, VM86_INTx, 0x80, "UMIP tests");
+	printf("[INFO]\tResults of UMIP-protected instructions via displacements:\n");
+	printf("[INFO]\tSMSW:[0x%04x]\n", *(unsigned short *)(addr + 2052));
+	printf("[INFO]\tSIDT: limit[0x%04x]base[0x%08lx]\n",
+	       *(unsigned short *)(addr + 2054),
+	       *(unsigned long  *)(addr + 2056));
+	printf("[INFO]\tSGDT: limit[0x%04x]base[0x%08lx]\n",
+	       *(unsigned short *)(addr + 2060),
+	       *(unsigned long  *)(addr + 2062));
+	printf("[INFO]\tResults of UMIP-protected instructions via addressing in registers:\n");
+	printf("[INFO]\tSMSW:[0x%04x]\n", *(unsigned short *)(addr + 2066));
+	printf("[INFO]\tSIDT: limit[0x%04x]base[0x%08lx]\n",
+	       *(unsigned short *)(addr + 2068),
+	       *(unsigned long  *)(addr + 2070));
+	printf("[INFO]\tSGDT: limit[0x%04x]base[0x%08lx]\n",
+	       *(unsigned short *)(addr + 2074),
+	       *(unsigned long  *)(addr + 2076));
+	printf("[INFO]\tResults of SMSW via register operands:\n");
+	printf("[INFO]\tSMSW:[0x%04x]\n", *(unsigned short *)(addr + 2080));
+
 	/* Execute a null pointer */
 	v86.regs.cs = 0;
 	v86.regs.ss = 0;
-- 
2.9.3

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (20 preceding siblings ...)
  2017-03-08  0:32 ` [v6 PATCH 21/21] selftests/x86: Add tests for " Ricardo Neri
@ 2017-03-08 14:08 ` Stas Sergeev
  2017-03-08 16:06   ` Andy Lutomirski
  2017-03-09  0:46   ` Ricardo Neri
  2017-03-08 16:07 ` Andy Lutomirski
  22 siblings, 2 replies; 112+ messages in thread
From: Stas Sergeev @ 2017-03-08 14:08 UTC (permalink / raw)
  To: Ricardo Neri, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov
  Cc: Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel

08.03.2017 03:32, Ricardo Neri wrote:
> These are the instructions covered by UMIP:
> * SGDT - Store Global Descriptor Table
> * SIDT - Store Interrupt Descriptor Table
> * SLDT - Store Local Descriptor Table
> * SMSW - Store Machine Status Word
> * STR - Store Task Register
>
> This patchset initially treated tasks running in virtual-8086 mode as a
> special case. However, I received clarification that DOSEMU[8] does not
> support applications that use these instructions.
Yes, this is the case.
But at least in the past there was an attempt to
support SLDT as it is used by an ancient pharlap
DOS extender (currently unsupported by dosemu1/2).
So how difficult would it be to add an optional
possibility of delivering such SIGSEGV to userspace
so that the kernel's dummy emulation can be overridden?
It doesn't need to be a matter of this particular
patch set, i.e. this proposal should not trigger a
v7 resend of all 21 patches. :) But it would be useful
for the future development of dosemu2.

* Re: [v6 PATCH 19/21] x86/traps: Fixup general protection faults caused by UMIP
  2017-03-08  0:32 ` [v6 PATCH 19/21] x86/traps: Fixup general protection faults caused by UMIP Ricardo Neri
@ 2017-03-08 15:54   ` Andy Lutomirski
  0 siblings, 0 replies; 112+ messages in thread
From: Andy Lutomirski @ 2017-03-08 15:54 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov, Peter Zijlstra, Andrew Morton, Brian Gerst,
	Chris Metcalf, Dave Hansen, Paolo Bonzini, Liang Z Li,
	Masami Hiramatsu, Huang Rui, Jiri Slaby, Jonathan Corbet,
	Michael S. Tsirkin, Paul Gortmaker, Vlastimil Babka, Chen Yucong,
	Alexandre Julliard, Stas Sergeev, Fenghua Yu, Ravi V. Shankar,
	Shuah Khan, linux-kernel, X86 ML, linux-msdos, wine-devel,
	Tony Luck

On Tue, Mar 7, 2017 at 4:32 PM, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
> If the User-Mode Instruction Prevention CPU feature is available and
> enabled, a general protection fault will be issued if the instructions
> sgdt, sldt, sidt, str or smsw are executed from user-mode context
> (CPL > 0). If the fault was caused by any of the instructions protected
> by UMIP, fixup_umip_exception will emulate dummy results for these
> instructions. If emulation is successful, the result is passed to the
> user space program and no SIGSEGV signal is emitted.
>
> Please note that fixup_umip_exception also caters for the case when
> the fault originated while running in virtual-8086 mode.

Reviewed-by: Andy Lutomirski <luto@kernel.org>

* Re: [v6 PATCH 21/21] selftests/x86: Add tests for User-Mode Instruction Prevention
  2017-03-08  0:32 ` [v6 PATCH 21/21] selftests/x86: Add tests for " Ricardo Neri
@ 2017-03-08 15:56   ` Andy Lutomirski
  2017-03-10 23:38     ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Andy Lutomirski @ 2017-03-08 15:56 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov, Peter Zijlstra, Andrew Morton, Brian Gerst,
	Chris Metcalf, Dave Hansen, Paolo Bonzini, Liang Z Li,
	Masami Hiramatsu, Huang Rui, Jiri Slaby, Jonathan Corbet,
	Michael S. Tsirkin, Paul Gortmaker, Vlastimil Babka, Chen Yucong,
	Alexandre Julliard, Stas Sergeev, Fenghua Yu, Ravi V. Shankar,
	Shuah Khan, linux-kernel, X86 ML, linux-msdos, wine-devel

On Tue, Mar 7, 2017 at 4:32 PM, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
> Certain user space programs that run on virtual-8086 mode may utilize
> instructions protected by the User-Mode Instruction Prevention (UMIP)
> security feature present in new Intel processors: SGDT, SIDT and SMSW. In
> such a case, a general protection fault is issued if UMIP is enabled. When
> such a fault happens, the kernel catches it and emulates the results of
> these instructions with dummy values. The purpose of this new
> test is to verify whether the impacted instructions can be executed without
> causing such #GP. If no #GP exceptions occur, we expect to exit virtual-
> 8086 mode from INT 0x80.
>
> The instructions protected by UMIP are executed in representative use
> cases:
>  a) the memory address of the result is given in the form of a displacement
>     from the base of the data segment
>  b) the memory address of the result is given in a general purpose register
>  c) the result is stored directly in a general purpose register.
>
> Unfortunately, it is not possible to check the results against a set of
> expected values because no emulation will occur in systems that do not have
> the UMIP feature. Instead, results are printed for verification.

You could pre-initialize the result buffer to a bunch of non-matching
values (1, 2, 3, ...) and then check that all the invocations of the
same instruction gave the same value.

If you do this, maybe make it a follow-up patch -- see other email.
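
A minimal sketch of that check, assuming a stand-in function in place of
the real instruction (the names store_fn, store_dummy, store_nothing and
results_consistent, the buffer size, and the dummy bytes are all
hypothetical and not taken from the selftest):

```c
#include <assert.h>
#include <string.h>

#define NR_RUNS    3
#define RESULT_LEN 6	/* assumed size: 2-byte limit + 4-byte base */

/* Hypothetical stand-in for one execution of the instruction under test. */
typedef void (*store_fn)(unsigned char *dst);

/* Pretends the instruction wrote the same (arbitrary) bytes every time. */
void store_dummy(unsigned char *dst)
{
	static const unsigned char dummy[RESULT_LEN] = {
		0x00, 0x00, 0x00, 0x00, 0xfe, 0xff
	};

	memcpy(dst, dummy, RESULT_LEN);
}

/* Pretends the instruction silently wrote nothing. */
void store_nothing(unsigned char *dst)
{
	(void)dst;
}

/* Returns 1 if all runs left identical bytes in their buffers, else 0. */
int results_consistent(store_fn store)
{
	unsigned char buf[NR_RUNS][RESULT_LEN];
	int i;

	assert(store != NULL);

	/* Pre-fill each buffer with a different value: 1, 2, 3, ... */
	for (i = 0; i < NR_RUNS; i++)
		memset(buf[i], i + 1, RESULT_LEN);

	for (i = 0; i < NR_RUNS; i++)
		store(buf[i]);

	/* A run that wrote nothing keeps its unique fill and mismatches. */
	for (i = 1; i < NR_RUNS; i++)
		if (memcmp(buf[0], buf[i], RESULT_LEN) != 0)
			return 0;

	return 1;
}
```

Because every buffer starts with a different fill value, the comparison
can only succeed if each invocation really overwrote its buffer, and it
does so without knowing the expected values in advance.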

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-08 14:08 ` [v6 PATCH 00/21] x86: Enable " Stas Sergeev
@ 2017-03-08 16:06   ` Andy Lutomirski
  2017-03-08 16:29     ` Stas Sergeev
  2017-03-09  0:46   ` Ricardo Neri
  1 sibling, 1 reply; 112+ messages in thread
From: Andy Lutomirski @ 2017-03-08 16:06 UTC (permalink / raw)
  To: Stas Sergeev
  Cc: Ricardo Neri, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

On Wed, Mar 8, 2017 at 6:08 AM, Stas Sergeev <stsp@list.ru> wrote:
> 08.03.2017 03:32, Ricardo Neri wrote:
>>
>> These are the instructions covered by UMIP:
>> * SGDT - Store Global Descriptor Table
>> * SIDT - Store Interrupt Descriptor Table
>> * SLDT - Store Local Descriptor Table
>> * SMSW - Store Machine Status Word
>> * STR - Store Task Register
>>
>> This patchset initially treated tasks running in virtual-8086 mode as a
>> special case. However, I received clarification that DOSEMU[8] does not
>> support applications that use these instructions.

Can you remind me what was special about it?  It looks like you still
emulate them in v8086 mode.

>
> Yes, this is the case.
> But at least in the past there was an attempt to
> support SLDT as it is used by an ancient pharlap
> DOS extender (currently unsupported by dosemu1/2).
> So how difficult would it be to add an optional
> possibility of delivering such SIGSEGV to userspace
> so that the kernel's dummy emulation can be overridden?
> It doesn't need to be a matter of this particular
> patch set, i.e. this proposal should not trigger a
> v7 resend of all 21 patches. :) But it would be useful
> for the future development of dosemu2.

What I'd actually like to see is a totally separate patchset that adds
an inheritable (but reset on exec) per-task mask of legacy
compatibility features to disable.  Maybe:

sys_adjust_compat_mask(int op, int word, u32 mask);

op could indicate that we want to do SET, OR, AND, or READ.  word
would be 0 for now.  It could be a prctl, too.

Things in the mask could include:

COMPAT_MASK0_X86_64_VSYSCALL [1]
COMPAT_MASK0_X86_UMIP_FIXUP

I'm sure I could think of more along these lines.

Then DOSEMU (and future WINE versions, too) could just mask off
X86_UMIP_FIXUP and do their own emulation.

[1] For those of you thinking about this and realizing that VSYSCALL
readability is inherently global and not per-task, I know how to fix
that for essentially no cost :)
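
A rough user-space model of how such a mask could behave, assuming that a
set bit disables the feature. The op codes, bit positions, struct
compat_state, the long long return type, and the fork/exec helpers are
hypothetical; only the (op, word, mask) argument shape and the two mask
names come from the proposal above:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical op codes and bit positions. */
enum compat_op { COMPAT_OP_SET, COMPAT_OP_OR, COMPAT_OP_AND, COMPAT_OP_READ };

#define COMPAT_MASK0_X86_64_VSYSCALL	(1u << 0)
#define COMPAT_MASK0_X86_UMIP_FIXUP	(1u << 1)
#define COMPAT_NR_WORDS			1	/* only word 0 for now */

/* Model of the per-task state; a set bit disables the feature. */
struct compat_state {
	uint32_t word[COMPAT_NR_WORDS];
};

/* Returns the resulting mask word, or a negative errno value. */
long long adjust_compat_mask(struct compat_state *st, int op, int word,
			     uint32_t mask)
{
	if (word < 0 || word >= COMPAT_NR_WORDS)
		return -EINVAL;

	switch (op) {
	case COMPAT_OP_SET:
		st->word[word] = mask;
		break;
	case COMPAT_OP_OR:
		st->word[word] |= mask;
		break;
	case COMPAT_OP_AND:
		st->word[word] &= mask;
		break;
	case COMPAT_OP_READ:
		break;
	default:
		return -EINVAL;
	}

	return st->word[word];
}

/* Inheritable: a new task starts with a copy of its parent's mask. */
void compat_fork(struct compat_state *child, const struct compat_state *parent)
{
	*child = *parent;
}

/* Reset on exec: the new program image starts with nothing disabled. */
void compat_exec(struct compat_state *st)
{
	int i;

	for (i = 0; i < COMPAT_NR_WORDS; i++)
		st->word[i] = 0;
}
```

With this shape, an emulator would OR in COMPAT_MASK0_X86_UMIP_FIXUP once
at startup and then handle the resulting faults itself.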

--Andy

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
                   ` (21 preceding siblings ...)
  2017-03-08 14:08 ` [v6 PATCH 00/21] x86: Enable " Stas Sergeev
@ 2017-03-08 16:07 ` Andy Lutomirski
  22 siblings, 0 replies; 112+ messages in thread
From: Andy Lutomirski @ 2017-03-08 16:07 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov, Peter Zijlstra, Andrew Morton, Brian Gerst,
	Chris Metcalf, Dave Hansen, Paolo Bonzini, Liang Z Li,
	Masami Hiramatsu, Huang Rui, Jiri Slaby, Jonathan Corbet,
	Michael S. Tsirkin, Paul Gortmaker, Vlastimil Babka, Chen Yucong,
	Alexandre Julliard, Stas Sergeev, Fenghua Yu, Ravi V. Shankar,
	Shuah Khan, linux-kernel, X86 ML, linux-msdos, wine-devel

On Tue, Mar 7, 2017 at 4:32 PM, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
> This is v6 of this series. The five previous submissions can be found
> here [1], here [2], here[3], here[4], and here[5]. This version addresses
> the comments received in v4 plus improvements of the handling of emulation
> in 64-bit builds. Please see details in the change log.
>

Hi Ingo and Thomas-

I think this series is in good enough shape that you should consider
making a topic branch (x86/umip?) for it so that it can soak in -next
and further development can be done incrementally.  In the unlikely
event that a major problem shows up, you could skip the pull request
to Linus for a cycle.

--Andy

* Re: [v6 PATCH 15/21] x86/mm: Relocate page fault error codes to traps.h
  2017-03-08  0:32 ` [v6 PATCH 15/21] x86/mm: Relocate page fault error codes to traps.h Ricardo Neri
@ 2017-03-08 16:08   ` Andy Lutomirski
  0 siblings, 0 replies; 112+ messages in thread
From: Andy Lutomirski @ 2017-03-08 16:08 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov, Peter Zijlstra, Andrew Morton, Brian Gerst,
	Chris Metcalf, Dave Hansen, Paolo Bonzini, Liang Z Li,
	Masami Hiramatsu, Huang Rui, Jiri Slaby, Jonathan Corbet,
	Michael S. Tsirkin, Paul Gortmaker, Vlastimil Babka, Chen Yucong,
	Alexandre Julliard, Stas Sergeev, Fenghua Yu, Ravi V. Shankar,
	Shuah Khan, linux-kernel, X86 ML, linux-msdos, wine-devel,
	Kirill A. Shutemov, Josh Poimboeuf

On Tue, Mar 7, 2017 at 4:32 PM, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
> Up to this point, only fault.c used the definitions of the page fault error
> codes. Thus, it made sense to keep them within such file. Other portions of
> code might be interested in those definitions too. For instance, the User-
> Mode Instruction Prevention emulation code will use such definitions to
> emulate a page fault when it is unable to successfully copy the results
> of the emulated instructions to user space.
>
> While relocating the error code enumeration, the prefix X86_ is used to
> make it consistent with the rest of the definitions in traps.h. Of course,
> code using the enumeration had to be updated as well. No functional changes
> were performed.
>

Reviewed-by: Andy Lutomirski <luto@kernel.org>
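
For reference, a sketch of what the prefixed enumeration could look like.
The bit layout is the architectural x86 page-fault error code; the exact
identifier spellings are assumed from the "X86_" prefix described above,
and umip_copy_fault_error_code() is a hypothetical helper showing how the
emulation code might describe a failed write to a user buffer:

```c
#include <assert.h>

/* Page-fault error code bits, carrying the X86_ prefix. */
enum x86_pf_error_code {
	X86_PF_PROT  = 1 << 0,	/* 0: no page found, 1: protection fault */
	X86_PF_WRITE = 1 << 1,	/* 0: read access,   1: write access */
	X86_PF_USER  = 1 << 2,	/* 0: kernel-mode,   1: user-mode access */
	X86_PF_RSVD  = 1 << 3,	/* 1: use of a reserved bit detected */
	X86_PF_INSTR = 1 << 4,	/* 1: fault was an instruction fetch */
};

/* Hypothetical helper: error code for a user-mode write that faulted. */
unsigned long umip_copy_fault_error_code(void)
{
	return X86_PF_USER | X86_PF_WRITE;
}
```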

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-08 16:06   ` Andy Lutomirski
@ 2017-03-08 16:29     ` Stas Sergeev
  2017-03-08 16:46       ` Andy Lutomirski
  0 siblings, 1 reply; 112+ messages in thread
From: Stas Sergeev @ 2017-03-08 16:29 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ricardo Neri, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

08.03.2017 19:06, Andy Lutomirski wrote:
> On Wed, Mar 8, 2017 at 6:08 AM, Stas Sergeev <stsp@list.ru> wrote:
>> 08.03.2017 03:32, Ricardo Neri wrote:
>>> These are the instructions covered by UMIP:
>>> * SGDT - Store Global Descriptor Table
>>> * SIDT - Store Interrupt Descriptor Table
>>> * SLDT - Store Local Descriptor Table
>>> * SMSW - Store Machine Status Word
>>> * STR - Store Task Register
>>>
>>> This patchset initially treated tasks running in virtual-8086 mode as a
>>> special case. However, I received clarification that DOSEMU[8] does not
>>> support applications that use these instructions.
> Can you remind me what was special about it?  It looks like you still
> emulate them in v8086 mode.
Indeed, sorry, I meant prot mode here. :)
So I wonder what was cited to be special about v86.

>> Yes, this is the case.
>> But at least in the past there was an attempt to
>> support SLDT as it is used by an ancient pharlap
>> DOS extender (currently unsupported by dosemu1/2).
>> So how difficult would it be to add an optional
>> possibility of delivering such SIGSEGV to userspace
>> so that the kernel's dummy emulation can be overridden?
>> It doesn't need to be a matter of this particular
>> patch set, i.e. this proposal should not trigger a
>> v7 resend of all 21 patches. :) But it would be useful
>> for the future development of dosemu2.
> What I'd actually like to see is a totally separate patchset that adds
> an inheritable (but reset on exec) per-task mask of legacy
> compatibility features to disable.  Maybe:
>
> sys_adjust_compat_mask(int op, int word, u32 mask);
No no, since I meant prot mode, this is not what I need.
I would never need to disable UMIP so as to allow the
prot mode apps to do SLDT. Instead it would be good
to have the ability to provide a replacement for the dummy
emulation that is currently being proposed for the kernel.
All that is needed for this is to deliver a SIGSEGV.

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-08 16:29     ` Stas Sergeev
@ 2017-03-08 16:46       ` Andy Lutomirski
  2017-03-08 16:53         ` Stas Sergeev
  2017-03-09  1:15         ` Ricardo Neri
  0 siblings, 2 replies; 112+ messages in thread
From: Andy Lutomirski @ 2017-03-08 16:46 UTC (permalink / raw)
  To: Stas Sergeev
  Cc: Ricardo Neri, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

On Wed, Mar 8, 2017 at 8:29 AM, Stas Sergeev <stsp@list.ru> wrote:
> 08.03.2017 19:06, Andy Lutomirski wrote:
>>
>> On Wed, Mar 8, 2017 at 6:08 AM, Stas Sergeev <stsp@list.ru> wrote:
>>>
>>> 08.03.2017 03:32, Ricardo Neri wrote:
>>>>
>>>> These are the instructions covered by UMIP:
>>>> * SGDT - Store Global Descriptor Table
>>>> * SIDT - Store Interrupt Descriptor Table
>>>> * SLDT - Store Local Descriptor Table
>>>> * SMSW - Store Machine Status Word
>>>> * STR - Store Task Register
>>>>
>>>> This patchset initially treated tasks running in virtual-8086 mode as a
>>>> special case. However, I received clarification that DOSEMU[8] does not
>>>> support applications that use these instructions.
>>
>> Can you remind me what was special about it?  It looks like you still
>> emulate them in v8086 mode.
>
> Indeed, sorry, I meant prot mode here. :)
> So I wonder what was cited to be special about v86.

Not sure.  Ricardo?

>
>>> Yes, this is the case.
>>> But at least in the past there was an attempt to
>>> support SLDT as it is used by an ancient pharlap
>>> DOS extender (currently unsupported by dosemu1/2).
>>> So how difficult would it be to add an optional
>>> possibility of delivering such SIGSEGV to userspace
>>> so that the kernel's dummy emulation can be overridden?
>>> It doesn't need to be a matter of this particular
>>> patch set, i.e. this proposal should not trigger a
>>> v7 resend of all 21 patches. :) But it would be useful
>>> for the future development of dosemu2.
>>
>> What I'd actually like to see is a totally separate patchset that adds
>> an inheritable (but reset on exec) per-task mask of legacy
>> compatibility features to disable.  Maybe:
>>
>> sys_adjust_compat_mask(int op, int word, u32 mask);
>
> No no, since I meant prot mode, this is not what I need.
> I would never need to disable UMIP as to allow the
> prot mode apps to do SLDT. Instead it would be good
> to have an ability to provide a replacement for the dummy
> emulation that is currently being proposed for kernel.
> All is needed for this, is just to deliver a SIGSEGV.

That's what I meant.  Turning off FIXUP_UMIP would leave UMIP on but
turn off the fixup, so you'd get a SIGSEGV indicating #GP (or a vm86
GP exit).
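
As a simplified model of that behavior (the enum, the function name
umip_handle_gp, and the bit position are hypothetical; it assumes the #GP
was raised by a UMIP-protected instruction and ignores emulation
failures):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position; a set bit means "fixup disabled". */
#define COMPAT_MASK0_X86_UMIP_FIXUP	(1u << 1)

enum umip_gp_result {
	UMIP_GP_EMULATED,	/* kernel supplies dummy results, no signal */
	UMIP_GP_SIGSEGV,	/* protected mode: SIGSEGV reaches the task */
	UMIP_GP_VM86_EXIT,	/* vm86: the fault is reported as a GP exit */
};

/*
 * UMIP itself stays enabled in every case; the mask bit only selects
 * whether the kernel fixes up the fault or hands it to user space.
 */
enum umip_gp_result umip_handle_gp(bool in_vm86, uint32_t compat_mask0)
{
	if (!(compat_mask0 & COMPAT_MASK0_X86_UMIP_FIXUP))
		return UMIP_GP_EMULATED;

	return in_vm86 ? UMIP_GP_VM86_EXIT : UMIP_GP_SIGSEGV;
}
```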

--Andy

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-08 16:46       ` Andy Lutomirski
@ 2017-03-08 16:53         ` Stas Sergeev
  2017-03-09  1:11           ` Ricardo Neri
  2017-03-09  1:15         ` Ricardo Neri
  1 sibling, 1 reply; 112+ messages in thread
From: Stas Sergeev @ 2017-03-08 16:53 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ricardo Neri, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

08.03.2017 19:46, Andy Lutomirski wrote:
>> No no, since I meant prot mode, this is not what I need.
>> I would never need to disable UMIP as to allow the
>> prot mode apps to do SLDT. Instead it would be good
>> to have an ability to provide a replacement for the dummy
>> emulation that is currently being proposed for kernel.
>> All is needed for this, is just to deliver a SIGSEGV.
> That's what I meant.  Turning off FIXUP_UMIP would leave UMIP on but
> turn off the fixup, so you'd get a SIGSEGV indicating #GP (or a vm86
> GP exit).
But then I am confused by the word "compat" in
your "COMPAT_MASK0_X86_UMIP_FIXUP" and
"sys_adjust_compat_mask(int op, int word, u32 mask);"

Leaving UMIP on and only disabling a fixup doesn't
sound like a compat option to me. I would expect
compat to disable it completely.

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-08 14:08 ` [v6 PATCH 00/21] x86: Enable " Stas Sergeev
  2017-03-08 16:06   ` Andy Lutomirski
@ 2017-03-09  0:46   ` Ricardo Neri
  2017-03-09 22:01     ` Stas Sergeev
  1 sibling, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-03-09  0:46 UTC (permalink / raw)
  To: Stas Sergeev
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov, Peter Zijlstra, Andrew Morton, Brian Gerst,
	Chris Metcalf, Dave Hansen, Paolo Bonzini, Liang Z Li,
	Masami Hiramatsu, Huang Rui, Jiri Slaby, Jonathan Corbet,
	Michael S. Tsirkin, Paul Gortmaker, Vlastimil Babka, Chen Yucong,
	Alexandre Julliard, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel

On Wed, 2017-03-08 at 17:08 +0300, Stas Sergeev wrote:
> 08.03.2017 03:32, Ricardo Neri wrote:
> > These are the instructions covered by UMIP:
> > * SGDT - Store Global Descriptor Table
> > * SIDT - Store Interrupt Descriptor Table
> > * SLDT - Store Local Descriptor Table
> > * SMSW - Store Machine Status Word
> > * STR - Store Task Register
> >
> > This patchset initially treated tasks running in virtual-8086 mode as a
> > special case. However, I received clarification that DOSEMU[8] does not
> > support applications that use these instructions.
> Yes, this is the case.
> But at least in the past there was an attempt to
> support SLDT as it is used by an ancient pharlap
> DOS extender (currently unsupported by dosemu1/2).
> So how difficult would it be to add an optional
> possibility of delivering such SIGSEGV to userspace
> so that the kernel's dummy emulation can be overridden?

I suppose a umip=noemulation kernel parameter could be added in this
case.

> It doesn't need to be a matter of this particular
> patch set, i.e. this proposal should not trigger a
> v7 resend of all 21 patches. :) But it would be useful
> for the future development of dosemu2.

Would dosemu2 use 32-bit processes in order to keep segmentation? If it
could use 64-bit processes, emulation is not used in this case and the
SIGSEGV is delivered to user space.

Thanks and BR,
Ricardo

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-08 16:53         ` Stas Sergeev
@ 2017-03-09  1:11           ` Ricardo Neri
  2017-03-09 22:05             ` Stas Sergeev
  2017-03-10  2:41             ` Andy Lutomirski
  0 siblings, 2 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-03-09  1:11 UTC (permalink / raw)
  To: Stas Sergeev
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

On Wed, 2017-03-08 at 19:53 +0300, Stas Sergeev wrote:
> 08.03.2017 19:46, Andy Lutomirski wrote:
> >> No no, since I meant prot mode, this is not what I need.
> >> I would never need to disable UMIP as to allow the
> >> prot mode apps to do SLDT. Instead it would be good
> >> to have an ability to provide a replacement for the dummy
> >> emulation that is currently being proposed for kernel.
> >> All is needed for this, is just to deliver a SIGSEGV.
> > That's what I meant.  Turning off FIXUP_UMIP would leave UMIP on but
> > turn off the fixup, so you'd get a SIGSEGV indicating #GP (or a vm86
> > GP exit).
> But then I am confused with the word "compat" in
> your "COMPAT_MASK0_X86_UMIP_FIXUP" and
> "sys_adjust_compat_mask(int op, int word, u32 mask);"
> 
> Leaving UMIP on and only disabling a fixup doesn't
> sound like a compat option to me. I would expect
> compat to disable it completely.

I guess that the _UMIP_FIXUP part makes it clear that emulation, not
UMIP, is disabled, allowing the SIGSEGV to be delivered to the user space
program.

Would having a COMPAT_MASK0_X86_UMIP_FIXUP to disable emulation and a
COMPAT_MASK0_X86_UMIP to disable UMIP make sense?

Also, wouldn't having a COMPAT_MASK0_X86_UMIP to disable UMIP defeat its
purpose? Applications could simply use this compat mask to bypass UMIP
and gain access to the instructions it protects.

Thanks and BR,
Ricardo

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-08 16:46       ` Andy Lutomirski
  2017-03-08 16:53         ` Stas Sergeev
@ 2017-03-09  1:15         ` Ricardo Neri
  2017-03-09 22:10           ` Stas Sergeev
  1 sibling, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-03-09  1:15 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Stas Sergeev, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

On Wed, 2017-03-08 at 08:46 -0800, Andy Lutomirski wrote:
> On Wed, Mar 8, 2017 at 8:29 AM, Stas Sergeev <stsp@list.ru> wrote:
> > 08.03.2017 19:06, Andy Lutomirski wrote:
> >>
> >> On Wed, Mar 8, 2017 at 6:08 AM, Stas Sergeev <stsp@list.ru> wrote:
> >>>
> >>> 08.03.2017 03:32, Ricardo Neri wrote:
> >>>>
> >>>> These are the instructions covered by UMIP:
> >>>> * SGDT - Store Global Descriptor Table
> >>>> * SIDT - Store Interrupt Descriptor Table
> >>>> * SLDT - Store Local Descriptor Table
> >>>> * SMSW - Store Machine Status Word
> >>>> * STR - Store Task Register
> >>>>
> >>>> This patchset initially treated tasks running in virtual-8086 mode as a
> >>>> special case. However, I received clarification that DOSEMU[8] does not
> >>>> support applications that use these instructions.
> >>
> >> Can you remind me what was special about it?  It looks like you still
> >> emulate them in v8086 mode.
> >
> > Indeed, sorry, I meant prot mode here. :)
> > So I wonder what was cited to be special about v86.

Initially my patches disabled UMIP on virtual-8086 instructions, without
regard to protected mode (i.e., UMIP was always enabled). I didn't have
emulation at the time. Then, I added emulation code that now covers
protected and virtual-8086 modes. I guess it is not special anymore.

Thanks and BR,
Ricardo

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-09  0:46   ` Ricardo Neri
@ 2017-03-09 22:01     ` Stas Sergeev
  2017-03-10 23:47       ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Stas Sergeev @ 2017-03-09 22:01 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov, Peter Zijlstra, Andrew Morton, Brian Gerst,
	Chris Metcalf, Dave Hansen, Paolo Bonzini, Liang Z Li,
	Masami Hiramatsu, Huang Rui, Jiri Slaby, Jonathan Corbet,
	Michael S. Tsirkin, Paul Gortmaker, Vlastimil Babka, Chen Yucong,
	Alexandre Julliard, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel

09.03.2017 03:46, Ricardo Neri wrote:
> On Wed, 2017-03-08 at 17:08 +0300, Stas Sergeev wrote:
>> 08.03.2017 03:32, Ricardo Neri wrote:
>>> These are the instructions covered by UMIP:
>>> * SGDT - Store Global Descriptor Table
>>> * SIDT - Store Interrupt Descriptor Table
>>> * SLDT - Store Local Descriptor Table
>>> * SMSW - Store Machine Status Word
>>> * STR - Store Task Register
>>>
>>> This patchset initially treated tasks running in virtual-8086 mode as a
>>> special case. However, I received clarification that DOSEMU[8] does not
>>> support applications that use these instructions.
>> Yes, this is the case.
>> But at least in the past there was an attempt to
>> support SLDT as it is used by an ancient pharlap
>> DOS extender (currently unsupported by dosemu1/2).
>> So how difficult would it be to add an optional
>> possibility of delivering such SIGSEGV to userspace
>> so that the kernel's dummy emulation can be overridden?
> I suppose a umip=noemulation kernel parameter could be added in this
> case.
Why?
It doesn't need to be global: the app should be
able to change that on its own. Note that no app currently
requires this, so it's just for the future, and in the
future the app can start using the new API for this,
if you provide one.


>> It doesn't need to be a matter of this particular
>> patch set, i.e. this proposal should not trigger a
>> v7 resend of all 21 patches. :) But it would be useful
>> for the future development of dosemu2.
> Would dosemu2 use 32-bit processes in order to keep segmentation? If it
> could use 64-bit processes, emulation is not used in this case and the
> SIGSEGV is delivered to user space.
It uses a mix: a 64-bit process, but with some 32-bit
segments for DOS code.

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-09  1:11           ` Ricardo Neri
@ 2017-03-09 22:05             ` Stas Sergeev
  2017-03-10  2:41             ` Andy Lutomirski
  1 sibling, 0 replies; 112+ messages in thread
From: Stas Sergeev @ 2017-03-09 22:05 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

09.03.2017 04:11, Ricardo Neri wrote:
> On Wed, 2017-03-08 at 19:53 +0300, Stas Sergeev wrote:
>> 08.03.2017 19:46, Andy Lutomirski wrote:
>>>> No no, since I meant prot mode, this is not what I need.
>>>> I would never need to disable UMIP as to allow the
>>>> prot mode apps to do SLDT. Instead it would be good
>>>> to have an ability to provide a replacement for the dummy
>>>> emulation that is currently being proposed for kernel.
>>>> All is needed for this, is just to deliver a SIGSEGV.
>>> That's what I meant.  Turning off FIXUP_UMIP would leave UMIP on but
>>> turn off the fixup, so you'd get a SIGSEGV indicating #GP (or a vm86
>>> GP exit).
>> But then I am confused with the word "compat" in
>> your "COMPAT_MASK0_X86_UMIP_FIXUP" and
>> "sys_adjust_compat_mask(int op, int word, u32 mask);"
>>
>> Leaving UMIP on and only disabling a fixup doesn't
>> sound like a compat option to me. I would expect
>> compat to disable it completely.
> I guess that the _UMIP_FIXUP part makes it clear that emulation, not
> UMIP is disabled, allowing the SIGSEGV be delivered to the user space
> program.
>
> Would having a COMPAT_MASK0_X86_UMIP_FIXUP to disable emulation and a
> COMPAT_MASK0_X86_UMIP to disable UMIP make sense?
>
> Also, wouldn't having a COMPAT_MASK0_X86_UMIP to disable UMIP defeat its
> purpose? Applications could simply use this compat mask to bypass UMIP
> and gain access to the instructions it protects.
I don't think anyone will want to completely disable
UMIP, so why do you need such functionality?
My question was only what "compat" means
in "COMPAT_MASK0_X86_UMIP_FIXUP": compat with what?

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-09  1:15         ` Ricardo Neri
@ 2017-03-09 22:10           ` Stas Sergeev
  2017-03-10  2:39             ` Andy Lutomirski
  0 siblings, 1 reply; 112+ messages in thread
From: Stas Sergeev @ 2017-03-09 22:10 UTC (permalink / raw)
  To: Ricardo Neri, Andy Lutomirski
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov, Peter Zijlstra, Andrew Morton, Brian Gerst,
	Chris Metcalf, Dave Hansen, Paolo Bonzini, Liang Z Li,
	Masami Hiramatsu, Huang Rui, Jiri Slaby, Jonathan Corbet,
	Michael S. Tsirkin, Paul Gortmaker, Vlastimil Babka, Chen Yucong,
	Alexandre Julliard, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, X86 ML, linux-msdos, wine-devel

09.03.2017 04:15, Ricardo Neri wrote:
> On Wed, 2017-03-08 at 08:46 -0800, Andy Lutomirski wrote:
>> On Wed, Mar 8, 2017 at 8:29 AM, Stas Sergeev <stsp@list.ru> wrote:
>>> 08.03.2017 19:06, Andy Lutomirski wrote:
>>>> On Wed, Mar 8, 2017 at 6:08 AM, Stas Sergeev <stsp@list.ru> wrote:
>>>>> 08.03.2017 03:32, Ricardo Neri wrote:
>>>>>> These are the instructions covered by UMIP:
>>>>>> * SGDT - Store Global Descriptor Table
>>>>>> * SIDT - Store Interrupt Descriptor Table
>>>>>> * SLDT - Store Local Descriptor Table
>>>>>> * SMSW - Store Machine Status Word
>>>>>> * STR - Store Task Register
>>>>>>
>>>>>> This patchset initially treated tasks running in virtual-8086 mode as a
>>>>>> special case. However, I received clarification that DOSEMU[8] does not
>>>>>> support applications that use these instructions.
>>>> Can you remind me what was special about it?  It looks like you still
>>>> emulate them in v8086 mode.
>>> Indeed, sorry, I meant prot mode here. :)
>>> So I wonder what was cited to be special about v86.
> Initially my patches disabled UMIP on virtual-8086 instructions, without
> regards of protected mode (i.e., UMIP was always enabled). I didn't have
> emulation at the time. Then, I added emulation code that now covers
> protected and virtual-8086 modes. I guess it is not special anymore.
But don't SLDT & friends just throw #UD in v86?
How does UMIP affect this? How does your patch affect
this?

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-09 22:10           ` Stas Sergeev
@ 2017-03-10  2:39             ` Andy Lutomirski
  2017-03-10 11:33               ` Stas Sergeev
  2017-03-10 23:58               ` Ricardo Neri
  0 siblings, 2 replies; 112+ messages in thread
From: Andy Lutomirski @ 2017-03-10  2:39 UTC (permalink / raw)
  To: Stas Sergeev
  Cc: Ricardo Neri, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

On Thu, Mar 9, 2017 at 2:10 PM, Stas Sergeev <stsp@list.ru> wrote:
> 09.03.2017 04:15, Ricardo Neri wrote:
>
>> On Wed, 2017-03-08 at 08:46 -0800, Andy Lutomirski wrote:
>>>
>>> On Wed, Mar 8, 2017 at 8:29 AM, Stas Sergeev <stsp@list.ru> wrote:
>>>>
>>>> 08.03.2017 19:06, Andy Lutomirski wrote:
>>>>>
>>>>> On Wed, Mar 8, 2017 at 6:08 AM, Stas Sergeev <stsp@list.ru> wrote:
>>>>>>
>>>>>> 08.03.2017 03:32, Ricardo Neri wrote:
>>>>>>>
>>>>>>> These are the instructions covered by UMIP:
>>>>>>> * SGDT - Store Global Descriptor Table
>>>>>>> * SIDT - Store Interrupt Descriptor Table
>>>>>>> * SLDT - Store Local Descriptor Table
>>>>>>> * SMSW - Store Machine Status Word
>>>>>>> * STR - Store Task Register
>>>>>>>
>>>>>>> This patchset initially treated tasks running in virtual-8086 mode as a
>>>>>>> special case. However, I received clarification that DOSEMU[8] does not
>>>>>>> support applications that use these instructions.
>>>>>
>>>>> Can you remind me what was special about it?  It looks like you still
>>>>> emulate them in v8086 mode.
>>>>
>>>> Indeed, sorry, I meant prot mode here. :)
>>>> So I wonder what was cited to be special about v86.
>>
>> Initially my patches disabled UMIP on virtual-8086 instructions, without
>> regards of protected mode (i.e., UMIP was always enabled). I didn't have
>> emulation at the time. Then, I added emulation code that now covers
>> protected and virtual-8086 modes. I guess it is not special anymore.
>
> But isn't SLDT&friends just throw UD in v86?
> How does UMIP affect this? How does your patch affect
> this?

Er, right.  Ricardo, your code may need fixing.  But don't you have a
test case for this?  The behavior should be the same with and without
your patches applied.  The exception is #UD, not #GP, so maybe your
code just never executes in the vm86 case.
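
To spell out the expectation, here is a small model of which exception
each instruction should raise in virtual-8086 mode, assuming (as discussed
above) that SLDT and STR are not recognized there and so never reach the
UMIP check. The enum and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

enum umip_insn { INSN_SGDT, INSN_SIDT, INSN_SLDT, INSN_SMSW, INSN_STR };
enum insn_fault { FAULT_NONE, FAULT_UD, FAULT_GP };

/* Exception expected when insn is executed in virtual-8086 mode. */
enum insn_fault vm86_expected_fault(enum umip_insn insn, bool umip_enabled)
{
	/* Not recognized in vm86: #UD with or without UMIP. */
	if (insn == INSN_SLDT || insn == INSN_STR)
		return FAULT_UD;

	/* SGDT, SIDT and SMSW are valid in vm86; UMIP turns them into #GP. */
	return umip_enabled ? FAULT_GP : FAULT_NONE;
}
```

Under this model only the SGDT, SIDT and SMSW cases can ever enter the #GP
fixup path from vm86, which matches the three instructions the selftest
exercises.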

--Andy

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-09  1:11           ` Ricardo Neri
  2017-03-09 22:05             ` Stas Sergeev
@ 2017-03-10  2:41             ` Andy Lutomirski
  2017-03-10 10:30               ` Stas Sergeev
  1 sibling, 1 reply; 112+ messages in thread
From: Andy Lutomirski @ 2017-03-10  2:41 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Stas Sergeev, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Masami Hiramatsu, Huang Rui, Jiri Slaby, Jonathan Corbet,
	Michael S. Tsirkin, Paul Gortmaker, Vlastimil Babka, Chen Yucong,
	Alexandre Julliard, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, X86 ML, linux-msdos, wine-devel

On Wed, Mar 8, 2017 at 5:11 PM, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
> On Wed, 2017-03-08 at 19:53 +0300, Stas Sergeev wrote:
>> 08.03.2017 19:46, Andy Lutomirski wrote:
>> >> No no, since I meant prot mode, this is not what I need.
>> >> I would never need to disable UMIP as to allow the
>> >> prot mode apps to do SLDT. Instead it would be good
>> >> to have an ability to provide a replacement for the dummy
>> >> emulation that is currently being proposed for kernel.
>> >> All is needed for this, is just to deliver a SIGSEGV.
>> > That's what I meant.  Turning off FIXUP_UMIP would leave UMIP on but
>> > turn off the fixup, so you'd get a SIGSEGV indicating #GP (or a vm86
>> > GP exit).
>> But then I am confused with the word "compat" in
>> your "COMPAT_MASK0_X86_UMIP_FIXUP" and
>> "sys_adjust_compat_mask(int op, int word, u32 mask);"
>>
>> Leaving UMIP on and only disabling a fixup doesn't
>> sound like a compat option to me. I would expect
>> compat to disable it completely.
>
> I guess that the _UMIP_FIXUP part makes it clear that emulation, not
> UMIP is disabled, allowing the SIGSEGV be delivered to the user space
> program.
>
> Would having a COMPAT_MASK0_X86_UMIP_FIXUP to disable emulation and a
> COMPAT_MASK0_X86_UMIP to disable UMIP make sense?
>
> Also, wouldn't having a COMPAT_MASK0_X86_UMIP to disable UMIP defeat its
> purpose? Applications could simply use this compat mask to bypass UMIP
> and gain access to the instructions it protects.
>

I was obviously extremely unclear.  The point of the proposed syscall
is to let programs opt out of legacy features.  So there would be a
bit to disable emulation of UMIP-blocked instructions (thus giving the
unadulterated #GP).  There would not be a bit to disable UMIP itself.

There's also a flaw in my proposal.  Disable-vsyscall would be per-mm
and disable-umip-emulation would be per-task, so they'd need to be in
separate words to make any sense.  I'll ponder this a bit more.
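
To make the two-word idea concrete, here is a rough userspace model of it,
not a real kernel interface.  Only sys_adjust_compat_mask(op, word, mask)
and the UMIP_FIXUP name come from this thread; the op values, the word
indices, the MASK1 name and the clear-only rule are assumptions made up
for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical layout: one word per scope, so per-mm and per-task bits never mix. */
enum compat_word {
	COMPAT_WORD_MM = 0,	/* per-mm bits, e.g. vsyscall emulation */
	COMPAT_WORD_TASK = 1,	/* per-task bits, e.g. UMIP fixup */
	COMPAT_NR_WORDS
};

/* Hypothetical bit and op values; only the general naming echoes the thread. */
#define COMPAT_MASK0_X86_VSYSCALL	(1u << 0)
#define COMPAT_MASK1_X86_UMIP_FIXUP	(1u << 0)

#define COMPAT_OP_GET		0
#define COMPAT_OP_DISABLE	1

struct compat_state {
	uint32_t word[COMPAT_NR_WORDS];	/* set bit = legacy feature still enabled */
};

/*
 * Model of sys_adjust_compat_mask(op, word, mask).  Bits can only be
 * cleared: a program may opt out of a legacy feature (here, the UMIP
 * fixup), but nothing in this interface turns UMIP itself off.
 */
static long adjust_compat_mask(struct compat_state *st, int op, int word,
			       uint32_t mask)
{
	if (word < 0 || word >= COMPAT_NR_WORDS)
		return -EINVAL;

	switch (op) {
	case COMPAT_OP_GET:
		return st->word[word] & mask;
	case COMPAT_OP_DISABLE:
		st->word[word] &= ~mask;
		return 0;
	default:
		return -EINVAL;
	}
}
```

With the UMIP fixup bit cleared in the per-task word, the task would see
the plain #GP as a SIGSEGV, while the per-mm word stays untouched.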


* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-10  2:41             ` Andy Lutomirski
@ 2017-03-10 10:30               ` Stas Sergeev
  2017-03-10 21:04                 ` Andy Lutomirski
  0 siblings, 1 reply; 112+ messages in thread
From: Stas Sergeev @ 2017-03-10 10:30 UTC (permalink / raw)
  To: Andy Lutomirski, Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Borislav Petkov,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

10.03.2017 05:41, Andy Lutomirski wrote:
> On Wed, Mar 8, 2017 at 5:11 PM, Ricardo Neri
> <ricardo.neri-calderon@linux.intel.com> wrote:
>> On Wed, 2017-03-08 at 19:53 +0300, Stas Sergeev wrote:
>>> 08.03.2017 19:46, Andy Lutomirski wrote:
>>>>> No no, since I meant prot mode, this is not what I need.
>>>>> I would never need to disable UMIP as to allow the
>>>>> prot mode apps to do SLDT. Instead it would be good
>>>>> to have an ability to provide a replacement for the dummy
>>>>> emulation that is currently being proposed for kernel.
>>>>> All is needed for this, is just to deliver a SIGSEGV.
>>>> That's what I meant.  Turning off FIXUP_UMIP would leave UMIP on but
>>>> turn off the fixup, so you'd get a SIGSEGV indicating #GP (or a vm86
>>>> GP exit).
>>> But then I am confused with the word "compat" in
>>> your "COMPAT_MASK0_X86_UMIP_FIXUP" and
>>> "sys_adjust_compat_mask(int op, int word, u32 mask);"
>>>
>>> Leaving UMIP on and only disabling a fixup doesn't
>>> sound like a compat option to me. I would expect
>>> compat to disable it completely.
>> I guess that the _UMIP_FIXUP part makes it clear that emulation, not
>> UMIP is disabled, allowing the SIGSEGV be delivered to the user space
>> program.
>>
>> Would having a COMPAT_MASK0_X86_UMIP_FIXUP to disable emulation and a
>> COMPAT_MASK0_X86_UMIP to disable UMIP make sense?
>>
>> Also, wouldn't having a COMPAT_MASK0_X86_UMIP to disable UMIP defeat its
>> purpose? Applications could simply use this compat mask to bypass UMIP
>> and gain access to the instructions it protects.
>>
> I was obviously extremely unclear.  The point of the proposed syscall
> is to let programs opt out of legacy features.
I guess both "compat" and "legacy" are misleading
here. Maybe these are "x86-specific" or "hypervisor-specific",
but merely enabling UMIP doesn't immediately make
the use of the SLDT instruction legacy, IMHO.

>   I'll ponder this a bit more.
So if we are to invent something new, it would be nice to
also think up a clear terminology for it. Maybe something
like "X86_FEATURE_xxx_MASK" or the like.


* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-10  2:39             ` Andy Lutomirski
@ 2017-03-10 11:33               ` Stas Sergeev
  2017-03-10 14:17                 ` Andy Lutomirski
  2017-03-10 23:59                 ` Ricardo Neri
  2017-03-10 23:58               ` Ricardo Neri
  1 sibling, 2 replies; 112+ messages in thread
From: Stas Sergeev @ 2017-03-10 11:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ricardo Neri, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

10.03.2017 05:39, Andy Lutomirski wrote:
> On Thu, Mar 9, 2017 at 2:10 PM, Stas Sergeev <stsp@list.ru> wrote:
>> 09.03.2017 04:15, Ricardo Neri wrote:
>>
>>> On Wed, 2017-03-08 at 08:46 -0800, Andy Lutomirski wrote:
>>>> On Wed, Mar 8, 2017 at 8:29 AM, Stas Sergeev <stsp@list.ru> wrote:
>>>>> 08.03.2017 19:06, Andy Lutomirski wrote:
>>>>>> On Wed, Mar 8, 2017 at 6:08 AM, Stas Sergeev <stsp@list.ru> wrote:
>>>>>>> 08.03.2017 03:32, Ricardo Neri wrote:
>>>>>>>> These are the instructions covered by UMIP:
>>>>>>>> * SGDT - Store Global Descriptor Table
>>>>>>>> * SIDT - Store Interrupt Descriptor Table
>>>>>>>> * SLDT - Store Local Descriptor Table
>>>>>>>> * SMSW - Store Machine Status Word
>>>>>>>> * STR - Store Task Register
>>>>>>>>
>>>>>>>> This patchset initially treated tasks running in virtual-8086 mode as a
>>>>>>>> special case. However, I received clarification that DOSEMU[8] does not
>>>>>>>> support applications that use these instructions.
>>>>>> Can you remind me what was special about it?  It looks like you still
>>>>>> emulate them in v8086 mode.
>>>>> Indeed, sorry, I meant prot mode here. :)
>>>>> So I wonder what was cited to be special about v86.
>>> Initially my patches disabled UMIP on virtual-8086 instructions, without
>>> regards of protected mode (i.e., UMIP was always enabled). I didn't have
>>> emulation at the time. Then, I added emulation code that now covers
>>> protected and virtual-8086 modes. I guess it is not special anymore.
>> But isn't SLDT&friends just throw UD in v86?
>> How does UMIP affect this? How does your patch affect
>> this?
> Er, right.  Ricardo, your code may need fixing.  But don't you have a
> test case for this?
Why would you need one?
Or do you really want to allow these instructions
in v86 by means of emulation? If so, this wasn't
clearly stated in the patch description, nor was it
properly discussed, it seems.


* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-10 11:33               ` Stas Sergeev
@ 2017-03-10 14:17                 ` Andy Lutomirski
  2017-03-11  1:22                   ` Ricardo Neri
  2017-03-10 23:59                 ` Ricardo Neri
  1 sibling, 1 reply; 112+ messages in thread
From: Andy Lutomirski @ 2017-03-10 14:17 UTC (permalink / raw)
  To: Stas Sergeev
  Cc: Ricardo Neri, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

On Fri, Mar 10, 2017 at 3:33 AM, Stas Sergeev <stsp@list.ru> wrote:
> 10.03.2017 05:39, Andy Lutomirski wrote:
>
>> On Thu, Mar 9, 2017 at 2:10 PM, Stas Sergeev <stsp@list.ru> wrote:
>>>
>>> 09.03.2017 04:15, Ricardo Neri wrote:
>>>
>>>> On Wed, 2017-03-08 at 08:46 -0800, Andy Lutomirski wrote:
>>>>>
>>>>> On Wed, Mar 8, 2017 at 8:29 AM, Stas Sergeev <stsp@list.ru> wrote:
>>>>>>
>>>>>> 08.03.2017 19:06, Andy Lutomirski wrote:
>>>>>>>
>>>>>>> On Wed, Mar 8, 2017 at 6:08 AM, Stas Sergeev <stsp@list.ru> wrote:
>>>>>>>>
>>>>>>>> 08.03.2017 03:32, Ricardo Neri wrote:
>>>>>>>>>
>>>>>>>>> These are the instructions covered by UMIP:
>>>>>>>>> * SGDT - Store Global Descriptor Table
>>>>>>>>> * SIDT - Store Interrupt Descriptor Table
>>>>>>>>> * SLDT - Store Local Descriptor Table
>>>>>>>>> * SMSW - Store Machine Status Word
>>>>>>>>> * STR - Store Task Register
>>>>>>>>>
>>>>>>>>> This patchset initially treated tasks running in virtual-8086 mode as a
>>>>>>>>> special case. However, I received clarification that DOSEMU[8] does not
>>>>>>>>> support applications that use these instructions.
>>>>>>>
>>>>>>> Can you remind me what was special about it?  It looks like you still
>>>>>>> emulate them in v8086 mode.
>>>>>>
>>>>>> Indeed, sorry, I meant prot mode here. :)
>>>>>> So I wonder what was cited to be special about v86.
>>>>
>>>> Initially my patches disabled UMIP on virtual-8086 instructions, without
>>>> regards of protected mode (i.e., UMIP was always enabled). I didn't have
>>>> emulation at the time. Then, I added emulation code that now covers
>>>> protected and virtual-8086 modes. I guess it is not special anymore.
>>>
>>> But isn't SLDT&friends just throw UD in v86?
>>> How does UMIP affect this? How does your patch affect
>>> this?
>>
>> Er, right.  Ricardo, your code may need fixing.  But don't you have a
>> test case for this?
>
> Why would you need one?
> Or do you really want to allow these instructions
> in v86 by the means of emulation? If so - this wasn't
> clearly stated in the patch description, neither it was
> properly discussed, it seems.

What I meant was: if the patches incorrectly started making these
instructions work in vm86 mode where they used to cause a vm86 exit,
then that's a bug that the selftest should have caught.


* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-10 10:30               ` Stas Sergeev
@ 2017-03-10 21:04                 ` Andy Lutomirski
  2017-03-10 21:37                   ` Stas Sergeev
  0 siblings, 1 reply; 112+ messages in thread
From: Andy Lutomirski @ 2017-03-10 21:04 UTC (permalink / raw)
  To: Stas Sergeev
  Cc: Andy Lutomirski, Ricardo Neri, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Masami Hiramatsu, Huang Rui, Jiri Slaby, Jonathan Corbet,
	Michael S. Tsirkin, Paul Gortmaker, Vlastimil Babka, Chen Yucong,
	Alexandre Julliard, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, X86 ML, linux-msdos, wine-devel

On Fri, Mar 10, 2017 at 2:30 AM, Stas Sergeev <stsp@list.ru> wrote:
> 10.03.2017 05:41, Andy Lutomirski wrote:
>
>> On Wed, Mar 8, 2017 at 5:11 PM, Ricardo Neri
>> <ricardo.neri-calderon@linux.intel.com> wrote:
>>>
>>> On Wed, 2017-03-08 at 19:53 +0300, Stas Sergeev wrote:
>>>>
>>>> 08.03.2017 19:46, Andy Lutomirski wrote:
>>>>>>
>>>>>> No no, since I meant prot mode, this is not what I need.
>>>>>> I would never need to disable UMIP as to allow the
>>>>>> prot mode apps to do SLDT. Instead it would be good
>>>>>> to have an ability to provide a replacement for the dummy
>>>>>> emulation that is currently being proposed for kernel.
>>>>>> All is needed for this, is just to deliver a SIGSEGV.
>>>>>
>>>>> That's what I meant.  Turning off FIXUP_UMIP would leave UMIP on but
>>>>> turn off the fixup, so you'd get a SIGSEGV indicating #GP (or a vm86
>>>>> GP exit).
>>>>
>>>> But then I am confused with the word "compat" in
>>>> your "COMPAT_MASK0_X86_UMIP_FIXUP" and
>>>> "sys_adjust_compat_mask(int op, int word, u32 mask);"
>>>>
>>>> Leaving UMIP on and only disabling a fixup doesn't
>>>> sound like a compat option to me. I would expect
>>>> compat to disable it completely.
>>>
>>> I guess that the _UMIP_FIXUP part makes it clear that emulation, not
>>> UMIP is disabled, allowing the SIGSEGV be delivered to the user space
>>> program.
>>>
>>> Would having a COMPAT_MASK0_X86_UMIP_FIXUP to disable emulation and a
>>> COMPAT_MASK0_X86_UMIP to disable UMIP make sense?
>>>
>>> Also, wouldn't having a COMPAT_MASK0_X86_UMIP to disable UMIP defeat its
>>> purpose? Applications could simply use this compat mask to bypass UMIP
>>> and gain access to the instructions it protects.
>>>
>> I was obviously extremely unclear.  The point of the proposed syscall
>> is to let programs opt out of legacy features.
>
> I guess both "compat" and "legacy" are misleading
> here. Maybe these are "x86-specific" or "hypervisor-specific",
> but a mere enabling of UMIP doesn't immediately make
> the use of SLDT instruction a legacy IMHO.

Sure it is. :)  Using SLDT from user mode is a legacy ability that
just happens to still work on existing CPUs and kernels.  Once UMIP
goes in, it will officially be obsolete -- it will just be supported
for backwards compatibility.  New code should opt out and emulate in
usermode if needed.  (And the vast, vast majority of Linux programs
don't use these instructions in the first place.)

Similarly, vsyscalls were obsolete as soon as better alternatives
were fully supported and the kernel started making them slow, and the
fact that new static glibc programs still used them for a little while
didn't make them any less obsolete.

>
>>   I'll ponder this a bit more.
>
> So if we are to invent something new, it would be nice to
> also think up a clear terminology for it. Maybe something
> like "X86_FEATURE_xxx_MASK" or alike.

But they're misfeatures, not features.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-10 21:04                 ` Andy Lutomirski
@ 2017-03-10 21:37                   ` Stas Sergeev
  0 siblings, 0 replies; 112+ messages in thread
From: Stas Sergeev @ 2017-03-10 21:37 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Ricardo Neri, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Masami Hiramatsu, Huang Rui, Jiri Slaby, Jonathan Corbet,
	Michael S. Tsirkin, Paul Gortmaker, Vlastimil Babka, Chen Yucong,
	Alexandre Julliard, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, X86 ML, linux-msdos, wine-devel

11.03.2017 00:04, Andy Lutomirski wrote:
> On Fri, Mar 10, 2017 at 2:30 AM, Stas Sergeev <stsp@list.ru> wrote:
>> 10.03.2017 05:41, Andy Lutomirski wrote:
>>
>>> On Wed, Mar 8, 2017 at 5:11 PM, Ricardo Neri
>>> <ricardo.neri-calderon@linux.intel.com> wrote:
>>>> On Wed, 2017-03-08 at 19:53 +0300, Stas Sergeev wrote:
>>>>> 08.03.2017 19:46, Andy Lutomirski wrote:
>>>>>>> No no, since I meant prot mode, this is not what I need.
>>>>>>> I would never need to disable UMIP as to allow the
>>>>>>> prot mode apps to do SLDT. Instead it would be good
>>>>>>> to have an ability to provide a replacement for the dummy
>>>>>>> emulation that is currently being proposed for kernel.
>>>>>>> All is needed for this, is just to deliver a SIGSEGV.
>>>>>> That's what I meant.  Turning off FIXUP_UMIP would leave UMIP on but
>>>>>> turn off the fixup, so you'd get a SIGSEGV indicating #GP (or a vm86
>>>>>> GP exit).
>>>>> But then I am confused with the word "compat" in
>>>>> your "COMPAT_MASK0_X86_UMIP_FIXUP" and
>>>>> "sys_adjust_compat_mask(int op, int word, u32 mask);"
>>>>>
>>>>> Leaving UMIP on and only disabling a fixup doesn't
>>>>> sound like a compat option to me. I would expect
>>>>> compat to disable it completely.
>>>> I guess that the _UMIP_FIXUP part makes it clear that emulation, not
>>>> UMIP is disabled, allowing the SIGSEGV be delivered to the user space
>>>> program.
>>>>
>>>> Would having a COMPAT_MASK0_X86_UMIP_FIXUP to disable emulation and a
>>>> COMPAT_MASK0_X86_UMIP to disable UMIP make sense?
>>>>
>>>> Also, wouldn't having a COMPAT_MASK0_X86_UMIP to disable UMIP defeat its
>>>> purpose? Applications could simply use this compat mask to bypass UMIP
>>>> and gain access to the instructions it protects.
>>>>
>>> I was obviously extremely unclear.  The point of the proposed syscall
>>> is to let programs opt out of legacy features.
>> I guess both "compat" and "legacy" are misleading
>> here. Maybe these are "x86-specific" or "hypervisor-specific",
>> but a mere enabling of UMIP doesn't immediately make
>> the use of SLDT instruction a legacy IMHO.
> Sure it is. :)  Using SLDT from user mode is a legacy ability that
> just happens to still work on existing CPUs and kernels.  Once UMIP
> goes in, it will officially be obsolete
Yes, but the names you suggest imply that "UMIP_FIXUP"
is legacy or compat, which I find misleading because it has
just appeared. Maybe something like "COMPAT_X86_UMIP_INSNS_EMU"?


* Re: [v6 PATCH 21/21] selftests/x86: Add tests for User-Mode Instruction Prevention
  2017-03-08 15:56   ` Andy Lutomirski
@ 2017-03-10 23:38     ` Ricardo Neri
  0 siblings, 0 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-03-10 23:38 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov, Peter Zijlstra, Andrew Morton, Brian Gerst,
	Chris Metcalf, Dave Hansen, Paolo Bonzini, Liang Z Li,
	Masami Hiramatsu, Huang Rui, Jiri Slaby, Jonathan Corbet,
	Michael S. Tsirkin, Paul Gortmaker, Vlastimil Babka, Chen Yucong,
	Alexandre Julliard, Stas Sergeev, Fenghua Yu, Ravi V. Shankar,
	Shuah Khan, linux-kernel, X86 ML, linux-msdos, wine-devel

On Wed, 2017-03-08 at 07:56 -0800, Andy Lutomirski wrote:
> On Tue, Mar 7, 2017 at 4:32 PM, Ricardo Neri
> <ricardo.neri-calderon@linux.intel.com> wrote:
> > Certain user space programs that run on virtual-8086 mode may utilize
> > instructions protected by the User-Mode Instruction Prevention (UMIP)
> > security feature present in new Intel processors: SGDT, SIDT and SMSW. In
> > such a case, a general protection fault is issued if UMIP is enabled. When
> > such a fault happens, the kernel catches it and emulates the results of
> > these instructions with dummy values. The purpose of this new
> > test is to verify whether the impacted instructions can be executed without
> > causing such #GP. If no #GP exceptions occur, we expect to exit virtual-
> > 8086 mode from INT 0x80.
> >
> > The instructions protected by UMIP are executed in representative use
> > cases:
> >  a) the memory address of the result is given in the form of a displacement
> >     from the base of the data segment
> >  b) the memory address of the result is given in a general purpose register
> >  c) the result is stored directly in a general purpose register.
> >
> > Unfortunately, it is not possible to check the results against a set of
> > expected values because no emulation will occur in systems that do not have
> > the UMIP feature. Instead, results are printed for verification.
> 
> You could pre-initialize the result buffer to a bunch of non-matching
> values (1, 2, 3, ...) and then check that all the invocations of the
> same instruction gave the same value.

Yes, I can do this. Alternatively, I can check in the test program if
the CPU has UMIP and only run the tests in that case.
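
A rough sketch of the check suggested above, kept separate from the code
that actually executes the instructions. The helper names and the buffer
size are hypothetical and are not taken from the selftest in this series:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* SGDT/SIDT store 6 bytes in 16/32-bit modes and 10 bytes in 64-bit mode. */
#define RESULT_LEN 10

/* Fill each result slot with a distinct pattern (1, 2, 3, ...) before the run. */
static void poison_results(unsigned char res[][RESULT_LEN], size_t n)
{
	for (size_t i = 0; i < n; i++)
		memset(res[i], (int)(i + 1), RESULT_LEN);
}

/*
 * Return true only if every invocation stored the same first len bytes
 * as the first invocation.  A slot that was never written still holds
 * its poison pattern and therefore shows up as a mismatch.
 */
static bool results_match(unsigned char res[][RESULT_LEN], size_t n, size_t len)
{
	for (size_t i = 1; i < n; i++)
		if (memcmp(res[0], res[i], len))
			return false;
	return true;
}
```

This works with or without UMIP, since it only compares the invocations
against each other and never against a fixed expected value.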

> 
> If you do this, maybe make it a follow-up patch -- see other email.

Great! Thank you!

Thanks and BR,
Ricardo


* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-09 22:01     ` Stas Sergeev
@ 2017-03-10 23:47       ` Ricardo Neri
  2017-03-10 23:58         ` Stas Sergeev
  0 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-03-10 23:47 UTC (permalink / raw)
  To: Stas Sergeev
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov, Peter Zijlstra, Andrew Morton, Brian Gerst,
	Chris Metcalf, Dave Hansen, Paolo Bonzini, Liang Z Li,
	Masami Hiramatsu, Huang Rui, Jiri Slaby, Jonathan Corbet,
	Michael S. Tsirkin, Paul Gortmaker, Vlastimil Babka, Chen Yucong,
	Alexandre Julliard, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel

On Fri, 2017-03-10 at 01:01 +0300, Stas Sergeev wrote:
> 09.03.2017 03:46, Ricardo Neri wrote:
> > On Wed, 2017-03-08 at 17:08 +0300, Stas Sergeev wrote:
> >> 08.03.2017 03:32, Ricardo Neri wrote:
> >>> These are the instructions covered by UMIP:
> >>> * SGDT - Store Global Descriptor Table
> >>> * SIDT - Store Interrupt Descriptor Table
> >>> * SLDT - Store Local Descriptor Table
> >>> * SMSW - Store Machine Status Word
> >>> * STR - Store Task Register
> >>>
> >>> This patchset initially treated tasks running in virtual-8086 mode as a
> >>> special case. However, I received clarification that DOSEMU[8] does not
> >>> support applications that use these instructions.
> >> Yes, this is the case.
> >> But at least in the past there was an attempt to
> >> support SLDT as it is used by an ancient pharlap
> >> DOS extender (currently unsupported by dosemu1/2).
> >> So how difficult would it be to add an optional
> >> possibility of delivering such SIGSEGV to userspace
> >> so that the kernel's dummy emulation can be overridden?
> > I suppose a umip=noemulation kernel parameter could be added in this
> > case.
> Why?
> It doesn't need to be global: the app should be
> able to change that on its own. Note that no app currently
> requires this, so its just for the future, and in the
> future the app can start using the new API for this,
> if you provide one.

Right, I missed this detail. Then, yes the API should allow only one app
to relay the SIGSEGV.
> 
> 
> >> It doesn't need to be a matter of this particular
> >> patch set, i.e. this proposal should not trigger a
> >> v7 resend of all 21 patches. :) But it would be useful
> >> for the future development of dosemu2.
> > Would dosemu2 use 32-bit processes in order to keep segmentation? If it
> > could use 64-bit processes, emulation is not used in this case and the
> > SIGSEGV is delivered to user space.
> It does use the mix: 64bit process but some segments
> are 32bit for DOS code.

Do you mean that dosemu2 will start as a 64-bit process and will jump to
32-bit code segments? My emulation code should work in this case as it
will use segmentation in 32-bit code descriptors. Is there anything else
needed?

Thanks and BR,
Ricardo


* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-10  2:39             ` Andy Lutomirski
  2017-03-10 11:33               ` Stas Sergeev
@ 2017-03-10 23:58               ` Ricardo Neri
  1 sibling, 0 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-03-10 23:58 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Stas Sergeev, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

On Thu, 2017-03-09 at 18:39 -0800, Andy Lutomirski wrote:
> On Thu, Mar 9, 2017 at 2:10 PM, Stas Sergeev <stsp@list.ru> wrote:
> > 09.03.2017 04:15, Ricardo Neri wrote:
> >
> >> On Wed, 2017-03-08 at 08:46 -0800, Andy Lutomirski wrote:
> >>>
> >>> On Wed, Mar 8, 2017 at 8:29 AM, Stas Sergeev <stsp@list.ru> wrote:
> >>>>
> >>>> 08.03.2017 19:06, Andy Lutomirski wrote:
> >>>>>
> >>>>> On Wed, Mar 8, 2017 at 6:08 AM, Stas Sergeev <stsp@list.ru> wrote:
> >>>>>>
> >>>>>> 08.03.2017 03:32, Ricardo Neri wrote:
> >>>>>>>
> >>>>>>> These are the instructions covered by UMIP:
> >>>>>>> * SGDT - Store Global Descriptor Table
> >>>>>>> * SIDT - Store Interrupt Descriptor Table
> >>>>>>> * SLDT - Store Local Descriptor Table
> >>>>>>> * SMSW - Store Machine Status Word
> >>>>>>> * STR - Store Task Register
> >>>>>>>
> >>>>>>> This patchset initially treated tasks running in virtual-8086 mode as a
> >>>>>>> special case. However, I received clarification that DOSEMU[8] does not
> >>>>>>> support applications that use these instructions.
> >>>>>
> >>>>> Can you remind me what was special about it?  It looks like you still
> >>>>> emulate them in v8086 mode.
> >>>>
> >>>> Indeed, sorry, I meant prot mode here. :)
> >>>> So I wonder what was cited to be special about v86.
> >>
> >> Initially my patches disabled UMIP on virtual-8086 instructions, without
> >> regards of protected mode (i.e., UMIP was always enabled). I didn't have
> >> emulation at the time. Then, I added emulation code that now covers
> >> protected and virtual-8086 modes. I guess it is not special anymore.
> >
> > But isn't SLDT&friends just throw UD in v86?
> > How does UMIP affect this? How does your patch affect
> > this?
> 
> Er, right.  Ricardo, your code may need fixing.  But don't you have a
> test case for this?  The behavior should be the same with and without
> your patches applied.  The exception is #UD, not #GP, so maybe your
> code just never executes in the vm86 case.

Ouch! Yes, I am afraid my code will attempt to emulate sldt in vm86
mode. The test cases that I have for vm86 are only for the instructions
that are valid in vm86: smsw, sidt and sgdt.

I will add test cases for str and sldt and make sure that a #UD is
issued.
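
As a rough sketch of the behavior I intend (not the code in this series):
identify the instruction from its opcode bytes and refuse to emulate str
and sldt when the task is in vm86 mode. The enum and function names are
hypothetical. The encodings are the architectural ones (0F 00 /0 SLDT,
0F 00 /1 STR, 0F 01 /0 SGDT, 0F 01 /1 SIDT, 0F 01 /4 SMSW), and the sketch
assumes that all five instructions are emulated in protected mode.

```c
#include <stdbool.h>
#include <stdint.h>

enum umip_insn {
	UMIP_NONE, UMIP_SGDT, UMIP_SIDT, UMIP_SLDT, UMIP_SMSW, UMIP_STR
};

/* Classify an instruction from its two opcode bytes and its ModRM byte. */
static enum umip_insn umip_identify(uint8_t op1, uint8_t op2, uint8_t modrm)
{
	unsigned int mod = modrm >> 6;
	unsigned int reg = (modrm >> 3) & 7;

	if (op1 != 0x0f)
		return UMIP_NONE;

	if (op2 == 0x00) {		/* group 6 */
		if (reg == 0)
			return UMIP_SLDT;
		if (reg == 1)
			return UMIP_STR;
	} else if (op2 == 0x01) {	/* group 7 */
		if (reg == 4)
			return UMIP_SMSW;
		/* SGDT/SIDT take memory operands only; mod == 3 is another insn. */
		if (mod == 3)
			return UMIP_NONE;
		if (reg == 0)
			return UMIP_SGDT;
		if (reg == 1)
			return UMIP_SIDT;
	}
	return UMIP_NONE;
}

/* Decide whether emulation is appropriate for the current mode. */
static bool umip_may_emulate(enum umip_insn insn, bool vm86)
{
	switch (insn) {
	case UMIP_SGDT:
	case UMIP_SIDT:
	case UMIP_SMSW:
		return true;
	case UMIP_SLDT:
	case UMIP_STR:
		/* Not recognized in vm86 (#UD), so behavior must stay unchanged. */
		return !vm86;
	default:
		return false;
	}
}
```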

Would this trigger a v7 series?

Thanks and BR,
Ricardo
> 
> --Andy


* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-10 23:47       ` Ricardo Neri
@ 2017-03-10 23:58         ` Stas Sergeev
  2017-03-11  0:13           ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Stas Sergeev @ 2017-03-10 23:58 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov, Peter Zijlstra, Andrew Morton, Brian Gerst,
	Chris Metcalf, Dave Hansen, Paolo Bonzini, Liang Z Li,
	Masami Hiramatsu, Huang Rui, Jiri Slaby, Jonathan Corbet,
	Michael S. Tsirkin, Paul Gortmaker, Vlastimil Babka, Chen Yucong,
	Alexandre Julliard, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel

11.03.2017 02:47, Ricardo Neri wrote:
>>
>>>> It doesn't need to be a matter of this particular
>>>> patch set, i.e. this proposal should not trigger a
>>>> v7 resend of all 21 patches. :) But it would be useful
>>>> for the future development of dosemu2.
>>> Would dosemu2 use 32-bit processes in order to keep segmentation? If it
>>> could use 64-bit processes, emulation is not used in this case and the
>>> SIGSEGV is delivered to user space.
>> It does use the mix: 64bit process but some segments
>> are 32bit for DOS code.
> Do you mean that dosemu2 will start as a 64-bit process and will jump to
> 32-bit code segments?
Yes, so the offending insns are executed only in 32bit
and 16bit segments, even if the process itself is 64bit.
I guess you handle 16bit segments the same as 32bit ones.

>   My emulation code should work in this case as it
> will use segmentation in 32-bit code descriptors. Is there anything else
> needed?
If I understand you correctly, you are saying that SLDT
executed in a 64bit code segment will inevitably segfault
to userspace. If this is the case and it makes your code
simpler, then it's perfectly fine with me, as dosemu does
not do this and 64bit DOS progs are not anticipated.


* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-10 11:33               ` Stas Sergeev
  2017-03-10 14:17                 ` Andy Lutomirski
@ 2017-03-10 23:59                 ` Ricardo Neri
  2017-03-13 21:25                   ` Stas Sergeev
  1 sibling, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-03-10 23:59 UTC (permalink / raw)
  To: Stas Sergeev
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

On Fri, 2017-03-10 at 14:33 +0300, Stas Sergeev wrote:
> 10.03.2017 05:39, Andy Lutomirski wrote:
> > On Thu, Mar 9, 2017 at 2:10 PM, Stas Sergeev <stsp@list.ru> wrote:
> >> 09.03.2017 04:15, Ricardo Neri wrote:
> >>
> >>> On Wed, 2017-03-08 at 08:46 -0800, Andy Lutomirski wrote:
> >>>> On Wed, Mar 8, 2017 at 8:29 AM, Stas Sergeev <stsp@list.ru> wrote:
> >>>>> 08.03.2017 19:06, Andy Lutomirski wrote:
> >>>>>> On Wed, Mar 8, 2017 at 6:08 AM, Stas Sergeev <stsp@list.ru> wrote:
> >>>>>>> 08.03.2017 03:32, Ricardo Neri wrote:
> >>>>>>>> These are the instructions covered by UMIP:
> >>>>>>>> * SGDT - Store Global Descriptor Table
> >>>>>>>> * SIDT - Store Interrupt Descriptor Table
> >>>>>>>> * SLDT - Store Local Descriptor Table
> >>>>>>>> * SMSW - Store Machine Status Word
> >>>>>>>> * STR - Store Task Register
> >>>>>>>>
> >>>>>>>> This patchset initially treated tasks running in virtual-8086 mode as a
> >>>>>>>> special case. However, I received clarification that DOSEMU[8] does not
> >>>>>>>> support applications that use these instructions.
> >>>>>> Can you remind me what was special about it?  It looks like you still
> >>>>>> emulate them in v8086 mode.
> >>>>> Indeed, sorry, I meant prot mode here. :)
> >>>>> So I wonder what was cited to be special about v86.
> >>> Initially my patches disabled UMIP on virtual-8086 instructions, without
> >>> regards of protected mode (i.e., UMIP was always enabled). I didn't have
> >>> emulation at the time. Then, I added emulation code that now covers
> >>> protected and virtual-8086 modes. I guess it is not special anymore.
> >> But isn't SLDT&friends just throw UD in v86?
> >> How does UMIP affect this? How does your patch affect
> >> this?
> > Er, right.  Ricardo, your code may need fixing.  But don't you have a
> > test case for this?
> Why would you need one?
> Or do you really want to allow these instructions
> in v86 by the means of emulation? If so - this wasn't
> clearly stated in the patch description, neither it was
> properly discussed, it seems.

str and sldt can be emulated in vm86 but, as Andy mentioned, the
behavior should be the same with and without emulation.

Thanks and BR,
Ricardo


* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-10 23:58         ` Stas Sergeev
@ 2017-03-11  0:13           ` Ricardo Neri
  0 siblings, 0 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-03-11  0:13 UTC (permalink / raw)
  To: Stas Sergeev
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov, Peter Zijlstra, Andrew Morton, Brian Gerst,
	Chris Metcalf, Dave Hansen, Paolo Bonzini, Liang Z Li,
	Masami Hiramatsu, Huang Rui, Jiri Slaby, Jonathan Corbet,
	Michael S. Tsirkin, Paul Gortmaker, Vlastimil Babka, Chen Yucong,
	Alexandre Julliard, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel

On Sat, 2017-03-11 at 02:58 +0300, Stas Sergeev wrote:
> 11.03.2017 02:47, Ricardo Neri wrote:
> >>
> >>>> It doesn't need to be a matter of this particular
> >>>> patch set, i.e. this proposal should not trigger a
> >>>> v7 resend of all 21 patches. :) But it would be useful
> >>>> for the future development of dosemu2.
> >>> Would dosemu2 use 32-bit processes in order to keep segmentation? If it
> >>> could use 64-bit processes, emulation is not used in this case and the
> >>> SIGSEGV is delivered to user space.
> >> It does use the mix: 64bit process but some segments
> >> are 32bit for DOS code.
> > Do you mean that dosemu2 will start as a 64-bit process and will jump to
> > 32-bit code segments?
> Yes, so the offending insns are executed only in 32bit
> and 16bit segments, even if the process itself is 64bit.
> I guess you handle 16bit segments same as 32bit ones.

I have code to handle 16-bit and 32-bit address encodings differently.
Segmentation is used if !user_64bit_mode(regs). In such a case, the
emulation code will check the segment descriptor D flag and the
address-size override prefix to determine the address size and use
16-bit or 32-bit address encodings as applicable.
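
As a rough sketch of that decision (the helper name and its parameters
are hypothetical and are not the code in the patches), the address size
outside 64-bit mode follows from the D flag of the code segment
descriptor, flipped by the 0x67 address-size override prefix if present:

```c
#include <stdbool.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns the address size in bytes (2 or 4) for code that runs with
 * segmentation, i.e. when the task is not in 64-bit mode.
 */
int addr_size_bytes(bool seg_d_flag, bool has_addr_override)
{
	/* D=1 selects a 32-bit default, D=0 a 16-bit default. */
	int size = seg_d_flag ? 4 : 2;

	/* The 0x67 address-size override prefix selects the other size. */
	if (has_addr_override)
		size = (size == 4) ? 2 : 4;

	return size;
}
```

So a 16-bit code segment (D=0) that carries the override prefix would
be decoded with 32-bit address encodings, and vice versa.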

> 
> >   My emulation code should work in this case as it
> > will use segmentation in 32-bit code descriptors. Is there anything else
> > needed?
> If I understand you correctly, you are saying that SLDT
> executed in 64bit code segment, will inevitably segfault
> to userspace. 
Correct.

> If this is the case and it makes your code
> simpler, then it's perfectly fine with me as dosemu does
> not do this and the 64bit DOS progs are not anticipated.

But if 32-bit or 16-bit code segments are used, the instructions will be
emulated.

Thanks and BR,
Ricardo


* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-10 14:17                 ` Andy Lutomirski
@ 2017-03-11  1:22                   ` Ricardo Neri
  0 siblings, 0 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-03-11  1:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Stas Sergeev, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

On Fri, 2017-03-10 at 06:17 -0800, Andy Lutomirski wrote:
> On Fri, Mar 10, 2017 at 3:33 AM, Stas Sergeev <stsp@list.ru> wrote:
> > 10.03.2017 05:39, Andy Lutomirski wrote:
> >
> >> On Thu, Mar 9, 2017 at 2:10 PM, Stas Sergeev <stsp@list.ru> wrote:
> >>>
> >>> 09.03.2017 04:15, Ricardo Neri wrote:
> >>>
> >>>> On Wed, 2017-03-08 at 08:46 -0800, Andy Lutomirski wrote:
> >>>>>
> >>>>> On Wed, Mar 8, 2017 at 8:29 AM, Stas Sergeev <stsp@list.ru> wrote:
> >>>>>>
> >>>>>> 08.03.2017 19:06, Andy Lutomirski wrote:
> >>>>>>>
> >>>>>>> On Wed, Mar 8, 2017 at 6:08 AM, Stas Sergeev <stsp@list.ru> wrote:
> >>>>>>>>
> >>>>>>>> 08.03.2017 03:32, Ricardo Neri wrote:
> >>>>>>>>>
> >>>>>>>>> These are the instructions covered by UMIP:
> >>>>>>>>> * SGDT - Store Global Descriptor Table
> >>>>>>>>> * SIDT - Store Interrupt Descriptor Table
> >>>>>>>>> * SLDT - Store Local Descriptor Table
> >>>>>>>>> * SMSW - Store Machine Status Word
> >>>>>>>>> * STR - Store Task Register
> >>>>>>>>>
> >>>>>>>>> This patchset initially treated tasks running in virtual-8086
> >>>>>
> >>>>> mode as a
> >>>>>>>>>
> >>>>>>>>> special case. However, I received clarification that DOSEMU[8]
> >>>>>
> >>>>> does not
> >>>>>>>>>
> >>>>>>>>> support applications that use these instructions.
> >>>>>>>
> >>>>>>> Can you remind me what was special about it?  It looks like you
> >>>>>
> >>>>> still
> >>>>>>>
> >>>>>>> emulate them in v8086 mode.
> >>>>>>
> >>>>>> Indeed, sorry, I meant prot mode here. :)
> >>>>>> So I wonder what was cited to be special about v86.
> >>>>
> >>>> Initially my patches disabled UMIP on virtual-8086 instructions, without
> >>>> regards of protected mode (i.e., UMIP was always enabled). I didn't have
> >>>> emulation at the time. Then, I added emulation code that now covers
> >>>> protected and virtual-8086 modes. I guess it is not special anymore.
> >>>
> >>> But isn't SLDT&friends just throw UD in v86?
> >>> How does UMIP affect this? How does your patch affect
> >>> this?
> >>
> >> Er, right.  Ricardo, your code may need fixing.  But don't you have a
> >> test case for this?
> >
> > Why would you need one?
> > Or do you really want to allow these instructions
> > in v86 by the means of emulation? If so - this wasn't
> > clearly stated in the patch description, neither it was
> > properly discussed, it seems.
> 
> What I meant was: if the patches incorrectly started making these
> instructions work in vm86 mode where they used to cause a vm86 exit,
> then that's a bug that the selftest should have caught.

Yes, this is the case. I will fix this behavior... and update the test
cases.


* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-10 23:59                 ` Ricardo Neri
@ 2017-03-13 21:25                   ` Stas Sergeev
  2017-03-27 23:46                     ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Stas Sergeev @ 2017-03-13 21:25 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

11.03.2017 02:59, Ricardo Neri wrote:
> On Fri, 2017-03-10 at 14:33 +0300, Stas Sergeev wrote:
>
>> Why would you need one?
>> Or do you really want to allow these instructions
>> in v86 by the means of emulation? If so - this wasn't
>> clearly stated in the patch description, neither it was
>> properly discussed, it seems.
> Yes, str and sldt can be emulated in vm86 but, as Andy mentioned, the
> behavior should be the same with and without emulation.
Why would you do that?
I looked up the dosemu2 CPU simulator code that
is used under x86-64. It says this:
---
                                     CODE_FLUSH();
                                     if (REALMODE()) goto illegal_op;
                                     PC += ModRMSim(PC+1, mode) + 1;
                                     error("SLDT not implemented\n");
                                     break;
                                 case 1: /* STR */
                                     /* Store Task Register */
                                     CODE_FLUSH();
                                     if (REALMODE()) goto illegal_op;
                                     PC += ModRMSim(PC+1, mode) + 1;
                                     error("STR not implemented\n");
                                     break;
...
                                 case 0: /* SGDT */
                                     /* Store Global Descriptor Table Register */
                                     PC++; PC += ModRM(opc, PC, mode|DATA16|MSTORE);
                                     error("SGDT not implemented\n");
                                     break;
                                 case 1: /* SIDT */
                                     /* Store Interrupt Descriptor Table Register */
                                     PC++; PC += ModRM(opc, PC, mode|DATA16|MSTORE);
                                     error("SIDT not implemented\n");
                                     break;
---

It only implements smsw.
So maybe you can make your code much
simpler and remove the unneeded emulation?
The same goes for prot mode. You know wine's
requirements now - they are very small. And
dosemu doesn't need anything at all but smsw.
And even smsw is very rare.


* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-13 21:25                   ` Stas Sergeev
@ 2017-03-27 23:46                     ` Ricardo Neri
  2017-03-28  9:38                       ` Stas Sergeev
  0 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-03-27 23:46 UTC (permalink / raw)
  To: Stas Sergeev
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

On Tue, 2017-03-14 at 00:25 +0300, Stas Sergeev wrote:
> 11.03.2017 02:59, Ricardo Neri wrote:
> > On Fri, 2017-03-10 at 14:33 +0300, Stas Sergeev wrote:
> >
> >> Why would you need one?
> >> Or do you really want to allow these instructions
> >> in v86 by the means of emulation? If so - this wasn't
> >> clearly stated in the patch description, neither it was
> >> properly discussed, it seems.
> > Yes, str and sldt can be emulated in vm86 but, as Andy mentioned, the
> > behavior should be the same with and without emulation.
> Why would you do that?
> I looked up the dosemu2 CPU simulator code that
> is used under x86-64. It says this:

Stas, I apologize for the delayed reply; I missed your e-mail. 
> ---
>                                      CODE_FLUSH();
>                                      if (REALMODE()) goto illegal_op;
>                                      PC += ModRMSim(PC+1, mode) + 1;
>                                      error("SLDT not implemented\n");
>                                      break;
>                                  case 1: /* STR */
>                                      /* Store Task Register */
>                                      CODE_FLUSH();
>                                      if (REALMODE()) goto illegal_op;
>                                      PC += ModRMSim(PC+1, mode) + 1;
>                                      error("STR not implemented\n");
>                                      break;
> ...
>                                  case 0: /* SGDT */
>                                      /* Store Global Descriptor Table Register */
>                                      PC++; PC += ModRM(opc, PC, mode|DATA16|MSTORE);
>                                      error("SGDT not implemented\n");
>                                      break;
>                                  case 1: /* SIDT */
>                                      /* Store Interrupt Descriptor Table Register */
>                                      PC++; PC += ModRM(opc, PC, mode|DATA16|MSTORE);
>                                      error("SIDT not implemented\n");
>                                      break;
> ---
> 
> It only implements smsw.
> So maybe you can make your code much
> simpler and remove the unneeded emulation?
> Same is for prot mode.

Do you mean the unneeded emulation for SLDT and STR?

> You know the wine's
> requirements now - they are very small. And
> dosemu doesn't need anything at all but smsw.
> And even smsw is very rare.
But emulation is still needed for SMSW, right?

The majority of my patches deal with computing the effective address based
on the instruction operands, and linear addresses based on the effective
address and the segment descriptor. Only two or three patches deal with
identifying particular UMIP-protected instructions. Not having to worry
about STR and SLDT in vm86 could simplify things a bit, though.
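
To illustrate the two steps, here is a minimal sketch. The struct and
function names are hypothetical and are not the ones used in the
patches; it assumes a byte-granular, expand-up segment and shows only
one 16-bit ModRM form, [BX + disp8]:

```c
#include <stdint.h>

/* Hypothetical, simplified view of a segment descriptor. */
struct seg_desc_sketch {
	uint32_t base;	/* segment base address */
	uint32_t limit;	/* highest valid offset (byte granular) */
};

/* Effective address for the 16-bit form [BX + disp8]; wraps at 64 KiB. */
uint16_t ea16_bx_disp8(uint16_t bx, int8_t disp8)
{
	return (uint16_t)(bx + disp8);
}

/*
 * Linear address = segment base + effective address.
 * Returns 0 on success, -1 if the offset is beyond the segment limit.
 */
int linear_addr(const struct seg_desc_sketch *seg, uint32_t ea,
		uint32_t *linear)
{
	if (ea > seg->limit)
		return -1;

	*linear = seg->base + ea;
	return 0;
}
```

In virtual-8086 mode there is no descriptor lookup; the base would
simply be the segment selector shifted left by 4 bits.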

Thanks and BR,
Ricardo


* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-27 23:46                     ` Ricardo Neri
@ 2017-03-28  9:38                       ` Stas Sergeev
  2017-03-29  4:38                         ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Stas Sergeev @ 2017-03-28  9:38 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

28.03.2017 02:46, Ricardo Neri wrote:
> On Tue, 2017-03-14 at 00:25 +0300, Stas Sergeev wrote:
>> 11.03.2017 02:59, Ricardo Neri wrote:
>>> On Fri, 2017-03-10 at 14:33 +0300, Stas Sergeev wrote:
>>>
>>>> Why would you need one?
>>>> Or do you really want to allow these instructions
>>>> in v86 by the means of emulation? If so - this wasn't
>>>> clearly stated in the patch description, neither it was
>>>> properly discussed, it seems.
>>> Yes, str and sldt can be emulated in vm86 but, as Andy mentioned, the
>>> behavior should be the same with and without emulation.
>> Why would you do that?
>> I looked up the dosemu2 CPU simulator code that
>> is used under x86-64. It says this:
> Stas, I apologize for the delayed reply; I missed your e-mail.
>> It only implements smsw.
>> So maybe you can make your code much
>> simpler and remove the unneeded emulation?
>> Same is for prot mode.
> Do you mean the unneeded emulation for SLDT and STR?
Not quite, I meant also sgdt and sidt in vm86.
Yes, it will be a somewhat "incompatible" change,
but if there is nothing to stay compatible with,
then why worry? Probably you could also remove
the sldt and str emulation for protected mode, because,
as I understand from this thread, wine does not
need those.

Note that these days dosemu2 uses v86 mode set
up under kvm rather than vm86(). Do your patches
affect that the same way as they do for the vm86()
syscall, or can there be some differences? Or should
UMIP be enabled under kvm by hand?

>> You know the wine's
>> requirements now - they are very small. And
>> dosemu doesn't need anything at all but smsw.
>> And even smsw is very rare.
> But emulation is still needed for SMSW, right?
Likely so.
If you want, I can enable the logging of this command
and see if it is used by some of the DOS programs I have.
But at least dosemu implements it, so probably it is needed.
Of course if it is used by one of 100 DOS progs, then there
is an option to just add its support to dosemu2 and pretend
the compatibility problems did not exist. :) So, if this can be
an option, I can do the tests to estimate its usage.


* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-28  9:38                       ` Stas Sergeev
@ 2017-03-29  4:38                         ` Ricardo Neri
  2017-03-29 20:55                           ` Stas Sergeev
  0 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-03-29  4:38 UTC (permalink / raw)
  To: Stas Sergeev
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

On Tue, 2017-03-28 at 12:38 +0300, Stas Sergeev wrote:
> 28.03.2017 02:46, Ricardo Neri wrote:
> > On Tue, 2017-03-14 at 00:25 +0300, Stas Sergeev wrote:
> >> 11.03.2017 02:59, Ricardo Neri wrote:
> >>> On Fri, 2017-03-10 at 14:33 +0300, Stas Sergeev wrote:
> >>>
> >>>> Why would you need one?
> >>>> Or do you really want to allow these instructions
> >>>> in v86 by the means of emulation? If so - this wasn't
> >>>> clearly stated in the patch description, neither it was
> >>>> properly discussed, it seems.
> >>> Yes, str and sldt can be emulated in vm86 but, as Andy mentioned, the
> >>> behavior should be the same with and without emulation.
> >> Why would you do that?
> >> I looked up the dosemu2 CPU simulator code that
> >> is used under x86-64. It says this:
> > Stas, I apologize for the delayed reply; I missed your e-mail.
> >> It only implements smsw.
> >> So maybe you can make your code much
> >> simpler and remove the unneeded emulation?
> >> Same is for prot mode.
> > Do you mean the unneeded emulation for SLDT and STR?
> Not quite, I meant also sgdt and sidt in vm86.
> Yes, it will be a somewhat "incompatible" change,
> but if there is nothing to stay compatible with,
> then why worry?

My idea of compatibility was to have the emulation code behave exactly
like a processor without UMIP :)

> Probably you could also remove
> the sldt and str emulation for protected mode, because,
> as I understand from this thread, wine does not
> need those.

I see. I would lean toward keeping the emulation because I already
implemented it :), for completeness, and because it is performed in a
single switch. The bulk of the emulation code deals with operands.
> 
> Note that these days dosemu2 uses v86 mode set
> up under kvm rather than vm86(). Your patches
> affect that the same way as they do for vm86()
> syscall, or can there be some differences?
My code does not touch kvm at all. I would need to assess how kvm will
behave.
> Or should
> UMIP be enabled under kvm by hand?
There was an attempt to emulate UMIP that was submitted a while ago:
https://lkml.org/lkml/2016/7/12/644

> 
> >> You know the wine's
> >> requirements now - they are very small. And
> >> dosemu doesn't need anything at all but smsw.
> >> And even smsw is very rare.
> > But emulation is still needed for SMSW, right?
> Likely so.
> If you want, I can enable the logging of this command
> and see if it is used by some of the DOS programs I have.

It would be great if you could do that, if you don't mind.
> But at least dosemu implements it, so probably it is needed.

Right.

> Of course if it is used by one of 100 DOS progs, then there
> is an option to just add its support to dosemu2 and pretend
> the compatibility problems did not exist. :)
Do you mean relaying the GP fault to dosemu instead of trapping it and
emulating it in the kernel?

Thanks and BR,
Ricardo


* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-29  4:38                         ` Ricardo Neri
@ 2017-03-29 20:55                           ` Stas Sergeev
  2017-03-30  5:14                             ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Stas Sergeev @ 2017-03-29 20:55 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

29.03.2017 07:38, Ricardo Neri wrote:
>> Probably you could also remove
>> the sldt and str emulation for protected mode, because,
>> as I understand from this thread, wine does not
>> need those.
> I see. I would lean on keeping the emulation because I already
> implemented it :), for completeness, and because it is performed in a
> single switch. The bulk of the emulation code deals with operands.
But this is not for free.
As Andy said, you will then need a syscall and
a feature mask to be able to disable this emulation.
And AFAIK you haven't implemented that yet, so
there is something to consider.

>>>> You know the wine's
>>>> requirements now - they are very small. And
>>>> dosemu doesn't need anything at all but smsw.
>>>> And even smsw is very rare.
>>> But emulation is still needed for SMSW, right?
>> Likely so.
>> If you want, I can enable the logging of this command
>> and see if it is used by some of the DOS programs I have.
> It would be great if you could do that, if you don't mind.
OK, scheduled for the weekend.
I'll let you know.

>> But at least dosemu implements it, so probably it is needed.
> Right.
>
>> Of course if it is used by one of 100 DOS progs, then there
>> is an option to just add its support to dosemu2 and pretend
>> the compatibility problems did not exist. :)
> Do you mean relaying the GP fault to dosemu instead of trapping it and
> emulating it in the kernel?
Yes, that would be optimal if this does not severely break
the current setups. If we find that smsw is not in
real use, we can probably do exactly that. But the other
instructions are certainly not in real use in v86, so I
wouldn't add explicit test cases to the kernel
that make you depend on some particular behaviour
that no one may need. My objection was that we shouldn't
write tests before we know exactly how we want this to work.


* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-29 20:55                           ` Stas Sergeev
@ 2017-03-30  5:14                             ` Ricardo Neri
  2017-03-30 10:10                               ` Stas Sergeev
  2017-04-01 13:08                               ` Stas Sergeev
  0 siblings, 2 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-03-30  5:14 UTC (permalink / raw)
  To: Stas Sergeev
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

On Wed, 2017-03-29 at 23:55 +0300, Stas Sergeev wrote:
> 29.03.2017 07:38, Ricardo Neri wrote:
> >> Probably you could also remove
> >> the sldt and str emulation for protected mode, because,
> >> as I understand from this thread, wine does not
> >> need those.
> > I see. I would lean on keeping the emulation because I already
> > implemented it :), for completeness, and because it is performed in a
> > single switch. The bulk of the emulation code deals with operands.
> But this is not for free.
> As Andy said, you will then need a syscall and
> a feature mask to be able to disable this emulation.
> And AFAIK you haven't implemented that yet, so
> there is something to consider.

Right, I see your point.

> >>>> You know the wine's
> >>>> requirements now - they are very small. And
> >>>> dosemu doesn't need anything at all but smsw.
> >>>> And even smsw is very rare.
> >>> But emulation is still needed for SMSW, right?
> >> Likely so.
> >> If you want, I can enable the logging of this command
> >> and see if it is used by some of the DOS programs I have.
> > It would be great if you could do that, if you don't mind.
> OK, scheduled to the week-end.
> I'll let you know.

Thanks!

> 
> >> But at least dosemu implements it, so probably it is needed.
> > Right.
> >
> >> Of course if it is used by one of 100 DOS progs, then there
> >> is an option to just add its support to dosemu2 and pretend
> >> the compatibility problems did not exist. :)
> > Do you mean relaying the GP fault to dosemu instead of trapping it and
> > emulating it in the kernel?
> Yes, that would be optimal if this does not severely break
> the current setups. If we can find out that smsw is not in
> the real use, we can probably do exactly that. 
> But other
> instructions are not in real use in v86 for sure, so I
> wouldn't be adding the explicit test-cases to the kernel
> that will make you depend on some particular behaviour
> that no one may need.

> My objection was that we shouldn't
> write tests before we know exactly how we want this to work.
OK, if only SMSW is used then I'll keep the emulation for SMSW only.


* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-30  5:14                             ` Ricardo Neri
@ 2017-03-30 10:10                               ` Stas Sergeev
  2017-03-31  1:33                                 ` Ricardo Neri
  2017-04-01 13:08                               ` Stas Sergeev
  1 sibling, 1 reply; 112+ messages in thread
From: Stas Sergeev @ 2017-03-30 10:10 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Masami Hiramatsu, Huang Rui, Jiri Slaby, Jonathan Corbet,
	Michael S. Tsirkin, Paul Gortmaker, Vlastimil Babka, Chen Yucong,
	Alexandre Julliard, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, X86 ML, linux-msdos, wine-devel

30.03.2017 08:14, Ricardo Neri wrote:
>>>> But at least dosemu implements it, so probably it is needed.
>>> Right.
>>>
>>>> Of course if it is used by one of 100 DOS progs, then there
>>>> is an option to just add its support to dosemu2 and pretend
>>>> the compatibility problems did not exist. :)
>>> Do you mean relaying the GP fault to dosemu instead of trapping it and
>>> emulating it in the kernel?
>> Yes, that would be optimal if this does not severely break
>> the current setups. If we can find out that smsw is not in
>> the real use, we can probably do exactly that.
>> But other
>> instructions are not in real use in v86 for sure, so I
>> wouldn't be adding the explicit test-cases to the kernel
>> that will make you depend on some particular behaviour
>> that no one may need.
>> My objection was that we shouldn't
>> write tests before we know exactly how we want this to work.
> OK, if only SMSW is used then I'll keep the emulation for SMSW only.
In fact, smsw has an interesting property, which is that
no one will ever want to disable its in-kernel emulation
to provide its own.
So while I'll try to estimate its usage, emulating it in kernel
will not be that problematic in either case.
As for protected mode, if wine only needs sgdt/sidt, then
again, no one will want to disable its emulation. That is not the
case with sldt, but AFAICS wine doesn't need sldt, and so
we can leave sldt without a fixup. Is my understanding
correct?
In this case, I suppose, we are well on the way to avoiding
the extra syscalls to toggle the emulation features.


* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-30 10:10                               ` Stas Sergeev
@ 2017-03-31  1:33                                 ` Ricardo Neri
  2017-03-31 14:11                                   ` Alexandre Julliard
  0 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-03-31  1:33 UTC (permalink / raw)
  To: Stas Sergeev, Alexandre Julliard
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Masami Hiramatsu, Huang Rui, Jiri Slaby, Jonathan Corbet,
	Michael S. Tsirkin, Paul Gortmaker, Vlastimil Babka, Chen Yucong,
	Alexandre Julliard, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, X86 ML, linux-msdos, wine-devel

On Thu, 2017-03-30 at 13:10 +0300, Stas Sergeev wrote:
> 30.03.2017 08:14, Ricardo Neri wrote:
> >>>> But at least dosemu implements it, so probably it is needed.
> >>> Right.
> >>>
> >>>> Of course if it is used by one of 100 DOS progs, then there
> >>>> is an option to just add its support to dosemu2 and pretend
> >>>> the compatibility problems did not exist. :)
> >>> Do you mean relaying the GP fault to dosemu instead of trapping it and
> >>> emulating it in the kernel?
> >> Yes, that would be optimal if this does not severely break
> >> the current setups. If we can find out that smsw is not in
> >> the real use, we can probably do exactly that.
> >> But other
> >> instructions are not in real use in v86 for sure, so I
> >> wouldn't be adding the explicit test-cases to the kernel
> >> that will make you depend on some particular behaviour
> >> that no one may need.
> >> My objection was that we shouldn't
> >> write tests before we know exactly how we want this to work.
> > OK, if only SMSW is used then I'll keep the emulation for SMSW only.
> In fact, smsw has an interesting property, which is that
> no one will ever want to disable its in-kernel emulation
> to provide its own.
> So while I'll try to estimate its usage, emulating it in kernel
> will not be that problematic in either case.

Ah good to know!

> As for protected mode, if wine only needs sgdt/sidt, then
> again, no one will want to disable its emulation. Not the
> case with sldt, but AFAICS wine doesn't need sldt, and so
> we can leave sldt without a fixups. Is my understanding
> correct?

This is my understanding as well. I could not find any use of sldt in
wine. Alexandre, would you mind confirming?

> In this case, I suppose, we are very well on a way to avoid
> the extra syscalls to toggle the emulation features.

Great! Then I will keep the emulation for sgdt, sidt, and smsw but not
for str and sldt; for both vm86 and protected mode. This seems to be the
agreement.
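
For illustration only, that split could be expressed as a small
classifier over the opcode and ModRM bytes. The encodings (0F 00 /0
SLDT, 0F 00 /1 STR, 0F 01 /0 SGDT, 0F 01 /1 SIDT, 0F 01 /4 SMSW) come
from the Intel SDM; the enum and function names are made up for this
sketch and are not the ones in the patches:

```c
#include <stdint.h>

enum umip_insn {
	UMIP_NONE, UMIP_SGDT, UMIP_SIDT, UMIP_SMSW, UMIP_SLDT, UMIP_STR
};

/*
 * Hypothetical classifier. 'op' points at the 0x0F escape byte, after
 * any prefixes, and must have at least three readable bytes.
 */
enum umip_insn umip_identify(const uint8_t *op)
{
	uint8_t mod, reg;

	if (op[0] != 0x0f)
		return UMIP_NONE;

	mod = op[2] >> 6;		/* ModRM.mod */
	reg = (op[2] >> 3) & 7;		/* ModRM.reg selects the instruction */

	if (op[1] == 0x01) {
		if (reg == 4)
			return UMIP_SMSW;
		/* SGDT and SIDT only exist with a memory operand. */
		if (mod == 3)
			return UMIP_NONE;
		if (reg == 0)
			return UMIP_SGDT;
		if (reg == 1)
			return UMIP_SIDT;
	} else if (op[1] == 0x00) {
		if (reg == 0)
			return UMIP_SLDT;
		if (reg == 1)
			return UMIP_STR;
	}

	return UMIP_NONE;
}

/* Emulate sgdt, sidt and smsw; let str and sldt fault to user space. */
int umip_should_emulate(enum umip_insn insn)
{
	return insn == UMIP_SGDT || insn == UMIP_SIDT || insn == UMIP_SMSW;
}
```

With this split, str and sldt would simply take the general protection
fault path and reach user space as a signal.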

Thanks and BR,
Ricardo


* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-31  1:33                                 ` Ricardo Neri
@ 2017-03-31 14:11                                   ` Alexandre Julliard
  2017-03-31 21:26                                     ` Stas Sergeev
  2017-04-04  2:02                                     ` Ricardo Neri
  0 siblings, 2 replies; 112+ messages in thread
From: Alexandre Julliard @ 2017-03-31 14:11 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Stas Sergeev, Andy Lutomirski, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, Peter Zijlstra,
	Andrew Morton, Brian Gerst, Chris Metcalf, Dave Hansen,
	Paolo Bonzini, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Fenghua Yu, Ravi V. Shankar,
	Shuah Khan, linux-kernel, X86 ML, linux-msdos, wine-devel

Ricardo Neri <ricardo.neri-calderon@linux.intel.com> writes:

> On Thu, 2017-03-30 at 13:10 +0300, Stas Sergeev wrote:
>> 30.03.2017 08:14, Ricardo Neri wrote:
>> >>>> But at least dosemu implements it, so probably it is needed.
>> >>> Right.
>> >>>
>> >>>> Of course if it is used by one of 100 DOS progs, then there
>> >>>> is an option to just add its support to dosemu2 and pretend
>> >>>> the compatibility problems did not exist. :)
>> >>> Do you mean relaying the GP fault to dosemu instead of trapping it and
>> >>> emulating it in the kernel?
>> >> Yes, that would be optimal if this does not severely break
>> >> the current setups. If we can find out that smsw is not in
>> >> the real use, we can probably do exactly that.
>> >> But other
>> >> instructions are not in real use in v86 for sure, so I
>> >> wouldn't be adding the explicit test-cases to the kernel
>> >> that will make you depend on some particular behaviour
>> >> that no one may need.
>> >> My objection was that we shouldn't
>> >> write tests before we know exactly how we want this to work.
>> > OK, if only SMSW is used then I'll keep the emulation for SMSW only.
>> In fact, smsw has an interesting property, which is that
>> no one will ever want to disable its in-kernel emulation
>> to provide its own.
>> So while I'll try to estimate its usage, emulating it in kernel
>> will not be that problematic in either case.
>
> Ah good to know!
>
>> As for protected mode, if wine only needs sgdt/sidt, then
>> again, no one will want to disable its emulation. Not the
>> case with sldt, but AFAICS wine doesn't need sldt, and so
>> we can leave sldt without a fixup. Is my understanding
>> correct?
>
> This is my understanding as well. I could not find any use of sldt in
> wine. Alexandre, would you mind confirming?

Some versions of the Themida software protection are known to use sldt
as part of the virtual machine detection code [1]. The check currently
fails because it expects the LDT to be zero, so the app is already
broken, but sldt segfaulting would still cause a crash where there
wasn't one before.

However, I'm only aware of one application using this, and being able to
catch and emulate sldt ourselves would actually give us a chance to fix
this app in newer Wine versions, so I'm not opposed to having it
segfault.
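
For illustration only: catching the fault in user space would start with
recognizing the instruction at the faulting address. The sketch below is
not Wine or kernel code; umip_classify() and enum umip_insn are
hypothetical names, and prefixes (operand size, segment, REX) are
deliberately not skipped. The opcodes are the documented encodings:
SLDT is 0F 00 /0, STR is 0F 00 /1, SGDT is 0F 01 /0, SIDT is 0F 01 /1
and SMSW is 0F 01 /4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Instructions covered by UMIP, plus a "none of them" value. */
enum umip_insn {
	UMIP_NONE = 0,
	UMIP_SLDT,
	UMIP_STR,
	UMIP_SGDT,
	UMIP_SIDT,
	UMIP_SMSW,
};

/*
 * Classify the instruction that starts at 'p' ('len' readable bytes).
 * Assumption: no prefixes precede the 0x0f escape byte.
 */
enum umip_insn umip_classify(const uint8_t *p, size_t len)
{
	assert(p != NULL);

	if (len < 3 || p[0] != 0x0f)
		return UMIP_NONE;

	uint8_t mod = p[2] >> 6;        /* ModRM.mod */
	uint8_t reg = (p[2] >> 3) & 7;  /* ModRM.reg selects within the group */

	if (p[1] == 0x00) {             /* group 6 */
		if (reg == 0)
			return UMIP_SLDT;
		if (reg == 1)
			return UMIP_STR;
	} else if (p[1] == 0x01) {      /* group 7 */
		if (reg == 4)
			return UMIP_SMSW;
		/* Register forms of /0 and /1 encode other instructions. */
		if (mod == 3)
			return UMIP_NONE;
		if (reg == 0)
			return UMIP_SGDT;
		if (reg == 1)
			return UMIP_SIDT;
	}
	return UMIP_NONE;
}
```

A SIGSEGV handler could pass the bytes at the saved instruction pointer
to a classifier like this and emulate only the instructions it cares
about, letting everything else crash as before.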

In fact it would be nice to be able to make sidt/sgdt/etc. segfault
too. I know a new syscall is a pain, but as far as Wine is concerned,
being able to opt out from any emulation would be potentially useful.

[1] https://www.winehq.org/pipermail/wine-bugs/2008-February/094470.html

-- 
Alexandre Julliard
julliard@winehq.org

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-31 14:11                                   ` Alexandre Julliard
@ 2017-03-31 21:26                                     ` Stas Sergeev
  2017-04-01  2:18                                       ` Andy Lutomirski
  2017-04-04  2:02                                     ` Ricardo Neri
  1 sibling, 1 reply; 112+ messages in thread
From: Stas Sergeev @ 2017-03-31 21:26 UTC (permalink / raw)
  To: Alexandre Julliard, Ricardo Neri
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Masami Hiramatsu, Huang Rui, Jiri Slaby, Jonathan Corbet,
	Michael S. Tsirkin, Paul Gortmaker, Vlastimil Babka, Chen Yucong,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML,
	linux-msdos, wine-devel

31.03.2017 17:11, Alexandre Julliard wrote:
> In fact it would be nice to be able to make sidt/sgdt/etc. segfault
> too. I know a new syscall is a pain,
Maybe arch_prctl() then?

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-31 21:26                                     ` Stas Sergeev
@ 2017-04-01  2:18                                       ` Andy Lutomirski
  0 siblings, 0 replies; 112+ messages in thread
From: Andy Lutomirski @ 2017-04-01  2:18 UTC (permalink / raw)
  To: Stas Sergeev
  Cc: Alexandre Julliard, Ricardo Neri, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, Peter Zijlstra,
	Andrew Morton, Brian Gerst, Chris Metcalf, Dave Hansen,
	Paolo Bonzini, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Fenghua Yu, Ravi V. Shankar,
	Shuah Khan, linux-kernel, X86 ML, linux-msdos, wine-devel

On Fri, Mar 31, 2017 at 2:26 PM, Stas Sergeev <stsp@list.ru> wrote:
> 31.03.2017 17:11, Alexandre Julliard wrote:
>>
>> In fact it would be nice to be able to make sidt/sgdt/etc. segfault
>> too. I know a new syscall is a pain,
>
> Maybe arch_prctl() then?

I still like my idea of a generic mechanism to turn off
backwards-compatibility things.  After all, hardened programs should
turn off UMIP fixups entirely.  They should also turn off vsyscall
emulation entirely, and I see no reason that these mechanisms should
be different.

--Andy

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-30  5:14                             ` Ricardo Neri
  2017-03-30 10:10                               ` Stas Sergeev
@ 2017-04-01 13:08                               ` Stas Sergeev
  2017-04-01 17:49                                 ` H. Peter Anvin
  2017-04-04  2:05                                 ` Ricardo Neri
  1 sibling, 2 replies; 112+ messages in thread
From: Stas Sergeev @ 2017-04-01 13:08 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

30.03.2017 08:14, Ricardo Neri wrote:
>>>>>> You know the wine's
>>>>>> requirements now - they are very small. And
>>>>>> dosemu doesn't need anything at all but smsw.
>>>>>> And even smsw is very rare.
>>>>> But emulation is still needed for SMSW, right?
>>>> Likely so.
>>>> If you want, I can enable the logging of this command
>>>> and see if it is used by some of the DOS programs I have.
>>> It would be great if you could do that, if you don't mind.
>> OK, scheduled to the week-end.
>> I'll let you know.
> Thanks!
OK, done the testing.
It appears smsw is used in v86 by windows-3.1 and dos4gw
at the very least, and these are the "major" apps. So doing
without a fixup in v86 will not go unnoticed. Unfortunately
this also means that KVM-vm86 should be properly tested.
I have also found a weird program that does SGDT under
v86. This causes "ERROR: SGDT not implemented" under
dosemu, but the prog still works fine as it obviously does
not care about the results. This app can easily be broken
of course, if that makes any sense (likely not).

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-04-01 13:08                               ` Stas Sergeev
@ 2017-04-01 17:49                                 ` H. Peter Anvin
  2017-04-02 15:52                                   ` Andy Lutomirski
  2017-04-04  9:59                                   ` Stas Sergeev
  2017-04-04  2:05                                 ` Ricardo Neri
  1 sibling, 2 replies; 112+ messages in thread
From: H. Peter Anvin @ 2017-04-01 17:49 UTC (permalink / raw)
  To: Stas Sergeev, Ricardo Neri
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, Andy Lutomirski,
	Borislav Petkov, Peter Zijlstra, Andrew Morton, Brian Gerst,
	Chris Metcalf, Dave Hansen, Paolo Bonzini, Liang Z Li,
	Masami Hiramatsu, Huang Rui, Jiri Slaby, Jonathan Corbet,
	Michael S. Tsirkin, Paul Gortmaker, Vlastimil Babka, Chen Yucong,
	Alexandre Julliard, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, X86 ML, linux-msdos, wine-devel

On April 1, 2017 6:08:43 AM PDT, Stas Sergeev <stsp@list.ru> wrote:
>30.03.2017 08:14, Ricardo Neri wrote:
>>>>>>> You know the wine's
>>>>>>> requirements now - they are very small. And
>>>>>>> dosemu doesn't need anything at all but smsw.
>>>>>>> And even smsw is very rare.
>>>>>> But emulation is still needed for SMSW, right?
>>>>> Likely so.
>>>>> If you want, I can enable the logging of this command
>>>>> and see if it is used by some of the DOS programs I have.
>>>> It would be great if you could do that, if you don't mind.
>>> OK, scheduled to the week-end.
>>> I'll let you know.
>> Thanks!
>OK, done the testing.
>It appears smsw is used in v86 by windows-3.1 and dos4gw
>at the very least, and these are the "major" apps. So doing
>without a fixup in v86 will not go unnoticed. Unfortunately
>this also means that KVM-vm86 should be properly tested.
>I have also found a weird program that does SGDT under
>v86. This causes "ERROR: SGDT not implemented" under
>dosemu, but the prog still works fine as it obviously does
>not care about the results. This app can easily be broken
>of course, if that makes any sense (likely not).

Using SMSW to detect v86 mode is relatively common.  pushf hides the VM flag, but SMSW is available, providing the v86 virtualization hole.
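
As a rough sketch of the check described above: virtual-8086 mode is a
sub-mode of protected mode, so CR0.PE (bit 0 of the machine status word)
reads as 1 there, while a program truly running in real mode would see 0.
The helper names msw_protected_mode() and read_msw() are made up for this
example, and read_msw() assumes GCC-style inline assembly on an x86
target. With UMIP enabled and no emulation in place, executing it at
CPL > 0 raises a general protection fault.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSW_PE 0x0001u  /* CR0.PE: protection enable, bit 0 of the MSW */

/* True if the stored machine status word has protected mode enabled. */
bool msw_protected_mode(uint16_t msw)
{
	return (msw & MSW_PE) != 0;
}

#if defined(__i386__) || defined(__x86_64__)
/* Store the machine status word to memory using SMSW. */
static inline uint16_t read_msw(void)
{
	uint16_t msw;

	__asm__ volatile("smsw %0" : "=m"(msw));
	return msw;
}
#endif
```

A DOS program that believes it is in real mode can therefore call the
equivalent of msw_protected_mode(read_msw()) and conclude that it is
running under v86 when the result is true.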
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-04-01 17:49                                 ` H. Peter Anvin
@ 2017-04-02 15:52                                   ` Andy Lutomirski
  2017-04-04  9:59                                   ` Stas Sergeev
  1 sibling, 0 replies; 112+ messages in thread
From: Andy Lutomirski @ 2017-04-02 15:52 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Stas Sergeev, Ricardo Neri, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86.ML

On Sat, Apr 1, 2017 at 10:49 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> <x86@kernel.org>,linux-msdos@vger.kernel.org,wine-devel@winehq.org
> From: hpa@zytor.com
> Message-ID: <3FD12652-AA83-4D73-9914-BBA089E58FFA@zytor.com>
>
> On April 1, 2017 6:08:43 AM PDT, Stas Sergeev <stsp@list.ru> wrote:
>>30.03.2017 08:14, Ricardo Neri wrote:
>>>>>>>> You know the wine's
>>>>>>>> requirements now - they are very small. And
>>>>>>>> dosemu doesn't need anything at all but smsw.
>>>>>>>> And even smsw is very rare.
>>>>>>> But emulation is still needed for SMSW, right?
>>>>>> Likely so.
>>>>>> If you want, I can enable the logging of this command
>>>>>> and see if it is used by some of the DOS programs I have.
>>>>> It would be great if you could do that, if you don't mind.
>>>> OK, scheduled to the week-end.
>>>> I'll let you know.
>>> Thanks!
>>OK, done the testing.
>>It appears smsw is used in v86 by windows-3.1 and dos4gw
>>at the very least, and these are the "major" apps. So doing
>>without a fixup in v86 will not go unnoticed. Unfortunately
>>this also means that KVM-vm86 should be properly tested.
>>I have also found a weird program that does SGDT under
>>v86. This causes "ERROR: SGDT not implemented" under
>>dosemu, but the prog still works fine as it obviously does
>>not care about the results. This app can easily be broken
>>of course, if that makes any sense (likely not).
>
> Using SMSW to detect v86 mode is relatively common.  pushf hides the VM flag, but SMSW is available, providing the v86 virtualization hole.

I think we should emulate all the instructions (as documented in the
SDM, so things that #UD in v86 mode still do so) rather than trying to
be clever.  If we're clever and we get it wrong, we might discover
that something started depending on our cleverness in the mean time.

--Andy

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-03-31 14:11                                   ` Alexandre Julliard
  2017-03-31 21:26                                     ` Stas Sergeev
@ 2017-04-04  2:02                                     ` Ricardo Neri
  2017-04-04  6:08                                       ` Alexandre Julliard
  1 sibling, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-04-04  2:02 UTC (permalink / raw)
  To: Alexandre Julliard
  Cc: Stas Sergeev, Andy Lutomirski, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, Peter Zijlstra,
	Andrew Morton, Brian Gerst, Chris Metcalf, Dave Hansen,
	Paolo Bonzini, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Fenghua Yu, Ravi V. Shankar,
	Shuah Khan, linux-kernel, X86 ML, linux-msdos, wine-devel

On Fri, 2017-03-31 at 16:11 +0200, Alexandre Julliard wrote:
> Ricardo Neri <ricardo.neri-calderon@linux.intel.com> writes:
> 
> > On Thu, 2017-03-30 at 13:10 +0300, Stas Sergeev wrote:
> >> 30.03.2017 08:14, Ricardo Neri wrote:
> >> >>>> But at least dosemu implements it, so probably it is needed.
> >> >>> Right.
> >> >>>
> >> >>>> Of course if it is used by one of 100 DOS progs, then there
> >> >>>> is an option to just add its support to dosemu2 and pretend
> >> >>>> the compatibility problems did not exist. :)
> >> >>> Do you mean relaying the GP fault to dosemu instead of trapping it and
> >> >>> emulating it in the kernel?
> >> >> Yes, that would be optimal if this does not severely break
> >> >> the current setups. If we can find out that smsw is not in
> >> >> the real use, we can probably do exactly that.
> >> >> But other
> >> >> instructions are not in real use in v86 for sure, so I
> >> >> wouldn't be adding the explicit test-cases to the kernel
> >> >> that will make you depend on some particular behaviour
> >> >> that no one may need.
> >> >> My objection was that we shouldn't
> >> >> write tests before we know exactly how we want this to work.
> >> > OK, if only SMSW is used then I'll keep the emulation for SMSW only.
> >> In fact, smsw has an interesting property, which is that
> >> no one will ever want to disable its in-kernel emulation
> >> to provide its own.
> >> So while I'll try to estimate its usage, emulating it in kernel
> >> will not be that problematic in either case.
> >
> > Ah good to know!
> >
> >> As for protected mode, if wine only needs sgdt/sidt, then
> >> again, no one will want to disable its emulation. Not the
> >> case with sldt, but AFAICS wine doesn't need sldt, and so
> >> we can leave sldt without a fixup. Is my understanding
> >> correct?
> >
> > This is my understanding as well. I could not find any use of sldt in
> > wine. Alexandre, would you mind confirming?
> 
> Some versions of the Themida software protection are known to use sldt
> as part of the virtual machine detection code [1]. The check currently
> fails because it expects the LDT to be zero, so the app is already
> broken, but sldt segfaulting would still cause a crash where there
> wasn't one before.
> 
> However, I'm only aware of one application using this, and being able to
> catch and emulate sldt ourselves would actually give us a chance to fix
> this app in newer Wine versions, so I'm not opposed to having it
> segfault.

Great! Then this is in line with what we are aiming to do with dosemu2:
not emulate str and sldt.
> 
> In fact it would be nice to be able to make sidt/sgdt/etc. segfault
> too. I know a new syscall is a pain, but as far as Wine is concerned,
> being able to opt out from any emulation would be potentially useful.

I see. I guess for now there should not be a problem with emulating
sidt/sgdt/smsw, right? That way we don't break current versions of
Wine and the programs that use it. In phase two we can introduce the
syscall so that kernel fixups can be disabled. Does this make sense?

Thanks and BR,
Ricardo

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-04-01 13:08                               ` Stas Sergeev
  2017-04-01 17:49                                 ` H. Peter Anvin
@ 2017-04-04  2:05                                 ` Ricardo Neri
  2017-04-04  8:03                                   ` Stas Sergeev
  1 sibling, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-04-04  2:05 UTC (permalink / raw)
  To: Stas Sergeev
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

On Sat, 2017-04-01 at 16:08 +0300, Stas Sergeev wrote:
> 30.03.2017 08:14, Ricardo Neri wrote:
> >>>>>> You know the wine's
> >>>>>> requirements now - they are very small. And
> >>>>>> dosemu doesn't need anything at all but smsw.
> >>>>>> And even smsw is very rare.
> >>>>> But emulation is still needed for SMSW, right?
> >>>> Likely so.
> >>>> If you want, I can enable the logging of this command
> >>>> and see if it is used by some of the DOS programs I have.
> >>> It would be great if you could do that, if you don't mind.
> >> OK, scheduled to the week-end.
> >> I'll let you know.
> > Thanks!
> OK, done the testing.
> It appears smsw is used in v86 by windows-3.1 and dos4gw
> at the very least, and these are the "major" apps. So doing
> without a fixup in v86 will not go unnoticed. Unfortunately
> this also means that KVM-vm86 should be properly tested.
> I have also found a weird program that does SGDT under
> v86. This causes "ERROR: SGDT not implemented" under
> dosemu, but the prog still works fine as it obviously does
> not care about the results. This app can easily be broken
> of course, if that makes any sense (likely not).

Thanks for the input! Then it seems that we will need emulation for sgdt
and smsw. Perhaps sidt? sldt and str will not need emulation in either
protected mode or virtual-8086 mode. At a later stage I can look into
working on the syscall as Andy proposes.

I will also look into the kvm-v86 path for dosemu2.

It seems we have an agreement :) Do we?

Thanks and BR,
Ricardo

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-04-04  2:02                                     ` Ricardo Neri
@ 2017-04-04  6:08                                       ` Alexandre Julliard
  0 siblings, 0 replies; 112+ messages in thread
From: Alexandre Julliard @ 2017-04-04  6:08 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Stas Sergeev, Andy Lutomirski, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, Peter Zijlstra,
	Andrew Morton, Brian Gerst, Chris Metcalf, Dave Hansen,
	Paolo Bonzini, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Fenghua Yu, Ravi V. Shankar,
	Shuah Khan, linux-kernel, X86 ML, linux-msdos, wine-devel

Ricardo Neri <ricardo.neri-calderon@linux.intel.com> writes:

> On Fri, 2017-03-31 at 16:11 +0200, Alexandre Julliard wrote:
>> Ricardo Neri <ricardo.neri-calderon@linux.intel.com> writes:
>> 
>> > On Thu, 2017-03-30 at 13:10 +0300, Stas Sergeev wrote:
>> >> 30.03.2017 08:14, Ricardo Neri wrote:
>> >> In fact, smsw has an interesting property, which is that
>> >> no one will ever want to disable its in-kernel emulation
>> >> to provide its own.
>> >> So while I'll try to estimate its usage, emulating it in kernel
>> >> will not be that problematic in either case.
>> >
>> > Ah good to know!
>> >
>> >> As for protected mode, if wine only needs sgdt/sidt, then
>> >> again, no one will want to disable its emulation. Not the
>> >> case with sldt, but AFAICS wine doesn't need sldt, and so
>> >> we can leave sldt without a fixup. Is my understanding
>> >> correct?
>> >
>> > This is my understanding as well. I could not find any use of sldt in
>> > wine. Alexandre, would you mind confirming?
>> 
>> Some versions of the Themida software protection are known to use sldt
>> as part of the virtual machine detection code [1]. The check currently
>> fails because it expects the LDT to be zero, so the app is already
>> broken, but sldt segfaulting would still cause a crash where there
>> wasn't one before.
>> 
>> However, I'm only aware of one application using this, and being able to
>> catch and emulate sldt ourselves would actually give us a chance to fix
>> this app in newer Wine versions, so I'm not opposed to having it
>> segfault.
>
> Great! Then this is in line with what we are aiming to do with dosemu2:
> not emulate str and sldt.
>> 
>> In fact it would be nice to be able to make sidt/sgdt/etc. segfault
>> too. I know a new syscall is a pain, but as far as Wine is concerned,
>> being able to opt out from any emulation would be potentially useful.
>
> I see. I guess for now there should not be a problem with emulating
> sidt/sgdt/smsw, right? In this way we don't break current versions of
> winehq and programs using it. In a phase two we can introduce the
> syscall so that kernel fixups can be disabled. Does this make sense?

Yes, that makes sense.

-- 
Alexandre Julliard
julliard@winehq.org

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-04-04  2:05                                 ` Ricardo Neri
@ 2017-04-04  8:03                                   ` Stas Sergeev
  0 siblings, 0 replies; 112+ messages in thread
From: Stas Sergeev @ 2017-04-04  8:03 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Brian Gerst, Chris Metcalf, Dave Hansen, Paolo Bonzini,
	Liang Z Li, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Fenghua Yu,
	Ravi V. Shankar, Shuah Khan, linux-kernel, X86 ML, linux-msdos,
	wine-devel

04.04.2017 05:05, Ricardo Neri wrote:
> On Sat, 2017-04-01 at 16:08 +0300, Stas Sergeev wrote:
>> 30.03.2017 08:14, Ricardo Neri wrote:
>>>>>>>> You know the wine's
>>>>>>>> requirements now - they are very small. And
>>>>>>>> dosemu doesn't need anything at all but smsw.
>>>>>>>> And even smsw is very rare.
>>>>>>> But emulation is still needed for SMSW, right?
>>>>>> Likely so.
>>>>>> If you want, I can enable the logging of this command
>>>>>> and see if it is used by some of the DOS programs I have.
>>>>> It would be great if you could do that, if you don't mind.
>>>> OK, scheduled to the week-end.
>>>> I'll let you know.
>>> Thanks!
>> OK, done the testing.
>> It appears smsw is used in v86 by windows-3.1 and dos4gw
>> at the very least, and these are the "major" apps. So doing
>> without a fixup in v86 will not go unnoticed. Unfortunately
>> this also means that KVM-vm86 should be properly tested.
>> I have also found a weird program that does SGDT under
>> v86. This causes "ERROR: SGDT not implemented" under
>> dosemu, but the prog still works fine as it obviously does
>> not care about the results. This app can easily be broken
>> of course, if that makes any sense (likely not).
> Thanks for inputs! Then it seems that we will need emulation for sgdt
> and smsw.
I wouldn't claim we need emulation of sgdt. One
or two exotic apps do not count for much, considering the
overall small usage of dosemu and the ease of
re-adding the emulation to dosemu itself.
So if it makes any sense not to add it for vm86, then
please leave it omitted. However, it seems Andy wants
overall completeness here, so let me just say I'll be
fine with either option.

>   Perhaps sidt?
If only for overall completeness.
If it makes any sense to, please leave it omitted.

>   sldt and str will not need emulation in either
> protected mode or virtual-8086 mode. At a later stage I can look into
> working in the syscall as Andy proposes.
>
> I will also look into the kvm-v86 path for dosemu2.
>
> It seems we have an agreement :) Do we?
Yes, fine with me.

* Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
  2017-04-01 17:49                                 ` H. Peter Anvin
  2017-04-02 15:52                                   ` Andy Lutomirski
@ 2017-04-04  9:59                                   ` Stas Sergeev
  1 sibling, 0 replies; 112+ messages in thread
From: Stas Sergeev @ 2017-04-04  9:59 UTC (permalink / raw)
  To: H. Peter Anvin, Ricardo Neri
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, Andy Lutomirski,
	Borislav Petkov, Peter Zijlstra, Andrew Morton, Brian Gerst,
	Chris Metcalf, Dave Hansen, Paolo Bonzini, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, X86.ML

01.04.2017 20:49, H. Peter Anvin wrote:
> <x86@kernel.org>,linux-msdos@vger.kernel.org,wine-devel@winehq.org
> From: hpa@zytor.com
> Message-ID: <3FD12652-AA83-4D73-9914-BBA089E58FFA@zytor.com>
>
> On April 1, 2017 6:08:43 AM PDT, Stas Sergeev <stsp@list.ru> wrote:
>> 30.03.2017 08:14, Ricardo Neri wrote:
>>>>>>>> You know the wine's
>>>>>>>> requirements now - they are very small. And
>>>>>>>> dosemu doesn't need anything at all but smsw.
>>>>>>>> And even smsw is very rare.
>>>>>>> But emulation is still needed for SMSW, right?
>>>>>> Likely so.
>>>>>> If you want, I can enable the logging of this command
>>>>>> and see if it is used by some of the DOS programs I have.
>>>>> It would be great if you could do that, if you don't mind.
>>>> OK, scheduled to the week-end.
>>>> I'll let you know.
>>> Thanks!
>> OK, done the testing.
>> It appears smsw is used in v86 by windows-3.1 and dos4gw
>> at the very least, and these are the "major" apps. So doing
>> without a fixup in v86 will not go unnoticed. Unfortunately
>> this also means that KVM-vm86 should be properly tested.
>> I have also found a weird program that does SGDT under
>> v86. This causes "ERROR: SGDT not implemented" under
>> dosemu, but the prog still works fine as it obviously does
>> not care about the results. This app can easily be broken
>> of course, if that makes any sense (likely not).
> Using SMSW to detect v86 mode is relatively common.  pushf hides the VM flag, but SMSW is available, providing the v86 virtualization hole.
Perhaps sgdt in v86 is used (very rare) for the same purpose then.

* Re: [v6 PATCH 02/21] x86/mpx: Do not use SIB index if index points to R/ESP
  2017-03-08  0:32 ` [v6 PATCH 02/21] x86/mpx: Do not use SIB index if index points to R/ESP Ricardo Neri
@ 2017-04-11 11:31   ` Borislav Petkov
  2017-04-26  1:39     ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Borislav Petkov @ 2017-04-11 11:31 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Adam Buchbinder,
	Colin Ian King, Lorenzo Stoakes, Qiaowei Ren, Nathan Howard,
	Adan Hawthorn, Joe Perches

On Tue, Mar 07, 2017 at 04:32:35PM -0800, Ricardo Neri wrote:
> Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
> Developer's Manual volume 2A states that when memory addressing is used
> (i.e., mod part of ModR/M is not 3), a SIB byte is used and the index of
> the SIB byte points to the R/ESP (i.e., index = 4), the index should not be
> used in the computation of the memory address.
> 
> In these cases the address is simply the value present in the register
> pointed by the base part of the SIB byte plus the displacement byte.
> 
> An example of such instruction could be
> 
>     insn -0x80(%rsp)
> 
> This is represented as:
> 
>      [opcode] 4c 23 80
> 
>       ModR/M=0x4c: mod: 0x1, reg: 0x1: r/m: 0x4(R/ESP)
>       SIB=0x23: sc: 0, index: 100b (R/ESP), base: 011b (R/EBX):
>       Displacement -0x80
> 
> The correct address is (base) + displacement; no index is used.
> 
> We can achieve the desired effect of not using the index by making
> get_reg_offset return -EDOM in this particular case. This value indicates
> to callers that they should not use the index to calculate the address.
> -EINVAL continues to indicate an error when decoding the SIB byte.
> 
> Care is taken to allow R12 to be used as index, which is a valid scenario.
> 
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> Cc: Colin Ian King <colin.king@canonical.com>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Nathan Howard <liverlint@gmail.com>
> Cc: Adan Hawthorn <adanhawthorn@gmail.com>
> Cc: Joe Perches <joe@perches.com>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: x86@kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/mm/mpx.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
> index ff112e3..d9e92d6 100644
> --- a/arch/x86/mm/mpx.c
> +++ b/arch/x86/mm/mpx.c
> @@ -110,6 +110,13 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
>  		regno = X86_SIB_INDEX(insn->sib.value);
>  		if (X86_REX_X(insn->rex_prefix.value))
>  			regno += 8;
> +		/*
> +		 * If mod !=3, register R/ESP (regno=4) is not used as index in
> +		 * the address computation. Check is done after looking at REX.X
> +		 * This is because R12 (regno=12) can be used as an index.
> +		 */
> +		if (regno == 4 && X86_MODRM_MOD(insn->modrm.value) != 3)
> +			return -EDOM;

Hmm, ok, so this is a bit confusing, to me at least. Maybe you're saying
the same things but here's how I see it:

1. When ModRM.mod != 11b and ModRM.rm == 100b, all that means
is that a SIB byte follows, i.e., you have indexed
register-indirect addressing.

Now, you still need to decode the SIB byte and it goes this way:

SIB.index == 100b means that the index register specification is
null, i.e., the scale*index portion of that indexed register-indirect
addressing is null, i.e., you have an offset following the SIB byte.
Now, depending on ModRM.mod, that offset is:

ModRM.mod == 01b -> 1 byte offset
ModRM.mod == 10b -> 4 bytes offset

That's why for an instruction like this one (let's use your example) you
have:

	8b 4c 23 80             mov    -0x80(%rbx,%riz,1),%ecx

That's basically a binutils hack to state that the SIB index register is
null.

Another SIB index register works, of course:

	 8b 4c 03 80             mov -0x80(%rbx,%rax,1),%ecx

Ok, so far so good.

2. Now, the %r12 thing is part of the REX implications to those
encodings: That's the REX.X bit which adds a fourth bit to the encoding
of the SIB index register, i.e., if you specify a register with
SIB.index, you want to be able to specify all 16 regs, thus the 4th
bit. That's why it says that the SIB byte is required for %r12-based
addressing.

I.e., you can still have a SIB.index == 100b addressing with an index
register which is not null but that is only because SIB.index is now
{REX.X=1b, 100b}, i.e.:

Prefixes:
 REX:                   0x43 { 4 [w]: 0 [r]: 0 [x]: 1 [b]: 1 }
Opcode:                 0x8b
ModRM:                  0x4c  [mod:1b][.R:0b,reg:1b][.B:1b,r/m:1100b]
                        register-indirect mode, 1-byte offset in displ. field
SIB:                    0x63 [.B:1b,base:1011b][.X:1b,idx:1100b][scale: 1]

 MOV Gv,Ev; MOV reg{16,32,64} reg/mem{16,32,64}
               0:       43 8b 4c 63 80          mov -0x80(%r11,%r12,2),%ecx

So, I'm not saying your version is necessarily wrong - I'm just saying
that it could explain the situation a bit more verbosely.

Btw, I'd flip the if-test above:

	if (X86_MODRM_MOD(insn->modrm.value) != 3 && regno == 4)

to make it just like the order the conditions are specified in the
manuals.
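
As a rough illustration of points 1 and 2, here is a small user-space
sketch of the field extraction. The struct and function names are made
up for this example and are not kernel API; the special case of
ModRM.mod == 00b with SIB.base == 101b is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded fields; not a kernel structure. */
struct sib_fields {
	bool has_sib;	/* ModRM.mod != 11b and ModRM.rm == 100b */
	bool has_index;	/* false when the index specification is null */
	int scale;	/* scale factor: 1, 2, 4 or 8 */
	int index;	/* 0..15, meaningful only if has_index */
	int base;	/* 0..15 */
};

/* rex is the REX prefix byte, or 0 when the instruction has none. */
static struct sib_fields decode_sib(uint8_t rex, uint8_t modrm, uint8_t sib)
{
	struct sib_fields f = { 0 };
	int mod = (modrm >> 6) & 0x3;
	int rm = modrm & 0x7;

	f.has_sib = (mod != 3 && rm == 4);
	if (!f.has_sib)
		return f;

	f.scale = 1 << ((sib >> 6) & 0x3);
	f.index = (sib >> 3) & 0x7;
	f.base = sib & 0x7;

	/* REX.X (bit 1) and REX.B (bit 0) supply the fourth register bit. */
	if (rex & 0x2)
		f.index += 8;
	if (rex & 0x1)
		f.base += 8;

	/* Only the full 4-bit value 0100b means "no index"; 1100b is %r12. */
	f.has_index = (f.index != 4);
	return f;
}
```

Fed with the three encodings from this mail, decode_sib(0, 0x4c, 0x23)
reports a null index, decode_sib(0, 0x4c, 0x03) reports index 0 (%rax)
and decode_sib(0x43, 0x4c, 0x63) reports index 12 (%r12) with base 11
(%r11) and a scale factor of 2.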

Thanks.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 01/21] x86/mpx: Use signed variables to compute effective addresses
  2017-03-08  0:32 ` [v6 PATCH 01/21] x86/mpx: Use signed variables to compute effective addresses Ricardo Neri
@ 2017-04-11 21:56   ` Borislav Petkov
  2017-04-26  1:40     ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Borislav Petkov @ 2017-04-11 21:56 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Nathan Howard, Adan Hawthorn,
	Joe Perches

On Tue, Mar 07, 2017 at 04:32:34PM -0800, Ricardo Neri wrote:
> Even though memory addresses are unsigned. The operands used to compute the

				... unsigned, the operands ...

> effective address do have a sign. This is true for the r/m part of the
> ModRM byte, the base and index parts of the SiB byte as well as the
> displacement. Thus, signed variables shall be used when computing the
> effective address from these operands. Once the signed effective address
> has been computed, it is casted to an unsigned long to determine the
> linear address.
> 
> Variables are renamed to better reflect the type of address being
> computed.
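
To make the point about signed operands concrete, here is a minimal
user-space sketch with a 1-byte displacement. The function names are
invented for the example; it assumes the usual two's-complement
conversion when narrowing to int8_t.

```c
#include <stdint.h>

/* Wrong: zero-extends the displacement byte, so 0x80 becomes +128. */
static unsigned long ea_zero_extended(unsigned long base, uint8_t disp8)
{
	return base + disp8;
}

/* Sign-extends the displacement byte first, so 0x80 becomes -128. */
static unsigned long ea_sign_extended(unsigned long base, uint8_t disp8)
{
	long disp = (int8_t)disp8;

	/* Unsigned addition wraps, which yields base - 128 for disp == -128. */
	return base + (unsigned long)disp;
}
```

With base 0x1000 and the displacement byte 0x80, the first variant
yields 0x1080 while the second yields 0xf80, which is what an encoding
like -0x80(%rbx) is meant to address.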

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 03/21] x86/mpx: Do not use R/EBP as base in the SIB byte with Mod = 0
  2017-03-08  0:32 ` [v6 PATCH 03/21] x86/mpx: Do not use R/EBP as base in the SIB byte with Mod = 0 Ricardo Neri
@ 2017-04-11 22:08   ` Borislav Petkov
  2017-04-26  2:04     ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Borislav Petkov @ 2017-04-11 22:08 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Nathan Howard, Adan Hawthorn,
	Joe Perches

On Tue, Mar 07, 2017 at 04:32:36PM -0800, Ricardo Neri wrote:
> Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
> Developer's Manual volume 2A states that when a SIB byte is used and the
> base of the SIB byte points to R/EBP (i.e., base = 5) and the mod part
> of the ModRM byte is zero, the value of such register will not be used
> as part of the address computation. To signal this, a -EDOM error is
> returned to indicate callers that they should ignore the value.
> 
> Also, for this particular case, a displacement of 32-bits should follow
> the SIB byte if the mod part of ModRM is equal to zero. The instruction
> decoder ensures that this is the case.
> 
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> Cc: Colin Ian King <colin.king@canonical.com>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Nathan Howard <liverlint@gmail.com>
> Cc: Adan Hawthorn <adanhawthorn@gmail.com>
> Cc: Joe Perches <joe@perches.com>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: x86@kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/mm/mpx.c | 29 ++++++++++++++++++++++-------
>  1 file changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
> index d9e92d6..ef7eb67 100644
> --- a/arch/x86/mm/mpx.c
> +++ b/arch/x86/mm/mpx.c
> @@ -121,6 +121,17 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
>  
>  	case REG_TYPE_BASE:
>  		regno = X86_SIB_BASE(insn->sib.value);
> +		/*
> +		 * If mod is 0 and register R/EBP (regno=5) is indicated in the
> +		 * base part of the SIB byte,

you can simply say here: "if SIB.base == 5, the base of the
register-indirect addressing is 0."

> the value of such register should
> +		 * not be used in the address computation. Also, a 32-bit

Not "Also" but "In this case, a 32-bit displacement..."

> +		 * displacement is expected in this case; the instruction
> +		 * decoder takes care of it. This is true for both R13 and
> +		 * R/EBP as REX.B will not be decoded.

You don't need that sentence as the only thing that matters is ModRM.mod
being 0.

> +		 */
> +		if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0)

The 0 test we normally do with the ! (also flip parts of if-condition):

		if (!X86_MODRM_MOD(insn->modrm.value) && regno == 5)

> +			return -EDOM;
> +
>  		if (X86_REX_B(insn->rex_prefix.value))
>  			regno += 8;
>  		break;
> @@ -161,16 +172,21 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
>  		eff_addr = regs_get_register(regs, addr_offset);
>  	} else {
>  		if (insn->sib.nbytes) {
> +			/*
> +			 * Negative values in the base and index offset means
> +			 * an error when decoding the SIB byte. Except -EDOM,
> +			 * which means that the registers should not be used
> +			 * in the address computation.
> +			 */
>  			base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
> -			if (base_offset < 0)
> +			if (unlikely(base_offset == -EDOM))
> +				base = 0;
> +			else if (unlikely(base_offset < 0))

Bah, unlikely's in something which is not really a hot path. They only
encumber readability, no need for them.
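
For illustration only, the -EDOM convention without the unlikely()
annotations could look roughly like the user-space sketch below. The
function and its parameters are hypothetical: base_off and idx_off
stand in for what a get_reg_offset()-like lookup returns, and base_val
and idx_val for the register contents that would be read from pt_regs.

```c
#include <errno.h>

/*
 * Offsets are non-negative when a register is used, -EDOM when the
 * encoding says "no register here", and any other negative value when
 * decoding failed.
 */
static int sib_eff_addr(int base_off, long base_val, int idx_off, long idx_val,
			int scale, long disp, long *eff_addr)
{
	long base, indx;

	if (base_off == -EDOM)
		base = 0;		/* base is not part of the address */
	else if (base_off < 0)
		return -EINVAL;		/* genuine decoding error */
	else
		base = base_val;

	if (idx_off == -EDOM)
		indx = 0;		/* index is not part of the address */
	else if (idx_off < 0)
		return -EINVAL;
	else
		indx = idx_val;

	*eff_addr = base + indx * scale + disp;
	return 0;
}
```

The ordering matters: -EDOM has to be tested before the generic "< 0"
check, otherwise the "ignore this register" case would be reported as
an error.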

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 04/21] x86/mpx, x86/insn: Relocate insn util functions to a new insn-kernel
  2017-03-08  0:32 ` [v6 PATCH 04/21] x86/mpx, x86/insn: Relocate insn util functions to a new insn-kernel Ricardo Neri
@ 2017-04-12 10:03   ` Borislav Petkov
  2017-04-26  2:05     ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Borislav Petkov @ 2017-04-12 10:03 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Tue, Mar 07, 2017 at 04:32:37PM -0800, Ricardo Neri wrote:
> Other kernel submodules can benefit from using the utility functions
> defined in mpx.c to obtain the addresses and values of operands contained
> in the general purpose registers. An instance of this is the emulation code
> used for instructions protected by the Intel User-Mode Instruction
> Prevention feature.
> 
> Thus, these functions are relocated to a new insn-eval.c file. The reason
> to not relocate these utilities into insn.c is that the latter solely
> analyses instructions given by a struct insn without any knowledge of the
> meaning of the values of instruction operands. This new utility insn-
> eval.c aims to be used to resolve effective and userspace linear addresses
> based on the contents of the instruction operands as well as the contents
> of pt_regs structure.
> 
> These utilities come with a separate header. This is to avoid taking insn.c
> out of sync from the instructions decoders under tools/obj and tools/perf.
> This also avoids adding cumbersome #ifdef's for the #include'd files
> required to decode instructions in a kernel context.
> 
> Functions are simply relocated. There are not functional or indentation
> changes.

...

> +	case REG_TYPE_BASE:
> +		regno = X86_SIB_BASE(insn->sib.value);
> +		/*
> +		 * If mod is 0 and register R/EBP (regno=5) is indicated in the
> +		 * base part of the SIB byte, the value of such register should
> +		 * not be used in the address computation. Also, a 32-bit
> +		 * displacement is expected in this case; the instruction
> +		 * decoder takes care of it. This is true for both R13 and
> +		 * R/EBP as REX.B will not be decoded.
> +		 */
> +		if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0)
> +			return -EDOM;
> +
> +		if (X86_REX_B(insn->rex_prefix.value))
> +			regno += 8;
> +		break;
> +
> +	default:
> +		pr_err("invalid register type");
> +		BUG();

WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON()
#211: FILE: arch/x86/lib/insn-eval.c:90:
+               BUG();

And checkpatch is kinda right. We need to warn here, not explode. Oh and
that function returns negative values on error...

Please change that with a patch on top of the move.
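
A user-space sketch of the shape this could take is below. The names
are hypothetical, and fprintf() merely stands in for whatever WARN-style
reporting the kernel version would use; the point is only that the
default case reports and returns a negative value instead of crashing.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

enum reg_type {
	REG_TYPE_RM = 0,
	REG_TYPE_INDEX,
	REG_TYPE_BASE,
};

/* Return the 3-bit register field selected by type, or -EINVAL. */
static int pick_regno(int type, uint8_t modrm, uint8_t sib)
{
	switch (type) {
	case REG_TYPE_RM:
		return modrm & 0x7;
	case REG_TYPE_INDEX:
		return (sib >> 3) & 0x7;
	case REG_TYPE_BASE:
		return sib & 0x7;
	default:
		/* Warn and let the caller recover instead of exploding. */
		fprintf(stderr, "invalid register type: %d\n", type);
		return -EINVAL;
	}
}
```

Callers already have to cope with negative return values, so an
unknown type then simply takes the existing error path.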

Thanks.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 05/21] x86/insn-eval: Add utility functions to get register offsets
  2017-03-08  0:32 ` [v6 PATCH 05/21] x86/insn-eval: Add utility functions to get register offsets Ricardo Neri
@ 2017-04-12 16:28   ` Borislav Petkov
  2017-04-26 18:13     ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Borislav Petkov @ 2017-04-12 16:28 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Tue, Mar 07, 2017 at 04:32:38PM -0800, Ricardo Neri wrote:
> The function insn_get_reg_offset takes as argument an enumeration that

Please end function names with parentheses.

And do you mean get_reg_offset(), per chance?

> indicates the type of offset that is returned: the R/M part of the ModRM
> byte, the index of the SIB byte or the base of the SIB byte.

Err, you mean, it returns the offset to the register the argument
specifies.

> Callers of
> this function would need the definition of such enumeration. This is not
> needed. Instead, helper functions can be defined for this purpose can be
> added.

"Instead, add helpers... "

> These functions are useful in cases when, for instance, the caller
> needs to decide whether the operand is a register or a memory location by
> looking at the mod part of the ModRM byte.
> 
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> Cc: Colin Ian King <colin.king@canonical.com>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Thomas Garnier <thgarnie@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: x86@kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/include/asm/insn-eval.h |  3 +++
>  arch/x86/lib/insn-eval.c         | 51 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 54 insertions(+)
> 
> diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
> index 5cab1b1..754211b 100644
> --- a/arch/x86/include/asm/insn-eval.h
> +++ b/arch/x86/include/asm/insn-eval.h
> @@ -12,5 +12,8 @@
>  #include <asm/ptrace.h>
>  
>  void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
> +int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
> +int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
> +int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);

Forgotten to edit the copy-paste?

Which means, nothing really needs insn_get_reg_offset_sib_index() and
you can get rid of it?

>  #endif /* _ASM_X86_INSN_EVAL_H */
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index 23cf010..78df1c9 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -98,6 +98,57 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
>  	return regoff[regno];
>  }
>  
> +/**
> + * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
> + * @insn:	Instruction structure containing the ModRM byte
> + * @regs:	Set of registers indicated by the ModRM byte

That's simply struct pt_regs - not a set of registers indicated by
ModRM?!?

> + * Obtain the register indicated by the r/m part of the ModRM byte. The
> + * register is obtained as an offset from the base of pt_regs. In specific
> + * cases, the returned value can be -EDOM to indicate that the particular value
> + * of ModRM does not refer to a register.

Put that sentence under the "Return: " paragraph below so that it is
immediately obvious what the retvals are.

> + *
> + * Return: Register indicated by r/m, as an offset within struct pt_regs
> + */
> +int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs)

That name is too long: insn_get_modrm_rm_off() should be enough.

> +{
> +	return get_reg_offset(insn, regs, REG_TYPE_RM);
> +}
> +
> +/**
> + * insn_get_reg_offset_sib_base - Obtain register in base part of SiB byte
> + * @insn:	Instruction structure containing the SiB byte
> + * @regs:	Set of registers indicated by the SiB byte
> + *
> + * Obtain the register indicated by the base part of the SiB byte. The
> + * register is obtained as an offset from the base of pt_regs. In specific
> + * cases, the returned value can be -EDOM to indicate that the particular value
> + * of SiB does not refer to a register.
> + *
> + * Return: Register indicated by SiB's base, as an offset within struct pt_regs

Let's stick to a single spelling: SIB, all caps.

> + */
> +int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs)

insn_get_sib_base_off()

Ditto for the rest of the comments on insn_get_reg_offset_modrm_rm() above.

> +{
> +	return get_reg_offset(insn, regs, REG_TYPE_BASE);
> +}
> +
> +/**
> + * insn_get_reg_offset_sib_index - Obtain register in index part of SiB byte
> + * @insn:	Instruction structure containing the SiB byte
> + * @regs:	Set of registers indicated by the SiB byte
> + *
> + * Obtain the register indicated by the index part of the SiB byte. The
> + * register is obtained as an offset from the index of pt_regs. In specific
> + * cases, the returned value can be -EDOM to indicate that the particular value
> + * of SiB does not refer to a register.
> + *
> + * Return: Register indicated by SiB's base, as an offset within struct pt_regs
> + */
> +int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs)

insn_get_sib_idx_off()

And again, if this function is unused, don't add it.

Thanks.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 06/21] x86/insn-eval: Add utility functions to get segment selector
  2017-03-08  0:32 ` [v6 PATCH 06/21] x86/insn-eval: Add utility functions to get segment selector Ricardo Neri
@ 2017-04-18  9:42   ` Borislav Petkov
  2017-04-26 20:44     ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Borislav Petkov @ 2017-04-18  9:42 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Tue, Mar 07, 2017 at 04:32:39PM -0800, Ricardo Neri wrote:
> When computing a linear address and segmentation is used, we need to know
> the base address of the segment involved in the computation. In most of
> the cases, the segment base address will be zero as in USER_DS/USER32_DS.
> However, it may be possible that a user space program defines its own
> segments via a local descriptor table. In such a case, the segment base
> address may not be zero .Thus, the segment base address is needed to
> calculate correctly the linear address.
> 
> The segment selector to be used when computing a linear address is
> determined by either any of segment select override prefixes in the
> instruction or inferred from the registers involved in the computation of
> the effective address; in that order. Also, there are cases when the
> overrides shall be ignored.
> 
> For clarity, this process can be split into two steps: resolving the
> relevant segment and, once known, read the applicable segment selector.
> The method to obtain the segment selector depends on several factors. In
> 32-bit builds, segment selectors are saved into the pt_regs structure
> when switching to kernel mode. The same is also true for virtual-8086
> mode. In 64-bit builds, segmentation is mostly ignored, except when
> running a program in 32-bit legacy mode. In this case, CS and SS can be
> obtained from pt_regs. DS, ES, FS and GS can be read directly from
> registers.

> Lastly, segmentation is possible in 64-bit mode via FS and GS.

I'd say "Lastly, the only two segment registers which are not ignored in
long mode are FS and GS."

> In these two cases, base addresses are obtained from the relevant MSRs.

s/relevant/respective/

> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> Cc: Colin Ian King <colin.king@canonical.com>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Thomas Garnier <thgarnie@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: x86@kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/lib/insn-eval.c | 195 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 195 insertions(+)
> 
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index 78df1c9..8d45df8 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -8,6 +8,7 @@
>  #include <asm/inat.h>
>  #include <asm/insn.h>
>  #include <asm/insn-eval.h>
> +#include <asm/vm86.h>
>  
>  enum reg_type {
>  	REG_TYPE_RM = 0,
> @@ -15,6 +16,200 @@ enum reg_type {
>  	REG_TYPE_BASE,
>  };
>  
> +enum segment {
> +	SEG_CS = 0x23,
> +	SEG_SS = 0x36,
> +	SEG_DS = 0x3e,
> +	SEG_ES = 0x26,
> +	SEG_FS = 0x64,
> +	SEG_GS = 0x65
> +};
> +
> +/**
> + * resolve_seg_selector() - obtain segment selector
> + * @regs:	Set of registers containing the segment selector

That arg is gone.

> + * @insn:	Instruction structure with selector override prefixes
> + * @regoff:	Operand offset, in pt_regs, of which the selector is needed
> + * @default:	Resolve default segment selector (i.e., ignore overrides)
> + *
> + * The segment selector to which an effective address refers depends on
> + * a) segment selector overrides instruction prefixes or b) the operand
> + * register indicated in the ModRM or SiB byte.
> + *
> + * For case a), the function inspects any prefixes in the insn instruction;

s/insn //

> + * insn can be null to indicate that selector override prefixes shall be
> + * ignored.

This is not what the code does: it returns -EINVAL when insn is NULL.

> This is useful when the use of prefixes is forbidden (e.g.,
> + * obtaining the code selector). For case b), the operand register shall be
> + * represented as the offset from the base address of pt_regs. Also, regoff
> + * can be -EINVAL for cases in which registers are not used as operands (e.g.,
> + * when the mod and r/m parts of the ModRM byte are 0 and 5, respectively).
> + *
> + * This function returns the segment selector to utilize as per the conditions
> + * described above. Please note that this functin does not return the value
> + * of the segment selector. The value of the segment selector needs to be
> + * obtained using get_segment_selector and passing the segment selector type
> + * resolved by this function.
> + *
> + * Return: Segment selector to use, among CS, SS, DS, ES, FS or GS.

	    : negative value when...

> + */
> +static int resolve_seg_selector(struct insn *insn, int regoff, bool get_default)
> +{
> +	int i;
> +
> +	if (!insn)
> +		return -EINVAL;
> +
> +	if (get_default)
> +		goto default_seg;
> +	/*
> +	 * Check first if we have selector overrides. Having more than
> +	 * one selector override leads to undefined behavior. We
> +	 * only use the first one and return

Well, I'd return -EINVAL to catch that undefined behavior. Note in a
local var that I've already seen a seg reg and then if I see another
one, return -EINVAL.
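
Something along these lines, as a stand-alone user-space sketch. The
function name and the PFX_* constants are invented for the example; the
byte values are the segment override prefixes of the x86 encoding, with
0x2e for the CS override.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Segment override prefix bytes of the x86 instruction encoding. */
enum seg_prefix {
	PFX_ES = 0x26,
	PFX_CS = 0x2e,
	PFX_SS = 0x36,
	PFX_DS = 0x3e,
	PFX_FS = 0x64,
	PFX_GS = 0x65,
};

/*
 * Return the single segment override prefix found, 0 if there is none,
 * or -EINVAL if more than one override is present.
 */
static int find_seg_override(const uint8_t *prefixes, size_t nbytes)
{
	int found = 0;	/* 0 means no override seen yet */
	size_t i;

	for (i = 0; i < nbytes; i++) {
		switch (prefixes[i]) {
		case PFX_ES:
		case PFX_CS:
		case PFX_SS:
		case PFX_DS:
		case PFX_FS:
		case PFX_GS:
			if (found)
				return -EINVAL;
			found = prefixes[i];
			break;
		default:
			/* Not a segment override (e.g. 0x66, 0xf3): skip it. */
			break;
		}
	}

	return found;
}
```

Unlike the loop in the patch, this skips non-segment prefixes rather
than bailing out on the first one, and it keeps scanning after the
first override so that a second one can be detected.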

> +	 */
> +	for (i = 0; i < insn->prefixes.nbytes; i++) {
> +		switch (insn->prefixes.bytes[i]) {
> +		case SEG_CS:
> +			return SEG_CS;
> +		case SEG_SS:
> +			return SEG_SS;
> +		case SEG_DS:
> +			return SEG_DS;
> +		case SEG_ES:
> +			return SEG_ES;
> +		case SEG_FS:
> +			return SEG_FS;
> +		case SEG_GS:
> +			return SEG_GS;

So what happens if you're in 64-bit mode and you have CS, DS, ES, or SS?
Or is this what @get_default is supposed to do? But it doesn't look like
it, it still returns segments ignored in 64-bit mode.

> +		default:
> +			return -EINVAL;
> +		}
> +	}
> +
> +default_seg:
> +	/*
> +	 * If no overrides, use default selectors as described in the
> +	 * Intel documentation: SS for ESP or EBP. DS for all data references,
> +	 * except when relative to stack or string destination.
> +	 * Also, AX, CX and DX are not valid register operands in 16-bit
> +	 * address encodings.
> +	 * Callers must interpret the result correctly according to the type
> +	 * of instructions (e.g., use ES for string instructions).
> +	 * Also, some values of modrm and sib might seem to indicate the use
> +	 * of EBP and ESP (e.g., modrm_mod = 0, modrm_rm = 5) but actually
> +	 * they refer to cases in which only a displacement used. These cases
> +	 * should be indentified by the caller and not with this function.
> +	 */
> +	switch (regoff) {
> +	case offsetof(struct pt_regs, ax):
> +		/* fall through */
> +	case offsetof(struct pt_regs, cx):
> +		/* fall through */
> +	case offsetof(struct pt_regs, dx):
> +		if (insn && insn->addr_bytes == 2)
> +			return -EINVAL;
> +	case -EDOM: /* no register involved in address computation */
> +	case offsetof(struct pt_regs, bx):
> +		/* fall through */
> +	case offsetof(struct pt_regs, di):
> +		/* fall through */

		return SEG_ES;

?

It is even in the comment above. I'm looking at MOVS %es:%rdi, %ds:%rsi,
for example.

> +	case offsetof(struct pt_regs, si):
> +		return SEG_DS;
> +	case offsetof(struct pt_regs, bp):
> +		/* fall through */
> +	case offsetof(struct pt_regs, sp):
> +		return SEG_SS;
> +	case offsetof(struct pt_regs, ip):
> +		return SEG_CS;
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +/**
> + * get_segment_selector() - obtain segment selector
> + * @regs:	Set of registers containing the segment selector
> + * @seg_type:	Type of segment selector to obtain
> + * @regoff:	Operand offset, in pt_regs, of which the selector is needed

That's gone.

> + *
> + * Obtain the segment selector for any of CS, SS, DS, ES, FS, GS. In
> + * CONFIG_X86_32, the segment is obtained from either pt_regs or
> + * kernel_vm86_regs as applicable. In CONFIG_X86_64, CS and SS are obtained
> + * from pt_regs. DS, ES, FS and GS are obtained by reading the ds and es, fs
> + * and gs, respectively.

... and DS and ES are ignored in long mode.

> + *
> + * Return: Value of the segment selector

	... or negative...

> + */
> +static unsigned short get_segment_selector(struct pt_regs *regs,
> +					   enum segment seg_type)
> +{

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 07/21] x86/insn-eval: Add utility function to get segment descriptor
  2017-03-08  0:32 ` [v6 PATCH 07/21] x86/insn-eval: Add utility function to get segment descriptor Ricardo Neri
@ 2017-04-19 10:26   ` Borislav Petkov
  2017-04-26 21:51     ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Borislav Petkov @ 2017-04-19 10:26 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Tue, Mar 07, 2017 at 04:32:40PM -0800, Ricardo Neri wrote:
> The segment descriptor contains information that is relevant to how linear
> address need to be computed. It contains the default size of addresses as
> well as the base address of the segment. Thus, given a segment selector,
> we ought look at segment descriptor to correctly calculate the linear
> address.
> 
> In protected mode, the segment selector might indicate a segment
> descriptor from either the global descriptor table or a local descriptor
> table. Both cases are considered in this function.
> 
> This function is the initial implementation for subsequent functions that
> will obtain the aforementioned attributes of the segment descriptor.
> 
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> Cc: Colin Ian King <colin.king@canonical.com>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Thomas Garnier <thgarnie@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: x86@kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/lib/insn-eval.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 61 insertions(+)
> 
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index 8d45df8..8608adf 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -5,9 +5,13 @@
>   */
>  #include <linux/kernel.h>
>  #include <linux/string.h>
> +#include <asm/desc_defs.h>
> +#include <asm/desc.h>
>  #include <asm/inat.h>
>  #include <asm/insn.h>
>  #include <asm/insn-eval.h>
> +#include <asm/ldt.h>
> +#include <linux/mmu_context.h>
>  #include <asm/vm86.h>
>  
>  enum reg_type {
> @@ -294,6 +298,63 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
>  }
>  
>  /**
> + * get_desc() - Obtain address of segment descriptor
> + * @seg:	Segment selector

Maybe that should be

@sel

if it is a sel-ector. :)

And using "sel" makes more sense then when you look at:

	desc_base = sel & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK);

for example:

> + * @desc:	Pointer to the selected segment descriptor
> + *
> + * Given a segment selector, obtain a memory pointer to the segment

s/memory //

> + * descriptor. Both global and local descriptor tables are supported.
> + * desc will contain the address of the descriptor.
> + *
> + * Return: 0 if success, -EINVAL if failure

Why isn't this function returning the pointer or NULL on error? Maybe
the later patches have an answer and I'll discover it if I continue
reviewing :)

> + */
> +static int get_desc(unsigned short seg, struct desc_struct **desc)
> +{
> +	struct desc_ptr gdt_desc = {0, 0};
> +	unsigned long desc_base;
> +
> +	if (!desc)
> +		return -EINVAL;
> +
> +	desc_base = seg & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK);

That looks useless as you're doing it below again.

> +
> +#ifdef CONFIG_MODIFY_LDT_SYSCALL
> +	if ((seg & SEGMENT_TI_MASK) == SEGMENT_LDT) {
> +		seg >>= 3;
> +
> +		mutex_lock(&current->active_mm->context.lock);
> +		if (unlikely(!current->active_mm->context.ldt ||

Is that really a fast path to complicate the if-test with an unlikely()?
If not, you don't really need it.

> +			     seg >= current->active_mm->context.ldt->size)) {

ldt->size is the size of the descriptor table but you've shifted seg by
3. That selector index is shifted by 3 (to the left) to form an offset
into the descriptor table because the entries there are 8 bytes.

So I *think* you wanna use the "useless" desc_base above... :)

> +			*desc = NULL;
> +			mutex_unlock(&current->active_mm->context.lock);
> +			return -EINVAL;
> +		}
> +
> +		*desc = &current->active_mm->context.ldt->entries[seg];

... and seg here as it is an index into the table.

> +		mutex_unlock(&current->active_mm->context.lock);
> +		return 0;
> +	}
> +#endif
> +	native_store_gdt(&gdt_desc);
> +
> +	/*
> +	 * Bits [15:3] of the segment selector contain the index. Such
> +	 * index needs to be multiplied by 8.

... because <insert reason I typed in above>.
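
To spell out the index vs. offset distinction, here is a small
user-space sketch. The macro and function names are made up for the
example, and the bounds check assumes the table limit is given in
bytes; if a size field counts entries instead, the index is the value
to compare.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEL_RPL_MASK	0x3	/* bits [1:0]: requested privilege level */
#define SEL_TI_MASK	0x4	/* bit 2: 0 = GDT, 1 = LDT */
#define DESC_SIZE	8	/* bytes per descriptor table entry */

/* Bits [15:3]: entry number within the descriptor table. */
static unsigned int sel_index(uint16_t sel)
{
	return sel >> 3;
}

/* Byte offset of the entry, i.e. index * 8, obtained by masking. */
static unsigned int sel_offset(uint16_t sel)
{
	return sel & ~(SEL_RPL_MASK | SEL_TI_MASK);
}

static bool sel_is_ldt(uint16_t sel)
{
	return (sel & SEL_TI_MASK) != 0;
}

/* True if the whole 8-byte entry lies within a table of table_bytes. */
static bool sel_in_table(uint16_t sel, unsigned int table_bytes)
{
	return sel_offset(sel) + DESC_SIZE <= table_bytes;
}
```

For the selector 0x0f this gives index 1, offset 8 and the LDT as the
table, so the index is what goes into entries[] while the offset is
what gets compared against a limit in bytes.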

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

* Re: [v6 PATCH 08/21] x86/insn-eval: Add utility function to get segment descriptor base address
  2017-03-08  0:32 ` [v6 PATCH 08/21] x86/insn-eval: Add utility function to get segment descriptor base address Ricardo Neri
@ 2017-04-20  8:25   ` Borislav Petkov
  2017-04-26 22:37     ` Ricardo Neri
  2017-04-26 22:52     ` Ricardo Neri
  0 siblings, 2 replies; 112+ messages in thread
From: Borislav Petkov @ 2017-04-20  8:25 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Tue, Mar 07, 2017 at 04:32:41PM -0800, Ricardo Neri wrote:
> With segmentation, the base address of the segment descriptor is needed
> to compute a linear address. The segment descriptor used in the address
> computation depends on either any segment override prefixes in the in the

s/in the //

> instruction or the default segment determined by the registers involved
> in the address computation. Thus, both the instruction as well as the
> register (specified as the offset from the base of pt_regs) are given as
> inputs, along with a boolean variable to select between override and
> default.
> 
> The segment selector is determined by get_seg_selector with the inputs

Please end function names with parentheses: get_seg_selector().

> described above. Once the selector is known the base address is

					known, ...

> determined. In protected mode, the selector is used to obtain the segment
> descriptor and then its base address. If in 64-bit user mode, the segment
> base address is zero except when FS or GS are used. In virtual-8086 mode,
> the base address is computed as the value of the segment selector shifted 4
> positions to the left.

Good.
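
As a small illustration of the virtual-8086 case described in the commit message, the sketch below computes a segment base by shifting the selector 4 positions to the left and adds a 16-bit offset to form a linear address. The helper names are hypothetical and this is standalone C, not the code from the patch.

```c
#include <stdint.h>

/* In virtual-8086 (and real) mode the segment base is selector * 16. */
static inline uint32_t v8086_seg_base(uint16_t sel)
{
	return (uint32_t)sel << 4;
}

/* Linear address: segment base plus the 16-bit offset. */
static inline uint32_t v8086_linear(uint16_t sel, uint16_t off)
{
	return v8086_seg_base(sel) + off;
}
```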

> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> Cc: Colin Ian King <colin.king@canonical.com>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Thomas Garnier <thgarnie@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: x86@kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/include/asm/insn-eval.h |  2 ++
>  arch/x86/lib/insn-eval.c         | 66 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 68 insertions(+)
> 
> diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
> index 754211b..b201742 100644
> --- a/arch/x86/include/asm/insn-eval.h
> +++ b/arch/x86/include/asm/insn-eval.h
> @@ -15,5 +15,7 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
>  int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
>  int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
>  int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
> +unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
> +				int regoff, bool use_default_seg);
>  
>  #endif /* _ASM_X86_INSN_EVAL_H */
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index 8608adf..383ca83 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -355,6 +355,72 @@ static int get_desc(unsigned short seg, struct desc_struct **desc)
>  }
>  
>  /**
> + * insn_get_seg_base() - Obtain base address contained in descriptor
> + * @regs:	Set of registers containing the segment selector
> + * @insn:	Instruction structure with selector override prefixes
> + * @regoff:	Operand offset, in pt_regs, of which the selector is needed
> + * @use_default_seg: Use the default segment instead of prefix overrides

I'm wondering whether you really need that bool or you can deduce this
from pt_regs... I guess I'll see...

> + *
> + * Obtain the base address of the segment descriptor as indicated by either
> + * any segment override prefixes contained in insn or the default segment
> + * applicable to the register indicated by regoff. regoff is specified as the
> + * offset in bytes from the base of pt_regs.
> + *
> + * Return: In protected mode, base address of the segment. It may be zero in
> + * certain cases for 64-bit builds and/or 64-bit applications. In virtual-8086
> + * mode, the segment selector shifed 4 positions to the right. -1L in case of

s/shifed/shifted/

> + * error.
> + */
> +unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
> +				int regoff, bool use_default_seg)
> +{
> +	struct desc_struct *desc;
> +	unsigned short seg;
> +	enum segment seg_type;
> +	int ret;
> +
> +	seg_type = resolve_seg_selector(insn, regoff, use_default_seg);

<--- error handling.

And that's not really a "seg_type" but simply the "sel"-ector. And that
"enum segment" is not really a segment but a segment override prefix
enum. Can we please get the nomenclature right first?

> +
> +	seg = get_segment_selector(regs, seg_type);

s/seg/sel/

> +	if (seg < 0)
> +		return -1L;
> +
> +	if (v8086_mode(regs))
> +		/*
> +		 * Base is simply the segment selector shifted 4
> +		 * positions to the right.
> +		 */
> +		return (unsigned long)(seg << 4);
> +
> +#ifdef CONFIG_X86_64
> +	if (user_64bit_mode(regs)) {

	if (IS_ENABLED(CONFIG_X86_64) && user_64bit_mode(regs)) {

> +		/*
> +		 * Only FS or GS will have a base address, the rest of
> +		 * the segments' bases are forced to 0.
> +		 */
> +		unsigned long base;
> +
> +		if (seg_type == SEG_FS)
> +			rdmsrl(MSR_FS_BASE, base);
> +		else if (seg_type == SEG_GS)
> +			/*
> +			 * swapgs was called at the kernel entry point. Thus,
> +			 * MSR_KERNEL_GS_BASE will have the user-space GS base.
> +			 */
> +			rdmsrl(MSR_KERNEL_GS_BASE, base);
> +		else
> +			base = 0;
> +		return base;
> +	}
> +#endif
> +	ret = get_desc(seg, &desc);
> +	if (ret)
> +		return -1L;
> +
> +	return get_desc_base(desc);
> +}
> +
> +/**
>   * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
>   * @insn:	Instruction structure containing the ModRM byte
>   * @regs:	Set of registers indicated by the ModRM byte
> -- 
> 2.9.3
> 

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

* Re: [v6 PATCH 09/21] x86/insn-eval: Add functions to get default operand and address sizes
  2017-03-08  0:32 ` [v6 PATCH 09/21] x86/insn-eval: Add functions to get default operand and address sizes Ricardo Neri
@ 2017-04-20 13:06   ` Borislav Petkov
  2017-04-27  1:07     ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Borislav Petkov @ 2017-04-20 13:06 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Tue, Mar 07, 2017 at 04:32:42PM -0800, Ricardo Neri wrote:
> These functions read the default values of the address and operand sizes
> as specified in the segment descriptor. This information is determined
> from the D and L bits. Hence, it can be used for both IA-32e 64-bit and
> 32-bit legacy modes. For virtual-8086 mode, the default address and
> operand sizes are always 2 bytes.

Yeah, we customarily call that 16-bit :)

> The D bit is only meaningful for code segments. Thus, these functions
> always use the code segment selector contained in regs.
> 
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> Cc: Colin Ian King <colin.king@canonical.com>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Thomas Garnier <thgarnie@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: x86@kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/include/asm/insn-eval.h |  2 +
>  arch/x86/lib/insn-eval.c         | 80 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 82 insertions(+)
> 
> diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
> index b201742..a0d81fc 100644
> --- a/arch/x86/include/asm/insn-eval.h
> +++ b/arch/x86/include/asm/insn-eval.h
> @@ -15,6 +15,8 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
>  int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
>  int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
>  int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
> +unsigned char insn_get_seg_default_address_bytes(struct pt_regs *regs);
> +unsigned char insn_get_seg_default_operand_bytes(struct pt_regs *regs);
>  unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
>  				int regoff, bool use_default_seg);
>  
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index 383ca83..cda6c71 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -421,6 +421,86 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
>  }
>  
>  /**
> + * insn_get_seg_default_address_bytes - Obtain default address size of segment
> + * @regs:	Set of registers containing the segment selector
> + *
> + * Obtain the default address size as indicated in the segment descriptor
> + * selected in regs' code segment selector. In protected mode, the default
> + * address is determined by inspecting the L and D bits of the segment
> + * descriptor. In virtual-8086 mode, the default is always two bytes.
> + *
> + * Return: Default address size of segment

		0 on error.

> + */
> +unsigned char insn_get_seg_default_address_bytes(struct pt_regs *regs)
> +{
> +	struct desc_struct *desc;
> +	unsigned short seg;
> +	int ret;
> +
> +	if (v8086_mode(regs))
> +		return 2;
> +
> +	seg = (unsigned short)regs->cs;
> +
> +	ret = get_desc(seg, &desc);
> +	if (ret)
> +		return 0;
> +
> +	switch ((desc->l << 1) | desc->d) {
> +	case 0: /* Legacy mode. 16-bit addresses. CS.L=0, CS.D=0 */
> +		return 2;
> +	case 1: /* Legacy mode. 32-bit addresses. CS.L=0, CS.D=1 */
> +		return 4;
> +	case 2: /* IA-32e 64-bit mode. 64-bit addresses. CS.L=1, CS.D=0 */
> +		return 8;
> +	case 3: /* Invalid setting. CS.L=1, CS.D=1 */
> +		/* fall through */
> +	default:
> +		return 0;
> +	}
> +}
> +
> +/**
> + * insn_get_seg_default_operand_bytes - Obtain default operand size of segment
> + * @regs:	Set of registers containing the segment selector
> + *
> + * Obtain the default operand size as indicated in the segment descriptor
> + * selected in regs' code segment selector. In protected mode, the default
> + * operand size is determined by inspecting the L and D bits of the segment
> + * descriptor. In virtual-8086 mode, the default is always two bytes.
> + *
> + * Return: Default operand size of segment
> + */
> +unsigned char insn_get_seg_default_operand_bytes(struct pt_regs *regs)

Right, so default address and operand size always go together so I don't
think you need two separate functions.

So what I'd suggest - provided this pans out (I still haven't reviewed
the whole thing) - is to determine the operating mode of the segment:
long, legacy, etc and then return both address and operand sizes. Patch
17/21 needs them both at the same time AFAICT.
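
One way the suggestion above could look is sketched here: a single helper that maps the L and D bits of the code segment descriptor to both default sizes at once. The struct and function names are hypothetical and not taken from the series; the size table itself follows the architectural defaults (in 64-bit mode the default address size is 8 bytes while the default operand size stays at 4 bytes).

```c
/* Default address and operand sizes, in bytes, for a code segment. */
struct seg_sizes {
	unsigned char addr_bytes;
	unsigned char opnd_bytes;
};

/*
 * Hypothetical helper: derive both defaults from CS.L and CS.D.
 * Returns 0 on success, -1 for the invalid L=1, D=1 combination.
 */
static inline int seg_default_sizes(unsigned int l, unsigned int d,
				    struct seg_sizes *out)
{
	switch (((l & 1) << 1) | (d & 1)) {
	case 0:	/* Legacy/compatibility mode, 16-bit segment */
		out->addr_bytes = 2;
		out->opnd_bytes = 2;
		return 0;
	case 1:	/* Legacy/compatibility mode, 32-bit segment */
		out->addr_bytes = 4;
		out->opnd_bytes = 4;
		return 0;
	case 2:	/* 64-bit mode: 64-bit addresses, 32-bit operands */
		out->addr_bytes = 8;
		out->opnd_bytes = 4;
		return 0;
	default: /* L=1, D=1 is not a valid setting */
		return -1;
	}
}
```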

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

* Re: [v6 PATCH 10/21] x86/insn-eval: Do not use R/EBP as base if mod in ModRM is zero
  2017-03-08  0:32 ` [v6 PATCH 10/21] x86/insn-eval: Do not use R/EBP as base if mod in ModRM is zero Ricardo Neri
@ 2017-04-21 10:52   ` Borislav Petkov
  2017-04-27  1:29     ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Borislav Petkov @ 2017-04-21 10:52 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Tue, Mar 07, 2017 at 04:32:43PM -0800, Ricardo Neri wrote:
> Section 2.2.1.3 of the Intel 64 and IA-32 Architectures Software
> Developer's Manual volume 2A states that when the mod part of the ModRM
> byte is zero and R/EBP is specified in the R/M part of such bit, the value
> of the aforementioned register should not be used in the address
> computation. Instead, a 32-bit displacement is expected. The instruction
> decoder takes care of setting the displacement to the expected value.
> Returning -EDOM signals callers that they should ignore the value of such
> register when computing the address encoded in the instruction operands.
> 
> Also, callers should exercise care to correctly interpret this particular
> case. In IA-32e 64-bit mode, the address is given by the displacement plus
> the value of the RIP. In IA-32e compatibility mode, the value of EIP is
> ignored. This correction is done for our insn_get_addr_ref.
> 
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> Cc: Colin Ian King <colin.king@canonical.com>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Thomas Garnier <thgarnie@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: x86@kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/lib/insn-eval.c | 25 +++++++++++++++++++++++--
>  1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index cda6c71..ea10b03 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -250,6 +250,14 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
>  	switch (type) {
>  	case REG_TYPE_RM:
>  		regno = X86_MODRM_RM(insn->modrm.value);
> +		/* if mod=0, register R/EBP is not used in the address
> +		 * computation. Instead, a 32-bit displacement is expected;
> +		 * the instruction decoder takes care of reading such
> +		 * displacement. This is true for both R/EBP and R13, as the
> +		 * REX.B bit is not decoded.
> +		 */

I'd simply write here: "ModRM.mod == 0 and ModRM.rm == 5 means a 32-bit
displacement is following."

In addition, kernel comments style is:

	/*
	 * A sentence ending with a full-stop.
	 * Another sentence. ...
	 * More sentences. ...
	 */

> +		if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0)
> +			return -EDOM;

	if (X86_MODRM_MOD(insn->modrm.value) == 0 &&
	    X86_MODRM_RM(insn->modrm.value)  == 5)

looks more understandable to me.
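
For readers less familiar with the ModRM layout, the check being discussed can be written as a standalone sketch. The helpers below are illustrative stand-ins for the kernel's `X86_MODRM_MOD()` and `X86_MODRM_RM()` macros, not copies of them; the field positions (mod in bits [7:6], rm in bits [2:0]) are architectural.

```c
#include <stdbool.h>
#include <stdint.h>

/* ModRM byte: [7:6] mod, [5:3] reg, [2:0] rm. */
static inline unsigned int modrm_mod(uint8_t modrm)
{
	return modrm >> 6;
}

static inline unsigned int modrm_rm(uint8_t modrm)
{
	return modrm & 0x7;
}

/*
 * ModRM.mod == 0 and ModRM.rm == 5 means no base register is used and
 * a 32-bit displacement follows.
 */
static inline bool modrm_is_disp32_only(uint8_t modrm)
{
	return modrm_mod(modrm) == 0 && modrm_rm(modrm) == 5;
}
```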

>  		if (X86_REX_B(insn->rex_prefix.value))
>  			regno += 8;
>  		break;
> @@ -599,9 +607,22 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
>  			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
>  		} else {
>  			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
> -			if (addr_offset < 0)
> +			/* -EDOM means that we must ignore the address_offset.
> +			 * The only case in which we see this value is when
> +			 * R/M points to R/EBP. In such a case, in 64-bit mode
> +			 * the effective address is relative to tho RIP.

s/tho//

> +			 */

Kernel comments style is:

	/*
	 * A sentence ending with a full-stop.
	 * Another sentence. ...
	 * More sentences. ...
	 */

> +			if (addr_offset == -EDOM) {
> +				eff_addr = 0;
> +#ifdef CONFIG_X86_64
> +				if (user_64bit_mode(regs))
> +					eff_addr = (long)regs->ip;

Is regs->ip the rIP of the *following* insn?
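
The question matters because RIP-relative addressing is defined relative to the address of the next instruction. A minimal sketch of the arithmetic, assuming the saved instruction pointer still points at the start of the instruction being decoded (as is the case for fault-type exceptions), would add the instruction length before applying the displacement. The function name and parameters are hypothetical.

```c
#include <stdint.h>

/*
 * RIP-relative effective address: the 32-bit displacement is
 * sign-extended and added to the address of the *next* instruction,
 * i.e. the address of the current one plus its length.
 */
static inline uint64_t rip_relative_addr(uint64_t insn_ip,
					 unsigned int insn_len,
					 int32_t disp)
{
	return insn_ip + insn_len + (uint64_t)(int64_t)disp;
}
```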

> +#endif

You can do this in a prepatch and then get rid of the ifdeffery here:

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 2b5d686ea9f3..f6239273c5f1 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -115,9 +115,9 @@ static inline int v8086_mode(struct pt_regs *regs)
 #endif
 }
 
-#ifdef CONFIG_X86_64
 static inline bool user_64bit_mode(struct pt_regs *regs)
 {
+#ifdef CONFIG_X86_64
 #ifndef CONFIG_PARAVIRT
 	/*
 	 * On non-paravirt systems, this is the only long mode CPL 3
@@ -128,6 +128,9 @@ static inline bool user_64bit_mode(struct pt_regs *regs)
 	/* Headers are too twisted for this to go in paravirt.h. */
 	return regs->cs == __USER_CS || regs->cs == pv_info.extra_user_64bit_cs;
 #endif
+#else /* !CONFIG_X86_64 */
+	return false;
+#endif
 }
 
 #define current_user_stack_pointer()	current_pt_regs()->sp
---

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

* Re: [v6 PATCH 11/21] insn/eval: Incorporate segment base in address computation
  2017-03-08  0:32 ` [v6 PATCH 11/21] insn/eval: Incorporate segment base in address computation Ricardo Neri
@ 2017-04-21 14:55   ` Borislav Petkov
  2017-04-27  1:31     ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Borislav Petkov @ 2017-04-21 14:55 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Tue, Mar 07, 2017 at 04:32:44PM -0800, Ricardo Neri wrote:
> insn_get_addr_ref returns the effective address as defined by the

Please end function names with parentheses.

> section 3.7.5.1 Vol 1 of the Intel 64 and IA-32 Architectures Software
> Developer's Manual. In order to compute the linear address, we must add
> to the effective address the segment base address as set in the segment
> descriptor. Furthermore, the segment descriptor to use depends on the
> register that is used as the base of the effective address. The effective
> base address varies depending on whether the operand is a register or a
> memory address and on whether a SiB byte is used.
> 
> In most cases, the segment base address will be 0 if the USER_DS/USER32_DS
> segment is used or if segmentation is not used. However, the base address
> is not necessarily zero if a user programs defines its own segments. This
> is possible by using a local descriptor table.
> 
> Since the effective address is a signed quantity, the unsigned segment
> base address saved in a separate variable and added to the final effective

".. is saved..."

> address.
> 

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

* Re: [v6 PATCH 12/21] x86/insn: Support both signed 32-bit and 64-bit effective addresses
  2017-03-08  0:32 ` [v6 PATCH 12/21] x86/insn: Support both signed 32-bit and 64-bit effective addresses Ricardo Neri
@ 2017-04-25 13:51   ` Borislav Petkov
  2017-04-27  3:33     ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Borislav Petkov @ 2017-04-25 13:51 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Tue, Mar 07, 2017 at 04:32:45PM -0800, Ricardo Neri wrote:
> The 32-bit and 64-bit address encodings are identical. This means that we
> can use the same function in both cases. In order to reuse the function for
> 32-bit address encodings, we must sign-extend our 32-bit signed operands to
> 64-bit signed variables (only for 64-bit builds). To decide on whether sign
> extension is needed, we rely on the address size as given by the
> instruction structure.
> 
> Lastly, before computing the linear address, we must truncate our signed
> 64-bit signed effective address if the address size is 32-bit.
> 
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> Cc: Colin Ian King <colin.king@canonical.com>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Thomas Garnier <thgarnie@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> Cc: x86@kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  arch/x86/lib/insn-eval.c | 44 ++++++++++++++++++++++++++++++++------------
>  1 file changed, 32 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index edb360f..a9a1704 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -559,6 +559,15 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs)
>  	return get_reg_offset(insn, regs, REG_TYPE_INDEX);
>  }
>  
> +static inline long __to_signed_long(unsigned long val, int long_bytes)
> +{
> +#ifdef CONFIG_X86_64
> +	return long_bytes == 4 ? (long)((int)((val) & 0xffffffff)) : (long)val;

I don't think this always works as expected:

---
#include <stdio.h>

typedef unsigned int u32;
typedef unsigned long u64;

int main(void)
{
        u64 v = 0x1ffffffff;

        printf("v: %lu, 0x%lx, %ld\n", v, v, (long)((int)((v) & 0xffffffff)));

        return 0;
}
--
...

v: 8589934591, 0x1ffffffff, -1

Now, this should not happen on 32-bit because unsigned long is 32-bit
there but can that happen on 64-bit?
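
The behaviour shown by the test program can be isolated with fixed-width types, which removes the dependency on the width of `long`. This is a sketch with a hypothetical helper name, not the `__to_signed_long()` from the patch. It makes explicit that for a 4-byte address size only the low 32 bits are kept and bit 31 is treated as the sign, so any upper bits in the input are discarded.

```c
#include <stdint.h>

/*
 * Interpret val as a signed quantity of the given address size.
 * For 4-byte addresses the upper 32 bits are dropped on purpose and
 * bit 31 becomes the sign bit. The uint32_t -> int32_t conversion of
 * out-of-range values is implementation-defined in C, but wraps on
 * the usual two's-complement compilers.
 */
static inline int64_t to_signed_addr(uint64_t val, int addr_bytes)
{
	if (addr_bytes == 4)
		return (int64_t)(int32_t)(uint32_t)val;

	return (int64_t)val;
}
```

Whether discarding the upper bits is the wanted result depends on whether a register can legitimately hold a value such as 0x1ffffffff when the instruction uses a 32-bit address size.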

> +#else
> +	return (long)val;
> +#endif
> +}
> +
>  /*
>   * return the address being referenced be instruction
>   * for rm=3 returning the content of the rm reg
> @@ -567,19 +576,21 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs)
>  void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
>  {
>  	unsigned long linear_addr, seg_base_addr;
> -	long eff_addr, base, indx;
> -	int addr_offset, base_offset, indx_offset;
> +	long eff_addr, base, indx, tmp;
> +	int addr_offset, base_offset, indx_offset, addr_bytes;
>  	insn_byte_t sib;
>  
>  	insn_get_modrm(insn);
>  	insn_get_sib(insn);
>  	sib = insn->sib.value;
> +	addr_bytes = insn->addr_bytes;
>  
>  	if (X86_MODRM_MOD(insn->modrm.value) == 3) {
>  		addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
>  		if (addr_offset < 0)
>  			goto out_err;
> -		eff_addr = regs_get_register(regs, addr_offset);
> +		tmp = regs_get_register(regs, addr_offset);
> +		eff_addr = __to_signed_long(tmp, addr_bytes);

This repeats throughout the function so it begs to be a separate:

	get_mem_addr()

or so.

>  		seg_base_addr = insn_get_seg_base(regs, insn, addr_offset,
>  						  false);
>  	} else {
> @@ -591,20 +602,24 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
>  			 * in the address computation.
>  			 */
>  			base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
> -			if (unlikely(base_offset == -EDOM))
> +			if (unlikely(base_offset == -EDOM)) {
>  				base = 0;
> -			else if (unlikely(base_offset < 0))
> +			} else if (unlikely(base_offset < 0)) {
>  				goto out_err;
> -			else
> -				base = regs_get_register(regs, base_offset);
> +			} else {
> +				tmp = regs_get_register(regs, base_offset);
> +				base = __to_signed_long(tmp, addr_bytes);
> +			}
>  
>  			indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
> -			if (unlikely(indx_offset == -EDOM))
> +			if (unlikely(indx_offset == -EDOM)) {
>  				indx = 0;
> -			else if (unlikely(indx_offset < 0))
> +			} else if (unlikely(indx_offset < 0)) {
>  				goto out_err;
> -			else
> -				indx = regs_get_register(regs, indx_offset);
> +			} else {
> +				tmp = regs_get_register(regs, indx_offset);
> +				indx = __to_signed_long(tmp, addr_bytes);
> +			}
>  
>  			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
>  			seg_base_addr = insn_get_seg_base(regs, insn,
> @@ -625,13 +640,18 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
>  			} else if (addr_offset < 0) {
>  				goto out_err;
>  			} else {
> -				eff_addr = regs_get_register(regs, addr_offset);
> +				tmp = regs_get_register(regs, addr_offset);
> +				eff_addr = __to_signed_long(tmp, addr_bytes);
>  			}
>  			seg_base_addr = insn_get_seg_base(regs, insn,
>  							  addr_offset, false);
>  		}
>  		eff_addr += insn->displacement.value;
>  	}
> +	/* truncate to 4 bytes for 32-bit effective addresses */
> +	if (addr_bytes == 4)
> +		eff_addr &= 0xffffffff;

Why again?
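
One plausible reading of the final mask, offered here only as an assumption since the author's answer is not part of this message, is that the sign-extended 64-bit sum has to be wrapped back into the 32-bit address space before it is used. A standalone sketch with a hypothetical helper name:

```c
#include <stdint.h>

/*
 * With a 32-bit address size the effective address wraps modulo 2^32.
 * The operands were sign-extended to 64 bits, so a negative component
 * leaves the upper 32 bits of the sum set unless they are masked off.
 */
static inline uint64_t eff_addr_32(int64_t base, int64_t disp)
{
	return (uint64_t)(base + disp) & 0xffffffffULL;
}
```

For example, a base of 0xfffffff0 (sign-extended to -16) plus a displacement of 0x20 yields 0x10, not a 64-bit value.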

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

* Re: [v6 PATCH 02/21] x86/mpx: Do not use SIB index if index points to R/ESP
  2017-04-11 11:31   ` Borislav Petkov
@ 2017-04-26  1:39     ` Ricardo Neri
  0 siblings, 0 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-04-26  1:39 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Liang Z Li, Masami Hiramatsu,
	Huang Rui, Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin,
	Paul Gortmaker, Vlastimil Babka, Chen Yucong, Alexandre Julliard,
	Stas Sergeev, Fenghua Yu, Ravi V. Shankar, Shuah Khan,
	linux-kernel, x86, linux-msdos, wine-devel, Adam Buchbinder,
	Colin Ian King, Lorenzo Stoakes, Qiaowei Ren, Nathan Howard,
	Adan Hawthorn, Joe Perches

Hi Boris,

I am sorry I missed your feedback earlier. Thanks for commenting!

On Tue, 2017-04-11 at 13:31 +0200, Borislav Petkov wrote:
> On Tue, Mar 07, 2017 at 04:32:35PM -0800, Ricardo Neri wrote:
> > Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
> > Developer's Manual volume 2A states that when memory addressing is used
> > (i.e., mod part of ModR/M is not 3), a SIB byte is used and the index of
> > the SIB byte points to the R/ESP (i.e., index = 4), the index should not be
> > used in the computation of the memory address.
> > 
> > In these cases the address is simply the value present in the register
> > pointed by the base part of the SIB byte plus the displacement byte.
> > 
> > An example of such instruction could be
> > 
> >     insn -0x80(%rsp)
> > 
> > This is represented as:
> > 
> >      [opcode] 4c 23 80
> > 
> >       ModR/M=0x4c: mod: 0x1, reg: 0x1: r/m: 0x4(R/ESP)
> >       SIB=0x23: sc: 0, index: 0x100(R/ESP), base: 0x11(R/EBX):
> >       Displacement -0x80
> > 
> > The correct address is (base) + displacement; no index is used.
> > 
> > We can achieve the desired effect of not using the index by making
> > get_reg_offset return -EDOM in this particular case. This value indicates
> > callers that they should not use the index to calculate the address.
> > EINVAL continues to indicate that an error when decoding the SIB byte.
> > 
> > Care is taken to allow R12 to be used as index, which is a valid scenario.
> > 
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> > Cc: Colin Ian King <colin.king@canonical.com>
> > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Nathan Howard <liverlint@gmail.com>
> > Cc: Adan Hawthorn <adanhawthorn@gmail.com>
> > Cc: Joe Perches <joe@perches.com>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: x86@kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/mm/mpx.c | 19 +++++++++++++++++--
> >  1 file changed, 17 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
> > index ff112e3..d9e92d6 100644
> > --- a/arch/x86/mm/mpx.c
> > +++ b/arch/x86/mm/mpx.c
> > @@ -110,6 +110,13 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
> >  		regno = X86_SIB_INDEX(insn->sib.value);
> >  		if (X86_REX_X(insn->rex_prefix.value))
> >  			regno += 8;
> > +		/*
> > +		 * If mod !=3, register R/ESP (regno=4) is not used as index in
> > +		 * the address computation. Check is done after looking at REX.X
> > +		 * This is because R12 (regno=12) can be used as an index.
> > +		 */
> > +		if (regno == 4 && X86_MODRM_MOD(insn->modrm.value) != 3)
> > +			return -EDOM;
> 
> Hmm, ok, so this is a bit confusing, to me at least. Maybe you're saying
> the same things but here's how I see it:
> 
> 1. When ModRM.mod != 11b and ModRM.rm == 100b, all that does mean
> is that you have a SIB byte following. I.e., you have indexed
> register-indirect addressing.

Yes, callers of this function already know that there is a SIB byte
because they saw ModRM.mod != 11b and ModRM.rm == 100b and struct
insn.sib.nbytes is non-zero.
> 
> Now, you still need to decode the SIB byte and it goes this way:
> 
> SIB.index == 100b means that the index register specification is
> null, i.e., the scale*index portion of that indexed register-indirect
> addressing is null, i.e., you have an offset following the SIB byte.
> Now, depending on ModRM.mod, that offset is:

Yes, for this reason, if ModRM.mod != 11b and an index of 100b is found,
the function returns -EDOM to tell callers not to use the index. We
need to return -EDOM because this function returns an offset from the
base of struct pt_regs for successful cases. A negative value indicates
that the offset must not be used.

Perhaps better wording is what you propose: the scale*index portion of
that indexed register-indirect addressing is null. I will take your
wording!
> 
> ModRM.mod == 01b -> 1 byte offset
> ModRM.mod == 10b -> 4 bytes offset

Callers will know the size of the offset based on struct
insn.displacement.value.

> 
> That's why for an instruction like this one (let's use your example) you
> have:
> 
> 	8b 4c 23 80             mov    -0x80(%rbx,%riz,1),%ecx
> 
> That's basically a binutils hack to state that the SIB index register is
> null.
> 
> Another SIB index register works, of course:
> 
> 	 8b 4c 03 80             mov -0x80(%rbx,%rax,1),%ecx
> 
> Ok, so far so good.
> 
> 2. Now, the %r12 thing is part of the REX implications to those
> encodings: That's the REX.X bit which adds a fourth bit to the encoding
> of the SIB index register, i.e., if you specify a register with
> SIB.index, you want to be able to specify all 16 regs, thus the 4th
> bit. That's why it says that the SIB byte is required for %r12-based
> addressing.
> 
> I.e., you can still have a SIB.index == 100b addressing with an index
> register which is not null but that is only because SIB.index is now
> {REX.X=1b, 100b}, i.e.:
> 
> Prefixes:
>  REX:                   0x43 { 4 [w]: 0 [r]: 0 [x]: 1 [b]: 1 }
> Opcode:                 0x8b
> ModRM:                  0x4c  [mod:1b][.R:0b,reg:1b][.B:1b,r/m:1100b]
>                         register-indirect mode, 1-byte offset in displ. field
> SIB:                    0x63 [.B:1b,base:1011b][.X:1b,idx:1100b][scale: 1]
> 
>  MOV Gv,Ev; MOV reg{16,32,64} reg/mem{16,32,64}
>                0:       43 8b 4c 63 80          mov -0x80(%r11,%r12,2),%ecx

Correct, that is why we check the value of regno (the value of SIB.index)
after adjusting it in case REX.X is set. This way we can still use
%r12.
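
To make the ordering concrete, here is a minimal user-space sketch of the
index decoding discussed above. It is not the kernel code:
sib_index_regno() and its raw-byte parameters are names assumed purely for
illustration. Only the bit layout and the -EDOM convention are taken from
the discussion.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): return the SIB index
 * register number (0-15), or -EDOM when the scale*index portion of the
 * address is null. modrm, sib and rex are the raw instruction bytes;
 * pass rex = 0 when the instruction has no REX prefix.
 */
static int sib_index_regno(uint8_t modrm, uint8_t sib, uint8_t rex)
{
	int mod = (modrm >> 6) & 0x3;
	int regno = (sib >> 3) & 0x7;	/* SIB.index, 3 bits */

	/* REX.X (bit 1 of the REX byte) supplies the fourth index bit. */
	if (rex & 0x2)
		regno += 8;

	/*
	 * Check only after applying REX.X: {REX.X=0, 100b} means "no
	 * index", while {REX.X=1, 100b} selects %r12.
	 */
	if (mod != 3 && regno == 4)
		return -EDOM;

	return regno;
}
```

With the encodings quoted in this message, 8b 4c 23 80 gives -EDOM,
8b 4c 03 80 gives 0 (%rax) and 43 8b 4c 63 80 gives 12 (%r12).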
> 
> So, I'm not saying your version is necessarily wrong - I'm just saying
> that it could explain the situation a bit more verbosely.

Sure. I will be more verbose in my commit message.
> 
> Btw, I'd flip the if-test above:
> 
> 	if (X86_MODRM_MOD(insn->modrm.value) != 3 && regno == 4)
> 
> to make it just like the order the conditions are specified in the
> manuals.

Will do.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 01/21] x86/mpx: Use signed variables to compute effective addresses
  2017-04-11 21:56   ` Borislav Petkov
@ 2017-04-26  1:40     ` Ricardo Neri
  0 siblings, 0 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-04-26  1:40 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Nathan Howard, Adan Hawthorn,
	Joe Perches

On Tue, 2017-04-11 at 23:56 +0200, Borislav Petkov wrote:
> On Tue, Mar 07, 2017 at 04:32:34PM -0800, Ricardo Neri wrote:
> > Even though memory addresses are unsigned. The operands used to compute the
> 
> 				... unsigned, the operands ...

Oops! I will correct.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 03/21] x86/mpx: Do not use R/EBP as base in the SIB byte with Mod = 0
  2017-04-11 22:08   ` Borislav Petkov
@ 2017-04-26  2:04     ` Ricardo Neri
  2017-04-26  8:05       ` Borislav Petkov
  0 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-04-26  2:04 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Nathan Howard, Adan Hawthorn,
	Joe Perches

On Wed, 2017-04-12 at 00:08 +0200, Borislav Petkov wrote:
> On Tue, Mar 07, 2017 at 04:32:36PM -0800, Ricardo Neri wrote:
> > Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
> > Developer's Manual volume 2A states that when a SIB byte is used and the
> > base of the SIB byte points to R/EBP (i.e., base = 5) and the mod part
> > of the ModRM byte is zero, the value of such register will not be used
> > as part of the address computation. To signal this, a -EDOM error is
> > returned to indicate callers that they should ignore the value.
> > 
> > Also, for this particular case, a displacement of 32-bits should follow
> > the SIB byte if the mod part of ModRM is equal to zero. The instruction
> > decoder ensures that this is the case.
> > 
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> > Cc: Colin Ian King <colin.king@canonical.com>
> > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Nathan Howard <liverlint@gmail.com>
> > Cc: Adan Hawthorn <adanhawthorn@gmail.com>
> > Cc: Joe Perches <joe@perches.com>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: x86@kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/mm/mpx.c | 29 ++++++++++++++++++++++-------
> >  1 file changed, 22 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
> > index d9e92d6..ef7eb67 100644
> > --- a/arch/x86/mm/mpx.c
> > +++ b/arch/x86/mm/mpx.c
> > @@ -121,6 +121,17 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
> >  
> >  	case REG_TYPE_BASE:
> >  		regno = X86_SIB_BASE(insn->sib.value);
> > +		/*
> > +		 * If mod is 0 and register R/EBP (regno=5) is indicated in the
> > +		 * base part of the SIB byte,
> 
> you can simply say here: "if SIB.base == 5, the base of the
> register-indirect addressing is 0."

This is better wording. I will change it.
> 
> > the value of such register should
> > +		 * not be used in the address computation. Also, a 32-bit
> 
> Not "Also" but "In this case, a 32-bit displacement..."

Will change.
> 
> > +		 * displacement is expected in this case; the instruction
> > +		 * decoder takes care of it. This is true for both R13 and
> > +		 * R/EBP as REX.B will not be decoded.
> 
> You don't need that sentence as the only thing that matters is ModRM.mod
> being 0.

For the specific case of ModRM.mod being 0, I feel I need to clarify
that REX.B is not decoded and if SIB.base is %r13 the base is also 0.
This comment adds clarity because REX.X is decoded when determining
SIB.index.
> 
> > +		 */
> > +		if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0)
> 
> The 0 test we normally do with the ! (also flip parts of if-condition):
> 
> 		if (!X86_MODRM_MOD(insn->modrm.value) && regno == 5)

Will change it.
> 
> > +			return -EDOM;
> > +
> >  		if (X86_REX_B(insn->rex_prefix.value))
> >  			regno += 8;
> >  		break;
> > @@ -161,16 +172,21 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
> >  		eff_addr = regs_get_register(regs, addr_offset);
> >  	} else {
> >  		if (insn->sib.nbytes) {
> > +			/*
> > +			 * Negative values in the base and index offset means
> > +			 * an error when decoding the SIB byte. Except -EDOM,
> > +			 * which means that the registers should not be used
> > +			 * in the address computation.
> > +			 */
> >  			base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
> > -			if (base_offset < 0)
> > +			if (unlikely(base_offset == -EDOM))
> > +				base = 0;
> > +			else if (unlikely(base_offset < 0))
> 
> Bah, unlikely's in something which is not really a hot path. They only
> encumber readability, no need for them.

I will remove them.
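
For reference, the -EDOM handling without the unlikely() annotations could
look like the user-space sketch below. The function name, the use of array
indices in place of pt_regs offsets and the regs[] layout are assumptions
made for illustration; this is not the code in the patch. Signed arithmetic
is used so that a negative displacement behaves as expected.

```c
#include <errno.h>

/*
 * Hypothetical model: a register "offset" is an index into regs[], or a
 * negative errno. -EDOM means "do not use this register in the address
 * computation"; any other negative value is a decoding error.
 */
static int sib_eff_addr(const long *regs, int base_offset, int indx_offset,
			int scale_bits, long disp, long *eff_addr)
{
	long base, indx;

	if (base_offset == -EDOM)
		base = 0;
	else if (base_offset < 0)
		return -EINVAL;
	else
		base = regs[base_offset];

	if (indx_offset == -EDOM)
		indx = 0;
	else if (indx_offset < 0)
		return -EINVAL;
	else
		indx = regs[indx_offset];

	/* SIB.scale holds log2 of the scale factor (1, 2, 4 or 8). */
	*eff_addr = base + indx * (1L << scale_bits) + disp;
	return 0;
}
```

In this model, a base register holding 0x1000 with a null index and a
displacement of -0x80 resolves to 0xf80.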

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 04/21] x86/mpx, x86/insn: Relocate insn util functions to a new insn-kernel
  2017-04-12 10:03   ` Borislav Petkov
@ 2017-04-26  2:05     ` Ricardo Neri
  0 siblings, 0 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-04-26  2:05 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Wed, 2017-04-12 at 12:03 +0200, Borislav Petkov wrote:
> > +              * If mod is 0 and register R/EBP (regno=5) is indicated in the
> > +              * base part of the SIB byte, the value of such register should
> > +              * not be used in the address computation. Also, a 32-bit
> > +              * displacement is expected in this case; the instruction
> > +              * decoder takes care of it. This is true for both R13 and
> > +              * R/EBP as REX.B will not be decoded.
> > +              */
> > +             if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0)
> > +                     return -EDOM;
> > +
> > +             if (X86_REX_B(insn->rex_prefix.value))
> > +                     regno += 8;
> > +             break;
> > +
> > +     default:
> > +             pr_err("invalid register type");
> > +             BUG();
> 
> WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code
> rather than BUG() or BUG_ON()
> #211: FILE: arch/x86/lib/insn-eval.c:90:
> +               BUG();
> 
> And checkpatch is kinda right. We need to warn here, not explode. Oh and
> that function returns negative values on error...
> 
> Please change that with a patch ontop of the move.

Sure, I will change it.
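
A rough user-space model of that change is sketched below. The function
name reg_type_field() and the fprintf() call are stand-ins assumed for
illustration; in the kernel the warning would be something like
WARN_ONCE(). The point is only that the default case reports the problem
and returns a negative value that callers already know how to handle.

```c
#include <errno.h>
#include <stdio.h>

enum reg_type {
	REG_TYPE_RM = 0,
	REG_TYPE_INDEX,
	REG_TYPE_BASE,
};

/*
 * Hypothetical model: extract the 3-bit register field selected by type.
 * An unknown type produces a warning and -EINVAL instead of a crash, so
 * the caller can recover.
 */
static int reg_type_field(int type, unsigned char modrm, unsigned char sib)
{
	switch (type) {
	case REG_TYPE_RM:
		return modrm & 0x7;
	case REG_TYPE_INDEX:
		return (sib >> 3) & 0x7;
	case REG_TYPE_BASE:
		return sib & 0x7;
	default:
		fprintf(stderr, "invalid register type: %d\n", type);
		return -EINVAL;
	}
}
```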

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 03/21] x86/mpx: Do not use R/EBP as base in the SIB byte with Mod = 0
  2017-04-26  2:04     ` Ricardo Neri
@ 2017-04-26  8:05       ` Borislav Petkov
  2017-04-27 22:49         ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Borislav Petkov @ 2017-04-26  8:05 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Nathan Howard, Adan Hawthorn,
	Joe Perches

On Tue, Apr 25, 2017 at 07:04:20PM -0700, Ricardo Neri wrote:
> For the specific case of ModRM.mod being 0, I feel I need to clarify
> that REX.B is not decoded and if SIB.base is %r13 the base is also 0.

Well, that all doesn't matter. The rule is this:

ModRM.mod == 00b and ModRM.r/m == 101b -> effective address: disp32

See Table 2-2. "32-Bit Addressing Forms with the ModR/M Byte" in the SDM.

So the base register is not used. How that base register is specified
then doesn't matter (undecoded REX bits or not).

> This comment adds clarity because REX.X is decoded when determining
> SIB.index.

Well, that's a different thing. The REX bits participating in the SIB
fields don't matter about this particular case. We only want to say that
we're returning a disp32 without a base register and the comment should
keep it simple without extraneous information.

I know, you want to mention what Table 2-5. "Special Cases of REX
Encodings" says but we should avoid unnecessary content in the comment.
People who want details can stare at the manuals - the comment should
only document what that particular case is.

Btw, you could write it even better:

	if (!X86_MODRM_MOD(insn->modrm.value) && X86_MODRM_RM(insn->modrm.value) == 5)

and then it is basically a 1:1 copy of the rule from Table 2-2.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 05/21] x86/insn-eval: Add utility functions to get register offsets
  2017-04-12 16:28   ` Borislav Petkov
@ 2017-04-26 18:13     ` Ricardo Neri
  2017-04-28 10:40       ` Borislav Petkov
  0 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-04-26 18:13 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Wed, 2017-04-12 at 18:28 +0200, Borislav Petkov wrote:
> On Tue, Mar 07, 2017 at 04:32:38PM -0800, Ricardo Neri wrote:
> > The function insn_get_reg_offset takes as argument an enumeration that
> 
> Please end function names with parentheses.

Will do! 
> 
> And do you mean get_reg_offset(), per chance?

Yes, I meant that. This was a copy/paste error.
> 
> > indicates the type of offset that is returned: the R/M part of the ModRM
> > byte, the index of the SIB byte or the base of the SIB byte.
> 
> Err, you mean, it returns the offset to the register the argument
> specifies.

Yes. I will reword.
> 
> > Callers of
> > this function would need the definition of such enumeration. This is not
> > needed. Instead, helper functions can be defined for this purpose can be
> > added.
> 
> "Instead, add helpers... "

I will reword.
> 
> > These functions are useful in cases when, for instance, the caller
> > needs to decide whether the operand is a register or a memory location by
> > looking at the mod part of the ModRM byte.
> > 
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> > Cc: Colin Ian King <colin.king@canonical.com>
> > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Thomas Garnier <thgarnie@google.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Dmitry Vyukov <dvyukov@google.com>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: x86@kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/include/asm/insn-eval.h |  3 +++
> >  arch/x86/lib/insn-eval.c         | 51 ++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 54 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
> > index 5cab1b1..754211b 100644
> > --- a/arch/x86/include/asm/insn-eval.h
> > +++ b/arch/x86/include/asm/insn-eval.h
> > @@ -12,5 +12,8 @@
> >  #include <asm/ptrace.h>
> >  
> >  void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
> > +int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
> > +int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
> > +int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
> 
> Forgotten to edit the copy-paste?
> 
> Which means, nothing really needs insn_get_reg_offset_sib_index() and
> you can get rid of it?

Yes, I can get rid of it.
> 
> >  #endif /* _ASM_X86_INSN_EVAL_H */
> > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> > index 23cf010..78df1c9 100644
> > --- a/arch/x86/lib/insn-eval.c
> > +++ b/arch/x86/lib/insn-eval.c
> > @@ -98,6 +98,57 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
> >  	return regoff[regno];
> >  }
> >  
> > +/**
> > + * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
> > + * @insn:	Instruction structure containing the ModRM byte
> > + * @regs:	Set of registers indicated by the ModRM byte
> 
> That's simply struct pt_regs - not a set of registers indicated by
> ModRM?!?

I will reword it to say "A struct pt_regs containing register values
indicated by the ModRM byte".
> 
> > + * Obtain the register indicated by the r/m part of the ModRM byte. The
> > + * register is obtained as an offset from the base of pt_regs. In specific
> > + * cases, the returned value can be -EDOM to indicate that the particular value
> > + * of ModRM does not refer to a register.
> 
> Put that sentence under the "Return: " paragraph below so that it is
> immediately obvious what the retvals are.

Will do.
> 
> > + *
> > + * Return: Register indicated by r/m, as an offset within struct pt_regs
> > + */
> > +int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs)
> 
> That name is too long: insn_get_modrm_rm_off() should be enough.
> 
> > +{
> > +	return get_reg_offset(insn, regs, REG_TYPE_RM);
> > +}
> > +
> > +/**
> > + * insn_get_reg_offset_sib_base - Obtain register in base part of SiB byte
> > + * @insn:	Instruction structure containing the SiB byte
> > + * @regs:	Set of registers indicated by the SiB byte
> > + *
> > + * Obtain the register indicated by the base part of the SiB byte. The
> > + * register is obtained as an offset from the base of pt_regs. In specific
> > + * cases, the returned value can be -EDOM to indicate that the particular value
> > + * of SiB does not refer to a register.
> > + *
> > + * Return: Register indicated by SiB's base, as an offset within struct pt_regs

> 
> Let's stick to a single spelling: SIB, all caps.

Will make the spelling consistent.
> 
> > + */
> > +int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs)
> 
> insn_get_sib_base_off()
> 
> Ditto for the rest of the comments on insn_get_reg_offset_modrm_rm() above.
> 
> > +{
> > +	return get_reg_offset(insn, regs, REG_TYPE_BASE);
> > +}
> > +
> > +/**
> > + * insn_get_reg_offset_sib_index - Obtain register in index part of SiB byte
> > + * @insn:	Instruction structure containing the SiB byte
> > + * @regs:	Set of registers indicated by the SiB byte
> > + *
> > + * Obtain the register indicated by the index part of the SiB byte. The
> > + * register is obtained as an offset from the index of pt_regs. In specific
> > + * cases, the returned value can be -EDOM to indicate that the particular value
> > + * of SiB does not refer to a register.
> > + *
> > + * Return: Register indicated by SiB's base, as an offset within struct pt_regs
> > + */
> > +int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs)
> 
> insn_get_sib_idx_off()
> 
> And again, if this function is unused, don't add it.

Masami Hiramatsu had originally requested to add the two functions. I
suppose the unneeded functions could be added if/when needed.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 06/21] x86/insn-eval: Add utility functions to get segment selector
  2017-04-18  9:42   ` Borislav Petkov
@ 2017-04-26 20:44     ` Ricardo Neri
  2017-04-26 20:47       ` Ricardo Neri
  2017-04-30 17:15       ` Borislav Petkov
  0 siblings, 2 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-04-26 20:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Tue, 2017-04-18 at 11:42 +0200, Borislav Petkov wrote:
> On Tue, Mar 07, 2017 at 04:32:39PM -0800, Ricardo Neri wrote:
> > When computing a linear address and segmentation is used, we need to know
> > the base address of the segment involved in the computation. In most of
> > the cases, the segment base address will be zero as in USER_DS/USER32_DS.
> > However, it may be possible that a user space program defines its own
> > segments via a local descriptor table. In such a case, the segment base
> > address may not be zero. Thus, the segment base address is needed to
> > calculate correctly the linear address.
> > 
> > The segment selector to be used when computing a linear address is
> > determined by either any of segment select override prefixes in the
> > instruction or inferred from the registers involved in the computation of
> > the effective address; in that order. Also, there are cases when the
> > overrides shall be ignored.
> > 
> > For clarity, this process can be split into two steps: resolving the
> > relevant segment and, once known, read the applicable segment selector.
> > The method to obtain the segment selector depends on several factors. In
> > 32-bit builds, segment selectors are saved into the pt_regs structure
> > when switching to kernel mode. The same is also true for virtual-8086
> > mode. In 64-bit builds, segmentation is mostly ignored, except when
> > running a program in 32-bit legacy mode. In this case, CS and SS can be
> > obtained from pt_regs. DS, ES, FS and GS can be read directly from
> > registers.
> 
> > Lastly, segmentation is possible in 64-bit mode via FS and GS.
> 
> I'd say "Lastly, the only two segment registers which are not ignored in
> long mode are FS and GS."

I will make this clarification.
> 
> > In these two cases, base addresses are obtained from the relevant MSRs.
> 
> s/relevant/respective/

Will clarify.
> 
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> > Cc: Colin Ian King <colin.king@canonical.com>
> > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Thomas Garnier <thgarnie@google.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Dmitry Vyukov <dvyukov@google.com>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: x86@kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/lib/insn-eval.c | 195 +++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 195 insertions(+)
> > 
> > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> > index 78df1c9..8d45df8 100644
> > --- a/arch/x86/lib/insn-eval.c
> > +++ b/arch/x86/lib/insn-eval.c
> > @@ -8,6 +8,7 @@
> >  #include <asm/inat.h>
> >  #include <asm/insn.h>
> >  #include <asm/insn-eval.h>
> > +#include <asm/vm86.h>
> >  
> >  enum reg_type {
> >  	REG_TYPE_RM = 0,
> > @@ -15,6 +16,200 @@ enum reg_type {
> >  	REG_TYPE_BASE,
> >  };
> >  
> > +enum segment {
> > +	SEG_CS = 0x2e,
> > +	SEG_SS = 0x36,
> > +	SEG_DS = 0x3e,
> > +	SEG_ES = 0x26,
> > +	SEG_FS = 0x64,
> > +	SEG_GS = 0x65
> > +};
> > +
> > +/**
> > + * resolve_seg_selector() - obtain segment selector
> > + * @regs:	Set of registers containing the segment selector
> 
> That arg is gone.

This came from one of my initial implementations. I will remove it.
> 
> > + * @insn:	Instruction structure with selector override prefixes
> > + * @regoff:	Operand offset, in pt_regs, of which the selector is needed
> > + * @default:	Resolve default segment selector (i.e., ignore overrides)
> > + *
> > + * The segment selector to which an effective address refers depends on
> > + * a) segment selector overrides instruction prefixes or b) the operand
> > + * register indicated in the ModRM or SiB byte.
> > + *
> > + * For case a), the function inspects any prefixes in the insn instruction;
> 
> s/insn //

In this case I meant "any prefixes in the insn structure". That wording
is probably clearer.
> 
> > + * insn can be null to indicate that selector override prefixes shall be
> > + * ignored.
> 
> This is not what the code does: it returns -EINVAL when insn is NULL.

This was the behavior in a previous implementation. I will update it.
> 
> > This is useful when the use of prefixes is forbidden (e.g.,
> > + * obtaining the code selector). For case b), the operand register shall be
> > + * represented as the offset from the base address of pt_regs. Also, regoff
> > + * can be -EINVAL for cases in which registers are not used as operands (e.g.,
> > + * when the mod and r/m parts of the ModRM byte are 0 and 5, respectively).
> > + *
> > + * This function returns the segment selector to utilize as per the conditions
> > + * described above. Please note that this function does not return the value
> > + * of the segment selector. The value of the segment selector needs to be
> > + * obtained using get_segment_selector and passing the segment selector type
> > + * resolved by this function.
> > + *
> > + * Return: Segment selector to use, among CS, SS, DS, ES, FS or GS.
> 
> 	    : negative value when...

I will document this behavior.
> 
> > + */
> > +static int resolve_seg_selector(struct insn *insn, int regoff, bool get_default)
> > +{
> > +	int i;
> > +
> > +	if (!insn)
> > +		return -EINVAL;
> > +
> > +	if (get_default)
> > +		goto default_seg;
> > +	/*
> > +	 * Check first if we have selector overrides. Having more than
> > +	 * one selector override leads to undefined behavior. We
> > +	 * only use the first one and return
> 
> Well, I'd return -EINVAL to catch that undefined behavior. Note in a
> local var that I've already seen a seg reg and then if I see another
> one, return -EINVAL.

Sure. Will do.
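
One way the "remember the first override, reject a second one" idea could
look is sketched below in user-space C. The name find_seg_override() and
the enum are hypothetical; the prefix values are the architectural segment
override bytes. Skipping non-segment prefixes instead of failing on them is
an assumption of this sketch, not something settled in this thread.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural segment override prefix bytes. */
enum seg_prefix {
	PFX_ES = 0x26,
	PFX_CS = 0x2e,
	PFX_SS = 0x36,
	PFX_DS = 0x3e,
	PFX_FS = 0x64,
	PFX_GS = 0x65,
};

/*
 * Hypothetical helper: return the single segment override prefix found
 * in bytes[], 0 if there is none, or -EINVAL if more than one is present.
 * Other prefixes (e.g., 0x66 or 0xf3) are skipped.
 */
static int find_seg_override(const uint8_t *bytes, size_t nbytes)
{
	int seen = 0;
	size_t i;

	for (i = 0; i < nbytes; i++) {
		switch (bytes[i]) {
		case PFX_ES:
		case PFX_CS:
		case PFX_SS:
		case PFX_DS:
		case PFX_FS:
		case PFX_GS:
			/* A second override is undefined behavior: reject. */
			if (seen)
				return -EINVAL;
			seen = bytes[i];
			break;
		default:
			break;
		}
	}

	return seen;
}
```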
> 
> > +	 */
> > +	for (i = 0; i < insn->prefixes.nbytes; i++) {
> > +		switch (insn->prefixes.bytes[i]) {
> > +		case SEG_CS:
> > +			return SEG_CS;
> > +		case SEG_SS:
> > +			return SEG_SS;
> > +		case SEG_DS:
> > +			return SEG_DS;
> > +		case SEG_ES:
> > +			return SEG_ES;
> > +		case SEG_FS:
> > +			return SEG_FS;
> > +		case SEG_GS:
> > +			return SEG_GS;
> 
> So what happens if you're in 64-bit mode and you have CS, DS, ES, or SS?
> Or is this what @get_default is supposed to do? But it doesn't look like
> it, it still returns segments ignored in 64-bit mode.

I see the role of this function as obtaining the segment selector,
either from the prefixes or inferred from the operands. It is the
caller's role to determine whether the segment selector should be
ignored. So far the only caller is insn_get_seg_base() [1]. In long
mode, the segment base address is regarded as 0 unless the segment
selector is FS or GS.
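
The caller-side policy described here can be modelled with the small
user-space sketch below. It is not insn_get_seg_base(): the function name,
the enum and the idea of passing the FS/GS bases and the descriptor base as
plain arguments are all assumptions made for illustration.

```c
#include <stdbool.h>

enum seg_reg { SR_CS, SR_SS, SR_DS, SR_ES, SR_FS, SR_GS };

/*
 * Hypothetical model: in long mode every segment base is treated as 0
 * except FS and GS, whose bases are supplied by the caller (in the
 * kernel they would be read from the respective MSRs). Outside long
 * mode the base comes from the segment descriptor, modelled here as
 * desc_base.
 */
static unsigned long model_seg_base(bool long_mode, enum seg_reg seg,
				    unsigned long fs_base,
				    unsigned long gs_base,
				    unsigned long desc_base)
{
	if (!long_mode)
		return desc_base;

	switch (seg) {
	case SR_FS:
		return fs_base;
	case SR_GS:
		return gs_base;
	default:
		/* CS, SS, DS and ES are ignored in long mode. */
		return 0;
	}
}
```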
> 
> > +		default:
> > +			return -EINVAL;
> > +		}
> > +	}
> > +
> > +default_seg:
> > +	/*
> > +	 * If no overrides, use default selectors as described in the
> > +	 * Intel documentation: SS for ESP or EBP. DS for all data references,
> > +	 * except when relative to stack or string destination.
> > +	 * Also, AX, CX and DX are not valid register operands in 16-bit
> > +	 * address encodings.
> > +	 * Callers must interpret the result correctly according to the type
> > +	 * of instructions (e.g., use ES for string instructions).
> > +	 * Also, some values of modrm and sib might seem to indicate the use
> > +	 * of EBP and ESP (e.g., modrm_mod = 0, modrm_rm = 5) but actually
> > +	 * they refer to cases in which only a displacement is used. These cases
> > +	 * should be identified by the caller and not with this function.
> > +	 */
> > +	switch (regoff) {
> > +	case offsetof(struct pt_regs, ax):
> > +		/* fall through */
> > +	case offsetof(struct pt_regs, cx):
> > +		/* fall through */
> > +	case offsetof(struct pt_regs, dx):
> > +		if (insn && insn->addr_bytes == 2)
> > +			return -EINVAL;
> > +	case -EDOM: /* no register involved in address computation */
> > +	case offsetof(struct pt_regs, bx):
> > +		/* fall through */
> > +	case offsetof(struct pt_regs, di):
> > +		/* fall through */
> 
> 		return SEG_ES;
> 
> ?

I double-checked the latest version of the Intel Software Developer's
Manual [2]. Table 3-5 in section 3.7.4 says that DS is the default
segment for all data references, except string destinations. I tested
this code with the UMIP-protected instructions, and whenever I use
%edi the default segment is %ds.
> 
> It is even in the comment above.

This function does not decode instructions; it only resolves segment
selectors. This is the reason I added a comment telling callers to
interpret the segment carefully when handling string instructions.
Perhaps I can move the comment to the function documentation. Given that
string instructions seem to be the only exception, the function could
take a boolean parameter indicating whether the segment is to be
obtained for a destination string operand. How does this sound?
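
One possible shape for that boolean-parameter idea is sketched below as
stand-alone C. Everything in it is hypothetical: the enums, the name
default_seg() and the use of small register identifiers in place of pt_regs
offsets. It also leaves out the 16-bit addressing restrictions on AX, CX
and DX, and the override-prefix handling.

```c
#include <errno.h>
#include <stdbool.h>

enum seg { SEG_CS, SEG_SS, SEG_DS, SEG_ES, SEG_FS, SEG_GS };

enum reg {
	REG_AX, REG_BX, REG_CX, REG_DX,
	REG_SI, REG_DI, REG_BP, REG_SP, REG_IP,
};

/*
 * Hypothetical helper: default segment for an address formed from reg.
 * string_dest is true only when resolving the destination operand of a
 * string instruction, which always uses ES:(E)DI.
 */
static int default_seg(int reg, bool string_dest)
{
	switch (reg) {
	case REG_DI:
		return string_dest ? SEG_ES : SEG_DS;
	case REG_AX:
	case REG_BX:
	case REG_CX:
	case REG_DX:
	case REG_SI:
		return SEG_DS;
	case REG_BP:
	case REG_SP:
		return SEG_SS;
	case REG_IP:
		return SEG_CS;
	default:
		return -EINVAL;
	}
}
```

With this shape, a plain data reference through %edi resolves to DS, and
only a caller that knows it is handling a string destination gets ES.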

> I'm looking at MOVS %es:%rdi, %ds:%rsi,
> for example.

Is this example valid? The documentation of MOVS specifies that it
always moves DS:(E)SI to ES:(E)DI.
> 
> > +	case offsetof(struct pt_regs, si):
> > +		return SEG_DS;
> > +	case offsetof(struct pt_regs, bp):
> > +		/* fall through */
> > +	case offsetof(struct pt_regs, sp):
> > +		return SEG_SS;
> > +	case offsetof(struct pt_regs, ip):
> > +		return SEG_CS;
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +}
> > +
> > +/**
> > + * get_segment_selector() - obtain segment selector
> > + * @regs:	Set of registers containing the segment selector
> > + * @seg_type:	Type of segment selector to obtain
> > + * @regoff:	Operand offset, in pt_regs, of which the selector is needed
> 
> That's gone.

I will remove it.
> 
> > + *
> > + * Obtain the segment selector for any of CS, SS, DS, ES, FS, GS. In
> > + * CONFIG_X86_32, the segment is obtained from either pt_regs or
> > + * kernel_vm86_regs as applicable. In CONFIG_X86_64, CS and SS are obtained
> > + * from pt_regs. DS, ES, FS and GS are obtained by reading the ds and es, fs
> > + * and gs, respectively.
> 
> ... and DS and ES are ignored in long mode.

I will clarify that callers need to ignore DS and ES if in long mode.
> 
> > + *
> > + * Return: Value of the segment selector
> 
> 	... or negative...

I will complete the documentation for this specific case.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 06/21] x86/insn-eval: Add utility functions to get segment selector
  2017-04-26 20:44     ` Ricardo Neri
@ 2017-04-26 20:47       ` Ricardo Neri
  2017-04-30 17:15       ` Borislav Petkov
  1 sibling, 0 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-04-26 20:47 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Wed, 2017-04-26 at 13:44 -0700, Ricardo Neri wrote:
> > 
> > > +    */
> > > +   for (i = 0; i < insn->prefixes.nbytes; i++) {
> > > +           switch (insn->prefixes.bytes[i]) {
> > > +           case SEG_CS:
> > > +                   return SEG_CS;
> > > +           case SEG_SS:
> > > +                   return SEG_SS;
> > > +           case SEG_DS:
> > > +                   return SEG_DS;
> > > +           case SEG_ES:
> > > +                   return SEG_ES;
> > > +           case SEG_FS:
> > > +                   return SEG_FS;
> > > +           case SEG_GS:
> > > +                   return SEG_GS;
> > 
> > So what happens if you're in 64-bit mode and you have CS, DS, ES, or SS?
> > Or is this what @get_default is supposed to do? But it doesn't look like
> > it, it still returns segments ignored in 64-bit mode.
> 
> I regard the role of this function as obtaining the segment selector,
> either from the prefixes or inferred from the operands. It is the role
> of the caller to determine whether the segment selector should be
> ignored. So far the only caller is insn_get_seg_base() [1]. If in long
> mode, the segment base address is regarded as 0 unless the segment
> selector is FS or GS.
> > 
> > > +           default:
> > > +                   return -EINVAL;
> > > +           }
> > > +   }
> > > +
> > > +default_seg:
> > > +   /*
> > > +    * If no overrides, use default selectors as described in the
> > > +    * Intel documentation: SS for ESP or EBP. DS for all data references,
> > > +    * except when relative to stack or string destination.
> > > +    * Also, AX, CX and DX are not valid register operands in 16-bit
> > > +    * address encodings.
> > > +    * Callers must interpret the result correctly according to the type
> > > +    * of instructions (e.g., use ES for string instructions).
> > > +    * Also, some values of modrm and sib might seem to indicate the use
> > > +    * of EBP and ESP (e.g., modrm_mod = 0, modrm_rm = 5) but actually
> > > +    * they refer to cases in which only a displacement is used. These
> > > +    * cases should be identified by the caller and not with this function.
> > > +    */
> > > +   switch (regoff) {
> > > +   case offsetof(struct pt_regs, ax):
> > > +           /* fall through */
> > > +   case offsetof(struct pt_regs, cx):
> > > +           /* fall through */
> > > +   case offsetof(struct pt_regs, dx):
> > > +           if (insn && insn->addr_bytes == 2)
> > > +                   return -EINVAL;
> > > +   case -EDOM: /* no register involved in address computation */
> > > +   case offsetof(struct pt_regs, bx):
> > > +           /* fall through */
> > > +   case offsetof(struct pt_regs, di):
> > > +           /* fall through */
> > 
> >               return SEG_ES;
> > 
> > ?
> 
> I double-checked the latest version of the Intel Software Development
> manual [2]; table 3-5 in section 3.7.4 mentions that DS is the default
> segment for all data references, except string destinations. I tested
> this code with the UMIP-protected instructions and whenever I use %edi
> the default segment is %ds.


I forgot my references:

[1]. https://lkml.org/lkml/2017/3/7/876
[2]. https://software.intel.com/en-us/articles/intel-sdm#combined
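
For reference, the default-segment rule cited from table 3-5 can be
sketched as below. This is only an illustration in user-space C and not
the code of the patch: the enum names and default_segment() are
hypothetical, and the string_dest flag is an assumed way for a caller to
say that the operand is the destination of a string instruction.

```c
#include <stdbool.h>

/* Hypothetical names, used only for this illustration. */
enum gp_reg { GPR_AX, GPR_CX, GPR_DX, GPR_BX, GPR_SP, GPR_BP, GPR_SI, GPR_DI, GPR_NONE };
enum seg_name { SEGN_CS, SEGN_SS, SEGN_DS, SEGN_ES };

/*
 * Default segment for a memory operand when no override prefix is present:
 *  - ES for the destination of a string instruction, addressed via (E)DI;
 *  - SS when the base register is (E)SP or (E)BP;
 *  - DS for every other data reference, including (E)DI used as a plain base.
 */
static enum seg_name default_segment(enum gp_reg base, bool string_dest)
{
	if (string_dest && base == GPR_DI)
		return SEGN_ES;
	if (base == GPR_SP || base == GPR_BP)
		return SEGN_SS;
	return SEGN_DS;
}
```

With this shape, %edi used as a base by a non-string instruction yields
DS, which matches the behaviour reported in the test above.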

* Re: [v6 PATCH 07/21] x86/insn-eval: Add utility function to get segment descriptor
  2017-04-19 10:26   ` Borislav Petkov
@ 2017-04-26 21:51     ` Ricardo Neri
  2017-05-04 11:02       ` Borislav Petkov
  0 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-04-26 21:51 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Wed, 2017-04-19 at 12:26 +0200, Borislav Petkov wrote:
> On Tue, Mar 07, 2017 at 04:32:40PM -0800, Ricardo Neri wrote:
> > The segment descriptor contains information that is relevant to how linear
> > address need to be computed. It contains the default size of addresses as
> > well as the base address of the segment. Thus, given a segment selector,
> > we ought look at segment descriptor to correctly calculate the linear
> > address.
> > 
> > In protected mode, the segment selector might indicate a segment
> > descriptor from either the global descriptor table or a local descriptor
> > table. Both cases are considered in this function.
> > 
> > This function is the initial implementation for subsequent functions that
> > will obtain the aforementioned attributes of the segment descriptor.
> > 
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> > Cc: Colin Ian King <colin.king@canonical.com>
> > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Thomas Garnier <thgarnie@google.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Dmitry Vyukov <dvyukov@google.com>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: x86@kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/lib/insn-eval.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 61 insertions(+)
> > 
> > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> > index 8d45df8..8608adf 100644
> > --- a/arch/x86/lib/insn-eval.c
> > +++ b/arch/x86/lib/insn-eval.c
> > @@ -5,9 +5,13 @@
> >   */
> >  #include <linux/kernel.h>
> >  #include <linux/string.h>
> > +#include <asm/desc_defs.h>
> > +#include <asm/desc.h>
> >  #include <asm/inat.h>
> >  #include <asm/insn.h>
> >  #include <asm/insn-eval.h>
> > +#include <asm/ldt.h>
> > +#include <linux/mmu_context.h>
> >  #include <asm/vm86.h>
> >  
> >  enum reg_type {
> > @@ -294,6 +298,63 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
> >  }
> >  
> >  /**
> > + * get_desc() - Obtain address of segment descriptor
> > + * @seg:	Segment selector
> 
> Maybe that should be
> 
> @sel
> 
> if it is a sel-ector. :)

It makes sense. I will rename it.
> 
> And using "sel" makes more sense then when you look at:
> 
> 	desc_base = sel & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK);
> 
> for example:
> 
> > + * @desc:	Pointer to the selected segment descriptor
> > + *
> > + * Given a segment selector, obtain a memory pointer to the segment
> 
> s/memory //

Will update it.
> 
> > + * descriptor. Both global and local descriptor tables are supported.
> > + * desc will contain the address of the descriptor.
> > + *
> > + * Return: 0 if success, -EINVAL if failure
> 
> Why isn't this function returning the pointer or NULL on error? Maybe
> the later patches have an answer and I'll discover it if I continue
> reviewing :)

After revisiting the code, I don't see why the function cannot return
NULL.
> 
> > + */
> > +static int get_desc(unsigned short seg, struct desc_struct **desc)
> > +{
> > +	struct desc_ptr gdt_desc = {0, 0};
> > +	unsigned long desc_base;
> > +
> > +	if (!desc)
> > +		return -EINVAL;
> > +
> > +	desc_base = seg & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK);
> 
> That looks useless as you're doing it below again.

Yes, it is useless. Please see my comment below.
> 
> > +
> > +#ifdef CONFIG_MODIFY_LDT_SYSCALL
> > +	if ((seg & SEGMENT_TI_MASK) == SEGMENT_LDT) {
> > +		seg >>= 3;
> > +
> > +		mutex_lock(&current->active_mm->context.lock);
> > +		if (unlikely(!current->active_mm->context.ldt ||
> 
> Is that really a fast path to complicate the if-test with an unlikely()?
> If not, you don't really need it.

I will remove it.
> 
> > +			     seg >= current->active_mm->context.ldt->size)) {
> 
> ldt->size is the size of the descriptor table but you've shifted seg by
> 3. That selector index is shifted by 3 (to the left) to form an offset
> into the descriptor table because the entries there are 8 bytes.

I double-checked the ldt code and it seems to me that size refers to the
number of entries in the table; it is always multiplied by
LDT_ENTRY_SIZE [1], [2]. Am I missing something?

> 
> So I *think* you wanna use the "useless" desc_base above... :)
> 
> > +			*desc = NULL;
> > +			mutex_unlock(&current->active_mm->context.lock);
> > +			return -EINVAL;
> > +		}
> > +
> > +		*desc = &current->active_mm->context.ldt->entries[seg];
> 
> ... and seg here as it is an index into the table.
> 
> > +		mutex_unlock(&current->active_mm->context.lock);
> > +		return 0;
> > +	}
> > +#endif
> > +	native_store_gdt(&gdt_desc);
> > +
> > +	/*
> > +	 * Bits [15:3] of the segment selector contain the index. Such
> > +	 * index needs to be multiplied by 8.
> 
> ... because <insert reason I typed in above>.

I will elaborate on the reason for this.

Thanks and BR,
Ricardo

[1].
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/ldt.c?id=refs/tags/v4.11-rc8#n260
[2].
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/ldt.c?id=refs/tags/v4.11-rc8#n50
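
The selector arithmetic discussed in this message can be sketched as
follows. This is a stand-alone illustration, not kernel code: the SEL_*
macros and helper names are hypothetical, and the bounds check assumes,
as argued above, that the table size is expressed as a number of entries
rather than as a number of bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the bit layout is that of an x86 segment selector. */
#define SEL_RPL_MASK	0x3u	/* bits [1:0]: requested privilege level */
#define SEL_TI_MASK	0x4u	/* bit 2: table indicator, 0 = GDT, 1 = LDT */

/* Index of the descriptor within its table: bits [15:3] of the selector. */
static inline unsigned int sel_index(uint16_t sel)
{
	return sel >> 3;
}

/*
 * Byte offset of the descriptor within its table. Descriptors are 8 bytes
 * long, so the offset is index * 8, which equals the selector with the RPL
 * and TI bits cleared.
 */
static inline unsigned int sel_byte_offset(uint16_t sel)
{
	return sel & ~(SEL_RPL_MASK | SEL_TI_MASK);
}

/* True if the selector refers to the local descriptor table. */
static inline bool sel_uses_ldt(uint16_t sel)
{
	return (sel & SEL_TI_MASK) != 0;
}

/* Bounds check against a table whose size is given in entries. */
static inline bool sel_in_table(uint16_t sel, unsigned int nr_entries)
{
	return sel_index(sel) < nr_entries;
}
```

The index is the right value for array-style access such as entries[i],
while the byte offset is the right value when adding to a table base
address.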

* Re: [v6 PATCH 08/21] x86/insn-eval: Add utility function to get segment descriptor base address
  2017-04-20  8:25   ` Borislav Petkov
@ 2017-04-26 22:37     ` Ricardo Neri
  2017-05-05 17:19       ` Borislav Petkov
  2017-04-26 22:52     ` Ricardo Neri
  1 sibling, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-04-26 22:37 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Thu, 2017-04-20 at 10:25 +0200, Borislav Petkov wrote:
> On Tue, Mar 07, 2017 at 04:32:41PM -0800, Ricardo Neri wrote:
> > With segmentation, the base address of the segment descriptor is needed
> > to compute a linear address. The segment descriptor used in the address
> > computation depends on either any segment override prefixes in the in the
> 
> s/in the //

I will fix this typo.
> 
> > instruction or the default segment determined by the registers involved
> > in the address computation. Thus, both the instruction as well as the
> > register (specified as the offset from the base of pt_regs) are given as
> > inputs, along with a boolean variable to select between override and
> > default.
> > 
> > The segment selector is determined by get_seg_selector with the inputs
> 
> Please end function names with parentheses: get_seg_selector().

I will use parentheses.
> 
> > described above. Once the selector is known the base address is
> 
> 					known, ...

Will fix.
> 
> > determined. In protected mode, the selector is used to obtain the segment
> > descriptor and then its base address. If in 64-bit user mode, the segment
> > base address is zero except when FS or GS are used. In virtual-8086 mode,
> > the base address is computed as the value of the segment selector shifted 4
> > positions to the left.
> 
> Good.
> 
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> > Cc: Colin Ian King <colin.king@canonical.com>
> > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Thomas Garnier <thgarnie@google.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Dmitry Vyukov <dvyukov@google.com>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: x86@kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/include/asm/insn-eval.h |  2 ++
> >  arch/x86/lib/insn-eval.c         | 66 ++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 68 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
> > index 754211b..b201742 100644
> > --- a/arch/x86/include/asm/insn-eval.h
> > +++ b/arch/x86/include/asm/insn-eval.h
> > @@ -15,5 +15,7 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
> >  int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
> >  int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
> >  int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
> > +unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
> > +				int regoff, bool use_default_seg);
> >  
> >  #endif /* _ASM_X86_INSN_EVAL_H */
> > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> > index 8608adf..383ca83 100644
> > --- a/arch/x86/lib/insn-eval.c
> > +++ b/arch/x86/lib/insn-eval.c
> > @@ -355,6 +355,72 @@ static int get_desc(unsigned short seg, struct desc_struct **desc)
> >  }
> >  
> >  /**
> > + * insn_get_seg_base() - Obtain base address contained in descriptor
> > + * @regs:	Set of registers containing the segment selector
> > + * @insn:	Instruction structure with selector override prefixes
> > + * @regoff:	Operand offset, in pt_regs, of which the selector is needed
> > + * @use_default_seg: Use the default segment instead of prefix overrides
> 
> I'm wondering whether you really need that bool or you can deduce this
> from pt_regs... I guess I'll see...
> 
> > + *
> > + * Obtain the base address of the segment descriptor as indicated by either
> > + * any segment override prefixes contained in insn or the default segment
> > + * applicable to the register indicated by regoff. regoff is specified as the
> > + * offset in bytes from the base of pt_regs.
> > + *
> > + * Return: In protected mode, base address of the segment. It may be zero in
> > + * certain cases for 64-bit builds and/or 64-bit applications. In virtual-8086
> > + * mode, the segment selector shifed 4 positions to the right. -1L in case of
> 
> s/shifed/shifted/

I will correct the typo.
> 
> > + * error.
> > + */
> > +unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
> > +				int regoff, bool use_default_seg)
> > +{
> > +	struct desc_struct *desc;
> > +	unsigned short seg;
> > +	enum segment seg_type;
> > +	int ret;
> > +
> > +	seg_type = resolve_seg_selector(insn, regoff, use_default_seg);
> 
> <--- error handling.

I will add it.
> 
> And that's not really a "seg_type" but simply the "sel"-ector.

I will update the variable names to reflect the fact that they are
segment selectors.

> And that
> "enum segment" is not really a segment but an segment override prefixes
> enum. Can we please get the nomenclature right first?

I need a human-readable way of identifying what segment selector (in
pt_regs, vm86regs or directly reading the segment registers) to use.
Since there is a segment override prefix for all of them, I thought I
could use them. Perhaps I can rename enum segment to enum
segment_selector and comment that the values in the enum are those of
the override prefixes. Would that be reasonable?

> 
> > +
> > +	seg = get_segment_selector(regs, seg_type);
> 
> s/seg/sel/

Will change.

> 
> > +	if (seg < 0)
> > +		return -1L;
> > +
> > +	if (v8086_mode(regs))
> > +		/*
> > +		 * Base is simply the segment selector shifted 4
> > +		 * positions to the right.
> > +		 */
> > +		return (unsigned long)(seg << 4);
> > +
> > +#ifdef CONFIG_X86_64
> > +	if (user_64bit_mode(regs)) {
> 
> 	if (IS_ENABLED(CONFIG_X86_64) && user_64bit_mode(regs)) {

I will change it.

Thanks and BR,
Ricardo
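
The three cases described in the commit message can be summarised in a
small sketch. It is a user-space illustration under stated assumptions:
enum cpu_mode, enum seg_reg and seg_base() are hypothetical names, and
the descriptor base and the FS/GS base are passed in as parameters
because this sketch does not read descriptor tables or MSRs.

```c
#include <stdint.h>

/* Hypothetical names, used only for this illustration. */
enum cpu_mode { MODE_V8086, MODE_PROTECTED, MODE_LONG_64 };
enum seg_reg { SREG_CS, SREG_SS, SREG_DS, SREG_ES, SREG_FS, SREG_GS };

/*
 * Segment base address for the given mode:
 *  - virtual-8086: the selector shifted left by 4 bits (selector * 16),
 *    as in the quoted "seg << 4";
 *  - 64-bit user mode: zero, except for FS and GS, whose base is supplied
 *    by the caller in fs_gs_base;
 *  - protected mode: the base stored in the descriptor, supplied by the
 *    caller in desc_base.
 */
static unsigned long seg_base(enum cpu_mode mode, enum seg_reg reg,
			      uint16_t sel, unsigned long desc_base,
			      unsigned long fs_gs_base)
{
	switch (mode) {
	case MODE_V8086:
		return (unsigned long)sel << 4;
	case MODE_LONG_64:
		if (reg == SREG_FS || reg == SREG_GS)
			return fs_gs_base;
		return 0;
	case MODE_PROTECTED:
	default:
		return desc_base;
	}
}
```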

* Re: [v6 PATCH 08/21] x86/insn-eval: Add utility function to get segment descriptor base address
  2017-04-20  8:25   ` Borislav Petkov
  2017-04-26 22:37     ` Ricardo Neri
@ 2017-04-26 22:52     ` Ricardo Neri
  2017-05-05 17:28       ` Borislav Petkov
  1 sibling, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-04-26 22:52 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Thu, 2017-04-20 at 10:25 +0200, Borislav Petkov wrote:
> > + * insn_get_seg_base() - Obtain base address contained in descriptor
> > + * @regs:    Set of registers containing the segment selector
> > + * @insn:    Instruction structure with selector override prefixes
> > + * @regoff:  Operand offset, in pt_regs, of which the selector is needed
> > + * @use_default_seg: Use the default segment instead of prefix overrides
> 
> I'm wondering whether you really need that bool or you can deduce this
> from pt_regs... I guess I'll see...

Probably insn_get_seg_base() itself can verify if there are segment
override prefixes in the struct insn. If yes, use them except for
specific cases such as CS.

On an unrelated note, I still have the problem of using DS vs ES for
string instructions. Perhaps, instead of a use_default_seg flag, I could
use a string_instruction flag that indicates how to determine the
default segment.

Thanks and BR,
Ricardo
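
Checking the instruction for segment override prefixes, as suggested
above, could look roughly like the sketch below. It is an illustration
only: find_seg_override() is a hypothetical helper that works on a plain
byte array rather than on struct insn, and rejecting more than one
override prefix is an assumed policy, not something stated in this
thread. The prefix byte values are the architectural ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Architectural segment override prefix bytes. */
#define PFX_ES	0x26
#define PFX_CS	0x2E
#define PFX_SS	0x36
#define PFX_DS	0x3E
#define PFX_FS	0x64
#define PFX_GS	0x65

/*
 * Scan the legacy prefix bytes of an instruction.
 *
 * Returns the segment override prefix byte if exactly one is present,
 * 0 if there is none, and -1 if there is more than one (assumed policy).
 */
static int find_seg_override(const uint8_t *prefixes, size_t nbytes)
{
	int found = 0;
	size_t i;

	for (i = 0; i < nbytes; i++) {
		switch (prefixes[i]) {
		case PFX_ES:
		case PFX_CS:
		case PFX_SS:
		case PFX_DS:
		case PFX_FS:
		case PFX_GS:
			if (found)
				return -1;
			found = prefixes[i];
			break;
		default:
			/* Not a segment override, e.g. 0x66 or 0x67. */
			break;
		}
	}

	return found;
}
```

A caller that gets 0 back would fall through to the default-segment
rule, so no separate boolean is needed to choose between the two paths.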

* Re: [v6 PATCH 09/21] x86/insn-eval: Add functions to get default operand and address sizes
  2017-04-20 13:06   ` Borislav Petkov
@ 2017-04-27  1:07     ` Ricardo Neri
  0 siblings, 0 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-04-27  1:07 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Thu, 2017-04-20 at 15:06 +0200, Borislav Petkov wrote:
> On Tue, Mar 07, 2017 at 04:32:42PM -0800, Ricardo Neri wrote:
> > These functions read the default values of the address and operand sizes
> > as specified in the segment descriptor. This information is determined
> > from the D and L bits. Hence, it can be used for both IA-32e 64-bit and
> > 32-bit legacy modes. For virtual-8086 mode, the default address and
> > operand sizes are always 2 bytes.
> 
> Yeah, we tend to call that customarily 16-bit :)

I will call it that.
> 
> > The D bit is only meaningful for code segments. Thus, these functions
> > always use the code segment selector contained in regs.
> > 
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> > Cc: Colin Ian King <colin.king@canonical.com>
> > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Thomas Garnier <thgarnie@google.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Dmitry Vyukov <dvyukov@google.com>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: x86@kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/include/asm/insn-eval.h |  2 +
> >  arch/x86/lib/insn-eval.c         | 80 ++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 82 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
> > index b201742..a0d81fc 100644
> > --- a/arch/x86/include/asm/insn-eval.h
> > +++ b/arch/x86/include/asm/insn-eval.h
> > @@ -15,6 +15,8 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
> >  int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
> >  int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
> >  int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
> > +unsigned char insn_get_seg_default_address_bytes(struct pt_regs *regs);
> > +unsigned char insn_get_seg_default_operand_bytes(struct pt_regs *regs);
> >  unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
> >  				int regoff, bool use_default_seg);
> >  
> > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> > index 383ca83..cda6c71 100644
> > --- a/arch/x86/lib/insn-eval.c
> > +++ b/arch/x86/lib/insn-eval.c
> > @@ -421,6 +421,86 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
> >  }
> >  
> >  /**
> > + * insn_get_seg_default_address_bytes - Obtain default address size of segment
> > + * @regs:	Set of registers containing the segment selector
> > + *
> > + * Obtain the default address size as indicated in the segment descriptor
> > + * selected in regs' code segment selector. In protected mode, the default
> > + * address is determined by inspecting the L and D bits of the segment
> > + * descriptor. In virtual-8086 mode, the default is always two bytes.
> > + *
> > + * Return: Default address size of segment
> 
> 		0 on error.
> 
> > + */
> > +unsigned char insn_get_seg_default_address_bytes(struct pt_regs *regs)
> > +{
> > +	struct desc_struct *desc;
> > +	unsigned short seg;
> > +	int ret;
> > +
> > +	if (v8086_mode(regs))
> > +		return 2;
> > +
> > +	seg = (unsigned short)regs->cs;
> > +
> > +	ret = get_desc(seg, &desc);
> > +	if (ret)
> > +		return 0;
> > +
> > +	switch ((desc->l << 1) | desc->d) {
> > +	case 0: /* Legacy mode. 16-bit addresses. CS.L=0, CS.D=0 */
> > +		return 2;
> > +	case 1: /* Legacy mode. 32-bit addresses. CS.L=0, CS.D=1 */
> > +		return 4;
> > +	case 2: /* IA-32e 64-bit mode. 64-bit addresses. CS.L=1, CS.D=0 */
> > +		return 8;
> > +	case 3: /* Invalid setting. CS.L=1, CS.D=1 */
> > +		/* fall through */
> > +	default:
> > +		return 0;
> > +	}
> > +}
> > +
> > +/**
> > + * insn_get_seg_default_operand_bytes - Obtain default operand size of segment
> > + * @regs:	Set of registers containing the segment selector
> > + *
> > + * Obtain the default operand size as indicated in the segment descriptor
> > + * selected in regs' code segment selector. In protected mode, the default
> > + * operand size is determined by inspecting the L and D bits of the segment
> > + * descriptor. In virtual-8086 mode, the default is always two bytes.
> > + *
> > + * Return: Default operand size of segment
> > + */
> > +unsigned char insn_get_seg_default_operand_bytes(struct pt_regs *regs)
> 
> Right, so default address and operand size always go together so I don't
> think you need two separate functions.
> 
> So what I'd suggest - provided this pans out (I still haven't reviewed
> the whole thing) - is to determine the operating mode of the segment:
> long, legacy, etc and then return both address and operand sizes. Patch
> 17/21 needs them both at the same time AFAICT.

It makes sense to me. So far these two functions are used in the same
place.

Thanks and BR,
Ricardo
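
A combined helper along the lines suggested above might look like the
following sketch. It is not the code of the series: struct seg_sizes and
seg_default_sizes() are hypothetical, the L and D bits are passed in
directly instead of being read from a descriptor, and both sizes are
reported as 0 on error to mirror the "0 on error" convention of the
quoted functions. Note that in 64-bit mode the default address size is 8
bytes while the default operand size is 4 bytes.

```c
#include <stdbool.h>

/* Hypothetical result type: default sizes in bytes, both 0 on error. */
struct seg_sizes {
	unsigned char address_bytes;
	unsigned char operand_bytes;
};

/*
 * Default address and operand sizes of a code segment, given the L and D
 * bits of its descriptor. In virtual-8086 mode both defaults are 16-bit.
 */
static struct seg_sizes seg_default_sizes(bool v8086, unsigned int l,
					  unsigned int d)
{
	struct seg_sizes sz = { 0, 0 };

	if (v8086) {
		sz.address_bytes = 2;
		sz.operand_bytes = 2;
		return sz;
	}

	switch (((l & 1) << 1) | (d & 1)) {
	case 0: /* Legacy mode, 16-bit defaults. CS.L=0, CS.D=0 */
		sz.address_bytes = 2;
		sz.operand_bytes = 2;
		break;
	case 1: /* Legacy mode, 32-bit defaults. CS.L=0, CS.D=1 */
		sz.address_bytes = 4;
		sz.operand_bytes = 4;
		break;
	case 2: /* 64-bit mode. CS.L=1, CS.D=0 */
		sz.address_bytes = 8;
		sz.operand_bytes = 4;
		break;
	default: /* Invalid setting. CS.L=1, CS.D=1 */
		break;
	}

	return sz;
}
```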

* Re: [v6 PATCH 10/21] x86/insn-eval: Do not use R/EBP as base if mod in ModRM is zero
  2017-04-21 10:52   ` Borislav Petkov
@ 2017-04-27  1:29     ` Ricardo Neri
  2017-05-07 17:20       ` Borislav Petkov
  0 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-04-27  1:29 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Fri, 2017-04-21 at 12:52 +0200, Borislav Petkov wrote:
> On Tue, Mar 07, 2017 at 04:32:43PM -0800, Ricardo Neri wrote:
> > Section 2.2.1.3 of the Intel 64 and IA-32 Architectures Software
> > Developer's Manual volume 2A states that when the mod part of the ModRM
> > byte is zero and R/EBP is specified in the R/M part of such bit, the value
> > of the aforementioned register should not be used in the address
> > computation. Instead, a 32-bit displacement is expected. The instruction
> > decoder takes care of setting the displacement to the expected value.
> > Returning -EDOM signals callers that they should ignore the value of such
> > register when computing the address encoded in the instruction operands.
> > 
> > Also, callers should exercise care to correctly interpret this particular
> > case. In IA-32e 64-bit mode, the address is given by the displacement plus
> > the value of the RIP. In IA-32e compatibility mode, the value of EIP is
> > ignored. This correction is done for our insn_get_addr_ref.
> > 
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> > Cc: Colin Ian King <colin.king@canonical.com>
> > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Thomas Garnier <thgarnie@google.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Dmitry Vyukov <dvyukov@google.com>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: x86@kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/lib/insn-eval.c | 25 +++++++++++++++++++++++--
> >  1 file changed, 23 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> > index cda6c71..ea10b03 100644
> > --- a/arch/x86/lib/insn-eval.c
> > +++ b/arch/x86/lib/insn-eval.c
> > @@ -250,6 +250,14 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
> >  	switch (type) {
> >  	case REG_TYPE_RM:
> >  		regno = X86_MODRM_RM(insn->modrm.value);
> > +		/* if mod=0, register R/EBP is not used in the address
> > +		 * computation. Instead, a 32-bit displacement is expected;
> > +		 * the instruction decoder takes care of reading such
> > +		 * displacement. This is true for both R/EBP and R13, as the
> > +		 * REX.B bit is not decoded.
> > +		 */
> 
> I'd simply write here: "ModRM.mod == 0 and ModRM.rm == 5 means a 32-bit
> displacement is following."

I will shorten the comment.
> 
> In addition, kernel comments style is:
> 
> 	/*
> 	 * A sentence ending with a full-stop.
> 	 * Another sentence. ...
> 	 * More sentences. ...
> 	 */

... and use the correct style. I feel bad I missed this one.
> 
> > +		if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0)
> > +			return -EDOM;
> 
> 	if (X86_MODRM_MOD(insn->modrm.value) == 0 &&
> 	    X86_MODRM_RM(insn->modrm.value)  == 5)
> 
> looks more understandable to me.

Should I go with !(X86_MODRM_MOD(insn->modrm.value)) as you suggested in
other patches?

> 
> >  		if (X86_REX_B(insn->rex_prefix.value))
> >  			regno += 8;
> >  		break;
> > @@ -599,9 +607,22 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
> >  			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
> >  		} else {
> >  			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
> > -			if (addr_offset < 0)
> > +			/* -EDOM means that we must ignore the address_offset.
> > +			 * The only case in which we see this value is when
> > +			 * R/M points to R/EBP. In such a case, in 64-bit mode
> > +			 * the effective address is relative to tho RIP.
> 
> s/tho//

Will correct.
> 
> > +			 */
> 
> Kernel comments style is:
> 
> 	/*
> 	 * A sentence ending with a full-stop.
> 	 * Another sentence. ...
> 	 * More sentences. ...
> 	 */
> 

Will correct.
> > +			if (addr_offset == -EDOM) {
> > +				eff_addr = 0;
> > +#ifdef CONFIG_X86_64
> > +				if (user_64bit_mode(regs))
> > +					eff_addr = (long)regs->ip;
> 
> Is regs->ip the rIP of the *following* insn?

No, this is a bug. This should be regs->ip + insn->length.
> 
> > +#endif
> 
> You can do this in a prepatch and then get rid of the ifdeffery here:
> 
> diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
> index 2b5d686ea9f3..f6239273c5f1 100644
> --- a/arch/x86/include/asm/ptrace.h
> +++ b/arch/x86/include/asm/ptrace.h
> @@ -115,9 +115,9 @@ static inline int v8086_mode(struct pt_regs *regs)
>  #endif
>  }
>  
> -#ifdef CONFIG_X86_64
>  static inline bool user_64bit_mode(struct pt_regs *regs)
>  {
> +#ifdef CONFIG_X86_64
>  #ifndef CONFIG_PARAVIRT
>  	/*
>  	 * On non-paravirt systems, this is the only long mode CPL 3
> @@ -128,6 +128,9 @@ static inline bool user_64bit_mode(struct pt_regs *regs)
>  	/* Headers are too twisted for this to go in paravirt.h. */
>  	return regs->cs == __USER_CS || regs->cs == pv_info.extra_user_64bit_cs;
>  #endif
> +#else /* !CONFIG_X86_64 */
> +	return false;
> +#endif
>  }

This looks nice. I will add this pre-patch.

Thanks and BR,
Ricardo
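
The displacement-only case and the RIP correction discussed above can be
sketched as follows. This is a user-space illustration that assumes
32-bit or 64-bit address encodings (16-bit encodings use a different
ModRM table); the helper names are hypothetical. In 64-bit mode the
displacement is relative to the address of the next instruction, that
is, the current instruction pointer plus the instruction length.

```c
#include <stdbool.h>
#include <stdint.h>

/* ModRM.mod is bits [7:6]; ModRM.rm is bits [2:0]. */
static inline unsigned int modrm_mod(uint8_t modrm)
{
	return (modrm >> 6) & 0x3;
}

static inline unsigned int modrm_rm(uint8_t modrm)
{
	return modrm & 0x7;
}

/* ModRM.mod == 0 and ModRM.rm == 5 means a 32-bit displacement follows. */
static inline bool modrm_is_disp32_only(uint8_t modrm)
{
	return modrm_mod(modrm) == 0 && modrm_rm(modrm) == 5;
}

/*
 * Effective address for the displacement-only case.
 * In 64-bit mode: address of the next instruction plus the sign-extended
 * displacement. Otherwise: the displacement alone, as a 32-bit address.
 */
static inline uint64_t disp32_eff_addr(bool long_mode, uint64_t ip,
				       unsigned int insn_len, int32_t disp)
{
	if (!long_mode)
		return (uint32_t)disp;

	return ip + insn_len + (uint64_t)(int64_t)disp;
}
```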

* Re: [v6 PATCH 11/21] insn/eval: Incorporate segment base in address computation
  2017-04-21 14:55   ` Borislav Petkov
@ 2017-04-27  1:31     ` Ricardo Neri
  0 siblings, 0 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-04-27  1:31 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Fri, 2017-04-21 at 16:55 +0200, Borislav Petkov wrote:
> On Tue, Mar 07, 2017 at 04:32:44PM -0800, Ricardo Neri wrote:
> > insn_get_addr_ref returns the effective address as defined by the
> 
> Please end function names with parentheses.

Will do.
> 
> > section 3.7.5.1 Vol 1 of the Intel 64 and IA-32 Architectures Software
> > Developer's Manual. In order to compute the linear address, we must add
> > to the effective address the segment base address as set in the segment
> > descriptor. Furthermore, the segment descriptor to use depends on the
> > register that is used as the base of the effective address. The effective
> > base address varies depending on whether the operand is a register or a
> > memory address and on whether a SiB byte is used.
> > 
> > In most cases, the segment base address will be 0 if the USER_DS/USER32_DS
> > segment is used or if segmentation is not used. However, the base address
> > is not necessarily zero if a user programs defines its own segments. This
> > is possible by using a local descriptor table.
> > 
> > Since the effective address is a signed quantity, the unsigned segment
> > base address saved in a separate variable and added to the final effective
> 
> ".. is saved..."

I will correct this.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 12/21] x86/insn: Support both signed 32-bit and 64-bit effective addresses
  2017-04-25 13:51   ` Borislav Petkov
@ 2017-04-27  3:33     ` Ricardo Neri
  2017-05-08 11:42       ` Borislav Petkov
  0 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-04-27  3:33 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Tue, 2017-04-25 at 15:51 +0200, Borislav Petkov wrote:
> On Tue, Mar 07, 2017 at 04:32:45PM -0800, Ricardo Neri wrote:
> > The 32-bit and 64-bit address encodings are identical. This means that we
> > can use the same function in both cases. In order to reuse the function for
> > 32-bit address encodings, we must sign-extend our 32-bit signed operands to
> > 64-bit signed variables (only for 64-bit builds). To decide on whether sign
> > extension is needed, we rely on the address size as given by the
> > instruction structure.
> > 
> > Lastly, before computing the linear address, we must truncate our signed
> > 64-bit signed effective address if the address size is 32-bit.
> > 
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> > Cc: Colin Ian King <colin.king@canonical.com>
> > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Thomas Garnier <thgarnie@google.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Dmitry Vyukov <dvyukov@google.com>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: x86@kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/lib/insn-eval.c | 44 ++++++++++++++++++++++++++++++++------------
> >  1 file changed, 32 insertions(+), 12 deletions(-)
> > 
> > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> > index edb360f..a9a1704 100644
> > --- a/arch/x86/lib/insn-eval.c
> > +++ b/arch/x86/lib/insn-eval.c
> > @@ -559,6 +559,15 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs)
> >  	return get_reg_offset(insn, regs, REG_TYPE_INDEX);
> >  }
> >  
> > +static inline long __to_signed_long(unsigned long val, int long_bytes)
> > +{
> > +#ifdef CONFIG_X86_64
> > +	return long_bytes == 4 ? (long)((int)((val) & 0xffffffff)) : (long)val;
> 
> I don't think this always works as expected:
> 
> ---
> typedef unsigned int u32;
> typedef unsigned long u64;
> 
> int main()
> {
>         u64 v = 0x1ffffffff;
> 
>         printf("v: %ld, 0x%lx, %ld\n", v, v, (long)((int)((v) & 0xffffffff)));
> 
>         return 0;
> }
> --
> ...
> 
> v: 8589934591, 0x1ffffffff, -1
> 
> Now, this should not happen on 32-bit because unsigned long is 32-bit
> there but can that happen on 64-bit?

This is the reason I check the value of long_bytes. If long_bytes is not
4, being the only other possible value 8 (perhaps I need to issue an
error when the value is not any of these values), the cast is simply
(long)val. I modified your test program with:

printf("v: %ld, 0x%lx, %ld, %ld\n", v, v,
       (long)((int)((v) & 0xffffffff)), (long)v);

and I get:

v: 8589934591, 0x1ffffffff, -1, 8589934591.

Am I missing something?
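
The behavior discussed above can be reproduced in isolation. This is only
a sketch: to_signed() is a hypothetical user-space stand-in for
__to_signed_long(), written with fixed-width types so that it does not
depend on the width of long:

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for __to_signed_long(): interpret val as a signed
 * quantity of addr_bytes bytes. For 4-byte addresses the low 32 bits are
 * sign-extended to 64 bits; otherwise the full 64-bit value is used.
 * The uint32_t -> int32_t conversion is implementation-defined for values
 * above INT32_MAX; GCC and Clang wrap modulo 2^32, which is assumed here.
 */
static int64_t to_signed(uint64_t val, int addr_bytes)
{
	if (addr_bytes == 4)
		return (int64_t)(int32_t)(uint32_t)val;

	return (int64_t)val;
}
```

With val = 0x1ffffffff, the 4-byte case yields -1 and the 8-byte case
yields 8589934591, matching the two results printed above.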

> 
> > +#else
> > +	return (long)val;
> > +#endif
> > +}
> > +
> >  /*
> >   * return the address being referenced be instruction
> >   * for rm=3 returning the content of the rm reg
> > @@ -567,19 +576,21 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs)
> >  void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
> >  {
> >  	unsigned long linear_addr, seg_base_addr;
> > -	long eff_addr, base, indx;
> > -	int addr_offset, base_offset, indx_offset;
> > +	long eff_addr, base, indx, tmp;
> > +	int addr_offset, base_offset, indx_offset, addr_bytes;
> >  	insn_byte_t sib;
> >  
> >  	insn_get_modrm(insn);
> >  	insn_get_sib(insn);
> >  	sib = insn->sib.value;
> > +	addr_bytes = insn->addr_bytes;
> >  
> >  	if (X86_MODRM_MOD(insn->modrm.value) == 3) {
> >  		addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
> >  		if (addr_offset < 0)
> >  			goto out_err;
> > -		eff_addr = regs_get_register(regs, addr_offset);
> > +		tmp = regs_get_register(regs, addr_offset);
> > +		eff_addr = __to_signed_long(tmp, addr_bytes);
> 
> This repeats throughout the function so it begs to be a separate:
> 
> 	get_mem_addr()
> 
> or so.

Yes, the same pattern is used in all places except when using register
operands (ModRM.mod == 11b). I will look into putting it in a function.
> 
> >  		seg_base_addr = insn_get_seg_base(regs, insn, addr_offset,
> >  						  false);
> >  	} else {
> > @@ -591,20 +602,24 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
> >  			 * in the address computation.
> >  			 */
> >  			base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
> > -			if (unlikely(base_offset == -EDOM))
> > +			if (unlikely(base_offset == -EDOM)) {
> >  				base = 0;
> > -			else if (unlikely(base_offset < 0))
> > +			} else if (unlikely(base_offset < 0)) {
> >  				goto out_err;
> > -			else
> > -				base = regs_get_register(regs, base_offset);
> > +			} else {
> > +				tmp = regs_get_register(regs, base_offset);
> > +				base = __to_signed_long(tmp, addr_bytes);
> > +			}
> >  
> >  			indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
> > -			if (unlikely(indx_offset == -EDOM))
> > +			if (unlikely(indx_offset == -EDOM)) {
> >  				indx = 0;
> > -			else if (unlikely(indx_offset < 0))
> > +			} else if (unlikely(indx_offset < 0)) {
> >  				goto out_err;
> > -			else
> > -				indx = regs_get_register(regs, indx_offset);
> > +			} else {
> > +				tmp = regs_get_register(regs, indx_offset);
> > +				indx = __to_signed_long(tmp, addr_bytes);
> > +			}
> >  
> >  			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
> >  			seg_base_addr = insn_get_seg_base(regs, insn,
> > @@ -625,13 +640,18 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
> >  			} else if (addr_offset < 0) {
> >  				goto out_err;
> >  			} else {
> > -				eff_addr = regs_get_register(regs, addr_offset);
> > +				tmp = regs_get_register(regs, addr_offset);
> > +				eff_addr = __to_signed_long(tmp, addr_bytes);
> >  			}
> >  			seg_base_addr = insn_get_seg_base(regs, insn,
> >  							  addr_offset, false);
> >  		}
> >  		eff_addr += insn->displacement.value;
> >  	}
> > +	/* truncate to 4 bytes for 32-bit effective addresses */
> > +	if (addr_bytes == 4)
> > +		eff_addr &= 0xffffffff;
> 
> Why again?

eff_addr is a long variable, which is 64 bits wide on x86_64. However, in
32-bit segments the effective address is 32 bits wide. Thus, I discard the
32 most significant bits.
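
A small sketch of why the final mask is needed once the operands have been
sign-extended. effective_addr() is a hypothetical, simplified model (base
register plus displacement only), not the function in the patch:

```c
#include <stdint.h>

/*
 * Hypothetical model of the computation: sign-extend a 32-bit base when
 * the address size is 4 bytes, add the displacement, then truncate the
 * result back to 32 bits. Unsigned arithmetic is used so that wrap-around
 * is well defined.
 */
static uint64_t effective_addr(uint64_t base_reg, int32_t disp, int addr_bytes)
{
	uint64_t eff = base_reg;

	if (addr_bytes == 4)
		eff = (uint64_t)(int64_t)(int32_t)(uint32_t)base_reg;

	eff += (uint64_t)(int64_t)disp;

	/* Keep only the low 32 bits for 32-bit effective addresses. */
	if (addr_bytes == 4)
		eff &= 0xffffffffULL;

	return eff;
}
```

For a base of 0xfffffff0 the sign-extended value is 0xfffffffffffffff0;
without the mask that 64-bit value would be used as the address instead
of 0xfffffff0.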

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 03/21] x86/mpx: Do not use R/EBP as base in the SIB byte with Mod = 0
  2017-04-26  8:05       ` Borislav Petkov
@ 2017-04-27 22:49         ` Ricardo Neri
  0 siblings, 0 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-04-27 22:49 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Nathan Howard, Adan Hawthorn,
	Joe Perches

On Wed, 2017-04-26 at 10:05 +0200, Borislav Petkov wrote:
> On Tue, Apr 25, 2017 at 07:04:20PM -0700, Ricardo Neri wrote:
> > For the specific case of ModRM.mod being 0, I feel I need to clarify
> > that REX.B is not decoded and if SIB.base is %r13 the base is also 0.
> 
> Well, that all doesn't matter. The rule is this:
> 
> ModRM.mod == 00b and ModRM.r/m == 101b -> effective address: disp32
> 
> See Table 2-2. "32-Bit Addressing Forms with the ModR/M Byte" in the SDM.

You are right. This summarizes the rule. Then I will shorten the
comment.
> 
> So the base register is not used. How that base register is specified
> then doesn't matter (undecoded REX bits or not).
> 
> > This comment adds clarity because REX.X is decoded when determining
> > SIB.index.
> 
> Well, that's a different thing. The REX bits participating in the SIB
> fields don't matter about this particular case. We only want to say that
> we're returning a disp32 without a base register and the comment should
> keep it simple without extraneous information.
> 
> I know, you want to mention what Table 2-5. "Special Cases of REX
> Encodings" says but we should avoid unnecessary content in the comment.
> People who want details can stare at the manuals - the comment should
> only document what that particular case is.
> 
> Btw, you could write it even better:
> 
> 	if (!X86_MODRM_MOD(insn->modrm.value) && X86_MODRM_RM(insn->modrm.value) == 5)
> 
> and then it is basically a 1:1 copy of the rule from Table 2-2.

It is!
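
For reference, a self-contained sketch of that check. MODRM_MOD() and
MODRM_RM() are local macros assumed to mirror the kernel's
X86_MODRM_MOD() and X86_MODRM_RM(); modrm_is_disp32_only() is a
hypothetical name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Local equivalents of the ModRM field accessors (assumed layout: mod in
 * bits 7:6, r/m in bits 2:0). */
#define MODRM_MOD(modrm)	(((modrm) & 0xc0) >> 6)
#define MODRM_RM(modrm)		((modrm) & 0x07)

/* ModRM.mod == 00b and ModRM.r/m == 101b: disp32, no base register. */
static bool modrm_is_disp32_only(uint8_t modrm)
{
	return !MODRM_MOD(modrm) && MODRM_RM(modrm) == 5;
}
```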

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 05/21] x86/insn-eval: Add utility functions to get register offsets
  2017-04-26 18:13     ` Ricardo Neri
@ 2017-04-28 10:40       ` Borislav Petkov
  0 siblings, 0 replies; 112+ messages in thread
From: Borislav Petkov @ 2017-04-28 10:40 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Wed, Apr 26, 2017 at 11:13:44AM -0700, Ricardo Neri wrote:
> Masami Hiramatsu had originally requested to add the two functions. I
> suppose the unneeded functions could be added if/when needed.

Yap, exactly.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 06/21] x86/insn-eval: Add utility functions to get segment selector
  2017-04-26 20:44     ` Ricardo Neri
  2017-04-26 20:47       ` Ricardo Neri
@ 2017-04-30 17:15       ` Borislav Petkov
  2017-05-05 18:31         ` Ricardo Neri
  1 sibling, 1 reply; 112+ messages in thread
From: Borislav Petkov @ 2017-04-30 17:15 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Wed, Apr 26, 2017 at 01:44:43PM -0700, Ricardo Neri wrote:
> I regard that the role of this function is to obtain the the segment
> selector from either of the prefixes or inferred from the operands. It
> is the role of caller to determine if the segment selector should be
> ignored.

No, this is wrong. The function is called resolve_seg_selector() and it
gives you the segment selector. CS, DS, ES, and SS in 64-bit mode are
treated as null segments and your function should return/signal exactly
that, i.e., saying that those should be ignored in that case.

> I double-checked the latest version of the Intel Software Development
> manual [2], in the table 3-5 in section 3.7.4 mentions that DS is
> default segment for all data references, except string destinations. I
> tested this code with the UMIP-protected instructions and whenever I use
> %edi the default segment is %ds.

Yes, all correct. Except that we're adding a more-or-less generic x86
insn decoder so we should make it so...

> Is this example valid? The documentation of MOVS specifies that it
> always moves DS:(E)SI to ES:(E)DI.

... that the decoder should do exactly that:

	if (MOVS and rDI)
		return SEG_ES;

And you're handing in struct insn * so you can easily check which insn
you're looking at.
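
A rough sketch of the rules stated above, with all names hypothetical
(enum seg_sketch, default_data_seg() and its boolean inputs are
illustrations only; the real function would derive them from struct insn
and pt_regs):

```c
#include <stdbool.h>

/* Hypothetical result type; SEG_IGNORE signals "treat as null segment". */
enum seg_sketch { SEG_IGNORE, SEG_DS, SEG_ES };

/*
 * Default segment for a data reference, covering only the cases discussed
 * in this thread:
 *  - in 64-bit mode the default segments are treated as null, so the
 *    caller is told to ignore the segment;
 *  - the destination operand of MOVS (addressed through rDI) uses ES;
 *  - every other data reference defaults to DS.
 */
static enum seg_sketch default_data_seg(bool long_mode, bool is_movs,
					bool uses_rdi)
{
	if (long_mode)
		return SEG_IGNORE;

	if (is_movs && uses_rdi)
		return SEG_ES;

	return SEG_DS;
}
```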

Thanks.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 07/21] x86/insn-eval: Add utility function to get segment descriptor
  2017-04-26 21:51     ` Ricardo Neri
@ 2017-05-04 11:02       ` Borislav Petkov
  2017-05-12  2:13         ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Borislav Petkov @ 2017-05-04 11:02 UTC (permalink / raw)
  To: Ricardo Neri, Andy Lutomirski
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra,
	Andrew Morton, Brian Gerst, Chris Metcalf, Dave Hansen,
	Paolo Bonzini, Masami Hiramatsu, Huang Rui, Jiri Slaby,
	Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Wed, Apr 26, 2017 at 02:51:56PM -0700, Ricardo Neri wrote:
> > > +			     seg >= current->active_mm->context.ldt->size)) {
> > 
> > ldt->size is the size of the descriptor table but you've shifted seg by
> > 3. That selector index is shifted by 3 (to the left) to form an offset
> > into the descriptor table because the entries there are 8 bytes.
> 
> I double-checked the ldt code and it seems to me that size refers to the
> number of entries in the table; it is always multiplied by
> LDT_ENTRY_SIZE [1], [2]. Am I missing something?

No, you're not. I fell into that wrongly named struct member trap.

So ldt_struct.size should actually be called ldt_struct.n_entries or
similar, because what's in there now is not a "size".

And then code like

	new_ldt->size * LDT_ENTRY_SIZE

would make much more sense if written like this:

	new_ldt->n_entries * LDT_ENTRY_SIZE

Would you fix that in a prepatch pls?
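
To illustrate the distinction between an entry count and a byte size, here
is a user-space sketch. struct ldt_sketch and the helper names are
hypothetical; only the 8-byte entry size and the shift by 3 come from the
discussion above:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LDT_ENTRY_SIZE	8	/* each descriptor is 8 bytes */

/* Hypothetical stand-in for the LDT bookkeeping structure. */
struct ldt_sketch {
	unsigned int n_entries;	/* number of descriptors, not bytes */
};

/* The table index lives in bits 15:3 of the selector. */
static unsigned int selector_index(uint16_t sel)
{
	return sel >> 3;
}

/* Bounds check against the number of entries. */
static bool selector_in_ldt(const struct ldt_sketch *ldt, uint16_t sel)
{
	return selector_index(sel) < ldt->n_entries;
}

/* Byte offset of the descriptor inside the table. */
static size_t selector_byte_offset(uint16_t sel)
{
	return (size_t)selector_index(sel) * LDT_ENTRY_SIZE;
}
```

With the member named n_entries, comparing it against the index and
multiplying it by LDT_ENTRY_SIZE both read naturally.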

Thanks.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 08/21] x86/insn-eval: Add utility function to get segment descriptor base address
  2017-04-26 22:37     ` Ricardo Neri
@ 2017-05-05 17:19       ` Borislav Petkov
  2017-05-12  2:09         ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Borislav Petkov @ 2017-05-05 17:19 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Wed, Apr 26, 2017 at 03:37:44PM -0700, Ricardo Neri wrote:
> I need a human-readable way of identifying what segment selector (in
> pt_regs, vm86regs or directly reading the segment registers) to use.
> Since there is a segment override prefix for all of them, I thought I
> could use them.

Yes, you should...

> Perhaps I can rename enum segment to enum segment_selector and comment
> that the values in the enum are those of the override prefixes. Would
> that be reasonable?

... but you should call them what they are: "enum seg_override_pfxs" or
"enum seg_ovr_pfx" or...

Or somesuch. I suck at naming stuff.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 08/21] x86/insn-eval: Add utility function to get segment descriptor base address
  2017-04-26 22:52     ` Ricardo Neri
@ 2017-05-05 17:28       ` Borislav Petkov
  2017-05-12  2:06         ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Borislav Petkov @ 2017-05-05 17:28 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Wed, Apr 26, 2017 at 03:52:41PM -0700, Ricardo Neri wrote:
> Probably insn_get_seg_base() itself can verify if there are segment
> override prefixes in the struct insn. If yes, use them except for
> specific cases such as CS.

... and depending on whether in long mode or not.

> On an unrelated note, I still have the problem of using DS vs ES for
> string instructions. Perhaps instead of a use_default_seg flag, a
> string_instruction flag that indicates how to determine the default
> segment.

... or you can look at the insn opcode directly. AFAICT, you need
to check whether the opcode is 0xa4 or 0xa5 and that the insn is a
single-byte opcode, i.e., not from the secondary map escaped with 0xf or
some of the other multi-byte opcode maps.
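
Something along these lines, as a sketch only. struct opcode_sketch is a
hypothetical simplification of the opcode field in struct insn, reduced to
the opcode bytes and their count:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a decoded opcode. */
struct opcode_sketch {
	uint8_t bytes[3];	/* opcode bytes, escape bytes included */
	uint8_t nbytes;		/* number of valid entries in bytes[] */
};

/*
 * MOVS is a single-byte opcode: 0xa4 (byte) or 0xa5 (word/dword/qword).
 * Requiring nbytes == 1 rejects opcodes from the escaped maps that merely
 * end in the same byte.
 */
static bool is_movs(const struct opcode_sketch *op)
{
	if (op->nbytes != 1)
		return false;

	return op->bytes[0] == 0xa4 || op->bytes[0] == 0xa5;
}
```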

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 06/21] x86/insn-eval: Add utility functions to get segment selector
  2017-04-30 17:15       ` Borislav Petkov
@ 2017-05-05 18:31         ` Ricardo Neri
  0 siblings, 0 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-05-05 18:31 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Sun, 2017-04-30 at 19:15 +0200, Borislav Petkov wrote:
> On Wed, Apr 26, 2017 at 01:44:43PM -0700, Ricardo Neri wrote:
> > I regard that the role of this function is to obtain the the segment
> > selector from either of the prefixes or inferred from the operands. It
> > is the role of caller to determine if the segment selector should be
> > ignored.
> 
> No, this is wrong. The function is called resolve_seg_selector() and it
> gives you the segment selector. CS, DS, ES, and SS in 64-bit mode are
> treated as null segments and your function should return/signal exactly
> that, i.e., saying that those should be ignored in that case.
> 
> > I double-checked the latest version of the Intel Software Development
> > manual [2], in the table 3-5 in section 3.7.4 mentions that DS is
> > default segment for all data references, except string destinations. I
> > tested this code with the UMIP-protected instructions and whenever I use
> > %edi the default segment is %ds.
> 
> Yes, all correct. Except that we're adding a more-or-less generic x86
> insn decoder so we should make it so...
> 
> > Is this example valid? The documentation of MOVS specifies that it
> > always moves DS:(E)SI to ES:(E)DI.
> 
> ... that the decoder should do exactly that:
> 
> 	if (MOVS and rDI)
> 		return SEG_ES;
> 
> And you're handing in struct insn * so you can easily check which insn
> you're looking at.

I see. I have submitted v7 of the series and I have implemented all the
changes above. Now I am able to identify string instructions.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 10/21] x86/insn-eval: Do not use R/EBP as base if mod in ModRM is zero
  2017-04-27  1:29     ` Ricardo Neri
@ 2017-05-07 17:20       ` Borislav Petkov
  2017-05-12  1:57         ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Borislav Petkov @ 2017-05-07 17:20 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Wed, Apr 26, 2017 at 06:29:59PM -0700, Ricardo Neri wrote:
> > 	if (X86_MODRM_MOD(insn->modrm.value) == 0 &&
> > 	    X86_MODRM_RM(insn->modrm.value)  == 5)
> > 
> > looks more understandable to me.
> 
> Should I go with !(X86_MODRM_MOD(insn->modrm.value)) as you suggested in
> other patches?

Ah, yes pls.

Thanks.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 12/21] x86/insn: Support both signed 32-bit and 64-bit effective addresses
  2017-04-27  3:33     ` Ricardo Neri
@ 2017-05-08 11:42       ` Borislav Petkov
  2017-05-12  1:55         ` Ricardo Neri
  0 siblings, 1 reply; 112+ messages in thread
From: Borislav Petkov @ 2017-05-08 11:42 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Wed, Apr 26, 2017 at 08:33:46PM -0700, Ricardo Neri wrote:
> This is the reason I check the value of long_bytes. If long_bytes is not
> 4, being the only other possible value 8 (perhaps I need to issue an
> error when the value is not any of these values),

Well, maybe I'm a bit too paranoid. Bottom line is, we should do the
address computations exactly like the hardware does them so that there
are no surprises. Doing them with longs looks ok to me.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 12/21] x86/insn: Support both signed 32-bit and 64-bit effective addresses
  2017-05-08 11:42       ` Borislav Petkov
@ 2017-05-12  1:55         ` Ricardo Neri
  0 siblings, 0 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-05-12  1:55 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Mon, 2017-05-08 at 13:42 +0200, Borislav Petkov wrote:
> On Wed, Apr 26, 2017 at 08:33:46PM -0700, Ricardo Neri wrote:
> > This is the reason I check the value of long_bytes. If long_bytes is not
> > 4, being the only other possible value 8 (perhaps I need to issue an
> > error when the value is not any of these values),
> 
> Well, maybe I'm a bit too paranoid. Bottom line is, we should do the
> address computations exactly like the hardware does them so that there
> are no surprises. Doing them with longs looks ok to me.

Using long is exactly what I intend to do. The problem that I am trying
to resolve is to sign-extend signed memory offsets of 32-bit programs
running on 64-bit kernels. For 64-bit programs running on 64-bit kernels
I can simply use longs. I added error checking in my v7 of this series
[1].

Thanks and BR,
Ricardo

[1]. https://lkml.org/lkml/2017/5/5/407

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 10/21] x86/insn-eval: Do not use R/EBP as base if mod in ModRM is zero
  2017-05-07 17:20       ` Borislav Petkov
@ 2017-05-12  1:57         ` Ricardo Neri
  0 siblings, 0 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-05-12  1:57 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Sun, 2017-05-07 at 19:20 +0200, Borislav Petkov wrote:
> On Wed, Apr 26, 2017 at 06:29:59PM -0700, Ricardo Neri wrote:
> > > 	if (X86_MODRM_MOD(insn->modrm.value) == 0 &&
> > > 	    X86_MODRM_RM(insn->modrm.value)  == 5)
> > > 
> > > looks more understandable to me.
> > 
> > Should I go with !(X86_MODRM_MOD(insn->modrm.value)) as you suggested in
> > other patches?
> 
> Ah, yes pls.
> 
I did this in v7 [1].

Thanks and BR,
Ricardo

[1]. https://lkml.org/lkml/2017/5/5/399

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 08/21] x86/insn-eval: Add utility function to get segment descriptor base address
  2017-05-05 17:28       ` Borislav Petkov
@ 2017-05-12  2:06         ` Ricardo Neri
  0 siblings, 0 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-05-12  2:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Fri, 2017-05-05 at 19:28 +0200, Borislav Petkov wrote:
> On Wed, Apr 26, 2017 at 03:52:41PM -0700, Ricardo Neri wrote:
> > Probably insn_get_seg_base() itself can verify if there are segment
> > override prefixes in the struct insn. If yes, use them except for
> > specific cases such as CS.
> 
> ... and depending on whether in long mode or not.

Yes, in my v7 I ignore the segment register if we are in long mode [1].
> 
> > On an unrelated note, I still have the problem of using DS vs ES for
> > string instructions. Perhaps instead of a use_default_seg flag, a
> > string_instruction flag that indicates how to determine the default
> > segment.
> 
> ... or you can look at the insn opcode directly. AFAICT, you need
> to check whether the opcode is 0xa4 or 0xa5 and that the insn is a
> single-byte opcode, i.e., not from the secondary map escaped with 0xf or
> some of the other multi-byte opcode maps.

In my v7, I have added a section to my function resolve_seg_register()
that ignores segment overrides if it sees string instructions and the
register EDI, and defaults to ES. If the register is EIP, it defaults to
CS. To determine if an instruction is a string instruction, I check the
size of the opcode and the opcodes that you mention, plus others based on
the Intel Software Development Manual [2].

[1]. https://lkml.org/lkml/2017/5/5/405
[2]. https://lkml.org/lkml/2017/5/5/410

Thanks and BR,
Ricardo


> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [v6 PATCH 08/21] x86/insn-eval: Add utility function to get segment descriptor base address
  2017-05-05 17:19       ` Borislav Petkov
@ 2017-05-12  2:09         ` Ricardo Neri
  0 siblings, 0 replies; 112+ messages in thread
From: Ricardo Neri @ 2017-05-12  2:09 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Fri, 2017-05-05 at 19:19 +0200, Borislav Petkov wrote:
> On Wed, Apr 26, 2017 at 03:37:44PM -0700, Ricardo Neri wrote:
> > I need a human-readable way of identifying what segment selector (in
> > pt_regs, vm86regs or directly reading the segment registers) to use.
> > Since there is a segment override prefix for all of them, I thought I
> > could use them.
> 
> Yes, you should...
> 
> > Perhaps I can rename enum segment to enum segment_selector and comment
> > that the values in the enum are those of the override prefixes. Would
> > that be reasonable?
> 
> ... but you should call them what they are: "enum seg_override_pfxs" or
> "enum seg_ovr_pfx" or...
> 
> Or somesuch. I suck at naming stuff.

In my v7, I simply named my enumeration enum segment_register, which is
what they are. Some of its entries happen to have the values of the
segment override prefixes, but it also has special entries such as
SEG_REG_INVAL for errors and SEG_REG_IGNORE for long mode [1].
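
A minimal sketch of what such an enumeration could look like, assuming
hypothetical names (the identifiers below are not the ones from v7).
Only the prefix byte values are architectural. The helper also assumes
that in long mode the CS, SS, DS and ES overrides are ignored while FS
and GS are still honored.

```c
/* Hypothetical names; only the prefix byte values are architectural. */
enum seg_reg_sketch {
	SEG_SKETCH_INVAL  = -1,		/* error: not a segment override */
	SEG_SKETCH_IGNORE = 0,		/* long mode: segment is ignored */
	SEG_SKETCH_ES = 0x26,
	SEG_SKETCH_CS = 0x2e,
	SEG_SKETCH_SS = 0x36,
	SEG_SKETCH_DS = 0x3e,
	SEG_SKETCH_FS = 0x64,
	SEG_SKETCH_GS = 0x65,
};

/* Map a prefix byte to an entry of the enumeration above. */
enum seg_reg_sketch seg_from_prefix_sketch(unsigned char prefix, int long_mode)
{
	switch (prefix) {
	case 0x26: case 0x2e: case 0x36: case 0x3e:
		/* ES, CS, SS, DS: no effect on addressing in long mode. */
		return long_mode ? SEG_SKETCH_IGNORE
				 : (enum seg_reg_sketch)prefix;
	case 0x64: case 0x65:
		/* FS and GS are used in every mode. */
		return (enum seg_reg_sketch)prefix;
	default:
		return SEG_SKETCH_INVAL;
	}
}
```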

Thanks and BR,
Ricardo

[1]. https://lkml.org/lkml/2017/5/5/405


* Re: [v6 PATCH 07/21] x86/insn-eval: Add utility function to get segment descriptor
  2017-05-04 11:02       ` Borislav Petkov
@ 2017-05-12  2:13         ` Ricardo Neri
  2017-05-15 17:27           ` Borislav Petkov
  0 siblings, 1 reply; 112+ messages in thread
From: Ricardo Neri @ 2017-05-12  2:13 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Thu, 2017-05-04 at 13:02 +0200, Borislav Petkov wrote:
> On Wed, Apr 26, 2017 at 02:51:56PM -0700, Ricardo Neri wrote:
> > > > +			     seg >= current->active_mm->context.ldt->size)) {
> > > 
> > > ldt->size is the size of the descriptor table but you've shifted seg by
> > > 3. That selector index is shifted by 3 (to the left) to form an offset
> > > into the descriptor table because the entries there are 8 bytes.
> > 
> > I double-checked the ldt code and it seems to me that size refers to the
> > number of entries in the table; it is always multiplied by
> > LDT_ENTRY_SIZE [1], [2]. Am I missing something?
> 
> No, you're not. I fell into that wrongly named struct member trap.
> 
> So ldt_struct.size should actually be called ldt_struct.n_entries or
> similar. Because what's in there now is not a "size".
> 
> And then code like
> 
> 	new_ldt->size * LDT_ENTRY_SIZE
> 
> would make much more sense if written like this:
> 
> 	new_ldt->n_entries * LDT_ENTRY_SIZE
> 
> Would you fix that in a prepatch pls?
> 

Sure I can. Would this trigger a v8 of my series? I was hoping v7 series
could be merged and then start doing incremental work on top of it. Does
it make sense?
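
For reference, the arithmetic from the quoted discussion can be
sketched as follows, assuming the count of entries is stored under the
suggested name n_entries. The struct, macro and function names are
hypothetical. The selector is shifted right by 3 to get the entry
index, the index is checked against the number of entries, and the
byte offset is the index times the 8-byte entry size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LDT_ENTRY_SIZE_SKETCH 8	/* each descriptor is 8 bytes */

/* Hypothetical stand-in: the field is named for what it holds. */
struct ldt_sketch {
	unsigned int n_entries;	/* number of descriptors, not bytes */
};

/*
 * Bits 0-1 of a selector are the RPL and bit 2 is the table indicator;
 * the remaining bits are the index into the descriptor table.
 */
bool ldt_offset_sketch(const struct ldt_sketch *ldt, uint16_t sel,
		       size_t *offset)
{
	unsigned int idx = sel >> 3;

	/* Compare the index with the entry count, not a byte size. */
	if (idx >= ldt->n_entries)
		return false;

	*offset = (size_t)idx * LDT_ENTRY_SIZE_SKETCH;
	return true;
}
```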

Thanks and BR,
Ricardo


* Re: [v6 PATCH 07/21] x86/insn-eval: Add utility function to get segment descriptor
  2017-05-12  2:13         ` Ricardo Neri
@ 2017-05-15 17:27           ` Borislav Petkov
  0 siblings, 0 replies; 112+ messages in thread
From: Borislav Petkov @ 2017-05-15 17:27 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Peter Zijlstra, Andrew Morton, Brian Gerst, Chris Metcalf,
	Dave Hansen, Paolo Bonzini, Masami Hiramatsu, Huang Rui,
	Jiri Slaby, Jonathan Corbet, Michael S. Tsirkin, Paul Gortmaker,
	Vlastimil Babka, Chen Yucong, Alexandre Julliard, Stas Sergeev,
	Fenghua Yu, Ravi V. Shankar, Shuah Khan, linux-kernel, x86,
	linux-msdos, wine-devel, Adam Buchbinder, Colin Ian King,
	Lorenzo Stoakes, Qiaowei Ren, Arnaldo Carvalho de Melo,
	Adrian Hunter, Kees Cook, Thomas Garnier, Dmitry Vyukov

On Thu, May 11, 2017 at 07:13:57PM -0700, Ricardo Neri wrote:
> Sure I can. Would this trigger a v8 of my series? I was hoping v7 series
> could be merged and then start doing incremental work on top of it. Does
> it make sense?

I guess that's tip guys' call.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)


end of thread, other threads:[~2017-05-15 17:27 UTC | newest]

Thread overview: 112+ messages
2017-03-08  0:32 [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
2017-03-08  0:32 ` [v6 PATCH 01/21] x86/mpx: Use signed variables to compute effective addresses Ricardo Neri
2017-04-11 21:56   ` Borislav Petkov
2017-04-26  1:40     ` Ricardo Neri
2017-03-08  0:32 ` [v6 PATCH 02/21] x86/mpx: Do not use SIB index if index points to R/ESP Ricardo Neri
2017-04-11 11:31   ` Borislav Petkov
2017-04-26  1:39     ` Ricardo Neri
2017-03-08  0:32 ` [v6 PATCH 03/21] x86/mpx: Do not use R/EBP as base in the SIB byte with Mod = 0 Ricardo Neri
2017-04-11 22:08   ` Borislav Petkov
2017-04-26  2:04     ` Ricardo Neri
2017-04-26  8:05       ` Borislav Petkov
2017-04-27 22:49         ` Ricardo Neri
2017-03-08  0:32 ` [v6 PATCH 04/21] x86/mpx, x86/insn: Relocate insn util functions to a new insn-kernel Ricardo Neri
2017-04-12 10:03   ` Borislav Petkov
2017-04-26  2:05     ` Ricardo Neri
2017-03-08  0:32 ` [v6 PATCH 05/21] x86/insn-eval: Add utility functions to get register offsets Ricardo Neri
2017-04-12 16:28   ` Borislav Petkov
2017-04-26 18:13     ` Ricardo Neri
2017-04-28 10:40       ` Borislav Petkov
2017-03-08  0:32 ` [v6 PATCH 06/21] x86/insn-eval: Add utility functions to get segment selector Ricardo Neri
2017-04-18  9:42   ` Borislav Petkov
2017-04-26 20:44     ` Ricardo Neri
2017-04-26 20:47       ` Ricardo Neri
2017-04-30 17:15       ` Borislav Petkov
2017-05-05 18:31         ` Ricardo Neri
2017-03-08  0:32 ` [v6 PATCH 07/21] x86/insn-eval: Add utility function to get segment descriptor Ricardo Neri
2017-04-19 10:26   ` Borislav Petkov
2017-04-26 21:51     ` Ricardo Neri
2017-05-04 11:02       ` Borislav Petkov
2017-05-12  2:13         ` Ricardo Neri
2017-05-15 17:27           ` Borislav Petkov
2017-03-08  0:32 ` [v6 PATCH 08/21] x86/insn-eval: Add utility function to get segment descriptor base address Ricardo Neri
2017-04-20  8:25   ` Borislav Petkov
2017-04-26 22:37     ` Ricardo Neri
2017-05-05 17:19       ` Borislav Petkov
2017-05-12  2:09         ` Ricardo Neri
2017-04-26 22:52     ` Ricardo Neri
2017-05-05 17:28       ` Borislav Petkov
2017-05-12  2:06         ` Ricardo Neri
2017-03-08  0:32 ` [v6 PATCH 09/21] x86/insn-eval: Add functions to get default operand and address sizes Ricardo Neri
2017-04-20 13:06   ` Borislav Petkov
2017-04-27  1:07     ` Ricardo Neri
2017-03-08  0:32 ` [v6 PATCH 10/21] x86/insn-eval: Do not use R/EBP as base if mod in ModRM is zero Ricardo Neri
2017-04-21 10:52   ` Borislav Petkov
2017-04-27  1:29     ` Ricardo Neri
2017-05-07 17:20       ` Borislav Petkov
2017-05-12  1:57         ` Ricardo Neri
2017-03-08  0:32 ` [v6 PATCH 11/21] insn/eval: Incorporate segment base in address computation Ricardo Neri
2017-04-21 14:55   ` Borislav Petkov
2017-04-27  1:31     ` Ricardo Neri
2017-03-08  0:32 ` [v6 PATCH 12/21] x86/insn: Support both signed 32-bit and 64-bit effective addresses Ricardo Neri
2017-04-25 13:51   ` Borislav Petkov
2017-04-27  3:33     ` Ricardo Neri
2017-05-08 11:42       ` Borislav Petkov
2017-05-12  1:55         ` Ricardo Neri
2017-03-08  0:32 ` [v6 PATCH 13/21] x86/insn-eval: Add support to resolve 16-bit addressing encodings Ricardo Neri
2017-03-08  0:32 ` [v6 PATCH 14/21] x86/insn-eval: Add wrapper function for 16-bit and 32-bit address encodings Ricardo Neri
2017-03-08  0:32 ` [v6 PATCH 15/21] x86/mm: Relocate page fault error codes to traps.h Ricardo Neri
2017-03-08 16:08   ` Andy Lutomirski
2017-03-08  0:32 ` [v6 PATCH 16/21] x86/cpufeature: Add User-Mode Instruction Prevention definitions Ricardo Neri
2017-03-08  0:32 ` [v6 PATCH 17/21] x86: Add emulation code for UMIP instructions Ricardo Neri
2017-03-08  0:32 ` [v6 PATCH 18/21] x86/umip: Force a page fault when unable to copy emulated result to user Ricardo Neri
2017-03-08  0:32 ` [v6 PATCH 19/21] x86/traps: Fixup general protection faults caused by UMIP Ricardo Neri
2017-03-08 15:54   ` Andy Lutomirski
2017-03-08  0:32 ` [v6 PATCH 20/21] x86: Enable User-Mode Instruction Prevention Ricardo Neri
2017-03-08  0:32 ` [v6 PATCH 21/21] selftests/x86: Add tests for " Ricardo Neri
2017-03-08 15:56   ` Andy Lutomirski
2017-03-10 23:38     ` Ricardo Neri
2017-03-08 14:08 ` [v6 PATCH 00/21] x86: Enable " Stas Sergeev
2017-03-08 16:06   ` Andy Lutomirski
2017-03-08 16:29     ` Stas Sergeev
2017-03-08 16:46       ` Andy Lutomirski
2017-03-08 16:53         ` Stas Sergeev
2017-03-09  1:11           ` Ricardo Neri
2017-03-09 22:05             ` Stas Sergeev
2017-03-10  2:41             ` Andy Lutomirski
2017-03-10 10:30               ` Stas Sergeev
2017-03-10 21:04                 ` Andy Lutomirski
2017-03-10 21:37                   ` Stas Sergeev
2017-03-09  1:15         ` Ricardo Neri
2017-03-09 22:10           ` Stas Sergeev
2017-03-10  2:39             ` Andy Lutomirski
2017-03-10 11:33               ` Stas Sergeev
2017-03-10 14:17                 ` Andy Lutomirski
2017-03-11  1:22                   ` Ricardo Neri
2017-03-10 23:59                 ` Ricardo Neri
2017-03-13 21:25                   ` Stas Sergeev
2017-03-27 23:46                     ` Ricardo Neri
2017-03-28  9:38                       ` Stas Sergeev
2017-03-29  4:38                         ` Ricardo Neri
2017-03-29 20:55                           ` Stas Sergeev
2017-03-30  5:14                             ` Ricardo Neri
2017-03-30 10:10                               ` Stas Sergeev
2017-03-31  1:33                                 ` Ricardo Neri
2017-03-31 14:11                                   ` Alexandre Julliard
2017-03-31 21:26                                     ` Stas Sergeev
2017-04-01  2:18                                       ` Andy Lutomirski
2017-04-04  2:02                                     ` Ricardo Neri
2017-04-04  6:08                                       ` Alexandre Julliard
2017-04-01 13:08                               ` Stas Sergeev
2017-04-01 17:49                                 ` H. Peter Anvin
2017-04-02 15:52                                   ` Andy Lutomirski
2017-04-04  9:59                                   ` Stas Sergeev
2017-04-04  2:05                                 ` Ricardo Neri
2017-04-04  8:03                                   ` Stas Sergeev
2017-03-10 23:58               ` Ricardo Neri
2017-03-09  0:46   ` Ricardo Neri
2017-03-09 22:01     ` Stas Sergeev
2017-03-10 23:47       ` Ricardo Neri
2017-03-10 23:58         ` Stas Sergeev
2017-03-11  0:13           ` Ricardo Neri
2017-03-08 16:07 ` Andy Lutomirski
