From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752477AbdFQBee (ORCPT ); Fri, 16 Jun 2017 21:34:34 -0400 Received: from mga14.intel.com ([192.55.52.115]:50734 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752234AbdFQBec (ORCPT ); Fri, 16 Jun 2017 21:34:32 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.39,349,1493708400"; d="scan'208";a="981891097" Message-ID: <1497663269.133434.27.camel@ranerica-desktop> Subject: Re: [PATCH v7 21/26] x86: Add emulation code for UMIP instructions From: Ricardo Neri To: Borislav Petkov Cc: Ingo Molnar , Thomas Gleixner , "H. Peter Anvin" , Andy Lutomirski , Peter Zijlstra , Andrew Morton , Brian Gerst , Chris Metcalf , Dave Hansen , Paolo Bonzini , Masami Hiramatsu , Huang Rui , Jiri Slaby , Jonathan Corbet , "Michael S. Tsirkin" , Paul Gortmaker , Vlastimil Babka , Chen Yucong , Alexandre Julliard , Stas Sergeev , Fenghua Yu , "Ravi V. Shankar" , Shuah Khan , linux-kernel@vger.kernel.org, x86@kernel.org, linux-msdos@vger.kernel.org, wine-devel@winehq.org, Tony Luck Date: Fri, 16 Jun 2017 18:34:29 -0700 In-Reply-To: <20170608183819.c766klw2mxs7loo5@pd.tnic> References: <20170505181724.55000-1-ricardo.neri-calderon@linux.intel.com> <20170505181724.55000-22-ricardo.neri-calderon@linux.intel.com> <20170608183819.c766klw2mxs7loo5@pd.tnic> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.10.4-0ubuntu2 Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 2017-06-08 at 20:38 +0200, Borislav Petkov wrote: > On Fri, May 05, 2017 at 11:17:19AM -0700, Ricardo Neri wrote: > > The feature User-Mode Instruction Prevention present in recent Intel > > processor prevents a group of instructions from being executed with > > CPL > 0. Otherwise, a general protection fault is issued. > > This is one of the best opening paragraphs of a commit message I've > read this year! This is how you open: short, succinct, to the point, no > marketing bullshit. Good! Thanks you! > > > Rather than relaying this fault to the user space (in the form of a SIGSEGV > > signal), the instructions protected by UMIP can be emulated to provide > > dummy results. This allows to conserve the current kernel behavior and not > > reveal the system resources that UMIP intends to protect (the global > > descriptor and interrupt descriptor tables, the segment selectors of the > > local descriptor table and the task state and the machine status word). > > > > This emulation is needed because certain applications (e.g., WineHQ and > > DOSEMU2) rely on this subset of instructions to function. > > > > The instructions protected by UMIP can be split in two groups. Those who > > s/who/which/ I will correct. > > > return a kernel memory address (sgdt and sidt) and those who return a > > ditto. I will correct here also. > > > value (sldt, str and smsw). > > > > For the instructions that return a kernel memory address, applications > > such as WineHQ rely on the result being located in the kernel memory space. > > The result is emulated as a hard-coded value that, lies close to the top > > of the kernel memory. The limit for the GDT and the IDT are set to zero. > > Nice. > > > Given that sldt and str are not used in common in programs supported by > > You wanna say "in common programs" here? Or "not commonly used in programs" ? I will rephrase this comment. > > > WineHQ and DOSEMU2, they are not emulated. > > > > The instruction smsw is emulated to return the value that the register CR0 > > has at boot time as set in the head_32. > > > > Care is taken to appropriately emulate the results when segmentation is > > used. This is, rather than relying on USER_DS and USER_CS, the function > > "That is,... " I will correct it. > > > insn_get_addr_ref() inspects the segment descriptor pointed by the > > registers in pt_regs. This ensures that we correctly obtain the segment > > base address and the address and operand sizes even if the user space > > application uses local descriptor table. > > Btw, I could very well use all that nice explanation in umip.c too so > that the high-level behavior is documented. Sure, I will include a high-level description in the file itself. > > > Cc: Andy Lutomirski > > Cc: Andrew Morton > > Cc: H. Peter Anvin > > Cc: Borislav Petkov > > Cc: Brian Gerst > > Cc: Chen Yucong > > Cc: Chris Metcalf > > Cc: Dave Hansen > > Cc: Fenghua Yu > > Cc: Huang Rui > > Cc: Jiri Slaby > > Cc: Jonathan Corbet > > Cc: Michael S. Tsirkin > > Cc: Paul Gortmaker > > Cc: Peter Zijlstra > > Cc: Ravi V. Shankar > > Cc: Shuah Khan > > Cc: Vlastimil Babka > > Cc: Tony Luck > > Cc: Paolo Bonzini > > Cc: Liang Z. Li > > Cc: Alexandre Julliard > > Cc: Stas Sergeev > > Cc: x86@kernel.org > > Cc: linux-msdos@vger.kernel.org > > Signed-off-by: Ricardo Neri > > --- > > arch/x86/include/asm/umip.h | 15 +++ > > arch/x86/kernel/Makefile | 1 + > > arch/x86/kernel/umip.c | 245 ++++++++++++++++++++++++++++++++++++++++++++ > > 3 files changed, 261 insertions(+) > > create mode 100644 arch/x86/include/asm/umip.h > > create mode 100644 arch/x86/kernel/umip.c > > > > diff --git a/arch/x86/include/asm/umip.h b/arch/x86/include/asm/umip.h > > new file mode 100644 > > index 0000000..077b236 > > --- /dev/null > > +++ b/arch/x86/include/asm/umip.h > > @@ -0,0 +1,15 @@ > > +#ifndef _ASM_X86_UMIP_H > > +#define _ASM_X86_UMIP_H > > + > > +#include > > +#include > > + > > +#ifdef CONFIG_X86_INTEL_UMIP > > +bool fixup_umip_exception(struct pt_regs *regs); > > +#else > > +static inline bool fixup_umip_exception(struct pt_regs *regs) > > +{ > > + return false; > > +} > > Let's save some header lines: > > static inline bool fixup_umip_exception(struct pt_regs *regs) { return false; } > > those trunks take too much space as it is. I will correct. > > > +#endif /* CONFIG_X86_INTEL_UMIP */ > > +#endif /* _ASM_X86_UMIP_H */ > > diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile > > index 4b99423..cc1b7cc 100644 > > --- a/arch/x86/kernel/Makefile > > +++ b/arch/x86/kernel/Makefile > > @@ -123,6 +123,7 @@ obj-$(CONFIG_EFI) += sysfb_efi.o > > obj-$(CONFIG_PERF_EVENTS) += perf_regs.o > > obj-$(CONFIG_TRACING) += tracepoint.o > > obj-$(CONFIG_SCHED_MC_PRIO) += itmt.o > > +obj-$(CONFIG_X86_INTEL_UMIP) += umip.o > > > > ifdef CONFIG_FRAME_POINTER > > obj-y += unwind_frame.o > > diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c > > new file mode 100644 > > index 0000000..c7c5795 > > --- /dev/null > > +++ b/arch/x86/kernel/umip.c > > @@ -0,0 +1,245 @@ > > +/* > > + * umip.c Emulation for instruction protected by the Intel User-Mode > > + * Instruction Prevention. The instructions are: > > + * sgdt > > + * sldt > > + * sidt > > + * str > > + * smsw > > + * > > + * Copyright (c) 2017, Intel Corporation. > > + * Ricardo Neri > > + */ > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +/* > > + * == Base addresses of GDT and IDT > > + * Some applications to function rely finding the global descriptor table (GDT) > > That formulation reads funny. I will correct. > > > + * and the interrupt descriptor table (IDT) in kernel memory. > > + * For x86_32, the selected values do not match any particular hole, but it > > + * suffices to provide a memory location within kernel memory. > > + * > > + * == CRO flags for SMSW > > + * Use the flags given when booting, as found in head_32.S > > + */ > > + > > +#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | X86_CR0_NE | \ > > + X86_CR0_WP | X86_CR0_AM) > > Why not pull those up in asm/processor-flags.h or so and share the > definition instead of duplicating it? Sure, I will relocate this definition. > > > +#define UMIP_DUMMY_GDT_BASE 0xfffe0000 > > +#define UMIP_DUMMY_IDT_BASE 0xffff0000 > > + > > +enum umip_insn { > > + UMIP_SGDT = 0, /* opcode 0f 01 ModR/M reg 0 */ > > + UMIP_SIDT, /* opcode 0f 01 ModR/M reg 1 */ > > + UMIP_SLDT, /* opcode 0f 00 ModR/M reg 0 */ > > + UMIP_SMSW, /* opcode 0f 01 ModR/M reg 4 */ > > + UMIP_STR, /* opcode 0f 00 ModR/M reg 1 */ > > Let's stick to a single spelling: ModRM.reg=0, etc. > > Better yet, use the SDM format: > > UMIP_SGDT = 0, /* 0F 01 /0 */ > UMIP_SIDT, /* 0F 01 /1 */ > ... > I will update accordingly. > > +}; > > + > > +/** > > + * __identify_insn() - Identify a UMIP-protected instruction > > + * @insn: Instruction structure with opcode and ModRM byte. > > + * > > + * From the instruction opcode and the reg part of the ModRM byte, identify, > > + * if any, a UMIP-protected instruction. > > + * > > + * Return: an enumeration of a UMIP-protected instruction; -EINVAL on failure. > > + */ > > +static int __identify_insn(struct insn *insn) > > static enum umip_insn __identify_insn(... > > But frankly, that enum looks pointless to me - it is used locally only > and you can just as well use plain ints. I will change to plain ints. > > > +{ > > + /* By getting modrm we also get the opcode. */ > > + insn_get_modrm(insn); > > + > > + /* All the instructions of interest start with 0x0f. */ > > + if (insn->opcode.bytes[0] != 0xf) > > + return -EINVAL; > > + > > + if (insn->opcode.bytes[1] == 0x1) { > > + switch (X86_MODRM_REG(insn->modrm.value)) { > > + case 0: > > + return UMIP_SGDT; > > + case 1: > > + return UMIP_SIDT; > > + case 4: > > + return UMIP_SMSW; > > + default: > > + return -EINVAL; > > + } > > + } > > + /* SLDT AND STR are not emulated */ > > + return -EINVAL; > > +} > > + > > +/** > > + * __emulate_umip_insn() - Emulate UMIP instructions with dummy values > > + * @insn: Instruction structure with ModRM byte > > + * @umip_inst: Instruction to emulate > > + * @data: Buffer onto which the dummy values will be copied > > + * @data_size: Size of the emulated result > > + * > > + * Emulate an instruction protected by UMIP. The result of the emulation > > + * is saved in the provided buffer. The size of the results depends on both > > + * the instruction and type of operand (register vs memory address). Thus, > > + * the size of the result needs to be updated. > > + * > > + * Result: 0 if success, -EINVAL on failure to emulate > > + */ > > +static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst, > > + unsigned char *data, int *data_size) > > +{ > > + unsigned long dummy_base_addr; > > + unsigned short dummy_limit = 0; > > + unsigned int dummy_value = 0; > > + > > + switch (umip_inst) { > > + /* > > + * These two instructions return the base address and limit of the > > + * global and interrupt descriptor table. The base address can be > > + * 24-bit, 32-bit or 64-bit. Limit is always 16-bit. If the operand > > + * size is 16-bit the returned value of the base address is supposed > > + * to be a zero-extended 24-byte number. However, it seems that a > > + * 32-byte number is always returned in legacy protected mode > > + * irrespective of the operand size. > > + */ > > + case UMIP_SGDT: > > + /* fall through */ > > + case UMIP_SIDT: > > + if (umip_inst == UMIP_SGDT) > > + dummy_base_addr = UMIP_DUMMY_GDT_BASE; > > + else > > + dummy_base_addr = UMIP_DUMMY_IDT_BASE; > > + if (X86_MODRM_MOD(insn->modrm.value) == 3) { > > + /* SGDT and SIDT do not take register as argument. */ > > Comment above the if. I will correct. > > > + return -EINVAL; > > + } > > So that check needs to go first, then the dummy_base_addr assignment. I will rearrange. > > > + > > + memcpy(data + 2, &dummy_base_addr, sizeof(dummy_base_addr)); > > + memcpy(data, &dummy_limit, sizeof(dummy_limit)); > > + *data_size = sizeof(dummy_base_addr) + sizeof(dummy_limit); > > Huh, that value will always be the same - why do you have a specific > variable? It could be a define, once for 32-bit and once for 64-bit. Sure. I will use #define's. > > + break; > > + case UMIP_SMSW: > > + /* > > + * Even though CR0_STATE contain 4 bytes, the number > > + * of bytes to be copied in the result buffer is determined > > + * by whether the operand is a register or a memory location. > > + */ > > + dummy_value = CR0_STATE; > > Something's wrong here: how does that local, write-only variable have > any effect? Ah yes, initially SMSW, SLDT and STR were handled equally. Since I removed support for the last two, I inadvertently removed the code that copies the result of SMSW. I will re-add it. > > > + /* > > + * These two instructions return a 16-bit value. We return > > + * all zeros. This is equivalent to a null descriptor for > > + * str and sldt. > > + */ > > + /* SLDT and STR are not emulated */ > > + /* fall through */ > > + case UMIP_SLDT: > > + /* fall through */ > > + case UMIP_STR: > > + /* fall through */ > > + default: > > + return -EINVAL; > > That switch-case has a majority of fall-throughs. So make it an if-else > instead. Sure, I will update. > > > + } > > + return 0; > > +} > > + > > +/** > > + * fixup_umip_exception() - Fixup #GP faults caused by UMIP > > + * @regs: Registers as saved when entering the #GP trap > > + * > > + * The instructions sgdt, sidt, str, smsw, sldt cause a general protection > > + * fault if with CPL > 0 (i.e., from user space). This function can be > > + * used to emulate the results of the aforementioned instructions with > > + * dummy values. Results are copied to user-space memory as indicated by > > + * the instruction pointed by EIP using the registers indicated in the > > + * instruction operands. This function also takes care of determining > > + * the address to which the results must be copied. > > + */ > > +bool fixup_umip_exception(struct pt_regs *regs) > > +{ > > + struct insn insn; > > + unsigned char buf[MAX_INSN_SIZE]; > > + /* 10 bytes is the maximum size of the result of UMIP instructions */ > > + unsigned char dummy_data[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}; > > unsigned char dummy_data[10] = { 0 }; > > One 0 should be enough :) Right. I will update. > > > + unsigned long seg_base; > > + int not_copied, nr_copied, reg_offset, dummy_data_size; > > + void __user *uaddr; > > + unsigned long *reg_addr; > > + enum umip_insn umip_inst; > > + struct insn_code_seg_defaults seg_defs; > > Please sort function local variables declaration in a reverse christmas > tree order: > > longest_variable_name; > shorter_var_name; > even_shorter; > i; > I will rearrange my variables. > > + > > + /* > > + * Use the segment base in case user space used a different code > > + * segment, either in protected (e.g., from an LDT) or virtual-8086 > > + * modes. In most of the cases seg_base will be zero as in USER_CS. > > + */ > > + seg_base = insn_get_seg_base(regs, &insn, > > + offsetof(struct pt_regs, ip)); > > Oh boy, where's the error handling?! That can return -1L. > > > + not_copied = copy_from_user(buf, (void __user *)(seg_base + regs->ip), > > -1L + regs->ip is then your pwnage. I will add the error handling code. > > > + sizeof(buf)); > > Just let them stick out. Sure. > > > + nr_copied = sizeof(buf) - not_copied; > > <---- newline here. I will add the new line. > > > + /* > > + * The copy_from_user above could have failed if user code is protected > () > > > + * by a memory protection key. Give up on emulation in such a case. > > + * Should we issue a page fault? > > Why? AFAICT, you're in the #GP handler. Simply you return unhandled. If I returned unhandled, a SIGSEGV will be sent to the user space application but siginfo will look like a #GP. However, memory protection keys cause page faults and siginfo is filled differently. > > > + */ > > + if (!nr_copied) > > + return false; > > + > > + insn_init(&insn, buf, nr_copied, user_64bit_mode(regs)); > > + > > + /* > > + * Override the default operand and address sizes to what is specified > > + * in the code segment descriptor. The instruction decoder only sets > > + * the address size it to either 4 or 8 address bytes and does nothing > > + * for the operand bytes. This OK for most of the cases, but we could > > + * have special cases where, for instance, a 16-bit code segment > > + * descriptor is used. > > + * If there are overrides, the instruction decoder correctly updates > > + * these values, even for 16-bit defaults. > > + */ > > + seg_defs = insn_get_code_seg_defaults(regs); > > + insn.addr_bytes = seg_defs.address_bytes; > > + insn.opnd_bytes = seg_defs.operand_bytes; > > + > > + if (!insn.addr_bytes || !insn.opnd_bytes) > > + return false; > > + > > + if (user_64bit_mode(regs)) > > + return false; > > + > > + insn_get_length(&insn); > > + if (nr_copied < insn.length) > > + return false; > > + > > + umip_inst = __identify_insn(&insn); > > + /* Check if we found an instruction protected by UMIP */ > > Put comment above the function call. Will do. > > > + if (umip_inst < 0) > > + return false; > > + > > + if (__emulate_umip_insn(&insn, umip_inst, dummy_data, &dummy_data_size)) > > + return false; > > + > > + /* If operand is a register, write directly to it */ > > + if (X86_MODRM_MOD(insn.modrm.value) == 3) { > > + reg_offset = insn_get_modrm_rm_off(&insn, regs); > > Grr, error handling!! That reg_offset can be -E. I will add the error handling code. > > > + reg_addr = (unsigned long *)((unsigned long)regs + reg_offset); > > + memcpy(reg_addr, dummy_data, dummy_data_size); > > + } else { > > + uaddr = insn_get_addr_ref(&insn, regs); > > + /* user address could not be determined, abort emulation */ > > That comment is kinda obvious. But yes, this has error handling. OK, I will remove this comment. Many thanks for your detailed review! BR, Ricardo