All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pavel Tatashin <pasha.tatashin@oracle.com>
To: steven.sistare@oracle.com, daniel.m.jordan@oracle.com,
	linux-kernel@vger.kernel.org, Alexander.Levin@microsoft.com,
	dan.j.williams@intel.com, sathyanarayanan.kuppuswamy@intel.com,
	pankaj.laxminarayan.bharadiya@intel.com, akuster@mvista.com,
	cminyard@mvista.com, pasha.tatashin@oracle.com,
	gregkh@linuxfoundation.org, stable@vger.kernel.org
Subject: [PATCH 4.1 46/65] kaiser: paranoid_entry pass cr3 need to paranoid_exit
Date: Mon,  5 Mar 2018 19:25:19 -0500	[thread overview]
Message-ID: <20180306002538.1761-47-pasha.tatashin@oracle.com> (raw)
In-Reply-To: <20180306002538.1761-1-pasha.tatashin@oracle.com>

From: Hugh Dickins <hughd@google.com>

Neel Natu points out that paranoid_entry() was wrong to assume that
an entry that did not need swapgs would not need SWITCH_KERNEL_CR3:
paranoid_entry (used for debug breakpoint, int3, double fault or MCE;
though I think it's only the MCE case that is cause for concern here)
can break in at an awkward time, between cr3 switch and swapgs, but
its handling always needs kernel gs and kernel cr3.

Easy to fix in itself, but paranoid_entry() also needs to convey to
paranoid_exit() (and my reading of macro idtentry says paranoid_entry
and paranoid_exit are always paired) how to restore the prior state.
The swapgs state is already conveyed by %ebx (0 or 1), so extend that
also to convey when SWITCH_USER_CR3 will be needed (2 or 3).

(Yes, I'd much prefer that 0 meant no swapgs, whereas it's the other
way round: and a convention shared with error_entry() and error_exit(),
which I don't want to touch.  Perhaps I should have inverted the bit
for switch cr3 too, but did not.)

paranoid_exit() would be straightforward, except for TRACE_IRQS: it
did TRACE_IRQS_IRETQ when doing swapgs, but TRACE_IRQS_IRETQ_DEBUG
when not: which is it supposed to use when SWITCH_USER_CR3 is split
apart from that?  As best as I can determine, commit 5963e317b1e9
("ftrace/x86: Do not change stacks in DEBUG when calling lockdep")
missed the swapgs case, and should have used TRACE_IRQS_IRETQ_DEBUG
there too (the discrepancy has nothing to do with the liberal use
of _NO_STACK and _UNSAFE_STACK hereabouts: TRACE_IRQS_OFF_DEBUG has
just been used in all cases); discrepancy lovingly preserved across
several paranoid_exit() cleanups, but I'm now removing it.

Neel further indicates that to use SWITCH_USER_CR3_NO_STACK there in
paranoid_exit() is now not only unnecessary but unsafe: might corrupt
syscall entry's unsafe_stack_register_backup of %rax.  Just use
SWITCH_USER_CR3: and delete SWITCH_USER_CR3_NO_STACK altogether,
before we make the mistake of using it again.

hughd adds: this commit fixes an issue in the Kaiser-without-PCIDs
part of the series, and ought to be moved earlier, if you decided
to make a release of Kaiser-without-PCIDs.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit fc8334e6b3e5d28afd4eec8a74493933f73b2784)
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>

Conflicts:
	arch/x86/entry/entry_64.S (not in this tree)
	arch/x86/kernel/entry_64.S (patched instead of that)
---
 arch/x86/include/asm/kaiser.h |  8 --------
 arch/x86/kernel/entry_64.S    | 46 +++++++++++++++++++++++++++++++++----------
 2 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/kaiser.h b/arch/x86/include/asm/kaiser.h
index 48d8d70dd8c7..3dc5f4c39b3e 100644
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -63,20 +63,12 @@ _SWITCH_TO_KERNEL_CR3 %rax
 movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
 .endm
 
-.macro SWITCH_USER_CR3_NO_STACK
-movq %rax, PER_CPU_VAR(unsafe_stack_register_backup)
-_SWITCH_TO_USER_CR3 %rax %al
-movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
-.endm
-
 #else /* CONFIG_KAISER */
 
 .macro SWITCH_KERNEL_CR3 reg
 .endm
 .macro SWITCH_USER_CR3 reg regb
 .endm
-.macro SWITCH_USER_CR3_NO_STACK
-.endm
 .macro SWITCH_KERNEL_CR3_NO_STACK
 .endm
 
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 0508420c6cde..3973c43903f5 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1300,7 +1300,11 @@ idtentry machine_check has_error_code=0 paranoid=1 do_sym=*machine_check_vector(
 /*
  * Save all registers in pt_regs, and switch gs if needed.
  * Use slow, but surefire "are we in kernel?" check.
- * Return: ebx=0: need swapgs on exit, ebx=1: otherwise
+ *
+ * Return: ebx=0: needs swapgs but not SWITCH_USER_CR3 in paranoid_exit
+ *         ebx=1: needs neither swapgs nor SWITCH_USER_CR3 in paranoid_exit
+ *         ebx=2: needs both swapgs and SWITCH_USER_CR3 in paranoid_exit
+ *         ebx=3: needs SWITCH_USER_CR3 but not swapgs in paranoid_exit
  */
 ENTRY(paranoid_entry)
 	XCPT_FRAME 1 15*8
@@ -1313,9 +1317,26 @@ ENTRY(paranoid_entry)
 	testl %edx,%edx
 	js 1f	/* negative -> in kernel */
 	SWAPGS
-	SWITCH_KERNEL_CR3
 	xorl %ebx,%ebx
-1:	ret
+1:
+#ifdef CONFIG_KAISER
+	/*
+	 * We might have come in between a swapgs and a SWITCH_KERNEL_CR3
+	 * on entry, or between a SWITCH_USER_CR3 and a swapgs on exit.
+	 * Do a conditional SWITCH_KERNEL_CR3: this could safely be done
+	 * unconditionally, but we need to find out whether the reverse
+	 * should be done on return (conveyed to paranoid_exit in %ebx).
+	 */
+	movq	%cr3, %rax
+	testl	$KAISER_SHADOW_PGD_OFFSET, %eax
+	jz	2f
+	orl	$2, %ebx
+	andq	$(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
+	orq	x86_cr3_pcid_noflush, %rax
+	movq	%rax, %cr3
+2:
+#endif
+	ret
 	CFI_ENDPROC
 END(paranoid_entry)
 
@@ -1328,21 +1349,26 @@ END(paranoid_entry)
  * in syscall entry), so checking for preemption here would
  * be complicated.  Fortunately, we there's no good reason
  * to try to handle preemption here.
+ * On entry: ebx=0: needs swapgs but not SWITCH_USER_CR3
+ *           ebx=1: needs neither swapgs nor SWITCH_USER_CR3
+ *           ebx=2: needs both swapgs and SWITCH_USER_CR3
+ *           ebx=3: needs SWITCH_USER_CR3 but not swapgs
  */
-/* On entry, ebx is "no swapgs" flag (1: don't need swapgs, 0: need it) */
 ENTRY(paranoid_exit)
 	DEFAULT_FRAME
 	DISABLE_INTERRUPTS(CLBR_NONE)
 	TRACE_IRQS_OFF_DEBUG
-	testl %ebx,%ebx				/* swapgs needed? */
+	TRACE_IRQS_IRETQ_DEBUG
+#ifdef CONFIG_KAISER
+	testl   $2, %ebx			/* SWITCH_USER_CR3 needed? */
+	jz      paranoid_exit_no_switch
+	SWITCH_USER_CR3
+paranoid_exit_no_switch:
+#endif
+	testl	$1, %ebx			/* swapgs needed? */
 	jnz paranoid_exit_no_swapgs
-	TRACE_IRQS_IRETQ
-	SWITCH_USER_CR3_NO_STACK
 	SWAPGS_UNSAFE_STACK
-	jmp paranoid_exit_restore
 paranoid_exit_no_swapgs:
-	TRACE_IRQS_IRETQ_DEBUG
-paranoid_exit_restore:
 	RESTORE_EXTRA_REGS
 	RESTORE_C_REGS
 	REMOVE_PT_GPREGS_FROM_STACK 8
-- 
2.16.2

  parent reply	other threads:[~2018-03-06  0:25 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-06  0:24 [PATCH 4.1 00/65] page table isolation for stable 4.1 Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 01/65] x86/mm: Add INVPCID helpers Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 02/65] x86/mm: Fix INVPCID asm constraint Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 03/65] x86/mm: Add a 'noinvpcid' boot option to turn off INVPCID Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 04/65] x86/mm: If INVPCID is available, use it to flush global mappings Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 05/65] mm/mmu_context, sched/core: Fix mmu_context.h assumption Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 06/65] sched/core: Add switch_mm_irqs_off() and use it in the scheduler Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 07/65] x86/mm: Build arch/x86/mm/tlb.c even on !SMP Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 08/65] x86/mm, sched/core: Uninline switch_mm() Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 09/65] x86/mm, sched/core: Turn off IRQs in switch_mm() Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 10/65] ARM: Hide finish_arch_post_lock_switch() from modules Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 11/65] sched/core: Idle_task_exit() shouldn't use switch_mm_irqs_off() Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 12/65] x86/irq: Do not substract irq_tlb_count from irq_call_count Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 13/65] x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly() Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 14/65] x86/mm: Remove flush_tlb() and flush_tlb_current_task() Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 15/65] x86/mm: Make flush_tlb_mm_range() more predictable Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 16/65] x86/mm: Reimplement flush_tlb_page() using flush_tlb_mm_range() Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 17/65] x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 18/65] x86/mm: Disable PCID on 32-bit kernels Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 19/65] x86/mm: Add the 'nopcid' boot option to turn off PCID Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 20/65] x86/mm: Enable CR4.PCIDE on supported systems Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 21/65] x86/mm/64: Fix reboot interaction with CR4.PCIDE Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 22/65] x86/boot: Add early cmdline parsing for options with arguments Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 23/65] x86/entry: Stop using PER_CPU_VAR(kernel_stack) Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 24/65] x86/entry: Remove unused 'kernel_stack' per-cpu variable Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 25/65] x86/entry: Define 'cpu_current_top_of_stack' for 64-bit code Pavel Tatashin
2018-03-06  0:24 ` [PATCH 4.1 26/65] KAISER: Kernel Address Isolation Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 27/65] kaiser: merged update Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 28/65] kaiser: do not set _PAGE_NX on pgd_none Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 29/65] kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 30/65] kaiser: fix build and FIXME in alloc_ldt_struct() Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 31/65] kaiser: KAISER depends on SMP Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 32/65] kaiser: fix regs to do_nmi() ifndef CONFIG_KAISER Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 33/65] kaiser: fix perf crashes Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 34/65] kaiser: ENOMEM if kaiser_pagetable_walk() NULL Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 35/65] kaiser: tidied up asm/kaiser.h somewhat Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 36/65] kaiser: tidied up kaiser_add/remove_mapping slightly Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 37/65] kaiser: kaiser_remove_mapping() move along the pgd Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 38/65] kaiser: cleanups while trying for gold link Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 39/65] kaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 40/65] kaiser: delete KAISER_REAL_SWITCH option Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 41/65] kaiser: vmstat show NR_KAISERTABLE as nr_overhead Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 42/65] kaiser: enhanced by kernel and user PCIDs Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 43/65] kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 44/65] kaiser: PCID 0 for kernel and 128 for user Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 45/65] kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user Pavel Tatashin
2018-03-06  0:25 ` Pavel Tatashin [this message]
2018-03-06  0:25 ` [PATCH 4.1 47/65] kaiser: _pgd_alloc() without __GFP_REPEAT to avoid stalls Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 48/65] kaiser: fix unlikely error in alloc_ldt_struct() Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 49/65] kaiser: add "nokaiser" boot option, using ALTERNATIVE Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 50/65] " Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 51/65] x86/kaiser: Rename and simplify X86_FEATURE_KAISER handling Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 52/65] x86/kaiser: Check boottime cmdline params Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 53/65] kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 54/65] kaiser: drop is_atomic arg to kaiser_pagetable_walk() Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 55/65] kaiser: asm/tlbflush.h handle noPGE at lower level Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 56/65] kaiser: kaiser_flush_tlb_on_return_to_user() check PCID Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 57/65] x86/paravirt: Dont patch flush_tlb_single Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 58/65] x86/kaiser: Reenable PARAVIRT Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 59/65] x86/kaiser: Move feature detection up Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 60/65] KPTI: Rename to PAGE_TABLE_ISOLATION Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 61/65] x86/ldt: fix crash in ldt freeing Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 62/65] PTI: unbreak EFI old_memmap Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 63/65] kpti: Disable when running under Xen PV Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 64/65] pti: Rename X86_FEATURE_KAISER to X86_FEATURE_PTI Pavel Tatashin
2018-03-06  0:25 ` [PATCH 4.1 65/65] x86/pti/efi: broken conversion from efi to kernel page table Pavel Tatashin
2018-03-07 12:55 ` [PATCH 4.1 00/65] page table isolation for stable 4.1 Jiri Kosina

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180306002538.1761-47-pasha.tatashin@oracle.com \
    --to=pasha.tatashin@oracle.com \
    --cc=Alexander.Levin@microsoft.com \
    --cc=akuster@mvista.com \
    --cc=cminyard@mvista.com \
    --cc=dan.j.williams@intel.com \
    --cc=daniel.m.jordan@oracle.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pankaj.laxminarayan.bharadiya@intel.com \
    --cc=sathyanarayanan.kuppuswamy@intel.com \
    --cc=stable@vger.kernel.org \
    --cc=steven.sistare@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.