From: Borislav Petkov <bp@alien8.de>
To: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
"H. Peter Anvin" <hpa@zytor.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Ingo Molnar <mingo@kernel.org>,
Alexander van Heukelum <heukelum@fastmail.fm>,
Andy Lutomirski <amluto@gmail.com>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
Boris Ostrovsky <boris.ostrovsky@oracle.com>,
Arjan van de Ven <arjan.van.de.ven@intel.com>,
Brian Gerst <brgerst@gmail.com>,
Alexandre Julliard <julliard@winehq.com>,
Andi Kleen <andi@firstfloor.org>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*
Date: Tue, 22 Apr 2014 13:25:50 +0200
Message-ID: <20140422112550.GC15882@pd.tnic>
In-Reply-To: <1398120472-6190-1-git-send-email-hpa@linux.intel.com>
Just nitpicks below:
On Mon, Apr 21, 2014 at 03:47:52PM -0700, H. Peter Anvin wrote:
> This is a prototype of espfix for the 64-bit kernel. espfix is a
> workaround for the architectural definition of IRET, which fails to
> restore bits [31:16] of %esp when returning to a 16-bit stack
> segment. We have a workaround for the 32-bit kernel, but that
> implementation doesn't work for 64 bits.
>
> The 64-bit implementation works like this:
>
> Set up a ministack for each CPU, which is then mapped 65536 times
> using the page tables. This implementation uses the second-to-last
> PGD slot for this; with a 64-byte espfix stack this is sufficient for
> 2^18 CPUs (currently we support a max of 2^13 CPUs.)
I wish we'd put this description in a comment in the code instead of
only in the commit message, as commit messages tend to get buried in git
history over time.
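To make the aliasing trick in the description concrete, here is a minimal C sketch of the address computation done by the movl/xorw/orq sequence in the quoted irq_return_ldt hunk. The helper names, the example addresses, and the precondition that the per-CPU base has bits [31:16] clear are assumptions for illustration only; esp_after_iret16() is a simplified model of the IRET behaviour the commit message describes, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the patch's sequence:
 *   movq PER_CPU_VAR(espfix_stack),%rdi
 *   movl (RSP-RIP+3*8)(%rsp),%esi
 *   xorw %si,%si
 *   orq  %rsi,%rdi
 */
static uint64_t espfix_alias(uint64_t percpu_base, uint32_t user_esp)
{
	/* Assumption: the base leaves bits [31:16] free to select the alias. */
	assert((percpu_base & 0xffff0000ull) == 0);

	/* xorw %si,%si: drop the low 16 bits of the user stack pointer. */
	uint64_t hi = user_esp & 0xffff0000u;

	/* orq %rsi,%rdi: pick the alias whose bits [31:16] match userspace. */
	return percpu_base | hi;
}

/*
 * Simplified model of IRET to a 16-bit stack segment: only the low 16 bits
 * of %esp come from the IRET frame, bits [31:16] are whatever the kernel
 * stack pointer held at that moment.
 */
static uint32_t esp_after_iret16(uint64_t kernel_rsp, uint32_t frame_esp)
{
	return ((uint32_t)kernel_rsp & 0xffff0000u) | (frame_esp & 0xffffu);
}
```

In this model, running IRET from the alias returned by espfix_alias() hands userspace back its own bits [31:16], whereas running it from an ordinary kernel stack would expose bits of the kernel stack address.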
> 64 bytes appear to be sufficient, because NMI and #MC cause a task
> switch.
>
> THIS IS A PROTOTYPE AND IS NOT COMPLETE. We need to make sure all
> code paths that can interrupt userspace execute this code.
> Fortunately we never need to use the espfix stack for nested faults,
> so one per CPU is guaranteed to be safe.
>
> Furthermore, this code adds unnecessary instructions to the common
> path. For example, on exception entry we push %rdi, pop %rdi, and
> then save away %rdi. Ideally we should do this in such a way that we
> avoid unnecessary swapgs, especially on the IRET path (the exception
> path is going to be very rare, and so is less critical.)
>
> Putting this version out there for people to look at/laugh at/play
> with.
>
> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
> Link: http://lkml.kernel.org/r/tip-kicdm89kzw9lldryb1br9od0@git.kernel.org
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Alexander van Heukelum <heukelum@fastmail.fm>
> Cc: Andy Lutomirski <amluto@gmail.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Arjan van de Ven <arjan.van.de.ven@intel.com>
> Cc: Brian Gerst <brgerst@gmail.com>
> Cc: Alexandre Julliard <julliard@winehq.com>
> Cc: Andi Kleen <andi@firstfloor.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
...
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 1e96c3628bf2..7cc01770bf21 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -58,6 +58,7 @@
> #include <asm/asm.h>
> #include <asm/context_tracking.h>
> #include <asm/smap.h>
> +#include <asm/pgtable_types.h>
> #include <linux/err.h>
>
> /* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this. */
> @@ -1040,8 +1041,16 @@ restore_args:
> RESTORE_ARGS 1,8,1
>
> irq_return:
> + /*
> + * Are we returning to the LDT? Note: in 64-bit mode
> + * SS:RSP on the exception stack is always valid.
> + */
> + testb $4,(SS-RIP)(%rsp)
> + jnz irq_return_ldt
> +
> +irq_return_iret:
> INTERRUPT_RETURN
> - _ASM_EXTABLE(irq_return, bad_iret)
> + _ASM_EXTABLE(irq_return_iret, bad_iret)
>
> #ifdef CONFIG_PARAVIRT
> ENTRY(native_iret)
> @@ -1049,6 +1058,34 @@ ENTRY(native_iret)
> _ASM_EXTABLE(native_iret, bad_iret)
> #endif
>
> +irq_return_ldt:
> + pushq_cfi %rcx
> + larl (CS-RIP+8)(%rsp), %ecx
> + jnz 1f /* Invalid segment - will #GP at IRET time */
> + testl $0x00200000, %ecx
> + jnz 1f /* Returning to 64-bit mode */
> + larl (SS-RIP+8)(%rsp), %ecx
> + jnz 1f /* Invalid segment - will #SS at IRET time */
You mean " ... will #GP at IRET time"? But you're right, you're looking
at SS :-)
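For anyone reading the checks without the SDM open, the decision logic of the testb/larl/testl chain can be sketched in C roughly as below. This is only an illustration of the control flow in the quoted hunk: the struct, the macro names and needs_espfix() are hypothetical, and the "valid" flag stands in for ZF as set by LAR.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEL_TI	0x4u		/* selector bit 2: set means LDT */
#define AR_L	0x00200000u	/* LAR result bit 21: 64-bit code segment */
#define AR_DB	0x00400000u	/* LAR result bit 22: 32-bit default size */

/* Hypothetical stand-in for the outcome of a LAR instruction. */
struct lar_result {
	bool valid;	/* ZF=1 after LAR: descriptor was readable */
	uint32_t ar;	/* access rights as loaded into %ecx */
};

/* Returns true when the return path would switch to the espfix stack. */
static bool needs_espfix(uint16_t ss_sel, struct lar_result cs,
			 struct lar_result ss)
{
	if (!(ss_sel & SEL_TI))
		return false;	/* testb $4: SS is not in the LDT */
	if (!cs.valid)
		return false;	/* invalid CS, IRET will fault anyway */
	if (cs.ar & AR_L)
		return false;	/* returning to 64-bit mode */
	if (!ss.valid)
		return false;	/* invalid SS, IRET will fault anyway */
	if (ss.ar & AR_DB)
		return false;	/* not a 16-bit stack segment */
	return true;
}
```

Only the last case, a valid non-64-bit CS together with a valid 16-bit SS from the LDT, takes the copy-to-ministack path; everything else falls through to the plain IRET.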
> + testl $0x00400000, %ecx
> + jnz 1f /* Not a 16-bit stack segment */
> + pushq_cfi %rsi
> + pushq_cfi %rdi
> + SWAPGS
> + movq PER_CPU_VAR(espfix_stack),%rdi
> + movl (RSP-RIP+3*8)(%rsp),%esi
> + xorw %si,%si
> + orq %rsi,%rdi
> + movq %rsp,%rsi
> + movl $8,%ecx
> + rep;movsq
> + leaq -(8*8)(%rdi),%rsp
> + SWAPGS
> + popq_cfi %rdi
> + popq_cfi %rsi
> +1:
> + popq_cfi %rcx
> + jmp irq_return_iret
> +
> .section .fixup,"ax"
> bad_iret:
> /*
...
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index 85126ccbdf6b..dc2d8afcafe9 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -32,6 +32,7 @@
> * Manage page tables very early on.
> */
> extern pgd_t early_level4_pgt[PTRS_PER_PGD];
> +extern pud_t espfix_pud_page[PTRS_PER_PUD];
I guess you don't need the "extern" here.
> extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
> static unsigned int __initdata next_early_pgt = 2;
> pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
> diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
> index af1d14a9ebda..ebc987398923 100644
> --- a/arch/x86/kernel/ldt.c
> +++ b/arch/x86/kernel/ldt.c
> @@ -229,17 +229,6 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
> }
> }
>
> - /*
> - * On x86-64 we do not support 16-bit segments due to
> - * IRET leaking the high bits of the kernel stack address.
> - */
> -#ifdef CONFIG_X86_64
> - if (!ldt_info.seg_32bit) {
> - error = -EINVAL;
> - goto out_unlock;
> - }
> -#endif
> -
> fill_ldt(&ldt, &ldt_info);
> if (oldmode)
> ldt.avl = 0;
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 34826934d4a7..ff32efb14e33 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -244,6 +244,11 @@ static void notrace start_secondary(void *unused)
> check_tsc_sync_target();
>
> /*
> + * Enable the espfix hack for this CPU
> + */
> + init_espfix_cpu();
> +
> + /*
> * We need to hold vector_lock so there the set of online cpus
> * does not change while we are assigning vectors to cpus. Holding
> * this lock ensures we don't half assign or remove an irq from a cpu.
> diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
> index 20621d753d5f..96bf767a05fc 100644
> --- a/arch/x86/mm/dump_pagetables.c
> +++ b/arch/x86/mm/dump_pagetables.c
> @@ -327,6 +327,8 @@ void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd)
> int i;
> struct pg_state st = {};
>
> + st.to_dmesg = true;
Right, remove before applying :)
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.