From: Andy Lutomirski <luto@MIT.EDU>
To: Ingo Molnar <mingo@elte.hu>, x86@kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>,
linux-kernel@vger.kernel.org, Jesper Juhl <jj@chaosbits.net>,
Borislav Petkov <bp@alien8.de>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
Arjan van de Ven <arjan@infradead.org>,
Jan Beulich <JBeulich@novell.com>,
richard -rw- weinberger <richard.weinberger@gmail.com>,
Mikael Pettersson <mikpe@it.uu.se>,
Andi Kleen <andi@firstfloor.org>, Brian Gerst <brgerst@gmail.com>,
Louis Rilling <Louis.Rilling@kerlabs.com>,
Valdis.Kletnieks@vt.edu, pageexec@freemail.hu,
Andy Lutomirski <luto@MIT.EDU>
Subject: [PATCH v5 2/9] x86-64: Document some of entry_64.S
Date: Sun, 5 Jun 2011 13:50:18 -0400
Message-ID: <fc134867cc550977cc996866129e11a16ba0f9ea.1307292171.git.luto@mit.edu>
In-Reply-To: <cover.1307292171.git.luto@mit.edu>
Signed-off-by: Andy Lutomirski <luto@mit.edu>
---
Documentation/x86/entry_64.txt | 98 ++++++++++++++++++++++++++++++++++++++++
arch/x86/kernel/entry_64.S | 2 +
2 files changed, 100 insertions(+), 0 deletions(-)
create mode 100644 Documentation/x86/entry_64.txt
diff --git a/Documentation/x86/entry_64.txt b/Documentation/x86/entry_64.txt
new file mode 100644
index 0000000..7869f14
--- /dev/null
+++ b/Documentation/x86/entry_64.txt
@@ -0,0 +1,98 @@
+This file documents some of the kernel entries in
+arch/x86/kernel/entry_64.S. A lot of this explanation is adapted from
+an email from Ingo Molnar:
+
+http://lkml.kernel.org/r/<20110529191055.GC9835%40elte.hu>
+
+The x86 architecture has quite a few different ways to jump into
+kernel code. Most of these entry points are registered in
+arch/x86/kernel/traps.c and implemented in arch/x86/kernel/entry_64.S
+and arch/x86/ia32/ia32entry.S.
+
+The IDT vector assignments are listed in arch/x86/include/asm/irq_vectors.h.
+
+Some of these entries are:
+
+ - system_call: syscall instruction from 64-bit code.
+
+ - ia32_syscall: int 0x80 from 32-bit or 64-bit code; compat syscall
+ either way.
+
+ - ia32_cstar_target, ia32_sysenter_target: syscall and sysenter from
+ 32-bit code.
+
+ - interrupt: An array of entries. Every IDT vector that doesn't
+ explicitly point somewhere else gets set to the corresponding
+ value in interrupt. These point to a whole array of
+ magically-generated functions that make their way to do_IRQ with
+ the interrupt number as a parameter.
+
+ - emulate_vsyscall: int 0xcc, a special non-ABI entry used by
+ vsyscall emulation.
+
+ - APIC interrupts: Various special-purpose interrupts for things
+ like TLB shootdown.
+
+ - Architecturally-defined exceptions like divide_error.
+
+There are a few complexities here. The different x86-64 entries
+have different calling conventions. The syscall and sysenter
+instructions have their own peculiar calling conventions. Some of
+the IDT entries push an error code onto the stack; others don't.
+IDT entries using the IST alternative stack mechanism need their own
+magic to get the stack frames right. (You can find some
+documentation in the AMD APM, Volume 2, Chapter 8 and the Intel SDM,
+Volume 3, Chapter 6.)
+
+Dealing with the swapgs instruction is especially tricky. Swapgs
+toggles whether gs is the kernel gs or the user gs. The swapgs
+instruction is rather fragile: it must nest perfectly and only to a
+depth of one. It should be used only when entering the kernel from
+user mode and again when returning to user space, and exactly once
+in each direction. If we mess that up even slightly, we crash.
+
+So when we have a secondary entry, already in kernel mode, we *must
+not* use SWAPGS blindly - nor must we forget to do a SWAPGS when GS
+has not been swapped yet.
+
+Now, there's a secondary complication: there's a cheap way to test
+which mode the CPU is in and an expensive way.
+
+The cheap way is to pick this info off the entry frame on the kernel
+stack, from the saved CS in the pt_regs area:
+
+ xorl %ebx,%ebx
+ testl $3,CS+8(%rsp)
+ je error_kernelspace
+ SWAPGS
+
+The expensive (paranoid) way is to read back the MSR_GS_BASE value
+(which is what SWAPGS modifies):
+
+ movl $1,%ebx
+ movl $MSR_GS_BASE,%ecx
+ rdmsr
+ testl %edx,%edx
+ js 1f /* negative -> in kernel */
+ SWAPGS
+ xorl %ebx,%ebx
+1: ret
+
+The whole paranoid/non-paranoid macro complexity is about whether
+to pay that RDMSR cost.
+
+If we are at an interrupt or user-trap/gate-like boundary, then we can
+use the faster check: the stack is a reliable indicator of whether
+SWAPGS has already been done. If the saved CS shows that we are a
+secondary entry interrupting kernel-mode execution, then we know that
+the GS base has already been switched. If it shows that we interrupted
+user-space execution, then we must do the SWAPGS.
+
+But if we are in an NMI/MCE/DEBUG/whatever super-atomic entry context,
+which might have triggered right after a normal entry wrote CS to the
+stack but before we executed SWAPGS, then the only safe way to check
+for GS is the slower method: the RDMSR.
+
+So we try to mark as 'paranoid' only those entry methods that absolutely
+need the more expensive check for the GS base, and we generate all
+'normal' entry points with the regular (faster) entry macros.
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 8a445a0..72c4a77 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -9,6 +9,8 @@
/*
* entry.S contains the system-call and fault low-level handling routines.
*
+ * Some of this is documented in Documentation/x86/entry_64.txt
+ *
* NOTE: This code handles signal-recognition, which happens every time
* after an interrupt and after each system call.
*
--
1.7.5.2