linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@MIT.EDU>
To: Ingo Molnar <mingo@elte.hu>, x86@kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, Jesper Juhl <jj@chaosbits.net>,
	Borislav Petkov <bp@alien8.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Arjan van de Ven <arjan@infradead.org>,
	Jan Beulich <JBeulich@novell.com>,
	richard -rw- weinberger <richard.weinberger@gmail.com>,
	Mikael Pettersson <mikpe@it.uu.se>,
	Andi Kleen <andi@firstfloor.org>, Brian Gerst <brgerst@gmail.com>,
	Louis Rilling <Louis.Rilling@kerlabs.com>,
	Valdis.Kletnieks@vt.edu, pageexec@freemail.hu,
	Andy Lutomirski <luto@MIT.EDU>
Subject: [PATCH v5 2/9] x86-64: Document some of entry_64.S
Date: Sun,  5 Jun 2011 13:50:18 -0400	[thread overview]
Message-ID: <fc134867cc550977cc996866129e11a16ba0f9ea.1307292171.git.luto@mit.edu> (raw)
In-Reply-To: <cover.1307292171.git.luto@mit.edu>
In-Reply-To: <cover.1307292171.git.luto@mit.edu>

Signed-off-by: Andy Lutomirski <luto@mit.edu>
---
 Documentation/x86/entry_64.txt |   98 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/entry_64.S     |    2 +
 2 files changed, 100 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/x86/entry_64.txt

diff --git a/Documentation/x86/entry_64.txt b/Documentation/x86/entry_64.txt
new file mode 100644
index 0000000..7869f14
--- /dev/null
+++ b/Documentation/x86/entry_64.txt
@@ -0,0 +1,98 @@
+This file documents some of the kernel entries in
+arch/x86/kernel/entry_64.S.  A lot of this explanation is adapted from
+an email from Ingo Molnar:
+
+http://lkml.kernel.org/r/<20110529191055.GC9835%40elte.hu>
+
+The x86 architecture has quite a few different ways to jump into
+kernel code.  Most of these entry points are registered in
+arch/x86/kernel/traps.c and implemented in arch/x86/kernel/entry_64.S
+and arch/x86/ia32/ia32entry.S.
+
+The IDT vector assignments are listed in arch/x86/include/irq_vectors.h.
+
+Some of these entries are:
+
+ - system_call: syscall instruction from 64-bit code.
+
+ - ia32_syscall: int 0x80 from 32-bit or 64-bit code; compat syscall
+   either way.
+
+ - ia32_syscall, ia32_sysenter: syscall and sysenter from 32-bit
+   code
+
+ - interrupt: An array of entries.  Every IDT vector that doesn't
+   explicitly point somewhere else gets set to the corresponding
+   value in interrupts.  These point to a whole array of
+   magically-generated functions that make their way to do_IRQ with
+   the interrupt number as a parameter.
+
+ - emulate_vsyscall: int 0xcc, a special non-ABI entry used by
+   vsyscall emulation.
+
+ - APIC interrupts: Various special-purpose interrupts for things
+   like TLB shootdown.
+
+ - Architecturally-defined exceptions like divide_error.
+
+There are a few complexities here.  The different x86-64 entries
+have different calling conventions.  The syscall and sysenter
+instructions have their own peculiar calling conventions.  Some of
+the IDT entries push an error code onto the stack; others don't.
+IDT entries using the IST alternative stack mechanism need their own
+magic to get the stack frames right.  (You can find some
+documentation in the AMD APM, Volume 2, Chapter 8 and the Intel SDM,
+Volume 3, Chapter 6.)
+
+Dealing with the swapgs instruction is especially tricky.  Swapgs
+toggles whether gs is the kernel gs or the user gs.  The swapgs
+instruction is rather fragile: it must nest perfectly and only in
+single depth, it should only be used if entering from user mode to
+kernel mode and then when returning to user-space, and precisely
+so. If we mess that up even slightly, we crash.
+
+So when we have a secondary entry, already in kernel mode, we *must
+not* use SWAPGS blindly - nor must we forget doing a SWAPGS when it's
+not switched/swapped yet.
+
+Now, there's a secondary complication: there's a cheap way to test
+which mode the CPU is in and an expensive way.
+
+The cheap way is to pick this info off the entry frame on the kernel
+stack, from the CS of the ptregs area of the kernel stack:
+
+	xorl %ebx,%ebx
+	testl $3,CS+8(%rsp)
+	je error_kernelspace
+	SWAPGS
+
+The expensive (paranoid) way is to read back the MSR_GS_BASE value
+(which is what SWAPGS modifies):
+
+	movl $1,%ebx
+	movl $MSR_GS_BASE,%ecx
+	rdmsr
+	testl %edx,%edx
+	js 1f   /* negative -> in kernel */
+	SWAPGS
+	xorl %ebx,%ebx
+1:	ret
+
+and the whole paranoid non-paranoid macro complexity is about whether
+to suffer that RDMSR cost.
+
+If we are at an interrupt or user-trap/gate-alike boundary then we can
+use the faster check: the stack will be a reliable indicator of
+whether SWAPGS was already done: if we see that we are a secondary
+entry interrupting kernel mode execution, then we know that the GS
+base has already been switched. If it says that we interrupted
+user-space execution then we must do the SWAPGS.
+
+But if we are in an NMI/MCE/DEBUG/whatever super-atomic entry context,
+which might have triggered right after a normal entry wrote CS to the
+stack but before we executed SWAPGS, then the only safe way to check
+for GS is the slower method: the RDMSR.
+
+So we try only to mark those entry methods 'paranoid' that absolutely
+need the more expensive check for the GS base - and we generate all
+'normal' entry points with the regular (faster) entry macros.
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 8a445a0..72c4a77 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -9,6 +9,8 @@
 /*
  * entry.S contains the system-call and fault low-level handling routines.
  *
+ * Some of this is documented in Documentation/x86/entry_64.txt
+ *
  * NOTE: This code handles signal-recognition, which happens every time
  * after an interrupt and after each system call.
  *
-- 
1.7.5.2


  parent reply	other threads:[~2011-06-05 18:04 UTC|newest]

Thread overview: 112+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-06-05 17:50 [PATCH v5 0/9] Remove syscall instructions at fixed addresses Andy Lutomirski
2011-06-05 17:50 ` [PATCH v5 1/9] x86-64: Fix alignment of jiffies variable Andy Lutomirski
2011-06-06  8:31   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
2011-06-05 17:50 ` Andy Lutomirski [this message]
2011-06-06  8:31   ` [tip:x86/vdso] x86-64: Document some of entry_64.S tip-bot for Andy Lutomirski
2011-06-05 17:50 ` [PATCH v5 3/9] x86-64: Give vvars their own page Andy Lutomirski
2011-06-06  8:32   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
2011-06-05 17:50 ` [PATCH v5 4/9] x86-64: Remove kernel.vsyscall64 sysctl Andy Lutomirski
2011-06-06  8:32   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
2011-12-05 18:27   ` [PATCH v5 4/9] " Matthew Maurer
2011-06-05 17:50 ` [PATCH v5 5/9] x86-64: Map the HPET NX Andy Lutomirski
2011-06-06  8:33   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
2011-06-05 17:50 ` [PATCH v5 6/9] x86-64: Remove vsyscall number 3 (venosys) Andy Lutomirski
2011-06-06  8:33   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
2011-06-05 17:50 ` [PATCH v5 7/9] x86-64: Fill unused parts of the vsyscall page with 0xcc Andy Lutomirski
2011-06-06  8:34   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
2011-06-05 17:50 ` [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls Andy Lutomirski
2011-06-05 19:30   ` Ingo Molnar
2011-06-05 20:01     ` Andrew Lutomirski
2011-06-06  7:39       ` Ingo Molnar
2011-06-06  9:42       ` pageexec
2011-06-06 11:19         ` Andrew Lutomirski
2011-06-06 11:56           ` pageexec
2011-06-06 12:43             ` Andrew Lutomirski
2011-06-06 13:58               ` pageexec
2011-06-06 14:07                 ` Brian Gerst
2011-06-07 23:32                   ` pageexec
2011-06-07 23:49                     ` Andrew Lutomirski
2011-06-08  6:32                       ` pageexec
2011-06-06 15:26                 ` Ingo Molnar
2011-06-06 15:48                   ` pageexec
2011-06-06 15:59                     ` Ingo Molnar
2011-06-06 16:19                       ` pageexec
2011-06-06 16:47                         ` Ingo Molnar
2011-06-06 22:49                           ` pageexec
2011-06-06 22:57                             ` david
2011-06-07  9:07                               ` Ingo Molnar
2011-06-07  6:59                             ` Pekka Enberg
2011-06-07  8:30                             ` Ingo Molnar
2011-06-07 23:24                               ` pageexec
2011-06-08  5:55                                 ` Pekka Enberg
2011-06-08  6:19                                   ` pageexec
2011-06-08  6:48                                 ` Ingo Molnar
2011-06-08  9:02                                   ` pageexec
2011-06-08  9:11                                     ` Andi Kleen
2011-06-08  9:35                                       ` pageexec
2011-06-08 10:06                                         ` Andi Kleen
2011-06-08 10:26                                           ` pageexec
2011-06-08 10:39                                             ` Ingo Molnar
2011-06-08 10:35                                           ` Ingo Molnar
2011-06-08  9:15                                     ` Ingo Molnar
2011-06-08  7:16                                 ` Ingo Molnar
2011-06-08  9:29                                   ` pageexec
2011-06-06 14:01             ` Linus Torvalds
2011-06-06 14:55               ` pageexec
2011-06-06 15:33                 ` Ingo Molnar
2011-06-06 15:58                   ` pageexec
2011-06-06 15:41         ` Ingo Molnar
2011-06-06  8:34   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
2011-06-06  8:35   ` [tip:x86/vdso] x86-64, vdso, seccomp: Fix !CONFIG_SECCOMP build tip-bot for Ingo Molnar
2011-06-07  7:49   ` [tip:x86/vdso] x86-64: Emulate legacy vsyscalls tip-bot for Andy Lutomirski
2011-06-07  8:03   ` tip-bot for Andy Lutomirski
2011-06-05 17:50 ` [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule Andy Lutomirski
2011-06-06  8:34   ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
2011-06-06  8:46   ` [PATCH v5 9/9] " Linus Torvalds
2011-06-06  9:31     ` Andi Kleen
2011-06-06 10:39       ` pageexec
2011-06-06 13:56         ` Linus Torvalds
2011-06-06 18:46           ` pageexec
2011-06-06 20:40             ` Linus Torvalds
2011-06-06 20:51               ` Andrew Lutomirski
2011-06-06 21:54                 ` Ingo Molnar
2011-06-06 21:45               ` Ingo Molnar
2011-06-06 21:48                 ` Ingo Molnar
     [not found]                 ` <BANLkTi==uw_h78oaep1cCOCzwY0edLUU_Q@mail.gmail.com>
2011-06-07  8:03                   ` [PATCH, v6] x86-64: Emulate legacy vsyscalls Ingo Molnar
2011-06-06 21:53               ` [PATCH v5 9/9] x86-64: Add CONFIG_UNSAFE_VSYSCALLS to feature-removal-schedule pageexec
2011-06-06 14:44         ` Ingo Molnar
2011-06-06 15:01           ` pageexec
2011-06-06 15:15             ` Ingo Molnar
2011-06-06 15:29               ` pageexec
2011-06-06 16:54                 ` Ingo Molnar
2011-06-06 18:59           ` pageexec
2011-06-06 19:25             ` Ingo Molnar
2011-06-07  0:34               ` pageexec
2011-06-07  9:51                 ` Ingo Molnar
2011-06-07 23:24                   ` pageexec
2011-06-10 11:19                     ` Ingo Molnar
2011-06-14  0:48                       ` pageexec
2011-06-15 19:42                         ` Valdis.Kletnieks
2011-06-06 14:52         ` Ingo Molnar
2011-06-06 10:24     ` [PATCH] x86-64, vsyscalls: Rename UNSAFE_VSYSCALLS to COMPAT_VSYSCALLS Ingo Molnar
2011-06-06 11:20       ` pageexec
2011-06-06 12:47         ` Ingo Molnar
2011-06-06 12:48           ` Ingo Molnar
2011-06-06 18:04           ` pageexec
2011-06-06 19:12             ` Ingo Molnar
2011-06-07  0:02               ` pageexec
2011-06-07  9:56                 ` Ingo Molnar
2011-06-07 23:24                   ` pageexec
2011-06-09  6:48                     ` Ingo Molnar
2011-06-09 23:33                       ` pageexec
2011-06-07 10:05                 ` Ingo Molnar
2011-06-07 23:24                   ` pageexec
2011-06-09  7:02                     ` Ingo Molnar
2011-06-09 23:33                       ` pageexec
2011-06-07 10:13                 ` Ingo Molnar
2011-06-07 23:24                   ` pageexec
2011-06-06 12:19       ` Ted Ts'o
2011-06-06 12:33         ` Andrew Lutomirski
2011-06-06 12:37         ` Ingo Molnar
2011-06-06 14:34     ` [tip:x86/vdso] " tip-bot for Ingo Molnar
2011-06-05 20:05 ` [PATCH v5 0/9] Remove syscall instructions at fixed addresses Andrew Lutomirski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fc134867cc550977cc996866129e11a16ba0f9ea.1307292171.git.luto@mit.edu \
    --to=luto@mit.edu \
    --cc=JBeulich@novell.com \
    --cc=Louis.Rilling@kerlabs.com \
    --cc=Valdis.Kletnieks@vt.edu \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=arjan@infradead.org \
    --cc=bp@alien8.de \
    --cc=brgerst@gmail.com \
    --cc=jj@chaosbits.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mikpe@it.uu.se \
    --cc=mingo@elte.hu \
    --cc=pageexec@freemail.hu \
    --cc=richard.weinberger@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).