linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/3] x86-64 entry documentation and clean up
@ 2022-01-07 23:52 Ammar Faizi
  2022-01-07 23:52 ` [PATCH v1 1/3] x86/entry/64: Clean up spaces after the instruction Ammar Faizi
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Ammar Faizi @ 2022-01-07 23:52 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin
  Cc: Ammar Faizi, x86-ml, lkml, GNU/Weeb Mailing List

Hi,

There are 3 patches in this series.

1) Trivial clean up in entry_64.S.
2) Add comment about registers on exit in entry_64.S.
3) Add documentation about registers on entry and exit.

(2) and (3) are based on the discussion we had at:

  https://lore.kernel.org/lkml/alpine.LSU.2.20.2110131601000.26294@wotan.suse.de/

This series is based on commit:

  24556728c305886b8bb05bf2ac7e20cf7db3e314 ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm")

Thanks!

Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
Ammar Faizi (3):
  x86/entry/64: Clean up spaces after instruction
  x86/entry/64: Add info about registers on exit
  Documentation: x86-64: Document registers on entry and exit

 Documentation/x86/entry_64.rst | 47 ++++++++++++++++++++++++++++++++++
 arch/x86/entry/entry_64.S      | 27 ++++++++++++++-----
 2 files changed, 67 insertions(+), 7 deletions(-)


base-commit: 24556728c305886b8bb05bf2ac7e20cf7db3e314
-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v1 1/3] x86/entry/64: Clean up spaces after the instruction
  2022-01-07 23:52 [PATCH v1 0/3] x86-64 entry documentation and clean up Ammar Faizi
@ 2022-01-07 23:52 ` Ammar Faizi
  2022-01-07 23:52 ` [PATCH v1 2/3] x86/entry/64: Add info about registers on exit Ammar Faizi
  2022-01-07 23:52 ` [PATCH v1 3/3] Documentation: x86-64: Document registers on entry and exit Ammar Faizi
  2 siblings, 0 replies; 9+ messages in thread
From: Ammar Faizi @ 2022-01-07 23:52 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin
  Cc: Ammar Faizi, x86-ml, lkml, GNU/Weeb Mailing List

Most of lines here use a tab as a separator between the instruction
and its operand(s). But there are several parts that use spaces.

Replace these spaces with a tab for consistency.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86-ml <x86@kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: GNU/Weeb Mailing List <gwml@gnuweeb.org>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 arch/x86/entry/entry_64.S | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 97b1f84bb53f..e432dd075291 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -305,13 +305,13 @@ SYM_CODE_END(ret_from_fork)
 
 .macro DEBUG_ENTRY_ASSERT_IRQS_OFF
 #ifdef CONFIG_DEBUG_ENTRY
-	pushq %rax
+	pushq	%rax
 	SAVE_FLAGS
-	testl $X86_EFLAGS_IF, %eax
-	jz .Lokay_\@
+	testl	$X86_EFLAGS_IF, %eax
+	jz	.Lokay_\@
 	ud2
 .Lokay_\@:
-	popq %rax
+	popq	%rax
 #endif
 .endm
 
@@ -749,8 +749,8 @@ SYM_CODE_START_LOCAL_NOALIGN(.Lbad_gs)
 	swapgs					/* switch back to user gs */
 .macro ZAP_GS
 	/* This can't be a string because the preprocessor needs to see it. */
-	movl $__USER_DS, %eax
-	movl %eax, %gs
+	movl	$__USER_DS, %eax
+	movl	%eax, %gs
 .endm
 	ALTERNATIVE "", "ZAP_GS", X86_BUG_NULL_SEG
 	xorl	%eax, %eax
@@ -1135,7 +1135,7 @@ SYM_CODE_START(asm_exc_nmi)
 	pushq	2*8(%rdx)	/* pt_regs->cs */
 	pushq	1*8(%rdx)	/* pt_regs->rip */
 	UNWIND_HINT_IRET_REGS
-	pushq   $-1		/* pt_regs->orig_ax */
+	pushq	$-1		/* pt_regs->orig_ax */
 	PUSH_AND_CLEAR_REGS rdx=(%rdx)
 	ENCODE_FRAME_POINTER
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v1 2/3] x86/entry/64: Add info about registers on exit
  2022-01-07 23:52 [PATCH v1 0/3] x86-64 entry documentation and clean up Ammar Faizi
  2022-01-07 23:52 ` [PATCH v1 1/3] x86/entry/64: Clean up spaces after the instruction Ammar Faizi
@ 2022-01-07 23:52 ` Ammar Faizi
  2022-01-08  0:03   ` Andy Lutomirski
  2022-01-07 23:52 ` [PATCH v1 3/3] Documentation: x86-64: Document registers on entry and exit Ammar Faizi
  2 siblings, 1 reply; 9+ messages in thread
From: Ammar Faizi @ 2022-01-07 23:52 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin
  Cc: Ammar Faizi, x86-ml, lkml, GNU/Weeb Mailing List, Michael Matz,
	H.J. Lu, Willy Tarreau

There was a controversial discussion about the wording in the System
V ABI document regarding what registers the kernel is allowed to
clobber when the userspace executes syscall.

The resolution of the discussion was reviewing the clobber list in
the glibc source. For a historical reason in the glibc source, the
kernel must restore all registers before returning to the userspace
(except for rax, rcx and r11).

Link: https://lore.kernel.org/lkml/alpine.LSU.2.20.2110131601000.26294@wotan.suse.de/
Link: https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/25

This adds info about registers on exit.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Michael Matz <matz@suse.de>
Cc: "H.J. Lu" <hjl.tools@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: x86-ml <x86@kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: GNU/Weeb Mailing List <gwml@gnuweeb.org>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---

Quoted the full comment in that file after patched, so it's easier to
review:
/*
 * 64-bit SYSCALL instruction entry. Up to 6 arguments in registers.
 *
 * This is the only entry point used for 64-bit system calls.  The
 * hardware interface is reasonably well designed and the register to
 * argument mapping Linux uses fits well with the registers that are
 * available when SYSCALL is used.
 *
 * SYSCALL instructions can be found inlined in libc implementations as
 * well as some other programs and libraries.  There are also a handful
 * of SYSCALL instructions in the vDSO used, for example, as a
 * clock_gettimeofday fallback.
 *
 * 64-bit SYSCALL saves rip to rcx, clears rflags.RF, then saves rflags to r11,
 * then loads new ss, cs, and rip from previously programmed MSRs.
 * rflags gets masked by a value from another MSR (so CLD and CLAC
 * are not needed). SYSCALL does not save anything on the stack
 * and does not change rsp.
 *
 * Registers on entry:
 * rax  system call number
 * rcx  return address
 * r11  saved rflags (note: r11 is callee-clobbered register in C ABI)
 * rdi  arg0
 * rsi  arg1
 * rdx  arg2
 * r10  arg3 (needs to be moved to rcx to conform to C ABI)
 * r8   arg4
 * r9   arg5
 * (note: r12-r15, rbp, rbx are callee-preserved in C ABI)
 *
 * Only called from user space.
 *
 * Registers on exit:
 * rax  syscall return value
 * rcx  return address
 * r11  rflags
 *
 * For a historical reason in the glibc source, the kernel must restore all
 * registers except the rax (syscall return value) before returning to the
 * userspace.
 *
 * In other words, with respect to the userspace, when the kernel returns
 * to the userspace, only 3 registers are clobbered, they are rax, rcx,
 * and r11.
 *
 * When user can change pt_regs->foo always force IRET. That is because
 * it deals with uncanonical addresses better. SYSRET has trouble
 * with them due to bugs in both AMD and Intel CPUs.
 */

---

 arch/x86/entry/entry_64.S | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index e432dd075291..1111fff2e05f 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -79,6 +79,19 @@
  *
  * Only called from user space.
  *
+ * Registers on exit:
+ * rax  syscall return value
+ * rcx  return address
+ * r11  rflags
+ *
+ * For a historical reason in the glibc source, the kernel must restore all
+ * registers except the rax (syscall return value) before returning to the
+ * userspace.
+ *
+ * In other words, with respect to the userspace, when the kernel returns
+ * to the userspace, only 3 registers are clobbered, they are rax, rcx,
+ * and r11.
+ *
  * When user can change pt_regs->foo always force IRET. That is because
  * it deals with uncanonical addresses better. SYSRET has trouble
  * with them due to bugs in both AMD and Intel CPUs.
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v1 3/3] Documentation: x86-64: Document registers on entry and exit
  2022-01-07 23:52 [PATCH v1 0/3] x86-64 entry documentation and clean up Ammar Faizi
  2022-01-07 23:52 ` [PATCH v1 1/3] x86/entry/64: Clean up spaces after the instruction Ammar Faizi
  2022-01-07 23:52 ` [PATCH v1 2/3] x86/entry/64: Add info about registers on exit Ammar Faizi
@ 2022-01-07 23:52 ` Ammar Faizi
  2022-01-08  0:02   ` Andy Lutomirski
  2 siblings, 1 reply; 9+ messages in thread
From: Ammar Faizi @ 2022-01-07 23:52 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin
  Cc: Ammar Faizi, x86-ml, lkml, GNU/Weeb Mailing List, Michael Matz,
	H.J. Lu, Jonathan Corbet, Willy Tarreau

There was a controversial discussion about the wording in the System
V ABI document regarding what registers the kernel is allowed to
clobber when the userspace executes syscall.

The resolution of the discussion was reviewing the clobber list in
the glibc source. For a historical reason in the glibc source, the
kernel must restore all registers before returning to the userspace
(except for rax, rcx and r11).

On Wed, 13 Oct 2021 at 16:24:28 +0000, Michael Matz <matz@suse.de> wrote:
> It might also be interesting to know that while the wording of the psABI
> was indeed intended to imply that all argument registers are potentially
> clobbered (like with normal calls) glibc's inline assembler to call
> syscalls relies on most registers to actually be preserved:
>
> # define REGISTERS_CLOBBERED_BY_SYSCALL "cc", "r11", "cx"
> ...
> #define internal_syscall6(number, arg1, arg2, arg3, arg4, arg5, arg6) \
> ({                                                                      \
>     unsigned long int resultvar;                                        \
>     TYPEFY (arg6, __arg6) = ARGIFY (arg6);                              \
>     TYPEFY (arg5, __arg5) = ARGIFY (arg5);                              \
>     TYPEFY (arg4, __arg4) = ARGIFY (arg4);                              \
>     TYPEFY (arg3, __arg3) = ARGIFY (arg3);                              \
>     TYPEFY (arg2, __arg2) = ARGIFY (arg2);                              \
>     TYPEFY (arg1, __arg1) = ARGIFY (arg1);                              \
>     register TYPEFY (arg6, _a6) asm ("r9") = __arg6;                    \
>     register TYPEFY (arg5, _a5) asm ("r8") = __arg5;                    \
>     register TYPEFY (arg4, _a4) asm ("r10") = __arg4;                   \
>     register TYPEFY (arg3, _a3) asm ("rdx") = __arg3;                   \
>     register TYPEFY (arg2, _a2) asm ("rsi") = __arg2;                   \
>     register TYPEFY (arg1, _a1) asm ("rdi") = __arg1;                   \
>     asm volatile (                                                      \
>     "syscall\n\t"                                                       \
>     : "=a" (resultvar)                                                  \
>     : "0" (number), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4),         \
>       "r" (_a5), "r" (_a6)                                              \
>     : "memory", REGISTERS_CLOBBERED_BY_SYSCALL);                        \
>     (long int) resultvar;                                               \
> })
>
>
> Note in particular the missing clobbers or outputs of any of the argument
> regs.
>
> So, even though the psABI (might have) meant something else, as glibc is
> doing the above we in fact have a de-facto standard that the kernel can't
> clobber any of the argument regs.  The wording and the linux x86-64
> syscall implementation (and use in glibc) all come from the same time in
> 2001, so there never was a time when the kernel was not saving/restoring
> the arg registers, so it can't stop now.
>
> In effect this means the psABI should be clarified to explicitely say the
> the arg registers aren't clobbered, i.e. that the mentioned list of
> clobbered regs isn't inclusive but exclusive.  I will do that.
>
> When I was discussing this with Boris earlier I hadn't yet looked at glibc
> use but only gave my interpretation from memory and reading.  Obviously
> reality trumps anything like that :-)

Link: https://lore.kernel.org/lkml/alpine.LSU.2.20.2110131601000.26294@wotan.suse.de/
Link: https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/25

This documents "registers on entry" and "registers on exit".

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Michael Matz <matz@suse.de>
Cc: "H.J. Lu" <hjl.tools@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Willy Tarreau <w@1wt.eu>
Cc: x86-ml <x86@kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: GNU/Weeb Mailing List <gwml@gnuweeb.org>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 Documentation/x86/entry_64.rst | 47 ++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/Documentation/x86/entry_64.rst b/Documentation/x86/entry_64.rst
index e433e08f7018..3f2007e2a938 100644
--- a/Documentation/x86/entry_64.rst
+++ b/Documentation/x86/entry_64.rst
@@ -108,3 +108,50 @@ We try to only use IST entries and the paranoid entry code for vectors
 that absolutely need the more expensive check for the GS base - and we
 generate all 'normal' entry points with the regular (faster) paranoid=0
 variant.
+
+
+Registers on entry:
+-------------------
+
+  - rax  system call number
+  - rcx  return address
+  - r11  saved rflags (note: r11 is callee-clobbered register in C ABI)
+  - rdi  arg0
+  - rsi  arg1
+  - rdx  arg2
+  - r10  arg3 (needs to be moved to rcx to conform to C ABI)
+  - r8   arg4
+  - r9   arg5
+
+
+Registers on exit:
+------------------
+
+This explanation is adapted from a controversial discussion about the
+wording in the System V ABI document regarding what registers the
+kernel is allowed to clobber when the userspace executes syscall.
+
+The resolution of the discussion was reviewing the clobber list in the
+glibc source. For a historical reason in the glibc source, the kernel
+must restore all registers before returning to the userspace (except
+for rax, rcx and r11).
+
+For the detailed story, see the email from Michael Matz:
+
+https://lore.kernel.org/lkml/alpine.LSU.2.20.2110131601000.26294@wotan.suse.de/
+
+The kernel saves/restores all registers, so the userspace using it can
+assume the value of all registers are still intact, except for the rax,
+rcx, and r11.
+
+When the kernel returns to the userspace, only 3 registers are
+clobbered:
+
+  - rax  syscall return value
+  - rcx  return address
+  - r11  rflags
+
+The kernel syscall entry actually also saves/restores rcx and r11, but
+at that point, they have already been overwritten by the syscall
+instruction itself with the userspace rip and rflags value, they will
+be used by the sysret instruction when returning to the userspace.
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 3/3] Documentation: x86-64: Document registers on entry and exit
  2022-01-07 23:52 ` [PATCH v1 3/3] Documentation: x86-64: Document registers on entry and exit Ammar Faizi
@ 2022-01-08  0:02   ` Andy Lutomirski
  2022-01-08  0:38     ` Ammar Faizi
  2022-01-21 13:32     ` Borislav Petkov
  0 siblings, 2 replies; 9+ messages in thread
From: Andy Lutomirski @ 2022-01-08  0:02 UTC (permalink / raw)
  To: Ammar Faizi, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin
  Cc: x86-ml, lkml, GNU/Weeb Mailing List, Michael Matz, H.J. Lu,
	Jonathan Corbet, Willy Tarreau

On 1/7/22 15:52, Ammar Faizi wrote:
> There was a controversial discussion about the wording in the System
> V ABI document regarding what registers the kernel is allowed to
> clobber when the userspace executes syscall.
> 
> The resolution of the discussion was reviewing the clobber list in
> the glibc source. For a historical reason in the glibc source, the
> kernel must restore all registers before returning to the userspace
> (except for rax, rcx and r11).
> 
> On Wed, 13 Oct 2021 at 16:24:28 +0000, Michael Matz <matz@suse.de> wrote:
>> It might also be interesting to know that while the wording of the psABI
>> was indeed intended to imply that all argument registers are potentially
>> clobbered (like with normal calls) glibc's inline assembler to call
>> syscalls relies on most registers to actually be preserved:
>>
>> # define REGISTERS_CLOBBERED_BY_SYSCALL "cc", "r11", "cx"
>> ...
>> #define internal_syscall6(number, arg1, arg2, arg3, arg4, arg5, arg6) \
>> ({                                                                      \
>>      unsigned long int resultvar;                                        \
>>      TYPEFY (arg6, __arg6) = ARGIFY (arg6);                              \
>>      TYPEFY (arg5, __arg5) = ARGIFY (arg5);                              \
>>      TYPEFY (arg4, __arg4) = ARGIFY (arg4);                              \
>>      TYPEFY (arg3, __arg3) = ARGIFY (arg3);                              \
>>      TYPEFY (arg2, __arg2) = ARGIFY (arg2);                              \
>>      TYPEFY (arg1, __arg1) = ARGIFY (arg1);                              \
>>      register TYPEFY (arg6, _a6) asm ("r9") = __arg6;                    \
>>      register TYPEFY (arg5, _a5) asm ("r8") = __arg5;                    \
>>      register TYPEFY (arg4, _a4) asm ("r10") = __arg4;                   \
>>      register TYPEFY (arg3, _a3) asm ("rdx") = __arg3;                   \
>>      register TYPEFY (arg2, _a2) asm ("rsi") = __arg2;                   \
>>      register TYPEFY (arg1, _a1) asm ("rdi") = __arg1;                   \
>>      asm volatile (                                                      \
>>      "syscall\n\t"                                                       \
>>      : "=a" (resultvar)                                                  \
>>      : "0" (number), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4),         \
>>        "r" (_a5), "r" (_a6)                                              \
>>      : "memory", REGISTERS_CLOBBERED_BY_SYSCALL);                        \
>>      (long int) resultvar;                                               \
>> })
>>
>>
>> Note in particular the missing clobbers or outputs of any of the argument
>> regs.
>>
>> So, even though the psABI (might have) meant something else, as glibc is
>> doing the above we in fact have a de-facto standard that the kernel can't
>> clobber any of the argument regs.  The wording and the linux x86-64
>> syscall implementation (and use in glibc) all come from the same time in
>> 2001, so there never was a time when the kernel was not saving/restoring
>> the arg registers, so it can't stop now.
>>
>> In effect this means the psABI should be clarified to explicitely say the
>> the arg registers aren't clobbered, i.e. that the mentioned list of
>> clobbered regs isn't inclusive but exclusive.  I will do that.
>>
>> When I was discussing this with Boris earlier I hadn't yet looked at glibc
>> use but only gave my interpretation from memory and reading.  Obviously
>> reality trumps anything like that :-)
> 
> Link: https://lore.kernel.org/lkml/alpine.LSU.2.20.2110131601000.26294@wotan.suse.de/
> Link: https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/25
> 
> This documents "registers on entry" and "registers on exit".
> 
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Michael Matz <matz@suse.de>
> Cc: "H.J. Lu" <hjl.tools@gmail.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Willy Tarreau <w@1wt.eu>
> Cc: x86-ml <x86@kernel.org>
> Cc: lkml <linux-kernel@vger.kernel.org>
> Cc: GNU/Weeb Mailing List <gwml@gnuweeb.org>
> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
> ---
>   Documentation/x86/entry_64.rst | 47 ++++++++++++++++++++++++++++++++++
>   1 file changed, 47 insertions(+)
> 
> diff --git a/Documentation/x86/entry_64.rst b/Documentation/x86/entry_64.rst
> index e433e08f7018..3f2007e2a938 100644
> --- a/Documentation/x86/entry_64.rst
> +++ b/Documentation/x86/entry_64.rst
> @@ -108,3 +108,50 @@ We try to only use IST entries and the paranoid entry code for vectors
>   that absolutely need the more expensive check for the GS base - and we
>   generate all 'normal' entry points with the regular (faster) paranoid=0
>   variant.
> +
> +
> +Registers on entry:
> +-------------------

This is SYSCALL64 registers on entry, not general registers on entry. 
Also, this has little to do with the entry logic, so it probably doesn't 
belong in this file.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 2/3] x86/entry/64: Add info about registers on exit
  2022-01-07 23:52 ` [PATCH v1 2/3] x86/entry/64: Add info about registers on exit Ammar Faizi
@ 2022-01-08  0:03   ` Andy Lutomirski
  2022-01-08  0:34     ` Ammar Faizi
  0 siblings, 1 reply; 9+ messages in thread
From: Andy Lutomirski @ 2022-01-08  0:03 UTC (permalink / raw)
  To: Ammar Faizi, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin
  Cc: x86-ml, lkml, GNU/Weeb Mailing List, Michael Matz, H.J. Lu,
	Willy Tarreau

On 1/7/22 15:52, Ammar Faizi wrote:
> There was a controversial discussion about the wording in the System
> V ABI document regarding what registers the kernel is allowed to
> clobber when the userspace executes syscall.
> 
> The resolution of the discussion was reviewing the clobber list in
> the glibc source. For a historical reason in the glibc source, the
> kernel must restore all registers before returning to the userspace
> (except for rax, rcx and r11).
> 
> Link: https://lore.kernel.org/lkml/alpine.LSU.2.20.2110131601000.26294@wotan.suse.de/
> Link: https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/25
> 
> This adds info about registers on exit.
> 
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Michael Matz <matz@suse.de>
> Cc: "H.J. Lu" <hjl.tools@gmail.com>
> Cc: Willy Tarreau <w@1wt.eu>
> Cc: x86-ml <x86@kernel.org>
> Cc: lkml <linux-kernel@vger.kernel.org>
> Cc: GNU/Weeb Mailing List <gwml@gnuweeb.org>
> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
> ---
> 
> Quoted the full comment in that file after patched, so it's easier to
> review:
> /*
>   * 64-bit SYSCALL instruction entry. Up to 6 arguments in registers.
>   *
>   * This is the only entry point used for 64-bit system calls.  The
>   * hardware interface is reasonably well designed and the register to
>   * argument mapping Linux uses fits well with the registers that are
>   * available when SYSCALL is used.
>   *
>   * SYSCALL instructions can be found inlined in libc implementations as
>   * well as some other programs and libraries.  There are also a handful
>   * of SYSCALL instructions in the vDSO used, for example, as a
>   * clock_gettimeofday fallback.
>   *
>   * 64-bit SYSCALL saves rip to rcx, clears rflags.RF, then saves rflags to r11,
>   * then loads new ss, cs, and rip from previously programmed MSRs.
>   * rflags gets masked by a value from another MSR (so CLD and CLAC
>   * are not needed). SYSCALL does not save anything on the stack
>   * and does not change rsp.
>   *
>   * Registers on entry:
>   * rax  system call number
>   * rcx  return address
>   * r11  saved rflags (note: r11 is callee-clobbered register in C ABI)
>   * rdi  arg0
>   * rsi  arg1
>   * rdx  arg2
>   * r10  arg3 (needs to be moved to rcx to conform to C ABI)
>   * r8   arg4
>   * r9   arg5
>   * (note: r12-r15, rbp, rbx are callee-preserved in C ABI)
>   *
>   * Only called from user space.
>   *
>   * Registers on exit:
>   * rax  syscall return value
>   * rcx  return address
>   * r11  rflags
>   *
>   * For a historical reason in the glibc source, the kernel must restore all
>   * registers except the rax (syscall return value) before returning to the
>   * userspace.
>   *
>   * In other words, with respect to the userspace, when the kernel returns
>   * to the userspace, only 3 registers are clobbered, they are rax, rcx,
>   * and r11.
>   *
>   * When user can change pt_regs->foo always force IRET. That is because
>   * it deals with uncanonical addresses better. SYSRET has trouble
>   * with them due to bugs in both AMD and Intel CPUs.
>   */
> 
> ---
> 
>   arch/x86/entry/entry_64.S | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index e432dd075291..1111fff2e05f 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -79,6 +79,19 @@
>    *
>    * Only called from user space.
>    *
> + * Registers on exit:
> + * rax  syscall return value
> + * rcx  return address
> + * r11  rflags
> + *
> + * For a historical reason in the glibc source, the kernel must restore all
> + * registers except the rax (syscall return value) before returning to the
> + * userspace.
> + *
> + * In other words, with respect to the userspace, when the kernel returns
> + * to the userspace, only 3 registers are clobbered, they are rax, rcx,
> + * and r11.
> + *

I would say this much more concisely:

The Linux kernel preserves all registers (even C callee-clobbered 
registers) except for rax, rcx and r11 across system calls, and existing 
user code relies on this behavior.

>    * When user can change pt_regs->foo always force IRET. That is because
>    * it deals with uncanonical addresses better. SYSRET has trouble
>    * with them due to bugs in both AMD and Intel CPUs.
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 2/3] x86/entry/64: Add info about registers on exit
  2022-01-08  0:03   ` Andy Lutomirski
@ 2022-01-08  0:34     ` Ammar Faizi
  0 siblings, 0 replies; 9+ messages in thread
From: Ammar Faizi @ 2022-01-08  0:34 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin
  Cc: x86-ml, lkml, GNU/Weeb Mailing List, Michael Matz, H.J. Lu,
	Willy Tarreau


[-- Attachment #1.1: Type: text/plain, Size: 2365 bytes --]

On 1/8/22 7:03 AM, Andy Lutomirski wrote:
> On 1/7/22 15:52, Ammar Faizi wrote:
>> There was a controversial discussion about the wording in the System
>> V ABI document regarding what registers the kernel is allowed to
>> clobber when the userspace executes syscall.
>>
>> The resolution of the discussion was reviewing the clobber list in
>> the glibc source. For a historical reason in the glibc source, the
>> kernel must restore all registers before returning to the userspace
>> (except for rax, rcx and r11).
>>
>> Link: https://lore.kernel.org/lkml/alpine.LSU.2.20.2110131601000.26294@wotan.suse.de/
>> Link: https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/25
>>
>> This adds info about registers on exit.
>>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Michael Matz <matz@suse.de>
>> Cc: "H.J. Lu" <hjl.tools@gmail.com>
>> Cc: Willy Tarreau <w@1wt.eu>
>> Cc: x86-ml <x86@kernel.org>
>> Cc: lkml <linux-kernel@vger.kernel.org>
>> Cc: GNU/Weeb Mailing List <gwml@gnuweeb.org>
>> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
>> ---
[...]
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index e432dd075291..1111fff2e05f 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -79,6 +79,19 @@
>>    *
>>    * Only called from user space.
>>    *
>> + * Registers on exit:
>> + * rax  syscall return value
>> + * rcx  return address
>> + * r11  rflags
>> + *
>> + * For a historical reason in the glibc source, the kernel must restore all
>> + * registers except the rax (syscall return value) before returning to the
>> + * userspace.
>> + *
>> + * In other words, with respect to the userspace, when the kernel returns
>> + * to the userspace, only 3 registers are clobbered, they are rax, rcx,
>> + * and r11.
>> + *
> 
> I would say this much more concisely:
> 
> The Linux kernel preserves all registers (even C callee-clobbered
> registers) except for rax, rcx and r11 across system calls, and
> existing user code relies on this behavior.

Agree, I will take that as Suggested-by in the v2.

-- 
Ammar Faizi


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 3/3] Documentation: x86-64: Document registers on entry and exit
  2022-01-08  0:02   ` Andy Lutomirski
@ 2022-01-08  0:38     ` Ammar Faizi
  2022-01-21 13:32     ` Borislav Petkov
  1 sibling, 0 replies; 9+ messages in thread
From: Ammar Faizi @ 2022-01-08  0:38 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin
  Cc: x86-ml, lkml, GNU/Weeb Mailing List, Michael Matz, H.J. Lu,
	Jonathan Corbet, Willy Tarreau


[-- Attachment #1.1: Type: text/plain, Size: 1615 bytes --]

On 1/8/22 7:02 AM, Andy Lutomirski wrote:
> On 1/7/22 15:52, Ammar Faizi wrote:
>> There was a controversial discussion about the wording in the System
>> V ABI document regarding what registers the kernel is allowed to
>> clobber when the userspace executes syscall.
>>
>> The resolution of the discussion was reviewing the clobber list in
>> the glibc source. For a historical reason in the glibc source, the
>> kernel must restore all registers before returning to the userspace
>> (except for rax, rcx and r11).
>>
[...]
>> diff --git a/Documentation/x86/entry_64.rst b/Documentation/x86/entry_64.rst
>> index e433e08f7018..3f2007e2a938 100644
>> --- a/Documentation/x86/entry_64.rst
>> +++ b/Documentation/x86/entry_64.rst
>> @@ -108,3 +108,50 @@ We try to only use IST entries and the paranoid entry code for vectors
>>   that absolutely need the more expensive check for the GS base - and we
>>   generate all 'normal' entry points with the regular (faster) paranoid=0
>>   variant.
>> +
>> +
>> +Registers on entry:
>> +-------------------
>
> This is SYSCALL64 registers on entry, not general registers on entry.
> Also, this has little to do with the entry logic, so it probably
> doesn't belong in this file.

Ah right, I should be more specific saying it's for syscall64 entry
as there are 6 entries mentioned in this document.

Should syscall64 entry topic be documented? If not I will drop it,
otherwise suggest me a place for it.

I think we can document it here, but it needs to be more specific
saying it's for syscall64 entry.

Jonathan?

-- 
Ammar Faizi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 3/3] Documentation: x86-64: Document registers on entry and exit
  2022-01-08  0:02   ` Andy Lutomirski
  2022-01-08  0:38     ` Ammar Faizi
@ 2022-01-21 13:32     ` Borislav Petkov
  1 sibling, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2022-01-21 13:32 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ammar Faizi, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	H. Peter Anvin, x86-ml, lkml, GNU/Weeb Mailing List,
	Michael Matz, H.J. Lu, Jonathan Corbet, Willy Tarreau

On Fri, Jan 07, 2022 at 04:02:27PM -0800, Andy Lutomirski wrote:
> This is SYSCALL64 registers on entry, not general registers on entry. Also,
> this has little to do with the entry logic, so it probably doesn't belong in
> this file.

Right, except that syscall is also a kernel entry point so it kinda
belongs in a documentation file called "entry". :)

Srsly, I'd really like to keep the note about which registers glibc
considers clobbered and which not, documented somewhere as that is
practically an ABI which is not (yet) in the psABI doc.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-01-21 13:32 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-07 23:52 [PATCH v1 0/3] x86-64 entry documentation and clean up Ammar Faizi
2022-01-07 23:52 ` [PATCH v1 1/3] x86/entry/64: Clean up spaces after the instruction Ammar Faizi
2022-01-07 23:52 ` [PATCH v1 2/3] x86/entry/64: Add info about registers on exit Ammar Faizi
2022-01-08  0:03   ` Andy Lutomirski
2022-01-08  0:34     ` Ammar Faizi
2022-01-07 23:52 ` [PATCH v1 3/3] Documentation: x86-64: Document registers on entry and exit Ammar Faizi
2022-01-08  0:02   ` Andy Lutomirski
2022-01-08  0:38     ` Ammar Faizi
2022-01-21 13:32     ` Borislav Petkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).