linux-sgx.vger.kernel.org archive mirror
From: Sean Christopherson <sean.j.christopherson@intel.com>
To: "Xing, Cedric" <cedric.xing@intel.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>,
	linux-sgx@vger.kernel.org
Subject: Re: [PATCH for_v23 16/16] x86/vdso: sgx: Rework __vdso_sgx_enter_enclave() to prefer "no callback"
Date: Wed, 9 Oct 2019 12:10:03 -0700	[thread overview]
Message-ID: <20191009191003.GD19952@linux.intel.com> (raw)
In-Reply-To: <15c03e7e-cd98-5ea2-82a1-a11bec4abe2a@intel.com>

On Wed, Oct 09, 2019 at 11:00:55AM -0700, Xing, Cedric wrote:
> On 10/7/2019 9:46 PM, Sean Christopherson wrote:
> >Rework __vdso_sgx_enter_enclave() to prioritize the flow where userspace
> >is not providing a callback, which is the preferred method of operation.
> 
> Processors have branch predictors, so your "prioritizing" may not get what
> you want.
> 
> But if you still insist, a simple ht/hnt prefix on the original branch
> instruction would have sufficed. Rearranging the code blocks is
> unnecessary.

Tying into the other thread, it's not a matter of absolute necessity; it's
a matter of us providing the best code possible.

> A caveat though, for any given process, whether it supplies a callback or
> not is usually hard-coded. So either it always takes the callback path, or
> it always takes the other. And the branch predictor will do well in both
> cases. It's usually unwise to apply ht/hnt prefixes.

Readability is the primary concern; performance is a secondary concern.  For
a random Joe user, this:

   0x0000000000000a20 <+0>:	push   %rbp
   0x0000000000000a21 <+1>:	mov    %rsp,%rbp
   0x0000000000000a24 <+4>:	cmp    $0x2,%eax
   0x0000000000000a27 <+7>:	jb     0xa46 <__vdso_sgx_enter_enclave+38>
   0x0000000000000a29 <+9>:	cmp    $0x3,%eax
   0x0000000000000a2c <+12>:	ja     0xa46 <__vdso_sgx_enter_enclave+38>
   0x0000000000000a2e <+14>:	mov    0x10(%rbp),%rbx
   0x0000000000000a32 <+18>:	lea    0x0(%rip),%rcx        # 0xa39 <__vdso_sgx_enter_enclave+25>
   0x0000000000000a39 <+25>:	enclu  
   0x0000000000000a3c <+28>:	xor    %eax,%eax
   0x0000000000000a3e <+30>:	cmpl   $0x0,0x20(%rbp)
   0x0000000000000a42 <+34>:	jne    0xa6b <__vdso_sgx_enter_enclave+75>
   0x0000000000000a44 <+36>:	leaveq 
   0x0000000000000a45 <+37>:	retq   

is easier to follow for the happy path than this:

   0x0000000000000a20 <+0>:	push   %rbp
   0x0000000000000a21 <+1>:	mov    %rsp,%rbp
   0x0000000000000a24 <+4>:	cmp    $0x2,%eax
   0x0000000000000a27 <+7>:	jb     0xa8e <__vdso_sgx_enter_enclave+110>
   0x0000000000000a29 <+9>:	cmp    $0x3,%eax
   0x0000000000000a2c <+12>:	ja     0xa8e <__vdso_sgx_enter_enclave+110>
   0x0000000000000a2e <+14>:	mov    0x10(%rbp),%rbx
   0x0000000000000a32 <+18>:	lea    0x0(%rip),%rcx        # 0xa39 <__vdso_sgx_enter_enclave+25>
   0x0000000000000a39 <+25>:	enclu  
   0x0000000000000a3c <+28>:	xor    %eax,%eax
   0x0000000000000a3e <+30>:	mov    %eax,%ecx
   0x0000000000000a40 <+32>:	mov    0x20(%rbp),%rax
   0x0000000000000a44 <+36>:	test   %rax,%rax
   0x0000000000000a47 <+39>:	je     0xa98 <__vdso_sgx_enter_enclave+120>
   0x0000000000000a49 <+41>:	mov    %rsp,%rbx
   0x0000000000000a4c <+44>:	and    $0xfffffffffffffff0,%rsp
   0x0000000000000a50 <+48>:	cld    
   0x0000000000000a51 <+49>:	pushq  0x18(%rbp)
   0x0000000000000a54 <+52>:	push   %rbx
   0x0000000000000a55 <+53>:	pushq  0x10(%rbp)
   0x0000000000000a58 <+56>:	callq  0xa62 <__vdso_sgx_enter_enclave+66>
   0x0000000000000a5d <+61>:	mov    %rbx,%rsp
   0x0000000000000a60 <+64>:	jmp    0xa24 <__vdso_sgx_enter_enclave+4>
   0x0000000000000a62 <+66>:	callq  0xa6e <__vdso_sgx_enter_enclave+78>
   0x0000000000000a67 <+71>:	pause  
   0x0000000000000a69 <+73>:	lfence 
   0x0000000000000a6c <+76>:	jmp    0xa67 <__vdso_sgx_enter_enclave+71>
   0x0000000000000a6e <+78>:	mov    %rax,(%rsp)
   0x0000000000000a72 <+82>:	retq   
   0x0000000000000a73 <+83>:	mov    0x18(%rbp),%rcx
   0x0000000000000a77 <+87>:	jrcxz  0xa87 <__vdso_sgx_enter_enclave+103>
   0x0000000000000a79 <+89>:	mov    %eax,(%rcx)
   0x0000000000000a7b <+91>:	mov    %di,0x4(%rcx)
   0x0000000000000a7f <+95>:	mov    %si,0x6(%rcx)
   0x0000000000000a83 <+99>:	mov    %rdx,0x8(%rcx)
   0x0000000000000a87 <+103>:	mov    $0xfffffff2,%eax
   0x0000000000000a8c <+108>:	jmp    0xa3e <__vdso_sgx_enter_enclave+30>
   0x0000000000000a8e <+110>:	cmp    $0x0,%eax
   0x0000000000000a91 <+113>:	jle    0xa98 <__vdso_sgx_enter_enclave+120>
   0x0000000000000a93 <+115>:	mov    $0xffffffea,%eax
   0x0000000000000a98 <+120>:	leaveq 
   0x0000000000000a99 <+121>:	retq  

and much easier to follow than the version where the exception struct is
filled in on a synchronous EEXIT:

   0x0000000000000a20 <+0>:	push   %rbp
   0x0000000000000a21 <+1>:	mov    %rsp,%rbp
   0x0000000000000a24 <+4>:	cmp    $0x2,%eax
   0x0000000000000a27 <+7>:	jb     0xa90 <__vdso_sgx_enter_enclave+112>
   0x0000000000000a29 <+9>:	cmp    $0x3,%eax
   0x0000000000000a2c <+12>:	ja     0xa90 <__vdso_sgx_enter_enclave+112>
   0x0000000000000a2e <+14>:	mov    0x10(%rbp),%rbx
   0x0000000000000a32 <+18>:	lea    0x0(%rip),%rcx        # 0xa39 <__vdso_sgx_enter_enclave+25>
   0x0000000000000a39 <+25>:	enclu  
   0x0000000000000a3c <+28>:	xor    %ebx,%ebx
   0x0000000000000a3e <+30>:	mov    0x18(%rbp),%rcx
   0x0000000000000a42 <+34>:	jrcxz  0xa54 <__vdso_sgx_enter_enclave+52>
   0x0000000000000a44 <+36>:	mov    %eax,(%rcx)
   0x0000000000000a46 <+38>:	jae    0xa54 <__vdso_sgx_enter_enclave+52>
   0x0000000000000a48 <+40>:	mov    %di,0x4(%rcx)
   0x0000000000000a4c <+44>:	mov    %si,0x6(%rcx)
   0x0000000000000a50 <+48>:	mov    %rdx,0x8(%rcx)
   0x0000000000000a54 <+52>:	mov    0x20(%rbp),%rax
   0x0000000000000a58 <+56>:	test   %rax,%rax
   0x0000000000000a5b <+59>:	cmove  %rbx,%rax
   0x0000000000000a5f <+63>:	je     0xa9a <__vdso_sgx_enter_enclave+122>

   ...

   0x0000000000000a9a <+122>:	leaveq 
   0x0000000000000a9b <+123>:	retq   


> >Using a callback requires a retpoline, and the only known motivation for
> >employing a callback is to allow the enclave to muck with the stack of
> >the untrusted runtime.
> >
> >Opportunistically replace the majority of the local labels with local
> >symbol names to improve the readability of the code.
> >
> >Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> >---
> >  arch/x86/entry/vdso/vsgx_enter_enclave.S | 120 ++++++++++++++---------
> >  1 file changed, 71 insertions(+), 49 deletions(-)
> >
> >diff --git a/arch/x86/entry/vdso/vsgx_enter_enclave.S b/arch/x86/entry/vdso/vsgx_enter_enclave.S
> >index de54e47c83f4..fc5622dcd2fa 100644
> >--- a/arch/x86/entry/vdso/vsgx_enter_enclave.S
> >+++ b/arch/x86/entry/vdso/vsgx_enter_enclave.S
> >@@ -85,75 +85,97 @@ ENTRY(__vdso_sgx_enter_enclave)
> >  	mov	%rsp, %rbp
> >  	.cfi_def_cfa_register	%rbp
> >-1:	/* EENTER <= leaf <= ERESUME */
> >+.Lenter_enclave:
> >+	/* EENTER <= leaf <= ERESUME */
> >  	cmp	$0x2, %eax
> >-	jb	6f
> >+	jb	.Linvalid_leaf
> >  	cmp	$0x3, %eax
> >-	ja	6f
> >+	ja	.Linvalid_leaf
> >  	/* Load TCS and AEP */
> >  	mov	0x10(%rbp), %rbx
> >-	lea	2f(%rip), %rcx
> >+	lea	.Lasync_exit_pointer(%rip), %rcx
> >  	/* Single ENCLU serving as both EENTER and AEP (ERESUME) */
> >-2:	enclu
> >+.Lasync_exit_pointer:
> >+.Lenclu_eenter_eresume:
> >+	enclu
> >-	/* EEXIT path */
> >+	/* EEXIT jumps here unless the enclave is doing something fancy. */
> >  	xor	%eax, %eax
> >-3:	mov	%eax, %ecx
> >-
> >-	/* Call the exit handler if supplied */
> >-	mov	0x20(%rbp), %rax
> >-	test	%rax, %rax
> >-	jz	7f
> 
> >-	/* Align stack per x86_64 ABI. The original %rsp is saved in %rbx to be
> >-	 * restored after the exit handler returns. */
> >+
> >+	/* Invoke userspace's exit handler if one was provided. */
> >+.Lhandle_exit:
> >+	cmp	$0, 0x20(%rbp)
> >+	jne	.Linvoke_userspace_handler
> >+
> >+.Lout:
> >+	leave
> >+	.cfi_def_cfa		%rsp, 8
> >+	ret
> >+
> >+.Linvalid_leaf:
> 
> Please set frame pointer back to %rbp here, or stack unwinding will fail.

Sorry, coffee isn't doing its job.  What's getting crushed, and where?

> >+	mov	$(-EINVAL), %eax
> >+	jmp	.Lout
> >+
> >+.Lhandle_exception:
> >+	mov	0x18(%rbp), %rcx
> >+	test    %rcx, %rcx
> >+	je	.Lskip_exception_info
> 
> A single "jrcxz .Lskip_exception_info" is equivalent to the above 2
> instructions combined.

Both implementations take a single uop on CPUs that support SGX.  IMO,
using the simpler and more common instructions is more universally
readable.
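For reference, the two equivalent forms being discussed, side by side
(a sketch using the labels from the patch):

```asm
	/* Form in the patch: two common instructions; TEST+Jcc macro-fuse
	 * into a single uop on the CPUs in question. */
	test	%rcx, %rcx
	je	.Lskip_exception_info

	/* Cedric's suggestion: one compact instruction with the same
	 * behavior (jump if %rcx == 0), also a single uop, but rarer and
	 * arguably less readable. */
	jrcxz	.Lskip_exception_info
```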

> >+
> >+	/* Fill optional exception info. */
> >+	mov	%eax, EX_LEAF(%rcx)
> >+	mov	%di,  EX_TRAPNR(%rcx)
> >+	mov	%si,  EX_ERROR_CODE(%rcx)
> >+	mov	%rdx, EX_ADDRESS(%rcx)
> >+.Lskip_exception_info:
> >+	mov	$(-EFAULT), %eax
> >+	jmp	.Lhandle_exit
> >+
> >+.Linvoke_userspace_handler:
> >+	/*
> >+	 * Align stack per x86_64 ABI. Save the original %rsp in %rbx to be
> >+	 * restored after the callback returns.
> >+	 */
> >  	mov	%rsp, %rbx
> >  	and	$-0x10, %rsp
> >-	/* Clear RFLAGS.DF per x86_64 ABI */
> >-	cld
> >-	/* Parameters for the exit handler */
> >+
> >+	/* Push @e, u_rsp and @tcs as parameters to the callback. */
> >  	push	0x18(%rbp)
> >  	push	%rbx
> >  	push	0x10(%rbp)
> >-	/* Call *%rax via retpoline */
> >-	call	40f
> >-	/* Restore %rsp to its original value left off by the enclave from last
> >-	 * exit */
> >+
> >+	/* Pass the "return" value to the callback via %rcx. */
> >+	mov	%eax, %ecx
> 
> @e (ex_info) is almost always needed by every callback as it also serves as
> the "context pointer". The return value on the other hand is insignificant
> because it could be deduced from @e->EX_LEAF anyway. So I'd retain %rcx and
> push %rax to the stack instead, given the purpose of this patch is to
> squeeze out a bit performance.

Please take this up in patch 02/16, which actually introduced this change.
> 
> >+
> >+	/* Clear RFLAGS.DF per x86_64 ABI */
> >+	cld
> >+
> >+	/* Load the callback pointer to %rax and invoke it via retpoline. */
> >+	mov	0x20(%rbp), %rax
> 
> Per X86_64 ABI, %rsp shall be 16 bytes aligned before "call". But %rsp here
> doesn't look aligned properly.

Argh, I probably botched it back in patch 02/16 too.  I'll see if I can
add a check to verify %rsp alignment in the selftest; verifying via code
inspection is bound to be error-prone.

> >+	call	.Lretpoline
