From: Sean Christopherson <sean.j.christopherson@intel.com>
To: "Xing, Cedric" <cedric.xing@intel.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>,
linux-sgx@vger.kernel.org
Subject: Re: [PATCH for_v23 16/16] x86/vdso: sgx: Rework __vdso_sgx_enter_enclave() to prefer "no callback"
Date: Wed, 9 Oct 2019 12:10:03 -0700
Message-ID: <20191009191003.GD19952@linux.intel.com>
In-Reply-To: <15c03e7e-cd98-5ea2-82a1-a11bec4abe2a@intel.com>
On Wed, Oct 09, 2019 at 11:00:55AM -0700, Xing, Cedric wrote:
> On 10/7/2019 9:46 PM, Sean Christopherson wrote:
> >Rework __vdso_sgx_enter_enclave() to prioritize the flow where userspace
> >is not providing a callback, which is the preferred method of operation.
>
> Processors have branch predictors, so your "prioritizing" may not get you
> what you want.
>
> But if you still insist, a simple ht/hnt prefix on the original branch
> instruction would have sufficed. Rearranging the code blocks is
> unnecessary.
Tying into the other thread, it's not a matter of absolute necessity; it's
a matter of us providing the best code possible.
> A caveat though: for any given process, whether it supplies a callback or
> not is usually hard-coded. So either it always takes the callback path, or
> it always takes the other, and the branch predictor will do well in both
> cases. It's usually unwise to apply ht/hnt prefixes.
Readability is the primary concern; performance is a secondary concern. For
a random Joe user, this:
0x0000000000000a20 <+0>: push %rbp
0x0000000000000a21 <+1>: mov %rsp,%rbp
0x0000000000000a24 <+4>: cmp $0x2,%eax
0x0000000000000a27 <+7>: jb 0xa46 <__vdso_sgx_enter_enclave+38>
0x0000000000000a29 <+9>: cmp $0x3,%eax
0x0000000000000a2c <+12>: ja 0xa46 <__vdso_sgx_enter_enclave+38>
0x0000000000000a2e <+14>: mov 0x10(%rbp),%rbx
0x0000000000000a32 <+18>: lea 0x0(%rip),%rcx # 0xa39 <__vdso_sgx_enter_enclave+25>
0x0000000000000a39 <+25>: enclu
0x0000000000000a3c <+28>: xor %eax,%eax
0x0000000000000a3e <+30>: cmpl $0x0,0x20(%rbp)
0x0000000000000a42 <+34>: jne 0xa6b <__vdso_sgx_enter_enclave+75>
0x0000000000000a44 <+36>: leaveq
0x0000000000000a45 <+37>: retq
is easier to follow for the happy path than this:
0x0000000000000a20 <+0>: push %rbp
0x0000000000000a21 <+1>: mov %rsp,%rbp
0x0000000000000a24 <+4>: cmp $0x2,%eax
0x0000000000000a27 <+7>: jb 0xa8e <__vdso_sgx_enter_enclave+110>
0x0000000000000a29 <+9>: cmp $0x3,%eax
0x0000000000000a2c <+12>: ja 0xa8e <__vdso_sgx_enter_enclave+110>
0x0000000000000a2e <+14>: mov 0x10(%rbp),%rbx
0x0000000000000a32 <+18>: lea 0x0(%rip),%rcx # 0xa39 <__vdso_sgx_enter_enclave+25>
0x0000000000000a39 <+25>: enclu
0x0000000000000a3c <+28>: xor %eax,%eax
0x0000000000000a3e <+30>: mov %eax,%ecx
0x0000000000000a40 <+32>: mov 0x20(%rbp),%rax
0x0000000000000a44 <+36>: test %rax,%rax
0x0000000000000a47 <+39>: je 0xa98 <__vdso_sgx_enter_enclave+120>
0x0000000000000a49 <+41>: mov %rsp,%rbx
0x0000000000000a4c <+44>: and $0xfffffffffffffff0,%rsp
0x0000000000000a50 <+48>: cld
0x0000000000000a51 <+49>: pushq 0x18(%rbp)
0x0000000000000a54 <+52>: push %rbx
0x0000000000000a55 <+53>: pushq 0x10(%rbp)
0x0000000000000a58 <+56>: callq 0xa62 <__vdso_sgx_enter_enclave+66>
0x0000000000000a5d <+61>: mov %rbx,%rsp
0x0000000000000a60 <+64>: jmp 0xa24 <__vdso_sgx_enter_enclave+4>
0x0000000000000a62 <+66>: callq 0xa6e <__vdso_sgx_enter_enclave+78>
0x0000000000000a67 <+71>: pause
0x0000000000000a69 <+73>: lfence
0x0000000000000a6c <+76>: jmp 0xa67 <__vdso_sgx_enter_enclave+71>
0x0000000000000a6e <+78>: mov %rax,(%rsp)
0x0000000000000a72 <+82>: retq
0x0000000000000a73 <+83>: mov 0x18(%rbp),%rcx
0x0000000000000a77 <+87>: jrcxz 0xa87 <__vdso_sgx_enter_enclave+103>
0x0000000000000a79 <+89>: mov %eax,(%rcx)
0x0000000000000a7b <+91>: mov %di,0x4(%rcx)
0x0000000000000a7f <+95>: mov %si,0x6(%rcx)
0x0000000000000a83 <+99>: mov %rdx,0x8(%rcx)
0x0000000000000a87 <+103>: mov $0xfffffff2,%eax
0x0000000000000a8c <+108>: jmp 0xa3e <__vdso_sgx_enter_enclave+30>
0x0000000000000a8e <+110>: cmp $0x0,%eax
0x0000000000000a91 <+113>: jle 0xa98 <__vdso_sgx_enter_enclave+120>
0x0000000000000a93 <+115>: mov $0xffffffea,%eax
0x0000000000000a98 <+120>: leaveq
0x0000000000000a99 <+121>: retq
and much easier to follow than the version where the exception struct is
filled in on a synchronous EEXIT:
0x0000000000000a20 <+0>: push %rbp
0x0000000000000a21 <+1>: mov %rsp,%rbp
0x0000000000000a24 <+4>: cmp $0x2,%eax
0x0000000000000a27 <+7>: jb 0xa90 <__vdso_sgx_enter_enclave+112>
0x0000000000000a29 <+9>: cmp $0x3,%eax
0x0000000000000a2c <+12>: ja 0xa90 <__vdso_sgx_enter_enclave+112>
0x0000000000000a2e <+14>: mov 0x10(%rbp),%rbx
0x0000000000000a32 <+18>: lea 0x0(%rip),%rcx # 0xa39 <__vdso_sgx_enter_enclave+25>
0x0000000000000a39 <+25>: enclu
0x0000000000000a3c <+28>: xor %ebx,%ebx
0x0000000000000a3e <+30>: mov 0x18(%rbp),%rcx
0x0000000000000a42 <+34>: jrcxz 0xa54 <__vdso_sgx_enter_enclave+52>
0x0000000000000a44 <+36>: mov %eax,(%rcx)
0x0000000000000a46 <+38>: jae 0xa54 <__vdso_sgx_enter_enclave+52>
0x0000000000000a48 <+40>: mov %di,0x4(%rcx)
0x0000000000000a4c <+44>: mov %si,0x6(%rcx)
0x0000000000000a50 <+48>: mov %rdx,0x8(%rcx)
0x0000000000000a54 <+52>: mov 0x20(%rbp),%rax
0x0000000000000a58 <+56>: test %rax,%rax
0x0000000000000a5b <+59>: cmove %rbx,%rax
0x0000000000000a5f <+63>: je 0xa9a <__vdso_sgx_enter_enclave+122>
...
0x0000000000000a9a <+122>: leaveq
0x0000000000000a9b <+123>: retq
> >Using a callback requires a retpoline, and the only known motivation for
> >employing a callback is to allow the enclave to muck with the stack of
> >the untrusted runtime.
> >
> >Opportunistically replace the majority of the local labels with local
> >symbol names to improve the readability of the code.
> >
> >Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> >---
> > arch/x86/entry/vdso/vsgx_enter_enclave.S | 120 ++++++++++++++---------
> > 1 file changed, 71 insertions(+), 49 deletions(-)
> >
> >diff --git a/arch/x86/entry/vdso/vsgx_enter_enclave.S b/arch/x86/entry/vdso/vsgx_enter_enclave.S
> >index de54e47c83f4..fc5622dcd2fa 100644
> >--- a/arch/x86/entry/vdso/vsgx_enter_enclave.S
> >+++ b/arch/x86/entry/vdso/vsgx_enter_enclave.S
> >@@ -85,75 +85,97 @@ ENTRY(__vdso_sgx_enter_enclave)
> > mov %rsp, %rbp
> > .cfi_def_cfa_register %rbp
> >-1: /* EENTER <= leaf <= ERESUME */
> >+.Lenter_enclave:
> >+ /* EENTER <= leaf <= ERESUME */
> > cmp $0x2, %eax
> >- jb 6f
> >+ jb .Linvalid_leaf
> > cmp $0x3, %eax
> >- ja 6f
> >+ ja .Linvalid_leaf
> > /* Load TCS and AEP */
> > mov 0x10(%rbp), %rbx
> >- lea 2f(%rip), %rcx
> >+ lea .Lasync_exit_pointer(%rip), %rcx
> > /* Single ENCLU serving as both EENTER and AEP (ERESUME) */
> >-2: enclu
> >+.Lasync_exit_pointer:
> >+.Lenclu_eenter_eresume:
> >+ enclu
> >- /* EEXIT path */
> >+ /* EEXIT jumps here unless the enclave is doing something fancy. */
> > xor %eax, %eax
> >-3: mov %eax, %ecx
> >-
> >- /* Call the exit handler if supplied */
> >- mov 0x20(%rbp), %rax
> >- test %rax, %rax
> >- jz 7f
>
> >- /* Align stack per x86_64 ABI. The original %rsp is saved in %rbx to be
> >- * restored after the exit handler returns. */
> >+
> >+ /* Invoke userspace's exit handler if one was provided. */
> >+.Lhandle_exit:
> >+ cmp $0, 0x20(%rbp)
> >+ jne .Linvoke_userspace_handler
> >+
> >+.Lout:
> >+ leave
> >+ .cfi_def_cfa %rsp, 8
> >+ ret
> >+
> >+.Linvalid_leaf:
>
> Please set frame pointer back to %rbp here, or stack unwinding will fail.
Sorry, coffee isn't doing its job; what's getting crushed, and where?
> >+ mov $(-EINVAL), %eax
> >+ jmp .Lout
> >+
> >+.Lhandle_exception:
> >+ mov 0x18(%rbp), %rcx
> >+ test %rcx, %rcx
> >+ je .Lskip_exception_info
>
> A single "jrcxz .Lskip_exception_info" is equivalent to the above 2
> instructions combined.
Both implementations take a single uop on CPUs that support SGX. IMO,
using the simpler and more common instructions is more universally
readable.
> >+
> >+ /* Fill optional exception info. */
> >+ mov %eax, EX_LEAF(%rcx)
> >+ mov %di, EX_TRAPNR(%rcx)
> >+ mov %si, EX_ERROR_CODE(%rcx)
> >+ mov %rdx, EX_ADDRESS(%rcx)
> >+.Lskip_exception_info:
> >+ mov $(-EFAULT), %eax
> >+ jmp .Lhandle_exit
> >+
> >+.Linvoke_userspace_handler:
> >+ /*
> >+ * Align stack per x86_64 ABI. Save the original %rsp in %rbx to be
> >+ * restored after the callback returns.
> >+ */
> > mov %rsp, %rbx
> > and $-0x10, %rsp
> >- /* Clear RFLAGS.DF per x86_64 ABI */
> >- cld
> >- /* Parameters for the exit handler */
> >+
> >+ /* Push @e, u_rsp and @tcs as parameters to the callback. */
> > push 0x18(%rbp)
> > push %rbx
> > push 0x10(%rbp)
> >- /* Call *%rax via retpoline */
> >- call 40f
> >- /* Restore %rsp to its original value left off by the enclave from last
> >- * exit */
> >+
> >+ /* Pass the "return" value to the callback via %rcx. */
> >+ mov %eax, %ecx
>
> @e (ex_info) is almost always needed by every callback, as it also serves
> as the "context pointer". The return value, on the other hand, is
> insignificant because it could be deduced from @e->EX_LEAF anyway. So I'd
> retain %rcx and push %rax to the stack instead, given that the purpose of
> this patch is to squeeze out a bit of performance.
Please take this up in patch 02/16, which actually introduced this change.
>
> >+
> >+ /* Clear RFLAGS.DF per x86_64 ABI */
> >+ cld
> >+
> >+ /* Load the callback pointer to %rax and invoke it via retpoline. */
> >+ mov 0x20(%rbp), %rax
>
> Per the x86_64 ABI, %rsp shall be 16-byte aligned before "call". But %rsp
> here doesn't look properly aligned.
Argh, I probably botched it back in patch 02/16 too. I'll see if I can
add a check to verify %rsp alignment in the selftest; verifying via code
inspection is bound to be error-prone.
> >+ call .Lretpoline