From: Peter Zijlstra <peterz@infradead.org>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
Andy Lutomirski <luto@kernel.org>,
x86@kernel.org, Kostya Serebryany <kcc@google.com>,
Andrey Ryabinin <ryabinin.a.a@gmail.com>,
Andrey Konovalov <andreyknvl@gmail.com>,
Alexander Potapenko <glider@google.com>,
Taras Madan <tarasmadan@google.com>,
Dmitry Vyukov <dvyukov@google.com>,
"H . J . Lu" <hjl.tools@gmail.com>,
Andi Kleen <ak@linux.intel.com>,
Rick Edgecombe <rick.p.edgecombe@intel.com>,
Bharata B Rao <bharata@amd.com>,
Jacob Pan <jacob.jun.pan@linux.intel.com>,
Ashok Raj <ashok.raj@intel.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Sami Tolvanen <samitolvanen@google.com>,
ndesaulniers@google.com, joao@overdrivepizza.com
Subject: Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
Date: Tue, 17 Jan 2023 16:02:06 +0100 [thread overview]
Message-ID: <Y8a4bmCU9wsenvvF@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20230117135703.voaumisreld7crfb@box>
On Tue, Jan 17, 2023 at 04:57:03PM +0300, Kirill A. Shutemov wrote:
> On Tue, Jan 17, 2023 at 02:05:22PM +0100, Peter Zijlstra wrote:
> > On Wed, Jan 11, 2023 at 03:37:27PM +0300, Kirill A. Shutemov wrote:
> >
> > > #define __untagged_addr(untag_mask, addr)
> > > u64 __addr = (__force u64)(addr); \
> > > - s64 sign = (s64)__addr >> 63; \
> > > - __addr &= untag_mask | sign; \
> > > + if (static_branch_likely(&tagged_addr_key)) { \
> > > + s64 sign = (s64)__addr >> 63; \
> > > + __addr &= untag_mask | sign; \
> > > + } \
> > > (__force __typeof__(addr))__addr; \
> > > })
> > >
> > > #define untagged_addr(addr) __untagged_addr(current_untag_mask(), addr)
> >
> > Is the compiler clever enough to put the memop inside the branch?
>
> Hm. You mean current_untag_mask() inside static_branch_likely()?
>
> But it is preprocessor who does this, not compiler. So, yes, the memop is
> inside the branch.
>
> Or I didn't understand your question.
Nah, call it a pre-lunch dip, I overlooked the whole CPP angle -- d'0h.
That said, I did just put it through a compiler to see wth it did and it
is pretty gross:
GCC-12.2:
0000 00000000000023b0 <write_ok_or_segv>:
0000 23b0: 48 89 fa mov %rdi,%rdx
0003 23b3: eb 76 jmp 242b <write_ok_or_segv+0x7b>
0005 23b5: 65 48 8b 0d 00 00 00 00 mov %gs:0x0(%rip),%rcx # 23bd <write_ok_or_segv+0xd> 23b9: R_X86_64_PC32 tlbstate_untag_mask-0x4
000d 23bd: 48 89 f8 mov %rdi,%rax
0010 23c0: 48 c1 f8 3f sar $0x3f,%rax
0014 23c4: 48 09 c8 or %rcx,%rax
0017 23c7: 48 21 f8 and %rdi,%rax
001a 23ca: 48 b9 00 f0 ff ff ff 7f 00 00 movabs $0x7ffffffff000,%rcx
0024 23d4: 48 39 f1 cmp %rsi,%rcx
0027 23d7: 72 14 jb 23ed <write_ok_or_segv+0x3d>
0029 23d9: 48 29 f1 sub %rsi,%rcx
002c 23dc: be 01 00 00 00 mov $0x1,%esi
0031 23e1: 48 39 c1 cmp %rax,%rcx
0034 23e4: 72 07 jb 23ed <write_ok_or_segv+0x3d>
0036 23e6: 89 f0 mov %esi,%eax
0038 23e8: e9 00 00 00 00 jmp 23ed <write_ok_or_segv+0x3d> 23e9: R_X86_64_PLT32 __x86_return_thunk-0x4
003d 23ed: 65 48 8b 04 25 00 00 00 00 mov %gs:0x0,%rax 23f2: R_X86_64_32S pcpu_hot
0046 23f6: 48 89 90 68 0b 00 00 mov %rdx,0xb68(%rax)
004d 23fd: be 01 00 00 00 mov $0x1,%esi
0052 2402: bf 0b 00 00 00 mov $0xb,%edi
0057 2407: 48 c7 80 78 0b 00 00 06 00 00 00 movq $0x6,0xb78(%rax)
0062 2412: 48 c7 80 70 0b 00 00 0e 00 00 00 movq $0xe,0xb70(%rax)
006d 241d: e8 00 00 00 00 call 2422 <write_ok_or_segv+0x72> 241e: R_X86_64_PLT32 force_sig_fault-0x4
0072 2422: 31 f6 xor %esi,%esi
0074 2424: 89 f0 mov %esi,%eax
0076 2426: e9 00 00 00 00 jmp 242b <write_ok_or_segv+0x7b> 2427: R_X86_64_PLT32 __x86_return_thunk-0x4
007b 242b: 48 89 f8 mov %rdi,%rax
007e 242e: eb 9a jmp 23ca <write_ok_or_segv+0x1a>
Note the stupid jump to the end and back. Not all sites do this mind
you, but a fair number seem to do it.
Let me try llvm to see if it is any smarter.
CLANG-16:
0000 0000000000002d50 <write_ok_or_segv>:
0000 2d50: 41 57 push %r15
0002 2d52: 41 56 push %r14
0004 2d54: 41 54 push %r12
0006 2d56: 53 push %rbx
0007 2d57: 48 89 f3 mov %rsi,%rbx
000a 2d5a: 48 89 fa mov %rdi,%rdx
000d 2d5d: 49 89 fe mov %rdi,%r14
0010 2d60: eb 15 jmp 2d77 <write_ok_or_segv+0x27>
0012 2d62: 48 89 d0 mov %rdx,%rax
0015 2d65: 48 c1 f8 3f sar $0x3f,%rax
0019 2d69: 65 4c 8b 35 00 00 00 00 mov %gs:0x0(%rip),%r14 # 2d71 <write_ok_or_segv+0x21> 2d6d: R_X86_64_PC32 tlbstate_untag_mask-0x4
0021 2d71: 49 09 c6 or %rax,%r14
0024 2d74: 49 21 d6 and %rdx,%r14
0027 2d77: f3 0f 1e fa endbr64
002b 2d7b: 49 bf 00 f0 ff ff ff 7f 00 00 movabs $0x7ffffffff000,%r15
0035 2d85: 4d 89 fc mov %r15,%r12
0038 2d88: 49 29 dc sub %rbx,%r12
003b 2d8b: 72 05 jb 2d92 <write_ok_or_segv+0x42>
003d 2d8d: 4d 39 f4 cmp %r14,%r12
0040 2d90: 73 34 jae 2dc6 <write_ok_or_segv+0x76>
0042 2d92: 65 48 8b 05 00 00 00 00 mov %gs:0x0(%rip),%rax # 2d9a <write_ok_or_segv+0x4a> 2d96: R_X86_64_PC32 pcpu_hot-0x4
004a 2d9a: 48 c7 80 78 0b 00 00 06 00 00 00 movq $0x6,0xb78(%rax)
0055 2da5: 48 89 90 68 0b 00 00 mov %rdx,0xb68(%rax)
005c 2dac: 48 c7 80 70 0b 00 00 0e 00 00 00 movq $0xe,0xb70(%rax)
0067 2db7: bf 0b 00 00 00 mov $0xb,%edi
006c 2dbc: be 01 00 00 00 mov $0x1,%esi
0071 2dc1: e8 00 00 00 00 call 2dc6 <write_ok_or_segv+0x76> 2dc2: R_X86_64_PLT32 force_sig_fault-0x4
0076 2dc6: 4d 39 f4 cmp %r14,%r12
0079 2dc9: 0f 93 c1 setae %cl
007c 2dcc: 49 39 df cmp %rbx,%r15
007f 2dcf: 0f 93 c0 setae %al
0082 2dd2: 20 c8 and %cl,%al
0084 2dd4: 5b pop %rbx
0085 2dd5: 41 5c pop %r12
0087 2dd7: 41 5e pop %r14
0089 2dd9: 41 5f pop %r15
008b 2ddb: e9 00 00 00 00 jmp 2de0 <__pfx_get_gate_vma> 2ddc: R_X86_64_PLT32 __x86_return_thunk-0x4
Well, it got the untag right, but OMG.. :-( Joao, Sami, any idea why it
put an ENDBR in there?
next prev parent reply other threads:[~2023-01-17 15:04 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 01/17] x86/mm: Rework address range check in get_user() and put_user() Kirill A. Shutemov
2023-01-18 15:49 ` Peter Zijlstra
2023-01-18 15:59 ` Linus Torvalds
2023-01-18 16:48 ` Peter Zijlstra
2023-01-18 17:01 ` Linus Torvalds
2023-01-11 12:37 ` [PATCHv14 02/17] x86: Allow atomic MM_CONTEXT flags setting Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 03/17] x86: CPUID and CR3/CR4 flags for Linear Address Masking Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 04/17] x86/mm: Handle LAM on context switch Kirill A. Shutemov
2023-01-11 13:49 ` Linus Torvalds
2023-01-11 14:14 ` Kirill A. Shutemov
2023-01-11 14:37 ` [PATCHv14.1 " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 05/17] mm: Introduce untagged_addr_remote() Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 06/17] x86/uaccess: Provide untagged_addr() and remove tags before address check Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 07/17] x86/mm: Provide arch_prctl() interface for LAM Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user Kirill A. Shutemov
2023-01-17 13:05 ` Peter Zijlstra
2023-01-17 13:57 ` Kirill A. Shutemov
2023-01-17 15:02 ` Peter Zijlstra [this message]
2023-01-17 17:18 ` Linus Torvalds
2023-01-17 17:28 ` Linus Torvalds
2023-01-17 18:26 ` Nick Desaulniers
2023-01-17 18:33 ` Linus Torvalds
2023-01-17 19:17 ` Nick Desaulniers
2023-01-17 20:10 ` Linus Torvalds
2023-01-17 20:43 ` Linus Torvalds
2023-01-17 18:14 ` Peter Zijlstra
2023-01-17 18:21 ` Peter Zijlstra
2023-01-19 23:06 ` Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 09/17] mm: Expose untagging mask in /proc/$PID/status Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 10/17] iommu/sva: Replace pasid_valid() helper with mm_valid_pasid() Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 11/17] x86/mm/iommu/sva: Make LAM and SVA mutually exclusive Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 12/17] selftests/x86/lam: Add malloc and tag-bits test cases for linear-address masking Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 13/17] selftests/x86/lam: Add mmap and SYSCALL " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 14/17] selftests/x86/lam: Add io_uring " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 15/17] selftests/x86/lam: Add inherit " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 16/17] selftests/x86/lam: Add ARCH_FORCE_TAGGED_SVA " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 17/17] selftests/x86/lam: Add test cases for LAM vs thread creation Kirill A. Shutemov
2023-01-18 16:49 ` [PATCHv14 00/17] Linear Address Masking enabling Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y8a4bmCU9wsenvvF@hirez.programming.kicks-ass.net \
--to=peterz@infradead.org \
--cc=ak@linux.intel.com \
--cc=andreyknvl@gmail.com \
--cc=ashok.raj@intel.com \
--cc=bharata@amd.com \
--cc=dave.hansen@linux.intel.com \
--cc=dvyukov@google.com \
--cc=glider@google.com \
--cc=hjl.tools@gmail.com \
--cc=jacob.jun.pan@linux.intel.com \
--cc=joao@overdrivepizza.com \
--cc=kcc@google.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=luto@kernel.org \
--cc=ndesaulniers@google.com \
--cc=rick.p.edgecombe@intel.com \
--cc=ryabinin.a.a@gmail.com \
--cc=samitolvanen@google.com \
--cc=tarasmadan@google.com \
--cc=torvalds@linux-foundation.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.