All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	x86@kernel.org, Kostya Serebryany <kcc@google.com>,
	Andrey Ryabinin <ryabinin.a.a@gmail.com>,
	Andrey Konovalov <andreyknvl@gmail.com>,
	Alexander Potapenko <glider@google.com>,
	Taras Madan <tarasmadan@google.com>,
	Dmitry Vyukov <dvyukov@google.com>,
	"H . J . Lu" <hjl.tools@gmail.com>,
	Andi Kleen <ak@linux.intel.com>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	Bharata B Rao <bharata@amd.com>,
	Jacob Pan <jacob.jun.pan@linux.intel.com>,
	Ashok Raj <ashok.raj@intel.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Sami Tolvanen <samitolvanen@google.com>,
	ndesaulniers@google.com, joao@overdrivepizza.com
Subject: Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
Date: Tue, 17 Jan 2023 16:02:06 +0100	[thread overview]
Message-ID: <Y8a4bmCU9wsenvvF@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20230117135703.voaumisreld7crfb@box>

On Tue, Jan 17, 2023 at 04:57:03PM +0300, Kirill A. Shutemov wrote:
> On Tue, Jan 17, 2023 at 02:05:22PM +0100, Peter Zijlstra wrote:
> > On Wed, Jan 11, 2023 at 03:37:27PM +0300, Kirill A. Shutemov wrote:
> > 
> > >  #define __untagged_addr(untag_mask, addr)
> > >  	u64 __addr = (__force u64)(addr);				\
> > > -	s64 sign = (s64)__addr >> 63;					\
> > > -	__addr &= untag_mask | sign;					\
> > > +	if (static_branch_likely(&tagged_addr_key)) {			\
> > > +		s64 sign = (s64)__addr >> 63;				\
> > > +		__addr &= untag_mask | sign;				\
> > > +	}								\
> > >  	(__force __typeof__(addr))__addr;				\
> > >  })
> > >  
> > > #define untagged_addr(addr) __untagged_addr(current_untag_mask(), addr)
> > 
> > Is the compiler clever enough to put the memop inside the branch?
> 
> Hm. You mean current_untag_mask() inside static_branch_likely()?
> 
> But it is preprocessor who does this, not compiler. So, yes, the memop is
> inside the branch.
> 
> Or I didn't understand your question.

Nah, call it a pre-lunch dip, I overlooked the whole CPP angle -- d'0h.

That said, I did just put it through a compiler to see wth it did and it
is pretty gross:

GCC-12.2:

0000 00000000000023b0 <write_ok_or_segv>:
0000     23b0:  48 89 fa                mov    %rdi,%rdx
0003     23b3:  eb 76                   jmp    242b <write_ok_or_segv+0x7b>
0005     23b5:  65 48 8b 0d 00 00 00 00         mov    %gs:0x0(%rip),%rcx        # 23bd <write_ok_or_segv+0xd>  23b9: R_X86_64_PC32     tlbstate_untag_mask-0x4
000d     23bd:  48 89 f8                mov    %rdi,%rax
0010     23c0:  48 c1 f8 3f             sar    $0x3f,%rax
0014     23c4:  48 09 c8                or     %rcx,%rax
0017     23c7:  48 21 f8                and    %rdi,%rax
001a     23ca:  48 b9 00 f0 ff ff ff 7f 00 00   movabs $0x7ffffffff000,%rcx
0024     23d4:  48 39 f1                cmp    %rsi,%rcx
0027     23d7:  72 14                   jb     23ed <write_ok_or_segv+0x3d>
0029     23d9:  48 29 f1                sub    %rsi,%rcx
002c     23dc:  be 01 00 00 00          mov    $0x1,%esi
0031     23e1:  48 39 c1                cmp    %rax,%rcx
0034     23e4:  72 07                   jb     23ed <write_ok_or_segv+0x3d>
0036     23e6:  89 f0                   mov    %esi,%eax
0038     23e8:  e9 00 00 00 00          jmp    23ed <write_ok_or_segv+0x3d>     23e9: R_X86_64_PLT32    __x86_return_thunk-0x4
003d     23ed:  65 48 8b 04 25 00 00 00 00      mov    %gs:0x0,%rax     23f2: R_X86_64_32S      pcpu_hot
0046     23f6:  48 89 90 68 0b 00 00    mov    %rdx,0xb68(%rax)
004d     23fd:  be 01 00 00 00          mov    $0x1,%esi
0052     2402:  bf 0b 00 00 00          mov    $0xb,%edi
0057     2407:  48 c7 80 78 0b 00 00 06 00 00 00        movq   $0x6,0xb78(%rax)
0062     2412:  48 c7 80 70 0b 00 00 0e 00 00 00        movq   $0xe,0xb70(%rax)
006d     241d:  e8 00 00 00 00          call   2422 <write_ok_or_segv+0x72>     241e: R_X86_64_PLT32    force_sig_fault-0x4
0072     2422:  31 f6                   xor    %esi,%esi
0074     2424:  89 f0                   mov    %esi,%eax
0076     2426:  e9 00 00 00 00          jmp    242b <write_ok_or_segv+0x7b>     2427: R_X86_64_PLT32    __x86_return_thunk-0x4
007b     242b:  48 89 f8                mov    %rdi,%rax
007e     242e:  eb 9a                   jmp    23ca <write_ok_or_segv+0x1a>

Note the stupid jump to the end and back. Not all sites do this mind
you, but a fair number seem to do it.

Let me try llvm to see if it is any smarter.

CLANG-16:

0000 0000000000002d50 <write_ok_or_segv>:
0000     2d50:	41 57                	push   %r15
0002     2d52:	41 56                	push   %r14
0004     2d54:	41 54                	push   %r12
0006     2d56:	53                   	push   %rbx
0007     2d57:	48 89 f3             	mov    %rsi,%rbx
000a     2d5a:	48 89 fa             	mov    %rdi,%rdx
000d     2d5d:	49 89 fe             	mov    %rdi,%r14
0010     2d60:	eb 15                	jmp    2d77 <write_ok_or_segv+0x27>
0012     2d62:	48 89 d0             	mov    %rdx,%rax
0015     2d65:	48 c1 f8 3f          	sar    $0x3f,%rax
0019     2d69:	65 4c 8b 35 00 00 00 00 	mov    %gs:0x0(%rip),%r14        # 2d71 <write_ok_or_segv+0x21>	2d6d: R_X86_64_PC32	tlbstate_untag_mask-0x4
0021     2d71:	49 09 c6             	or     %rax,%r14
0024     2d74:	49 21 d6             	and    %rdx,%r14
0027     2d77:	f3 0f 1e fa          	endbr64
002b     2d7b:	49 bf 00 f0 ff ff ff 7f 00 00 	movabs $0x7ffffffff000,%r15
0035     2d85:	4d 89 fc             	mov    %r15,%r12
0038     2d88:	49 29 dc             	sub    %rbx,%r12
003b     2d8b:	72 05                	jb     2d92 <write_ok_or_segv+0x42>
003d     2d8d:	4d 39 f4             	cmp    %r14,%r12
0040     2d90:	73 34                	jae    2dc6 <write_ok_or_segv+0x76>
0042     2d92:	65 48 8b 05 00 00 00 00 	mov    %gs:0x0(%rip),%rax        # 2d9a <write_ok_or_segv+0x4a>	2d96: R_X86_64_PC32	pcpu_hot-0x4
004a     2d9a:	48 c7 80 78 0b 00 00 06 00 00 00 	movq   $0x6,0xb78(%rax)
0055     2da5:	48 89 90 68 0b 00 00 	mov    %rdx,0xb68(%rax)
005c     2dac:	48 c7 80 70 0b 00 00 0e 00 00 00 	movq   $0xe,0xb70(%rax)
0067     2db7:	bf 0b 00 00 00       	mov    $0xb,%edi
006c     2dbc:	be 01 00 00 00       	mov    $0x1,%esi
0071     2dc1:	e8 00 00 00 00       	call   2dc6 <write_ok_or_segv+0x76>	2dc2: R_X86_64_PLT32	force_sig_fault-0x4
0076     2dc6:	4d 39 f4             	cmp    %r14,%r12
0079     2dc9:	0f 93 c1             	setae  %cl
007c     2dcc:	49 39 df             	cmp    %rbx,%r15
007f     2dcf:	0f 93 c0             	setae  %al
0082     2dd2:	20 c8                	and    %cl,%al
0084     2dd4:	5b                   	pop    %rbx
0085     2dd5:	41 5c                	pop    %r12
0087     2dd7:	41 5e                	pop    %r14
0089     2dd9:	41 5f                	pop    %r15
008b     2ddb:	e9 00 00 00 00       	jmp    2de0 <__pfx_get_gate_vma>	2ddc: R_X86_64_PLT32	__x86_return_thunk-0x4

Well, it got the untag right, but OMG.. :-( Joao, Sami, any idea why it
put an ENDBR in there?

  reply	other threads:[~2023-01-17 15:04 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 01/17] x86/mm: Rework address range check in get_user() and put_user() Kirill A. Shutemov
2023-01-18 15:49   ` Peter Zijlstra
2023-01-18 15:59     ` Linus Torvalds
2023-01-18 16:48       ` Peter Zijlstra
2023-01-18 17:01         ` Linus Torvalds
2023-01-11 12:37 ` [PATCHv14 02/17] x86: Allow atomic MM_CONTEXT flags setting Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 03/17] x86: CPUID and CR3/CR4 flags for Linear Address Masking Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 04/17] x86/mm: Handle LAM on context switch Kirill A. Shutemov
2023-01-11 13:49   ` Linus Torvalds
2023-01-11 14:14     ` Kirill A. Shutemov
2023-01-11 14:37       ` [PATCHv14.1 " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 05/17] mm: Introduce untagged_addr_remote() Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 06/17] x86/uaccess: Provide untagged_addr() and remove tags before address check Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 07/17] x86/mm: Provide arch_prctl() interface for LAM Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user Kirill A. Shutemov
2023-01-17 13:05   ` Peter Zijlstra
2023-01-17 13:57     ` Kirill A. Shutemov
2023-01-17 15:02       ` Peter Zijlstra [this message]
2023-01-17 17:18         ` Linus Torvalds
2023-01-17 17:28           ` Linus Torvalds
2023-01-17 18:26             ` Nick Desaulniers
2023-01-17 18:33               ` Linus Torvalds
2023-01-17 19:17                 ` Nick Desaulniers
2023-01-17 20:10                   ` Linus Torvalds
2023-01-17 20:43                     ` Linus Torvalds
2023-01-17 18:14           ` Peter Zijlstra
2023-01-17 18:21           ` Peter Zijlstra
2023-01-19 23:06         ` Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 09/17] mm: Expose untagging mask in /proc/$PID/status Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 10/17] iommu/sva: Replace pasid_valid() helper with mm_valid_pasid() Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 11/17] x86/mm/iommu/sva: Make LAM and SVA mutually exclusive Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 12/17] selftests/x86/lam: Add malloc and tag-bits test cases for linear-address masking Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 13/17] selftests/x86/lam: Add mmap and SYSCALL " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 14/17] selftests/x86/lam: Add io_uring " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 15/17] selftests/x86/lam: Add inherit " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 16/17] selftests/x86/lam: Add ARCH_FORCE_TAGGED_SVA " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 17/17] selftests/x86/lam: Add test cases for LAM vs thread creation Kirill A. Shutemov
2023-01-18 16:49 ` [PATCHv14 00/17] Linear Address Masking enabling Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y8a4bmCU9wsenvvF@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=ak@linux.intel.com \
    --cc=andreyknvl@gmail.com \
    --cc=ashok.raj@intel.com \
    --cc=bharata@amd.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dvyukov@google.com \
    --cc=glider@google.com \
    --cc=hjl.tools@gmail.com \
    --cc=jacob.jun.pan@linux.intel.com \
    --cc=joao@overdrivepizza.com \
    --cc=kcc@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=ryabinin.a.a@gmail.com \
    --cc=samitolvanen@google.com \
    --cc=tarasmadan@google.com \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.