linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nadav Amit <namit@vmware.com>
To: "hpa@zytor.com" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	the arch/x86 maintainers <x86@kernel.org>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Jason Baron <jbaron@akamai.com>, Jiri Kosina <jkosina@suse.cz>,
	David Laight <David.Laight@aculab.com>,
	Borislav Petkov <bp@alien8.de>, Julia Cartwright <julia@ni.com>,
	Jessica Yu <jeyu@kernel.org>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Edward Cree <ecree@solarflare.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>
Subject: Re: [PATCH v3 0/6] Static calls
Date: Fri, 11 Jan 2019 19:33:30 +0000	[thread overview]
Message-ID: <F39DA47E-BFCA-49F2-B2F4-C19F860CC79F@vmware.com> (raw)
In-Reply-To: <12578A17-E695-4DD5-AEC7-E29FAB2C8322@zytor.com>

> On Jan 11, 2019, at 11:23 AM, hpa@zytor.com wrote:
> 
> On January 11, 2019 11:03:30 AM PST, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> On Fri, Jan 11, 2019 at 7:15 AM Josh Poimboeuf <jpoimboe@redhat.com>
>> wrote:
>>>> Now, in the int3 handler can you take the faulting RIP and search
>> for it in
>>>> the “static-calls” table, writing the RIP+5 (offset) into R10
>> (return
>>>> address) and the target into R11. You make the int3 handler to
>> divert the
>>>> code execution by changing pt_regs->rip to point to a new function
>> that does:
>>>>      push R10
>>>>      jmp __x86_indirect_thunk_r11
>>>> 
>>>> And then you are done. No?
>>> 
>>> IIUC, that sounds pretty much like what Steven proposed:
>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flkml.kernel.org%2Fr%2F20181129122000.7fb4fb04%40gandalf.local.home&amp;data=02%7C01%7Cnamit%40vmware.com%7Ca665a074940b4630e3fc08d677fa753b%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636828314949874019&amp;sdata=fs2uo%2BjL%2FV3rpzIHJ%2B3QoyHg4KhV%2B%2FUPmUOLpy8S8p8%3D&amp;reserved=0
>>> I liked the idea, BUT, how would it work for callee-saved PV ops?  In
>>> that case there's only one clobbered register to work with (rax).
>> 
>> Actually, there's a much simpler model now that I think about it.
>> 
>> The BP fixup just fixes up %rip to to point to "bp_int3_handler".
>> 
>> And that's just a random text address set up by "text_poke_bp()".
>> 
>> So how about the static call rewriting simply do this:
>> 
>> - for each static call:
>> 
>> 1)   create a fixup code stub that does
>> 
>>       push $returnaddressforTHIScall
>>       jmp targetforTHIScall
>> 
>> 2) do
>> 
>>       on_each_cpu(do_sync_core, NULL, 1);
>> 
>>    to make sure all CPU's see this generated code
>> 
>> 3) do
>> 
>>       text_poke_bp(addr, newcode, newlen, generatedcode);
>> 
>> Ta-daa! Done.
>> 
>> In fact, it turns out that even the extra "do_sync_core()" in #2 is
>> unnecessary, because taking the BP will be serializing on the CPU that
>> takes it, so we can skip it.
>> 
>> End result: the text_poke_bp() function will do the two do_sync_core
>> IPI's that guarantee that by the time it returns, no other CPU is
>> using the generated code any more, so it can be re-used for the next
>> static call fixup.
>> 
>> Notice? No odd emulation, no need to adjust the stack in the BP
>> handler, just the regular "return to a different IP".
>> 
>> Now, there is a nasty special case with that stub, though.
>> 
>> So nasty thing with the whole "generate a stub for each call" case:
>> because it's dynamic and because of the re-use of the stub, you could
>> be in the situation where:
>> 
>> CPU1                  CPU2
>> ----                  ----
>> 
>> generate a stub
>> on_each_cpu(do_sync_core..)
>> text_poke_bp()
>> ...
>> 
>> rewrite to BP
>>                       trigger the BP
>>                       return to the stub
>>                       fun the first instruction of the stub
>>                       *INTERRUPT causes rescheduling*
>> 
>> on_each_cpu(do_sync_core..)
>> rewrite to good instruction
>> on_each_cpu(do_sync_core..)
>> 
>> free or re-generate the stub
>> 
>>                       !! The stub is still in use !!
>> 
>> So that simple "just generate the stub dynamically" isn't so simple
>> after all.
>> 
>> But it turns out that that is really simple to handle too. How do we do
>> that?
>> 
>> We do that by giving the BP handler *two* code sequences, and we make
>> the BP handler pick one depending on whether it is returning to a
>> "interrupts disabled" or "interrupts enabled" case.
>> 
>> So the BP handler does this:
>> 
>> - if we're returning with interrupts disabled, pick the simple stub
>> 
>> - if we're returning with interrupts enabled, clkear IF in the return
>> %rflags, and pick a *slightly* more complex stub:
>> 
>>       push $returnaddressforTHIScall
>>       sti
>>       jmp targetforTHIScall
>> 
>> and now the STI shadow will mean that this sequence is uninterruptible.
>> 
>> So we'd not do complex emulation of the call instruction at BP time,
>> but we'd do that *trivial* change at BP time.
>> 
>> This seems simple, doesn't need any temporary registers at all, and
>> doesn't need any extra stack magic. It literally needs just a trivial
>> sequence in poke_int3_handler().
>> 
>> The we'd change the end of poke_int3_handler() to do something like
>> this instead:
>> 
>>       void *newip = bp_int3_handler;
>>       ..
>>       if (new == magic_static_call_bp_int3_handler) {
>>               if (regs->flags &X86_FLAGS_IF) {
>>                       newip = magic_static_call_bp_int3_handler_sti;
>>                       regs->flags &= ~X86_FLAGS_IF;
>>       }
>>       regs->ip = (unsigned long) newip;
>>       return 1;
>> 
>> AAND now we're *really* done.
>> 
>> Does anybody see any issues in this?
>> 
>>             Linus
> 
> I still don't see why can't simply spin in the #BP handler until the patch is complete.
> 
> We can't have the #BP handler do any additional patching, as previously discussed, but spinning should be perfectly safe. The simplest way to spin it to just IRET; that both serializes and will re-take the exception if the patch is still in progress.
> 
> It requires exactly *no* awareness in the #BP handler, allows for the call to be replaced with inline code or a simple NOP if desired (or vice versa, as long as it is a single instruction.)
> 
> If I'm missing something, then please educate me or point me to previous discussion; I would greatly appreciate it.

One thing that comes to mind is that text_poke_bp() runs after patching the
int3 and before patching in the instruction:

	on_each_cpu(do_sync_core, NULL, 1);

If IRQs are disabled when the BP is hit, spinning can cause the system to
hang.


  reply	other threads:[~2019-01-11 19:33 UTC|newest]

Thread overview: 90+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-09 22:59 [PATCH v3 0/6] Static calls Josh Poimboeuf
2019-01-09 22:59 ` [PATCH v3 1/6] compiler.h: Make __ADDRESSABLE() symbol truly unique Josh Poimboeuf
2019-01-09 22:59 ` [PATCH v3 2/6] static_call: Add basic static call infrastructure Josh Poimboeuf
2019-01-10 14:03   ` Edward Cree
2019-01-10 18:37     ` Josh Poimboeuf
2019-01-09 22:59 ` [PATCH v3 3/6] x86/static_call: Add out-of-line static call implementation Josh Poimboeuf
2019-01-10  0:16   ` Nadav Amit
2019-01-10 16:28     ` Josh Poimboeuf
2019-01-09 22:59 ` [PATCH v3 4/6] static_call: Add inline static call infrastructure Josh Poimboeuf
2019-01-09 22:59 ` [PATCH v3 5/6] x86/alternative: Use a single access in text_poke() where possible Josh Poimboeuf
2019-01-10  9:32   ` Nadav Amit
2019-01-10 17:20     ` Josh Poimboeuf
2019-01-10 17:29       ` Nadav Amit
2019-01-10 17:32       ` Steven Rostedt
2019-01-10 17:42         ` Sean Christopherson
2019-01-10 17:57           ` Steven Rostedt
2019-01-10 18:04             ` Sean Christopherson
2019-01-10 18:21               ` Josh Poimboeuf
2019-01-10 18:24               ` Steven Rostedt
2019-01-11 12:10               ` Alexandre Chartre
2019-01-11 15:28                 ` Josh Poimboeuf
2019-01-11 16:46                   ` Alexandre Chartre
2019-01-11 16:57                     ` Josh Poimboeuf
2019-01-11 17:41                       ` Jason Baron
2019-01-11 17:54                         ` Nadav Amit
2019-01-15 11:10                       ` Alexandre Chartre
2019-01-15 16:19                         ` Steven Rostedt
2019-01-15 16:45                           ` Alexandre Chartre
2019-01-11  0:59           ` hpa
2019-01-11  1:34             ` Sean Christopherson
2019-01-11  8:13               ` hpa
2019-01-09 22:59 ` [PATCH v3 6/6] x86/static_call: Add inline static call implementation for x86-64 Josh Poimboeuf
2019-01-10  1:21 ` [PATCH v3 0/6] Static calls Nadav Amit
2019-01-10 16:44   ` Josh Poimboeuf
2019-01-10 17:32     ` Nadav Amit
2019-01-10 18:18       ` Josh Poimboeuf
2019-01-10 19:45         ` Nadav Amit
2019-01-10 20:32           ` Josh Poimboeuf
2019-01-10 20:48             ` Nadav Amit
2019-01-10 20:57               ` Josh Poimboeuf
2019-01-10 21:47                 ` Nadav Amit
2019-01-10 17:31 ` Linus Torvalds
2019-01-10 20:51   ` H. Peter Anvin
2019-01-10 20:30 ` Peter Zijlstra
2019-01-10 20:52   ` Josh Poimboeuf
2019-01-10 23:02     ` Linus Torvalds
2019-01-11  0:56       ` Andy Lutomirski
2019-01-11  1:47         ` Nadav Amit
2019-01-11 15:15           ` Josh Poimboeuf
2019-01-11 15:48             ` Nadav Amit
2019-01-11 16:07               ` Josh Poimboeuf
2019-01-11 17:23                 ` Nadav Amit
2019-01-11 19:03             ` Linus Torvalds
2019-01-11 19:17               ` Nadav Amit
2019-01-11 19:23               ` hpa
2019-01-11 19:33                 ` Nadav Amit [this message]
2019-01-11 19:34                 ` Linus Torvalds
2019-01-13  0:34                   ` hpa
2019-01-13  0:36                   ` hpa
2019-01-11 19:39                 ` Jiri Kosina
2019-01-14  2:31                   ` H. Peter Anvin
2019-01-14  2:40                     ` H. Peter Anvin
2019-01-14 20:11                       ` Andy Lutomirski
2019-01-14 22:00                       ` H. Peter Anvin
2019-01-14 22:54                         ` H. Peter Anvin
2019-01-15  3:05                           ` Andy Lutomirski
2019-01-15  5:01                             ` H. Peter Anvin
2019-01-15  5:37                               ` H. Peter Anvin
2019-01-14 23:27                         ` Andy Lutomirski
2019-01-14 23:51                           ` Nadav Amit
2019-01-15  2:28                           ` hpa
2019-01-11 20:04               ` Josh Poimboeuf
2019-01-11 20:12                 ` Linus Torvalds
2019-01-11 20:31                   ` Josh Poimboeuf
2019-01-11 20:46                     ` Linus Torvalds
2019-01-11 21:05                       ` Andy Lutomirski
2019-01-11 21:10                         ` Linus Torvalds
2019-01-11 21:32                           ` Josh Poimboeuf
2019-01-14 12:28                         ` Peter Zijlstra
2019-01-11 21:22                       ` Josh Poimboeuf
2019-01-11 21:23                         ` Josh Poimboeuf
2019-01-11 21:25                         ` Josh Poimboeuf
2019-01-11 21:36                         ` Nadav Amit
2019-01-11 21:41                           ` Josh Poimboeuf
2019-01-11 21:55                             ` Steven Rostedt
2019-01-11 21:59                               ` Nadav Amit
2019-01-11 21:56                             ` Nadav Amit
2019-01-12 23:54                         ` Andy Lutomirski
2020-02-17 21:10     ` Jann Horn
2020-02-17 21:57       ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=F39DA47E-BFCA-49F2-B2F4-C19F860CC79F@vmware.com \
    --to=namit@vmware.com \
    --cc=David.Laight@aculab.com \
    --cc=ard.biesheuvel@linaro.org \
    --cc=bp@alien8.de \
    --cc=bristot@redhat.com \
    --cc=ecree@solarflare.com \
    --cc=hpa@zytor.com \
    --cc=jbaron@akamai.com \
    --cc=jeyu@kernel.org \
    --cc=jkosina@suse.cz \
    --cc=jpoimboe@redhat.com \
    --cc=julia@ni.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=luto@kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).