linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "H. Peter Anvin" <hpa@zytor.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: Jiri Kosina <jikos@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Nadav Amit <namit@vmware.com>,
	Peter Zijlstra <peterz@infradead.org>,
	the arch/x86 maintainers <x86@kernel.org>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Jason Baron <jbaron@akamai.com>,
	David Laight <David.Laight@aculab.com>,
	Borislav Petkov <bp@alien8.de>, Julia Cartwright <julia@ni.com>,
	Jessica Yu <jeyu@kernel.org>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Edward Cree <ecree@solarflare.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>
Subject: Re: [PATCH v3 0/6] Static calls
Date: Mon, 14 Jan 2019 21:01:38 -0800	[thread overview]
Message-ID: <dd57daf3-4762-07c3-e0be-2cbb605802e4@zytor.com> (raw)
In-Reply-To: <CALCETrVGoVbToA7Dw_mo_gojvFnS=vB=UJxQgDi_R00skSX8pg@mail.gmail.com>

On 1/14/19 7:05 PM, Andy Lutomirski wrote:
> On Mon, Jan 14, 2019 at 2:55 PM H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> I think this sequence ought to work (keep in mind we are already under a
>> mutex, so the global data is safe even if we are preempted):
> 
> I'm trying to wrap my head around this.  The states are:
> 
> 0: normal operation
> 1: writing 0xcc, can be canceled
> 2: writing final instruction.  The 0xcc was definitely synced to all CPUs.
> 3: patch is definitely installed but maybe not sync_cored.
> 

4: breakpoint has been canceled; need to redo patching.

>>
>>         set up page table entries
>>         invlpg
>>         set up bp patching global data
>>
>>         cpu = get_cpu()
>>
> So we're assuming that the state is
> 
>>         bp_old_value = atomic_read(bp_write_addr)
>>
>>         do {
> 
> So we're assuming that the state is 0 here.  A WARN_ON_ONCE to check
> that would be nice.

The state here can be 0 or 4.

>>                 atomic_write(&bp_poke_state, 1)
>>
>>                 atomic_write(bp_write_addr, 0xcc)
>>
>>                 mask <- online_cpu_mask - self
>>                 send IPIs
>>                 wait for mask = 0
>>
>>         } while (cmpxchg(&bp_poke_state, 1, 2) != 1);
>>
>>         patch sites, remove breakpoints after patching each one
> 
> Not sure what you mean by patch *sites*.  As written, this only
> supports one patch site at a time, since there's only one
> bp_write_addr, and fixing that may be complicated.  Not fixing it
> might also be a scalability problem.

Fixing it isn't all that complicated; we just need to have a list of
patch locations (which we need anyway!) and walk (or search) it instead
of checking just one; I omitted that detail for simplicity.

>>         atomic_write(&bp_poke_state, 3);
>>
>>         mask <- online_cpu_mask - self
>>         send IPIs
>>         wait for mask = 0
>>
>>         atomic_write(&bp_poke_state, 0);
>>
>>         tear down patching global data
>>         tear down page table entries
>>
>>
>>
>> The #BP handler would then look like:
>>
>>         state = cmpxchg(&bp_poke_state, 1, 4);
>>         switch (state) {
>>                 case 1:
>>                 case 4:
> 
> What is state 4?
> 
>>                         invlpg
>>                         cmpxchg(bp_write_addr, 0xcc, bp_old_value)

I'm 85% sure that the cmpxchg here is actually unnecessary, an
atomic_write() is sufficient.

>>                         break;
>>                 case 2:
>>                         invlpg
>>                         complete patch sequence
>>                         remove breakpoint
>>                         break;
> 
> ISTM you might as well change state to 3 here, but it's arguably unnecessary.

If and only if you have only one patch location you could, but again,
unnecessary.

>>                 case 3:
>>                         /* If we are here, the #BP will go away on its own */
>>                         break;
>>                 case 0:
>>                         /* No patching in progress!!! */
>>                         return 0;
>>         }
>>
>>         clear bit in mask
>>         return 1;
>>
>> The IPI handler:
>>
>>         clear bit in mask
>>         sync_core       /* Needed if multiple IPI events are chained */
> 
> I really like that this doesn't require fixups -- text_poke_bp() just
> works.  But I'm nervous about livelocks or maybe just extreme slowness
> under nasty loads.  Suppose some perf NMI code does a static call or
> uses a static call.  Now there's a situation where, under high
> frequency perf sampling, the patch process might almost always hit the
> breakpoint while in state 1.  It'll get reversed and done again, and
> we get stuck.  It would be neat if we could get the same "no
> deadlocks" property while significantly reducing the chance of a
> rollback.

This could be as simple as spinning for a limited time waiting for
states 0 or 3 if we are not the patching CPU. It is also not necessary
to wait for the mask to become zero for the first sync if we find
ourselves suddenly in state 4.

This wouldn't reduce the livelock probability to zero, but it ought to
reduce it enough that if we really are under such heavy event load we
may end up getting stuck in any number of ways...

> This is why I proposed something where we try to guarantee forward
> progress by making sure that any NMI code that might spin and wait for
> other CPUs is guaranteed to eventually sync_core(), clear its bit, and
> possibly finish a patch.  But this is a bit gross.

Yes, this gets really grotty and who knows how many code paths it would
touch.

	-hpa



  reply	other threads:[~2019-01-15  5:06 UTC|newest]

Thread overview: 90+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-09 22:59 [PATCH v3 0/6] Static calls Josh Poimboeuf
2019-01-09 22:59 ` [PATCH v3 1/6] compiler.h: Make __ADDRESSABLE() symbol truly unique Josh Poimboeuf
2019-01-09 22:59 ` [PATCH v3 2/6] static_call: Add basic static call infrastructure Josh Poimboeuf
2019-01-10 14:03   ` Edward Cree
2019-01-10 18:37     ` Josh Poimboeuf
2019-01-09 22:59 ` [PATCH v3 3/6] x86/static_call: Add out-of-line static call implementation Josh Poimboeuf
2019-01-10  0:16   ` Nadav Amit
2019-01-10 16:28     ` Josh Poimboeuf
2019-01-09 22:59 ` [PATCH v3 4/6] static_call: Add inline static call infrastructure Josh Poimboeuf
2019-01-09 22:59 ` [PATCH v3 5/6] x86/alternative: Use a single access in text_poke() where possible Josh Poimboeuf
2019-01-10  9:32   ` Nadav Amit
2019-01-10 17:20     ` Josh Poimboeuf
2019-01-10 17:29       ` Nadav Amit
2019-01-10 17:32       ` Steven Rostedt
2019-01-10 17:42         ` Sean Christopherson
2019-01-10 17:57           ` Steven Rostedt
2019-01-10 18:04             ` Sean Christopherson
2019-01-10 18:21               ` Josh Poimboeuf
2019-01-10 18:24               ` Steven Rostedt
2019-01-11 12:10               ` Alexandre Chartre
2019-01-11 15:28                 ` Josh Poimboeuf
2019-01-11 16:46                   ` Alexandre Chartre
2019-01-11 16:57                     ` Josh Poimboeuf
2019-01-11 17:41                       ` Jason Baron
2019-01-11 17:54                         ` Nadav Amit
2019-01-15 11:10                       ` Alexandre Chartre
2019-01-15 16:19                         ` Steven Rostedt
2019-01-15 16:45                           ` Alexandre Chartre
2019-01-11  0:59           ` hpa
2019-01-11  1:34             ` Sean Christopherson
2019-01-11  8:13               ` hpa
2019-01-09 22:59 ` [PATCH v3 6/6] x86/static_call: Add inline static call implementation for x86-64 Josh Poimboeuf
2019-01-10  1:21 ` [PATCH v3 0/6] Static calls Nadav Amit
2019-01-10 16:44   ` Josh Poimboeuf
2019-01-10 17:32     ` Nadav Amit
2019-01-10 18:18       ` Josh Poimboeuf
2019-01-10 19:45         ` Nadav Amit
2019-01-10 20:32           ` Josh Poimboeuf
2019-01-10 20:48             ` Nadav Amit
2019-01-10 20:57               ` Josh Poimboeuf
2019-01-10 21:47                 ` Nadav Amit
2019-01-10 17:31 ` Linus Torvalds
2019-01-10 20:51   ` H. Peter Anvin
2019-01-10 20:30 ` Peter Zijlstra
2019-01-10 20:52   ` Josh Poimboeuf
2019-01-10 23:02     ` Linus Torvalds
2019-01-11  0:56       ` Andy Lutomirski
2019-01-11  1:47         ` Nadav Amit
2019-01-11 15:15           ` Josh Poimboeuf
2019-01-11 15:48             ` Nadav Amit
2019-01-11 16:07               ` Josh Poimboeuf
2019-01-11 17:23                 ` Nadav Amit
2019-01-11 19:03             ` Linus Torvalds
2019-01-11 19:17               ` Nadav Amit
2019-01-11 19:23               ` hpa
2019-01-11 19:33                 ` Nadav Amit
2019-01-11 19:34                 ` Linus Torvalds
2019-01-13  0:34                   ` hpa
2019-01-13  0:36                   ` hpa
2019-01-11 19:39                 ` Jiri Kosina
2019-01-14  2:31                   ` H. Peter Anvin
2019-01-14  2:40                     ` H. Peter Anvin
2019-01-14 20:11                       ` Andy Lutomirski
2019-01-14 22:00                       ` H. Peter Anvin
2019-01-14 22:54                         ` H. Peter Anvin
2019-01-15  3:05                           ` Andy Lutomirski
2019-01-15  5:01                             ` H. Peter Anvin [this message]
2019-01-15  5:37                               ` H. Peter Anvin
2019-01-14 23:27                         ` Andy Lutomirski
2019-01-14 23:51                           ` Nadav Amit
2019-01-15  2:28                           ` hpa
2019-01-11 20:04               ` Josh Poimboeuf
2019-01-11 20:12                 ` Linus Torvalds
2019-01-11 20:31                   ` Josh Poimboeuf
2019-01-11 20:46                     ` Linus Torvalds
2019-01-11 21:05                       ` Andy Lutomirski
2019-01-11 21:10                         ` Linus Torvalds
2019-01-11 21:32                           ` Josh Poimboeuf
2019-01-14 12:28                         ` Peter Zijlstra
2019-01-11 21:22                       ` Josh Poimboeuf
2019-01-11 21:23                         ` Josh Poimboeuf
2019-01-11 21:25                         ` Josh Poimboeuf
2019-01-11 21:36                         ` Nadav Amit
2019-01-11 21:41                           ` Josh Poimboeuf
2019-01-11 21:55                             ` Steven Rostedt
2019-01-11 21:59                               ` Nadav Amit
2019-01-11 21:56                             ` Nadav Amit
2019-01-12 23:54                         ` Andy Lutomirski
2020-02-17 21:10     ` Jann Horn
2020-02-17 21:57       ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dd57daf3-4762-07c3-e0be-2cbb605802e4@zytor.com \
    --to=hpa@zytor.com \
    --cc=David.Laight@aculab.com \
    --cc=ard.biesheuvel@linaro.org \
    --cc=bp@alien8.de \
    --cc=bristot@redhat.com \
    --cc=ecree@solarflare.com \
    --cc=jbaron@akamai.com \
    --cc=jeyu@kernel.org \
    --cc=jikos@kernel.org \
    --cc=jpoimboe@redhat.com \
    --cc=julia@ni.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=luto@kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=mingo@kernel.org \
    --cc=namit@vmware.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).