From: Ankur Arora <ankur.a.arora@oracle.com>
To: linux-kernel@vger.kernel.org, x86@kernel.org
Cc: peterz@infradead.org, hpa@zytor.com, jpoimboe@redhat.com,
	namit@vmware.com, mhiramat@kernel.org, jgross@suse.com,
	bp@alien8.de, vkuznets@redhat.com, pbonzini@redhat.com,
	boris.ostrovsky@oracle.com, mihai.carabas@oracle.com,
	kvm@vger.kernel.org, xen-devel@lists.xenproject.org,
	virtualization@lists.linux-foundation.org,
	Ankur Arora <ankur.a.arora@oracle.com>
Subject: [RFC PATCH 17/26] x86/alternatives: Add patching logic in text_poke_site()
Date: Tue,  7 Apr 2020 22:03:14 -0700
Message-ID: <20200408050323.4237-18-ankur.a.arora@oracle.com>
In-Reply-To: <20200408050323.4237-1-ankur.a.arora@oracle.com>

Add the actual poking and pipeline-sync logic in poke_sync(), which is
called from text_poke_site().

The patching logic is similar to that in text_poke_bp_batch(): we patch
the first byte with an INT3, which serves as a barrier, then patch the
remaining bytes, and finally come back and fix up the first byte. The
first and last steps are single-byte writes and are thus atomic, and
the second step is protected because the INT3 serves as a barrier.

Between each of these steps is a global pipeline sync, which ensures
that remote pipelines flush out any stale opcodes that they might have
cached. This is driven from poke_sync(), where the primary forces a
sync_core() on the secondary CPUs for every PATCH_SYNC_* state change.
The corresponding loop on the secondary executes in
text_poke_sync_site().

Note that breakpoints are not handled yet.

  CPU0                                CPUx
  ----                                ----

  patch_worker()                      patch_worker()

    /* Traversal, insn-gen */           text_poke_sync_finish()
    tps.patch_worker()                    /* wait until:
    /* = paravirt_worker() */              *  tps->state == PATCH_DONE
                                           */
    /* for each patch-site */
    generate_paravirt()
      runtime_patch()
                                          /* for each patch-site */
    text_poke_site()                      text_poke_sync_site()
      poke_sync()                           /* for each:
        __text_do_poke()                     *  PATCH_SYNC_[012]
        sync_one()                           */
        ack()                               sync_one()
                                            ack()
        wait_for_acks()
        ...                                 ...
  smp_store_release(&tps->state, PATCH_DONE)

Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 arch/x86/kernel/alternative.c | 103 +++++++++++++++++++++++++++++++---
 1 file changed, 95 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 1c5acdc4f349..7fdaae9edbf0 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1441,27 +1441,57 @@ struct text_poke_state {
 
 static struct text_poke_state text_poke_state;
 
+static void wait_for_acks(struct text_poke_state *tps)
+{
+	int cpu = smp_processor_id();
+
+	cpumask_set_cpu(cpu, &tps->sync_ack_map);
+
+	/* Wait until all CPUs are known to have observed the state change. */
+	while (cpumask_weight(&tps->sync_ack_map) < tps->num_acks)
+		cpu_relax();
+}
+
 /**
- * poke_sync() - transitions to the specified state.
+ * poke_sync() - carries out one poke-step for a single site and
+ * transitions to the specified state.
+ * Called with the target populated in poking_mm and poking_addr.
 *
 * @tps    - struct text_poke_state *
 * @state  - one of PATCH_SYNC_* states
 * @offset - offset to be patched
 * @insns  - insns to write
 * @len    - length of insn sequence
+ *
+ * Returns after all CPUs have observed the state change and called
+ * sync_core().
 */
 static void poke_sync(struct text_poke_state *tps, int state, int offset,
		      const char *insns, int len)
 {
+	if (len)
+		__text_do_poke(offset, insns, len);
	/*
-	 * STUB: no patching or synchronization, just go through the
-	 * motions.
+	 * Stores to tps.sync_ack_map are ordered with
+	 * smp_load_acquire(tps->state) in text_poke_sync_site()
+	 * so we can safely clear the cpumask.
	 */
	smp_store_release(&tps->state, state);
+
+	cpumask_clear(&tps->sync_ack_map);
+
+	/*
+	 * Introduce a synchronizing instruction in local and remote insn
+	 * streams. This flushes any stale cached uops from CPU pipelines.
+	 */
+	sync_one();
+
+	wait_for_acks(tps);
 }
 
 /**
  * text_poke_site() - called on the primary to patch a single call site.
+ * The interlocking sync work on the secondary is done in text_poke_sync_site().
  *
  * Called in thread context with tps->state == PATCH_SYNC_DONE where it
  * takes tps->state through different PATCH_SYNC_* states, returning
@@ -1514,6 +1544,43 @@ static void __maybe_unused text_poke_site(struct text_poke_state *tps,
			    &prev_mm, ptep);
 }
 
+/**
+ * text_poke_sync_site() -- called to synchronize the CPU pipeline
+ * on secondary CPUs for each patch site.
+ *
+ * Called in thread context with tps->state == PATCH_SYNC_0.
+ *
+ * Returns after having observed tps->state == PATCH_SYNC_DONE.
+ */
+static void text_poke_sync_site(struct text_poke_state *tps)
+{
+	int cpu = smp_processor_id();
+	int prevstate = -1;
+	int acked;
+
+	/*
+	 * In thread context we arrive here expecting tps->state to move
+	 * in-order from PATCH_SYNC_{0 -> 1 -> 2} -> PATCH_SYNC_DONE.
+	 */
+	do {
+		/*
+		 * Wait until there's some work for us to do.
+		 */
+		smp_cond_load_acquire(&tps->state,
+				      prevstate != VAL);
+
+		prevstate = READ_ONCE(tps->state);
+
+		if (prevstate < PATCH_SYNC_DONE) {
+			acked = cpumask_test_cpu(cpu, &tps->sync_ack_map);
+
+			BUG_ON(acked);
+			sync_one();
+			cpumask_set_cpu(cpu, &tps->sync_ack_map);
+		}
+	} while (prevstate < PATCH_SYNC_DONE);
+}
+
 /**
  * text_poke_sync_finish() -- called to synchronize the CPU pipeline
  * on secondary CPUs for all patch sites.
@@ -1525,6 +1592,7 @@ static void text_poke_sync_finish(struct text_poke_state *tps)
 {
	while (true) {
		enum patch_state state;
+		int cpu = smp_processor_id();
 
		state = READ_ONCE(tps->state);
 
@@ -1535,11 +1603,24 @@ static void text_poke_sync_finish(struct text_poke_state *tps)
		if (state == PATCH_DONE)
			break;
 
-		/*
-		 * Relax here while the primary makes up its mind on
-		 * whether it is done or not.
-		 */
-		cpu_relax();
+		if (state == PATCH_SYNC_DONE) {
+			/*
+			 * Ack that we've seen the end of this iteration
+			 * and then wait until everybody's ready to move
+			 * to the next iteration or exit.
+			 */
+			cpumask_set_cpu(cpu, &tps->sync_ack_map);
+			smp_cond_load_acquire(&tps->state,
+					      (state != VAL));
+		} else if (state == PATCH_SYNC_0) {
+			/*
+			 * PATCH_SYNC_1, PATCH_SYNC_2 are handled
+			 * inside text_poke_sync_site().
+			 */
+			text_poke_sync_site(tps);
+		} else {
+			BUG();
+		}
	}
 }
 
@@ -1549,6 +1630,12 @@ static int patch_worker(void *t)
	struct text_poke_state *tps = t;
 
	if (cpu == tps->primary_cpu) {
+		/*
+		 * The init state is PATCH_SYNC_DONE. Wait until the
+		 * secondaries have assembled before we start patching.
+		 */
+		wait_for_acks(tps);
+
		/*
		 * Generates insns and calls text_poke_site() to do the poking
		 * and sync.
-- 
2.20.1