linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks
@ 2021-08-28  0:35 Sean Christopherson
  2021-08-28  0:35 ` [PATCH v2 01/13] perf: Ensure perf_guest_cbs aren't reloaded between !NULL check and deref Sean Christopherson
                   ` (13 more replies)
  0 siblings, 14 replies; 29+ messages in thread
From: Sean Christopherson @ 2021-08-28  0:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Will Deacon, Mark Rutland, Catalin Marinas, Marc Zyngier,
	Guo Ren, Nick Hu, Greentime Hu, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Thomas Gleixner, Borislav Petkov, x86,
	Paolo Bonzini, Boris Ostrovsky, Juergen Gross
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, James Morse,
	Alexandru Elisei, Suzuki K Poulose, H. Peter Anvin,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

This is a combination of ~2 series to fix bugs in the perf+KVM callbacks,
optimize the callbacks by employing static_call, and do a variety of
cleanup in both perf and KVM.

Patch 1 fixes a mostly-theoretical bug where perf can deref a NULL
pointer if KVM unregisters its callbacks while they're being accessed.
In practice, compilers tend to avoid problematic reloads of the pointer
and the PMI handler doesn't lose the race against module unloading,
i.e doesn't hit a use-after-free.

Patches 2 and 3 fix an Intel PT handling bug where KVM incorrectly
eats PT interrupts when PT is supposed to be owned entirely by the host.

Patches 4-7 clean up perf's callback infrastructure and switch to
static_call for arm64 and x86 (the only survivors).

Patches 8-13 clean up related KVM code and unify the arm64/x86 callbacks.

Based on "git://git.kernel.org/pub/scm/virt/kvm/kvm.git queue", commit
680c7e3be6a3 ("KVM: x86: Exit to userspace ...").

v2 (relatively to static_call v10)
  - Split the patch into the semantic change (multiplexed ->state) and
    introduction of static_call.
  - Don't use '0' for "not a guest RIP".
  - Handle unregister path.
  - Drop changes for architectures that can be culled entirely.

v2 (relative to v1)
  - Drop per-cpu approach. [Peter]
  - Fix mostly-theoretical reload and use-after-free with READ_ONCE(),
    WRITE_ONCE(), and synchronize_rcu(). [Peter]
  - Avoid new exports like the plague. [Peter]

v1:
  - https://lkml.kernel.org/r/20210827005718.585190-1-seanjc@google.com

v10 static_call:
  - https://lkml.kernel.org/r/20210806133802.3528-2-lingshan.zhu@intel.com

Like Xu (2):
  perf/core: Rework guest callbacks to prepare for static_call support
  perf/core: Use static_call to optimize perf_guest_info_callbacks

Sean Christopherson (11):
  perf: Ensure perf_guest_cbs aren't reloaded between !NULL check and
    deref
  KVM: x86: Register perf callbacks after calling vendor's
    hardware_setup()
  KVM: x86: Register Processor Trace interrupt hook iff PT enabled in
    guest
  perf: Stop pretending that perf can handle multiple guest callbacks
  perf: Force architectures to opt-in to guest callbacks
  KVM: x86: Drop current_vcpu for kvm_running_vcpu + kvm_arch_vcpu
    variable
  KVM: x86: More precisely identify NMI from guest when handling PMI
  KVM: Move x86's perf guest info callbacks to generic KVM
  KVM: x86: Move Intel Processor Trace interrupt handler to vmx.c
  KVM: arm64: Convert to the generic perf callbacks
  KVM: arm64: Drop perf.c and fold its tiny bits of code into arm.c /
    pmu.c

 arch/arm/kernel/perf_callchain.c   | 28 ++------------
 arch/arm64/Kconfig                 |  1 +
 arch/arm64/include/asm/kvm_host.h  | 13 ++++++-
 arch/arm64/kernel/perf_callchain.c | 28 +++++++++++---
 arch/arm64/kvm/Makefile            |  2 +-
 arch/arm64/kvm/arm.c               | 11 +++++-
 arch/arm64/kvm/perf.c              | 62 ------------------------------
 arch/arm64/kvm/pmu.c               |  8 ++++
 arch/csky/kernel/perf_callchain.c  | 10 -----
 arch/nds32/kernel/perf_event_cpu.c | 29 ++------------
 arch/riscv/kernel/perf_callchain.c | 10 -----
 arch/x86/Kconfig                   |  1 +
 arch/x86/events/core.c             | 36 ++++++++++++++---
 arch/x86/events/intel/core.c       |  7 ++--
 arch/x86/include/asm/kvm_host.h    |  8 +++-
 arch/x86/kvm/pmu.c                 |  2 +-
 arch/x86/kvm/svm/svm.c             |  2 +-
 arch/x86/kvm/vmx/vmx.c             | 25 +++++++++++-
 arch/x86/kvm/x86.c                 | 58 +++++-----------------------
 arch/x86/kvm/x86.h                 | 17 ++++++--
 arch/x86/xen/pmu.c                 | 32 +++++++--------
 include/kvm/arm_pmu.h              |  1 +
 include/linux/kvm_host.h           | 10 +++++
 include/linux/perf_event.h         | 26 ++++++++-----
 init/Kconfig                       |  3 ++
 kernel/events/core.c               | 24 ++++++------
 virt/kvm/kvm_main.c                | 40 +++++++++++++++++++
 27 files changed, 245 insertions(+), 249 deletions(-)
 delete mode 100644 arch/arm64/kvm/perf.c

-- 
2.33.0.259.gc128427fd7-goog


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v2 01/13] perf: Ensure perf_guest_cbs aren't reloaded between !NULL check and deref
  2021-08-28  0:35 [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks Sean Christopherson
@ 2021-08-28  0:35 ` Sean Christopherson
  2021-08-28 19:44   ` Peter Zijlstra
  2021-08-28  0:35 ` [PATCH v2 02/13] KVM: x86: Register perf callbacks after calling vendor's hardware_setup() Sean Christopherson
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Sean Christopherson @ 2021-08-28  0:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Will Deacon, Mark Rutland, Catalin Marinas, Marc Zyngier,
	Guo Ren, Nick Hu, Greentime Hu, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Thomas Gleixner, Borislav Petkov, x86,
	Paolo Bonzini, Boris Ostrovsky, Juergen Gross
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, James Morse,
	Alexandru Elisei, Suzuki K Poulose, H. Peter Anvin,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

Protect perf_guest_cbs with READ_ONCE/WRITE_ONCE to ensure it's not
reloaded between a !NULL check and a dereference, and wait for all
readers via syncrhonize_rcu() to prevent use-after-free, e.g. if the
callbacks are being unregistered during module unload.  Because the
callbacks are global, it's possible for readers to run in parallel with
an unregister operation.

The bug has escaped notice because all dereferences of perf_guest_cbs
follow the same "perf_guest_cbs && perf_guest_cbs->is_in_guest()" pattern,
and it's extremely unlikely a compiler will reload perf_guest_cbs in this
sequence.  Compilers do reload perf_guest_cbs for future derefs, e.g. for
->is_user_mode(), but the ->is_in_guest() guard all but guarantees the
PMI handler will win the race, e.g. to nullify perf_guest_cbs, KVM has to
completely exit the guest and teardown down all VMs before KVM start its
module unload / unregister sequence.

But with help, unloading kvm_intel can trigger a NULL pointer derference,
e.g. wrapping perf_guest_cbs with READ_ONCE in perf_misc_flags() while
spamming kvm_intel module load/unload leads to:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP
  CPU: 6 PID: 1825 Comm: stress Not tainted 5.14.0-rc2+ #459
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:perf_misc_flags+0x1c/0x70
  Call Trace:
   perf_prepare_sample+0x53/0x6b0
   perf_event_output_forward+0x67/0x160
   __perf_event_overflow+0x52/0xf0
   handle_pmi_common+0x207/0x300
   intel_pmu_handle_irq+0xcf/0x410
   perf_event_nmi_handler+0x28/0x50
   nmi_handle+0xc7/0x260
   default_do_nmi+0x6b/0x170
   exc_nmi+0x103/0x130
   asm_exc_nmi+0x76/0xbf

Fixes: 39447b386c84 ("perf: Enhance perf to allow for guest statistic collection from host")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm/kernel/perf_callchain.c   | 17 +++++++++++------
 arch/arm64/kernel/perf_callchain.c | 18 ++++++++++++------
 arch/csky/kernel/perf_callchain.c  |  6 ++++--
 arch/nds32/kernel/perf_event_cpu.c | 17 +++++++++++------
 arch/riscv/kernel/perf_callchain.c |  7 +++++--
 arch/x86/events/core.c             | 17 +++++++++++------
 arch/x86/events/intel/core.c       |  9 ++++++---
 include/linux/perf_event.h         |  8 ++++++++
 kernel/events/core.c               |  9 +++++++--
 9 files changed, 75 insertions(+), 33 deletions(-)

diff --git a/arch/arm/kernel/perf_callchain.c b/arch/arm/kernel/perf_callchain.c
index 3b69a76d341e..1626dfc6f6ce 100644
--- a/arch/arm/kernel/perf_callchain.c
+++ b/arch/arm/kernel/perf_callchain.c
@@ -62,9 +62,10 @@ user_backtrace(struct frame_tail __user *tail,
 void
 perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	struct frame_tail __user *tail;
 
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+	if (guest_cbs && guest_cbs->is_in_guest()) {
 		/* We don't support guest os callchain now */
 		return;
 	}
@@ -98,9 +99,10 @@ callchain_trace(struct stackframe *fr,
 void
 perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	struct stackframe fr;
 
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+	if (guest_cbs && guest_cbs->is_in_guest()) {
 		/* We don't support guest os callchain now */
 		return;
 	}
@@ -111,18 +113,21 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
 
 unsigned long perf_instruction_pointer(struct pt_regs *regs)
 {
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
-		return perf_guest_cbs->get_guest_ip();
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
+
+	if (guest_cbs && guest_cbs->is_in_guest())
+		return guest_cbs->get_guest_ip();
 
 	return instruction_pointer(regs);
 }
 
 unsigned long perf_misc_flags(struct pt_regs *regs)
 {
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	int misc = 0;
 
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
-		if (perf_guest_cbs->is_user_mode())
+	if (guest_cbs && guest_cbs->is_in_guest()) {
+		if (guest_cbs->is_user_mode())
 			misc |= PERF_RECORD_MISC_GUEST_USER;
 		else
 			misc |= PERF_RECORD_MISC_GUEST_KERNEL;
diff --git a/arch/arm64/kernel/perf_callchain.c b/arch/arm64/kernel/perf_callchain.c
index 4a72c2727309..86d9f2013172 100644
--- a/arch/arm64/kernel/perf_callchain.c
+++ b/arch/arm64/kernel/perf_callchain.c
@@ -102,7 +102,9 @@ compat_user_backtrace(struct compat_frame_tail __user *tail,
 void perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 			 struct pt_regs *regs)
 {
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
+
+	if (guest_cbs && guest_cbs->is_in_guest()) {
 		/* We don't support guest os callchain now */
 		return;
 	}
@@ -147,9 +149,10 @@ static bool callchain_trace(void *data, unsigned long pc)
 void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 			   struct pt_regs *regs)
 {
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	struct stackframe frame;
 
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+	if (guest_cbs && guest_cbs->is_in_guest()) {
 		/* We don't support guest os callchain now */
 		return;
 	}
@@ -160,18 +163,21 @@ void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 
 unsigned long perf_instruction_pointer(struct pt_regs *regs)
 {
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
-		return perf_guest_cbs->get_guest_ip();
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
+
+	if (guest_cbs && guest_cbs->is_in_guest())
+		return guest_cbs->get_guest_ip();
 
 	return instruction_pointer(regs);
 }
 
 unsigned long perf_misc_flags(struct pt_regs *regs)
 {
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	int misc = 0;
 
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
-		if (perf_guest_cbs->is_user_mode())
+	if (guest_cbs && guest_cbs->is_in_guest()) {
+		if (guest_cbs->is_user_mode())
 			misc |= PERF_RECORD_MISC_GUEST_USER;
 		else
 			misc |= PERF_RECORD_MISC_GUEST_KERNEL;
diff --git a/arch/csky/kernel/perf_callchain.c b/arch/csky/kernel/perf_callchain.c
index ab55e98ee8f6..35318a635a5f 100644
--- a/arch/csky/kernel/perf_callchain.c
+++ b/arch/csky/kernel/perf_callchain.c
@@ -86,10 +86,11 @@ static unsigned long user_backtrace(struct perf_callchain_entry_ctx *entry,
 void perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 			 struct pt_regs *regs)
 {
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	unsigned long fp = 0;
 
 	/* C-SKY does not support virtualization. */
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
+	if (guest_cbs && guest_cbs->is_in_guest())
 		return;
 
 	fp = regs->regs[4];
@@ -110,10 +111,11 @@ void perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 			   struct pt_regs *regs)
 {
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	struct stackframe fr;
 
 	/* C-SKY does not support virtualization. */
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+	if (guest_cbs && guest_cbs->is_in_guest()) {
 		pr_warn("C-SKY does not support perf in guest mode!");
 		return;
 	}
diff --git a/arch/nds32/kernel/perf_event_cpu.c b/arch/nds32/kernel/perf_event_cpu.c
index 0ce6f9f307e6..f38791960781 100644
--- a/arch/nds32/kernel/perf_event_cpu.c
+++ b/arch/nds32/kernel/perf_event_cpu.c
@@ -1363,6 +1363,7 @@ void
 perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 		    struct pt_regs *regs)
 {
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	unsigned long fp = 0;
 	unsigned long gp = 0;
 	unsigned long lp = 0;
@@ -1371,7 +1372,7 @@ perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 
 	leaf_fp = 0;
 
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+	if (guest_cbs && guest_cbs->is_in_guest()) {
 		/* We don't support guest os callchain now */
 		return;
 	}
@@ -1479,9 +1480,10 @@ void
 perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 		      struct pt_regs *regs)
 {
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	struct stackframe fr;
 
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+	if (guest_cbs && guest_cbs->is_in_guest()) {
 		/* We don't support guest os callchain now */
 		return;
 	}
@@ -1493,20 +1495,23 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 
 unsigned long perf_instruction_pointer(struct pt_regs *regs)
 {
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
+
 	/* However, NDS32 does not support virtualization */
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
-		return perf_guest_cbs->get_guest_ip();
+	if (guest_cbs && guest_cbs->is_in_guest())
+		return guest_cbs->get_guest_ip();
 
 	return instruction_pointer(regs);
 }
 
 unsigned long perf_misc_flags(struct pt_regs *regs)
 {
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	int misc = 0;
 
 	/* However, NDS32 does not support virtualization */
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
-		if (perf_guest_cbs->is_user_mode())
+	if (guest_cbs && guest_cbs->is_in_guest()) {
+		if (guest_cbs->is_user_mode())
 			misc |= PERF_RECORD_MISC_GUEST_USER;
 		else
 			misc |= PERF_RECORD_MISC_GUEST_KERNEL;
diff --git a/arch/riscv/kernel/perf_callchain.c b/arch/riscv/kernel/perf_callchain.c
index 0bb1854dce83..8ecfc4c128bc 100644
--- a/arch/riscv/kernel/perf_callchain.c
+++ b/arch/riscv/kernel/perf_callchain.c
@@ -56,10 +56,11 @@ static unsigned long user_backtrace(struct perf_callchain_entry_ctx *entry,
 void perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 			 struct pt_regs *regs)
 {
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	unsigned long fp = 0;
 
 	/* RISC-V does not support perf in guest mode. */
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
+	if (guest_cbs && guest_cbs->is_in_guest())
 		return;
 
 	fp = regs->s0;
@@ -78,8 +79,10 @@ static bool fill_callchain(void *entry, unsigned long pc)
 void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 			   struct pt_regs *regs)
 {
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
+
 	/* RISC-V does not support perf in guest mode. */
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+	if (guest_cbs && guest_cbs->is_in_guest()) {
 		pr_warn("RISC-V does not support perf in guest mode!");
 		return;
 	}
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 1eb45139fcc6..ffb3e6c0d367 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2761,10 +2761,11 @@ static bool perf_hw_regs(struct pt_regs *regs)
 void
 perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	struct unwind_state state;
 	unsigned long addr;
 
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+	if (guest_cbs && guest_cbs->is_in_guest()) {
 		/* TODO: We don't support guest os callchain now */
 		return;
 	}
@@ -2864,10 +2865,11 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ctx *ent
 void
 perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	struct stack_frame frame;
 	const struct stack_frame __user *fp;
 
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+	if (guest_cbs && guest_cbs->is_in_guest()) {
 		/* TODO: We don't support guest os callchain now */
 		return;
 	}
@@ -2944,18 +2946,21 @@ static unsigned long code_segment_base(struct pt_regs *regs)
 
 unsigned long perf_instruction_pointer(struct pt_regs *regs)
 {
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
-		return perf_guest_cbs->get_guest_ip();
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
+
+	if (guest_cbs && guest_cbs->is_in_guest())
+		return guest_cbs->get_guest_ip();
 
 	return regs->ip + code_segment_base(regs);
 }
 
 unsigned long perf_misc_flags(struct pt_regs *regs)
 {
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	int misc = 0;
 
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
-		if (perf_guest_cbs->is_user_mode())
+	if (guest_cbs && guest_cbs->is_in_guest()) {
+		if (guest_cbs->is_user_mode())
 			misc |= PERF_RECORD_MISC_GUEST_USER;
 		else
 			misc |= PERF_RECORD_MISC_GUEST_KERNEL;
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index fca7a6e2242f..9baa46185d94 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2786,6 +2786,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 {
 	struct perf_sample_data data;
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct perf_guest_info_callbacks *guest_cbs;
 	int bit;
 	int handled = 0;
 	u64 intel_ctrl = hybrid(cpuc->pmu, intel_ctrl);
@@ -2852,9 +2853,11 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 	 */
 	if (__test_and_clear_bit(GLOBAL_STATUS_TRACE_TOPAPMI_BIT, (unsigned long *)&status)) {
 		handled++;
-		if (unlikely(perf_guest_cbs && perf_guest_cbs->is_in_guest() &&
-			perf_guest_cbs->handle_intel_pt_intr))
-			perf_guest_cbs->handle_intel_pt_intr();
+
+		guest_cbs = perf_get_guest_cbs();
+		if (unlikely(guest_cbs && guest_cbs->is_in_guest() &&
+			     guest_cbs->handle_intel_pt_intr))
+			guest_cbs->handle_intel_pt_intr();
 		else
 			intel_pt_interrupt();
 	}
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2d510ad750ed..6b0405e578c1 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1237,6 +1237,14 @@ extern void perf_event_bpf_event(struct bpf_prog *prog,
 				 u16 flags);
 
 extern struct perf_guest_info_callbacks *perf_guest_cbs;
+static inline struct perf_guest_info_callbacks *perf_get_guest_cbs(void)
+{
+	/* Reg/unreg perf_guest_cbs waits for readers via synchronize_rcu(). */
+	lockdep_assert_preemption_disabled();
+
+	/* Prevent reloading between a !NULL check and dereferences. */
+	return READ_ONCE(perf_guest_cbs);
+}
 extern int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
 extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 464917096e73..2126f6327321 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6491,14 +6491,19 @@ struct perf_guest_info_callbacks *perf_guest_cbs;
 
 int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
 {
-	perf_guest_cbs = cbs;
+	if (WARN_ON_ONCE(perf_guest_cbs))
+		return -EBUSY;
+
+	WRITE_ONCE(perf_guest_cbs, cbs);
+	synchronize_rcu();
 	return 0;
 }
 EXPORT_SYMBOL_GPL(perf_register_guest_info_callbacks);
 
 int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
 {
-	perf_guest_cbs = NULL;
+	WRITE_ONCE(perf_guest_cbs, NULL);
+	synchronize_rcu();
 	return 0;
 }
 EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks);
-- 
2.33.0.259.gc128427fd7-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 02/13] KVM: x86: Register perf callbacks after calling vendor's hardware_setup()
  2021-08-28  0:35 [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks Sean Christopherson
  2021-08-28  0:35 ` [PATCH v2 01/13] perf: Ensure perf_guest_cbs aren't reloaded between !NULL check and deref Sean Christopherson
@ 2021-08-28  0:35 ` Sean Christopherson
  2021-08-28  0:35 ` [PATCH v2 03/13] KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest Sean Christopherson
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Sean Christopherson @ 2021-08-28  0:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Will Deacon, Mark Rutland, Catalin Marinas, Marc Zyngier,
	Guo Ren, Nick Hu, Greentime Hu, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Thomas Gleixner, Borislav Petkov, x86,
	Paolo Bonzini, Boris Ostrovsky, Juergen Gross
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, James Morse,
	Alexandru Elisei, Suzuki K Poulose, H. Peter Anvin,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

Wait to register perf callbacks until after doing vendor hardaware setup.
VMX's hardware_setup() configures Intel Processor Trace (PT) mode, and a
future fix to register the Intel PT guest interrupt hook if and only if
Intel PT is exposed to the guest will consume the configured PT mode.

Delaying registration to hardware setup is effectively a nop as KVM's perf
hooks all pivot on the per-CPU current_vcpu, which is non-NULL only when
KVM is handling an IRQ/NMI in a VM-Exit path.  I.e. current_vcpu will be
NULL throughout both kvm_arch_init() and kvm_arch_hardware_setup().

Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Artem Kashkanov <artem.kashkanov@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 86539c1686fa..fb6015f97f9e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8426,8 +8426,6 @@ int kvm_arch_init(void *opaque)
 
 	kvm_timer_init();
 
-	perf_register_guest_info_callbacks(&kvm_guest_cbs);
-
 	if (boot_cpu_has(X86_FEATURE_XSAVE)) {
 		host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
 		supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0;
@@ -8461,7 +8459,6 @@ void kvm_arch_exit(void)
 		clear_hv_tscchange_cb();
 #endif
 	kvm_lapic_exit();
-	perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
 
 	if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
 		cpufreq_unregister_notifier(&kvmclock_cpufreq_notifier_block,
@@ -11064,6 +11061,8 @@ int kvm_arch_hardware_setup(void *opaque)
 	memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
 	kvm_ops_static_call_update();
 
+	perf_register_guest_info_callbacks(&kvm_guest_cbs);
+
 	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
 		supported_xss = 0;
 
@@ -11091,6 +11090,8 @@ int kvm_arch_hardware_setup(void *opaque)
 
 void kvm_arch_hardware_unsetup(void)
 {
+	perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
+
 	static_call(kvm_x86_hardware_unsetup)();
 }
 
-- 
2.33.0.259.gc128427fd7-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 03/13] KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest
  2021-08-28  0:35 [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks Sean Christopherson
  2021-08-28  0:35 ` [PATCH v2 01/13] perf: Ensure perf_guest_cbs aren't reloaded between !NULL check and deref Sean Christopherson
  2021-08-28  0:35 ` [PATCH v2 02/13] KVM: x86: Register perf callbacks after calling vendor's hardware_setup() Sean Christopherson
@ 2021-08-28  0:35 ` Sean Christopherson
  2021-08-28  0:35 ` [PATCH v2 04/13] perf: Stop pretending that perf can handle multiple guest callbacks Sean Christopherson
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Sean Christopherson @ 2021-08-28  0:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Will Deacon, Mark Rutland, Catalin Marinas, Marc Zyngier,
	Guo Ren, Nick Hu, Greentime Hu, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Thomas Gleixner, Borislav Petkov, x86,
	Paolo Bonzini, Boris Ostrovsky, Juergen Gross
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, James Morse,
	Alexandru Elisei, Suzuki K Poulose, H. Peter Anvin,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

Override the Processor Trace (PT) interrupt handler for guest mode if and
only if PT is configured for host+guest mode, i.e. is being used
independently by both host and guest.  If PT is configured for system
mode, the host fully controls PT and must handle all events.

Fixes: 8479e04e7d6b ("KVM: x86: Inject PMI for KVM guest")
Cc: stable@vger.kernel.org
Cc: Like Xu <like.xu.linux@gmail.com>
Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Reported-by: Artem Kashkanov <artem.kashkanov@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/vmx/vmx.c          | 1 +
 arch/x86/kvm/x86.c              | 5 ++++-
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 09b256db394a..1ea4943a73d7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1494,6 +1494,7 @@ struct kvm_x86_init_ops {
 	int (*disabled_by_bios)(void);
 	int (*check_processor_compatibility)(void);
 	int (*hardware_setup)(void);
+	bool (*intel_pt_intr_in_guest)(void);
 
 	struct kvm_x86_ops *runtime_ops;
 };
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index fada1055f325..f19d72136f77 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7896,6 +7896,7 @@ static struct kvm_x86_init_ops vmx_init_ops __initdata = {
 	.disabled_by_bios = vmx_disabled_by_bios,
 	.check_processor_compatibility = vmx_check_processor_compat,
 	.hardware_setup = hardware_setup,
+	.intel_pt_intr_in_guest = vmx_pt_mode_is_host_guest,
 
 	.runtime_ops = &vmx_x86_ops,
 };
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fb6015f97f9e..ffc6c2d73508 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8305,7 +8305,7 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
 	.is_in_guest		= kvm_is_in_guest,
 	.is_user_mode		= kvm_is_user_mode,
 	.get_guest_ip		= kvm_get_guest_ip,
-	.handle_intel_pt_intr	= kvm_handle_intel_pt_intr,
+	.handle_intel_pt_intr	= NULL,
 };
 
 #ifdef CONFIG_X86_64
@@ -11061,6 +11061,8 @@ int kvm_arch_hardware_setup(void *opaque)
 	memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
 	kvm_ops_static_call_update();
 
+	if (ops->intel_pt_intr_in_guest && ops->intel_pt_intr_in_guest())
+		kvm_guest_cbs.handle_intel_pt_intr = kvm_handle_intel_pt_intr;
 	perf_register_guest_info_callbacks(&kvm_guest_cbs);
 
 	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
@@ -11091,6 +11093,7 @@ int kvm_arch_hardware_setup(void *opaque)
 void kvm_arch_hardware_unsetup(void)
 {
 	perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
+	kvm_guest_cbs.handle_intel_pt_intr = NULL;
 
 	static_call(kvm_x86_hardware_unsetup)();
 }
-- 
2.33.0.259.gc128427fd7-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 04/13] perf: Stop pretending that perf can handle multiple guest callbacks
  2021-08-28  0:35 [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks Sean Christopherson
                   ` (2 preceding siblings ...)
  2021-08-28  0:35 ` [PATCH v2 03/13] KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest Sean Christopherson
@ 2021-08-28  0:35 ` Sean Christopherson
  2021-08-28  0:35 ` [PATCH v2 05/13] perf: Force architectures to opt-in to " Sean Christopherson
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Sean Christopherson @ 2021-08-28  0:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Will Deacon, Mark Rutland, Catalin Marinas, Marc Zyngier,
	Guo Ren, Nick Hu, Greentime Hu, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Thomas Gleixner, Borislav Petkov, x86,
	Paolo Bonzini, Boris Ostrovsky, Juergen Gross
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, James Morse,
	Alexandru Elisei, Suzuki K Poulose, H. Peter Anvin,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

Drop the 'int' return value from the perf (un)register callbacks helpers
and stop pretending perf can support multiple callbacks.  The 'int'
returns are not future proofing anything as none of the callers take
action on an error.  It's also not obvious that there will ever be
co-tenant hypervisors, and if there are, that allowing multiple callbacks
to be registered is desirable or even correct.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/include/asm/kvm_host.h |  4 ++--
 arch/arm64/kvm/perf.c             |  8 ++++----
 arch/x86/kvm/x86.c                |  2 +-
 include/linux/perf_event.h        | 11 +++++------
 kernel/events/core.c              | 14 +++-----------
 5 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 41911585ae0c..ed940aec89e0 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -670,8 +670,8 @@ unsigned long kvm_mmio_read_buf(const void *buf, unsigned int len);
 int kvm_handle_mmio_return(struct kvm_vcpu *vcpu);
 int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa);
 
-int kvm_perf_init(void);
-int kvm_perf_teardown(void);
+void kvm_perf_init(void);
+void kvm_perf_teardown(void);
 
 long kvm_hypercall_pv_features(struct kvm_vcpu *vcpu);
 gpa_t kvm_init_stolen_time(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/perf.c b/arch/arm64/kvm/perf.c
index 151c31fb9860..039fe59399a2 100644
--- a/arch/arm64/kvm/perf.c
+++ b/arch/arm64/kvm/perf.c
@@ -48,15 +48,15 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
 	.get_guest_ip	= kvm_get_guest_ip,
 };
 
-int kvm_perf_init(void)
+void kvm_perf_init(void)
 {
 	if (kvm_pmu_probe_pmuver() != 0xf && !is_protected_kvm_enabled())
 		static_branch_enable(&kvm_arm_pmu_available);
 
-	return perf_register_guest_info_callbacks(&kvm_guest_cbs);
+	perf_register_guest_info_callbacks(&kvm_guest_cbs);
 }
 
-int kvm_perf_teardown(void)
+void kvm_perf_teardown(void)
 {
-	return perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
+	perf_unregister_guest_info_callbacks();
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ffc6c2d73508..bae951344e28 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11092,7 +11092,7 @@ int kvm_arch_hardware_setup(void *opaque)
 
 void kvm_arch_hardware_unsetup(void)
 {
-	perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
+	perf_unregister_guest_info_callbacks();
 	kvm_guest_cbs.handle_intel_pt_intr = NULL;
 
 	static_call(kvm_x86_hardware_unsetup)();
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6b0405e578c1..2b77e2154b61 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1245,8 +1245,8 @@ static inline struct perf_guest_info_callbacks *perf_get_guest_cbs(void)
 	/* Prevent reloading between a !NULL check and dereferences. */
 	return READ_ONCE(perf_guest_cbs);
 }
-extern int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
-extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
+extern void perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
+extern void perf_unregister_guest_info_callbacks(void);
 
 extern void perf_event_exec(void);
 extern void perf_event_comm(struct task_struct *tsk, bool exec);
@@ -1489,10 +1489,9 @@ perf_sw_event(u32 event_id, u64 nr, struct pt_regs *regs, u64 addr)	{ }
 static inline void
 perf_bp_event(struct perf_event *event, void *data)			{ }
 
-static inline int perf_register_guest_info_callbacks
-(struct perf_guest_info_callbacks *callbacks)				{ return 0; }
-static inline int perf_unregister_guest_info_callbacks
-(struct perf_guest_info_callbacks *callbacks)				{ return 0; }
+static inline void perf_register_guest_info_callbacks
+(struct perf_guest_info_callbacks *callbacks)				{ }
+static inline void perf_unregister_guest_info_callbacks(void)		{ }
 
 static inline void perf_event_mmap(struct vm_area_struct *vma)		{ }
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2126f6327321..1be95d9ace46 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6482,29 +6482,21 @@ static void perf_pending_event(struct irq_work *entry)
 		perf_swevent_put_recursion_context(rctx);
 }
 
-/*
- * We assume there is only KVM supporting the callbacks.
- * Later on, we might change it to a list if there is
- * another virtualization implementation supporting the callbacks.
- */
 struct perf_guest_info_callbacks *perf_guest_cbs;
-
-int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
+void perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
 {
 	if (WARN_ON_ONCE(perf_guest_cbs))
-		return -EBUSY;
+		return;
 
 	WRITE_ONCE(perf_guest_cbs, cbs);
 	synchronize_rcu();
-	return 0;
 }
 EXPORT_SYMBOL_GPL(perf_register_guest_info_callbacks);
 
-int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
+void perf_unregister_guest_info_callbacks(void)
 {
 	WRITE_ONCE(perf_guest_cbs, NULL);
 	synchronize_rcu();
-	return 0;
 }
 EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks);
 
-- 
2.33.0.259.gc128427fd7-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 05/13] perf: Force architectures to opt-in to guest callbacks
  2021-08-28  0:35 [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks Sean Christopherson
                   ` (3 preceding siblings ...)
  2021-08-28  0:35 ` [PATCH v2 04/13] perf: Stop pretending that perf can handle multiple guest callbacks Sean Christopherson
@ 2021-08-28  0:35 ` Sean Christopherson
  2021-08-28 19:47   ` Peter Zijlstra
  2021-08-28  0:35 ` [PATCH v2 06/13] perf/core: Rework guest callbacks to prepare for static_call support Sean Christopherson
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Sean Christopherson @ 2021-08-28  0:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Will Deacon, Mark Rutland, Catalin Marinas, Marc Zyngier,
	Guo Ren, Nick Hu, Greentime Hu, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Thomas Gleixner, Borislav Petkov, x86,
	Paolo Bonzini, Boris Ostrovsky, Juergen Gross
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, James Morse,
	Alexandru Elisei, Suzuki K Poulose, H. Peter Anvin,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

Introduce HAVE_GUEST_PERF_EVENTS and require architectures to select it
to allow registering guest callbacks in perf.  Future patches will convert
the callbacks to static_call.  Rather than churn a bunch of arch code (that
was presumably copy+pasted from x86), remove it wholesale as it's useless
and at best wasting cycles.

Wrap even the stubs with an #ifdef to avoid an arch sneaking in a bogus
registration with CONFIG_PERF_EVENTS=n.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm/kernel/perf_callchain.c   | 33 ++++-------------------------
 arch/arm64/Kconfig                 |  1 +
 arch/csky/kernel/perf_callchain.c  | 12 -----------
 arch/nds32/kernel/perf_event_cpu.c | 34 ++++--------------------------
 arch/riscv/kernel/perf_callchain.c | 13 ------------
 arch/x86/Kconfig                   |  1 +
 include/linux/perf_event.h         |  4 ++++
 init/Kconfig                       |  3 +++
 kernel/events/core.c               |  2 ++
 9 files changed, 19 insertions(+), 84 deletions(-)

diff --git a/arch/arm/kernel/perf_callchain.c b/arch/arm/kernel/perf_callchain.c
index 1626dfc6f6ce..bc6b246ab55e 100644
--- a/arch/arm/kernel/perf_callchain.c
+++ b/arch/arm/kernel/perf_callchain.c
@@ -62,14 +62,8 @@ user_backtrace(struct frame_tail __user *tail,
 void
 perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
-	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	struct frame_tail __user *tail;
 
-	if (guest_cbs && guest_cbs->is_in_guest()) {
-		/* We don't support guest os callchain now */
-		return;
-	}
-
 	perf_callchain_store(entry, regs->ARM_pc);
 
 	if (!current->mm)
@@ -99,44 +93,25 @@ callchain_trace(struct stackframe *fr,
 void
 perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
-	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	struct stackframe fr;
 
-	if (guest_cbs && guest_cbs->is_in_guest()) {
-		/* We don't support guest os callchain now */
-		return;
-	}
-
 	arm_get_current_stackframe(regs, &fr);
 	walk_stackframe(&fr, callchain_trace, entry);
 }
 
 unsigned long perf_instruction_pointer(struct pt_regs *regs)
 {
-	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
-
-	if (guest_cbs && guest_cbs->is_in_guest())
-		return guest_cbs->get_guest_ip();
-
 	return instruction_pointer(regs);
 }
 
 unsigned long perf_misc_flags(struct pt_regs *regs)
 {
-	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	int misc = 0;
 
-	if (guest_cbs && guest_cbs->is_in_guest()) {
-		if (guest_cbs->is_user_mode())
-			misc |= PERF_RECORD_MISC_GUEST_USER;
-		else
-			misc |= PERF_RECORD_MISC_GUEST_KERNEL;
-	} else {
-		if (user_mode(regs))
-			misc |= PERF_RECORD_MISC_USER;
-		else
-			misc |= PERF_RECORD_MISC_KERNEL;
-	}
+	if (user_mode(regs))
+		misc |= PERF_RECORD_MISC_USER;
+	else
+		misc |= PERF_RECORD_MISC_KERNEL;
 
 	return misc;
 }
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b5b13a932561..72a201a686c5 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -190,6 +190,7 @@ config ARM64
 	select HAVE_NMI
 	select HAVE_PATA_PLATFORM
 	select HAVE_PERF_EVENTS
+	select HAVE_GUEST_PERF_EVENTS
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_REGS_AND_STACK_ACCESS_API
diff --git a/arch/csky/kernel/perf_callchain.c b/arch/csky/kernel/perf_callchain.c
index 35318a635a5f..92057de08f4f 100644
--- a/arch/csky/kernel/perf_callchain.c
+++ b/arch/csky/kernel/perf_callchain.c
@@ -86,13 +86,8 @@ static unsigned long user_backtrace(struct perf_callchain_entry_ctx *entry,
 void perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 			 struct pt_regs *regs)
 {
-	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	unsigned long fp = 0;
 
-	/* C-SKY does not support virtualization. */
-	if (guest_cbs && guest_cbs->is_in_guest())
-		return;
-
 	fp = regs->regs[4];
 	perf_callchain_store(entry, regs->pc);
 
@@ -111,15 +106,8 @@ void perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 			   struct pt_regs *regs)
 {
-	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	struct stackframe fr;
 
-	/* C-SKY does not support virtualization. */
-	if (guest_cbs && guest_cbs->is_in_guest()) {
-		pr_warn("C-SKY does not support perf in guest mode!");
-		return;
-	}
-
 	fr.fp = regs->regs[4];
 	fr.lr = regs->lr;
 	walk_stackframe(&fr, entry);
diff --git a/arch/nds32/kernel/perf_event_cpu.c b/arch/nds32/kernel/perf_event_cpu.c
index f38791960781..a78a879e7ef1 100644
--- a/arch/nds32/kernel/perf_event_cpu.c
+++ b/arch/nds32/kernel/perf_event_cpu.c
@@ -1363,7 +1363,6 @@ void
 perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 		    struct pt_regs *regs)
 {
-	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	unsigned long fp = 0;
 	unsigned long gp = 0;
 	unsigned long lp = 0;
@@ -1372,11 +1371,6 @@ perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 
 	leaf_fp = 0;
 
-	if (guest_cbs && guest_cbs->is_in_guest()) {
-		/* We don't support guest os callchain now */
-		return;
-	}
-
 	perf_callchain_store(entry, regs->ipc);
 	fp = regs->fp;
 	gp = regs->gp;
@@ -1480,13 +1474,8 @@ void
 perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 		      struct pt_regs *regs)
 {
-	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	struct stackframe fr;
 
-	if (guest_cbs && guest_cbs->is_in_guest()) {
-		/* We don't support guest os callchain now */
-		return;
-	}
 	fr.fp = regs->fp;
 	fr.lp = regs->lp;
 	fr.sp = regs->sp;
@@ -1495,32 +1484,17 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 
 unsigned long perf_instruction_pointer(struct pt_regs *regs)
 {
-	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
-
-	/* However, NDS32 does not support virtualization */
-	if (guest_cbs && guest_cbs->is_in_guest())
-		return guest_cbs->get_guest_ip();
-
 	return instruction_pointer(regs);
 }
 
 unsigned long perf_misc_flags(struct pt_regs *regs)
 {
-	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	int misc = 0;
 
-	/* However, NDS32 does not support virtualization */
-	if (guest_cbs && guest_cbs->is_in_guest()) {
-		if (guest_cbs->is_user_mode())
-			misc |= PERF_RECORD_MISC_GUEST_USER;
-		else
-			misc |= PERF_RECORD_MISC_GUEST_KERNEL;
-	} else {
-		if (user_mode(regs))
-			misc |= PERF_RECORD_MISC_USER;
-		else
-			misc |= PERF_RECORD_MISC_KERNEL;
-	}
+	if (user_mode(regs))
+		misc |= PERF_RECORD_MISC_USER;
+	else
+		misc |= PERF_RECORD_MISC_KERNEL;
 
 	return misc;
 }
diff --git a/arch/riscv/kernel/perf_callchain.c b/arch/riscv/kernel/perf_callchain.c
index 8ecfc4c128bc..1fc075b8f764 100644
--- a/arch/riscv/kernel/perf_callchain.c
+++ b/arch/riscv/kernel/perf_callchain.c
@@ -56,13 +56,8 @@ static unsigned long user_backtrace(struct perf_callchain_entry_ctx *entry,
 void perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 			 struct pt_regs *regs)
 {
-	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	unsigned long fp = 0;
 
-	/* RISC-V does not support perf in guest mode. */
-	if (guest_cbs && guest_cbs->is_in_guest())
-		return;
-
 	fp = regs->s0;
 	perf_callchain_store(entry, regs->epc);
 
@@ -79,13 +74,5 @@ static bool fill_callchain(void *entry, unsigned long pc)
 void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 			   struct pt_regs *regs)
 {
-	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
-
-	/* RISC-V does not support perf in guest mode. */
-	if (guest_cbs && guest_cbs->is_in_guest()) {
-		pr_warn("RISC-V does not support perf in guest mode!");
-		return;
-	}
-
 	walk_stackframe(NULL, regs, fill_callchain, entry);
 }
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 49270655e827..a4de4aa7a3df 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -229,6 +229,7 @@ config X86
 	select HAVE_PERF_EVENTS
 	select HAVE_PERF_EVENTS_NMI
 	select HAVE_HARDLOCKUP_DETECTOR_PERF	if PERF_EVENTS && HAVE_PERF_EVENTS_NMI
+	select HAVE_GUEST_PERF_EVENTS
 	select HAVE_PCI
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2b77e2154b61..e75971f85cc8 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1236,6 +1236,7 @@ extern void perf_event_bpf_event(struct bpf_prog *prog,
 				 enum perf_bpf_event_type type,
 				 u16 flags);
 
+#ifdef CONFIG_HAVE_GUEST_PERF_EVENTS
 extern struct perf_guest_info_callbacks *perf_guest_cbs;
 static inline struct perf_guest_info_callbacks *perf_get_guest_cbs(void)
 {
@@ -1247,6 +1248,7 @@ static inline struct perf_guest_info_callbacks *perf_get_guest_cbs(void)
 }
 extern void perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
 extern void perf_unregister_guest_info_callbacks(void);
+#endif /* CONFIG_HAVE_GUEST_PERF_EVENTS */
 
 extern void perf_event_exec(void);
 extern void perf_event_comm(struct task_struct *tsk, bool exec);
@@ -1489,9 +1491,11 @@ perf_sw_event(u32 event_id, u64 nr, struct pt_regs *regs, u64 addr)	{ }
 static inline void
 perf_bp_event(struct perf_event *event, void *data)			{ }
 
+#ifdef CONFIG_HAVE_GUEST_PERF_EVENTS
 static inline void perf_register_guest_info_callbacks
 (struct perf_guest_info_callbacks *callbacks)				{ }
 static inline void perf_unregister_guest_info_callbacks(void)		{ }
+#endif
 
 static inline void perf_event_mmap(struct vm_area_struct *vma)		{ }
 
diff --git a/init/Kconfig b/init/Kconfig
index 55f9f7738ebb..9ef51ae53977 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1776,6 +1776,9 @@ config HAVE_PERF_EVENTS
 	help
 	  See tools/perf/design.txt for details.
 
+config HAVE_GUEST_PERF_EVENTS
+	bool
+
 config PERF_USE_VMALLOC
 	bool
 	help
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1be95d9ace46..d7f606e06446 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6482,6 +6482,7 @@ static void perf_pending_event(struct irq_work *entry)
 		perf_swevent_put_recursion_context(rctx);
 }
 
+#ifdef CONFIG_HAVE_GUEST_PERF_EVENTS
 struct perf_guest_info_callbacks *perf_guest_cbs;
 void perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
 {
@@ -6499,6 +6500,7 @@ void perf_unregister_guest_info_callbacks(void)
 	synchronize_rcu();
 }
 EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks);
+#endif
 
 static void
 perf_output_sample_regs(struct perf_output_handle *handle,
-- 
2.33.0.259.gc128427fd7-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 06/13] perf/core: Rework guest callbacks to prepare for static_call support
  2021-08-28  0:35 [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks Sean Christopherson
                   ` (4 preceding siblings ...)
  2021-08-28  0:35 ` [PATCH v2 05/13] perf: Force architectures to opt-in to " Sean Christopherson
@ 2021-08-28  0:35 ` Sean Christopherson
  2021-08-28  0:35 ` [PATCH v2 07/13] perf/core: Use static_call to optimize perf_guest_info_callbacks Sean Christopherson
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Sean Christopherson @ 2021-08-28  0:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Will Deacon, Mark Rutland, Catalin Marinas, Marc Zyngier,
	Guo Ren, Nick Hu, Greentime Hu, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Thomas Gleixner, Borislav Petkov, x86,
	Paolo Bonzini, Boris Ostrovsky, Juergen Gross
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, James Morse,
	Alexandru Elisei, Suzuki K Poulose, H. Peter Anvin,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

From: Like Xu <like.xu@linux.intel.com>

To prepare for using static_calls to optimize perf's guest callbacks,
replace ->is_in_guest and ->is_user_mode with a new multiplexed hook
->state, tweak ->handle_intel_pt_intr to play nice with being called when
there is no active guest, and drop "guest" from ->is_in_guest.

Return '0' from ->state and ->handle_intel_pt_intr to indicate "not in
guest" so that DEFINE_STATIC_CALL_RET0 can be used to define the static
calls, i.e. no callback == !guest.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
[sean: extracted from static_call patch, fixed get_ip() bug, wrote changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/kernel/perf_callchain.c | 13 +++++-----
 arch/arm64/kvm/perf.c              | 35 +++++++++++---------------
 arch/x86/events/core.c             | 13 +++++-----
 arch/x86/events/intel/core.c       |  5 +---
 arch/x86/include/asm/kvm_host.h    |  2 +-
 arch/x86/kvm/pmu.c                 |  2 +-
 arch/x86/kvm/x86.c                 | 40 ++++++++++++++++--------------
 arch/x86/xen/pmu.c                 | 32 ++++++++++--------------
 include/linux/perf_event.h         | 10 +++++---
 kernel/events/core.c               |  1 +
 10 files changed, 74 insertions(+), 79 deletions(-)

diff --git a/arch/arm64/kernel/perf_callchain.c b/arch/arm64/kernel/perf_callchain.c
index 86d9f2013172..274dc3e11b6d 100644
--- a/arch/arm64/kernel/perf_callchain.c
+++ b/arch/arm64/kernel/perf_callchain.c
@@ -104,7 +104,7 @@ void perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 {
 	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 
-	if (guest_cbs && guest_cbs->is_in_guest()) {
+	if (guest_cbs && guest_cbs->state()) {
 		/* We don't support guest os callchain now */
 		return;
 	}
@@ -152,7 +152,7 @@ void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	struct stackframe frame;
 
-	if (guest_cbs && guest_cbs->is_in_guest()) {
+	if (guest_cbs && guest_cbs->state()) {
 		/* We don't support guest os callchain now */
 		return;
 	}
@@ -165,8 +165,8 @@ unsigned long perf_instruction_pointer(struct pt_regs *regs)
 {
 	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 
-	if (guest_cbs && guest_cbs->is_in_guest())
-		return guest_cbs->get_guest_ip();
+	if (guest_cbs && guest_cbs->state())
+		return guest_cbs->get_ip();
 
 	return instruction_pointer(regs);
 }
@@ -174,10 +174,11 @@ unsigned long perf_instruction_pointer(struct pt_regs *regs)
 unsigned long perf_misc_flags(struct pt_regs *regs)
 {
 	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
+	unsigned int guest_state = guest_cbs ? guest_cbs->state() : 0;
 	int misc = 0;
 
-	if (guest_cbs && guest_cbs->is_in_guest()) {
-		if (guest_cbs->is_user_mode())
+	if (guest_state) {
+		if (guest_state & PERF_GUEST_USER)
 			misc |= PERF_RECORD_MISC_GUEST_USER;
 		else
 			misc |= PERF_RECORD_MISC_GUEST_KERNEL;
diff --git a/arch/arm64/kvm/perf.c b/arch/arm64/kvm/perf.c
index 039fe59399a2..893de1a51fea 100644
--- a/arch/arm64/kvm/perf.c
+++ b/arch/arm64/kvm/perf.c
@@ -13,39 +13,34 @@
 
 DEFINE_STATIC_KEY_FALSE(kvm_arm_pmu_available);
 
-static int kvm_is_in_guest(void)
+static unsigned int kvm_guest_state(void)
 {
-        return kvm_get_running_vcpu() != NULL;
-}
+	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
+	unsigned int state;
 
-static int kvm_is_user_mode(void)
-{
-	struct kvm_vcpu *vcpu;
-
-	vcpu = kvm_get_running_vcpu();
+	if (!vcpu)
+		return 0;
 
-	if (vcpu)
-		return !vcpu_mode_priv(vcpu);
+	state = PERF_GUEST_ACTIVE;
+	if (!vcpu_mode_priv(vcpu))
+		state |= PERF_GUEST_USER;
 
-	return 0;
+	return state;
 }
 
 static unsigned long kvm_get_guest_ip(void)
 {
-	struct kvm_vcpu *vcpu;
+	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 
-	vcpu = kvm_get_running_vcpu();
+	if (WARN_ON_ONCE(!vcpu))
+		return 0;
 
-	if (vcpu)
-		return *vcpu_pc(vcpu);
-
-	return 0;
+	return *vcpu_pc(vcpu);
 }
 
 static struct perf_guest_info_callbacks kvm_guest_cbs = {
-	.is_in_guest	= kvm_is_in_guest,
-	.is_user_mode	= kvm_is_user_mode,
-	.get_guest_ip	= kvm_get_guest_ip,
+	.state		= kvm_guest_state,
+	.get_ip		= kvm_get_guest_ip,
 };
 
 void kvm_perf_init(void)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index ffb3e6c0d367..3a7630fdd340 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2765,7 +2765,7 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
 	struct unwind_state state;
 	unsigned long addr;
 
-	if (guest_cbs && guest_cbs->is_in_guest()) {
+	if (guest_cbs && guest_cbs->state()) {
 		/* TODO: We don't support guest os callchain now */
 		return;
 	}
@@ -2869,7 +2869,7 @@ perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs
 	struct stack_frame frame;
 	const struct stack_frame __user *fp;
 
-	if (guest_cbs && guest_cbs->is_in_guest()) {
+	if (guest_cbs && guest_cbs->state()) {
 		/* TODO: We don't support guest os callchain now */
 		return;
 	}
@@ -2948,8 +2948,8 @@ unsigned long perf_instruction_pointer(struct pt_regs *regs)
 {
 	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 
-	if (guest_cbs && guest_cbs->is_in_guest())
-		return guest_cbs->get_guest_ip();
+	if (guest_cbs && guest_cbs->state())
+		return guest_cbs->get_ip();
 
 	return regs->ip + code_segment_base(regs);
 }
@@ -2957,10 +2957,11 @@ unsigned long perf_instruction_pointer(struct pt_regs *regs)
 unsigned long perf_misc_flags(struct pt_regs *regs)
 {
 	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
+	unsigned int guest_state = guest_cbs ? guest_cbs->state() : 0;
 	int misc = 0;
 
-	if (guest_cbs && guest_cbs->is_in_guest()) {
-		if (guest_cbs->is_user_mode())
+	if (guest_state) {
+		if (guest_state & PERF_GUEST_USER)
 			misc |= PERF_RECORD_MISC_GUEST_USER;
 		else
 			misc |= PERF_RECORD_MISC_GUEST_KERNEL;
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 9baa46185d94..524ad1f747bd 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2855,10 +2855,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 		handled++;
 
 		guest_cbs = perf_get_guest_cbs();
-		if (unlikely(guest_cbs && guest_cbs->is_in_guest() &&
-			     guest_cbs->handle_intel_pt_intr))
-			guest_cbs->handle_intel_pt_intr();
-		else
+		if (likely(!guest_cbs || !guest_cbs->handle_intel_pt_intr()))
 			intel_pt_interrupt();
 	}
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1ea4943a73d7..1080166fc0cf 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1874,7 +1874,7 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu);
 int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err);
 void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu);
 
-int kvm_is_in_guest(void);
+unsigned int kvm_guest_state(void);
 
 void __user *__x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
 				     u32 size);
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 0772bad9165c..5b68d4188de0 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -87,7 +87,7 @@ static void kvm_perf_overflow_intr(struct perf_event *perf_event,
 		 * woken up. So we should wake it, but this is impossible from
 		 * NMI context. Do it from irq work instead.
 		 */
-		if (!kvm_is_in_guest())
+		if (!kvm_guest_state())
 			irq_work_queue(&pmc_to_pmu(pmc)->irq_work);
 		else
 			kvm_make_request(KVM_REQ_PMI, pmc->vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bae951344e28..b2a4d085aa4f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8267,44 +8267,48 @@ static void kvm_timer_init(void)
 DEFINE_PER_CPU(struct kvm_vcpu *, current_vcpu);
 EXPORT_PER_CPU_SYMBOL_GPL(current_vcpu);
 
-int kvm_is_in_guest(void)
+unsigned int kvm_guest_state(void)
 {
-	return __this_cpu_read(current_vcpu) != NULL;
-}
+	struct kvm_vcpu *vcpu = __this_cpu_read(current_vcpu);
+	unsigned int state;
 
-static int kvm_is_user_mode(void)
-{
-	int user_mode = 3;
+	if (!vcpu)
+		return 0;
 
-	if (__this_cpu_read(current_vcpu))
-		user_mode = static_call(kvm_x86_get_cpl)(__this_cpu_read(current_vcpu));
+	state = PERF_GUEST_ACTIVE;
+	if (static_call(kvm_x86_get_cpl)(vcpu))
+		state |= PERF_GUEST_USER;
 
-	return user_mode != 0;
+	return state;
 }
 
-static unsigned long kvm_get_guest_ip(void)
+static unsigned long kvm_guest_get_ip(void)
 {
-	unsigned long ip = 0;
+	struct kvm_vcpu *vcpu = __this_cpu_read(current_vcpu);
 
-	if (__this_cpu_read(current_vcpu))
-		ip = kvm_rip_read(__this_cpu_read(current_vcpu));
+	if (WARN_ON_ONCE(!vcpu))
+		return 0;
 
-	return ip;
+	return kvm_rip_read(vcpu);
 }
 
-static void kvm_handle_intel_pt_intr(void)
+static unsigned int kvm_handle_intel_pt_intr(void)
 {
 	struct kvm_vcpu *vcpu = __this_cpu_read(current_vcpu);
 
+	/* '0' on failure so that the !PT case can use a RET0 static call. */
+	if (!vcpu)
+		return 0;
+
 	kvm_make_request(KVM_REQ_PMI, vcpu);
 	__set_bit(MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI_BIT,
 			(unsigned long *)&vcpu->arch.pmu.global_status);
+	return 1;
 }
 
 static struct perf_guest_info_callbacks kvm_guest_cbs = {
-	.is_in_guest		= kvm_is_in_guest,
-	.is_user_mode		= kvm_is_user_mode,
-	.get_guest_ip		= kvm_get_guest_ip,
+	.state			= kvm_guest_state,
+	.get_ip			= kvm_guest_get_ip,
 	.handle_intel_pt_intr	= NULL,
 };
 
diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index e13b0b49fcdf..89dd6b1708b0 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -413,34 +413,29 @@ int pmu_apic_update(uint32_t val)
 }
 
 /* perf callbacks */
-static int xen_is_in_guest(void)
+static unsigned int xen_guest_state(void)
 {
 	const struct xen_pmu_data *xenpmu_data = get_xenpmu_data();
+	unsigned int state = 0;
 
 	if (!xenpmu_data) {
 		pr_warn_once("%s: pmudata not initialized\n", __func__);
-		return 0;
+		return state;
 	}
 
 	if (!xen_initial_domain() || (xenpmu_data->domain_id >= DOMID_SELF))
-		return 0;
+		return state;
 
-	return 1;
-}
+	state |= PERF_GUEST_ACTIVE;
 
-static int xen_is_user_mode(void)
-{
-	const struct xen_pmu_data *xenpmu_data = get_xenpmu_data();
-
-	if (!xenpmu_data) {
-		pr_warn_once("%s: pmudata not initialized\n", __func__);
-		return 0;
+	if (xenpmu_data->pmu.pmu_flags & PMU_SAMPLE_PV) {
+		if (xenpmu_data->pmu.pmu_flags & PMU_SAMPLE_USER)
+			state |= PERF_GUEST_USER;
+	} else if (xenpmu_data->pmu.r.regs.cpl & 3) {
+		state |= PERF_GUEST_USER;
 	}
 
-	if (xenpmu_data->pmu.pmu_flags & PMU_SAMPLE_PV)
-		return (xenpmu_data->pmu.pmu_flags & PMU_SAMPLE_USER);
-	else
-		return !!(xenpmu_data->pmu.r.regs.cpl & 3);
+	return state;
 }
 
 static unsigned long xen_get_guest_ip(void)
@@ -456,9 +451,8 @@ static unsigned long xen_get_guest_ip(void)
 }
 
 static struct perf_guest_info_callbacks xen_guest_cbs = {
-	.is_in_guest            = xen_is_in_guest,
-	.is_user_mode           = xen_is_user_mode,
-	.get_guest_ip           = xen_get_guest_ip,
+	.state                  = xen_guest_state,
+	.get_ip			= xen_get_guest_ip,
 };
 
 /* Convert registers from Xen's format to Linux' */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e75971f85cc8..a5d5893b80b0 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -26,11 +26,13 @@
 # include <asm/local64.h>
 #endif
 
+#define PERF_GUEST_ACTIVE	0x01
+#define PERF_GUEST_USER	0x02
+
 struct perf_guest_info_callbacks {
-	int				(*is_in_guest)(void);
-	int				(*is_user_mode)(void);
-	unsigned long			(*get_guest_ip)(void);
-	void				(*handle_intel_pt_intr)(void);
+	unsigned int			(*state)(void);
+	unsigned long			(*get_ip)(void);
+	unsigned int			(*handle_intel_pt_intr)(void);
 };
 
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d7f606e06446..ec36e7aded89 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6484,6 +6484,7 @@ static void perf_pending_event(struct irq_work *entry)
 
 #ifdef CONFIG_HAVE_GUEST_PERF_EVENTS
 struct perf_guest_info_callbacks *perf_guest_cbs;
+
 void perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
 {
 	if (WARN_ON_ONCE(perf_guest_cbs))
-- 
2.33.0.259.gc128427fd7-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 07/13] perf/core: Use static_call to optimize perf_guest_info_callbacks
  2021-08-28  0:35 [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks Sean Christopherson
                   ` (5 preceding siblings ...)
  2021-08-28  0:35 ` [PATCH v2 06/13] perf/core: Rework guest callbacks to prepare for static_call support Sean Christopherson
@ 2021-08-28  0:35 ` Sean Christopherson
  2021-08-28  0:35 ` [PATCH v2 08/13] KVM: x86: Drop current_vcpu for kvm_running_vcpu + kvm_arch_vcpu variable Sean Christopherson
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Sean Christopherson @ 2021-08-28  0:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Will Deacon, Mark Rutland, Catalin Marinas, Marc Zyngier,
	Guo Ren, Nick Hu, Greentime Hu, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Thomas Gleixner, Borislav Petkov, x86,
	Paolo Bonzini, Boris Ostrovsky, Juergen Gross
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, James Morse,
	Alexandru Elisei, Suzuki K Poulose, H. Peter Anvin,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

From: Like Xu <like.xu@linux.intel.com>

Use static_call to optimize perf's guest callbacks on arm64 and x86,
which are now the only architectures that define the callbacks.  Use
DEFINE_STATIC_CALL_RET0 as the default/NULL for all guest callbacks, as
the callback semantics are that a return value '0' means "not in guest".

static_call obviously avoids the overhead of CONFIG_RETPOLINE=y, but is
also advantageous versus other solutions, e.g. per-cpu callbacks, in that
a per-cpu memory load is not needed to detect the !guest case.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
[sean: split out patch, drop __weak, tweak updaters, rewrite changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/kernel/perf_callchain.c | 31 +++++++++++++++---------
 arch/x86/events/core.c             | 38 ++++++++++++++++++++++--------
 arch/x86/events/intel/core.c       |  7 +++---
 include/linux/perf_event.h         |  9 +------
 kernel/events/core.c               |  2 ++
 5 files changed, 54 insertions(+), 33 deletions(-)

diff --git a/arch/arm64/kernel/perf_callchain.c b/arch/arm64/kernel/perf_callchain.c
index 274dc3e11b6d..18cf6e608778 100644
--- a/arch/arm64/kernel/perf_callchain.c
+++ b/arch/arm64/kernel/perf_callchain.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2015 ARM Limited
  */
 #include <linux/perf_event.h>
+#include <linux/static_call.h>
 #include <linux/uaccess.h>
 
 #include <asm/pointer_auth.h>
@@ -99,12 +100,24 @@ compat_user_backtrace(struct compat_frame_tail __user *tail,
 }
 #endif /* CONFIG_COMPAT */
 
+DEFINE_STATIC_CALL_RET0(arm64_guest_state, *(perf_guest_cbs->state));
+DEFINE_STATIC_CALL_RET0(arm64_guest_get_ip, *(perf_guest_cbs->get_ip));
+
+void arch_perf_update_guest_cbs(struct perf_guest_info_callbacks *guest_cbs)
+{
+	if (guest_cbs) {
+		static_call_update(arm64_guest_state, guest_cbs->state);
+		static_call_update(arm64_guest_get_ip, guest_cbs->get_ip);
+	} else {
+		static_call_update(arm64_guest_state, (void *)&__static_call_return0);
+		static_call_update(arm64_guest_get_ip, (void *)&__static_call_return0);
+	}
+}
+
 void perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 			 struct pt_regs *regs)
 {
-	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
-
-	if (guest_cbs && guest_cbs->state()) {
+	if (static_call(arm64_guest_state)()) {
 		/* We don't support guest os callchain now */
 		return;
 	}
@@ -149,10 +162,9 @@ static bool callchain_trace(void *data, unsigned long pc)
 void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 			   struct pt_regs *regs)
 {
-	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	struct stackframe frame;
 
-	if (guest_cbs && guest_cbs->state()) {
+	if (static_call(arm64_guest_state)()) {
 		/* We don't support guest os callchain now */
 		return;
 	}
@@ -163,18 +175,15 @@ void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 
 unsigned long perf_instruction_pointer(struct pt_regs *regs)
 {
-	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
-
-	if (guest_cbs && guest_cbs->state())
-		return guest_cbs->get_ip();
+	if (static_call(arm64_guest_state)())
+		return static_call(arm64_guest_get_ip)();
 
 	return instruction_pointer(regs);
 }
 
 unsigned long perf_misc_flags(struct pt_regs *regs)
 {
-	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
-	unsigned int guest_state = guest_cbs ? guest_cbs->state() : 0;
+	unsigned int guest_state = static_call(arm64_guest_state)();
 	int misc = 0;
 
 	if (guest_state) {
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 3a7630fdd340..508a677edd8c 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -90,6 +90,29 @@ DEFINE_STATIC_CALL_NULL(x86_pmu_pebs_aliases, *x86_pmu.pebs_aliases);
  */
 DEFINE_STATIC_CALL_RET0(x86_pmu_guest_get_msrs, *x86_pmu.guest_get_msrs);
 
+DEFINE_STATIC_CALL_RET0(x86_guest_state, *(perf_guest_cbs->state));
+DEFINE_STATIC_CALL_RET0(x86_guest_get_ip, *(perf_guest_cbs->get_ip));
+DEFINE_STATIC_CALL_RET0(x86_guest_handle_intel_pt_intr, *(perf_guest_cbs->handle_intel_pt_intr));
+
+void arch_perf_update_guest_cbs(struct perf_guest_info_callbacks *guest_cbs)
+{
+	if (guest_cbs) {
+		static_call_update(x86_guest_state, guest_cbs->state);
+		static_call_update(x86_guest_get_ip, guest_cbs->get_ip);
+	} else {
+		static_call_update(x86_guest_state, (void *)&__static_call_return0);
+		static_call_update(x86_guest_get_ip, (void *)&__static_call_return0);
+	}
+
+	/* Implementing ->handle_intel_pt_intr is optional. */
+	if (guest_cbs && guest_cbs->handle_intel_pt_intr)
+		static_call_update(x86_guest_handle_intel_pt_intr,
+				   guest_cbs->handle_intel_pt_intr);
+	else
+		static_call_update(x86_guest_handle_intel_pt_intr,
+				   (void *)&__static_call_return0);
+}
+
 u64 __read_mostly hw_cache_event_ids
 				[PERF_COUNT_HW_CACHE_MAX]
 				[PERF_COUNT_HW_CACHE_OP_MAX]
@@ -2761,11 +2784,10 @@ static bool perf_hw_regs(struct pt_regs *regs)
 void
 perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
-	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	struct unwind_state state;
 	unsigned long addr;
 
-	if (guest_cbs && guest_cbs->state()) {
+	if (static_call(x86_guest_state)()) {
 		/* TODO: We don't support guest os callchain now */
 		return;
 	}
@@ -2865,11 +2887,10 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ctx *ent
 void
 perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
-	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
 	struct stack_frame frame;
 	const struct stack_frame __user *fp;
 
-	if (guest_cbs && guest_cbs->state()) {
+	if (static_call(x86_guest_state)()) {
 		/* TODO: We don't support guest os callchain now */
 		return;
 	}
@@ -2946,18 +2967,15 @@ static unsigned long code_segment_base(struct pt_regs *regs)
 
 unsigned long perf_instruction_pointer(struct pt_regs *regs)
 {
-	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
-
-	if (guest_cbs && guest_cbs->state())
-		return guest_cbs->get_ip();
+	if (static_call(x86_guest_state)())
+		return static_call(x86_guest_get_ip)();
 
 	return regs->ip + code_segment_base(regs);
 }
 
 unsigned long perf_misc_flags(struct pt_regs *regs)
 {
-	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
-	unsigned int guest_state = guest_cbs ? guest_cbs->state() : 0;
+	unsigned int guest_state = static_call(x86_guest_state)();
 	int misc = 0;
 
 	if (guest_state) {
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 524ad1f747bd..fb1bd7a0e1a6 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2782,11 +2782,12 @@ static void intel_pmu_reset(void)
 	local_irq_restore(flags);
 }
 
+DECLARE_STATIC_CALL(x86_guest_handle_intel_pt_intr, *(perf_guest_cbs->handle_intel_pt_intr));
+
 static int handle_pmi_common(struct pt_regs *regs, u64 status)
 {
 	struct perf_sample_data data;
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
-	struct perf_guest_info_callbacks *guest_cbs;
 	int bit;
 	int handled = 0;
 	u64 intel_ctrl = hybrid(cpuc->pmu, intel_ctrl);
@@ -2853,9 +2854,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 	 */
 	if (__test_and_clear_bit(GLOBAL_STATUS_TRACE_TOPAPMI_BIT, (unsigned long *)&status)) {
 		handled++;
-
-		guest_cbs = perf_get_guest_cbs();
-		if (likely(!guest_cbs || !guest_cbs->handle_intel_pt_intr()))
+		if (!static_call(x86_guest_handle_intel_pt_intr)())
 			intel_pt_interrupt();
 	}
 
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index a5d5893b80b0..3fa1014218f4 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1240,14 +1240,7 @@ extern void perf_event_bpf_event(struct bpf_prog *prog,
 
 #ifdef CONFIG_HAVE_GUEST_PERF_EVENTS
 extern struct perf_guest_info_callbacks *perf_guest_cbs;
-static inline struct perf_guest_info_callbacks *perf_get_guest_cbs(void)
-{
-	/* Reg/unreg perf_guest_cbs waits for readers via synchronize_rcu(). */
-	lockdep_assert_preemption_disabled();
-
-	/* Prevent reloading between a !NULL check and dereferences. */
-	return READ_ONCE(perf_guest_cbs);
-}
+extern void arch_perf_update_guest_cbs(struct perf_guest_info_callbacks *guest_cbs);
 extern void perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
 extern void perf_unregister_guest_info_callbacks(void);
 #endif /* CONFIG_HAVE_GUEST_PERF_EVENTS */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ec36e7aded89..fb0fd670ab23 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6491,6 +6491,7 @@ void perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
 		return;
 
 	WRITE_ONCE(perf_guest_cbs, cbs);
+	arch_perf_update_guest_cbs(cbs);
 	synchronize_rcu();
 }
 EXPORT_SYMBOL_GPL(perf_register_guest_info_callbacks);
@@ -6498,6 +6499,7 @@ EXPORT_SYMBOL_GPL(perf_register_guest_info_callbacks);
 void perf_unregister_guest_info_callbacks(void)
 {
 	WRITE_ONCE(perf_guest_cbs, NULL);
+	arch_perf_update_guest_cbs(NULL);
 	synchronize_rcu();
 }
 EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks);
-- 
2.33.0.259.gc128427fd7-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 08/13] KVM: x86: Drop current_vcpu for kvm_running_vcpu + kvm_arch_vcpu variable
  2021-08-28  0:35 [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks Sean Christopherson
                   ` (6 preceding siblings ...)
  2021-08-28  0:35 ` [PATCH v2 07/13] perf/core: Use static_call to optimize perf_guest_info_callbacks Sean Christopherson
@ 2021-08-28  0:35 ` Sean Christopherson
  2021-08-28  0:35 ` [PATCH v2 09/13] KVM: x86: More precisely identify NMI from guest when handling PMI Sean Christopherson
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Sean Christopherson @ 2021-08-28  0:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Will Deacon, Mark Rutland, Catalin Marinas, Marc Zyngier,
	Guo Ren, Nick Hu, Greentime Hu, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Thomas Gleixner, Borislav Petkov, x86,
	Paolo Bonzini, Boris Ostrovsky, Juergen Gross
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, James Morse,
	Alexandru Elisei, Suzuki K Poulose, H. Peter Anvin,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

Use the generic kvm_running_vcpu plus a new 'handling_intr_from_guest'
variable in kvm_arch_vcpu instead of the semi-redundant current_vcpu.
kvm_before/after_interrupt() must be called while the vCPU is loaded,
(which protects against preemption), thus kvm_running_vcpu is guaranteed
to be non-NULL when handling_intr_from_guest is non-zero.

Switching to kvm_get_running_vcpu() will allows moving KVM's perf
callbacks to generic code, and the new flag will be used in a future
patch to more precisely identify the "NMI from guest" case.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  3 +--
 arch/x86/kvm/pmu.c              |  2 +-
 arch/x86/kvm/x86.c              | 21 ++++++++++++---------
 arch/x86/kvm/x86.h              | 10 ++++++----
 4 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1080166fc0cf..2d86a2dfc775 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -763,6 +763,7 @@ struct kvm_vcpu_arch {
 	unsigned nmi_pending; /* NMI queued after currently running handler */
 	bool nmi_injected;    /* Trying to inject an NMI this entry */
 	bool smi_pending;    /* SMI queued after currently running handler */
+	u8 handling_intr_from_guest;
 
 	struct kvm_mtrr mtrr_state;
 	u64 pat;
@@ -1874,8 +1875,6 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu);
 int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err);
 void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu);
 
-unsigned int kvm_guest_state(void);
-
 void __user *__x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
 				     u32 size);
 bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 5b68d4188de0..eef48258e50f 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -87,7 +87,7 @@ static void kvm_perf_overflow_intr(struct perf_event *perf_event,
 		 * woken up. So we should wake it, but this is impossible from
 		 * NMI context. Do it from irq work instead.
 		 */
-		if (!kvm_guest_state())
+		if (!kvm_handling_nmi_from_guest(pmc->vcpu))
 			irq_work_queue(&pmc_to_pmu(pmc)->irq_work);
 		else
 			kvm_make_request(KVM_REQ_PMI, pmc->vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b2a4d085aa4f..6df300c55461 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8264,15 +8264,17 @@ static void kvm_timer_init(void)
 			  kvmclock_cpu_online, kvmclock_cpu_down_prep);
 }
 
-DEFINE_PER_CPU(struct kvm_vcpu *, current_vcpu);
-EXPORT_PER_CPU_SYMBOL_GPL(current_vcpu);
+static inline bool kvm_pmi_in_guest(struct kvm_vcpu *vcpu)
+{
+	return vcpu && vcpu->arch.handling_intr_from_guest;
+}
 
-unsigned int kvm_guest_state(void)
+static unsigned int kvm_guest_state(void)
 {
-	struct kvm_vcpu *vcpu = __this_cpu_read(current_vcpu);
+	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 	unsigned int state;
 
-	if (!vcpu)
+	if (!kvm_pmi_in_guest(vcpu))
 		return 0;
 
 	state = PERF_GUEST_ACTIVE;
@@ -8284,9 +8286,10 @@ unsigned int kvm_guest_state(void)
 
 static unsigned long kvm_guest_get_ip(void)
 {
-	struct kvm_vcpu *vcpu = __this_cpu_read(current_vcpu);
+	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 
-	if (WARN_ON_ONCE(!vcpu))
+	/* Retrieving the IP must be guarded by a call to kvm_guest_state(). */
+	if (WARN_ON_ONCE(!kvm_pmi_in_guest(vcpu)))
 		return 0;
 
 	return kvm_rip_read(vcpu);
@@ -8294,10 +8297,10 @@ static unsigned long kvm_guest_get_ip(void)
 
 static unsigned int kvm_handle_intel_pt_intr(void)
 {
-	struct kvm_vcpu *vcpu = __this_cpu_read(current_vcpu);
+	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 
 	/* '0' on failure so that the !PT case can use a RET0 static call. */
-	if (!vcpu)
+	if (!kvm_pmi_in_guest(vcpu))
 		return 0;
 
 	kvm_make_request(KVM_REQ_PMI, vcpu);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 7d66d63dc55a..a9c107e7c907 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -387,18 +387,20 @@ static inline bool kvm_cstate_in_guest(struct kvm *kvm)
 	return kvm->arch.cstate_in_guest;
 }
 
-DECLARE_PER_CPU(struct kvm_vcpu *, current_vcpu);
-
 static inline void kvm_before_interrupt(struct kvm_vcpu *vcpu)
 {
-	__this_cpu_write(current_vcpu, vcpu);
+	WRITE_ONCE(vcpu->arch.handling_intr_from_guest, 1);
 }
 
 static inline void kvm_after_interrupt(struct kvm_vcpu *vcpu)
 {
-	__this_cpu_write(current_vcpu, NULL);
+	WRITE_ONCE(vcpu->arch.handling_intr_from_guest, 0);
 }
 
+static inline bool kvm_handling_nmi_from_guest(struct kvm_vcpu *vcpu)
+{
+	return !!vcpu->arch.handling_intr_from_guest;
+}
 
 static inline bool kvm_pat_valid(u64 data)
 {
-- 
2.33.0.259.gc128427fd7-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 09/13] KVM: x86: More precisely identify NMI from guest when handling PMI
  2021-08-28  0:35 [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks Sean Christopherson
                   ` (7 preceding siblings ...)
  2021-08-28  0:35 ` [PATCH v2 08/13] KVM: x86: Drop current_vcpu for kvm_running_vcpu + kvm_arch_vcpu variable Sean Christopherson
@ 2021-08-28  0:35 ` Sean Christopherson
  2021-08-28  0:35 ` [PATCH v2 10/13] KVM: Move x86's perf guest info callbacks to generic KVM Sean Christopherson
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Sean Christopherson @ 2021-08-28  0:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Will Deacon, Mark Rutland, Catalin Marinas, Marc Zyngier,
	Guo Ren, Nick Hu, Greentime Hu, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Thomas Gleixner, Borislav Petkov, x86,
	Paolo Bonzini, Boris Ostrovsky, Juergen Gross
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, James Morse,
	Alexandru Elisei, Suzuki K Poulose, H. Peter Anvin,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

Differntiate between IRQ and NMI for KVM's PMC overflow callback, which
was originally invoked in response to an NMI that arrived while the guest
was running, but was inadvertantly changed to fire on IRQs as well when
support for perf without PMU/NMI was added to KVM.  In practice, this
should be a nop as the PMC overflow callback shouldn't be reached, but
it's a cheap and easy fix that also better documents the situation.

Note, this also doesn't completely prevent false positives if perf
somehow ends up calling into KVM, e.g. an NMI can arrive in host after
KVM sets its flag.

Fixes: dd60d217062f ("KVM: x86: Fix perf timer mode IP reporting")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c |  2 +-
 arch/x86/kvm/vmx/vmx.c |  4 +++-
 arch/x86/kvm/x86.c     |  2 +-
 arch/x86/kvm/x86.h     | 13 ++++++++++---
 4 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1a70e11f0487..0a0c01744b63 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3843,7 +3843,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 	}
 
 	if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
-		kvm_before_interrupt(vcpu);
+		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
 
 	kvm_load_host_xsave_state(vcpu);
 	stgi();
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f19d72136f77..61a4f5ff2acd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6344,7 +6344,9 @@ void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
 static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
 					unsigned long entry)
 {
-	kvm_before_interrupt(vcpu);
+	bool is_nmi = entry == (unsigned long)asm_exc_nmi_noist;
+
+	kvm_before_interrupt(vcpu, is_nmi ? KVM_HANDLING_NMI : KVM_HANDLING_IRQ);
 	vmx_do_interrupt_nmi_irqoff(entry);
 	kvm_after_interrupt(vcpu);
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6df300c55461..1427ac1fc1f2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9676,7 +9676,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	 * interrupts on processors that implement an interrupt shadow, the
 	 * stat.exits increment will do nicely.
 	 */
-	kvm_before_interrupt(vcpu);
+	kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
 	local_irq_enable();
 	++vcpu->stat.exits;
 	local_irq_disable();
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index a9c107e7c907..9b26f9b09d2a 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -387,9 +387,16 @@ static inline bool kvm_cstate_in_guest(struct kvm *kvm)
 	return kvm->arch.cstate_in_guest;
 }
 
-static inline void kvm_before_interrupt(struct kvm_vcpu *vcpu)
+enum kvm_intr_type {
+	/* Values are arbitrary, but must be non-zero. */
+	KVM_HANDLING_IRQ = 1,
+	KVM_HANDLING_NMI,
+};
+
+static inline void kvm_before_interrupt(struct kvm_vcpu *vcpu,
+					enum kvm_intr_type intr)
 {
-	WRITE_ONCE(vcpu->arch.handling_intr_from_guest, 1);
+	WRITE_ONCE(vcpu->arch.handling_intr_from_guest, (u8)intr);
 }
 
 static inline void kvm_after_interrupt(struct kvm_vcpu *vcpu)
@@ -399,7 +406,7 @@ static inline void kvm_after_interrupt(struct kvm_vcpu *vcpu)
 
 static inline bool kvm_handling_nmi_from_guest(struct kvm_vcpu *vcpu)
 {
-	return !!vcpu->arch.handling_intr_from_guest;
+	return vcpu->arch.handling_intr_from_guest == KVM_HANDLING_NMI;
 }
 
 static inline bool kvm_pat_valid(u64 data)
-- 
2.33.0.259.gc128427fd7-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 10/13] KVM: Move x86's perf guest info callbacks to generic KVM
  2021-08-28  0:35 [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks Sean Christopherson
                   ` (8 preceding siblings ...)
  2021-08-28  0:35 ` [PATCH v2 09/13] KVM: x86: More precisely identify NMI from guest when handling PMI Sean Christopherson
@ 2021-08-28  0:35 ` Sean Christopherson
  2021-08-28  0:35 ` [PATCH v2 11/13] KVM: x86: Move Intel Processor Trace interrupt handler to vmx.c Sean Christopherson
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Sean Christopherson @ 2021-08-28  0:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Will Deacon, Mark Rutland, Catalin Marinas, Marc Zyngier,
	Guo Ren, Nick Hu, Greentime Hu, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Thomas Gleixner, Borislav Petkov, x86,
	Paolo Bonzini, Boris Ostrovsky, Juergen Gross
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, James Morse,
	Alexandru Elisei, Suzuki K Poulose, H. Peter Anvin,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

Move x86's perf guest callbacks into common KVM, as they are semantically
identical to arm64's callbacks (the only other such KVM callbacks).
arm64 will convert to the common versions in a future patch.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  4 +++
 arch/x86/kvm/x86.c              | 53 +++++++--------------------------
 include/linux/kvm_host.h        | 12 ++++++++
 virt/kvm/kvm_main.c             | 40 +++++++++++++++++++++++++
 4 files changed, 67 insertions(+), 42 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2d86a2dfc775..a98c7907110c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1543,6 +1543,10 @@ static inline int kvm_arch_flush_remote_tlb(struct kvm *kvm)
 		return -ENOTSUPP;
 }
 
+#define __KVM_WANT_PERF_CALLBACKS
+#define kvm_arch_pmi_in_guest(vcpu) \
+	((vcpu) && (vcpu)->arch.handling_intr_from_guest)
+
 int kvm_mmu_module_init(void);
 void kvm_mmu_module_exit(void);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1427ac1fc1f2..1bea616402e6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8264,43 +8264,12 @@ static void kvm_timer_init(void)
 			  kvmclock_cpu_online, kvmclock_cpu_down_prep);
 }
 
-static inline bool kvm_pmi_in_guest(struct kvm_vcpu *vcpu)
-{
-	return vcpu && vcpu->arch.handling_intr_from_guest;
-}
-
-static unsigned int kvm_guest_state(void)
-{
-	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
-	unsigned int state;
-
-	if (!kvm_pmi_in_guest(vcpu))
-		return 0;
-
-	state = PERF_GUEST_ACTIVE;
-	if (static_call(kvm_x86_get_cpl)(vcpu))
-		state |= PERF_GUEST_USER;
-
-	return state;
-}
-
-static unsigned long kvm_guest_get_ip(void)
-{
-	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
-
-	/* Retrieving the IP must be guarded by a call to kvm_guest_state(). */
-	if (WARN_ON_ONCE(!kvm_pmi_in_guest(vcpu)))
-		return 0;
-
-	return kvm_rip_read(vcpu);
-}
-
 static unsigned int kvm_handle_intel_pt_intr(void)
 {
 	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 
 	/* '0' on failure so that the !PT case can use a RET0 static call. */
-	if (!kvm_pmi_in_guest(vcpu))
+	if (!kvm_arch_pmi_in_guest(vcpu))
 		return 0;
 
 	kvm_make_request(KVM_REQ_PMI, vcpu);
@@ -8309,12 +8278,6 @@ static unsigned int kvm_handle_intel_pt_intr(void)
 	return 1;
 }
 
-static struct perf_guest_info_callbacks kvm_guest_cbs = {
-	.state			= kvm_guest_state,
-	.get_ip			= kvm_guest_get_ip,
-	.handle_intel_pt_intr	= NULL,
-};
-
 #ifdef CONFIG_X86_64
 static void pvclock_gtod_update_fn(struct work_struct *work)
 {
@@ -11068,9 +11031,11 @@ int kvm_arch_hardware_setup(void *opaque)
 	memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
 	kvm_ops_static_call_update();
 
+	/* Temporary ugliness. */
 	if (ops->intel_pt_intr_in_guest && ops->intel_pt_intr_in_guest())
-		kvm_guest_cbs.handle_intel_pt_intr = kvm_handle_intel_pt_intr;
-	perf_register_guest_info_callbacks(&kvm_guest_cbs);
+		kvm_register_perf_callbacks(kvm_handle_intel_pt_intr);
+	else
+		kvm_register_perf_callbacks(NULL);
 
 	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
 		supported_xss = 0;
@@ -11099,8 +11064,7 @@ int kvm_arch_hardware_setup(void *opaque)
 
 void kvm_arch_hardware_unsetup(void)
 {
-	perf_unregister_guest_info_callbacks();
-	kvm_guest_cbs.handle_intel_pt_intr = NULL;
+	kvm_unregister_perf_callbacks();
 
 	static_call(kvm_x86_hardware_unsetup)();
 }
@@ -11727,6 +11691,11 @@ bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
 	return vcpu->arch.preempted_in_kernel;
 }
 
+unsigned long kvm_arch_vcpu_get_ip(struct kvm_vcpu *vcpu)
+{
+	return kvm_rip_read(vcpu);
+}
+
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
 {
 	return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e4d712e9f760..34d99034852f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1163,6 +1163,18 @@ static inline bool kvm_arch_intc_initialized(struct kvm *kvm)
 }
 #endif
 
+#ifdef __KVM_WANT_PERF_CALLBACKS
+
+void kvm_set_intel_pt_intr_handler(unsigned int (*handler)(void));
+unsigned long kvm_arch_vcpu_get_ip(struct kvm_vcpu *vcpu);
+
+void kvm_register_perf_callbacks(unsigned int (*pt_intr_handler)(void));
+static inline void kvm_unregister_perf_callbacks(void)
+{
+	perf_unregister_guest_info_callbacks();
+}
+#endif
+
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type);
 void kvm_arch_destroy_vm(struct kvm *kvm);
 void kvm_arch_sync_events(struct kvm *kvm);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3e67c93ca403..2c61db39b501 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5460,6 +5460,46 @@ struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void)
         return &kvm_running_vcpu;
 }
 
+#ifdef __KVM_WANT_PERF_CALLBACKS
+static unsigned int kvm_guest_state(void)
+{
+	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
+	unsigned int state;
+
+	if (!kvm_arch_pmi_in_guest(vcpu))
+		return 0;
+
+	state = PERF_GUEST_ACTIVE;
+	if (!kvm_arch_vcpu_in_kernel(vcpu))
+		state |= PERF_GUEST_USER;
+
+	return state;
+}
+
+static unsigned long kvm_guest_get_ip(void)
+{
+	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
+
+	/* Retrieving the IP must be guarded by a call to kvm_guest_state(). */
+	if (WARN_ON_ONCE(!kvm_arch_pmi_in_guest(vcpu)))
+		return 0;
+
+	return kvm_arch_vcpu_get_ip(vcpu);
+}
+
+static struct perf_guest_info_callbacks kvm_guest_cbs = {
+	.state			= kvm_guest_state,
+	.get_ip			= kvm_guest_get_ip,
+	.handle_intel_pt_intr	= NULL,
+};
+
+void kvm_register_perf_callbacks(unsigned int (*pt_intr_handler)(void))
+{
+	kvm_guest_cbs.handle_intel_pt_intr = pt_intr_handler;
+	perf_register_guest_info_callbacks(&kvm_guest_cbs);
+}
+#endif
+
 struct kvm_cpu_compat_check {
 	void *opaque;
 	int *ret;
-- 
2.33.0.259.gc128427fd7-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 11/13] KVM: x86: Move Intel Processor Trace interrupt handler to vmx.c
  2021-08-28  0:35 [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks Sean Christopherson
                   ` (9 preceding siblings ...)
  2021-08-28  0:35 ` [PATCH v2 10/13] KVM: Move x86's perf guest info callbacks to generic KVM Sean Christopherson
@ 2021-08-28  0:35 ` Sean Christopherson
  2021-08-28  0:35 ` [PATCH v2 12/13] KVM: arm64: Convert to the generic perf callbacks Sean Christopherson
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Sean Christopherson @ 2021-08-28  0:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Will Deacon, Mark Rutland, Catalin Marinas, Marc Zyngier,
	Guo Ren, Nick Hu, Greentime Hu, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Thomas Gleixner, Borislav Petkov, x86,
	Paolo Bonzini, Boris Ostrovsky, Juergen Gross
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, James Morse,
	Alexandru Elisei, Suzuki K Poulose, H. Peter Anvin,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

Now that all state needed for VMX's PT interrupt handler is exposed to
vmx.c (specifically the currently running vCPU), move the handler into
vmx.c where it belongs.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/vmx/vmx.c          | 22 +++++++++++++++++++++-
 arch/x86/kvm/x86.c              | 20 +-------------------
 include/linux/kvm_host.h        |  2 --
 4 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a98c7907110c..7a3d1dcfef39 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1495,7 +1495,7 @@ struct kvm_x86_init_ops {
 	int (*disabled_by_bios)(void);
 	int (*check_processor_compatibility)(void);
 	int (*hardware_setup)(void);
-	bool (*intel_pt_intr_in_guest)(void);
+	unsigned int (*handle_intel_pt_intr)(void);
 
 	struct kvm_x86_ops *runtime_ops;
 };
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 61a4f5ff2acd..33f92febe3ce 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7687,6 +7687,20 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
 };
 
+static unsigned int vmx_handle_intel_pt_intr(void)
+{
+	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
+
+	/* '0' on failure so that the !PT case can use a RET0 static call. */
+	if (!kvm_arch_pmi_in_guest(vcpu))
+		return 0;
+
+	kvm_make_request(KVM_REQ_PMI, vcpu);
+	__set_bit(MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI_BIT,
+		  (unsigned long *)&vcpu->arch.pmu.global_status);
+	return 1;
+}
+
 static __init void vmx_setup_user_return_msrs(void)
 {
 
@@ -7713,6 +7727,8 @@ static __init void vmx_setup_user_return_msrs(void)
 		kvm_add_user_return_msr(vmx_uret_msrs_list[i]);
 }
 
+static struct kvm_x86_init_ops vmx_init_ops __initdata;
+
 static __init int hardware_setup(void)
 {
 	unsigned long host_bndcfgs;
@@ -7873,6 +7889,10 @@ static __init int hardware_setup(void)
 		return -EINVAL;
 	if (!enable_ept || !cpu_has_vmx_intel_pt())
 		pt_mode = PT_MODE_SYSTEM;
+	if (pt_mode == PT_MODE_HOST_GUEST)
+		vmx_init_ops.handle_intel_pt_intr = vmx_handle_intel_pt_intr;
+	else
+		vmx_init_ops.handle_intel_pt_intr = NULL;
 
 	setup_default_sgx_lepubkeyhash();
 
@@ -7898,7 +7918,7 @@ static struct kvm_x86_init_ops vmx_init_ops __initdata = {
 	.disabled_by_bios = vmx_disabled_by_bios,
 	.check_processor_compatibility = vmx_check_processor_compat,
 	.hardware_setup = hardware_setup,
-	.intel_pt_intr_in_guest = vmx_pt_mode_is_host_guest,
+	.handle_intel_pt_intr = NULL,
 
 	.runtime_ops = &vmx_x86_ops,
 };
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1bea616402e6..b79b2d29260d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8264,20 +8264,6 @@ static void kvm_timer_init(void)
 			  kvmclock_cpu_online, kvmclock_cpu_down_prep);
 }
 
-static unsigned int kvm_handle_intel_pt_intr(void)
-{
-	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
-
-	/* '0' on failure so that the !PT case can use a RET0 static call. */
-	if (!kvm_arch_pmi_in_guest(vcpu))
-		return 0;
-
-	kvm_make_request(KVM_REQ_PMI, vcpu);
-	__set_bit(MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI_BIT,
-			(unsigned long *)&vcpu->arch.pmu.global_status);
-	return 1;
-}
-
 #ifdef CONFIG_X86_64
 static void pvclock_gtod_update_fn(struct work_struct *work)
 {
@@ -11031,11 +11017,7 @@ int kvm_arch_hardware_setup(void *opaque)
 	memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
 	kvm_ops_static_call_update();
 
-	/* Temporary ugliness. */
-	if (ops->intel_pt_intr_in_guest && ops->intel_pt_intr_in_guest())
-		kvm_register_perf_callbacks(kvm_handle_intel_pt_intr);
-	else
-		kvm_register_perf_callbacks(NULL);
+	kvm_register_perf_callbacks(ops->handle_intel_pt_intr);
 
 	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
 		supported_xss = 0;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 34d99034852f..b9235c3ac6af 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1164,8 +1164,6 @@ static inline bool kvm_arch_intc_initialized(struct kvm *kvm)
 #endif
 
 #ifdef __KVM_WANT_PERF_CALLBACKS
-
-void kvm_set_intel_pt_intr_handler(unsigned int (*handler)(void));
 unsigned long kvm_arch_vcpu_get_ip(struct kvm_vcpu *vcpu);
 
 void kvm_register_perf_callbacks(unsigned int (*pt_intr_handler)(void));
-- 
2.33.0.259.gc128427fd7-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 12/13] KVM: arm64: Convert to the generic perf callbacks
  2021-08-28  0:35 [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks Sean Christopherson
                   ` (10 preceding siblings ...)
  2021-08-28  0:35 ` [PATCH v2 11/13] KVM: x86: Move Intel Processor Trace interrupt handler to vmx.c Sean Christopherson
@ 2021-08-28  0:35 ` Sean Christopherson
  2021-08-28  0:35 ` [PATCH v2 13/13] KVM: arm64: Drop perf.c and fold its tiny bits of code into arm.c / pmu.c Sean Christopherson
  2021-08-28 20:13 ` [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks Peter Zijlstra
  13 siblings, 0 replies; 29+ messages in thread
From: Sean Christopherson @ 2021-08-28  0:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Will Deacon, Mark Rutland, Catalin Marinas, Marc Zyngier,
	Guo Ren, Nick Hu, Greentime Hu, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Thomas Gleixner, Borislav Petkov, x86,
	Paolo Bonzini, Boris Ostrovsky, Juergen Gross
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, James Morse,
	Alexandru Elisei, Suzuki K Poulose, H. Peter Anvin,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

Drop arm64's version of the callbacks in favor of the callbacks provided
by generic KVM, which are semantically identical.  Implement the "get ip"
hook as needed.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/include/asm/kvm_host.h | 12 +++++++++++
 arch/arm64/kvm/arm.c              |  5 +++++
 arch/arm64/kvm/perf.c             | 34 ++-----------------------------
 3 files changed, 19 insertions(+), 32 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index ed940aec89e0..73dc402ded1f 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -673,6 +673,18 @@ int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa);
 void kvm_perf_init(void);
 void kvm_perf_teardown(void);
 
+#ifdef CONFIG_PERF_EVENTS
+#define __KVM_WANT_PERF_CALLBACKS
+static inline bool kvm_arch_pmi_in_guest(struct kvm_vcpu *vcpu)
+{
+	/* Any callback while a vCPU is loaded is considered to be in guest. */
+	return !!vcpu;
+}
+#else
+static inline void kvm_register_perf_callbacks(void) {}
+static inline void kvm_unregister_perf_callbacks(void) {}
+#endif
+
 long kvm_hypercall_pv_features(struct kvm_vcpu *vcpu);
 gpa_t kvm_init_stolen_time(struct kvm_vcpu *vcpu);
 void kvm_update_stolen_time(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e9a2b8f27792..2b542fdc237e 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -500,6 +500,11 @@ bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
 	return vcpu_mode_priv(vcpu);
 }
 
+unsigned long kvm_arch_vcpu_get_ip(struct kvm_vcpu *vcpu)
+{
+	return *vcpu_pc(vcpu);
+}
+
 /* Just ensure a guest exit from a particular CPU */
 static void exit_vm_noop(void *info)
 {
diff --git a/arch/arm64/kvm/perf.c b/arch/arm64/kvm/perf.c
index 893de1a51fea..0b902e0d5b5d 100644
--- a/arch/arm64/kvm/perf.c
+++ b/arch/arm64/kvm/perf.c
@@ -13,45 +13,15 @@
 
 DEFINE_STATIC_KEY_FALSE(kvm_arm_pmu_available);
 
-static unsigned int kvm_guest_state(void)
-{
-	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
-	unsigned int state;
-
-	if (!vcpu)
-		return 0;
-
-	state = PERF_GUEST_ACTIVE;
-	if (!vcpu_mode_priv(vcpu))
-		state |= PERF_GUEST_USER;
-
-	return state;
-}
-
-static unsigned long kvm_get_guest_ip(void)
-{
-	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
-
-	if (WARN_ON_ONCE(!vcpu))
-		return 0;
-
-	return *vcpu_pc(vcpu);
-}
-
-static struct perf_guest_info_callbacks kvm_guest_cbs = {
-	.state		= kvm_guest_state,
-	.get_ip		= kvm_get_guest_ip,
-};
-
 void kvm_perf_init(void)
 {
 	if (kvm_pmu_probe_pmuver() != 0xf && !is_protected_kvm_enabled())
 		static_branch_enable(&kvm_arm_pmu_available);
 
-	perf_register_guest_info_callbacks(&kvm_guest_cbs);
+	kvm_register_perf_callbacks(NULL);
 }
 
 void kvm_perf_teardown(void)
 {
-	perf_unregister_guest_info_callbacks();
+	kvm_unregister_perf_callbacks();
 }
-- 
2.33.0.259.gc128427fd7-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 13/13] KVM: arm64: Drop perf.c and fold its tiny bits of code into arm.c / pmu.c
  2021-08-28  0:35 [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks Sean Christopherson
                   ` (11 preceding siblings ...)
  2021-08-28  0:35 ` [PATCH v2 12/13] KVM: arm64: Convert to the generic perf callbacks Sean Christopherson
@ 2021-08-28  0:35 ` Sean Christopherson
  2021-08-28 20:13 ` [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks Peter Zijlstra
  13 siblings, 0 replies; 29+ messages in thread
From: Sean Christopherson @ 2021-08-28  0:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Will Deacon, Mark Rutland, Catalin Marinas, Marc Zyngier,
	Guo Ren, Nick Hu, Greentime Hu, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Thomas Gleixner, Borislav Petkov, x86,
	Paolo Bonzini, Boris Ostrovsky, Juergen Gross
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, James Morse,
	Alexandru Elisei, Suzuki K Poulose, H. Peter Anvin,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

Call KVM's (un)register perf callbacks helpers directly from arm.c, and
move the PMU bits into pmu.c and rename the related helper accordingly.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/include/asm/kvm_host.h |  3 ---
 arch/arm64/kvm/Makefile           |  2 +-
 arch/arm64/kvm/arm.c              |  6 ++++--
 arch/arm64/kvm/perf.c             | 27 ---------------------------
 arch/arm64/kvm/pmu.c              |  8 ++++++++
 include/kvm/arm_pmu.h             |  1 +
 6 files changed, 14 insertions(+), 33 deletions(-)
 delete mode 100644 arch/arm64/kvm/perf.c

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 73dc402ded1f..d549b58120bc 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -670,9 +670,6 @@ unsigned long kvm_mmio_read_buf(const void *buf, unsigned int len);
 int kvm_handle_mmio_return(struct kvm_vcpu *vcpu);
 int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa);
 
-void kvm_perf_init(void);
-void kvm_perf_teardown(void);
-
 #ifdef CONFIG_PERF_EVENTS
 #define __KVM_WANT_PERF_CALLBACKS
 static inline bool kvm_arch_pmi_in_guest(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 989bb5dad2c8..0bcc378b7961 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -12,7 +12,7 @@ obj-$(CONFIG_KVM) += hyp/
 
 kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
 	 $(KVM)/vfio.o $(KVM)/irqchip.o $(KVM)/binary_stats.o \
-	 arm.o mmu.o mmio.o psci.o perf.o hypercalls.o pvtime.o \
+	 arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 	 inject_fault.o va_layout.o handle_exit.o \
 	 guest.o debug.o reset.o sys_regs.o \
 	 vgic-sys-reg-v3.o fpsimd.o pmu.o \
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 2b542fdc237e..48f89d80f464 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1744,7 +1744,9 @@ static int init_subsystems(void)
 	if (err)
 		goto out;
 
-	kvm_perf_init();
+	kvm_pmu_init();
+	kvm_register_perf_callbacks(NULL);
+
 	kvm_sys_reg_table_init();
 
 out:
@@ -2160,7 +2162,7 @@ int kvm_arch_init(void *opaque)
 /* NOP: Compiling as a module not supported */
 void kvm_arch_exit(void)
 {
-	kvm_perf_teardown();
+	kvm_unregister_perf_callbacks();
 }
 
 static int __init early_kvm_mode_cfg(char *arg)
diff --git a/arch/arm64/kvm/perf.c b/arch/arm64/kvm/perf.c
deleted file mode 100644
index 0b902e0d5b5d..000000000000
--- a/arch/arm64/kvm/perf.c
+++ /dev/null
@@ -1,27 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Based on the x86 implementation.
- *
- * Copyright (C) 2012 ARM Ltd.
- * Author: Marc Zyngier <marc.zyngier@arm.com>
- */
-
-#include <linux/perf_event.h>
-#include <linux/kvm_host.h>
-
-#include <asm/kvm_emulate.h>
-
-DEFINE_STATIC_KEY_FALSE(kvm_arm_pmu_available);
-
-void kvm_perf_init(void)
-{
-	if (kvm_pmu_probe_pmuver() != 0xf && !is_protected_kvm_enabled())
-		static_branch_enable(&kvm_arm_pmu_available);
-
-	kvm_register_perf_callbacks(NULL);
-}
-
-void kvm_perf_teardown(void)
-{
-	kvm_unregister_perf_callbacks();
-}
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index 03a6c1f4a09a..d98b57a17043 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -7,6 +7,14 @@
 #include <linux/perf_event.h>
 #include <asm/kvm_hyp.h>
 
+DEFINE_STATIC_KEY_FALSE(kvm_arm_pmu_available);
+
+void kvm_pmu_init(void)
+{
+	if (kvm_pmu_probe_pmuver() != 0xf && !is_protected_kvm_enabled())
+		static_branch_enable(&kvm_arm_pmu_available);
+}
+
 /*
  * Given the perf event attributes and system type, determine
  * if we are going to need to switch counters at guest entry/exit.
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 864b9997efb2..42270676498d 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -14,6 +14,7 @@
 #define ARMV8_PMU_MAX_COUNTER_PAIRS	((ARMV8_PMU_MAX_COUNTERS + 1) >> 1)
 
 DECLARE_STATIC_KEY_FALSE(kvm_arm_pmu_available);
+void kvm_pmu_init(void);
 
 static __always_inline bool kvm_arm_support_pmu_v3(void)
 {
-- 
2.33.0.259.gc128427fd7-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 01/13] perf: Ensure perf_guest_cbs aren't reloaded between !NULL check and deref
  2021-08-28  0:35 ` [PATCH v2 01/13] perf: Ensure perf_guest_cbs aren't reloaded between !NULL check and deref Sean Christopherson
@ 2021-08-28 19:44   ` Peter Zijlstra
  2021-09-16 21:41     ` Sean Christopherson
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2021-08-28 19:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Catalin Marinas, Marc Zyngier, Guo Ren, Nick Hu, Greentime Hu,
	Vincent Chen, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Thomas Gleixner, Borislav Petkov, x86, Paolo Bonzini,
	Boris Ostrovsky, Juergen Gross, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, James Morse, Alexandru Elisei, Suzuki K Poulose,
	H. Peter Anvin, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

On Fri, Aug 27, 2021 at 05:35:46PM -0700, Sean Christopherson wrote:

> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 2d510ad750ed..6b0405e578c1 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1237,6 +1237,14 @@ extern void perf_event_bpf_event(struct bpf_prog *prog,
>  				 u16 flags);
>  
>  extern struct perf_guest_info_callbacks *perf_guest_cbs;
> +static inline struct perf_guest_info_callbacks *perf_get_guest_cbs(void)
> +{
> +	/* Reg/unreg perf_guest_cbs waits for readers via synchronize_rcu(). */
> +	lockdep_assert_preemption_disabled();
> +
> +	/* Prevent reloading between a !NULL check and dereferences. */
> +	return READ_ONCE(perf_guest_cbs);
> +}

Nice..

>  extern int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
>  extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
>  
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 464917096e73..2126f6327321 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6491,14 +6491,19 @@ struct perf_guest_info_callbacks *perf_guest_cbs;
>  
>  int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
>  {
> -	perf_guest_cbs = cbs;
> +	if (WARN_ON_ONCE(perf_guest_cbs))
> +		return -EBUSY;
> +
> +	WRITE_ONCE(perf_guest_cbs, cbs);
> +	synchronize_rcu();

You're waiting for all NULL users to go away? :-) IOW, we can do without
this synchronize_rcu() call.

>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(perf_register_guest_info_callbacks);
>  
>  int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
>  {
> -	perf_guest_cbs = NULL;

	if (WARN_ON_ONCE(perf_guest_cbs != cbs))
		return -EBUSY;

?

> +	WRITE_ONCE(perf_guest_cbs, NULL);
> +	synchronize_rcu();
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks);

Yes, this ought to work fine.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 05/13] perf: Force architectures to opt-in to guest callbacks
  2021-08-28  0:35 ` [PATCH v2 05/13] perf: Force architectures to opt-in to " Sean Christopherson
@ 2021-08-28 19:47   ` Peter Zijlstra
  2021-09-16 22:30     ` Sean Christopherson
  2021-09-21 16:44     ` Paolo Bonzini
  0 siblings, 2 replies; 29+ messages in thread
From: Peter Zijlstra @ 2021-08-28 19:47 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Catalin Marinas, Marc Zyngier, Guo Ren, Nick Hu, Greentime Hu,
	Vincent Chen, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Thomas Gleixner, Borislav Petkov, x86, Paolo Bonzini,
	Boris Ostrovsky, Juergen Gross, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, James Morse, Alexandru Elisei, Suzuki K Poulose,
	H. Peter Anvin, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

On Fri, Aug 27, 2021 at 05:35:50PM -0700, Sean Christopherson wrote:
> diff --git a/init/Kconfig b/init/Kconfig
> index 55f9f7738ebb..9ef51ae53977 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1776,6 +1776,9 @@ config HAVE_PERF_EVENTS
>  	help
>  	  See tools/perf/design.txt for details.
>  
> +config HAVE_GUEST_PERF_EVENTS
> +	bool
	depends on HAVE_KVM

?

> +
>  config PERF_USE_VMALLOC
>  	bool
>  	help

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks
  2021-08-28  0:35 [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks Sean Christopherson
                   ` (12 preceding siblings ...)
  2021-08-28  0:35 ` [PATCH v2 13/13] KVM: arm64: Drop perf.c and fold its tiny bits of code into arm.c / pmu.c Sean Christopherson
@ 2021-08-28 20:13 ` Peter Zijlstra
  2021-09-16 21:37   ` Sean Christopherson
  13 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2021-08-28 20:13 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Catalin Marinas, Marc Zyngier, Guo Ren, Nick Hu, Greentime Hu,
	Vincent Chen, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Thomas Gleixner, Borislav Petkov, x86, Paolo Bonzini,
	Boris Ostrovsky, Juergen Gross, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, James Morse, Alexandru Elisei, Suzuki K Poulose,
	H. Peter Anvin, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

On Fri, Aug 27, 2021 at 05:35:45PM -0700, Sean Christopherson wrote:
> Like Xu (2):
>   perf/core: Rework guest callbacks to prepare for static_call support
>   perf/core: Use static_call to optimize perf_guest_info_callbacks
> 
> Sean Christopherson (11):
>   perf: Ensure perf_guest_cbs aren't reloaded between !NULL check and
>     deref
>   KVM: x86: Register perf callbacks after calling vendor's
>     hardware_setup()
>   KVM: x86: Register Processor Trace interrupt hook iff PT enabled in
>     guest
>   perf: Stop pretending that perf can handle multiple guest callbacks
>   perf: Force architectures to opt-in to guest callbacks
>   KVM: x86: Drop current_vcpu for kvm_running_vcpu + kvm_arch_vcpu
>     variable
>   KVM: x86: More precisely identify NMI from guest when handling PMI
>   KVM: Move x86's perf guest info callbacks to generic KVM
>   KVM: x86: Move Intel Processor Trace interrupt handler to vmx.c
>   KVM: arm64: Convert to the generic perf callbacks
>   KVM: arm64: Drop perf.c and fold its tiny bits of code into arm.c /
>     pmu.c

Lets keep the whole intel_pt crud inside x86...

---
Index: linux-2.6/arch/x86/events/core.c
===================================================================
--- linux-2.6.orig/arch/x86/events/core.c
+++ linux-2.6/arch/x86/events/core.c
@@ -92,7 +92,7 @@ DEFINE_STATIC_CALL_RET0(x86_pmu_guest_ge
 
 DEFINE_STATIC_CALL_RET0(x86_guest_state, *(perf_guest_cbs->state));
 DEFINE_STATIC_CALL_RET0(x86_guest_get_ip, *(perf_guest_cbs->get_ip));
-DEFINE_STATIC_CALL_RET0(x86_guest_handle_intel_pt_intr, *(perf_guest_cbs->handle_intel_pt_intr));
+DEFINE_STATIC_CALL_RET0(x86_guest_handle_intel_pt_intr, unsigned int (*)(void));
 
 void arch_perf_update_guest_cbs(struct perf_guest_info_callbacks *guest_cbs)
 {
@@ -103,14 +103,6 @@ void arch_perf_update_guest_cbs(struct p
 		static_call_update(x86_guest_state, (void *)&__static_call_return0);
 		static_call_update(x86_guest_get_ip, (void *)&__static_call_return0);
 	}
-
-	/* Implementing ->handle_intel_pt_intr is optional. */
-	if (guest_cbs && guest_cbs->handle_intel_pt_intr)
-		static_call_update(x86_guest_handle_intel_pt_intr,
-				   guest_cbs->handle_intel_pt_intr);
-	else
-		static_call_update(x86_guest_handle_intel_pt_intr,
-				   (void *)&__static_call_return0);
 }
 
 u64 __read_mostly hw_cache_event_ids
Index: linux-2.6/arch/x86/events/intel/core.c
===================================================================
--- linux-2.6.orig/arch/x86/events/intel/core.c
+++ linux-2.6/arch/x86/events/intel/core.c
@@ -2782,7 +2782,7 @@ static void intel_pmu_reset(void)
 	local_irq_restore(flags);
 }
 
-DECLARE_STATIC_CALL(x86_guest_handle_intel_pt_intr, *(perf_guest_cbs->handle_intel_pt_intr));
+DECLARE_STATIC_CALL(x86_guest_handle_intel_pt_intr, unsigned int (*)(void));
 
 static int handle_pmi_common(struct pt_regs *regs, u64 status)
 {
Index: linux-2.6/arch/x86/kvm/x86.c
===================================================================
--- linux-2.6.orig/arch/x86/kvm/x86.c
+++ linux-2.6/arch/x86/kvm/x86.c
@@ -10960,7 +10960,14 @@ int kvm_arch_hardware_setup(void *opaque
 	memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
 	kvm_ops_static_call_update();
 
-	kvm_register_perf_callbacks(ops->handle_intel_pt_intr);
+	kvm_register_perf_callbacks();
+	if (ops->handle_intel_pt_intr) {
+		static_call_update(x86_guest_handle_intel_pt_intr,
+				   ops->handle_intel_pt_intr);
+	} else {
+		static_call_update(x86_guest_handle_intel_pt_intr,
+				   (void *)&__static_call_return0);
+	}
 
 	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
 		supported_xss = 0;
Index: linux-2.6/include/linux/perf_event.h
===================================================================
--- linux-2.6.orig/include/linux/perf_event.h
+++ linux-2.6/include/linux/perf_event.h
@@ -32,7 +32,6 @@
 struct perf_guest_info_callbacks {
 	unsigned int			(*state)(void);
 	unsigned long			(*get_ip)(void);
-	unsigned int			(*handle_intel_pt_intr)(void);
 };
 
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
Index: linux-2.6/virt/kvm/kvm_main.c
===================================================================
--- linux-2.6.orig/virt/kvm/kvm_main.c
+++ linux-2.6/virt/kvm/kvm_main.c
@@ -5374,12 +5374,10 @@ static unsigned long kvm_guest_get_ip(vo
 static struct perf_guest_info_callbacks kvm_guest_cbs = {
 	.state			= kvm_guest_state,
 	.get_ip			= kvm_guest_get_ip,
-	.handle_intel_pt_intr	= NULL,
 };
 
-void kvm_register_perf_callbacks(unsigned int (*pt_intr_handler)(void))
+void kvm_register_perf_callbacks(void)
 {
-	kvm_guest_cbs.handle_intel_pt_intr = pt_intr_handler;
 	perf_register_guest_info_callbacks(&kvm_guest_cbs);
 }
 #endif
Index: linux-2.6/arch/arm64/kvm/arm.c
===================================================================
--- linux-2.6.orig/arch/arm64/kvm/arm.c
+++ linux-2.6/arch/arm64/kvm/arm.c
@@ -1749,7 +1749,7 @@ static int init_subsystems(void)
 		goto out;
 
 	kvm_pmu_init();
-	kvm_register_perf_callbacks(NULL);
+	kvm_register_perf_callbacks();
 
 	kvm_sys_reg_table_init();
 
Index: linux-2.6/include/linux/kvm_host.h
===================================================================
--- linux-2.6.orig/include/linux/kvm_host.h
+++ linux-2.6/include/linux/kvm_host.h
@@ -1121,7 +1121,7 @@ static inline bool kvm_arch_intc_initial
 #ifdef __KVM_WANT_PERF_CALLBACKS
 unsigned long kvm_arch_vcpu_get_ip(struct kvm_vcpu *vcpu);
 
-void kvm_register_perf_callbacks(unsigned int (*pt_intr_handler)(void));
+void kvm_register_perf_callbacks(void);
 static inline void kvm_unregister_perf_callbacks(void)
 {
 	perf_unregister_guest_info_callbacks();

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks
  2021-08-28 20:13 ` [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks Peter Zijlstra
@ 2021-09-16 21:37   ` Sean Christopherson
  2021-09-17  7:28     ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Sean Christopherson @ 2021-09-16 21:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Catalin Marinas, Marc Zyngier, Guo Ren, Nick Hu, Greentime Hu,
	Vincent Chen, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Thomas Gleixner, Borislav Petkov, x86, Paolo Bonzini,
	Boris Ostrovsky, Juergen Gross, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, James Morse, Alexandru Elisei, Suzuki K Poulose,
	H. Peter Anvin, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

On Sat, Aug 28, 2021, Peter Zijlstra wrote:
> On Fri, Aug 27, 2021 at 05:35:45PM -0700, Sean Christopherson wrote:
> > Like Xu (2):
> >   perf/core: Rework guest callbacks to prepare for static_call support
> >   perf/core: Use static_call to optimize perf_guest_info_callbacks
> > 
> > Sean Christopherson (11):
> >   perf: Ensure perf_guest_cbs aren't reloaded between !NULL check and
> >     deref
> >   KVM: x86: Register perf callbacks after calling vendor's
> >     hardware_setup()
> >   KVM: x86: Register Processor Trace interrupt hook iff PT enabled in
> >     guest
> >   perf: Stop pretending that perf can handle multiple guest callbacks
> >   perf: Force architectures to opt-in to guest callbacks
> >   KVM: x86: Drop current_vcpu for kvm_running_vcpu + kvm_arch_vcpu
> >     variable
> >   KVM: x86: More precisely identify NMI from guest when handling PMI
> >   KVM: Move x86's perf guest info callbacks to generic KVM
> >   KVM: x86: Move Intel Processor Trace interrupt handler to vmx.c
> >   KVM: arm64: Convert to the generic perf callbacks
> >   KVM: arm64: Drop perf.c and fold its tiny bits of code into arm.c /
> >     pmu.c

Argh, sorry, I somehow managed to miss all of your replies.  I'll get back to
this series next week.  Thanks for the quick response!

> Lets keep the whole intel_pt crud inside x86...

In theory, I like the idea of burying intel_pt inside x86 (and even in Intel+VMX code
for the most part), but the actual implementation is a bit gross.  Because of the
whole "KVM can be a module" thing, either the static call and __static_call_return0
would need to be exported, or a new register/unregister pair would have to be exported.

The unregister path would also need its own synchronize_rcu().  In general, I
don't love duplicating the logic, but it's not the end of the world.

Either way works for me.  Paolo or Peter, do either of you have a preference?

> ---
> Index: linux-2.6/arch/x86/events/core.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/events/core.c
> +++ linux-2.6/arch/x86/events/core.c
> @@ -92,7 +92,7 @@ DEFINE_STATIC_CALL_RET0(x86_pmu_guest_ge
>  
>  DEFINE_STATIC_CALL_RET0(x86_guest_state, *(perf_guest_cbs->state));
>  DEFINE_STATIC_CALL_RET0(x86_guest_get_ip, *(perf_guest_cbs->get_ip));
> -DEFINE_STATIC_CALL_RET0(x86_guest_handle_intel_pt_intr, *(perf_guest_cbs->handle_intel_pt_intr));
> +DEFINE_STATIC_CALL_RET0(x86_guest_handle_intel_pt_intr, unsigned int (*)(void));

FWIW, the param needs to be a raw function, not a function pointer. 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 01/13] perf: Ensure perf_guest_cbs aren't reloaded between !NULL check and deref
  2021-08-28 19:44   ` Peter Zijlstra
@ 2021-09-16 21:41     ` Sean Christopherson
  0 siblings, 0 replies; 29+ messages in thread
From: Sean Christopherson @ 2021-09-16 21:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Catalin Marinas, Marc Zyngier, Guo Ren, Nick Hu, Greentime Hu,
	Vincent Chen, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Thomas Gleixner, Borislav Petkov, x86, Paolo Bonzini,
	Boris Ostrovsky, Juergen Gross, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, James Morse, Alexandru Elisei, Suzuki K Poulose,
	H. Peter Anvin, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

On Sat, Aug 28, 2021, Peter Zijlstra wrote:
> On Fri, Aug 27, 2021 at 05:35:46PM -0700, Sean Christopherson wrote:
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 464917096e73..2126f6327321 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -6491,14 +6491,19 @@ struct perf_guest_info_callbacks *perf_guest_cbs;
> >  
> >  int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
> >  {
> > -	perf_guest_cbs = cbs;
> > +	if (WARN_ON_ONCE(perf_guest_cbs))
> > +		return -EBUSY;
> > +
> > +	WRITE_ONCE(perf_guest_cbs, cbs);
> > +	synchronize_rcu();
> 
> You're waiting for all NULL users to go away? :-) IOW, we can do without
> this synchronize_rcu() call.

Doh, right.  I was thinking KVM needed to wait for in-progress NMI to exit to
ensure guest PT interrupts are handled correctly, but obviously the NMI handler
needs to exit for that CPU to get into a guest...

> >  	return 0;
> >  }
> >  EXPORT_SYMBOL_GPL(perf_register_guest_info_callbacks);
> >  
> >  int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
> >  {
> > -	perf_guest_cbs = NULL;
> 
> 	if (WARN_ON_ONCE(perf_guest_cbs != cbs))
> 		return -EBUSY;
> 
> ?

Works for me.  I guess I'm more optimistic about people not being morons :-)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 05/13] perf: Force architectures to opt-in to guest callbacks
  2021-08-28 19:47   ` Peter Zijlstra
@ 2021-09-16 22:30     ` Sean Christopherson
  2021-09-21 16:44     ` Paolo Bonzini
  1 sibling, 0 replies; 29+ messages in thread
From: Sean Christopherson @ 2021-09-16 22:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Catalin Marinas, Marc Zyngier, Guo Ren, Nick Hu, Greentime Hu,
	Vincent Chen, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Thomas Gleixner, Borislav Petkov, x86, Paolo Bonzini,
	Boris Ostrovsky, Juergen Gross, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, James Morse, Alexandru Elisei, Suzuki K Poulose,
	H. Peter Anvin, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

On Sat, Aug 28, 2021, Peter Zijlstra wrote:
> On Fri, Aug 27, 2021 at 05:35:50PM -0700, Sean Christopherson wrote:
> > diff --git a/init/Kconfig b/init/Kconfig
> > index 55f9f7738ebb..9ef51ae53977 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -1776,6 +1776,9 @@ config HAVE_PERF_EVENTS
> >  	help
> >  	  See tools/perf/design.txt for details.
> >  
> > +config HAVE_GUEST_PERF_EVENTS
> > +	bool
> 	depends on HAVE_KVM
> 
> ?

Ah, nice!  We can go even further to:

	depends on HAVE_PERF_EVENTS && HAVE_KVM

though I'm pretty sure all architectures that select HAVE_KVM also select
HAVE_PERF_EVENTS.

Huh.  arm64 doesn't select HAVE_KVM even though it selects almost literally every
other HAVE_KVM_* config.  arm64 has some other weirdness with CONFIG_KVM, I'll add
a patch or two to fix that stuff and amend this patch as above.

Thanks again!

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks
  2021-09-16 21:37   ` Sean Christopherson
@ 2021-09-17  7:28     ` Peter Zijlstra
  2021-09-17 16:53       ` Sean Christopherson
  2021-09-20 12:05       ` Paolo Bonzini
  0 siblings, 2 replies; 29+ messages in thread
From: Peter Zijlstra @ 2021-09-17  7:28 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Catalin Marinas, Marc Zyngier, Guo Ren, Nick Hu, Greentime Hu,
	Vincent Chen, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Thomas Gleixner, Borislav Petkov, x86, Paolo Bonzini,
	Boris Ostrovsky, Juergen Gross, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, James Morse, Alexandru Elisei, Suzuki K Poulose,
	H. Peter Anvin, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

On Thu, Sep 16, 2021 at 09:37:43PM +0000, Sean Christopherson wrote:
> On Sat, Aug 28, 2021, Peter Zijlstra wrote:

> Argh, sorry, I somehow managed to miss all of your replies.  I'll get back to
> this series next week.  Thanks for the quick response!
> 
> > Lets keep the whole intel_pt crud inside x86...
> 
> In theory, I like the idea of burying intel_pt inside x86 (and even in
> Intel+VMX code for the most part), but the actual implementation is a
> bit gross.  Because of the whole "KVM can be a module" thing,

ARGH!! we should really fix that. I've heard other archs have made much
better choices here.

> either
> the static call and __static_call_return0 would need to be exported,
> or a new register/unregister pair would have to be exported.

So I don't mind exporting __static_call_return0, but exporting a raw
static_call is much like exporting a function pointer :/

> The unregister path would also need its own synchronize_rcu().  In general, I
> don't love duplicating the logic, but it's not the end of the world.
> 
> Either way works for me.  Paolo or Peter, do either of you have a preference?

Can we de-feature kvm as a module and only have this PT functionality
when built-in? :-)


> > ---
> > Index: linux-2.6/arch/x86/events/core.c
> > ===================================================================
> > --- linux-2.6.orig/arch/x86/events/core.c
> > +++ linux-2.6/arch/x86/events/core.c
> > @@ -92,7 +92,7 @@ DEFINE_STATIC_CALL_RET0(x86_pmu_guest_ge
> >  
> >  DEFINE_STATIC_CALL_RET0(x86_guest_state, *(perf_guest_cbs->state));
> >  DEFINE_STATIC_CALL_RET0(x86_guest_get_ip, *(perf_guest_cbs->get_ip));
> > -DEFINE_STATIC_CALL_RET0(x86_guest_handle_intel_pt_intr, *(perf_guest_cbs->handle_intel_pt_intr));
> > +DEFINE_STATIC_CALL_RET0(x86_guest_handle_intel_pt_intr, unsigned int (*)(void));
> 
> FWIW, the param needs to be a raw function, not a function pointer. 

Yeah, I keep making that mistake.. and I wrote the bloody thing :/

I have a 'fix' for that, but I need to finish that and also flag-day
change :-(

  https://lkml.kernel.org/r/YS+0eIeAJsRRArk4@hirez.programming.kicks-ass.net

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks
  2021-09-17  7:28     ` Peter Zijlstra
@ 2021-09-17 16:53       ` Sean Christopherson
  2021-09-20 12:05       ` Paolo Bonzini
  1 sibling, 0 replies; 29+ messages in thread
From: Sean Christopherson @ 2021-09-17 16:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Catalin Marinas, Marc Zyngier, Guo Ren, Nick Hu, Greentime Hu,
	Vincent Chen, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Thomas Gleixner, Borislav Petkov, x86, Paolo Bonzini,
	Boris Ostrovsky, Juergen Gross, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, James Morse, Alexandru Elisei, Suzuki K Poulose,
	H. Peter Anvin, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

On Fri, Sep 17, 2021, Peter Zijlstra wrote:
> On Thu, Sep 16, 2021 at 09:37:43PM +0000, Sean Christopherson wrote:
> So I don't mind exporting __static_call_return0, but exporting a raw
> static_call is much like exporting a function pointer :/

Ya, that part is quite gross.

> > The unregister path would also need its own synchronize_rcu().  In general, I
> > don't love duplicating the logic, but it's not the end of the world.
> > 
> > Either way works for me.  Paolo or Peter, do either of you have a preference?
> 
> Can we de-feature kvm as a module and only have this PT functionality
> when built-in? :-)

I agree that many of the for-KVM exports are ugly, especially several of the
perf exports, but I will fight tooth and nail to keep KVM-as-a-module.  It is
invaluable for development and testing, and in the not-too-distant future there
is KVM-maintenance related functionality that we'd like to implement that relies
on KVM being a module.

I would be more than happy to help explore approaches that reduce the for-KVM
exports, but I am strongly opposed to defeaturing KVM-as-a-module.  I have a few
nascent ideas for eliminating a handful of a random exports, but no clever ideas
for eliminating perf's for-KVM exports.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks
  2021-09-17  7:28     ` Peter Zijlstra
  2021-09-17 16:53       ` Sean Christopherson
@ 2021-09-20 12:05       ` Paolo Bonzini
  2021-09-20 12:22         ` Marc Zyngier
  1 sibling, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2021-09-20 12:05 UTC (permalink / raw)
  To: Peter Zijlstra, Sean Christopherson
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Catalin Marinas, Marc Zyngier, Guo Ren, Nick Hu, Greentime Hu,
	Vincent Chen, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Thomas Gleixner, Borislav Petkov, x86, Boris Ostrovsky,
	Juergen Gross, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	James Morse, Alexandru Elisei, Suzuki K Poulose, H. Peter Anvin,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

On 17/09/21 09:28, Peter Zijlstra wrote:
>> In theory, I like the idea of burying intel_pt inside x86 (and even in
>> Intel+VMX code for the most part), but the actual implementation is a
>> bit gross.  Because of the whole "KVM can be a module" thing,
> 
> ARGH!! we should really fix that. I've heard other archs have made much
> better choices here.

I think that's only ARM, and even then it is only because of limitations 
of the hardware which mostly apply only if VHE is not in use.

If anything, it's ARM that should support module build in VHE mode 
(Linux would still need to know whether it will be running at EL1 or 
EL2, but KVM's functionality is as self-contained as on x86 in the VHE 
case).

Paolo


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks
  2021-09-20 12:05       ` Paolo Bonzini
@ 2021-09-20 12:22         ` Marc Zyngier
  2021-09-20 13:18           ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Marc Zyngier @ 2021-09-20 12:22 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Zijlstra, Sean Christopherson, Ingo Molnar,
	Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Catalin Marinas, Guo Ren, Nick Hu, Greentime Hu, Vincent Chen,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Thomas Gleixner,
	Borislav Petkov, x86, Boris Ostrovsky, Juergen Gross,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, James Morse,
	Alexandru Elisei, Suzuki K Poulose, H. Peter Anvin,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

On Mon, 20 Sep 2021 13:05:25 +0100,
Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 17/09/21 09:28, Peter Zijlstra wrote:
> >> In theory, I like the idea of burying intel_pt inside x86 (and even in
> >> Intel+VMX code for the most part), but the actual implementation is a
> >> bit gross.  Because of the whole "KVM can be a module" thing,
> > 
> > ARGH!! we should really fix that. I've heard other archs have made much
> > better choices here.
> 
> I think that's only ARM, and even then it is only because of
> limitations of the hardware which mostly apply only if VHE is not in
> use.
> 
> If anything, it's ARM that should support module build in VHE mode
> (Linux would still need to know whether it will be running at EL1 or
> EL2, but KVM's functionality is as self-contained as on x86 in the VHE
> case).

I don't see this happening anytime soon. At least not before we
declare the arm64 single kernel image policy to be obsolete.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks
  2021-09-20 12:22         ` Marc Zyngier
@ 2021-09-20 13:18           ` Paolo Bonzini
  2021-09-20 13:40             ` Marc Zyngier
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2021-09-20 13:18 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Peter Zijlstra, Sean Christopherson, Ingo Molnar,
	Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Catalin Marinas, Guo Ren, Nick Hu, Greentime Hu, Vincent Chen,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Thomas Gleixner,
	Borislav Petkov, x86, Boris Ostrovsky, Juergen Gross,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, James Morse,
	Alexandru Elisei, Suzuki K Poulose, H. Peter Anvin,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

On 20/09/21 14:22, Marc Zyngier wrote:
>> I think that's only ARM, and even then it is only because of
>> limitations of the hardware which mostly apply only if VHE is not in
>> use.
>>
>> If anything, it's ARM that should support module build in VHE mode
>> (Linux would still need to know whether it will be running at EL1 or
>> EL2, but KVM's functionality is as self-contained as on x86 in the VHE
>> case).
> I don't see this happening anytime soon. At least not before we
> declare the arm64 single kernel image policy to be obsolete.

--verbose please. :)  I am sure you're right, but I don't understand the 
link between the two.

Paolo


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks
  2021-09-20 13:18           ` Paolo Bonzini
@ 2021-09-20 13:40             ` Marc Zyngier
  2021-09-20 18:22               ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Marc Zyngier @ 2021-09-20 13:40 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Zijlstra, Sean Christopherson, Ingo Molnar,
	Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Catalin Marinas, Guo Ren, Nick Hu, Greentime Hu, Vincent Chen,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Thomas Gleixner,
	Borislav Petkov, x86, Boris Ostrovsky, Juergen Gross,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, James Morse,
	Alexandru Elisei, Suzuki K Poulose, H. Peter Anvin,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

On Mon, 20 Sep 2021 14:18:30 +0100,
Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 20/09/21 14:22, Marc Zyngier wrote:
> >> I think that's only ARM, and even then it is only because of
> >> limitations of the hardware which mostly apply only if VHE is not in
> >> use.
> >> 
> >> If anything, it's ARM that should support module build in VHE mode
> >> (Linux would still need to know whether it will be running at EL1 or
> >> EL2, but KVM's functionality is as self-contained as on x86 in the VHE
> >> case).
> > I don't see this happening anytime soon. At least not before we
> > declare the arm64 single kernel image policy to be obsolete.
> 
> --verbose please. :)  I am sure you're right, but I don't understand
> the link between the two.

To start making KVM/arm64 modular, you'd have to build it such as
there is no support for the nVHE hypervisor anymore. Which would mean
two different configs (one that can only work with VHE, and one for
the rest) and contradicts the current single kernel image policy.

It is bad enough that we have to support 3 sets of page sizes.
Doubling the validation space for the sake of being able to unload KVM
seems a dubious prospect.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks
  2021-09-20 13:40             ` Marc Zyngier
@ 2021-09-20 18:22               ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2021-09-20 18:22 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Peter Zijlstra, Sean Christopherson, Ingo Molnar,
	Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Catalin Marinas, Guo Ren, Nick Hu, Greentime Hu, Vincent Chen,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Thomas Gleixner,
	Borislav Petkov, x86, Boris Ostrovsky, Juergen Gross,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, James Morse,
	Alexandru Elisei, Suzuki K Poulose, H. Peter Anvin,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

On 20/09/21 15:40, Marc Zyngier wrote:
>>> At least not before we
>>> declare the arm64 single kernel image policy to be obsolete.
>>
>> --verbose please.:)   I am sure you're right, but I don't understand
>> the link between the two.
>
> To start making KVM/arm64 modular, you'd have to build it such as
> there is no support for the nVHE hypervisor anymore. Which would mean
> two different configs (one that can only work with VHE, and one for
> the rest) and contradicts the current single kernel image policy.

Ah okay, I interpreted the policy as "it's possible to build a single 
kernel image but it would be possible to build an image for a subset of 
the features as well".

In that case you could have one config that can work either with or 
without VHE (and supports y/n) and one config that can only work with 
VHE (and supports y/m/n).  The code to enter VHE EL2 would of course 
always be builtin.

> It is bad enough that we have to support 3 sets of page sizes.
> Doubling the validation space for the sake of being able to unload KVM
> seems a dubious prospect.

It's not even a configuration that matches kconfig very well, since it 
does have a way to build something *only as a module*, but not a way to 
build something only as built-in.

That said, if you had the possibility to unload/reload KVM, you'll 
quickly become unable to live without it. :)

Paolo


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 05/13] perf: Force architectures to opt-in to guest callbacks
  2021-08-28 19:47   ` Peter Zijlstra
  2021-09-16 22:30     ` Sean Christopherson
@ 2021-09-21 16:44     ` Paolo Bonzini
  2021-09-21 21:29       ` Sean Christopherson
  1 sibling, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2021-09-21 16:44 UTC (permalink / raw)
  To: Peter Zijlstra, Sean Christopherson
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Will Deacon, Mark Rutland,
	Catalin Marinas, Marc Zyngier, Guo Ren, Nick Hu, Greentime Hu,
	Vincent Chen, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Thomas Gleixner, Borislav Petkov, x86, Boris Ostrovsky,
	Juergen Gross, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	James Morse, Alexandru Elisei, Suzuki K Poulose, H. Peter Anvin,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

On 28/08/21 21:47, Peter Zijlstra wrote:
>> +config HAVE_GUEST_PERF_EVENTS
>> +	bool
> 	depends on HAVE_KVM

It won't really do anything, since Kconfig does not detects conflicts 
between select' and 'depends on' clauses.

Rather, should the symbol be selected by KVM, instead of ARM64 and X86?

Paolo


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 05/13] perf: Force architectures to opt-in to guest callbacks
  2021-09-21 16:44     ` Paolo Bonzini
@ 2021-09-21 21:29       ` Sean Christopherson
  0 siblings, 0 replies; 29+ messages in thread
From: Sean Christopherson @ 2021-09-21 21:29 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Will Deacon, Mark Rutland, Catalin Marinas, Marc Zyngier,
	Guo Ren, Nick Hu, Greentime Hu, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Thomas Gleixner, Borislav Petkov, x86,
	Boris Ostrovsky, Juergen Gross, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, James Morse, Alexandru Elisei, Suzuki K Poulose,
	H. Peter Anvin, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Stefano Stabellini, linux-perf-users, linux-kernel,
	linux-arm-kernel, kvmarm, linux-csky, linux-riscv, kvm,
	xen-devel, Artem Kashkanov, Like Xu, Zhu Lingshan

On Tue, Sep 21, 2021, Paolo Bonzini wrote:
> On 28/08/21 21:47, Peter Zijlstra wrote:
> > > +config HAVE_GUEST_PERF_EVENTS
> > > +	bool
> > 	depends on HAVE_KVM
> 
> It won't really do anything, since Kconfig does not detects conflicts
> between select' and 'depends on' clauses.

It does throw a WARN, though the build doesn't fail.

WARNING: unmet direct dependencies detected for HAVE_GUEST_PERF_EVENTS
  Depends on [n]: HAVE_KVM [=n] && HAVE_PERF_EVENTS [=y]
  Selected by [y]:
  - ARM64 [=y]

WARNING: unmet direct dependencies detected for HAVE_GUEST_PERF_EVENTS
  Depends on [n]: HAVE_KVM [=n] && HAVE_PERF_EVENTS [=y]
  Selected by [y]:
  - ARM64 [=y]

WARNING: unmet direct dependencies detected for HAVE_GUEST_PERF_EVENTS
  Depends on [n]: HAVE_KVM [=n] && HAVE_PERF_EVENTS [=y]
  Selected by [y]:
  - ARM64 [=y]

> Rather, should the symbol be selected by KVM, instead of ARM64 and X86?

By KVM, you mean KVM in arm64 and x86, correct?  Because HAVE_GUEST_PERF_EVENTS
should not be selected for s390, PPC, or MIPS.

Oh, and Xen also uses the callbacks on x86, which means the HAVE_KVM part is
arguabably wrong, even though it's guaranteed to be true for the XEN_PV case.
I'll drop that dependency and send out a separate series to clean up the arm64
side of HAVE_KVM.

The reason I didn't bury HAVE_GUEST_PERF_EVENTS under KVM (and XEN_PV) is that
there are number of references to the callbacks throught perf and I didn't want
to create #ifdef hell.

But I think I figured out a not-awful solution.  If there are wrappers+stubs for
the guest callback users, then the new Kconfig can be selected on-demand instead
of unconditionally by arm64 and x86.  That has the added bonus of eliminating
the relevant code paths for !KVM (and !XEN_PV on x86), with or without static_call.
It also obviates the needs for __KVM_WANT_GUEST_PERF_EVENTS or whatever I called
that thing.

It more or less requires defining the static calls in generic perf, but I think
that actually ends up being good thing as it consolidates more code without
introducing more #ifdefs.  The diffstats for the static_call() conversions are
also quite nice.

 include/linux/perf_event.h | 28 ++++++----------------------
 kernel/events/core.c       | 15 +++++++++++++++
 2 files changed, 21 insertions(+), 22 deletions(-)

I'll try to get a new version out today or tomorrow.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2021-09-21 21:30 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-28  0:35 [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks Sean Christopherson
2021-08-28  0:35 ` [PATCH v2 01/13] perf: Ensure perf_guest_cbs aren't reloaded between !NULL check and deref Sean Christopherson
2021-08-28 19:44   ` Peter Zijlstra
2021-09-16 21:41     ` Sean Christopherson
2021-08-28  0:35 ` [PATCH v2 02/13] KVM: x86: Register perf callbacks after calling vendor's hardware_setup() Sean Christopherson
2021-08-28  0:35 ` [PATCH v2 03/13] KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest Sean Christopherson
2021-08-28  0:35 ` [PATCH v2 04/13] perf: Stop pretending that perf can handle multiple guest callbacks Sean Christopherson
2021-08-28  0:35 ` [PATCH v2 05/13] perf: Force architectures to opt-in to " Sean Christopherson
2021-08-28 19:47   ` Peter Zijlstra
2021-09-16 22:30     ` Sean Christopherson
2021-09-21 16:44     ` Paolo Bonzini
2021-09-21 21:29       ` Sean Christopherson
2021-08-28  0:35 ` [PATCH v2 06/13] perf/core: Rework guest callbacks to prepare for static_call support Sean Christopherson
2021-08-28  0:35 ` [PATCH v2 07/13] perf/core: Use static_call to optimize perf_guest_info_callbacks Sean Christopherson
2021-08-28  0:35 ` [PATCH v2 08/13] KVM: x86: Drop current_vcpu for kvm_running_vcpu + kvm_arch_vcpu variable Sean Christopherson
2021-08-28  0:35 ` [PATCH v2 09/13] KVM: x86: More precisely identify NMI from guest when handling PMI Sean Christopherson
2021-08-28  0:35 ` [PATCH v2 10/13] KVM: Move x86's perf guest info callbacks to generic KVM Sean Christopherson
2021-08-28  0:35 ` [PATCH v2 11/13] KVM: x86: Move Intel Processor Trace interrupt handler to vmx.c Sean Christopherson
2021-08-28  0:35 ` [PATCH v2 12/13] KVM: arm64: Convert to the generic perf callbacks Sean Christopherson
2021-08-28  0:35 ` [PATCH v2 13/13] KVM: arm64: Drop perf.c and fold its tiny bits of code into arm.c / pmu.c Sean Christopherson
2021-08-28 20:13 ` [PATCH v2 00/13] perf: KVM: Fix, optimize, and clean up callbacks Peter Zijlstra
2021-09-16 21:37   ` Sean Christopherson
2021-09-17  7:28     ` Peter Zijlstra
2021-09-17 16:53       ` Sean Christopherson
2021-09-20 12:05       ` Paolo Bonzini
2021-09-20 12:22         ` Marc Zyngier
2021-09-20 13:18           ` Paolo Bonzini
2021-09-20 13:40             ` Marc Zyngier
2021-09-20 18:22               ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).