All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 RESEND 0/5] x86: The perf/x86 changes to support guest Arch LBR
@ 2021-03-22  6:06 Like Xu
  2021-03-22  6:06 ` [PATCH v4 RESEND 1/5] perf/x86/intel: Fix the comment about guest LBR support on KVM Like Xu
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Like Xu @ 2021-03-22  6:06 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Thomas Gleixner, Borislav Petkov, Kan Liang, x86, linux-kernel,
	Like Xu

Hi Peter,

Please help review these minor perf/x86 changes in this patch set,
and we need some of them to support Guest Architectural LBR in KVM.

If you are interested in the KVM emulation, please check
https://lore.kernel.org/kvm/20210314155225.206661-1-like.xu@linux.intel.com/

Please check more details in each commit and feel free to comment.

Like Xu (5):
  perf/x86/intel: Fix the comment about guest LBR support on KVM
  perf/x86/lbr: Simplify the exposure check for the LBR_INFO registers
  perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region
  perf/x86/lbr: Skip checking for the existence of LBR_TOS for Arch LBR
  perf/x86: Move ARCH_LBR_CTL_MASK definition to include/asm/msr-index.h

 arch/x86/events/core.c           |  8 +++++---
 arch/x86/events/intel/bts.c      |  2 +-
 arch/x86/events/intel/core.c     |  6 +++---
 arch/x86/events/intel/lbr.c      | 28 +++++++++++++++++-----------
 arch/x86/events/perf_event.h     |  8 +++++++-
 arch/x86/include/asm/msr-index.h |  1 +
 6 files changed, 34 insertions(+), 19 deletions(-)

-- 
2.29.2


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 RESEND 1/5] perf/x86/intel: Fix the comment about guest LBR support on KVM
  2021-03-22  6:06 [PATCH v4 RESEND 0/5] x86: The perf/x86 changes to support guest Arch LBR Like Xu
@ 2021-03-22  6:06 ` Like Xu
  2021-03-22  6:06 ` [PATCH v4 RESEND 2/5] perf/x86/lbr: Simplify the exposure check for the LBR_INFO registers Like Xu
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Like Xu @ 2021-03-22  6:06 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Thomas Gleixner, Borislav Petkov, Kan Liang, x86, linux-kernel,
	Like Xu, Andi Kleen

Starting from v5.12, KVM reports guest LBR and extra_regs support
when the host has relevant support. Just delete this part of the
comment and fix a typo incidentally.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/intel/core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 37ce38403cb8..382dd3994463 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5737,8 +5737,7 @@ __init int intel_pmu_init(void)
 
 	/*
 	 * Access LBR MSR may cause #GP under certain circumstances.
-	 * E.g. KVM doesn't support LBR MSR
-	 * Check all LBT MSR here.
+	 * Check all LBR MSR here.
 	 * Disable LBR access if any LBR MSRs can not be accessed.
 	 */
 	if (x86_pmu.lbr_nr && !check_msr(x86_pmu.lbr_tos, 0x3UL))
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 RESEND 2/5] perf/x86/lbr: Simplify the exposure check for the LBR_INFO registers
  2021-03-22  6:06 [PATCH v4 RESEND 0/5] x86: The perf/x86 changes to support guest Arch LBR Like Xu
  2021-03-22  6:06 ` [PATCH v4 RESEND 1/5] perf/x86/intel: Fix the comment about guest LBR support on KVM Like Xu
@ 2021-03-22  6:06 ` Like Xu
  2021-03-23 21:38   ` Peter Zijlstra
  2021-03-22  6:06 ` [PATCH v4 RESEND 3/5] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region Like Xu
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Like Xu @ 2021-03-22  6:06 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Thomas Gleixner, Borislav Petkov, Kan Liang, x86, linux-kernel,
	Like Xu, Andi Kleen

If the platform supports LBR_INFO register, the x86_pmu.lbr_info will
be assigned in intel_pmu_?_lbr_init_?() and it's safe to expose LBR_INFO
in the x86_perf_get_lbr() directly, instead of relying on lbr_format check.

Also Architectural LBR has IA32_LBR_x_INFO instead of LBR_FORMAT_INFO_x
to hold metadata for the operation, including mispredict, TSX, and
elapsed cycle time information.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/intel/lbr.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 21890dacfcfe..355ea70f1879 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -1832,12 +1832,10 @@ void __init intel_pmu_arch_lbr_init(void)
  */
 int x86_perf_get_lbr(struct x86_pmu_lbr *lbr)
 {
-	int lbr_fmt = x86_pmu.intel_cap.lbr_format;
-
 	lbr->nr = x86_pmu.lbr_nr;
 	lbr->from = x86_pmu.lbr_from;
 	lbr->to = x86_pmu.lbr_to;
-	lbr->info = (lbr_fmt == LBR_FORMAT_INFO) ? x86_pmu.lbr_info : 0;
+	lbr->info = x86_pmu.lbr_info;
 
 	return 0;
 }
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 RESEND 3/5] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region
  2021-03-22  6:06 [PATCH v4 RESEND 0/5] x86: The perf/x86 changes to support guest Arch LBR Like Xu
  2021-03-22  6:06 ` [PATCH v4 RESEND 1/5] perf/x86/intel: Fix the comment about guest LBR support on KVM Like Xu
  2021-03-22  6:06 ` [PATCH v4 RESEND 2/5] perf/x86/lbr: Simplify the exposure check for the LBR_INFO registers Like Xu
@ 2021-03-22  6:06 ` Like Xu
  2021-03-23 20:56   ` Liang, Kan
                     ` (2 more replies)
  2021-03-22  6:06 ` [PATCH v4 RESEND 4/5] perf/x86/lbr: Skip checking for the existence of LBR_TOS for Arch LBR Like Xu
  2021-03-22  6:06 ` [PATCH v4 RESEND 5/5] perf/x86: Move ARCH_LBR_CTL_MASK definition to include/asm/msr-index.h Like Xu
  4 siblings, 3 replies; 17+ messages in thread
From: Like Xu @ 2021-03-22  6:06 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Thomas Gleixner, Borislav Petkov, Kan Liang, x86, linux-kernel,
	Like Xu

If the kernel is compiled with the CONFIG_LOCKDEP option, the conditional
might_sleep_if() deep in kmem_cache_alloc() will generate the following
trace, and potentially cause a deadlock when another LBR event is added:

[  243.115549] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:196
[  243.117576] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 839, name: perf
[  243.119326] INFO: lockdep is turned off.
[  243.120249] irq event stamp: 0
[  243.120967] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[  243.122415] hardirqs last disabled at (0): [<ffffffff810d9bf5>] copy_process+0xa45/0x1dc0
[  243.124302] softirqs last  enabled at (0): [<ffffffff810d9bf5>] copy_process+0xa45/0x1dc0
[  243.126255] softirqs last disabled at (0): [<0000000000000000>] 0x0
[  243.128119] CPU: 0 PID: 839 Comm: perf Not tainted 5.11.0-rc4-guest+ #8
[  243.129654] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[  243.131520] Call Trace:
[  243.132112]  dump_stack+0x8d/0xb5
[  243.132896]  ___might_sleep.cold.106+0xb3/0xc3
[  243.133984]  slab_pre_alloc_hook.constprop.85+0x96/0xd0
[  243.135208]  ? intel_pmu_lbr_add+0x152/0x170
[  243.136207]  kmem_cache_alloc+0x36/0x250
[  243.137126]  intel_pmu_lbr_add+0x152/0x170
[  243.138088]  x86_pmu_add+0x83/0xd0
[  243.138889]  ? lock_acquire+0x158/0x350
[  243.139791]  ? lock_acquire+0x158/0x350
[  243.140694]  ? lock_acquire+0x158/0x350
[  243.141625]  ? lock_acquired+0x1e3/0x360
[  243.142544]  ? lock_release+0x1bf/0x340
[  243.143726]  ? trace_hardirqs_on+0x1a/0xd0
[  243.144823]  ? lock_acquired+0x1e3/0x360
[  243.145742]  ? lock_release+0x1bf/0x340
[  243.147107]  ? __slab_free+0x49/0x540
[  243.147966]  ? trace_hardirqs_on+0x1a/0xd0
[  243.148924]  event_sched_in.isra.129+0xf8/0x2a0
[  243.149989]  merge_sched_in+0x261/0x3e0
[  243.150889]  ? trace_hardirqs_on+0x1a/0xd0
[  243.151869]  visit_groups_merge.constprop.135+0x130/0x4a0
[  243.153122]  ? sched_clock_cpu+0xc/0xb0
[  243.154023]  ctx_sched_in+0x101/0x210
[  243.154884]  ctx_resched+0x6f/0xc0
[  243.155686]  perf_event_exec+0x21e/0x2e0
[  243.156641]  begin_new_exec+0x5e5/0xbd0
[  243.157540]  load_elf_binary+0x6af/0x1770
[  243.158478]  ? __kernel_read+0x19d/0x2b0
[  243.159977]  ? lock_acquire+0x158/0x350
[  243.160876]  ? __kernel_read+0x19d/0x2b0
[  243.161796]  bprm_execve+0x3c8/0x840
[  243.162638]  do_execveat_common.isra.38+0x1a5/0x1c0
[  243.163776]  __x64_sys_execve+0x32/0x40
[  243.164676]  do_syscall_64+0x33/0x40
[  243.165514]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  243.166746] RIP: 0033:0x7f6180a26feb
[  243.167590] Code: Unable to access opcode bytes at RIP 0x7f6180a26fc1.
[  243.169097] RSP: 002b:00007ffc6558ce18 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
[  243.170844] RAX: ffffffffffffffda RBX: 00007ffc65592d30 RCX: 00007f6180a26feb
[  243.172514] RDX: 000055657f408dc0 RSI: 00007ffc65592410 RDI: 00007ffc65592d30
[  243.174162] RBP: 00007ffc6558ce80 R08: 00007ffc6558cde0 R09: 0000000000000000
[  243.176042] R10: 0000000000000008 R11: 0000000000000202 R12: 00007ffc65592410
[  243.177696] R13: 000055657f408dc0 R14: 0000000000000001 R15: 00007ffc65592410

One of the solution is to use GFP_ATOMIC, but it will make the code less
reliable under memory pressue. Let's move the memory allocation out of
the sleeping region and put it into the x86_reserve_hardware().

The disadvantage of this fix is that the cpuc->lbr_xsave memory
will be allocated for each cpu like the legacy ds_buffer.

Fixes: c085fb8774 ("perf/x86/intel/lbr: Support XSAVES for arch LBR read")
Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/events/core.c       |  8 +++++---
 arch/x86/events/intel/bts.c  |  2 +-
 arch/x86/events/intel/lbr.c  | 22 ++++++++++++++++------
 arch/x86/events/perf_event.h |  8 +++++++-
 4 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 18df17129695..a4ce669cc78d 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -373,7 +373,7 @@ set_ext_hw_attr(struct hw_perf_event *hwc, struct perf_event *event)
 	return x86_pmu_extra_regs(val, event);
 }
 
-int x86_reserve_hardware(void)
+int x86_reserve_hardware(struct perf_event *event)
 {
 	int err = 0;
 
@@ -382,8 +382,10 @@ int x86_reserve_hardware(void)
 		if (atomic_read(&pmc_refcount) == 0) {
 			if (!reserve_pmc_hardware())
 				err = -EBUSY;
-			else
+			else {
 				reserve_ds_buffers();
+				reserve_lbr_buffers(event);
+			}
 		}
 		if (!err)
 			atomic_inc(&pmc_refcount);
@@ -634,7 +636,7 @@ static int __x86_pmu_event_init(struct perf_event *event)
 	if (!x86_pmu_initialized())
 		return -ENODEV;
 
-	err = x86_reserve_hardware();
+	err = x86_reserve_hardware(event);
 	if (err)
 		return err;
 
diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c
index 731dd8d0dbb1..057bb2f761a9 100644
--- a/arch/x86/events/intel/bts.c
+++ b/arch/x86/events/intel/bts.c
@@ -564,7 +564,7 @@ static int bts_event_init(struct perf_event *event)
 	if (x86_add_exclusive(x86_lbr_exclusive_bts))
 		return -EBUSY;
 
-	ret = x86_reserve_hardware();
+	ret = x86_reserve_hardware(event);
 	if (ret) {
 		x86_del_exclusive(x86_lbr_exclusive_bts);
 		return ret;
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 355ea70f1879..237876733e12 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -658,7 +658,6 @@ static inline bool branch_user_callstack(unsigned br_sel)
 
 void intel_pmu_lbr_add(struct perf_event *event)
 {
-	struct kmem_cache *kmem_cache = event->pmu->task_ctx_cache;
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 
 	if (!x86_pmu.lbr_nr)
@@ -696,11 +695,6 @@ void intel_pmu_lbr_add(struct perf_event *event)
 	perf_sched_cb_inc(event->ctx->pmu);
 	if (!cpuc->lbr_users++ && !event->total_time_running)
 		intel_pmu_lbr_reset();
-
-	if (static_cpu_has(X86_FEATURE_ARCH_LBR) &&
-	    kmem_cache && !cpuc->lbr_xsave &&
-	    (cpuc->lbr_users != cpuc->lbr_pebs_users))
-		cpuc->lbr_xsave = kmem_cache_alloc(kmem_cache, GFP_KERNEL);
 }
 
 void release_lbr_buffers(void)
@@ -721,6 +715,22 @@ void release_lbr_buffers(void)
 	}
 }
 
+void reserve_lbr_buffers(struct perf_event *event)
+{
+	struct kmem_cache *kmem_cache = x86_get_pmu()->task_ctx_cache;
+	struct cpu_hw_events *cpuc;
+	int cpu;
+
+	if (!static_cpu_has(X86_FEATURE_ARCH_LBR))
+		return;
+
+	for_each_possible_cpu(cpu) {
+		cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
+		if (kmem_cache && !cpuc->lbr_xsave && !event->attr.precise_ip)
+			cpuc->lbr_xsave = kmem_cache_alloc(kmem_cache, GFP_KERNEL);
+	}
+}
+
 void intel_pmu_lbr_del(struct perf_event *event)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 53b2b5fc23bc..2fe77d3a98d6 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -968,7 +968,7 @@ int x86_add_exclusive(unsigned int what);
 
 void x86_del_exclusive(unsigned int what);
 
-int x86_reserve_hardware(void);
+int x86_reserve_hardware(struct perf_event *event);
 
 void x86_release_hardware(void);
 
@@ -1135,6 +1135,8 @@ void reserve_ds_buffers(void);
 
 void release_lbr_buffers(void);
 
+void reserve_lbr_buffers(struct perf_event *event);
+
 extern struct event_constraint bts_constraint;
 extern struct event_constraint vlbr_constraint;
 
@@ -1282,6 +1284,10 @@ static inline void release_lbr_buffers(void)
 {
 }
 
+static inline void reserve_lbr_buffers(struct perf_event *event)
+{
+}
+
 static inline int intel_pmu_init(void)
 {
 	return 0;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 RESEND 4/5] perf/x86/lbr: Skip checking for the existence of LBR_TOS for Arch LBR
  2021-03-22  6:06 [PATCH v4 RESEND 0/5] x86: The perf/x86 changes to support guest Arch LBR Like Xu
                   ` (2 preceding siblings ...)
  2021-03-22  6:06 ` [PATCH v4 RESEND 3/5] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region Like Xu
@ 2021-03-22  6:06 ` Like Xu
  2021-03-23 21:49   ` Peter Zijlstra
  2021-03-22  6:06 ` [PATCH v4 RESEND 5/5] perf/x86: Move ARCH_LBR_CTL_MASK definition to include/asm/msr-index.h Like Xu
  4 siblings, 1 reply; 17+ messages in thread
From: Like Xu @ 2021-03-22  6:06 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Thomas Gleixner, Borislav Petkov, Kan Liang, x86, linux-kernel,
	Like Xu, Andi Kleen

The Architecture LBR does not have MSR_LBR_TOS (0x000001c9). KVM will
generate #GP for this MSR access, thereby preventing the initialization
of the guest LBR.

Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/intel/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 382dd3994463..7f6d748421f2 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5740,7 +5740,8 @@ __init int intel_pmu_init(void)
 	 * Check all LBR MSR here.
 	 * Disable LBR access if any LBR MSRs can not be accessed.
 	 */
-	if (x86_pmu.lbr_nr && !check_msr(x86_pmu.lbr_tos, 0x3UL))
+	if (x86_pmu.lbr_nr && !boot_cpu_has(X86_FEATURE_ARCH_LBR) &&
+	    !check_msr(x86_pmu.lbr_tos, 0x3UL))
 		x86_pmu.lbr_nr = 0;
 	for (i = 0; i < x86_pmu.lbr_nr; i++) {
 		if (!(check_msr(x86_pmu.lbr_from + i, 0xffffUL) &&
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 RESEND 5/5] perf/x86: Move ARCH_LBR_CTL_MASK definition to include/asm/msr-index.h
  2021-03-22  6:06 [PATCH v4 RESEND 0/5] x86: The perf/x86 changes to support guest Arch LBR Like Xu
                   ` (3 preceding siblings ...)
  2021-03-22  6:06 ` [PATCH v4 RESEND 4/5] perf/x86/lbr: Skip checking for the existence of LBR_TOS for Arch LBR Like Xu
@ 2021-03-22  6:06 ` Like Xu
  4 siblings, 0 replies; 17+ messages in thread
From: Like Xu @ 2021-03-22  6:06 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Thomas Gleixner, Borislav Petkov, Kan Liang, x86, linux-kernel,
	Like Xu

The ARCH_LBR_CTL_MASK will be reused for Arch LBR emulation in the KVM.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/events/intel/lbr.c      | 2 --
 arch/x86/include/asm/msr-index.h | 1 +
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 237876733e12..f60339ff0c13 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -168,8 +168,6 @@ enum {
 	 ARCH_LBR_RETURN		|\
 	 ARCH_LBR_OTHER_BRANCH)
 
-#define ARCH_LBR_CTL_MASK			0x7f000e
-
 static void intel_pmu_lbr_filter(struct cpu_hw_events *cpuc);
 
 static __always_inline bool is_lbr_call_stack_bit_set(u64 config)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 546d6ecf0a35..8f3375961efc 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -169,6 +169,7 @@
 #define LBR_INFO_BR_TYPE		(0xfull << LBR_INFO_BR_TYPE_OFFSET)
 
 #define MSR_ARCH_LBR_CTL		0x000014ce
+#define ARCH_LBR_CTL_MASK		0x7f000e
 #define ARCH_LBR_CTL_LBREN		BIT(0)
 #define ARCH_LBR_CTL_CPL_OFFSET		1
 #define ARCH_LBR_CTL_CPL		(0x3ull << ARCH_LBR_CTL_CPL_OFFSET)
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 RESEND 3/5] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region
  2021-03-22  6:06 ` [PATCH v4 RESEND 3/5] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region Like Xu
@ 2021-03-23 20:56   ` Liang, Kan
  2021-03-23 21:41   ` Peter Zijlstra
  2021-03-24  1:32   ` Namhyung Kim
  2 siblings, 0 replies; 17+ messages in thread
From: Liang, Kan @ 2021-03-23 20:56 UTC (permalink / raw)
  To: Like Xu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Thomas Gleixner, Borislav Petkov, x86, linux-kernel



On 3/22/2021 2:06 AM, Like Xu wrote:
> If the kernel is compiled with the CONFIG_LOCKDEP option, the conditional
> might_sleep_if() deep in kmem_cache_alloc() will generate the following
> trace, and potentially cause a deadlock when another LBR event is added:
> 
> [  243.115549] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:196
> [  243.117576] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 839, name: perf
> [  243.119326] INFO: lockdep is turned off.
> [  243.120249] irq event stamp: 0
> [  243.120967] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> [  243.122415] hardirqs last disabled at (0): [<ffffffff810d9bf5>] copy_process+0xa45/0x1dc0
> [  243.124302] softirqs last  enabled at (0): [<ffffffff810d9bf5>] copy_process+0xa45/0x1dc0
> [  243.126255] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [  243.128119] CPU: 0 PID: 839 Comm: perf Not tainted 5.11.0-rc4-guest+ #8
> [  243.129654] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> [  243.131520] Call Trace:
> [  243.132112]  dump_stack+0x8d/0xb5
> [  243.132896]  ___might_sleep.cold.106+0xb3/0xc3
> [  243.133984]  slab_pre_alloc_hook.constprop.85+0x96/0xd0
> [  243.135208]  ? intel_pmu_lbr_add+0x152/0x170
> [  243.136207]  kmem_cache_alloc+0x36/0x250
> [  243.137126]  intel_pmu_lbr_add+0x152/0x170
> [  243.138088]  x86_pmu_add+0x83/0xd0
> [  243.138889]  ? lock_acquire+0x158/0x350
> [  243.139791]  ? lock_acquire+0x158/0x350
> [  243.140694]  ? lock_acquire+0x158/0x350
> [  243.141625]  ? lock_acquired+0x1e3/0x360
> [  243.142544]  ? lock_release+0x1bf/0x340
> [  243.143726]  ? trace_hardirqs_on+0x1a/0xd0
> [  243.144823]  ? lock_acquired+0x1e3/0x360
> [  243.145742]  ? lock_release+0x1bf/0x340
> [  243.147107]  ? __slab_free+0x49/0x540
> [  243.147966]  ? trace_hardirqs_on+0x1a/0xd0
> [  243.148924]  event_sched_in.isra.129+0xf8/0x2a0
> [  243.149989]  merge_sched_in+0x261/0x3e0
> [  243.150889]  ? trace_hardirqs_on+0x1a/0xd0
> [  243.151869]  visit_groups_merge.constprop.135+0x130/0x4a0
> [  243.153122]  ? sched_clock_cpu+0xc/0xb0
> [  243.154023]  ctx_sched_in+0x101/0x210
> [  243.154884]  ctx_resched+0x6f/0xc0
> [  243.155686]  perf_event_exec+0x21e/0x2e0
> [  243.156641]  begin_new_exec+0x5e5/0xbd0
> [  243.157540]  load_elf_binary+0x6af/0x1770
> [  243.158478]  ? __kernel_read+0x19d/0x2b0
> [  243.159977]  ? lock_acquire+0x158/0x350
> [  243.160876]  ? __kernel_read+0x19d/0x2b0
> [  243.161796]  bprm_execve+0x3c8/0x840
> [  243.162638]  do_execveat_common.isra.38+0x1a5/0x1c0
> [  243.163776]  __x64_sys_execve+0x32/0x40
> [  243.164676]  do_syscall_64+0x33/0x40
> [  243.165514]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  243.166746] RIP: 0033:0x7f6180a26feb
> [  243.167590] Code: Unable to access opcode bytes at RIP 0x7f6180a26fc1.
> [  243.169097] RSP: 002b:00007ffc6558ce18 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
> [  243.170844] RAX: ffffffffffffffda RBX: 00007ffc65592d30 RCX: 00007f6180a26feb
> [  243.172514] RDX: 000055657f408dc0 RSI: 00007ffc65592410 RDI: 00007ffc65592d30
> [  243.174162] RBP: 00007ffc6558ce80 R08: 00007ffc6558cde0 R09: 0000000000000000
> [  243.176042] R10: 0000000000000008 R11: 0000000000000202 R12: 00007ffc65592410
> [  243.177696] R13: 000055657f408dc0 R14: 0000000000000001 R15: 00007ffc65592410
> 
> One of the solution is to use GFP_ATOMIC, but it will make the code less
> reliable under memory pressue. Let's move the memory allocation out of
> the sleeping region and put it into the x86_reserve_hardware().
> 
> The disadvantage of this fix is that the cpuc->lbr_xsave memory
> will be allocated for each cpu like the legacy ds_buffer.
> 
> Fixes: c085fb8774 ("perf/x86/intel/lbr: Support XSAVES for arch LBR read")
> Suggested-by: Kan Liang <kan.liang@linux.intel.com>
> Signed-off-by: Like Xu <like.xu@linux.intel.com>

I observed the same issue when I did LBR test on an ADL machine. This 
patch fixes the issue.

Tested-by: Kan Liang <kan.liang@linux.intel.com>

Thanks,
Kan
> ---
>   arch/x86/events/core.c       |  8 +++++---
>   arch/x86/events/intel/bts.c  |  2 +-
>   arch/x86/events/intel/lbr.c  | 22 ++++++++++++++++------
>   arch/x86/events/perf_event.h |  8 +++++++-
>   4 files changed, 29 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 18df17129695..a4ce669cc78d 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -373,7 +373,7 @@ set_ext_hw_attr(struct hw_perf_event *hwc, struct perf_event *event)
>   	return x86_pmu_extra_regs(val, event);
>   }
>   
> -int x86_reserve_hardware(void)
> +int x86_reserve_hardware(struct perf_event *event)
>   {
>   	int err = 0;
>   
> @@ -382,8 +382,10 @@ int x86_reserve_hardware(void)
>   		if (atomic_read(&pmc_refcount) == 0) {
>   			if (!reserve_pmc_hardware())
>   				err = -EBUSY;
> -			else
> +			else {
>   				reserve_ds_buffers();
> +				reserve_lbr_buffers(event);
> +			}
>   		}
>   		if (!err)
>   			atomic_inc(&pmc_refcount);
> @@ -634,7 +636,7 @@ static int __x86_pmu_event_init(struct perf_event *event)
>   	if (!x86_pmu_initialized())
>   		return -ENODEV;
>   
> -	err = x86_reserve_hardware();
> +	err = x86_reserve_hardware(event);
>   	if (err)
>   		return err;
>   
> diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c
> index 731dd8d0dbb1..057bb2f761a9 100644
> --- a/arch/x86/events/intel/bts.c
> +++ b/arch/x86/events/intel/bts.c
> @@ -564,7 +564,7 @@ static int bts_event_init(struct perf_event *event)
>   	if (x86_add_exclusive(x86_lbr_exclusive_bts))
>   		return -EBUSY;
>   
> -	ret = x86_reserve_hardware();
> +	ret = x86_reserve_hardware(event);
>   	if (ret) {
>   		x86_del_exclusive(x86_lbr_exclusive_bts);
>   		return ret;
> diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
> index 355ea70f1879..237876733e12 100644
> --- a/arch/x86/events/intel/lbr.c
> +++ b/arch/x86/events/intel/lbr.c
> @@ -658,7 +658,6 @@ static inline bool branch_user_callstack(unsigned br_sel)
>   
>   void intel_pmu_lbr_add(struct perf_event *event)
>   {
> -	struct kmem_cache *kmem_cache = event->pmu->task_ctx_cache;
>   	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>   
>   	if (!x86_pmu.lbr_nr)
> @@ -696,11 +695,6 @@ void intel_pmu_lbr_add(struct perf_event *event)
>   	perf_sched_cb_inc(event->ctx->pmu);
>   	if (!cpuc->lbr_users++ && !event->total_time_running)
>   		intel_pmu_lbr_reset();
> -
> -	if (static_cpu_has(X86_FEATURE_ARCH_LBR) &&
> -	    kmem_cache && !cpuc->lbr_xsave &&
> -	    (cpuc->lbr_users != cpuc->lbr_pebs_users))
> -		cpuc->lbr_xsave = kmem_cache_alloc(kmem_cache, GFP_KERNEL);
>   }
>   
>   void release_lbr_buffers(void)
> @@ -721,6 +715,22 @@ void release_lbr_buffers(void)
>   	}
>   }
>   
> +void reserve_lbr_buffers(struct perf_event *event)
> +{
> +	struct kmem_cache *kmem_cache = x86_get_pmu()->task_ctx_cache;
> +	struct cpu_hw_events *cpuc;
> +	int cpu;
> +
> +	if (!static_cpu_has(X86_FEATURE_ARCH_LBR))
> +		return;
> +
> +	for_each_possible_cpu(cpu) {
> +		cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
> +		if (kmem_cache && !cpuc->lbr_xsave && !event->attr.precise_ip)
> +			cpuc->lbr_xsave = kmem_cache_alloc(kmem_cache, GFP_KERNEL);
> +	}
> +}
> +
>   void intel_pmu_lbr_del(struct perf_event *event)
>   {
>   	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> index 53b2b5fc23bc..2fe77d3a98d6 100644
> --- a/arch/x86/events/perf_event.h
> +++ b/arch/x86/events/perf_event.h
> @@ -968,7 +968,7 @@ int x86_add_exclusive(unsigned int what);
>   
>   void x86_del_exclusive(unsigned int what);
>   
> -int x86_reserve_hardware(void);
> +int x86_reserve_hardware(struct perf_event *event);
>   
>   void x86_release_hardware(void);
>   
> @@ -1135,6 +1135,8 @@ void reserve_ds_buffers(void);
>   
>   void release_lbr_buffers(void);
>   
> +void reserve_lbr_buffers(struct perf_event *event);
> +
>   extern struct event_constraint bts_constraint;
>   extern struct event_constraint vlbr_constraint;
>   
> @@ -1282,6 +1284,10 @@ static inline void release_lbr_buffers(void)
>   {
>   }
>   
> +static inline void reserve_lbr_buffers(struct perf_event *event)
> +{
> +}
> +
>   static inline int intel_pmu_init(void)
>   {
>   	return 0;
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 RESEND 2/5] perf/x86/lbr: Simplify the exposure check for the LBR_INFO registers
  2021-03-22  6:06 ` [PATCH v4 RESEND 2/5] perf/x86/lbr: Simplify the exposure check for the LBR_INFO registers Like Xu
@ 2021-03-23 21:38   ` Peter Zijlstra
  2021-03-24  2:02     ` Like Xu
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2021-03-23 21:38 UTC (permalink / raw)
  To: Like Xu
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Thomas Gleixner,
	Borislav Petkov, Kan Liang, x86, linux-kernel, Andi Kleen

On Mon, Mar 22, 2021 at 02:06:32PM +0800, Like Xu wrote:
> If the platform supports LBR_INFO register, the x86_pmu.lbr_info will
> be assigned in intel_pmu_?_lbr_init_?() and it's safe to expose LBR_INFO

You mean: intel_pmu_lbr_*init*(). '?' is a single character glob and
you've got too many '_'s.

> in the x86_perf_get_lbr() directly, instead of relying on lbr_format check.

But, afaict, not every model calls one of those. CORE_YONAH for example
doesn't.

> Also Architectural LBR has IA32_LBR_x_INFO instead of LBR_FORMAT_INFO_x
> to hold metadata for the operation, including mispredict, TSX, and
> elapsed cycle time information.

Relevance?

Wouldn't it be much simpler to simple say something like:

  "x86_pmu.lbr_info is 0 unless explicitly initialized, so there's no
  point checking lbr_fmt"

> Signed-off-by: Like Xu <like.xu@linux.intel.com>
> Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> ---
>  arch/x86/events/intel/lbr.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
> index 21890dacfcfe..355ea70f1879 100644
> --- a/arch/x86/events/intel/lbr.c
> +++ b/arch/x86/events/intel/lbr.c
> @@ -1832,12 +1832,10 @@ void __init intel_pmu_arch_lbr_init(void)
>   */
>  int x86_perf_get_lbr(struct x86_pmu_lbr *lbr)
>  {
> -	int lbr_fmt = x86_pmu.intel_cap.lbr_format;
> -
>  	lbr->nr = x86_pmu.lbr_nr;
>  	lbr->from = x86_pmu.lbr_from;
>  	lbr->to = x86_pmu.lbr_to;
> -	lbr->info = (lbr_fmt == LBR_FORMAT_INFO) ? x86_pmu.lbr_info : 0;
> +	lbr->info = x86_pmu.lbr_info;
>  
>  	return 0;
>  }
> -- 
> 2.29.2
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 RESEND 3/5] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region
  2021-03-22  6:06 ` [PATCH v4 RESEND 3/5] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region Like Xu
  2021-03-23 20:56   ` Liang, Kan
@ 2021-03-23 21:41   ` Peter Zijlstra
  2021-03-23 23:03     ` Liang, Kan
  2021-03-24  1:32   ` Namhyung Kim
  2 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2021-03-23 21:41 UTC (permalink / raw)
  To: Like Xu
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Thomas Gleixner,
	Borislav Petkov, Kan Liang, x86, linux-kernel

On Mon, Mar 22, 2021 at 02:06:33PM +0800, Like Xu wrote:
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 18df17129695..a4ce669cc78d 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -373,7 +373,7 @@ set_ext_hw_attr(struct hw_perf_event *hwc, struct perf_event *event)
>  	return x86_pmu_extra_regs(val, event);
>  }
>  
> -int x86_reserve_hardware(void)
> +int x86_reserve_hardware(struct perf_event *event)
>  {
>  	int err = 0;
>  
> @@ -382,8 +382,10 @@ int x86_reserve_hardware(void)
>  		if (atomic_read(&pmc_refcount) == 0) {
>  			if (!reserve_pmc_hardware())
>  				err = -EBUSY;
> -			else
> +			else {
>  				reserve_ds_buffers();
> +				reserve_lbr_buffers(event);
> +			}
>  		}
>  		if (!err)
>  			atomic_inc(&pmc_refcount);

This makes absolutely no sense what so ever. This is only executed for
the first event that gets here.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 RESEND 4/5] perf/x86/lbr: Skip checking for the existence of LBR_TOS for Arch LBR
  2021-03-22  6:06 ` [PATCH v4 RESEND 4/5] perf/x86/lbr: Skip checking for the existence of LBR_TOS for Arch LBR Like Xu
@ 2021-03-23 21:49   ` Peter Zijlstra
  2021-03-24  3:32     ` Like Xu
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2021-03-23 21:49 UTC (permalink / raw)
  To: Like Xu
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Thomas Gleixner,
	Borislav Petkov, Kan Liang, x86, linux-kernel, Andi Kleen

On Mon, Mar 22, 2021 at 02:06:34PM +0800, Like Xu wrote:
> The Architecture LBR does not have MSR_LBR_TOS (0x000001c9). KVM will
> generate #GP for this MSR access, thereby preventing the initialization
> of the guest LBR.
> 
> Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
> Signed-off-by: Like Xu <like.xu@linux.intel.com>
> Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> ---
>  arch/x86/events/intel/core.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 382dd3994463..7f6d748421f2 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -5740,7 +5740,8 @@ __init int intel_pmu_init(void)
>  	 * Check all LBR MSR here.
>  	 * Disable LBR access if any LBR MSRs can not be accessed.
>  	 */
> -	if (x86_pmu.lbr_nr && !check_msr(x86_pmu.lbr_tos, 0x3UL))
> +	if (x86_pmu.lbr_nr && !boot_cpu_has(X86_FEATURE_ARCH_LBR) &&
> +	    !check_msr(x86_pmu.lbr_tos, 0x3UL))
>  		x86_pmu.lbr_nr = 0;

But when ARCH_LBR we don't set lbr_tos, so we check MSR 0x000, not 0x1c9.

Do we want check_msr() to ignore msr==0 ?
Additionally, do we want a check for lbr_info ?

>  	for (i = 0; i < x86_pmu.lbr_nr; i++) {
>  		if (!(check_msr(x86_pmu.lbr_from + i, 0xffffUL) &&
> -- 
> 2.29.2
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 RESEND 3/5] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region
  2021-03-23 21:41   ` Peter Zijlstra
@ 2021-03-23 23:03     ` Liang, Kan
  0 siblings, 0 replies; 17+ messages in thread
From: Liang, Kan @ 2021-03-23 23:03 UTC (permalink / raw)
  To: Peter Zijlstra, Like Xu
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Thomas Gleixner,
	Borislav Petkov, x86, linux-kernel



On 3/23/2021 5:41 PM, Peter Zijlstra wrote:
> On Mon, Mar 22, 2021 at 02:06:33PM +0800, Like Xu wrote:
>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>> index 18df17129695..a4ce669cc78d 100644
>> --- a/arch/x86/events/core.c
>> +++ b/arch/x86/events/core.c
>> @@ -373,7 +373,7 @@ set_ext_hw_attr(struct hw_perf_event *hwc, struct perf_event *event)
>>   	return x86_pmu_extra_regs(val, event);
>>   }
>>   
>> -int x86_reserve_hardware(void)
>> +int x86_reserve_hardware(struct perf_event *event)
>>   {
>>   	int err = 0;
>>   
>> @@ -382,8 +382,10 @@ int x86_reserve_hardware(void)
>>   		if (atomic_read(&pmc_refcount) == 0) {
>>   			if (!reserve_pmc_hardware())
>>   				err = -EBUSY;
>> -			else
>> +			else {
>>   				reserve_ds_buffers();
>> +				reserve_lbr_buffers(event);
>> +			}
>>   		}
>>   		if (!err)
>>   			atomic_inc(&pmc_refcount);
> 
> This makes absolutely no sense what so ever. This is only executed for
> the first event that gets here.
> 

The LBR xsave buffer is a per-CPU buffer, not a per-event buffer. I 
think we only need to allocate the buffer once when initializing the 
first event.

Thanks,
Kan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 RESEND 3/5] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region
  2021-03-22  6:06 ` [PATCH v4 RESEND 3/5] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region Like Xu
  2021-03-23 20:56   ` Liang, Kan
  2021-03-23 21:41   ` Peter Zijlstra
@ 2021-03-24  1:32   ` Namhyung Kim
  2021-03-24  3:46     ` Like Xu
  2 siblings, 1 reply; 17+ messages in thread
From: Namhyung Kim @ 2021-03-24  1:32 UTC (permalink / raw)
  To: Like Xu
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Thomas Gleixner,
	Borislav Petkov, Kan Liang, x86, linux-kernel

Hello,

On Mon, Mar 22, 2021 at 3:14 PM Like Xu <like.xu@linux.intel.com> wrote:
> +void reserve_lbr_buffers(struct perf_event *event)
> +{
> +       struct kmem_cache *kmem_cache = x86_get_pmu()->task_ctx_cache;
> +       struct cpu_hw_events *cpuc;
> +       int cpu;
> +
> +       if (!static_cpu_has(X86_FEATURE_ARCH_LBR))
> +               return;
> +
> +       for_each_possible_cpu(cpu) {
> +               cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
> +               if (kmem_cache && !cpuc->lbr_xsave && !event->attr.precise_ip)
> +                       cpuc->lbr_xsave = kmem_cache_alloc(kmem_cache, GFP_KERNEL);
> +       }
> +}

I think we should use kmem_cache_alloc_node().

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 RESEND 2/5] perf/x86/lbr: Simplify the exposure check for the LBR_INFO registers
  2021-03-23 21:38   ` Peter Zijlstra
@ 2021-03-24  2:02     ` Like Xu
  0 siblings, 0 replies; 17+ messages in thread
From: Like Xu @ 2021-03-24  2:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Thomas Gleixner,
	Borislav Petkov, Kan Liang, x86, linux-kernel, Andi Kleen

On 2021/3/24 5:38, Peter Zijlstra wrote:
> On Mon, Mar 22, 2021 at 02:06:32PM +0800, Like Xu wrote:
>> If the platform supports LBR_INFO register, the x86_pmu.lbr_info will
>> be assigned in intel_pmu_?_lbr_init_?() and it's safe to expose LBR_INFO
> 
> You mean: intel_pmu_lbr_*init*(). '?' is a single character glob and
> you've got too many '_'s.
> 
>> in the x86_perf_get_lbr() directly, instead of relying on lbr_format check.
> 
> But, afaict, not every model calls one of those. CORE_YONAH for example
> doesn't.
> 
>> Also Architectural LBR has IA32_LBR_x_INFO instead of LBR_FORMAT_INFO_x
>> to hold metadata for the operation, including mispredict, TSX, and
>> elapsed cycle time information.
> 
> Relevance?
> 
> Wouldn't it be much simpler to simple say something like:
> 
>    "x86_pmu.lbr_info is 0 unless explicitly initialized, so there's no
>    point checking lbr_fmt"

Yes, it is simpler and I will apply it in the next version.

> 
>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>> Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
>> Reviewed-by: Andi Kleen <ak@linux.intel.com>
>> ---
>>   arch/x86/events/intel/lbr.c | 4 +---
>>   1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
>> index 21890dacfcfe..355ea70f1879 100644
>> --- a/arch/x86/events/intel/lbr.c
>> +++ b/arch/x86/events/intel/lbr.c
>> @@ -1832,12 +1832,10 @@ void __init intel_pmu_arch_lbr_init(void)
>>    */
>>   int x86_perf_get_lbr(struct x86_pmu_lbr *lbr)
>>   {
>> -	int lbr_fmt = x86_pmu.intel_cap.lbr_format;
>> -
>>   	lbr->nr = x86_pmu.lbr_nr;
>>   	lbr->from = x86_pmu.lbr_from;
>>   	lbr->to = x86_pmu.lbr_to;
>> -	lbr->info = (lbr_fmt == LBR_FORMAT_INFO) ? x86_pmu.lbr_info : 0;
>> +	lbr->info = x86_pmu.lbr_info;
>>   
>>   	return 0;
>>   }
>> -- 
>> 2.29.2
>>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 RESEND 4/5] perf/x86/lbr: Skip checking for the existence of LBR_TOS for Arch LBR
  2021-03-23 21:49   ` Peter Zijlstra
@ 2021-03-24  3:32     ` Like Xu
  0 siblings, 0 replies; 17+ messages in thread
From: Like Xu @ 2021-03-24  3:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Thomas Gleixner,
	Borislav Petkov, Kan Liang, x86, linux-kernel, Andi Kleen

On 2021/3/24 5:49, Peter Zijlstra wrote:
> On Mon, Mar 22, 2021 at 02:06:34PM +0800, Like Xu wrote:
>> The Architecture LBR does not have MSR_LBR_TOS (0x000001c9). KVM will
>> generate #GP for this MSR access, thereby preventing the initialization
>> of the guest LBR.
>>
>> Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>> Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
>> Reviewed-by: Andi Kleen <ak@linux.intel.com>
>> ---
>>   arch/x86/events/intel/core.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index 382dd3994463..7f6d748421f2 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -5740,7 +5740,8 @@ __init int intel_pmu_init(void)
>>   	 * Check all LBR MSR here.
>>   	 * Disable LBR access if any LBR MSRs can not be accessed.
>>   	 */
>> -	if (x86_pmu.lbr_nr && !check_msr(x86_pmu.lbr_tos, 0x3UL))
>> +	if (x86_pmu.lbr_nr && !boot_cpu_has(X86_FEATURE_ARCH_LBR) &&
>> +	    !check_msr(x86_pmu.lbr_tos, 0x3UL))
>>   		x86_pmu.lbr_nr = 0;
> 
> But when ARCH_LBR we don't set lbr_tos, so we check MSR 0x000, not 0x1c9.

It's true.

> 
> Do we want check_msr() to ignore msr==0 ?

Considering another target of check_msr() is for uncore msrs,
how about this change:

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 759226919a36..06fa31a01a5b 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4704,10 +4704,10 @@ static bool check_msr(unsigned long msr, u64 mask)
         u64 val_old, val_new, val_tmp;

         /*
-        * Disable the check for real HW, so we don't
+        * Disable the check for real HW or non-sense msr, so we don't
          * mess with potentionaly enabled registers:
          */
-       if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
+       if (!boot_cpu_has(X86_FEATURE_HYPERVISOR) || !msr)
                 return true;

         /*


> Additionally, do we want a check for lbr_info ?

I am not inclined to do this because we may have
virtualized model-specific guest LBR support
which may break the cpu_model assumption.

> 
>>   	for (i = 0; i < x86_pmu.lbr_nr; i++) {
>>   		if (!(check_msr(x86_pmu.lbr_from + i, 0xffffUL) &&
>> -- 
>> 2.29.2
>>


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 RESEND 3/5] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region
  2021-03-24  1:32   ` Namhyung Kim
@ 2021-03-24  3:46     ` Like Xu
  2021-03-24  4:04       ` Namhyung Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Like Xu @ 2021-03-24  3:46 UTC (permalink / raw)
  To: Namhyung Kim, Kan Liang, Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Thomas Gleixner, Borislav Petkov,
	x86, linux-kernel

Hi Namhyung,

On 2021/3/24 9:32, Namhyung Kim wrote:
> Hello,
> 
> On Mon, Mar 22, 2021 at 3:14 PM Like Xu <like.xu@linux.intel.com> wrote:
>> +void reserve_lbr_buffers(struct perf_event *event)
>> +{
>> +       struct kmem_cache *kmem_cache = x86_get_pmu()->task_ctx_cache;
>> +       struct cpu_hw_events *cpuc;
>> +       int cpu;
>> +
>> +       if (!static_cpu_has(X86_FEATURE_ARCH_LBR))
>> +               return;
>> +
>> +       for_each_possible_cpu(cpu) {
>> +               cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
>> +               if (kmem_cache && !cpuc->lbr_xsave && !event->attr.precise_ip)
>> +                       cpuc->lbr_xsave = kmem_cache_alloc(kmem_cache, GFP_KERNEL);
>> +       }
>> +}
> 
> I think we should use kmem_cache_alloc_node().

"kmem_cache_alloc_node - Allocate an object on the specified node"

The reserve_lbr_buffers() is called in __x86_pmu_event_init().
When the LBR perf_event is scheduled to another node, it seems
that we will not call init() and allocate again.

Do you mean use kmem_cache_alloc_node() for each numa_nodes_parsed ?

> 
> Thanks,
> Namhyung
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 RESEND 3/5] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region
  2021-03-24  3:46     ` Like Xu
@ 2021-03-24  4:04       ` Namhyung Kim
  2021-03-24  5:42         ` Like Xu
  0 siblings, 1 reply; 17+ messages in thread
From: Namhyung Kim @ 2021-03-24  4:04 UTC (permalink / raw)
  To: Like Xu
  Cc: Kan Liang, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Thomas Gleixner,
	Borislav Petkov, x86, linux-kernel

On Wed, Mar 24, 2021 at 12:47 PM Like Xu <like.xu@linux.intel.com> wrote:
>
> Hi Namhyung,
>
> On 2021/3/24 9:32, Namhyung Kim wrote:
> > Hello,
> >
> > On Mon, Mar 22, 2021 at 3:14 PM Like Xu <like.xu@linux.intel.com> wrote:
> >> +void reserve_lbr_buffers(struct perf_event *event)
> >> +{
> >> +       struct kmem_cache *kmem_cache = x86_get_pmu()->task_ctx_cache;
> >> +       struct cpu_hw_events *cpuc;
> >> +       int cpu;
> >> +
> >> +       if (!static_cpu_has(X86_FEATURE_ARCH_LBR))
> >> +               return;
> >> +
> >> +       for_each_possible_cpu(cpu) {
> >> +               cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
> >> +               if (kmem_cache && !cpuc->lbr_xsave && !event->attr.precise_ip)
> >> +                       cpuc->lbr_xsave = kmem_cache_alloc(kmem_cache, GFP_KERNEL);
> >> +       }
> >> +}
> >
> > I think we should use kmem_cache_alloc_node().
>
> "kmem_cache_alloc_node - Allocate an object on the specified node"
>
> The reserve_lbr_buffers() is called in __x86_pmu_event_init().
> When the LBR perf_event is scheduled to another node, it seems
> that we will not call init() and allocate again.
>
> Do you mean use kmem_cache_alloc_node() for each numa_nodes_parsed ?

I assume cpuc->lbr_xsave will be accessed for that cpu only.
Then it needs to allocate it in the node that cpu belongs to.
Something like below..

    cpuc->lbr_xsave = kmem_cache_alloc_node(kmem_cache, GFP_KERNEL,
                                               cpu_to_node(cpu));

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 RESEND 3/5] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region
  2021-03-24  4:04       ` Namhyung Kim
@ 2021-03-24  5:42         ` Like Xu
  0 siblings, 0 replies; 17+ messages in thread
From: Like Xu @ 2021-03-24  5:42 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Kan Liang, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Thomas Gleixner,
	Borislav Petkov, x86, linux-kernel

On 2021/3/24 12:04, Namhyung Kim wrote:
> On Wed, Mar 24, 2021 at 12:47 PM Like Xu <like.xu@linux.intel.com> wrote:
>>
>> Hi Namhyung,
>>
>> On 2021/3/24 9:32, Namhyung Kim wrote:
>>> Hello,
>>>
>>> On Mon, Mar 22, 2021 at 3:14 PM Like Xu <like.xu@linux.intel.com> wrote:
>>>> +void reserve_lbr_buffers(struct perf_event *event)
>>>> +{
>>>> +       struct kmem_cache *kmem_cache = x86_get_pmu()->task_ctx_cache;
>>>> +       struct cpu_hw_events *cpuc;
>>>> +       int cpu;
>>>> +
>>>> +       if (!static_cpu_has(X86_FEATURE_ARCH_LBR))
>>>> +               return;
>>>> +
>>>> +       for_each_possible_cpu(cpu) {
>>>> +               cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
>>>> +               if (kmem_cache && !cpuc->lbr_xsave && !event->attr.precise_ip)
>>>> +                       cpuc->lbr_xsave = kmem_cache_alloc(kmem_cache, GFP_KERNEL);
>>>> +       }
>>>> +}
>>>
>>> I think we should use kmem_cache_alloc_node().
>>
>> "kmem_cache_alloc_node - Allocate an object on the specified node"
>>
>> The reserve_lbr_buffers() is called in __x86_pmu_event_init().
>> When the LBR perf_event is scheduled to another node, it seems
>> that we will not call init() and allocate again.
>>
>> Do you mean use kmem_cache_alloc_node() for each numa_nodes_parsed ?
> 
> I assume cpuc->lbr_xsave will be accessed for that cpu only.
> Then it needs to allocate it in the node that cpu belongs to.
> Something like below..
> 
>      cpuc->lbr_xsave = kmem_cache_alloc_node(kmem_cache, GFP_KERNEL,
>                                                 cpu_to_node(cpu));

Thanks, it helps and I will apply it in the next version.

> 
> Thanks,
> Namhyung
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-03-24  5:43 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-22  6:06 [PATCH v4 RESEND 0/5] x86: The perf/x86 changes to support guest Arch LBR Like Xu
2021-03-22  6:06 ` [PATCH v4 RESEND 1/5] perf/x86/intel: Fix the comment about guest LBR support on KVM Like Xu
2021-03-22  6:06 ` [PATCH v4 RESEND 2/5] perf/x86/lbr: Simplify the exposure check for the LBR_INFO registers Like Xu
2021-03-23 21:38   ` Peter Zijlstra
2021-03-24  2:02     ` Like Xu
2021-03-22  6:06 ` [PATCH v4 RESEND 3/5] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region Like Xu
2021-03-23 20:56   ` Liang, Kan
2021-03-23 21:41   ` Peter Zijlstra
2021-03-23 23:03     ` Liang, Kan
2021-03-24  1:32   ` Namhyung Kim
2021-03-24  3:46     ` Like Xu
2021-03-24  4:04       ` Namhyung Kim
2021-03-24  5:42         ` Like Xu
2021-03-22  6:06 ` [PATCH v4 RESEND 4/5] perf/x86/lbr: Skip checking for the existence of LBR_TOS for Arch LBR Like Xu
2021-03-23 21:49   ` Peter Zijlstra
2021-03-24  3:32     ` Like Xu
2021-03-22  6:06 ` [PATCH v4 RESEND 5/5] perf/x86: Move ARCH_LBR_CTL_MASK definition to include/asm/msr-index.h Like Xu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.