linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 1/2] perf/x86: Skip checking if 0x0 MSR exists for guest Arch LBR
@ 2021-04-30  5:22 Like Xu
  2021-04-30  5:22 ` [PATCH v2 2/2] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region Like Xu
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Like Xu @ 2021-04-30  5:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-kernel,
	Kan Liang, Borislav Petkov, seanjc, x86, Like Xu

Architectural LBR does not have MSR_LBR_TOS (0x000001c9).
In a guest that should support Architectural LBR, check_msr()
ends up performing an unrelated check on MSR 0x0
(IA32_P5_MC_ADDR), which is not supported by KVM either.

The failed check causes x86_pmu.lbr_nr = 0, preventing the
initialization of the guest Arch LBR. Fix it by skipping this
extraneous check in intel_pmu_init() for Arch LBR.

Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
v1->v2 Changelog:
- Avoid checking the unrelated architectural MSR 0x0 in a simple way

 arch/x86/events/intel/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 5272f349dca2..456aa6ffd9a1 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -6250,7 +6250,8 @@ __init int intel_pmu_init(void)
 	 * Check all LBT MSR here.
 	 * Disable LBR access if any LBR MSRs can not be accessed.
 	 */
-	if (x86_pmu.lbr_nr && !check_msr(x86_pmu.lbr_tos, 0x3UL))
+	if (x86_pmu.lbr_nr && !static_cpu_has(X86_FEATURE_ARCH_LBR) &&
+	    !check_msr(x86_pmu.lbr_tos, 0x3UL))
 		x86_pmu.lbr_nr = 0;
 	for (i = 0; i < x86_pmu.lbr_nr; i++) {
 		if (!(check_msr(x86_pmu.lbr_from + i, 0xffffUL) &&
-- 
2.30.2

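For background, check_msr() validates an MSR by reading it, writing back a
value with some bits flipped, and verifying that the change sticks; a
faulting access or a mismatch marks the MSR as unusable. A minimal sketch of
that idea, assuming the usual <asm/msr.h> helpers (the real check_msr() in
arch/x86/events/intel/core.c is more involved, e.g. it tolerates quirky LBR
MSR behavior):

	static bool probe_msr(unsigned long msr, u64 mask)
	{
		u64 val_old, val_new, val_tmp;

		/* A faulting read means the MSR is not implemented. */
		if (rdmsrl_safe(msr, &val_old))
			return false;

		/* Flip the mask bits and check that the write sticks. */
		val_tmp = val_old ^ mask;
		if (wrmsrl_safe(msr, val_tmp) || rdmsrl_safe(msr, &val_new))
			return false;

		if (val_new != val_tmp)
			return false;

		/* Restore the original contents. */
		wrmsrl(msr, val_old);
		return true;
	}

Under Arch LBR, x86_pmu.lbr_tos is left at 0, so the unpatched code runs this
kind of probe against MSR 0x0 (IA32_P5_MC_ADDR), which KVM does not emulate;
that is the failure the patch avoids.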


* [PATCH v2 2/2] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region
  2021-04-30  5:22 [PATCH v2 1/2] perf/x86: Skip checking if 0x0 MSR exists for guest Arch LBR Like Xu
@ 2021-04-30  5:22 ` Like Xu
  2021-05-10  2:10   ` Like Xu
                     ` (2 more replies)
  2021-05-18  9:30 ` [PATCH v2 1/2] perf/x86: Skip checking if 0x0 MSR exists for guest Arch LBR Peter Zijlstra
  2021-05-19  8:02 ` [tip: perf/urgent] perf/x86: Avoid touching LBR_TOS MSR for " tip-bot2 for Like Xu
  2 siblings, 3 replies; 10+ messages in thread
From: Like Xu @ 2021-04-30  5:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-kernel,
	Kan Liang, Borislav Petkov, seanjc, x86, Like Xu

If the kernel is compiled with the CONFIG_LOCKDEP option, the conditional
might_sleep_if() deep in kmem_cache_alloc() will generate the following
trace, and potentially cause a deadlock when another LBR event is added:

[  243.115549] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:196
[  243.117576] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 839, name: perf
[  243.119326] INFO: lockdep is turned off.
[  243.120249] irq event stamp: 0
[  243.120967] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[  243.122415] hardirqs last disabled at (0): [<ffffffff810d9bf5>] copy_process+0xa45/0x1dc0
[  243.124302] softirqs last  enabled at (0): [<ffffffff810d9bf5>] copy_process+0xa45/0x1dc0
[  243.126255] softirqs last disabled at (0): [<0000000000000000>] 0x0
[  243.128119] CPU: 0 PID: 839 Comm: perf Not tainted 5.11.0-rc4-guest+ #8
[  243.129654] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[  243.131520] Call Trace:
[  243.132112]  dump_stack+0x8d/0xb5
[  243.132896]  ___might_sleep.cold.106+0xb3/0xc3
[  243.133984]  slab_pre_alloc_hook.constprop.85+0x96/0xd0
[  243.135208]  ? intel_pmu_lbr_add+0x152/0x170
[  243.136207]  kmem_cache_alloc+0x36/0x250
[  243.137126]  intel_pmu_lbr_add+0x152/0x170
[  243.138088]  x86_pmu_add+0x83/0xd0
[  243.138889]  ? lock_acquire+0x158/0x350
[  243.139791]  ? lock_acquire+0x158/0x350
[  243.140694]  ? lock_acquire+0x158/0x350
[  243.141625]  ? lock_acquired+0x1e3/0x360
[  243.142544]  ? lock_release+0x1bf/0x340
[  243.143726]  ? trace_hardirqs_on+0x1a/0xd0
[  243.144823]  ? lock_acquired+0x1e3/0x360
[  243.145742]  ? lock_release+0x1bf/0x340
[  243.147107]  ? __slab_free+0x49/0x540
[  243.147966]  ? trace_hardirqs_on+0x1a/0xd0
[  243.148924]  event_sched_in.isra.129+0xf8/0x2a0
[  243.149989]  merge_sched_in+0x261/0x3e0
[  243.150889]  ? trace_hardirqs_on+0x1a/0xd0
[  243.151869]  visit_groups_merge.constprop.135+0x130/0x4a0
[  243.153122]  ? sched_clock_cpu+0xc/0xb0
[  243.154023]  ctx_sched_in+0x101/0x210
[  243.154884]  ctx_resched+0x6f/0xc0
[  243.155686]  perf_event_exec+0x21e/0x2e0
[  243.156641]  begin_new_exec+0x5e5/0xbd0
[  243.157540]  load_elf_binary+0x6af/0x1770
[  243.158478]  ? __kernel_read+0x19d/0x2b0
[  243.159977]  ? lock_acquire+0x158/0x350
[  243.160876]  ? __kernel_read+0x19d/0x2b0
[  243.161796]  bprm_execve+0x3c8/0x840
[  243.162638]  do_execveat_common.isra.38+0x1a5/0x1c0
[  243.163776]  __x64_sys_execve+0x32/0x40
[  243.164676]  do_syscall_64+0x33/0x40
[  243.165514]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  243.166746] RIP: 0033:0x7f6180a26feb
[  243.167590] Code: Unable to access opcode bytes at RIP 0x7f6180a26fc1.
[  243.169097] RSP: 002b:00007ffc6558ce18 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
[  243.170844] RAX: ffffffffffffffda RBX: 00007ffc65592d30 RCX: 00007f6180a26feb
[  243.172514] RDX: 000055657f408dc0 RSI: 00007ffc65592410 RDI: 00007ffc65592d30
[  243.174162] RBP: 00007ffc6558ce80 R08: 00007ffc6558cde0 R09: 0000000000000000
[  243.176042] R10: 0000000000000008 R11: 0000000000000202 R12: 00007ffc65592410
[  243.177696] R13: 000055657f408dc0 R14: 0000000000000001 R15: 00007ffc65592410

One of the solutions is to use GFP_ATOMIC, but it would make the code less
reliable under memory pressure. Let's move the memory allocation out of
the atomic context and into intel_pmu_hw_config(). The LBR xsave buffer
is a per-CPU buffer, not a per-event buffer, so it only needs to be
allocated once when the LBR event is initialized.

The disadvantage of this fix is that the cpuc->lbr_xsave memory
will be allocated for each CPU, like the legacy ds_buffer.

Fixes: c085fb8774 ("perf/x86/intel/lbr: Support XSAVES for arch LBR read")
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
v1->v2 Changelog:
- Frob reserve_lbr_buffers() in intel_pmu_hw_config().

 arch/x86/events/intel/core.c |  2 ++
 arch/x86/events/intel/lbr.c  | 25 +++++++++++++++++++------
 arch/x86/events/perf_event.h |  6 ++++++
 3 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 456aa6ffd9a1..19027aa01f82 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3745,6 +3745,8 @@ static int intel_pmu_hw_config(struct perf_event *event)
 
 			event->destroy = hw_perf_lbr_event_destroy;
 		}
+
+		reserve_lbr_buffers(event);
 	}
 
 	if (event->attr.aux_output) {
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index bb4486c4155a..1c39155c9f67 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -658,7 +658,6 @@ static inline bool branch_user_callstack(unsigned br_sel)
 
 void intel_pmu_lbr_add(struct perf_event *event)
 {
-	struct kmem_cache *kmem_cache = event->pmu->task_ctx_cache;
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 
 	if (!x86_pmu.lbr_nr)
@@ -696,11 +695,6 @@ void intel_pmu_lbr_add(struct perf_event *event)
 	perf_sched_cb_inc(event->ctx->pmu);
 	if (!cpuc->lbr_users++ && !event->total_time_running)
 		intel_pmu_lbr_reset();
-
-	if (static_cpu_has(X86_FEATURE_ARCH_LBR) &&
-	    kmem_cache && !cpuc->lbr_xsave &&
-	    (cpuc->lbr_users != cpuc->lbr_pebs_users))
-		cpuc->lbr_xsave = kmem_cache_alloc(kmem_cache, GFP_KERNEL);
 }
 
 void release_lbr_buffers(void)
@@ -722,6 +716,25 @@ void release_lbr_buffers(void)
 	}
 }
 
+void reserve_lbr_buffers(struct perf_event *event)
+{
+	struct kmem_cache *kmem_cache;
+	struct cpu_hw_events *cpuc;
+	int cpu;
+
+	if (!static_cpu_has(X86_FEATURE_ARCH_LBR))
+		return;
+
+	for_each_possible_cpu(cpu) {
+		cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
+		kmem_cache = x86_get_pmu(cpu)->task_ctx_cache;
+		if (kmem_cache && !cpuc->lbr_xsave && !event->attr.precise_ip)
+			cpuc->lbr_xsave =
+				kmem_cache_alloc_node(kmem_cache, GFP_KERNEL,
+						      cpu_to_node(cpu));
+	}
+}
+
 void intel_pmu_lbr_del(struct perf_event *event)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 27fa85e7d4fd..727e5490e78c 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1244,6 +1244,8 @@ void reserve_ds_buffers(void);
 
 void release_lbr_buffers(void);
 
+void reserve_lbr_buffers(struct perf_event *event);
+
 extern struct event_constraint bts_constraint;
 extern struct event_constraint vlbr_constraint;
 
@@ -1393,6 +1395,10 @@ static inline void release_lbr_buffers(void)
 {
 }
 
+static inline void reserve_lbr_buffers(struct perf_event *event)
+{
+}
+
 static inline int intel_pmu_init(void)
 {
 	return 0;
-- 
2.30.2

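The rule the splat above enforces: an allocation whose GFP flags allow
blocking may sleep, so it must not be issued with IRQs disabled or preemption
off, which is exactly the context x86_pmu_add() runs in. A rough sketch of
the check, paraphrasing might_sleep_if() (the actual hook sits in the slab
allocator and fires with debug options such as CONFIG_DEBUG_ATOMIC_SLEEP or
CONFIG_LOCKDEP enabled):

	static inline void pre_alloc_check(gfp_t flags)
	{
		/*
		 * GFP_KERNEL allows blocking, GFP_ATOMIC does not. If
		 * blocking is allowed but the caller is atomic, this
		 * produces the "BUG: sleeping function called from
		 * invalid context" report quoted above.
		 */
		might_sleep_if(gfpflags_allow_blocking(flags));
	}

Moving the allocation into intel_pmu_hw_config() sidesteps the problem
because event setup in perf_event_open() runs in ordinary process context,
where GFP_KERNEL is allowed to sleep.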


* Re: [PATCH v2 2/2] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region
  2021-04-30  5:22 ` [PATCH v2 2/2] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region Like Xu
@ 2021-05-10  2:10   ` Like Xu
  2021-05-18  8:48     ` Like Xu
  2021-05-18 10:02   ` Peter Zijlstra
  2021-05-19  8:02   ` [tip: perf/urgent] perf/x86/lbr: Remove cpuc->lbr_xsave allocation from atomic context tip-bot2 for Like Xu
  2 siblings, 1 reply; 10+ messages in thread
From: Like Xu @ 2021-05-10  2:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-kernel,
	Kan Liang, Borislav Petkov, seanjc, x86

Hi Peter, do you have any comments on this version?

On 2021/4/30 13:22, Like Xu wrote:
> If the kernel is compiled with the CONFIG_LOCKDEP option, the conditional
> might_sleep_if() deep in kmem_cache_alloc() will generate the following
> trace, and potentially cause a deadlock when another LBR event is added:
> 
> [...]



* Re: [PATCH v2 2/2] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region
  2021-05-10  2:10   ` Like Xu
@ 2021-05-18  8:48     ` Like Xu
  0 siblings, 0 replies; 10+ messages in thread
From: Like Xu @ 2021-05-18  8:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-kernel,
	Kan Liang, Borislav Petkov, seanjc, x86

Hi Peter, do you have any comments on these two patches?

On 2021/5/10 10:10, Like Xu wrote:
> Hi Peter, do you have any comments on this version?
> 
> On 2021/4/30 13:22, Like Xu wrote:
>> If the kernel is compiled with the CONFIG_LOCKDEP option, the conditional
>> might_sleep_if() deep in kmem_cache_alloc() will generate the following
>> trace, and potentially cause a deadlock when another LBR event is added:
>>
>> [...]



* Re: [PATCH v2 1/2] perf/x86: Skip checking if 0x0 MSR exists for guest Arch LBR
  2021-04-30  5:22 [PATCH v2 1/2] perf/x86: Skip checking if 0x0 MSR exists for guest Arch LBR Like Xu
  2021-04-30  5:22 ` [PATCH v2 2/2] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region Like Xu
@ 2021-05-18  9:30 ` Peter Zijlstra
  2021-05-19  8:02 ` [tip: perf/urgent] perf/x86: Avoid touching LBR_TOS MSR for " tip-bot2 for Like Xu
  2 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2021-05-18  9:30 UTC (permalink / raw)
  To: Like Xu
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-kernel,
	Kan Liang, Borislav Petkov, seanjc, x86

On Fri, Apr 30, 2021 at 01:22:46PM +0800, Like Xu wrote:

> - Avoid checking the unrelated architectural MSR 0x0 in a simple way

I'm thinking the below is simpler still, no?

> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 5272f349dca2..456aa6ffd9a1 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -6250,7 +6250,8 @@ __init int intel_pmu_init(void)
>  	 * Check all LBT MSR here.
>  	 * Disable LBR access if any LBR MSRs can not be accessed.
>  	 */
> -	if (x86_pmu.lbr_nr && !check_msr(x86_pmu.lbr_tos, 0x3UL))
> +	if (x86_pmu.lbr_nr && !static_cpu_has(X86_FEATURE_ARCH_LBR) &&
> +	    !check_msr(x86_pmu.lbr_tos, 0x3UL))
>  		x86_pmu.lbr_nr = 0;
>  	for (i = 0; i < x86_pmu.lbr_nr; i++) {
>  		if (!(check_msr(x86_pmu.lbr_from + i, 0xffffUL) &&

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 2521d03de5e0..e28892270c58 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -6253,7 +6253,7 @@ __init int intel_pmu_init(void)
 	 * Check all LBT MSR here.
 	 * Disable LBR access if any LBR MSRs can not be accessed.
 	 */
-	if (x86_pmu.lbr_nr && !check_msr(x86_pmu.lbr_tos, 0x3UL))
+	if (x86_pmu.lbr_tos && !check_msr(x86_pmu.lbr_tos, 0x3UL))
 		x86_pmu.lbr_nr = 0;
 	for (i = 0; i < x86_pmu.lbr_nr; i++) {
 		if (!(check_msr(x86_pmu.lbr_from + i, 0xffffUL) &&
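
Presumably Peter's test is safe because the legacy LBR init paths set the TOS
MSR (e.g. intel_pmu_lbr_init_skl() does x86_pmu.lbr_tos = MSR_LBR_TOS) while
intel_pmu_arch_lbr_init() never assigns it, so under Arch LBR lbr_tos stays 0
and the && short-circuits before check_msr() can touch MSR 0x0. An annotated
sketch of the resulting flow (my reading of the code, not spelled out in the
thread):

	/*
	 * lbr_tos == 0 under Arch LBR: the probe is skipped without
	 * consulting X86_FEATURE_ARCH_LBR at all, and lbr_nr survives.
	 */
	if (x86_pmu.lbr_tos && !check_msr(x86_pmu.lbr_tos, 0x3UL))
		x86_pmu.lbr_nr = 0;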


* Re: [PATCH v2 2/2] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region
  2021-04-30  5:22 ` [PATCH v2 2/2] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region Like Xu
  2021-05-10  2:10   ` Like Xu
@ 2021-05-18 10:02   ` Peter Zijlstra
  2021-05-18 12:34     ` Like Xu
  2021-05-19  8:02   ` [tip: perf/urgent] perf/x86/lbr: Remove cpuc->lbr_xsave allocation from atomic context tip-bot2 for Like Xu
  2 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2021-05-18 10:02 UTC (permalink / raw)
  To: Like Xu
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-kernel,
	Kan Liang, Borislav Petkov, seanjc, x86

On Fri, Apr 30, 2021 at 01:22:47PM +0800, Like Xu wrote:
> If the kernel is compiled with the CONFIG_LOCKDEP option, the conditional
> might_sleep_if() deep in kmem_cache_alloc() will generate the following
> trace, and potentially cause a deadlock when another LBR event is added:

...

> One of the solutions is to use GFP_ATOMIC, but it would make the code less
> reliable under memory pressure. Let's move the memory allocation out of
> the atomic context and into intel_pmu_hw_config(). The LBR xsave buffer
> is a per-CPU buffer, not a per-event buffer, so it only needs to be
> allocated once when the LBR event is initialized.
> 
> The disadvantage of this fix is that the cpuc->lbr_xsave memory
> will be allocated for each CPU, like the legacy ds_buffer.
> 
> Fixes: c085fb8774 ("perf/x86/intel/lbr: Support XSAVES for arch LBR read")
> Tested-by: Kan Liang <kan.liang@linux.intel.com>
> Signed-off-by: Like Xu <like.xu@linux.intel.com>
> ---
> v1->v2 Changelog:
> - Frob reserve_lbr_buffers() in intel_pmu_hw_config().
> 
>  arch/x86/events/intel/core.c |  2 ++
>  arch/x86/events/intel/lbr.c  | 25 +++++++++++++++++++------
>  arch/x86/events/perf_event.h |  6 ++++++
>  3 files changed, 27 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 456aa6ffd9a1..19027aa01f82 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -3745,6 +3745,8 @@ static int intel_pmu_hw_config(struct perf_event *event)
>  
>  			event->destroy = hw_perf_lbr_event_destroy;
>  		}
> +
> +		reserve_lbr_buffers(event);

Ok, so this would actually work..

>  	}
>  
>  	if (event->attr.aux_output) {

> @@ -722,6 +716,25 @@ void release_lbr_buffers(void)
>  	}
>  }
>  
> +void reserve_lbr_buffers(struct perf_event *event)
> +{
> +	struct kmem_cache *kmem_cache;
> +	struct cpu_hw_events *cpuc;
> +	int cpu;
> +
> +	if (!static_cpu_has(X86_FEATURE_ARCH_LBR))
> +		return;
> +
> +	for_each_possible_cpu(cpu) {
> +		cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
> +		kmem_cache = x86_get_pmu(cpu)->task_ctx_cache;
> +		if (kmem_cache && !cpuc->lbr_xsave && !event->attr.precise_ip)
> +			cpuc->lbr_xsave =
> +				kmem_cache_alloc_node(kmem_cache, GFP_KERNEL,
> +						      cpu_to_node(cpu));

(coding style fail)

But then I looked at this function, and srsly, !precise_ip is the
only thing you need @event for? Why do we care?

> +	}
> +}


Without that silly @event dependency you can go back to the original
form, which makes much more sense since now reserve and release are
symmetric.

---
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -396,10 +396,12 @@ int x86_reserve_hardware(void)
 	if (!atomic_inc_not_zero(&pmc_refcount)) {
 		mutex_lock(&pmc_reserve_mutex);
 		if (atomic_read(&pmc_refcount) == 0) {
-			if (!reserve_pmc_hardware())
+			if (!reserve_pmc_hardware()) {
 				err = -EBUSY;
-			else
+			} else {
 				reserve_ds_buffers();
+				reserve_lbr_buffers();
+			}
 		}
 		if (!err)
 			atomic_inc(&pmc_refcount);
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -658,7 +658,6 @@ static inline bool branch_user_callstack
 
 void intel_pmu_lbr_add(struct perf_event *event)
 {
-	struct kmem_cache *kmem_cache = event->pmu->task_ctx_cache;
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 
 	if (!x86_pmu.lbr_nr)
@@ -696,11 +695,6 @@ void intel_pmu_lbr_add(struct perf_event
 	perf_sched_cb_inc(event->ctx->pmu);
 	if (!cpuc->lbr_users++ && !event->total_time_running)
 		intel_pmu_lbr_reset();
-
-	if (static_cpu_has(X86_FEATURE_ARCH_LBR) &&
-	    kmem_cache && !cpuc->lbr_xsave &&
-	    (cpuc->lbr_users != cpuc->lbr_pebs_users))
-		cpuc->lbr_xsave = kmem_cache_alloc(kmem_cache, GFP_KERNEL);
 }
 
 void release_lbr_buffers(void)
@@ -722,6 +716,26 @@ void release_lbr_buffers(void)
 	}
 }
 
+void reserve_lbr_buffers(void)
+{
+	struct kmem_cache *kmem_cache;
+	struct cpu_hw_events *cpuc;
+	int cpu;
+
+	if (!static_cpu_has(X86_FEATURE_ARCH_LBR))
+		return;
+
+	for_each_possible_cpu(cpu) {
+		cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
+		kmem_cache = x86_get_pmu(cpu)->task_ctx_cache;
+		if (!kmem_cache || cpuc->lbr_xsave)
+			continue;
+
+		cpuc->lbr_xsave = kmem_cache_alloc_node(kmem_cache, GFP_KERNEL,
+							cpu_to_node(cpu));
+	}
+}
+
 void intel_pmu_lbr_del(struct perf_event *event)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1244,6 +1244,8 @@ void reserve_ds_buffers(void);
 
 void release_lbr_buffers(void);
 
+void reserve_lbr_buffers(void);
+
 extern struct event_constraint bts_constraint;
 extern struct event_constraint vlbr_constraint;
 
@@ -1393,6 +1395,10 @@ static inline void release_lbr_buffers(v
 {
 }
 
+static inline void reserve_lbr_buffers(void)
+{
+}
+
 static inline int intel_pmu_init(void)
 {
 	return 0;

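For reference, the release side this is meant to mirror, paraphrased from
release_lbr_buffers() in arch/x86/events/intel/lbr.c (details may differ by
kernel version):

	void release_lbr_buffers(void)
	{
		struct kmem_cache *kmem_cache;
		struct cpu_hw_events *cpuc;
		int cpu;

		if (!static_cpu_has(X86_FEATURE_ARCH_LBR))
			return;

		for_each_possible_cpu(cpu) {
			cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
			kmem_cache = x86_get_pmu(cpu)->task_ctx_cache;
			if (kmem_cache && cpuc->lbr_xsave) {
				kmem_cache_free(kmem_cache, cpuc->lbr_xsave);
				cpuc->lbr_xsave = NULL;
			}
		}
	}

With reserve_lbr_buffers() called from x86_reserve_hardware() and
release_lbr_buffers() from the corresponding release path, the buffer
lifetime is tied to the PMC refcount rather than to any individual event.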


* Re: [PATCH v2 2/2] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region
  2021-05-18 10:02   ` Peter Zijlstra
@ 2021-05-18 12:34     ` Like Xu
  2021-05-18 13:29       ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Like Xu @ 2021-05-18 12:34 UTC (permalink / raw)
  To: Peter Zijlstra, Kan Liang
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-kernel,
	Borislav Petkov, seanjc, x86

On 2021/5/18 18:02, Peter Zijlstra wrote:
> On Fri, Apr 30, 2021 at 01:22:47PM +0800, Like Xu wrote:
>> If the kernel is compiled with the CONFIG_LOCKDEP option, the conditional
>> might_sleep_if() deep in kmem_cache_alloc() will generate the following
>> trace, and potentially cause a deadlock when another LBR event is added:
> 
> ...
> 
>> One of the solutions is to use GFP_ATOMIC, but it would make the code less
>> reliable under memory pressure. Let's move the memory allocation out of
>> the atomic context and into intel_pmu_hw_config(). The LBR xsave buffer
>> is a per-CPU buffer, not a per-event buffer, so it only needs to be
>> allocated once when the LBR event is initialized.
>>
>> The disadvantage of this fix is that the cpuc->lbr_xsave memory
>> will be allocated for each CPU, like the legacy ds_buffer.
>>
>> Fixes: c085fb8774 ("perf/x86/intel/lbr: Support XSAVES for arch LBR read")
>> Tested-by: Kan Liang <kan.liang@linux.intel.com>
>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>> ---
>> v1->v2 Changelog:
>> - Frob reserve_lbr_buffers() in intel_pmu_hw_config().
>>
>>   arch/x86/events/intel/core.c |  2 ++
>>   arch/x86/events/intel/lbr.c  | 25 +++++++++++++++++++------
>>   arch/x86/events/perf_event.h |  6 ++++++
>>   3 files changed, 27 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index 456aa6ffd9a1..19027aa01f82 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -3745,6 +3745,8 @@ static int intel_pmu_hw_config(struct perf_event *event)
>>   
>>   			event->destroy = hw_perf_lbr_event_destroy;
>>   		}
>> +
>> +		reserve_lbr_buffers(event);
> 
> Ok, so this would actually work..
> 
>>   	}
>>   
>>   	if (event->attr.aux_output) {
> 
>> @@ -722,6 +716,25 @@ void release_lbr_buffers(void)
>>   	}
>>   }
>>   
>> +void reserve_lbr_buffers(struct perf_event *event)
>> +{
>> +	struct kmem_cache *kmem_cache;
>> +	struct cpu_hw_events *cpuc;
>> +	int cpu;
>> +
>> +	if (!static_cpu_has(X86_FEATURE_ARCH_LBR))
>> +		return;
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
>> +		kmem_cache = x86_get_pmu(cpu)->task_ctx_cache;
>> +		if (kmem_cache && !cpuc->lbr_xsave && !event->attr.precise_ip)
>> +			cpuc->lbr_xsave =
>> +				kmem_cache_alloc_node(kmem_cache, GFP_KERNEL,
>> +						      cpu_to_node(cpu));
> 
> (coding style fail)
> 
> But then I looked at this function, and srsly, !precise_ip is the
> only thing you need @event for? Why do we care?

Kan once commented that we only need to allocate the buffer for the
non-PEBS event. It seems the check "(cpuc->lbr_users != cpuc->lbr_pebs_users)"
is implicitly removed.

I think we still need to check !precise_ip, right?



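(A side note on the exchange above, as I read it: an LBR event with
precise_ip set is a PEBS event, and its LBR records are delivered through the
PEBS machinery rather than read back via XSAVES, so it never touches
cpuc->lbr_xsave. That is what the old per-CPU test cpuc->lbr_users !=
cpuc->lbr_pebs_users encoded, and what the per-event !event->attr.precise_ip
check approximates. The reply and the merged version drop the filter entirely
on the grounds that an unused per-CPU buffer is harmless.)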

* Re: [PATCH v2 2/2] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region
  2021-05-18 12:34     ` Like Xu
@ 2021-05-18 13:29       ` Peter Zijlstra
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2021-05-18 13:29 UTC (permalink / raw)
  To: Like Xu
  Cc: Kan Liang, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-kernel,
	Borislav Petkov, seanjc, x86

On Tue, May 18, 2021 at 08:34:33PM +0800, Like Xu wrote:
> > > +	if (!static_cpu_has(X86_FEATURE_ARCH_LBR))
> > > +		return;
> > > +
> > > +	for_each_possible_cpu(cpu) {
> > > +		cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
> > > +		kmem_cache = x86_get_pmu(cpu)->task_ctx_cache;
> > > +		if (kmem_cache && !cpuc->lbr_xsave && !event->attr.precise_ip)
> > > +			cpuc->lbr_xsave =
> > > +				kmem_cache_alloc_node(kmem_cache, GFP_KERNEL,
> > > +						      cpu_to_node(cpu));
> > 
> > (coding style fail)
> > 
> > But then I looked at this function, and srsly, !precise_ip is the
> > only thing you need @event for? Why do we care?
> 
> Kan once commented that we only need to allocate the buffer for the
> non-PEBS event. It seems the check "(cpuc->lbr_users != cpuc->lbr_pebs_users)"
> is implicitly removed.
> 
> I think we still need to check !precise_ip, right?

It's just a memory allocation; who cares if we allocate it but never use it?


* [tip: perf/urgent] perf/x86/lbr: Remove cpuc->lbr_xsave allocation from atomic context
  2021-04-30  5:22 ` [PATCH v2 2/2] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region Like Xu
  2021-05-10  2:10   ` Like Xu
  2021-05-18 10:02   ` Peter Zijlstra
@ 2021-05-19  8:02   ` tip-bot2 for Like Xu
  2 siblings, 0 replies; 10+ messages in thread
From: tip-bot2 for Like Xu @ 2021-05-19  8:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Like Xu, Peter Zijlstra (Intel), Kan Liang, x86, linux-kernel

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     488e13a489e9707a7e81e1991fdd1f20c0f04689
Gitweb:        https://git.kernel.org/tip/488e13a489e9707a7e81e1991fdd1f20c0f04689
Author:        Like Xu <like.xu@linux.intel.com>
AuthorDate:    Fri, 30 Apr 2021 13:22:47 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 18 May 2021 12:53:47 +02:00

perf/x86/lbr: Remove cpuc->lbr_xsave allocation from atomic context

If the kernel is compiled with the CONFIG_LOCKDEP option, the conditional
might_sleep_if() deep in kmem_cache_alloc() will generate the following
trace, and potentially cause a deadlock when another LBR event is added:

  [] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:196
  [] Call Trace:
  []  kmem_cache_alloc+0x36/0x250
  []  intel_pmu_lbr_add+0x152/0x170
  []  x86_pmu_add+0x83/0xd0

Make it symmetric with the release_lbr_buffers() call and mirror the
existing DS buffers.

Fixes: c085fb8774 ("perf/x86/intel/lbr: Support XSAVES for arch LBR read")
Signed-off-by: Like Xu <like.xu@linux.intel.com>
[peterz: simplified]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lkml.kernel.org/r/20210430052247.3079672-2-like.xu@linux.intel.com
---
 arch/x86/events/core.c       |  6 ++++--
 arch/x86/events/intel/lbr.c  | 26 ++++++++++++++++++++------
 arch/x86/events/perf_event.h |  6 ++++++
 3 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 8e50932..8f71dd7 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -396,10 +396,12 @@ int x86_reserve_hardware(void)
 	if (!atomic_inc_not_zero(&pmc_refcount)) {
 		mutex_lock(&pmc_reserve_mutex);
 		if (atomic_read(&pmc_refcount) == 0) {
-			if (!reserve_pmc_hardware())
+			if (!reserve_pmc_hardware()) {
 				err = -EBUSY;
-			else
+			} else {
 				reserve_ds_buffers();
+				reserve_lbr_buffers();
+			}
 		}
 		if (!err)
 			atomic_inc(&pmc_refcount);
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 76dbab6..4409d2c 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -658,7 +658,6 @@ static inline bool branch_user_callstack(unsigned br_sel)
 
 void intel_pmu_lbr_add(struct perf_event *event)
 {
-	struct kmem_cache *kmem_cache = event->pmu->task_ctx_cache;
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 
 	if (!x86_pmu.lbr_nr)
@@ -696,11 +695,6 @@ void intel_pmu_lbr_add(struct perf_event *event)
 	perf_sched_cb_inc(event->ctx->pmu);
 	if (!cpuc->lbr_users++ && !event->total_time_running)
 		intel_pmu_lbr_reset();
-
-	if (static_cpu_has(X86_FEATURE_ARCH_LBR) &&
-	    kmem_cache && !cpuc->lbr_xsave &&
-	    (cpuc->lbr_users != cpuc->lbr_pebs_users))
-		cpuc->lbr_xsave = kmem_cache_alloc(kmem_cache, GFP_KERNEL);
 }
 
 void release_lbr_buffers(void)
@@ -722,6 +716,26 @@ void release_lbr_buffers(void)
 	}
 }
 
+void reserve_lbr_buffers(void)
+{
+	struct kmem_cache *kmem_cache;
+	struct cpu_hw_events *cpuc;
+	int cpu;
+
+	if (!static_cpu_has(X86_FEATURE_ARCH_LBR))
+		return;
+
+	for_each_possible_cpu(cpu) {
+		cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
+		kmem_cache = x86_get_pmu(cpu)->task_ctx_cache;
+		if (!kmem_cache || cpuc->lbr_xsave)
+			continue;
+
+		cpuc->lbr_xsave = kmem_cache_alloc_node(kmem_cache, GFP_KERNEL,
+							cpu_to_node(cpu));
+	}
+}
+
 void intel_pmu_lbr_del(struct perf_event *event)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 27fa85e..ad87cb3 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1244,6 +1244,8 @@ void reserve_ds_buffers(void);
 
 void release_lbr_buffers(void);
 
+void reserve_lbr_buffers(void);
+
 extern struct event_constraint bts_constraint;
 extern struct event_constraint vlbr_constraint;
 
@@ -1393,6 +1395,10 @@ static inline void release_lbr_buffers(void)
 {
 }
 
+static inline void reserve_lbr_buffers(void)
+{
+}
+
 static inline int intel_pmu_init(void)
 {
 	return 0;


* [tip: perf/urgent] perf/x86: Avoid touching LBR_TOS MSR for Arch LBR
  2021-04-30  5:22 [PATCH v2 1/2] perf/x86: Skip checking if 0x0 MSR exists for guest Arch LBR Like Xu
  2021-04-30  5:22 ` [PATCH v2 2/2] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region Like Xu
  2021-05-18  9:30 ` [PATCH v2 1/2] perf/x86: Skip checking if 0x0 MSR exists for guest Arch LBR Peter Zijlstra
@ 2021-05-19  8:02 ` tip-bot2 for Like Xu
  2 siblings, 0 replies; 10+ messages in thread
From: tip-bot2 for Like Xu @ 2021-05-19  8:02 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Like Xu, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     3317c26a4b413b41364f2c4b83c778c6aba1576d
Gitweb:        https://git.kernel.org/tip/3317c26a4b413b41364f2c4b83c778c6aba1576d
Author:        Like Xu <like.xu@linux.intel.com>
AuthorDate:    Fri, 30 Apr 2021 13:22:46 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 18 May 2021 12:53:47 +02:00

perf/x86: Avoid touching LBR_TOS MSR for Arch LBR

Architectural LBR does not have MSR_LBR_TOS (0x000001c9).
In a guest that should support Architectural LBR, check_msr()
ends up performing an unrelated check on MSR 0x0
(IA32_P5_MC_ADDR), which is not supported by KVM either.

The failed check causes x86_pmu.lbr_nr = 0, preventing the
initialization of the guest Arch LBR. Fix it by skipping this
extraneous check in intel_pmu_init() for Arch LBR.

Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
Signed-off-by: Like Xu <like.xu@linux.intel.com>
[peterz: simpler still]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210430052247.3079672-1-like.xu@linux.intel.com
---
 arch/x86/events/intel/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 2521d03..e288922 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -6253,7 +6253,7 @@ __init int intel_pmu_init(void)
 	 * Check all LBT MSR here.
 	 * Disable LBR access if any LBR MSRs can not be accessed.
 	 */
-	if (x86_pmu.lbr_nr && !check_msr(x86_pmu.lbr_tos, 0x3UL))
+	if (x86_pmu.lbr_tos && !check_msr(x86_pmu.lbr_tos, 0x3UL))
 		x86_pmu.lbr_nr = 0;
 	for (i = 0; i < x86_pmu.lbr_nr; i++) {
 		if (!(check_msr(x86_pmu.lbr_from + i, 0xffffUL) &&

