* [PATCH RESEND 1/2] perf/x86: Skip checking MSR for MSR 0x0
@ 2021-04-21  2:18 Like Xu
  2021-04-21  2:18 ` [PATCH RESEND 2/2] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region Like Xu
  2021-04-21 15:30 ` [PATCH RESEND 1/2] perf/x86: Skip checking MSR for MSR 0x0 Sean Christopherson
  0 siblings, 2 replies; 8+ messages in thread
From: Like Xu @ 2021-04-21  2:18 UTC (permalink / raw)
  To: Peter Zijlstra, Kan Liang
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Thomas Gleixner,
	Borislav Petkov, x86, linux-kernel, Like Xu

The Architectural LBR does not have MSR_LBR_TOS (0x000001c9).
Since lbr_tos is never set when ARCH_LBR is used, check_msr()
runs against MSR 0x0 and fails, which forces x86_pmu.lbr_nr = 0,
thereby preventing the initialization of the guest LBR.

Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 5272f349dca2..5036496caa60 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4751,10 +4751,10 @@ static bool check_msr(unsigned long msr, u64 mask)
 	u64 val_old, val_new, val_tmp;
 
 	/*
-	 * Disable the check for real HW, so we don't
+	 * Disable the check for real HW or non-sense msr, so we don't
 	 * mess with potentionaly enabled registers:
 	 */
-	if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
+	if (!boot_cpu_has(X86_FEATURE_HYPERVISOR) || !msr)
 		return true;
 
 	/*
-- 
2.30.2
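
For context, the failure path this patch addresses looks roughly like the
following condensed sketch (control flow simplified; not verbatim kernel
code). Architectural LBR never assigns x86_pmu.lbr_tos, which therefore
keeps its zero-initialized value, so under a hypervisor the LBR sanity
check ends up probing MSR 0x0:

	/* Condensed sketch of the LBR sanity check in intel_pmu_init() */
	if (x86_pmu.lbr_nr && !check_msr(x86_pmu.lbr_tos, 0x3UL)) {
		/*
		 * With arch LBR, lbr_tos is 0; without the !msr bail-out
		 * added above, check_msr(0, ...) fails in a guest and
		 * disables LBR entirely.
		 */
		x86_pmu.lbr_nr = 0;
	}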



* [PATCH RESEND 2/2] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region
  2021-04-21  2:18 [PATCH RESEND 1/2] perf/x86: Skip checking MSR for MSR 0x0 Like Xu
@ 2021-04-21  2:18 ` Like Xu
  2021-04-21  8:38   ` Peter Zijlstra
  2021-04-21 15:30 ` [PATCH RESEND 1/2] perf/x86: Skip checking MSR for MSR 0x0 Sean Christopherson
  1 sibling, 1 reply; 8+ messages in thread
From: Like Xu @ 2021-04-21  2:18 UTC (permalink / raw)
  To: Peter Zijlstra, Kan Liang
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Thomas Gleixner,
	Borislav Petkov, x86, linux-kernel, Like Xu

If the kernel is compiled with the CONFIG_LOCKDEP option, the conditional
might_sleep_if() deep in kmem_cache_alloc() will generate the following
trace, and potentially cause a deadlock when another LBR event is added:

[  243.115549] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:196
[  243.117576] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 839, name: perf
[  243.119326] INFO: lockdep is turned off.
[  243.120249] irq event stamp: 0
[  243.120967] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[  243.122415] hardirqs last disabled at (0): [<ffffffff810d9bf5>] copy_process+0xa45/0x1dc0
[  243.124302] softirqs last  enabled at (0): [<ffffffff810d9bf5>] copy_process+0xa45/0x1dc0
[  243.126255] softirqs last disabled at (0): [<0000000000000000>] 0x0
[  243.128119] CPU: 0 PID: 839 Comm: perf Not tainted 5.11.0-rc4-guest+ #8
[  243.129654] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[  243.131520] Call Trace:
[  243.132112]  dump_stack+0x8d/0xb5
[  243.132896]  ___might_sleep.cold.106+0xb3/0xc3
[  243.133984]  slab_pre_alloc_hook.constprop.85+0x96/0xd0
[  243.135208]  ? intel_pmu_lbr_add+0x152/0x170
[  243.136207]  kmem_cache_alloc+0x36/0x250
[  243.137126]  intel_pmu_lbr_add+0x152/0x170
[  243.138088]  x86_pmu_add+0x83/0xd0
[  243.138889]  ? lock_acquire+0x158/0x350
[  243.139791]  ? lock_acquire+0x158/0x350
[  243.140694]  ? lock_acquire+0x158/0x350
[  243.141625]  ? lock_acquired+0x1e3/0x360
[  243.142544]  ? lock_release+0x1bf/0x340
[  243.143726]  ? trace_hardirqs_on+0x1a/0xd0
[  243.144823]  ? lock_acquired+0x1e3/0x360
[  243.145742]  ? lock_release+0x1bf/0x340
[  243.147107]  ? __slab_free+0x49/0x540
[  243.147966]  ? trace_hardirqs_on+0x1a/0xd0
[  243.148924]  event_sched_in.isra.129+0xf8/0x2a0
[  243.149989]  merge_sched_in+0x261/0x3e0
[  243.150889]  ? trace_hardirqs_on+0x1a/0xd0
[  243.151869]  visit_groups_merge.constprop.135+0x130/0x4a0
[  243.153122]  ? sched_clock_cpu+0xc/0xb0
[  243.154023]  ctx_sched_in+0x101/0x210
[  243.154884]  ctx_resched+0x6f/0xc0
[  243.155686]  perf_event_exec+0x21e/0x2e0
[  243.156641]  begin_new_exec+0x5e5/0xbd0
[  243.157540]  load_elf_binary+0x6af/0x1770
[  243.158478]  ? __kernel_read+0x19d/0x2b0
[  243.159977]  ? lock_acquire+0x158/0x350
[  243.160876]  ? __kernel_read+0x19d/0x2b0
[  243.161796]  bprm_execve+0x3c8/0x840
[  243.162638]  do_execveat_common.isra.38+0x1a5/0x1c0
[  243.163776]  __x64_sys_execve+0x32/0x40
[  243.164676]  do_syscall_64+0x33/0x40
[  243.165514]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  243.166746] RIP: 0033:0x7f6180a26feb
[  243.167590] Code: Unable to access opcode bytes at RIP 0x7f6180a26fc1.
[  243.169097] RSP: 002b:00007ffc6558ce18 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
[  243.170844] RAX: ffffffffffffffda RBX: 00007ffc65592d30 RCX: 00007f6180a26feb
[  243.172514] RDX: 000055657f408dc0 RSI: 00007ffc65592410 RDI: 00007ffc65592d30
[  243.174162] RBP: 00007ffc6558ce80 R08: 00007ffc6558cde0 R09: 0000000000000000
[  243.176042] R10: 0000000000000008 R11: 0000000000000202 R12: 00007ffc65592410
[  243.177696] R13: 000055657f408dc0 R14: 0000000000000001 R15: 00007ffc65592410

One solution is to use GFP_ATOMIC, but that would make the code less
reliable under memory pressure. Let's move the memory allocation out of
the non-sleepable region and put it into x86_reserve_hardware(). The LBR
xsave buffer is a per-CPU buffer, not a per-event buffer. This buffer is
allocated once, when the first event is initialized.

The disadvantage of this fix is that the cpuc->lbr_xsave memory
will be allocated for each CPU, like the legacy ds_buffer.

Fixes: c085fb8774 ("perf/x86/intel/lbr: Support XSAVES for arch LBR read")
Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/events/core.c       |  8 +++++---
 arch/x86/events/intel/bts.c  |  2 +-
 arch/x86/events/intel/lbr.c  | 23 +++++++++++++++++------
 arch/x86/events/perf_event.h |  8 +++++++-
 4 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 3fe66b7aa721..ac8677ad8399 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -389,7 +389,7 @@ set_ext_hw_attr(struct hw_perf_event *hwc, struct perf_event *event)
 	return x86_pmu_extra_regs(val, event);
 }
 
-int x86_reserve_hardware(void)
+int x86_reserve_hardware(struct perf_event *event)
 {
 	int err = 0;
 
@@ -398,8 +398,10 @@ int x86_reserve_hardware(void)
 		if (atomic_read(&pmc_refcount) == 0) {
 			if (!reserve_pmc_hardware())
 				err = -EBUSY;
-			else
+			else {
 				reserve_ds_buffers();
+				reserve_lbr_buffers(event);
+			}
 		}
 		if (!err)
 			atomic_inc(&pmc_refcount);
@@ -650,7 +652,7 @@ static int __x86_pmu_event_init(struct perf_event *event)
 	if (!x86_pmu_initialized())
 		return -ENODEV;
 
-	err = x86_reserve_hardware();
+	err = x86_reserve_hardware(event);
 	if (err)
 		return err;
 
diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c
index 731dd8d0dbb1..057bb2f761a9 100644
--- a/arch/x86/events/intel/bts.c
+++ b/arch/x86/events/intel/bts.c
@@ -564,7 +564,7 @@ static int bts_event_init(struct perf_event *event)
 	if (x86_add_exclusive(x86_lbr_exclusive_bts))
 		return -EBUSY;
 
-	ret = x86_reserve_hardware();
+	ret = x86_reserve_hardware(event);
 	if (ret) {
 		x86_del_exclusive(x86_lbr_exclusive_bts);
 		return ret;
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index bb4486c4155a..49c014ee6080 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -658,7 +658,6 @@ static inline bool branch_user_callstack(unsigned br_sel)
 
 void intel_pmu_lbr_add(struct perf_event *event)
 {
-	struct kmem_cache *kmem_cache = event->pmu->task_ctx_cache;
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 
 	if (!x86_pmu.lbr_nr)
@@ -696,11 +695,6 @@ void intel_pmu_lbr_add(struct perf_event *event)
 	perf_sched_cb_inc(event->ctx->pmu);
 	if (!cpuc->lbr_users++ && !event->total_time_running)
 		intel_pmu_lbr_reset();
-
-	if (static_cpu_has(X86_FEATURE_ARCH_LBR) &&
-	    kmem_cache && !cpuc->lbr_xsave &&
-	    (cpuc->lbr_users != cpuc->lbr_pebs_users))
-		cpuc->lbr_xsave = kmem_cache_alloc(kmem_cache, GFP_KERNEL);
 }
 
 void release_lbr_buffers(void)
@@ -722,6 +716,23 @@ void release_lbr_buffers(void)
 	}
 }
 
+void reserve_lbr_buffers(struct perf_event *event)
+{
+	struct kmem_cache *kmem_cache = x86_get_pmu()->task_ctx_cache;
+	struct cpu_hw_events *cpuc;
+	int cpu;
+
+	if (!static_cpu_has(X86_FEATURE_ARCH_LBR))
+		return;
+
+	for_each_possible_cpu(cpu) {
+		cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
+		if (kmem_cache && !cpuc->lbr_xsave && !event->attr.precise_ip)
+			cpuc->lbr_xsave = kmem_cache_alloc_node(kmem_cache, GFP_KERNEL,
+								cpu_to_node(cpu));
+	}
+}
+
 void intel_pmu_lbr_del(struct perf_event *event)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 27fa85e7d4fd..ca46625c6f33 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1067,7 +1067,7 @@ int x86_add_exclusive(unsigned int what);
 
 void x86_del_exclusive(unsigned int what);
 
-int x86_reserve_hardware(void);
+int x86_reserve_hardware(struct perf_event *event);
 
 void x86_release_hardware(void);
 
@@ -1244,6 +1244,8 @@ void reserve_ds_buffers(void);
 
 void release_lbr_buffers(void);
 
+void reserve_lbr_buffers(struct perf_event *event);
+
 extern struct event_constraint bts_constraint;
 extern struct event_constraint vlbr_constraint;
 
@@ -1393,6 +1395,10 @@ static inline void release_lbr_buffers(void)
 {
 }
 
+static inline void reserve_lbr_buffers(struct perf_event *event)
+{
+}
+
 static inline int intel_pmu_init(void)
 {
 	return 0;
-- 
2.30.2
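
The rejected alternative mentioned in the changelog, shown here only for
illustration, would keep the allocation in intel_pmu_lbr_add() but make
it atomic:

	/*
	 * Hypothetical one-line alternative: GFP_ATOMIC silences the
	 * might_sleep splat, but atomic allocations cannot reclaim and
	 * thus fail more easily under memory pressure, silently losing
	 * the LBR xsave buffer.
	 */
	cpuc->lbr_xsave = kmem_cache_alloc(kmem_cache, GFP_ATOMIC);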



* Re: [PATCH RESEND 2/2] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region
  2021-04-21  2:18 ` [PATCH RESEND 2/2] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region Like Xu
@ 2021-04-21  8:38   ` Peter Zijlstra
  2021-04-21  8:48     ` Like Xu
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2021-04-21  8:38 UTC (permalink / raw)
  To: Like Xu
  Cc: Kan Liang, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Thomas Gleixner,
	Borislav Petkov, x86, linux-kernel

On Wed, Apr 21, 2021 at 10:18:25AM +0800, Like Xu wrote:
> -int x86_reserve_hardware(void)
> +int x86_reserve_hardware(struct perf_event *event)
>  {
>  	int err = 0;
>  
> @@ -398,8 +398,10 @@ int x86_reserve_hardware(void)
>  		if (atomic_read(&pmc_refcount) == 0) {
>  			if (!reserve_pmc_hardware())
>  				err = -EBUSY;
> -			else
> +			else {
>  				reserve_ds_buffers();
> +				reserve_lbr_buffers(event);
> +			}
>  		}
>  		if (!err)
>  			atomic_inc(&pmc_refcount);
> @@ -650,7 +652,7 @@ static int __x86_pmu_event_init(struct perf_event *event)
>  	if (!x86_pmu_initialized())
>  		return -ENODEV;
>  
> -	err = x86_reserve_hardware();
> +	err = x86_reserve_hardware(event);
>  	if (err)
>  		return err;
>  

This is still complete garbage..


* Re: [PATCH RESEND 2/2] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region
  2021-04-21  8:38   ` Peter Zijlstra
@ 2021-04-21  8:48     ` Like Xu
  2021-04-21  9:21       ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Like Xu @ 2021-04-21  8:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kan Liang, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Thomas Gleixner,
	Borislav Petkov, x86, linux-kernel

Hi Peter,

On 2021/4/21 16:38, Peter Zijlstra wrote:
> On Wed, Apr 21, 2021 at 10:18:25AM +0800, Like Xu wrote:
>> -int x86_reserve_hardware(void)
>> +int x86_reserve_hardware(struct perf_event *event)
>>   {
>>   	int err = 0;
>>   
>> @@ -398,8 +398,10 @@ int x86_reserve_hardware(void)
>>   		if (atomic_read(&pmc_refcount) == 0) {
>>   			if (!reserve_pmc_hardware())
>>   				err = -EBUSY;
>> -			else
>> +			else {
>>   				reserve_ds_buffers();
>> +				reserve_lbr_buffers(event);
>> +			}
>>   		}
>>   		if (!err)
>>   			atomic_inc(&pmc_refcount);
>> @@ -650,7 +652,7 @@ static int __x86_pmu_event_init(struct perf_event *event)
>>   	if (!x86_pmu_initialized())
>>   		return -ENODEV;
>>   
>> -	err = x86_reserve_hardware();
>> +	err = x86_reserve_hardware(event);
>>   	if (err)
>>   		return err;
>>   
> 
> This is still complete garbage..

Hhh, thanks for your comment!

So do we have a better idea for allocating cpuc->lbr_xsave
to avoid this kind of call trace?

> 



* Re: [PATCH RESEND 2/2] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region
  2021-04-21  8:48     ` Like Xu
@ 2021-04-21  9:21       ` Peter Zijlstra
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2021-04-21  9:21 UTC (permalink / raw)
  To: Like Xu
  Cc: Kan Liang, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Thomas Gleixner,
	Borislav Petkov, x86, linux-kernel

On Wed, Apr 21, 2021 at 04:48:36PM +0800, Like Xu wrote:
> Hi Peter,
> 
> On 2021/4/21 16:38, Peter Zijlstra wrote:
> > On Wed, Apr 21, 2021 at 10:18:25AM +0800, Like Xu wrote:
> > > -int x86_reserve_hardware(void)
> > > +int x86_reserve_hardware(struct perf_event *event)
> > >   {
> > >   	int err = 0;
> > > @@ -398,8 +398,10 @@ int x86_reserve_hardware(void)
> > >   		if (atomic_read(&pmc_refcount) == 0) {
> > >   			if (!reserve_pmc_hardware())
> > >   				err = -EBUSY;
> > > -			else
> > > +			else {
> > >   				reserve_ds_buffers();
> > > +				reserve_lbr_buffers(event);
> > > +			}
> > >   		}
> > >   		if (!err)
> > >   			atomic_inc(&pmc_refcount);
> > > @@ -650,7 +652,7 @@ static int __x86_pmu_event_init(struct perf_event *event)
> > >   	if (!x86_pmu_initialized())
> > >   		return -ENODEV;
> > > -	err = x86_reserve_hardware();
> > > +	err = x86_reserve_hardware(event);
> > >   	if (err)
> > >   		return err;
> > 
> > This is still complete garbage..
> 
> Hhh, thanks for your comment!

The nice one was here:

  https://lkml.kernel.org/r/20210323214140.GE4746@worktop.programming.kicks-ass.net

> So do we have a better idea for allocating cpuc->lbr_xsave
> to avoid this kind of call trace?

You thinking this is actually hard scares me. Frob something in
intel_pmu_hw_config() or thereabouts.
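
As a hypothetical sketch of the direction Peter points at (helper name
invented for illustration; not necessarily the fix that was eventually
merged), the GFP_KERNEL allocation could be done from a sleepable path
such as event init / hw_config rather than from the atomic pmu::add()
callback:

	/* Hypothetical helper, callable only from sleepable context: */
	static int intel_pmu_lbr_alloc_xsave(struct cpu_hw_events *cpuc, int cpu)
	{
		struct kmem_cache *kmem_cache = x86_get_pmu()->task_ctx_cache;

		if (!kmem_cache || cpuc->lbr_xsave)
			return 0;

		cpuc->lbr_xsave = kmem_cache_alloc_node(kmem_cache, GFP_KERNEL,
							cpu_to_node(cpu));
		return cpuc->lbr_xsave ? 0 : -ENOMEM;
	}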


* Re: [PATCH RESEND 1/2] perf/x86: Skip checking MSR for MSR 0x0
  2021-04-21  2:18 [PATCH RESEND 1/2] perf/x86: Skip checking MSR for MSR 0x0 Like Xu
  2021-04-21  2:18 ` [PATCH RESEND 2/2] perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region Like Xu
@ 2021-04-21 15:30 ` Sean Christopherson
  2021-04-22  1:18   ` Like Xu
  1 sibling, 1 reply; 8+ messages in thread
From: Sean Christopherson @ 2021-04-21 15:30 UTC (permalink / raw)
  To: Like Xu
  Cc: Peter Zijlstra, Kan Liang, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Thomas Gleixner, Borislav Petkov, x86, linux-kernel

On Wed, Apr 21, 2021, Like Xu wrote:
> The Architectural LBR does not have MSR_LBR_TOS (0x000001c9).
> Since lbr_tos is never set when ARCH_LBR is used, check_msr()
> runs against MSR 0x0 and fails, which forces x86_pmu.lbr_nr = 0,
> thereby preventing the initialization of the guest LBR.
> 
> Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
> Signed-off-by: Like Xu <like.xu@linux.intel.com>
> Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
> ---
>  arch/x86/events/intel/core.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 5272f349dca2..5036496caa60 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -4751,10 +4751,10 @@ static bool check_msr(unsigned long msr, u64 mask)
>  	u64 val_old, val_new, val_tmp;
>  
>  	/*
> -	 * Disable the check for real HW, so we don't
> +	 * Disable the check for real HW or non-sense msr, so we don't

I think this should be "undefined MSR" or something along those lines.  MSR 0x0
is a "real" MSR: on Intel CPUs it's an alias for IA32_MC0_ADDR, or at least it's
supposed to be; most/all Intel CPUs incorrectly alias it to IA32_MC0_CTL.

Anyways, my point is that if your definition of "nonsense" is any MSR that is
not a valid perf MSR, then this check is woefully incomplete.  If your
definition is a nonsensical value, then this comment is simply wrong.

What you're really looking for is precisely the case where the MSR was
zero-initialized and never defined.

>  	 * mess with potentionaly enabled registers:
>  	 */
> -	if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
> +	if (!boot_cpu_has(X86_FEATURE_HYPERVISOR) || !msr)
>  		return true;
>  
>  	/*
> -- 
> 2.30.2
> 


* Re: [PATCH RESEND 1/2] perf/x86: Skip checking MSR for MSR 0x0
  2021-04-21 15:30 ` [PATCH RESEND 1/2] perf/x86: Skip checking MSR for MSR 0x0 Sean Christopherson
@ 2021-04-22  1:18   ` Like Xu
  2021-04-22  1:38     ` Sean Christopherson
  0 siblings, 1 reply; 8+ messages in thread
From: Like Xu @ 2021-04-22  1:18 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Zijlstra, Kan Liang, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Thomas Gleixner, Borislav Petkov, x86, linux-kernel

On 2021/4/21 23:30, Sean Christopherson wrote:
> On Wed, Apr 21, 2021, Like Xu wrote:
>> The Architectural LBR does not have MSR_LBR_TOS (0x000001c9).
>> Since lbr_tos is never set when ARCH_LBR is used, check_msr()
>> runs against MSR 0x0 and fails, which forces x86_pmu.lbr_nr = 0,
>> thereby preventing the initialization of the guest LBR.
>>
>> Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>> Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
>> ---
>>   arch/x86/events/intel/core.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index 5272f349dca2..5036496caa60 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -4751,10 +4751,10 @@ static bool check_msr(unsigned long msr, u64 mask)
>>   	u64 val_old, val_new, val_tmp;
>>   
>>   	/*
>> -	 * Disable the check for real HW, so we don't
>> +	 * Disable the check for real HW or non-sense msr, so we don't
> 
> I think this should be "undefined MSR" or something along those lines.  MSR 0x0
> is a "real" MSR: on Intel CPUs it's an alias for IA32_MC0_ADDR, or at least it's
> supposed to be; most/all Intel CPUs incorrectly alias it to IA32_MC0_CTL.

Thank you, Sean.

<idle>-0       [000] dN.. 38980.032347: read_msr: 0, value fff

Do we have a historical story or specification for this kind of alias?

#define MSR_IA32_MC0_ADDR               0x00000402
#define MSR_IA32_MC0_CTL                0x00000400

> 
> Anyways, my point is that if your definition of "nonsense" is any MSR that is
> not a valid perf MSR, then this check is woefully incomplete.  If your
> definition is a nonsensical value, then this comment is simply wrong.
> 
> What you're really looking for is precisely the case where the MSR was
> zero-initialized and never defined.
> 
>>   	 * mess with potentionaly enabled registers:
>>   	 */
>> -	if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
>> +	if (!boot_cpu_has(X86_FEATURE_HYPERVISOR) || !msr)
>>   		return true;
>>   
>>   	/*
>> -- 
>> 2.30.2
>>



* Re: [PATCH RESEND 1/2] perf/x86: Skip checking MSR for MSR 0x0
  2021-04-22  1:18   ` Like Xu
@ 2021-04-22  1:38     ` Sean Christopherson
  0 siblings, 0 replies; 8+ messages in thread
From: Sean Christopherson @ 2021-04-22  1:38 UTC (permalink / raw)
  To: Like Xu
  Cc: Peter Zijlstra, Kan Liang, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Thomas Gleixner, Borislav Petkov, x86, linux-kernel

On Thu, Apr 22, 2021, Like Xu wrote:
> On 2021/4/21 23:30, Sean Christopherson wrote:
> > On Wed, Apr 21, 2021, Like Xu wrote:
> > > The Architectural LBR does not have MSR_LBR_TOS (0x000001c9).
> > > Since lbr_tos is never set when ARCH_LBR is used, check_msr()
> > > runs against MSR 0x0 and fails, which forces x86_pmu.lbr_nr = 0,
> > > thereby preventing the initialization of the guest LBR.
> > > 
> > > Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
> > > Signed-off-by: Like Xu <like.xu@linux.intel.com>
> > > Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
> > > ---
> > >   arch/x86/events/intel/core.c | 4 ++--
> > >   1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> > > index 5272f349dca2..5036496caa60 100644
> > > --- a/arch/x86/events/intel/core.c
> > > +++ b/arch/x86/events/intel/core.c
> > > @@ -4751,10 +4751,10 @@ static bool check_msr(unsigned long msr, u64 mask)
> > >   	u64 val_old, val_new, val_tmp;
> > >   	/*
> > > -	 * Disable the check for real HW, so we don't
> > > +	 * Disable the check for real HW or non-sense msr, so we don't
> > 
> > I think this should be "undefined MSR" or something along those lines.  MSR 0x0
> > is a "real" MSR: on Intel CPUs it's an alias for IA32_MC0_ADDR, or at least it's
> > supposed to be; most/all Intel CPUs incorrectly alias it to IA32_MC0_CTL.
> 
> Thank you, Sean.
> 
> <idle>-0       [000] dN.. 38980.032347: read_msr: 0, value fff
> 
> Do we have a historical story or specification for this kind of alias?

It's kinda documented in the SDM under "2.1 ARCHITECTURAL MSRS"

  0H 0 IA32_P5_MC_ADDR (P5_MC_ADDR)  Pentium Processor (05_01H)
  1H 1 IA32_P5_MC_TYPE (P5_MC_TYPE)  DF_DM = 05_01H

The history is that very early machine check support only had a single "bank",
with MSR 0x0 holding the address and MSR 0x1 holding the type.  When the MSRs were
relocated to the 0x400 range, presumably to have room to grow the list, the MSRs
were aliased to maintain backwards compatibility (again, an assumption).

Unfortunately, that backwards compatibility apparently didn't get tested, and MSR
0x0 ended up aliased to 0x400 instead of 0x402.

The only reason I'm aware of all this is because SGX is soft disabled by ucode
if any of the machine check banks are disabled by writing MCn_CTL.  Some folks
found out the hard way that doing WRMSR with an uninitialized index, i.e.
WRMSR(0), would disable SGX.

If you want a good giggle, you can verify on pretty much any Intel silicon:

  $ rdmsr 0x400
  ff
  $ wrmsr 0x0 0
  $ rdmsr 0x400
  0

> #define MSR_IA32_MC0_ADDR               0x00000402
> #define MSR_IA32_MC0_CTL                0x00000400


