All of lore.kernel.org
 help / color / mirror / Atom feed
* [perf] fuzzer triggers unchecked MSR access error: WRMSR to 0x318
@ 2021-07-28 16:49 Vince Weaver
  2021-07-29  9:14 ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Vince Weaver @ 2021-07-28 16:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim

Hello

ther perf_fuzzer on current linux-git on a Haswell system triggers the 
following.

I've truncated the call chain, as it goes on for quite a while, let me 
know if you want/need more information.

Vince

[32694.087403] unchecked MSR access error: WRMSR to 0x318 (tried to write 0x0000000000000000) at rIP: 0xffffffff8106f854 (native_write_msr+0x4/0x20)
[32694.101374] Call Trace:
[32694.103974]  perf_clear_dirty_counters+0x86/0x100
[32694.109027]  switch_mm_irqs_off+0x1d0/0x430
[32694.113498]  __schedule+0x29f/0x1490
[32694.117300]  ? cr4_update_irqsoff+0x2a/0x30
[32694.121762]  ? switch_mm_irqs_off+0x1ba/0x430
[32694.126418]  ? rcu_eqs_exit.constprop.0+0x2e/0x60
[32694.131515]  ? cpuidle_enter_state+0xb7/0x350
[32694.136152]  schedule_idle+0x26/0x40
[32694.139974]  do_idle+0x16e/0x280
[32694.143421]  cpu_startup_entry+0x19/0x20
[32694.147643]  secondary_startup_64_no_verify+0xb0/0xbb
[32694.973583] Call Trace:
[32694.976215]  perf_clear_dirty_counters+0x86/0x100
[32694.981290]  switch_mm_irqs_off+0x1d0/0x430
[32694.985797]  __schedule+0x29f/0x1490
[32694.989664]  ? cr4_update_irqsoff+0x2a/0x30
[32694.994134]  ? switch_mm_irqs_off+0x1ba/0x430
[32694.998789]  ? rcu_eqs_exit.constprop.0+0x2e/0x60
[32695.003831]  ? cpuidle_enter_state+0xb7/0x350
[32695.008521]  schedule_idle+0x26/0x40
[32695.012362]  do_idle+0x16e/0x280
[32695.015838]  cpu_startup_entry+0x19/0x20
[32695.020041]  secondary_startup_64_no_verify+0xb0/0xbb
[32695.126530] Call Trace:
[32695.129184]  perf_clear_dirty_counters+0x86/0x100
[32695.134268]  switch_mm_irqs_off+0x1d0/0x430
[32695.138757]  __schedule+0x29f/0x1490
[32695.142577]  ? tick_nohz_get_sleep_length+0x6b/0xa0
[32695.147806]  ? rcu_eqs_exit.constprop.0+0x2e/0x60
[32695.152806]  ? cpuidle_enter_state+0xb7/0x350
[32695.157500]  schedule_idle+0x26/0x40
[32695.161342]  do_idle+0x16e/0x280
[32695.164798]  cpu_startup_entry+0x19/0x20
[32695.169010]  secondary_startup_64_no_verify+0xb0/0xbb


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [perf] fuzzer triggers unchecked MSR access error: WRMSR to 0x318
  2021-07-28 16:49 [perf] fuzzer triggers unchecked MSR access error: WRMSR to 0x318 Vince Weaver
@ 2021-07-29  9:14 ` Peter Zijlstra
  2021-07-29 13:21   ` Liang, Kan
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Peter Zijlstra @ 2021-07-29  9:14 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Liang,
	Kan

On Wed, Jul 28, 2021 at 12:49:43PM -0400, Vince Weaver wrote:
> [32694.087403] unchecked MSR access error: WRMSR to 0x318 (tried to write 0x0000000000000000) at rIP: 0xffffffff8106f854 (native_write_msr+0x4/0x20)
> [32694.101374] Call Trace:
> [32694.103974]  perf_clear_dirty_counters+0x86/0x100

Hmm.. if I read this right that's MSR_ARCH_PERFMON_FIXED_CTR0 + i, given
that FIXED_CTR0 is 0x309 that gives i == 15, which is FIXED_BTS.

I'm thinking something like this ought to cure things.

---
 arch/x86/events/core.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 1eb45139fcc6..04edf8017961 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2489,13 +2489,15 @@ void perf_clear_dirty_counters(void)
 		return;
 
 	for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
-		/* Metrics and fake events don't have corresponding HW counters. */
-		if (is_metric_idx(i) || (i == INTEL_PMC_IDX_FIXED_VLBR))
-			continue;
-		else if (i >= INTEL_PMC_IDX_FIXED)
+		if (i >= INTEL_PMC_IDX_FIXED) {
+			/* Metrics and fake events don't have corresponding HW counters. */
+			if ((i - INTEL_PMC_IDX_FIXED) >= x86_pmu.num_counters_fixed)
+				continue;
+
 			wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + (i - INTEL_PMC_IDX_FIXED), 0);
-		else
+		} else {
 			wrmsrl(x86_pmu_event_addr(i), 0);
+		}
 	}
 
 	bitmap_zero(cpuc->dirty, X86_PMC_IDX_MAX);

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [perf] fuzzer triggers unchecked MSR access error: WRMSR to 0x318
  2021-07-29  9:14 ` Peter Zijlstra
@ 2021-07-29 13:21   ` Liang, Kan
  2021-07-29 15:30     ` Peter Zijlstra
  2021-07-29 16:54   ` Vince Weaver
  2021-08-05  9:34   ` [tip: perf/urgent] perf/x86: Fix out of bound MSR access tip-bot2 for Peter Zijlstra
  2 siblings, 1 reply; 7+ messages in thread
From: Liang, Kan @ 2021-07-29 13:21 UTC (permalink / raw)
  To: Peter Zijlstra, Vince Weaver
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Liang,
	Kan



On 7/29/2021 5:14 AM, Peter Zijlstra wrote:
> On Wed, Jul 28, 2021 at 12:49:43PM -0400, Vince Weaver wrote:
>> [32694.087403] unchecked MSR access error: WRMSR to 0x318 (tried to write 0x0000000000000000) at rIP: 0xffffffff8106f854 (native_write_msr+0x4/0x20)
>> [32694.101374] Call Trace:
>> [32694.103974]  perf_clear_dirty_counters+0x86/0x100
> 
> Hmm.. if I read this right that's MSR_ARCH_PERFMON_FIXED_CTR0 + i, given
> that FIXED_CTR0 is 0x309 that gives i == 15, which is FIXED_BTS.
> 
> I'm thinking something like this ought to cure things.
> 
> ---
>   arch/x86/events/core.c | 12 +++++++-----
>   1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 1eb45139fcc6..04edf8017961 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -2489,13 +2489,15 @@ void perf_clear_dirty_counters(void)
>   		return;
>   
>   	for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
> -		/* Metrics and fake events don't have corresponding HW counters. */
> -		if (is_metric_idx(i) || (i == INTEL_PMC_IDX_FIXED_VLBR))
> -			continue;
> -		else if (i >= INTEL_PMC_IDX_FIXED)
> +		if (i >= INTEL_PMC_IDX_FIXED) {
> +			/* Metrics and fake events don't have corresponding HW counters. */
> +			if ((i - INTEL_PMC_IDX_FIXED) >= x86_pmu.num_counters_fixed)

Yes, the fix is better. My previous implementation tries to pick up all 
the special cases. It's very likely to miss some special cases like 
FIXED_BTS and probably any new fake events added later if there are.
Thanks for the fix.

The x86_pmu.num_counters_fixed should work well on HSW. But we have 
hybrid machines now. I think we can use
hybrid(cpuc->pmu, num_counters_fixed) instead, which should be more 
accurate.


Thanks,
Kan

> +				continue;
> +
>   			wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + (i - INTEL_PMC_IDX_FIXED), 0);
> -		else
> +		} else {
>   			wrmsrl(x86_pmu_event_addr(i), 0);
> +		}
>   	}
>   
>   	bitmap_zero(cpuc->dirty, X86_PMC_IDX_MAX);
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [perf] fuzzer triggers unchecked MSR access error: WRMSR to 0x318
  2021-07-29 13:21   ` Liang, Kan
@ 2021-07-29 15:30     ` Peter Zijlstra
  2021-08-02 11:40       ` Like Xu
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2021-07-29 15:30 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Vince Weaver, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Liang, Kan

On Thu, Jul 29, 2021 at 09:21:01AM -0400, Liang, Kan wrote:
> 
> 
> On 7/29/2021 5:14 AM, Peter Zijlstra wrote:
> > On Wed, Jul 28, 2021 at 12:49:43PM -0400, Vince Weaver wrote:
> > > [32694.087403] unchecked MSR access error: WRMSR to 0x318 (tried to write 0x0000000000000000) at rIP: 0xffffffff8106f854 (native_write_msr+0x4/0x20)
> > > [32694.101374] Call Trace:
> > > [32694.103974]  perf_clear_dirty_counters+0x86/0x100
> > 
> > Hmm.. if I read this right that's MSR_ARCH_PERFMON_FIXED_CTR0 + i, given
> > that FIXED_CTR0 is 0x309 that gives i == 15, which is FIXED_BTS.
> > 
> > I'm thinking something like this ought to cure things.
> > 
> > ---
> >   arch/x86/events/core.c | 12 +++++++-----
> >   1 file changed, 7 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> > index 1eb45139fcc6..04edf8017961 100644
> > --- a/arch/x86/events/core.c
> > +++ b/arch/x86/events/core.c
> > @@ -2489,13 +2489,15 @@ void perf_clear_dirty_counters(void)
> >   		return;
> >   	for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
> > -		/* Metrics and fake events don't have corresponding HW counters. */
> > -		if (is_metric_idx(i) || (i == INTEL_PMC_IDX_FIXED_VLBR))
> > -			continue;
> > -		else if (i >= INTEL_PMC_IDX_FIXED)
> > +		if (i >= INTEL_PMC_IDX_FIXED) {
> > +			/* Metrics and fake events don't have corresponding HW counters. */
> > +			if ((i - INTEL_PMC_IDX_FIXED) >= x86_pmu.num_counters_fixed)
> 
> Yes, the fix is better. My previous implementation tries to pick up all the
> special cases. It's very likely to miss some special cases like FIXED_BTS
> and probably any new fake events added later if there are.
> Thanks for the fix.
> 
> The x86_pmu.num_counters_fixed should work well on HSW. But we have hybrid
> machines now. I think we can use
> hybrid(cpuc->pmu, num_counters_fixed) instead, which should be more
> accurate.

Yes, good point. I guess I still need to adjust to the hybrid world.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [perf] fuzzer triggers unchecked MSR access error: WRMSR to 0x318
  2021-07-29  9:14 ` Peter Zijlstra
  2021-07-29 13:21   ` Liang, Kan
@ 2021-07-29 16:54   ` Vince Weaver
  2021-08-05  9:34   ` [tip: perf/urgent] perf/x86: Fix out of bound MSR access tip-bot2 for Peter Zijlstra
  2 siblings, 0 replies; 7+ messages in thread
From: Vince Weaver @ 2021-07-29 16:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Liang, Kan

On Thu, 29 Jul 2021, Peter Zijlstra wrote:

> On Wed, Jul 28, 2021 at 12:49:43PM -0400, Vince Weaver wrote:
> > [32694.087403] unchecked MSR access error: WRMSR to 0x318 (tried to write 0x0000000000000000) at rIP: 0xffffffff8106f854 (native_write_msr+0x4/0x20)
> > [32694.101374] Call Trace:
> > [32694.103974]  perf_clear_dirty_counters+0x86/0x100
> 
> Hmm.. if I read this right that's MSR_ARCH_PERFMON_FIXED_CTR0 + i, given
> that FIXED_CTR0 is 0x309 that gives i == 15, which is FIXED_BTS.
> 
> I'm thinking something like this ought to cure things.

I know it sounds like the complete fix is a bit different from this, but I 
did want to report that the patch does fix the issue on my machine.

Vince Weaver
vincent.weaver@maine.edu

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [perf] fuzzer triggers unchecked MSR access error: WRMSR to 0x318
  2021-07-29 15:30     ` Peter Zijlstra
@ 2021-08-02 11:40       ` Like Xu
  0 siblings, 0 replies; 7+ messages in thread
From: Like Xu @ 2021-08-02 11:40 UTC (permalink / raw)
  To: Peter Zijlstra, Liang, Kan
  Cc: Vince Weaver, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Liang, Kan

On 29/7/2021 11:30 pm, Peter Zijlstra wrote:
> On Thu, Jul 29, 2021 at 09:21:01AM -0400, Liang, Kan wrote:
>>
>>
>> On 7/29/2021 5:14 AM, Peter Zijlstra wrote:
>>> On Wed, Jul 28, 2021 at 12:49:43PM -0400, Vince Weaver wrote:
>>>> [32694.087403] unchecked MSR access error: WRMSR to 0x318 (tried to write 0x0000000000000000) at rIP: 0xffffffff8106f854 (native_write_msr+0x4/0x20)
>>>> [32694.101374] Call Trace:
>>>> [32694.103974]  perf_clear_dirty_counters+0x86/0x100
>>>
>>> Hmm.. if I read this right that's MSR_ARCH_PERFMON_FIXED_CTR0 + i, given
>>> that FIXED_CTR0 is 0x309 that gives i == 15, which is FIXED_BTS.
>>>
>>> I'm thinking something like this ought to cure things.
>>>
>>> ---
>>>    arch/x86/events/core.c | 12 +++++++-----
>>>    1 file changed, 7 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>>> index 1eb45139fcc6..04edf8017961 100644
>>> --- a/arch/x86/events/core.c
>>> +++ b/arch/x86/events/core.c
>>> @@ -2489,13 +2489,15 @@ void perf_clear_dirty_counters(void)
>>>    		return;
>>>    	for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
>>> -		/* Metrics and fake events don't have corresponding HW counters. */
>>> -		if (is_metric_idx(i) || (i == INTEL_PMC_IDX_FIXED_VLBR))
>>> -			continue;
>>> -		else if (i >= INTEL_PMC_IDX_FIXED)
>>> +		if (i >= INTEL_PMC_IDX_FIXED) {
>>> +			/* Metrics and fake events don't have corresponding HW counters. */
>>> +			if ((i - INTEL_PMC_IDX_FIXED) >= x86_pmu.num_counters_fixed)
>>
>> Yes, the fix is better. My previous implementation tries to pick up all the
>> special cases. It's very likely to miss some special cases like FIXED_BTS
>> and probably any new fake events added later if there are.
>> Thanks for the fix.
>>
>> The x86_pmu.num_counters_fixed should work well on HSW. But we have hybrid
>> machines now. I think we can use
>> hybrid(cpuc->pmu, num_counters_fixed) instead, which should be more
>> accurate.
> 
> Yes, good point. I guess I still need to adjust to the hybrid world.

I recently enabled guest BTS and encountered the same call trace.

For the proposal hybrid(cpuc->pmu, num_counters_fixed),
Tested-by: Like Xu <likexu@tencent.com>



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [tip: perf/urgent] perf/x86: Fix out of bound MSR access
  2021-07-29  9:14 ` Peter Zijlstra
  2021-07-29 13:21   ` Liang, Kan
  2021-07-29 16:54   ` Vince Weaver
@ 2021-08-05  9:34   ` tip-bot2 for Peter Zijlstra
  2 siblings, 0 replies; 7+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-08-05  9:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Vince Weaver, Peter Zijlstra (Intel), Like Xu, x86, linux-kernel

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     f4b4b45652578357031fbbef7f7a1b04f6fa2dc3
Gitweb:        https://git.kernel.org/tip/f4b4b45652578357031fbbef7f7a1b04f6fa2dc3
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Thu, 29 Jul 2021 11:14:57 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 04 Aug 2021 15:16:33 +02:00

perf/x86: Fix out of bound MSR access

On Wed, Jul 28, 2021 at 12:49:43PM -0400, Vince Weaver wrote:
> [32694.087403] unchecked MSR access error: WRMSR to 0x318 (tried to write 0x0000000000000000) at rIP: 0xffffffff8106f854 (native_write_msr+0x4/0x20)
> [32694.101374] Call Trace:
> [32694.103974]  perf_clear_dirty_counters+0x86/0x100

The problem being that it doesn't filter out all fake counters, in
specific the above (erroneously) tries to use FIXED_BTS. Limit the
fixed counters indexes to the hardware supplied number.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Like Xu <likexu@tencent.com>
Link: https://lkml.kernel.org/r/YQJxka3dxgdIdebG@hirez.programming.kicks-ass.net
---
 arch/x86/events/core.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 1eb4513..3092fbf 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2489,13 +2489,15 @@ void perf_clear_dirty_counters(void)
 		return;
 
 	for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
-		/* Metrics and fake events don't have corresponding HW counters. */
-		if (is_metric_idx(i) || (i == INTEL_PMC_IDX_FIXED_VLBR))
-			continue;
-		else if (i >= INTEL_PMC_IDX_FIXED)
+		if (i >= INTEL_PMC_IDX_FIXED) {
+			/* Metrics and fake events don't have corresponding HW counters. */
+			if ((i - INTEL_PMC_IDX_FIXED) >= hybrid(cpuc->pmu, num_counters_fixed))
+				continue;
+
 			wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + (i - INTEL_PMC_IDX_FIXED), 0);
-		else
+		} else {
 			wrmsrl(x86_pmu_event_addr(i), 0);
+		}
 	}
 
 	bitmap_zero(cpuc->dirty, X86_PMC_IDX_MAX);

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-08-05  9:34 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-28 16:49 [perf] fuzzer triggers unchecked MSR access error: WRMSR to 0x318 Vince Weaver
2021-07-29  9:14 ` Peter Zijlstra
2021-07-29 13:21   ` Liang, Kan
2021-07-29 15:30     ` Peter Zijlstra
2021-08-02 11:40       ` Like Xu
2021-07-29 16:54   ` Vince Weaver
2021-08-05  9:34   ` [tip: perf/urgent] perf/x86: Fix out of bound MSR access tip-bot2 for Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.