linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] perf/x86/intel/uncore: Fix oops when counting IMC uncore events on some TGL
@ 2020-05-27 12:30 kan.liang
  2020-05-27 12:59 ` David Laight
  0 siblings, 1 reply; 7+ messages in thread
From: kan.liang @ 2020-05-27 12:30 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel; +Cc: ak, Kan Liang, stable

From: Kan Liang <kan.liang@linux.intel.com>

When counting IMC uncore events on some TGL machines, an oops will be
triggered.
  [ 393.101262] BUG: unable to handle page fault for address:
  ffffb45200e15858
  [ 393.101269] #PF: supervisor read access in kernel mode
  [ 393.101271] #PF: error_code(0x0000) - not-present page

Current perf uncore driver still use the IMC MAP SIZE inherited from
SNB, which is 0x6000.
However, the offset of IMC uncore counters for some TGL machines is
larger than 0x6000, e.g. 0xd8a0.

Enlarge the IMC MAP SIZE for TGL to 0xe000.

Fixes: fdb64822443e ("perf/x86: Add Intel Tiger Lake uncore support")
Reported-by: Ammy Yi <ammy.yi@intel.com>
Tested-by: Ammy Yi <ammy.yi@intel.com>
Tested-by: Chao Qin <chao.qin@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: stable@vger.kernel.org
---
 arch/x86/events/intel/uncore_snb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/uncore_snb.c b/arch/x86/events/intel/uncore_snb.c
index 3de1065..1038e9f 100644
--- a/arch/x86/events/intel/uncore_snb.c
+++ b/arch/x86/events/intel/uncore_snb.c
@@ -1085,6 +1085,7 @@ static struct pci_dev *tgl_uncore_get_mc_dev(void)
 }
 
 #define TGL_UNCORE_MMIO_IMC_MEM_OFFSET		0x10000
+#define TGL_UNCORE_PCI_IMC_MAP_SIZE		0xe000
 
 static void tgl_uncore_imc_freerunning_init_box(struct intel_uncore_box *box)
 {
@@ -1112,7 +1113,7 @@ static void tgl_uncore_imc_freerunning_init_box(struct intel_uncore_box *box)
 	addr |= ((resource_size_t)mch_bar << 32);
 #endif
 
-	box->io_addr = ioremap(addr, SNB_UNCORE_PCI_IMC_MAP_SIZE);
+	box->io_addr = ioremap(addr, TGL_UNCORE_PCI_IMC_MAP_SIZE);
 }
 
 static struct intel_uncore_ops tgl_uncore_imc_freerunning_ops = {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* RE: [PATCH] perf/x86/intel/uncore: Fix oops when counting IMC uncore events on some TGL
  2020-05-27 12:30 [PATCH] perf/x86/intel/uncore: Fix oops when counting IMC uncore events on some TGL kan.liang
@ 2020-05-27 12:59 ` David Laight
  2020-05-27 14:46   ` Liang, Kan
  0 siblings, 1 reply; 7+ messages in thread
From: David Laight @ 2020-05-27 12:59 UTC (permalink / raw)
  To: 'kan.liang@linux.intel.com', peterz, mingo, linux-kernel
  Cc: ak, stable

From: kan.liang@linux.intel.com
> Sent: 27 May 2020 13:31
> 
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> When counting IMC uncore events on some TGL machines, an oops will be
> triggered.
>   [ 393.101262] BUG: unable to handle page fault for address:
>   ffffb45200e15858
>   [ 393.101269] #PF: supervisor read access in kernel mode
>   [ 393.101271] #PF: error_code(0x0000) - not-present page
> 
> Current perf uncore driver still use the IMC MAP SIZE inherited from
> SNB, which is 0x6000.
> However, the offset of IMC uncore counters for some TGL machines is
> larger than 0x6000, e.g. 0xd8a0.
> 
> Enlarge the IMC MAP SIZE for TGL to 0xe000.

Replacing one 'random' constant with a different one
doesn't seem like a proper fix.

Surely the actual bounds of the 'memory' area are properly
defined somewhere.
Or at least should come from a table.

You also need to verify that the offsets are within the mapped area.
An unexpected offset shouldn't try to access an invalid address.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] perf/x86/intel/uncore: Fix oops when counting IMC uncore events on some TGL
  2020-05-27 12:59 ` David Laight
@ 2020-05-27 14:46   ` Liang, Kan
  2020-05-27 14:51     ` David Laight
  0 siblings, 1 reply; 7+ messages in thread
From: Liang, Kan @ 2020-05-27 14:46 UTC (permalink / raw)
  To: David Laight, peterz, mingo, linux-kernel; +Cc: ak, stable



On 5/27/2020 8:59 AM, David Laight wrote:
> From: kan.liang@linux.intel.com
>> Sent: 27 May 2020 13:31
>>
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> When counting IMC uncore events on some TGL machines, an oops will be
>> triggered.
>>    [ 393.101262] BUG: unable to handle page fault for address:
>>    ffffb45200e15858
>>    [ 393.101269] #PF: supervisor read access in kernel mode
>>    [ 393.101271] #PF: error_code(0x0000) - not-present page
>>
>> Current perf uncore driver still use the IMC MAP SIZE inherited from
>> SNB, which is 0x6000.
>> However, the offset of IMC uncore counters for some TGL machines is
>> larger than 0x6000, e.g. 0xd8a0.
>>
>> Enlarge the IMC MAP SIZE for TGL to 0xe000.
> 
> Replacing one 'random' constant with a different one
> doesn't seem like a proper fix.
> 
> Surely the actual bounds of the 'memory' area are properly
> defined somewhere.
> Or at least should come from a table.
> 
> You also need to verify that the offsets are within the mapped area.
> An unexpected offset shouldn't try to access an invalid address.

Thanks for the review.

I agree that we should add a check before mapping the area to prevent 
the issue happens again.

I think the check should be a generic check for all platforms which try 
to map an area, not just for TGL. I will submit a separate patch for the 
check.

Thanks,
Kan

> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] perf/x86/intel/uncore: Fix oops when counting IMC uncore events on some TGL
  2020-05-27 14:46   ` Liang, Kan
@ 2020-05-27 14:51     ` David Laight
  2020-05-27 15:01       ` Liang, Kan
  0 siblings, 1 reply; 7+ messages in thread
From: David Laight @ 2020-05-27 14:51 UTC (permalink / raw)
  To: 'Liang, Kan', peterz, mingo, linux-kernel; +Cc: ak, stable

From: Liang, Kan
> Sent: 27 May 2020 15:47
> On 5/27/2020 8:59 AM, David Laight wrote:
> > From: kan.liang@linux.intel.com
> >> Sent: 27 May 2020 13:31
> >>
> >> From: Kan Liang <kan.liang@linux.intel.com>
> >>
> >> When counting IMC uncore events on some TGL machines, an oops will be
> >> triggered.
> >>    [ 393.101262] BUG: unable to handle page fault for address:
> >>    ffffb45200e15858
> >>    [ 393.101269] #PF: supervisor read access in kernel mode
> >>    [ 393.101271] #PF: error_code(0x0000) - not-present page
> >>
> >> Current perf uncore driver still use the IMC MAP SIZE inherited from
> >> SNB, which is 0x6000.
> >> However, the offset of IMC uncore counters for some TGL machines is
> >> larger than 0x6000, e.g. 0xd8a0.
> >>
> >> Enlarge the IMC MAP SIZE for TGL to 0xe000.
> >
> > Replacing one 'random' constant with a different one
> > doesn't seem like a proper fix.
> >
> > Surely the actual bounds of the 'memory' area are properly
> > defined somewhere.
> > Or at least should come from a table.
> >
> > You also need to verify that the offsets are within the mapped area.
> > An unexpected offset shouldn't try to access an invalid address.
> 
> Thanks for the review.
> 
> I agree that we should add a check before mapping the area to prevent
> the issue happens again.
> 
> I think the check should be a generic check for all platforms which try
> to map an area, not just for TGL. I will submit a separate patch for the
> check.

You need a check that the actual access is withing the mapped area.
So instead of getting an OOPS you get a error.

This is after you've mapped it.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] perf/x86/intel/uncore: Fix oops when counting IMC uncore events on some TGL
  2020-05-27 14:51     ` David Laight
@ 2020-05-27 15:01       ` Liang, Kan
  2020-05-27 15:17         ` David Laight
  0 siblings, 1 reply; 7+ messages in thread
From: Liang, Kan @ 2020-05-27 15:01 UTC (permalink / raw)
  To: David Laight, peterz, mingo, linux-kernel; +Cc: ak, stable



On 5/27/2020 10:51 AM, David Laight wrote:
> From: Liang, Kan
>> Sent: 27 May 2020 15:47
>> On 5/27/2020 8:59 AM, David Laight wrote:
>>> From: kan.liang@linux.intel.com
>>>> Sent: 27 May 2020 13:31
>>>>
>>>> From: Kan Liang <kan.liang@linux.intel.com>
>>>>
>>>> When counting IMC uncore events on some TGL machines, an oops will be
>>>> triggered.
>>>>     [ 393.101262] BUG: unable to handle page fault for address:
>>>>     ffffb45200e15858
>>>>     [ 393.101269] #PF: supervisor read access in kernel mode
>>>>     [ 393.101271] #PF: error_code(0x0000) - not-present page
>>>>
>>>> Current perf uncore driver still use the IMC MAP SIZE inherited from
>>>> SNB, which is 0x6000.
>>>> However, the offset of IMC uncore counters for some TGL machines is
>>>> larger than 0x6000, e.g. 0xd8a0.
>>>>
>>>> Enlarge the IMC MAP SIZE for TGL to 0xe000.
>>>
>>> Replacing one 'random' constant with a different one
>>> doesn't seem like a proper fix.
>>>
>>> Surely the actual bounds of the 'memory' area are properly
>>> defined somewhere.
>>> Or at least should come from a table.
>>>
>>> You also need to verify that the offsets are within the mapped area.
>>> An unexpected offset shouldn't try to access an invalid address.
>>
>> Thanks for the review.
>>
>> I agree that we should add a check before mapping the area to prevent
>> the issue happens again.
>>
>> I think the check should be a generic check for all platforms which try
>> to map an area, not just for TGL. I will submit a separate patch for the
>> check.
> 
> You need a check that the actual access is withing the mapped area.
> So instead of getting an OOPS you get a error.
> 
> This is after you've mapped it.

Sure. Will add a WARN_ONCE() before the actual access.

Thanks,
Kan

> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] perf/x86/intel/uncore: Fix oops when counting IMC uncore events on some TGL
  2020-05-27 15:01       ` Liang, Kan
@ 2020-05-27 15:17         ` David Laight
  2020-05-27 16:02           ` Liang, Kan
  0 siblings, 1 reply; 7+ messages in thread
From: David Laight @ 2020-05-27 15:17 UTC (permalink / raw)
  To: 'Liang, Kan', peterz, mingo, linux-kernel; +Cc: ak, stable

From: Liang, Kan
> Sent: 27 May 2020 16:01
> On 5/27/2020 10:51 AM, David Laight wrote:
> > From: Liang, Kan
> >> Sent: 27 May 2020 15:47
> >> On 5/27/2020 8:59 AM, David Laight wrote:
> >>> From: kan.liang@linux.intel.com
> >>>> Sent: 27 May 2020 13:31
> >>>>
> >>>> From: Kan Liang <kan.liang@linux.intel.com>
> >>>>
> >>>> When counting IMC uncore events on some TGL machines, an oops will be
> >>>> triggered.
> >>>>     [ 393.101262] BUG: unable to handle page fault for address:
> >>>>     ffffb45200e15858
> >>>>     [ 393.101269] #PF: supervisor read access in kernel mode
> >>>>     [ 393.101271] #PF: error_code(0x0000) - not-present page
> >>>>
> >>>> Current perf uncore driver still use the IMC MAP SIZE inherited from
> >>>> SNB, which is 0x6000.
> >>>> However, the offset of IMC uncore counters for some TGL machines is
> >>>> larger than 0x6000, e.g. 0xd8a0.
> >>>>
> >>>> Enlarge the IMC MAP SIZE for TGL to 0xe000.
> >>>
> >>> Replacing one 'random' constant with a different one
> >>> doesn't seem like a proper fix.
> >>>
> >>> Surely the actual bounds of the 'memory' area are properly
> >>> defined somewhere.
> >>> Or at least should come from a table.
> >>>
> >>> You also need to verify that the offsets are within the mapped area.
> >>> An unexpected offset shouldn't try to access an invalid address.
> >>
> >> Thanks for the review.
> >>
> >> I agree that we should add a check before mapping the area to prevent
> >> the issue happens again.
> >>
> >> I think the check should be a generic check for all platforms which try
> >> to map an area, not just for TGL. I will submit a separate patch for the
> >> check.
> >
> > You need a check that the actual access is withing the mapped area.
> > So instead of getting an OOPS you get a error.
> >
> > This is after you've mapped it.
> 
> Sure. Will add a WARN_ONCE() before the actual access.

No that will still panic some systems.
pr_warn() is all you need.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] perf/x86/intel/uncore: Fix oops when counting IMC uncore events on some TGL
  2020-05-27 15:17         ` David Laight
@ 2020-05-27 16:02           ` Liang, Kan
  0 siblings, 0 replies; 7+ messages in thread
From: Liang, Kan @ 2020-05-27 16:02 UTC (permalink / raw)
  To: David Laight, peterz, mingo, linux-kernel; +Cc: ak, stable



On 5/27/2020 11:17 AM, David Laight wrote:
> From: Liang, Kan
>> Sent: 27 May 2020 16:01
>> On 5/27/2020 10:51 AM, David Laight wrote:
>>> From: Liang, Kan
>>>> Sent: 27 May 2020 15:47
>>>> On 5/27/2020 8:59 AM, David Laight wrote:
>>>>> From: kan.liang@linux.intel.com
>>>>>> Sent: 27 May 2020 13:31
>>>>>>
>>>>>> From: Kan Liang <kan.liang@linux.intel.com>
>>>>>>
>>>>>> When counting IMC uncore events on some TGL machines, an oops will be
>>>>>> triggered.
>>>>>>      [ 393.101262] BUG: unable to handle page fault for address:
>>>>>>      ffffb45200e15858
>>>>>>      [ 393.101269] #PF: supervisor read access in kernel mode
>>>>>>      [ 393.101271] #PF: error_code(0x0000) - not-present page
>>>>>>
>>>>>> Current perf uncore driver still use the IMC MAP SIZE inherited from
>>>>>> SNB, which is 0x6000.
>>>>>> However, the offset of IMC uncore counters for some TGL machines is
>>>>>> larger than 0x6000, e.g. 0xd8a0.
>>>>>>
>>>>>> Enlarge the IMC MAP SIZE for TGL to 0xe000.
>>>>>
>>>>> Replacing one 'random' constant with a different one
>>>>> doesn't seem like a proper fix.
>>>>>
>>>>> Surely the actual bounds of the 'memory' area are properly
>>>>> defined somewhere.
>>>>> Or at least should come from a table.
>>>>>
>>>>> You also need to verify that the offsets are within the mapped area.
>>>>> An unexpected offset shouldn't try to access an invalid address.
>>>>
>>>> Thanks for the review.
>>>>
>>>> I agree that we should add a check before mapping the area to prevent
>>>> the issue happens again.
>>>>
>>>> I think the check should be a generic check for all platforms which try
>>>> to map an area, not just for TGL. I will submit a separate patch for the
>>>> check.
>>>
>>> You need a check that the actual access is withing the mapped area.
>>> So instead of getting an OOPS you get a error.
>>>
>>> This is after you've mapped it.
>>
>> Sure. Will add a WARN_ONCE() before the actual access.
> 
> No that will still panic some systems.
> pr_warn() is all you need.
>

If we print a warning for each access, there will be too many warnings.
I think I will use pr_warn_once instead.

Thanks,
Kan

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-05-27 16:02 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-27 12:30 [PATCH] perf/x86/intel/uncore: Fix oops when counting IMC uncore events on some TGL kan.liang
2020-05-27 12:59 ` David Laight
2020-05-27 14:46   ` Liang, Kan
2020-05-27 14:51     ` David Laight
2020-05-27 15:01       ` Liang, Kan
2020-05-27 15:17         ` David Laight
2020-05-27 16:02           ` Liang, Kan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).