linux-kernel.vger.kernel.org archive mirror
* perf record doesn't work on rtd129x SoC
From: Wang YanQing @ 2019-12-04  4:55 UTC (permalink / raw)
  To: afaerber; +Cc: linux-kernel, linux-soc, linux-realtek-soc

Hi Andreas Färber!

I am using "perf record" to debug a performance issue on the RTD1296 SoC. It doesn't
work, but "perf stat" works fine!

After some digging in the kernel, I found that the cause is a missing PMU overflow
interrupt. I think the PMU configuration below isn't right for RTD1296:
"
        arm_pmu: arm-pmu {
                compatible = "arm,cortex-a53-pmu";
                interrupts = <GIC_SPI 48 IRQ_TYPE_LEVEL_HIGH>;
        };
"

We need 4 PMU SPIs for RTD1296 (4 cores), and I suspect 48 isn't right either.

Any suggestion is welcome.

Thanks!




* Re: perf record doesn't work on rtd129x SoC
From: Andreas Färber @ 2019-12-04  7:28 UTC (permalink / raw)
  To: Wang YanQing
  Cc: linux-kernel, linux-soc, linux-realtek-soc, linux-arm-kernel,
	Mark Rutland, Will Deacon

Hi YanQing,

+ LAKML + Mark + Will

Am 04.12.19 um 05:55 schrieb Wang YanQing:
> I am using "perf record" to debug a performance issue on the RTD1296 SoC. It doesn't
> work, but "perf stat" works fine!

Thanks for the report - which board, branch and (base) tag are you
testing against? And are you building perf yourself from kernel sources,
or are you using some distro package?

I only have Busybox in my initrd on DS418; I have not tested perf.

> After some digging in the kernel, I found that the cause is a missing PMU overflow
> interrupt. I think the PMU configuration below isn't right for RTD1296:
> "
>         arm_pmu: arm-pmu {
>                 compatible = "arm,cortex-a53-pmu";
>                 interrupts = <GIC_SPI 48 IRQ_TYPE_LEVEL_HIGH>;
>         };
> "
> 
> We need 4 PMU SPIs for RTD1296 (4 cores), and I suspect 48 isn't right either.

Note that the above rtd129x.dtsi snippet is not complete. See rtd1296.dtsi:

&arm_pmu {
	interrupt-affinity = <&cpu0>, <&cpu1>, <&cpu2>, <&cpu3>;
};

SPI 48 and level-high (type 4) match what I see in the latest BSP:

https://github.com/BPI-SINOVOIP/BPI-M4-bsp/blob/master/linux-rtk/arch/arm64/boot/dts/realtek/rtd129x/rtd-1296.dtsi#L116

> Any suggestion is welcome.
> 
> Thanks!

The only difference I see is "arm,cortex-a53-pmu" vs. "arm,armv8-pmuv3".
By my reading of arch/arm64/kernel/perf_event.c the only difference
between the two should be the name and an extra cache_map. You could try
the other compatible string in your .dts, but I doubt it'll help.
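
A minimal sketch of that experiment, assuming it is done as an override
in a board .dts that includes rtd1296.dtsi (so the arm_pmu label from
rtd129x.dtsi is visible):

```dts
/* Board-level override of the PMU node defined in rtd129x.dtsi. */
&arm_pmu {
	/* Match the generic PMUv3 entry instead of the Cortex-A53 one. */
	compatible = "arm,armv8-pmuv3";
};
```

Only the compatible string changes; interrupts and interrupt-affinity
are inherited as-is, so the overflow interrupt wiring stays the same.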

Hopefully the Realtek or Arm guys can shed some light.

Regards,
Andreas

-- 
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer
HRB 36809 (AG Nürnberg)

* Re: perf record doesn't work on rtd129x SoC
From: Robin Murphy @ 2019-12-04 11:20 UTC (permalink / raw)
  To: Andreas Färber, Wang YanQing
  Cc: Mark Rutland, linux-realtek-soc, Will Deacon, linux-kernel,
	linux-soc, linux-arm-kernel

On 2019-12-04 7:28 am, Andreas Färber wrote:
> Hi YanQing,
> 
> + LAKML + Mark + Will
> 
> Am 04.12.19 um 05:55 schrieb Wang YanQing:
>> I am using "perf record" to debug a performance issue on the RTD1296 SoC. It doesn't
>> work, but "perf stat" works fine!
> 
> Thanks for the report - which board, branch and (base) tag are you
> testing against? And are you building perf yourself from kernel sources,
> or are you using some distro package?
> 
> I only have Busybox in my initrd on DS418; I have not tested perf.
> 
>> After some digging in the kernel, I found that the cause is a missing PMU overflow
>> interrupt. I think the PMU configuration below isn't right for RTD1296:
>> "
>>          arm_pmu: arm-pmu {
>>                  compatible = "arm,cortex-a53-pmu";
>>                  interrupts = <GIC_SPI 48 IRQ_TYPE_LEVEL_HIGH>;
>>          };
>> "
>>
>> We need 4 PMU SPIs for RTD1296 (4 cores), and I suspect 48 isn't right either.
> 
> Note that the above rtd129x.dtsi snippet is not complete. See rtd1296.dtsi:
> 
> &arm_pmu {
> 	interrupt-affinity = <&cpu0>, <&cpu1>, <&cpu2>, <&cpu3>;
> };

That doesn't help much, since listing 4 affinities for a single SPI is
rather nonsensical.
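
For comparison, when SPIs are used the PMU binding expects one
interrupts entry per CPU, with interrupt-affinity naming the CPU for
each entry in the same order. A minimal sketch of that shape, assuming
GIC_SPI/IRQ_TYPE_LEVEL_HIGH come from headers the dtsi already includes;
SPIs 49-51 are hypothetical placeholders, not documented RTD1296 lines:

```dts
/* Hypothetical wiring: SPIs 49-51 are placeholders for illustration. */
arm_pmu: arm-pmu {
	compatible = "arm,cortex-a53-pmu";
	/* One overflow SPI per core... */
	interrupts = <GIC_SPI 48 IRQ_TYPE_LEVEL_HIGH>,
		     <GIC_SPI 49 IRQ_TYPE_LEVEL_HIGH>,
		     <GIC_SPI 50 IRQ_TYPE_LEVEL_HIGH>,
		     <GIC_SPI 51 IRQ_TYPE_LEVEL_HIGH>;
	/* ...and the n-th phandle names the CPU that owns the n-th SPI. */
	interrupt-affinity = <&cpu0>, <&cpu1>, <&cpu2>, <&cpu3>;
};
```

If the hardware ORs all four overflow outputs onto one SPI, there are no
separate lines to list here.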

> 48 and high/4 match what I see in the latest BSP:
> 
> https://github.com/BPI-SINOVOIP/BPI-M4-bsp/blob/master/linux-rtk/arch/arm64/boot/dts/realtek/rtd129x/rtd-1296.dtsi#L116
> 
>> Any suggestion is welcome.
>>
>> Thanks!
> 
> The only difference I see is "arm,cortex-a53-pmu" vs. "arm,armv8-pmuv3".
> By my reading of arch/arm64/kernel/perf_event.c the only difference
> between the two should be the name and an extra cache_map. You could try
> the other compatible string in your .dts, but I doubt it'll help.
> 
> Hopefully the Realtek or Arm guys can shed some light.

If the SoC really has all 4 overflow interrupts combined into a single 
SPI line, then sampling just isn't going to be supported - it's 
unreasonably difficult to handle overflow when the IRQ may be taken on 
the wrong CPU.

Robin.

* Re: perf record doesn't work on rtd129x SoC
From: Marc Zyngier @ 2019-12-04 11:38 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Andreas Färber, Wang YanQing, Mark Rutland,
	linux-realtek-soc, Will Deacon, linux-kernel, linux-soc,
	linux-arm-kernel

On 2019-12-04 11:20, Robin Murphy wrote:
> If the SoC really has all 4 overflow interrupts combined into a single
> SPI line, then sampling just isn't going to be supported - it's
> unreasonably difficult to handle overflow when the IRQ may be taken on
> the wrong CPU.

Indeed. And I've recently found this exact design blunder on a brand-new
Amlogic SoC, where the per-core interrupts have been OR'd together.
And not just for the PMU! It is the same situation for the GIC, CTI,
and a couple of other things. The only sane interrupts are the timers.

(sound of a PCB hitting the bin...)

         M.
-- 
Jazz is not dead. It just smells funny...

* Re: perf record doesn't work on rtd129x SoC
From: Robin Murphy @ 2019-12-04 14:51 UTC (permalink / raw)
  To: Andreas Färber, Wang YanQing
  Cc: Mark Rutland, linux-realtek-soc, Will Deacon, linux-kernel,
	linux-soc, linux-arm-kernel

On 04/12/2019 11:20 am, Robin Murphy wrote:
> If the SoC really has all 4 overflow interrupts combined into a single 
> SPI line, then sampling just isn't going to be supported - it's 
> unreasonably difficult to handle overflow when the IRQ may be taken on 
> the wrong CPU.

On closer inspection, that BSP kernel implements a whole hrtimer-based 
bodge in arm_pmu to apparently work around not having usable interrupts, 
so yeah, this isn't going to be usable, sorry.

Robin.

* Re: perf record doesn't work on rtd129x SoC
From: Wang YanQing @ 2019-12-09  9:18 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Andreas Färber, Mark Rutland, linux-realtek-soc,
	Will Deacon, linux-kernel, linux-soc, linux-arm-kernel

On Wed, Dec 04, 2019 at 02:51:24PM +0000, Robin Murphy wrote:
> On 04/12/2019 11:20 am, Robin Murphy wrote:
> On closer inspection, that BSP kernel implements a whole hrtimer-based 
> bodge in arm_pmu to apparently work around not having usable interrupts, 
> so yeah, this isn't going to be usable, sorry.
> 
> Robin.

Hi all!

Thanks for all the suggestions and the investigation. If it is confirmed to be a
hardware design blunder, that is acceptable for me; I just needed to make sure it
isn't the kernel's fault, because if it were, that would be our fault :)

Thanks.
