* Re: perf record doesn't work on rtd129x SoC
From: Andreas Färber @ 2019-12-04 7:28 UTC
  To: Wang YanQing
  Cc: Mark Rutland, linux-realtek-soc, Will Deacon, linux-kernel,
      linux-soc, linux-arm-kernel

Hi YanQing,

+ LAKML + Mark + Will

On 04.12.19 at 05:55, Wang YanQing wrote:
> I use "perf record" to debug a performance issue on the RTD1296 SoC. It
> doesn't work, but "perf stat" is OK!

Thanks for the report - which board, branch and (base) tag are you
testing against? And are you building perf yourself from kernel sources,
or are you using some distro package?

I only have Busybox in my initrd on DS418; I have not tested perf.

> After some digging in the kernel, I found the reason is that there is no
> PMU overflow interrupt. I think the PMU configuration below isn't right
> for RTD1296:
> "
> arm_pmu: arm-pmu {
> 	compatible = "arm,cortex-a53-pmu";
> 	interrupts = <GIC_SPI 48 IRQ_TYPE_LEVEL_HIGH>;
> };
> "
>
> We need 4 PMU SPIs for RTD1296 (4 cores), and I guess 48 isn't right
> either.

Note that the above rtd129x.dtsi snippet is not complete. See rtd1296.dtsi:

&arm_pmu {
	interrupt-affinity = <&cpu0>, <&cpu1>, <&cpu2>, <&cpu3>;
};

48 and high/4 match what I see in the latest BSP:

https://github.com/BPI-SINOVOIP/BPI-M4-bsp/blob/master/linux-rtk/arch/arm64/boot/dts/realtek/rtd129x/rtd-1296.dtsi#L116

> Any suggestion is welcome.
>
> Thanks!

The only difference I see is "arm,cortex-a53-pmu" vs. "arm,armv8-pmuv3".
By my reading of arch/arm64/kernel/perf_event.c, the only difference
between the two should be the name and an extra cache_map. You could try
the other compatible string in your .dts, but I doubt it'll help.

Hopefully the Realtek or Arm guys can shed some light.

Regards,
Andreas

-- 
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer
HRB 36809 (AG Nürnberg)

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
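Andreas' suggestion of trying the other compatible string could be tested with a board-level override along these lines. This is a minimal sketch that assumes the board .dts includes rtd1296.dtsi, so the PMU node defined in rtd129x.dtsi is reachable through its arm_pmu label. As he says, it is unlikely to help: by his reading the compatible only selects the PMU name and cache map, not how the overflow interrupt is wired.

	/* Board .dts: replace the compatible inherited from rtd129x.dtsi */
	&arm_pmu {
		compatible = "arm,armv8-pmuv3";
	};

If the override takes effect, the PMU should be listed as armv8_pmuv3 rather than armv8_cortex_a53 under /sys/bus/event_source/devices/, while perf record would be expected to fail in the same way.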
* Re: perf record doesn't work on rtd129x SoC
From: Robin Murphy @ 2019-12-04 11:20 UTC
  To: Andreas Färber, Wang YanQing
  Cc: Mark Rutland, linux-realtek-soc, Will Deacon, linux-kernel,
      linux-soc, linux-arm-kernel

On 2019-12-04 7:28 am, Andreas Färber wrote:
[...]
>> After some digging in the kernel, I found the reason is that there is no
>> PMU overflow interrupt. I think the PMU configuration below isn't right
>> for RTD1296:
>> "
>> arm_pmu: arm-pmu {
>> 	compatible = "arm,cortex-a53-pmu";
>> 	interrupts = <GIC_SPI 48 IRQ_TYPE_LEVEL_HIGH>;
>> };
>> "
>>
>> We need 4 PMU SPIs for RTD1296 (4 cores), and I guess 48 isn't right
>> either.
>
> Note that the above rtd129x.dtsi snippet is not complete. See rtd1296.dtsi:
>
> &arm_pmu {
> 	interrupt-affinity = <&cpu0>, <&cpu1>, <&cpu2>, <&cpu3>;
> };

That doesn't help much, since 4 affinities for one SPI is rather
nonsensical.

[...]
> The only difference I see is "arm,cortex-a53-pmu" vs. "arm,armv8-pmuv3".
> By my reading of arch/arm64/kernel/perf_event.c, the only difference
> between the two should be the name and an extra cache_map. You could try
> the other compatible string in your .dts, but I doubt it'll help.
>
> Hopefully the Realtek or Arm guys can shed some light.

If the SoC really has all 4 overflow interrupts combined into a single
SPI line, then sampling just isn't going to be supported - it's
unreasonably difficult to handle overflow when the IRQ may be taken on
the wrong CPU.

Robin.
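Robin's objection follows from how the arm,pmu devicetree binding pairs the two properties: when SPIs are used, each interrupt-affinity phandle describes the corresponding entry in interrupts, so four phandles only make sense next to four interrupts. A sketch of that one-SPI-per-core shape follows. The SPI numbers are hypothetical placeholders, and since the RTD1296 appears to combine its PMU overflow outputs into a single line, this illustrates the binding rather than offering a fix for this SoC.

	/*
	 * Hypothetical: a SoC that wires each core's PMU overflow output
	 * to its own SPI. SPI numbers 100-103 are placeholders, not
	 * RTD1296 data.
	 */
	arm_pmu: arm-pmu {
		compatible = "arm,cortex-a53-pmu";
		interrupts = <GIC_SPI 100 IRQ_TYPE_LEVEL_HIGH>,	/* cpu0 */
			     <GIC_SPI 101 IRQ_TYPE_LEVEL_HIGH>,	/* cpu1 */
			     <GIC_SPI 102 IRQ_TYPE_LEVEL_HIGH>,	/* cpu2 */
			     <GIC_SPI 103 IRQ_TYPE_LEVEL_HIGH>;	/* cpu3 */
		interrupt-affinity = <&cpu0>, <&cpu1>, <&cpu2>, <&cpu3>;
	};

With a layout like this, each overflow interrupt can be routed to the CPU whose counters overflowed, which is what sampling with perf record relies on.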
* Re: perf record doesn't work on rtd129x SoC
From: Marc Zyngier @ 2019-12-04 11:38 UTC
  To: Robin Murphy
  Cc: Mark Rutland, linux-realtek-soc, Will Deacon, linux-kernel,
      Wang YanQing, linux-soc, Andreas Färber, linux-arm-kernel

On 2019-12-04 11:20, Robin Murphy wrote:
[...]
> If the SoC really has all 4 overflow interrupts combined into a
> single SPI line, then sampling just isn't going to be supported -
> it's unreasonably difficult to handle overflow when the IRQ may be
> taken on the wrong CPU.

Indeed. And I've recently found this exact design blunder on a brand
new Amlogic SoC, where the per-core interrupts have been OR'd together.
And not just for the PMU! It is the same situation for the GIC, CTI,
and a couple of other things. The only sane interrupts are the timers.

(sound of a PCB hitting the bin...)

        M.
-- 
Jazz is not dead. It just smells funny...
* Re: perf record doesn't work on rtd129x SoC
From: Robin Murphy @ 2019-12-04 14:51 UTC
  To: Andreas Färber, Wang YanQing
  Cc: Mark Rutland, linux-realtek-soc, Will Deacon, linux-kernel,
      linux-soc, linux-arm-kernel

On 04/12/2019 11:20 am, Robin Murphy wrote:
[...]
> If the SoC really has all 4 overflow interrupts combined into a single
> SPI line, then sampling just isn't going to be supported - it's
> unreasonably difficult to handle overflow when the IRQ may be taken on
> the wrong CPU.

On closer inspection, that BSP kernel implements a whole hrtimer-based
bodge in arm_pmu to apparently work around not having usable interrupts,
so yeah, this isn't going to be usable, sorry.

Robin.
* Re: perf record doesn't work on rtd129x SoC
From: Wang YanQing @ 2019-12-09 9:18 UTC
  To: Robin Murphy
  Cc: Mark Rutland, linux-realtek-soc, Will Deacon, linux-kernel,
      linux-soc, Andreas Färber, linux-arm-kernel

On Wed, Dec 04, 2019 at 02:51:24PM +0000, Robin Murphy wrote:
[...]
> On closer inspection, that BSP kernel implements a whole hrtimer-based
> bodge in arm_pmu to apparently work around not having usable interrupts,
> so yeah, this isn't going to be usable, sorry.
>
> Robin.

Hi all!

Thanks for all the suggestions and the inspection. If we are sure it is a
hardware design blunder, then that is acceptable to me; I just needed to
make sure it isn't the kernel's fault, because if it were, that would be
our fault :)

Thanks.
Thread overview: 5 messages
[not found] <20191204045559.GA10458@udknight>
2019-12-04  7:28 ` perf record doesn't work on rtd129x SoC Andreas Färber
2019-12-04 11:20   ` Robin Murphy
2019-12-04 11:38     ` Marc Zyngier
2019-12-04 14:51     ` Robin Murphy
2019-12-09  9:18       ` Wang YanQing