linux-kernel.vger.kernel.org archive mirror
* [PATCH] ARM: dts: exynos: add CCI-400 PMU nodes support to Exynos542x SoCs
@ 2019-04-12 17:25 Willy Wolff
  2019-04-15 12:24 ` Anand Moon
  2019-04-19 17:53 ` Willy Wolff
  0 siblings, 2 replies; 8+ messages in thread
From: Willy Wolff @ 2019-04-12 17:25 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Kukjin Kim, Krzysztof Kozlowski,
	devicetree, linux-arm-kernel, linux-samsung-soc, linux-kernel

Add device tree entries for PMU of ARM CCI-400.

$ sudo ./perf stat -a -C 0 -e CCI_400/config=0xff,name=cycles/ sleep 1

 Performance counter stats for 'system wide':

       420,303,619      cycles

       1.019058775 seconds time elapsed

Tested on Odroid-XU3 and Odroid-XU4.

Signed-off-by: Willy Wolff <willy.mh.wolff.ml@gmail.com>
---
 arch/arm/boot/dts/exynos5420.dtsi | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/exynos5420.dtsi b/arch/arm/boot/dts/exynos5420.dtsi
index aaff15880761..be58650aca35 100644
--- a/arch/arm/boot/dts/exynos5420.dtsi
+++ b/arch/arm/boot/dts/exynos5420.dtsi
@@ -158,7 +158,7 @@
 			#address-cells = <1>;
 			#size-cells = <1>;
 			reg = <0x10d20000 0x1000>;
-			ranges = <0x0 0x10d20000 0x6000>;
+			ranges = <0x0 0x10d20000 0x10000>;
 
 			cci_control0: slave-if@4000 {
 				compatible = "arm,cci-400-ctrl-if";
@@ -170,6 +170,16 @@
 				interface-type = "ace";
 				reg = <0x5000 0x1000>;
 			};
+
+			pmu@9000 {
+				compatible = "arm,cci-400-pmu,r0";
+				reg = <0x9000 0x5000>;
+				interrupts = <0 105 4>,
+					     <0 101 4>,
+					     <0 102 4>,
+					     <0 103 4>,
+					     <0 104 4>;
+			};
 		};
 
 		clock: clock-controller@10010000 {
-- 
2.11.0
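
A note on the ranges change in the first hunk: the new pmu@9000 node covers
child offsets 0x9000 up to 0xdfff, which does not fit in the old 0x6000-byte
window, and that is presumably why the window is widened. The sketch below
redoes that arithmetic. The helper translate() is hypothetical, and it assumes
the single-entry ranges with one address cell and one size cell shown in the
patch.

```python
def translate(ranges, reg):
    """Map a child (offset, size) through a one-entry ranges window.

    Returns the parent bus address, or None when the region does not fit.
    """
    child_base, parent_base, window = ranges
    offset, size = reg
    if offset < child_base or offset + size > child_base + window:
        return None
    return parent_base + (offset - child_base)


PMU_REG = (0x9000, 0x5000)                 # reg of pmu@9000 in the patch
OLD_RANGES = (0x0, 0x10d20000, 0x6000)     # ranges before the patch
NEW_RANGES = (0x0, 0x10d20000, 0x10000)    # ranges after the patch

print(translate(OLD_RANGES, PMU_REG))        # None: end offset 0xe000 > 0x6000
print(hex(translate(NEW_RANGES, PMU_REG)))   # 0x10d29000
```

With the wider window the PMU registers would start at 0x10d29000 on the
parent bus.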



* Re: [PATCH] ARM: dts: exynos: add CCI-400 PMU nodes support to Exynos542x SoCs
  2019-04-12 17:25 [PATCH] ARM: dts: exynos: add CCI-400 PMU nodes support to Exynos542x SoCs Willy Wolff
@ 2019-04-15 12:24 ` Anand Moon
  2019-04-16 10:19   ` Krzysztof Kozlowski
  2019-04-19 17:53 ` Willy Wolff
  1 sibling, 1 reply; 8+ messages in thread
From: Anand Moon @ 2019-04-15 12:24 UTC (permalink / raw)
  To: Willy Wolff
  Cc: Rob Herring, Mark Rutland, Kukjin Kim, Krzysztof Kozlowski,
	devicetree, linux-arm-kernel, linux-samsung-soc, Linux Kernel

Hi Willy,

On Fri, 12 Apr 2019 at 22:55, Willy Wolff <willy.mh.wolff.ml@gmail.com> wrote:
>
> Add device tree entries for PMU of ARM CCI-400.
>
> $ sudo ./perf stat -a -C 0 -e CCI_400/config=0xff,name=cycles/ sleep 1
>
>  Performance counter stats for 'system wide':
>
>        420,303,619      cycles
>
>        1.019058775 seconds time elapsed
>
> Tested on Odroid-xu3 and 4.
>
> Signed-off-by: Willy Wolff <willy.mh.wolff.ml@gmail.com>
> ---
>  arch/arm/boot/dts/exynos5420.dtsi | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/boot/dts/exynos5420.dtsi b/arch/arm/boot/dts/exynos5420.dtsi
> index aaff15880761..be58650aca35 100644
> --- a/arch/arm/boot/dts/exynos5420.dtsi
> +++ b/arch/arm/boot/dts/exynos5420.dtsi
> @@ -158,7 +158,7 @@
>                         #address-cells = <1>;
>                         #size-cells = <1>;
>                         reg = <0x10d20000 0x1000>;
> -                       ranges = <0x0 0x10d20000 0x6000>;
> +                       ranges = <0x0 0x10d20000 0x10000>;
>
>                         cci_control0: slave-if@4000 {
>                                 compatible = "arm,cci-400-ctrl-if";
> @@ -170,6 +170,16 @@
>                                 interface-type = "ace";
>                                 reg = <0x5000 0x1000>;
>                         };
> +
> +                       pmu@9000 {
> +                               compatible = "arm,cci-400-pmu,r0";
> +                               reg = <0x9000 0x5000>;

As per the Exynos 5422 user manual, the interrupts should be as follows:

+                               interrupts = <GIC_SPI 176 IRQ_TYPE_LEVEL_HIGH>, /* CCI_N_EVENT_CNT0_OVF */
+                                       <GIC_SPI 177 IRQ_TYPE_LEVEL_HIGH>, /* CCI_N_EVENT_CNT1_OVF */
+                                       <GIC_SPI 178 IRQ_TYPE_LEVEL_HIGH>, /* CCI_N_EVENT_CNT2_OVF */
+                                       <GIC_SPI 179 IRQ_TYPE_LEVEL_HIGH>, /* CCI_N_EVENT_CNT3_OVF */
+                                       <GIC_SPI 180 IRQ_TYPE_LEVEL_HIGH>, /* CCI_N_EVENT_CNT4_OVF */
+                                       <GIC_SPI 181 IRQ_TYPE_LEVEL_HIGH>; /* CCI_NERR */

> +                               interrupts = <0 105 4>,
> +                                            <0 101 4>,
> +                                            <0 102 4>,
> +                                            <0 103 4>,
> +                                            <0 104 4>;
> +                       };
>                 };
>
>                 clock: clock-controller@10010000 {
> --
> 2.11.0
>

But I am observing the following kernel warning after enabling
CONFIG_ARM_CCI_PMU on top of exynos_defconfig.

[    4.557701]  mmcblk0: p1
[    4.561036] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:908
[    4.568075] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
[    4.574656] 1 lock held by swapper/0/1:
[    4.578397]  #0: (ptrval) (&dev->mutex){....}, at: device_driver_attach+0x18/0x60
[    4.585900] Preemption disabled at:
[    4.585909] [<c077eca8>] cci_pmu_probe+0x1cc/0x4a0
[    4.594122] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.1.0-rc5-dirty #11
[    4.600853] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[    4.606928] [<c011255c>] (unwind_backtrace) from [<c010de98>] (show_stack+0x10/0x14)
[    4.614642] [<c010de98>] (show_stack) from [<c0a6ee0c>] (dump_stack+0x98/0xc4)
[    4.621832] [<c0a6ee0c>] (dump_stack) from [<c0156e68>] (___might_sleep+0x20c/0x2c0)
[    4.629545] [<c0156e68>] (___might_sleep) from [<c0a8a470>] (__mutex_lock+0x3c/0xa34)
[    4.637343] [<c0a8a470>] (__mutex_lock) from [<c0a8ae84>] (mutex_lock_nested+0x1c/0x24)
[    4.645318] [<c0a8ae84>] (mutex_lock_nested) from [<c021b498>] (perf_pmu_register+0x20/0x40c)
[    4.653812] [<c021b498>] (perf_pmu_register) from [<c077edd0>] (cci_pmu_probe+0x2f4/0x4a0)
[    4.662044] [<c077edd0>] (cci_pmu_probe) from [<c0591880>] (platform_drv_probe+0x48/0x98)
[    4.670186] [<c0591880>] (platform_drv_probe) from [<c058f1b8>] (really_probe+0x24c/0x410)
[    4.678418] [<c058f1b8>] (really_probe) from [<c058f530>] (driver_probe_device+0x78/0x1c0)
[    4.686651] [<c058f530>] (driver_probe_device) from [<c058f8d4>] (device_driver_attach+0x58/0x60)
[    4.695492] [<c058f8d4>] (device_driver_attach) from [<c058f994>] (__driver_attach+0xb8/0x158)
[    4.704070] [<c058f994>] (__driver_attach) from [<c058d1f4>] (bus_for_each_dev+0x74/0xb4)
[    4.712213] [<c058d1f4>] (bus_for_each_dev) from [<c058e398>] (bus_add_driver+0x1c0/0x200)
[    4.720446] [<c058e398>] (bus_add_driver) from [<c0590850>] (driver_register+0x74/0x108)
[    4.728507] [<c0590850>] (driver_register) from [<c010315c>] (do_one_initcall+0x90/0x434)
[    4.736655] [<c010315c>] (do_one_initcall) from [<c0f0131c>] (kernel_init_freeable+0x448/0x4ec)
[    4.745319] [<c0f0131c>] (kernel_init_freeable) from [<c0a87c44>] (kernel_init+0x8/0x110)
[    4.753460] [<c0a87c44>] (kernel_init) from [<c01010b4>] (ret_from_fork+0x14/0x20)
[    4.760995] Exception stack(0xe88e1fb0 to 0xe88e1ff8)
[    4.766011] 1fa0:                                     00000000 00000000 00000000 00000000
[    4.774170] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    4.782315] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    4.789337] ARM CCI_400 PMU driver probed
[    4.797261] NET: Registered protocol family 10

Hi Krzysztof,

The Cache Coherent Interface (CCI) connects Cortex-A15, Cortex-A7, G2D, G3D and SSS.

Level 0 > CPU blocks such as Cortex-A15 (CA15) and Cortex-A7 (CA7) are
members of the Level 0 CCI bus.

Level 1 > The display engine block (DISP) and the 2D graphics engine (G2D) are
directly connected to Level 1.
  DISP, MDMA, SSS.

Level 2 > All other IP blocks are connected to the Level 1 bus via the Level 2 bus.
   G3D, MSCL, MFC, ISP, JPEG/Rotator/DMA/PERI, NAND/SD/EMMC.

So my question is whether the following mapping to the CCI IP blocks is correct:
Level 0 (cci_control0)
Level 1 (cci_control1)
Level 2 (cci_control1)

Best Regards

-Anand


* Re: [PATCH] ARM: dts: exynos: add CCI-400 PMU nodes support to Exynos542x SoCs
  2019-04-15 12:24 ` Anand Moon
@ 2019-04-16 10:19   ` Krzysztof Kozlowski
  2019-04-17  4:26     ` Anand Moon
  0 siblings, 1 reply; 8+ messages in thread
From: Krzysztof Kozlowski @ 2019-04-16 10:19 UTC (permalink / raw)
  To: Anand Moon
  Cc: Willy Wolff, Rob Herring, Mark Rutland, Kukjin Kim, devicetree,
	linux-arm-kernel, linux-samsung-soc, Linux Kernel

On Mon, 15 Apr 2019 at 14:24, Anand Moon <linux.amoon@gmail.com> wrote:
> Cache Coherent Interface (CCI) among Cortex-A15 and Cortex-A7, G2D, G3D and SSS
>
> Level 0 > CPU blocks such as Cortex-A15 (CA15), Cortex-A7 (CA7) are
> joined as the member of Level 0 CCI bus
>
> Level 1 > Display engine block (DISP) and 2D graphic engines (G2D) are
> directly connected to Level 1.
>   DISP, MDMA, SSS.
>
> Level 2 > While all the other IP is connected to Level 1 bus via Level 2 bus
>    G3D, MSCL, MFC, ISP, JPEG/Rotator/DMA/PERI, NAND/SD/EMMC.
>
> So my question is the mapped with the cci ip block correct.
> Level 0 (cci_control0)
> Level 1 (cci_control1)
> Level 2 (cci_control1)

Hi Anand,

I do not understand the question - what is or is not mapped correctly?

Best regards,
Krzysztof


* Re: [PATCH] ARM: dts: exynos: add CCI-400 PMU nodes support to Exynos542x SoCs
  2019-04-16 10:19   ` Krzysztof Kozlowski
@ 2019-04-17  4:26     ` Anand Moon
  2019-04-17  7:44       ` Krzysztof Kozlowski
  0 siblings, 1 reply; 8+ messages in thread
From: Anand Moon @ 2019-04-17  4:26 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Willy Wolff, Rob Herring, Mark Rutland, Kukjin Kim, devicetree,
	linux-arm-kernel, linux-samsung-soc, Linux Kernel

Hi Krzysztof,

On Tue, 16 Apr 2019 at 15:49, Krzysztof Kozlowski <krzk@kernel.org> wrote:
>
> On Mon, 15 Apr 2019 at 14:24, Anand Moon <linux.amoon@gmail.com> wrote:
> > Cache Coherent Interface (CCI) among Cortex-A15 and Cortex-A7, G2D, G3D and SSS
> >
> > Level 0 > CPU blocks such as Cortex-A15 (CA15), Cortex-A7 (CA7) are
> > joined as the member of Level 0 CCI bus
> >
> > Level 1 > Display engine block (DISP) and 2D graphic engines (G2D) are
> > directly connected to Level 1.
> >   DISP, MDMA, SSS.
> >
> > Level 2 > While all the other IP is connected to Level 1 bus via Level 2 bus
> >    G3D, MSCL, MFC, ISP, JPEG/Rotator/DMA/PERI, NAND/SD/EMMC.
> >
> > So my question is the mapped with the cci ip block correct.
> > Level 0 (cci_control0)
> > Level 1 (cci_control1)
> > Level 2 (cci_control1)
>
> Hi Anand,
>
> I do not understand the question - what is mapped with correctly or not?
>
> Best regards,
> Krzysztof

According to https://www.kernel.org/doc/Documentation/devicetree/bindings/arm/cci.txt,
the CCI node is linked to the CPU and DMA nodes, for example.

Along those lines, the diagram below from the Exynos 5422 UM shows the various
IP blocks linked to each CCI level.
The image just illustrates the overall architecture of the Exynos 5422 CCI.

[0] https://imgur.com/gallery/0xJSwGQ

So we should map the various IP blocks to the corresponding CCI level.

Best Regards
-Anand


* Re: [PATCH] ARM: dts: exynos: add CCI-400 PMU nodes support to Exynos542x SoCs
  2019-04-17  4:26     ` Anand Moon
@ 2019-04-17  7:44       ` Krzysztof Kozlowski
  0 siblings, 0 replies; 8+ messages in thread
From: Krzysztof Kozlowski @ 2019-04-17  7:44 UTC (permalink / raw)
  To: Anand Moon
  Cc: Willy Wolff, Rob Herring, Mark Rutland, Kukjin Kim, devicetree,
	linux-arm-kernel, linux-samsung-soc, Linux Kernel

On Wed, 17 Apr 2019 at 06:26, Anand Moon <linux.amoon@gmail.com> wrote:
>
> Hi Krzysztof,
>
> On Tue, 16 Apr 2019 at 15:49, Krzysztof Kozlowski <krzk@kernel.org> wrote:
> >
> > On Mon, 15 Apr 2019 at 14:24, Anand Moon <linux.amoon@gmail.com> wrote:
> > > Cache Coherent Interface (CCI) among Cortex-A15 and Cortex-A7, G2D, G3D and SSS
> > >
> > > Level 0 > CPU blocks such as Cortex-A15 (CA15), Cortex-A7 (CA7) are
> > > joined as the member of Level 0 CCI bus
> > >
> > > Level 1 > Display engine block (DISP) and 2D graphic engines (G2D) are
> > > directly connected to Level 1.
> > >   DISP, MDMA, SSS.
> > >
> > > Level 2 > While all the other IP is connected to Level 1 bus via Level 2 bus
> > >    G3D, MSCL, MFC, ISP, JPEG/Rotator/DMA/PERI, NAND/SD/EMMC.
> > >
> > > So my question is the mapped with the cci ip block correct.
> > > Level 0 (cci_control0)
> > > Level 1 (cci_control1)
> > > Level 2 (cci_control1)
> >
> > Hi Anand,
> >
> > I do not understand the question - what is mapped with correctly or not?
> >
> > Best regards,
> > Krzysztof
>
> Following the https://www.kernel.org/doc/Documentation/devicetree/bindings/arm/cci.txt
> CCI node linked to CPU and DMA nodes for example.
>
> On this line below diagram from Exynos 5422 UM show various IP block
> linked to CCI level.
> Below image just elaborate overall architecture of Exynos 5422 CCI.
>
> [0] https://imgur.com/gallery/0xJSwGQ
>
> So we should map the various IP block to corresponding CCI level.

Willy's patch did not touch cci_control{0,1} nor any other CCI levels
so I do not get what you are commenting on. As for other CCI ports - we
do not define them and I do not see any users of device CCI API
(cci_enable_port_by_device() and cci_disable_port_by_device()). But
feel free to propose patches changing this. In general - it is easier
to discuss if you show the code/patch, not talk about some theoretical
change.

Best regards,
Krzysztof


* Re: [PATCH] ARM: dts: exynos: add CCI-400 PMU nodes support to Exynos542x SoCs
  2019-04-12 17:25 [PATCH] ARM: dts: exynos: add CCI-400 PMU nodes support to Exynos542x SoCs Willy Wolff
  2019-04-15 12:24 ` Anand Moon
@ 2019-04-19 17:53 ` Willy Wolff
  2019-04-19 21:18   ` Robin Murphy
  1 sibling, 1 reply; 8+ messages in thread
From: Willy Wolff @ 2019-04-19 17:53 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Kukjin Kim, Krzysztof Kozlowski,
	devicetree, linux-arm-kernel, linux-samsung-soc,
	Linux Kernel Mailing List

Hi,

This patch can be dropped, as it needs more work.

In fact, the interrupts seem to be wrong. The interrupts suggested by
Anand Moon gave the same results, shown below.

export CCI_DEV=CCI_400
export OMP_NUM_THREADS=2
sudo --preserve-env ./perf stat -a \
  -e armv7_cortex_a7/config=0x11,name=a7_cycles/ \
  -e armv7_cortex_a15/config=0x11,name=a15_cycles/ \
  -e armv7_cortex_a7/config=0x19,name=a7_bus/ \
  -e armv7_cortex_a15/config=0x19,name=a15_bus/ \
  -e ${CCI_DEV}/config=0xff,name=cci400_cycles/ \
  -e ${CCI_DEV}/config=0x0,name=cci400_si_rrq_hs_any/ \
  -e ${CCI_DEV}/config=0xc,name=cci400_si_wrq_hs_any/ \
  taskset -c 0,7 /home/user/cg.x.A 1

[..]

 Performance counter stats for 'system wide':

     9,362,850,550      a7_cycles
     1,682,125,760      a15_cycles
        68,920,347      a7_bus
        61,484,352      a15_bus
     3,789,936,935      cci400_cycles
                 0      cci400_si_rrq_hs_any
                 0      cci400_si_wrq_hs_any

       9.541340558 seconds time elapsed

cg.x.A comes from the NAS benchmark suite, compiled with -fopenmp support, set up
to run 2 threads and pinned with taskset to run on both the a7 and a15 clusters.
a7_bus and a15_bus report main memory accesses.

Only cci400_cycles seems to be correct. However, all pmcs from the master
interface are reported as unsupported and all pmcs from the slave interface
return 0, which is probably not correct.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0470f/CJHICFBF.html

Would it be possible for someone from Samsung to provide the right
interrupt values?
Many thanks.

Regards,
Willy


* Re: [PATCH] ARM: dts: exynos: add CCI-400 PMU nodes support to Exynos542x SoCs
  2019-04-19 17:53 ` Willy Wolff
@ 2019-04-19 21:18   ` Robin Murphy
  2019-04-20  6:50     ` Willy Wolff
  0 siblings, 1 reply; 8+ messages in thread
From: Robin Murphy @ 2019-04-19 21:18 UTC (permalink / raw)
  To: Willy Wolff, Rob Herring, Mark Rutland, Kukjin Kim,
	Krzysztof Kozlowski, devicetree, linux-arm-kernel,
	linux-samsung-soc, Linux Kernel Mailing List

On 2019-04-19 6:53 pm, Willy Wolff wrote:
> Hi,
> 
> This patch can be dropped, as it needs more work.
> 
> In fact, the interrupts seems to be wrong. The interrupts suggested by
> Anand Moon gave the same following results.
> 
> export CCI_DEV=CCI_400
> export OMP_NUM_THREADS=2
> sudo --preserve-env ./perf stat -a \
>    -e armv7_cortex_a7/config=0x11,name=a7_cycles/ \
>    -e armv7_cortex_a15/config=0x11,name=a15_cycles/ \
>    -e armv7_cortex_a7/config=0x19,name=a7_bus/ \
>    -e armv7_cortex_a15/config=0x19,name=a15_bus/ \
>    -e ${CCI_DEV}/config=0xff,name=cci400_cycles/ \
>    -e ${CCI_DEV}/config=0x0,name=cci400_si_rrq_hs_any/ \
>    -e ${CCI_DEV}/config=0xc,name=cci400_si_wrq_hs_any/ \

From the look of those configs, you'll be counting events on slave 
interface 0, which may not even have anything connected anyway. The CPU 
clusters on a CCI-400 will be on slave interfaces 3 and 4, so try 
something like '-e CCI_400/cci400_si_rrq_hs_any,source=4/'.
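
As a rough illustration of why the earlier configs land on slave interface 0,
here is a small sketch. It assumes the CCI-400 event word packs the source
port into bits 7:5 and the event code into bits 4:0; that layout is an
assumption about the driver's format attributes and is worth checking against
drivers/perf/arm-cci.c. The helper names encode() and decode() are made up for
this example.

```python
SOURCE_SHIFT = 5      # assumed: source port lives in config bits 7:5
SOURCE_MASK = 0x7
EVENT_MASK = 0x1F     # assumed: event code lives in config bits 4:0


def encode(source, event):
    """Build a raw config value from a source port and an event code."""
    return ((source & SOURCE_MASK) << SOURCE_SHIFT) | (event & EVENT_MASK)


def decode(config):
    """Split a raw config value into (source port, event code)."""
    return (config >> SOURCE_SHIFT) & SOURCE_MASK, config & EVENT_MASK


print(decode(0x0))            # (0, 0): si_rrq_hs_any on source 0
print(decode(0xC))            # (0, 12): si_wrq_hs_any on source 0
print(hex(encode(4, 0x0)))    # 0x80: same read event on source 4
print(hex(encode(3, 0xC)))    # 0x6c: same write event on source 3
```

Under that assumption, a bare config=0x0 or config=0xc leaves the source
field at 0, which matches the zero counts reported above.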

The interrupts only matter for counter overflow, so confirming those 
could be done by picking a sufficiently frequent event, counting for 
long enough to capture slightly more than 2^32 of those, then seeing 
whether the overflow accumulates correctly or the count appears to go 
backwards (and/or checking what fired in /proc/interrupts). I believe 
the cycle counter is also 32-bit on CCI, so that should be relatively 
easy; for the other counters beyond the first one it should be feasible 
to schedule additional dummy events before the event of interest in 
order to trick pmu_get_event_idx() into allocating the desired counter 
for it.
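
To put a rough number on "long enough", here is a back-of-the-envelope sketch.
It assumes the cycle counter really is 32 bits wide and keeps ticking at about
the rate seen in the first perf run of this thread (420,303,619 cycles in
1.019058775 s). Both function names are hypothetical.

```python
COUNTER_BITS = 32     # assumption: CCI counters are 32 bits wide


def seconds_to_wrap(count, elapsed_s, bits=COUNTER_BITS):
    """Time for a counter ticking at count/elapsed_s to reach 2**bits."""
    rate_hz = count / elapsed_s
    return (1 << bits) / rate_hz


def wrapped_delta(prev_raw, cur_raw, bits=COUNTER_BITS):
    """Events between two raw reads, tolerating one wrap in between."""
    return (cur_raw - prev_raw) & ((1 << bits) - 1)


wrap_s = seconds_to_wrap(420_303_619, 1.019058775)
print(f"cycle counter wraps after about {wrap_s:.1f} s")

# A raw value that appears to go backwards still gives a positive delta.
print(wrapped_delta(0xFFFFFFF0, 0x10))   # 32
```

At that rate a run of a bit over ten seconds should cross one wrap of the
cycle counter; other runs may tick at a different rate, so treat this as an
order of magnitude only.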

Robin.

>    taskset -c 0,7 /home/user/cg.x.A 1
> 
> [..]
> 
>   Performance counter stats for 'system wide':
> 
>       9,362,850,550      a7_cycles
>       1,682,125,760      a15_cycles
>          68,920,347      a7_bus
>          61,484,352      a15_bus
>       3,789,936,935      cci400_cycles
>                   0      cci400_si_rrq_hs_any
>                   0      cci400_si_wrq_hs_any
> 
>         9.541340558 seconds time elapsed
> 
> cg.x.A comes from NAS benchmark suite, compiled with fopenmp support, setup
> to run 2 threads and taskmapped to ran on both a7 and a15 clusters.
> a7_bus and a15_bus report main memory accesses.
> 
> Only cci400_cycles seems to be correct. However, all pmcs from the master
> interface are reported as unsupported and all pmcs from the slave interface
> return 0, which is probably not correct.
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0470f/CJHICFBF.html
> 
> Would it be possible that someone from Samsung provide the right
> interrupts values?
> Many thanks.
> 
> Regards,
> Willy


* Re: [PATCH] ARM: dts: exynos: add CCI-400 PMU nodes support to Exynos542x SoCs
  2019-04-19 21:18   ` Robin Murphy
@ 2019-04-20  6:50     ` Willy Wolff
  0 siblings, 0 replies; 8+ messages in thread
From: Willy Wolff @ 2019-04-20  6:50 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Rob Herring, Mark Rutland, Kukjin Kim, Krzysztof Kozlowski,
	devicetree, linux-arm-kernel, linux-samsung-soc,
	Linux Kernel Mailing List

Indeed, many thanks Robin.

Using this, the values look better.
export OMP_NUM_THREADS=2
sudo --preserve-env ./perf stat -a \
  -e armv7_cortex_a7/config=0x11,name=a7_cycles/ \
  -e armv7_cortex_a15/config=0x11,name=a15_cycles/ \
  -e armv7_cortex_a7/config=0x19,name=a7_bus/ \
  -e armv7_cortex_a15/config=0x19,name=a15_bus/ \
  -e CCI_400/config=0xff,name=cci400_cycles/ \
  -e CCI_400/config=0x0,source=4,name=cci400_si_rrq_hs_any_s4/ \
  -e CCI_400/config=0xc,source=4,name=cci400_si_wrq_hs_any_s4/ \
  -e CCI_400/config=0x0,source=3,name=cci400_si_rrq_hs_any_s3/ \
  -e CCI_400/config=0xc,source=3,name=cci400_si_wrq_hs_any_s3/ \
  taskset -c 0,4 /home/user/EnergyManager/temperature_bench_install/bin/cg.x.A 1

[..]
 Performance counter stats for 'system wide':

     6,201,513,834      a7_cycles
     2,781,009,706      a15_cycles
        64,200,721      a7_bus
        60,237,019      a15_bus
     1,158,303,323      cci400_cycles
        11,390,649      cci400_si_rrq_hs_any_s4
         1,253,383      cci400_si_wrq_hs_any_s4
        13,379,256      cci400_si_rrq_hs_any_s3
        13,200,717      cci400_si_wrq_hs_any_s3

       3.685087167 seconds time elapsed


Do you think that I should write a v2 with a better cover letter that shows
how to access this? By the way, I see that you contribute to that driver. I
haven't seen anything about this source= parameter; do you think that it should be
documented somewhere?

Also, the 0 and 4 in the interrupts represent GIC_SPI and IRQ_TYPE_LEVEL_HIGH;
I will do a v2 for this. Don't you think that the doc at
Documentation/devicetree/bindings/arm/cci.txt should use this too?
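
The conversion itself is mechanical. Below is a sketch, assuming the usual
values from the dt-bindings headers (GIC_SPI = 0, GIC_PPI = 1,
IRQ_TYPE_LEVEL_HIGH = 4, and so on). symbolic() is a made-up helper, not an
existing tool, and it only handles plain trigger-type flags.

```python
GIC_KIND = {0: "GIC_SPI", 1: "GIC_PPI"}
IRQ_TYPE = {
    0: "IRQ_TYPE_NONE",
    1: "IRQ_TYPE_EDGE_RISING",
    2: "IRQ_TYPE_EDGE_FALLING",
    3: "IRQ_TYPE_EDGE_BOTH",
    4: "IRQ_TYPE_LEVEL_HIGH",
    8: "IRQ_TYPE_LEVEL_LOW",
}


def symbolic(cells):
    """Render a three-cell GIC interrupt specifier with symbolic names."""
    kind, number, flags = cells
    return f"<{GIC_KIND[kind]} {number} {IRQ_TYPE[flags]}>"


# The five specifiers from the patch at the top of this thread.
PATCH_IRQS = [(0, 105, 4), (0, 101, 4), (0, 102, 4), (0, 103, 4), (0, 104, 4)]
print(",\n".join(symbolic(c) for c in PATCH_IRQS) + ";")
```

The first line printed is <GIC_SPI 105 IRQ_TYPE_LEVEL_HIGH>, which is what
<0 105 4> would look like in a v2.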

Willy

On Fri, Apr 19, 2019 at 10:18:02PM +0100, Robin Murphy wrote:
> On 2019-04-19 6:53 pm, Willy Wolff wrote:
> > Hi,
> > 
> > This patch can be dropped, as it needs more work.
> > 
> > In fact, the interrupts seems to be wrong. The interrupts suggested by
> > Anand Moon gave the same following results.
> > 
> > export CCI_DEV=CCI_400
> > export OMP_NUM_THREADS=2
> > sudo --preserve-env ./perf stat -a \
> >    -e armv7_cortex_a7/config=0x11,name=a7_cycles/ \
> >    -e armv7_cortex_a15/config=0x11,name=a15_cycles/ \
> >    -e armv7_cortex_a7/config=0x19,name=a7_bus/ \
> >    -e armv7_cortex_a15/config=0x19,name=a15_bus/ \
> >    -e ${CCI_DEV}/config=0xff,name=cci400_cycles/ \
> >    -e ${CCI_DEV}/config=0x0,name=cci400_si_rrq_hs_any/ \
> >    -e ${CCI_DEV}/config=0xc,name=cci400_si_wrq_hs_any/ \
> 
> From the look of those configs, you'll be counting events on slave interface
> 0, which may not even have anything connected anyway. The CPU clusters on a
> CCI-400 will be on slave interfaces 3 and 4, so try something like '-e
> CCI_400/cci400_si_rrq_hs_any,source=4/'.
> 
> The interrupts only matter for counter overflow, so confirming those could
> be done by picking a sufficiently frequent event, counting for long enough
> to capture slightly more than 2^32 of those, then seeing whether the
> overflow accumulates correctly or the count appears to go backwards (and/or
> checking what fired in /proc/interrupts). I believe the cycle counter is
> also 32-bit on CCI, so that should be relatively easy; for the other
> counters beyond the first one it should be feasible to schedule additional
> dummy events before the event of interest in order to trick
> pmu_get_event_idx() into allocating the desired counter for it.
> 
> Robin.


end of thread, other threads:[~2019-04-20  6:50 UTC | newest]

Thread overview: 8+ messages
2019-04-12 17:25 [PATCH] ARM: dts: exynos: add CCI-400 PMU nodes support to Exynos542x SoCs Willy Wolff
2019-04-15 12:24 ` Anand Moon
2019-04-16 10:19   ` Krzysztof Kozlowski
2019-04-17  4:26     ` Anand Moon
2019-04-17  7:44       ` Krzysztof Kozlowski
2019-04-19 17:53 ` Willy Wolff
2019-04-19 21:18   ` Robin Murphy
2019-04-20  6:50     ` Willy Wolff
