[PATCH 0/4] RK3588 and Rock 5B dts additions: thermal, OPP, rfkill and fan

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/4] RK3588 and Rock 5B dts additions: thermal, OPP, rfkill and fan
@ 2024-01-24 20:30 Alexey Charkov
  2024-01-24 20:30 ` [PATCH 1/4] arm64: dts: rockchip: add rfkill node for M.2 Key E WiFi on rock-5b Alexey Charkov
                   ` (3 more replies)
  0 siblings, 4 replies; 34+ messages in thread
From: Alexey Charkov @ 2024-01-24 20:30 UTC (permalink / raw)
  To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner
  Cc: Daniel Lezcano, Dragan Simic, devicetree, linux-arm-kernel,
	linux-rockchip, linux-kernel, Alexey Charkov

This is an assortment of device tree additions for RK3588(s) and their
enablement on Radxa Rock5B.

Thermal zone information and cooling maps is the follow-up to feedback
received on v2 patch version [1] - thanks a lot to Dragan, Heiko and
Daniel for review and comments. The patch adds passive cooling maps
for the three separate CPU clusters found in RK3588, as well as
critical trips for all TSADC channels. Enablement on Rock 5B was
split out to a separate patch, where I also add active cooling via
the PWM fan optionally available there. Other changes from v2:
 - Zero hysteresis for critical trips, per Daniel's guidance
 - Zero 'polling-delay' for zones having no active cooling maps,
   given that the driver supports interrupts for crossing trip
   temperature and thus doesn't need polling there, per Daniel's
   guidance
 - Zero 'polling-delay-passive' for zones having no passive cooling,
   per Daniel's guidance
 - Extra passive cooling alerts before the control temperature
   to enable power allocation governor's PID parameters
   initialization, per Daniel's guidance
 - Renamed the thermal zone tracking the middle of the chip
   to package_thermal, per Dragan's guidance

OPPs help actually scale CPU frequencies up and down for both cooling
and performance - tested on Rock 5B under varied loads.

RFKILL patch is the same one sent earlier just before the merge window
opened [2]. It didn't get any feedback back then, so I'm just
resending it here for ease of reference. Tested on Rock 5B with an
Intel AX210 card.

[1] https://lore.kernel.org/linux-rockchip/20240109192608.5981-1-alchark@gmail.com/
[2] https://lore.kernel.org/linux-rockchip/20240106202650.22310-1-alchark@gmail.com/

Signed-off-by: Alexey Charkov <alchark@gmail.com>
---
Alexey Charkov (4):
      arm64: dts: rockchip: add rfkill node for M.2 Key E WiFi on rock-5b
      arm64: dts: rockchip: enable built-in thermal monitoring on rk3588
      arm64: dts: rockchip: enable temperature driven fan control on Rock 5B
      arm64: dts: rockchip: Add OPP data for CPU cores on RK3588

 arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts |  32 +-
 arch/arm64/boot/dts/rockchip/rk3588s.dtsi       | 374 ++++++++++++++++++++++++
 2 files changed, 405 insertions(+), 1 deletion(-)
---
base-commit: 615d300648869c774bd1fe54b4627bb0c20faed4
change-id: 20240124-rk-dts-additions-a6d7b52787b9

Best regards,
-- 
Alexey Charkov <alchark@gmail.com>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 1/4] arm64: dts: rockchip: add rfkill node for M.2 Key E WiFi on rock-5b
  2024-01-24 20:30 [PATCH 0/4] RK3588 and Rock 5B dts additions: thermal, OPP, rfkill and fan Alexey Charkov
@ 2024-01-24 20:30 ` Alexey Charkov
  2024-01-24 20:30 ` [PATCH 2/4] arm64: dts: rockchip: enable built-in thermal monitoring on rk3588 Alexey Charkov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 34+ messages in thread
From: Alexey Charkov @ 2024-01-24 20:30 UTC (permalink / raw)
  To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner
  Cc: Daniel Lezcano, Dragan Simic, devicetree, linux-arm-kernel,
	linux-rockchip, linux-kernel, Alexey Charkov

By default the GPIO pin that connects to the WiFi enable signal
inside the M.2 Key E slot is driven low, resulting in impossibility
to connect to any network. Add a DT node to expose it as an RFKILL
device, which lets the WiFi driver or userspace toggle it as
required.

Signed-off-by: Alexey Charkov <alchark@gmail.com>
---
 arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
index a0e303c3a1dc..9b7bf6cec8bd 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
@@ -58,6 +58,13 @@ fan: pwm-fan {
 		#cooling-cells = <2>;
 	};
 
+	rfkill {
+		compatible = "rfkill-gpio";
+		label = "rfkill-pcie-wlan";
+		radio-type = "wlan";
+		shutdown-gpios = <&gpio4 RK_PA2 GPIO_ACTIVE_HIGH>;
+	};
+
 	vcc3v3_pcie2x1l0: vcc3v3-pcie2x1l0-regulator {
 		compatible = "regulator-fixed";
 		enable-active-high;

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 2/4] arm64: dts: rockchip: enable built-in thermal monitoring on rk3588
  2024-01-24 20:30 [PATCH 0/4] RK3588 and Rock 5B dts additions: thermal, OPP, rfkill and fan Alexey Charkov
  2024-01-24 20:30 ` [PATCH 1/4] arm64: dts: rockchip: add rfkill node for M.2 Key E WiFi on rock-5b Alexey Charkov
@ 2024-01-24 20:30 ` Alexey Charkov
  2024-01-24 21:56   ` Daniel Lezcano
  2024-01-24 20:30 ` [PATCH 3/4] arm64: dts: rockchip: enable temperature driven fan control on Rock 5B Alexey Charkov
  2024-01-24 20:30 ` [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588 Alexey Charkov
  3 siblings, 1 reply; 34+ messages in thread
From: Alexey Charkov @ 2024-01-24 20:30 UTC (permalink / raw)
  To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner
  Cc: Daniel Lezcano, Dragan Simic, devicetree, linux-arm-kernel,
	linux-rockchip, linux-kernel, Alexey Charkov

Include thermal zones information in device tree for rk3588 variants

Signed-off-by: Alexey Charkov <alchark@gmail.com>
---
 arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 165 ++++++++++++++++++++++++++++++
 1 file changed, 165 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
index 36b1b7acfe6a..131b9eb21398 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
@@ -10,6 +10,7 @@
 #include <dt-bindings/reset/rockchip,rk3588-cru.h>
 #include <dt-bindings/phy/phy.h>
 #include <dt-bindings/ata/ahci.h>
+#include <dt-bindings/thermal/thermal.h>
 
 / {
 	compatible = "rockchip,rk3588";
@@ -2228,6 +2229,170 @@ tsadc: tsadc@fec00000 {
 		status = "disabled";
 	};
 
+	thermal_zones: thermal-zones {
+		/* sensor near the center of the whole chip */
+		package_thermal: package-thermal {
+			polling-delay-passive = <0>;
+			polling-delay = <0>;
+			thermal-sensors = <&tsadc 0>;
+
+			trips {
+				package_crit: package-crit {
+					temperature = <115000>;
+					hysteresis = <0>;
+					type = "critical";
+				};
+			};
+		};
+
+		/* sensor between A76 cores 0 and 1 */
+		bigcore0_thermal: bigcore0-thermal {
+			polling-delay-passive = <20>;
+			polling-delay = <0>;
+			thermal-sensors = <&tsadc 1>;
+
+			trips {
+				bigcore0_alert0: bigcore0-alert0 {
+					temperature = <75000>;
+					hysteresis = <2000>;
+					type = "passive";
+				};
+				bigcore0_alert1: bigcore0-alert1 {
+					temperature = <85000>;
+					hysteresis = <2000>;
+					type = "passive";
+				};
+				bigcore0_crit: bigcore0-crit {
+					temperature = <115000>;
+					hysteresis = <0>;
+					type = "critical";
+				};
+			};
+			cooling-maps {
+				map0 {
+					trip = <&bigcore0_alert1>;
+					cooling-device =
+						<&cpu_b0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+						<&cpu_b1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+					contribution = <1024>;
+				};
+			};
+		};
+
+		/* sensor between A76 cores 2 and 3 */
+		bigcore2_thermal: bigcore2-thermal {
+			polling-delay-passive = <20>;
+			polling-delay = <0>;
+			thermal-sensors = <&tsadc 2>;
+
+			trips {
+				bigcore2_alert0: bigcore2-alert0 {
+					temperature = <75000>;
+					hysteresis = <2000>;
+					type = "passive";
+				};
+				bigcore2_alert1: bigcore2-alert1 {
+					temperature = <85000>;
+					hysteresis = <2000>;
+					type = "passive";
+				};
+				bigcore2_crit: bigcore2-crit {
+					temperature = <115000>;
+					hysteresis = <0>;
+					type = "critical";
+				};
+			};
+			cooling-maps {
+				map1 {
+					trip = <&bigcore2_alert1>;
+					cooling-device =
+						<&cpu_b2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+						<&cpu_b3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+					contribution = <1024>;
+				};
+			};
+		};
+
+		/* sensor between the four A55 cores */
+		little_core_thermal: littlecore-thermal {
+			polling-delay-passive = <20>;
+			polling-delay = <0>;
+			thermal-sensors = <&tsadc 3>;
+
+			trips {
+				littlecore_alert0: littlecore-alert0 {
+					temperature = <75000>;
+					hysteresis = <2000>;
+					type = "passive";
+				};
+				littlecore_alert1: littlecore-alert1 {
+					temperature = <85000>;
+					hysteresis = <2000>;
+					type = "passive";
+				};
+				littlecore_crit: littlecore-crit {
+					temperature = <115000>;
+					hysteresis = <0>;
+					type = "critical";
+				};
+			};
+			cooling-maps {
+				map2 {
+					trip = <&littlecore_alert1>;
+					cooling-device =
+						<&cpu_l0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+						<&cpu_l1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+						<&cpu_l2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+						<&cpu_l3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+					contribution = <1024>;
+				};
+			};
+		};
+
+		/* sensor near the PD_CENTER power domain */
+		center_thermal: center-thermal {
+			polling-delay-passive = <0>;
+			polling-delay = <0>;
+			thermal-sensors = <&tsadc 4>;
+
+			trips {
+				center_crit: center-crit {
+					temperature = <115000>;
+					hysteresis = <0>;
+					type = "critical";
+				};
+			};
+		};
+
+		gpu_thermal: gpu-thermal {
+			polling-delay-passive = <0>;
+			polling-delay = <0>;
+			thermal-sensors = <&tsadc 5>;
+
+			trips {
+				gpu_crit: gpu-crit {
+					temperature = <115000>;
+					hysteresis = <0>;
+					type = "critical";
+				};
+			};
+		};
+
+		npu_thermal: npu-thermal {
+			polling-delay-passive = <0>;
+			polling-delay = <0>;
+			thermal-sensors = <&tsadc 6>;
+
+			trips {
+				npu_crit: npu-crit {
+					temperature = <115000>;
+					hysteresis = <0>;
+					type = "critical";
+				};
+			};
+		};
+	};
+
 	saradc: adc@fec10000 {
 		compatible = "rockchip,rk3588-saradc";
 		reg = <0x0 0xfec10000 0x0 0x10000>;

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/4] arm64: dts: rockchip: enable built-in thermal monitoring on rk3588
  2024-01-24 20:30 ` [PATCH 2/4] arm64: dts: rockchip: enable built-in thermal monitoring on rk3588 Alexey Charkov
@ 2024-01-24 21:56   ` Daniel Lezcano
  2024-01-25  8:26     ` Alexey Charkov
  2024-01-25 13:34     ` Diederik de Haas
  0 siblings, 2 replies; 34+ messages in thread
From: Daniel Lezcano @ 2024-01-24 21:56 UTC (permalink / raw)
  To: Alexey Charkov, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiko Stuebner
  Cc: Dragan Simic, devicetree, linux-arm-kernel, linux-rockchip, linux-kernel

On 24/01/2024 21:30, Alexey Charkov wrote:
> Include thermal zones information in device tree for rk3588 variants

There is an energy model for the CPUs. But finding out the sustainable 
power may be a bit tricky. So I suggest to remove everything related to 
the power allocator in this change and propose a dedicated change with 
all the power configuration (which includes proper k_p* coefficients to 
be set from userspace to have a flat mitigation figure).

That implies removing the "contribution" properties in this description.

Some comments below but definitively this version is close to be ok.


> Signed-off-by: Alexey Charkov <alchark@gmail.com>
> ---
>   arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 165 ++++++++++++++++++++++++++++++
>   1 file changed, 165 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> index 36b1b7acfe6a..131b9eb21398 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> @@ -10,6 +10,7 @@
>   #include <dt-bindings/reset/rockchip,rk3588-cru.h>
>   #include <dt-bindings/phy/phy.h>
>   #include <dt-bindings/ata/ahci.h>
> +#include <dt-bindings/thermal/thermal.h>
>   
>   / {
>   	compatible = "rockchip,rk3588";
> @@ -2228,6 +2229,170 @@ tsadc: tsadc@fec00000 {
>   		status = "disabled";
>   	};
>   
> +	thermal_zones: thermal-zones {
> +		/* sensor near the center of the whole chip */
> +		package_thermal: package-thermal {
> +			polling-delay-passive = <0>;
> +			polling-delay = <0>;
> +			thermal-sensors = <&tsadc 0>;
> +
> +			trips {
> +				package_crit: package-crit {
> +					temperature = <115000>;
> +					hysteresis = <0>;
> +					type = "critical";
> +				};
> +			};
> +		};
> +
> +		/* sensor between A76 cores 0 and 1 */
> +		bigcore0_thermal: bigcore0-thermal {
> +			polling-delay-passive = <20>;

20ms seems very short, is this value on purpose? Or just picked up 
arbitrarily?

If it is possible, perhaps you should profile the temperature of these 
thermal zones (CPUs ones). There is a tool in 
<linuxdir>/tools/thermal/thermometer to do that.

You can measure with 10ms sampling rate when running for instance 
dhrystone pinned on b0 and b1, then on b2 and b3. And finally on the 
small cluster.

But if you don't have spare time and 20 is ok for you. Then it is fine 
for me too.

Some nits below.

> +			polling-delay = <0>;
> +			thermal-sensors = <&tsadc 1>;
> +
> +			trips {
> +				bigcore0_alert0: bigcore0-alert0 {
> +					temperature = <75000>;
> +					hysteresis = <2000>;
> +					type = "passive";
> +				};
> +				bigcore0_alert1: bigcore0-alert1 {
> +					temperature = <85000>;
> +					hysteresis = <2000>;
> +					type = "passive";
> +				};
> +				bigcore0_crit: bigcore0-crit {
> +					temperature = <115000>;
> +					hysteresis = <0>;
> +					type = "critical";
> +				};
> +			};
> +			cooling-maps {
> +				map0 {
> +					trip = <&bigcore0_alert1>;
> +					cooling-device =
> +						<&cpu_b0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> +						<&cpu_b1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> +					contribution = <1024>;
> +				};
> +			};
> +		};
> +
> +		/* sensor between A76 cores 2 and 3 */
> +		bigcore2_thermal: bigcore2-thermal {
> +			polling-delay-passive = <20>;
> +			polling-delay = <0>;
> +			thermal-sensors = <&tsadc 2>;
> +
> +			trips {
> +				bigcore2_alert0: bigcore2-alert0 {
> +					temperature = <75000>;
> +					hysteresis = <2000>;
> +					type = "passive";
> +				};
> +				bigcore2_alert1: bigcore2-alert1 {
> +					temperature = <85000>;
> +					hysteresis = <2000>;
> +					type = "passive";
> +				};
> +				bigcore2_crit: bigcore2-crit {
> +					temperature = <115000>;
> +					hysteresis = <0>;
> +					type = "critical";
> +				};
> +			};
> +			cooling-maps {
> +				map1 {

s/map1/mpa0/

> +					trip = <&bigcore2_alert1>;
> +					cooling-device =
> +						<&cpu_b2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> +						<&cpu_b3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> +					contribution = <1024>;
> +				};
> +			};
> +		};
> +
> +		/* sensor between the four A55 cores */
> +		little_core_thermal: littlecore-thermal {
> +			polling-delay-passive = <20>;
> +			polling-delay = <0>;
> +			thermal-sensors = <&tsadc 3>;
> +
> +			trips {
> +				littlecore_alert0: littlecore-alert0 {
> +					temperature = <75000>;
> +					hysteresis = <2000>;
> +					type = "passive";
> +				};
> +				littlecore_alert1: littlecore-alert1 {
> +					temperature = <85000>;
> +					hysteresis = <2000>;
> +					type = "passive";
> +				};
> +				littlecore_crit: littlecore-crit {
> +					temperature = <115000>;
> +					hysteresis = <0>;
> +					type = "critical";
> +				};
> +			};
> +			cooling-maps {
> +				map2 {

s/map2/map0/

> +					trip = <&littlecore_alert1>;
> +					cooling-device =
> +						<&cpu_l0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> +						<&cpu_l1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> +						<&cpu_l2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> +						<&cpu_l3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> +					contribution = <1024>;
> +				};
> +			};
> +		};
> +
> +		/* sensor near the PD_CENTER power domain */
> +		center_thermal: center-thermal {
> +			polling-delay-passive = <0>;
> +			polling-delay = <0>;
> +			thermal-sensors = <&tsadc 4>;
> +
> +			trips {
> +				center_crit: center-crit {
> +					temperature = <115000>;
> +					hysteresis = <0>;
> +					type = "critical";
> +				};
> +			};
> +		};
> +
> +		gpu_thermal: gpu-thermal {
> +			polling-delay-passive = <0>;
> +			polling-delay = <0>;
> +			thermal-sensors = <&tsadc 5>;
> +
> +			trips {
> +				gpu_crit: gpu-crit {
> +					temperature = <115000>;
> +					hysteresis = <0>;
> +					type = "critical";
> +				};
> +			};
> +		};
> +
> +		npu_thermal: npu-thermal {
> +			polling-delay-passive = <0>;
> +			polling-delay = <0>;
> +			thermal-sensors = <&tsadc 6>;
> +
> +			trips {
> +				npu_crit: npu-crit {
> +					temperature = <115000>;
> +					hysteresis = <0>;
> +					type = "critical";
> +				};
> +			};
> +		};
> +	};
> +
>   	saradc: adc@fec10000 {
>   		compatible = "rockchip,rk3588-saradc";
>   		reg = <0x0 0xfec10000 0x0 0x10000>;
> 

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/4] arm64: dts: rockchip: enable built-in thermal monitoring on rk3588
  2024-01-24 21:56   ` Daniel Lezcano
@ 2024-01-25  8:26     ` Alexey Charkov
  2024-01-25 10:02       ` Daniel Lezcano
  2024-01-25 22:19       ` Dragan Simic
  2024-01-25 13:34     ` Diederik de Haas
  1 sibling, 2 replies; 34+ messages in thread
From: Alexey Charkov @ 2024-01-25  8:26 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
	Dragan Simic, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel

On Thu, Jan 25, 2024 at 1:56 AM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 24/01/2024 21:30, Alexey Charkov wrote:
> > Include thermal zones information in device tree for rk3588 variants
>
> There is an energy model for the CPUs. But finding out the sustainable
> power may be a bit tricky. So I suggest to remove everything related to
> the power allocator in this change and propose a dedicated change with
> all the power configuration (which includes proper k_p* coefficients to
> be set from userspace to have a flat mitigation figure).
>
> That implies removing the "contribution" properties in this description.

Alright, I'll just drop those "contribution" properties, thanks!

> Some comments below but definitively this version is close to be ok.

Yay! :)

> > Signed-off-by: Alexey Charkov <alchark@gmail.com>
> > ---
> >   arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 165 ++++++++++++++++++++++++++++++
> >   1 file changed, 165 insertions(+)
> >
> > diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> > index 36b1b7acfe6a..131b9eb21398 100644
> > --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> > +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> > @@ -10,6 +10,7 @@
> >   #include <dt-bindings/reset/rockchip,rk3588-cru.h>
> >   #include <dt-bindings/phy/phy.h>
> >   #include <dt-bindings/ata/ahci.h>
> > +#include <dt-bindings/thermal/thermal.h>
> >
> >   / {
> >       compatible = "rockchip,rk3588";
> > @@ -2228,6 +2229,170 @@ tsadc: tsadc@fec00000 {
> >               status = "disabled";
> >       };
> >
> > +     thermal_zones: thermal-zones {
> > +             /* sensor near the center of the whole chip */
> > +             package_thermal: package-thermal {
> > +                     polling-delay-passive = <0>;
> > +                     polling-delay = <0>;
> > +                     thermal-sensors = <&tsadc 0>;
> > +
> > +                     trips {
> > +                             package_crit: package-crit {
> > +                                     temperature = <115000>;
> > +                                     hysteresis = <0>;
> > +                                     type = "critical";
> > +                             };
> > +                     };
> > +             };
> > +
> > +             /* sensor between A76 cores 0 and 1 */
> > +             bigcore0_thermal: bigcore0-thermal {
> > +                     polling-delay-passive = <20>;
>
> 20ms seems very short, is this value on purpose? Or just picked up
> arbitrarily?

Frankly, I simply used the value that Radxa's downstream DTS sets for
my board. 100ms seem to work just as well.

> If it is possible, perhaps you should profile the temperature of these
> thermal zones (CPUs ones). There is a tool in
> <linuxdir>/tools/thermal/thermometer to do that.
>
> You can measure with 10ms sampling rate when running for instance
> dhrystone pinned on b0 and b1, then on b2 and b3. And finally on the
> small cluster.

It seems tricky to isolate the effects from just one of the CPU
clusters, as their individual thermal outputs are not that high.

For my testing I disabled the fan (but didn't remove the heatsink to
avoid wasting the thermal interface tape), and tried loading CPUs with
stress-ng. Here are the observations:
 - Little cores alone stressed with 4 threads pegged to them with
taskset never reach the throttling temperature (85C), and balance out
around 60C
 - Either of the big core clusters stressed individually with 2
threads pegged to them with taskset never reach the throttling
temperature either
 - Four big cores with 4 threads heat up very slowly (>30 minutes to
reach throttling temperature, I didn't have enough patience to let
them actually reach it - maybe they never do)
 - Eight cores with 8 threads heat up to the throttling temperature
within ~5 minutes (again, with the fan off), and then, as soon as just
one of the big core clusters gets throttled, the temperature of all
cores balances out just below the throttling threshold. In my
observation cores 6,7 go from 2.4GHz down to 1.8GHz while the rest
stay at their respective top performance states (2.4GHz for big cores
4,5 and 1.8GHz for little cores 0-3)

Adding to it the fact that the temperature measurement resolution is
not very granular (almost 1C) it's somewhat difficult to estimate how
fast throttling action on a single cluster really brings its
temperature within bounds, as they all affect each other at relevant
temperature-load combinations. Perhaps it means that too granular
polling doesn't add much value.

> But if you don't have spare time and 20 is ok for you. Then it is fine
> for me too.

I guess I'll go for 100 as other upstream Rockchip .dtsi's do, given
all of the above. Thanks for pointing this out!

> Some nits below.
>
> > +                     polling-delay = <0>;
> > +                     thermal-sensors = <&tsadc 1>;
> > +
> > +                     trips {
> > +                             bigcore0_alert0: bigcore0-alert0 {
> > +                                     temperature = <75000>;
> > +                                     hysteresis = <2000>;
> > +                                     type = "passive";
> > +                             };
> > +                             bigcore0_alert1: bigcore0-alert1 {
> > +                                     temperature = <85000>;
> > +                                     hysteresis = <2000>;
> > +                                     type = "passive";
> > +                             };
> > +                             bigcore0_crit: bigcore0-crit {
> > +                                     temperature = <115000>;
> > +                                     hysteresis = <0>;
> > +                                     type = "critical";
> > +                             };
> > +                     };
> > +                     cooling-maps {
> > +                             map0 {
> > +                                     trip = <&bigcore0_alert1>;
> > +                                     cooling-device =
> > +                                             <&cpu_b0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > +                                             <&cpu_b1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> > +                                     contribution = <1024>;
> > +                             };
> > +                     };
> > +             };
> > +
> > +             /* sensor between A76 cores 2 and 3 */
> > +             bigcore2_thermal: bigcore2-thermal {
> > +                     polling-delay-passive = <20>;
> > +                     polling-delay = <0>;
> > +                     thermal-sensors = <&tsadc 2>;
> > +
> > +                     trips {
> > +                             bigcore2_alert0: bigcore2-alert0 {
> > +                                     temperature = <75000>;
> > +                                     hysteresis = <2000>;
> > +                                     type = "passive";
> > +                             };
> > +                             bigcore2_alert1: bigcore2-alert1 {
> > +                                     temperature = <85000>;
> > +                                     hysteresis = <2000>;
> > +                                     type = "passive";
> > +                             };
> > +                             bigcore2_crit: bigcore2-crit {
> > +                                     temperature = <115000>;
> > +                                     hysteresis = <0>;
> > +                                     type = "critical";
> > +                             };
> > +                     };
> > +                     cooling-maps {
> > +                             map1 {
>
> s/map1/mpa0/

Noted, thanks!

> > +                                     trip = <&bigcore2_alert1>;
> > +                                     cooling-device =
> > +                                             <&cpu_b2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > +                                             <&cpu_b3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> > +                                     contribution = <1024>;
> > +                             };
> > +                     };
> > +             };
> > +
> > +             /* sensor between the four A55 cores */
> > +             little_core_thermal: littlecore-thermal {
> > +                     polling-delay-passive = <20>;
> > +                     polling-delay = <0>;
> > +                     thermal-sensors = <&tsadc 3>;
> > +
> > +                     trips {
> > +                             littlecore_alert0: littlecore-alert0 {
> > +                                     temperature = <75000>;
> > +                                     hysteresis = <2000>;
> > +                                     type = "passive";
> > +                             };
> > +                             littlecore_alert1: littlecore-alert1 {
> > +                                     temperature = <85000>;
> > +                                     hysteresis = <2000>;
> > +                                     type = "passive";
> > +                             };
> > +                             littlecore_crit: littlecore-crit {
> > +                                     temperature = <115000>;
> > +                                     hysteresis = <0>;
> > +                                     type = "critical";
> > +                             };
> > +                     };
> > +                     cooling-maps {
> > +                             map2 {
>
> s/map2/map0/

Noted, thanks!

Best regards,
Alexey

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/4] arm64: dts: rockchip: enable built-in thermal monitoring on rk3588
  2024-01-25  8:26     ` Alexey Charkov
@ 2024-01-25 10:02       ` Daniel Lezcano
  2024-01-25 14:46         ` Alexey Charkov
  2024-01-25 22:19       ` Dragan Simic
  1 sibling, 1 reply; 34+ messages in thread
From: Daniel Lezcano @ 2024-01-25 10:02 UTC (permalink / raw)
  To: Alexey Charkov
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
	Dragan Simic, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel

On 25/01/2024 09:26, Alexey Charkov wrote:
> On Thu, Jan 25, 2024 at 1:56 AM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>>
>> On 24/01/2024 21:30, Alexey Charkov wrote:
>>> Include thermal zones information in device tree for rk3588 variants
>>
>> There is an energy model for the CPUs. But finding out the sustainable
>> power may be a bit tricky. So I suggest to remove everything related to
>> the power allocator in this change and propose a dedicated change with
>> all the power configuration (which includes proper k_p* coefficients to
>> be set from userspace to have a flat mitigation figure).
>>
>> That implies removing the "contribution" properties in this description.
> 
> Alright, I'll just drop those "contribution" properties, thanks!
> 
>> Some comments below but definitively this version is close to be ok.
> 
> Yay! :)
> 
>>> Signed-off-by: Alexey Charkov <alchark@gmail.com>
>>> ---
>>>    arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 165 ++++++++++++++++++++++++++++++
>>>    1 file changed, 165 insertions(+)
>>>
>>> diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>>> index 36b1b7acfe6a..131b9eb21398 100644
>>> --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>>> +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>>> @@ -10,6 +10,7 @@
>>>    #include <dt-bindings/reset/rockchip,rk3588-cru.h>
>>>    #include <dt-bindings/phy/phy.h>
>>>    #include <dt-bindings/ata/ahci.h>
>>> +#include <dt-bindings/thermal/thermal.h>
>>>
>>>    / {
>>>        compatible = "rockchip,rk3588";
>>> @@ -2228,6 +2229,170 @@ tsadc: tsadc@fec00000 {
>>>                status = "disabled";
>>>        };
>>>
>>> +     thermal_zones: thermal-zones {
>>> +             /* sensor near the center of the whole chip */
>>> +             package_thermal: package-thermal {
>>> +                     polling-delay-passive = <0>;
>>> +                     polling-delay = <0>;
>>> +                     thermal-sensors = <&tsadc 0>;
>>> +
>>> +                     trips {
>>> +                             package_crit: package-crit {
>>> +                                     temperature = <115000>;
>>> +                                     hysteresis = <0>;
>>> +                                     type = "critical";
>>> +                             };
>>> +                     };
>>> +             };
>>> +
>>> +             /* sensor between A76 cores 0 and 1 */
>>> +             bigcore0_thermal: bigcore0-thermal {
>>> +                     polling-delay-passive = <20>;
>>
>> 20ms seems very short, is this value on purpose? Or just picked up
>> arbitrarily?
> 
> Frankly, I simply used the value that Radxa's downstream DTS sets for
> my board. 100ms seem to work just as well.
> 
>> If it is possible, perhaps you should profile the temperature of these
>> thermal zones (CPUs ones). There is a tool in
>> <linuxdir>/tools/thermal/thermometer to do that.
>>
>> You can measure with 10ms sampling rate when running for instance
>> dhrystone pinned on b0 and b1, then on b2 and b3. And finally on the
>> small cluster.
> 
> It seems tricky to isolate the effects from just one of the CPU
> clusters, as their individual thermal outputs are not that high.
> 
> For my testing I disabled the fan (but didn't remove the heatsink to
> avoid wasting the thermal interface tape),

It is ok but the system will have more heat capacity and it will be 
necessary to saturate it before running the tests. IOW warm up the 
system by running thermal stress tests several times.

> and tried loading CPUs with
> stress-ng. Here are the observations:

Usually I use drhystone to thermal stress the cores (e. one minute).

>   - Little cores alone stressed with 4 threads pegged to them with
> taskset never reach the throttling temperature (85C), and balance out
> around 60C
>   - Either of the big core clusters stressed individually with 2
> threads pegged to them with taskset never reach the throttling
> temperature either

Not sure what does stress-ng but you may want to test with dhrystone. 
I'm pretty sure it will make the expected thermal stress.

On the rk3388, the temperature on the big cores raises to the threshold 
in a few seconds (after a warmup).

>   - Four big cores with 4 threads heat up very slowly (>30 minutes to
> reach throttling temperature, I didn't have enough patience to let
> them actually reach it - maybe they never do)
>   - Eight cores with 8 threads heat up to the throttling temperature
> within ~5 minutes (again, with the fan off), and then, as soon as just
> one of the big core clusters gets throttled, the temperature of all
> cores balances out just below the throttling threshold. In my
> observation cores 6,7 go from 2.4GHz down to 1.8GHz while the rest
> stay at their respective top performance states (2.4GHz for big cores
> 4,5 and 1.8GHz for little cores 0-3)

Yeah, definitively it is probable stress-ng is not the right tool for 
that. I gave a try and it does not thermal stress my rk3388 board.

Just try "dhrystone -t 2 -l 10000" pinned on a big cluster


> Adding to it the fact that the temperature measurement resolution is
> not very granular (almost 1C) it's somewhat difficult to estimate how
> fast throttling action on a single cluster really brings its
> temperature within bounds, as they all affect each other at relevant
> temperature-load combinations. Perhaps it means that too granular
> polling doesn't add much value.
> 
>> But if you don't have spare time and 20 is ok for you. Then it is fine
>> for me too.
> 
> I guess I'll go for 100 as other upstream Rockchip .dtsi's do, given
> all of the above. Thanks for pointing this out!
> 
>> Some nits below.
>>
>>> +                     polling-delay = <0>;
>>> +                     thermal-sensors = <&tsadc 1>;
>>> +
>>> +                     trips {
>>> +                             bigcore0_alert0: bigcore0-alert0 {
>>> +                                     temperature = <75000>;
>>> +                                     hysteresis = <2000>;
>>> +                                     type = "passive";
>>> +                             };
>>> +                             bigcore0_alert1: bigcore0-alert1 {
>>> +                                     temperature = <85000>;
>>> +                                     hysteresis = <2000>;
>>> +                                     type = "passive";
>>> +                             };
>>> +                             bigcore0_crit: bigcore0-crit {
>>> +                                     temperature = <115000>;
>>> +                                     hysteresis = <0>;
>>> +                                     type = "critical";
>>> +                             };
>>> +                     };
>>> +                     cooling-maps {
>>> +                             map0 {
>>> +                                     trip = <&bigcore0_alert1>;
>>> +                                     cooling-device =
>>> +                                             <&cpu_b0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>>> +                                             <&cpu_b1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>>> +                                     contribution = <1024>;
>>> +                             };
>>> +                     };
>>> +             };
>>> +
>>> +             /* sensor between A76 cores 2 and 3 */
>>> +             bigcore2_thermal: bigcore2-thermal {
>>> +                     polling-delay-passive = <20>;
>>> +                     polling-delay = <0>;
>>> +                     thermal-sensors = <&tsadc 2>;
>>> +
>>> +                     trips {
>>> +                             bigcore2_alert0: bigcore2-alert0 {
>>> +                                     temperature = <75000>;
>>> +                                     hysteresis = <2000>;
>>> +                                     type = "passive";
>>> +                             };
>>> +                             bigcore2_alert1: bigcore2-alert1 {
>>> +                                     temperature = <85000>;
>>> +                                     hysteresis = <2000>;
>>> +                                     type = "passive";
>>> +                             };
>>> +                             bigcore2_crit: bigcore2-crit {
>>> +                                     temperature = <115000>;
>>> +                                     hysteresis = <0>;
>>> +                                     type = "critical";
>>> +                             };
>>> +                     };
>>> +                     cooling-maps {
>>> +                             map1 {
>>
>> s/map1/mpa0/
> 
> Noted, thanks!
> 
>>> +                                     trip = <&bigcore2_alert1>;
>>> +                                     cooling-device =
>>> +                                             <&cpu_b2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>>> +                                             <&cpu_b3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>>> +                                     contribution = <1024>;
>>> +                             };
>>> +                     };
>>> +             };
>>> +
>>> +             /* sensor between the four A55 cores */
>>> +             little_core_thermal: littlecore-thermal {
>>> +                     polling-delay-passive = <20>;
>>> +                     polling-delay = <0>;
>>> +                     thermal-sensors = <&tsadc 3>;
>>> +
>>> +                     trips {
>>> +                             littlecore_alert0: littlecore-alert0 {
>>> +                                     temperature = <75000>;
>>> +                                     hysteresis = <2000>;
>>> +                                     type = "passive";
>>> +                             };
>>> +                             littlecore_alert1: littlecore-alert1 {
>>> +                                     temperature = <85000>;
>>> +                                     hysteresis = <2000>;
>>> +                                     type = "passive";
>>> +                             };
>>> +                             littlecore_crit: littlecore-crit {
>>> +                                     temperature = <115000>;
>>> +                                     hysteresis = <0>;
>>> +                                     type = "critical";
>>> +                             };
>>> +                     };
>>> +                     cooling-maps {
>>> +                             map2 {
>>
>> s/map2/map0/
> 
> Noted, thanks!
> 
> Best regards,
> Alexey

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/4] arm64: dts: rockchip: enable built-in thermal monitoring on rk3588
  2024-01-25 10:02       ` Daniel Lezcano
@ 2024-01-25 14:46         ` Alexey Charkov
  0 siblings, 0 replies; 34+ messages in thread
From: Alexey Charkov @ 2024-01-25 14:46 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
	Dragan Simic, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel

On Thu, Jan 25, 2024 at 2:02 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 25/01/2024 09:26, Alexey Charkov wrote:
> > On Thu, Jan 25, 2024 at 1:56 AM Daniel Lezcano
> > <daniel.lezcano@linaro.org> wrote:
> >>
> >> On 24/01/2024 21:30, Alexey Charkov wrote:
> >>> Include thermal zones information in device tree for rk3588 variants
> >>
> >> There is an energy model for the CPUs. But finding out the sustainable
> >> power may be a bit tricky. So I suggest to remove everything related to
> >> the power allocator in this change and propose a dedicated change with
> >> all the power configuration (which includes proper k_p* coefficients to
> >> be set from userspace to have a flat mitigation figure).
> >>
> >> That implies removing the "contribution" properties in this description.
> >
> > Alright, I'll just drop those "contribution" properties, thanks!
> >
> >> Some comments below but definitively this version is close to be ok.
> >
> > Yay! :)
> >
> >>> Signed-off-by: Alexey Charkov <alchark@gmail.com>
> >>> ---
> >>>    arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 165 ++++++++++++++++++++++++++++++
> >>>    1 file changed, 165 insertions(+)
> >>>
> >>> diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> >>> index 36b1b7acfe6a..131b9eb21398 100644
> >>> --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> >>> +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> >>> @@ -10,6 +10,7 @@
> >>>    #include <dt-bindings/reset/rockchip,rk3588-cru.h>
> >>>    #include <dt-bindings/phy/phy.h>
> >>>    #include <dt-bindings/ata/ahci.h>
> >>> +#include <dt-bindings/thermal/thermal.h>
> >>>
> >>>    / {
> >>>        compatible = "rockchip,rk3588";
> >>> @@ -2228,6 +2229,170 @@ tsadc: tsadc@fec00000 {
> >>>                status = "disabled";
> >>>        };
> >>>
> >>> +     thermal_zones: thermal-zones {
> >>> +             /* sensor near the center of the whole chip */
> >>> +             package_thermal: package-thermal {
> >>> +                     polling-delay-passive = <0>;
> >>> +                     polling-delay = <0>;
> >>> +                     thermal-sensors = <&tsadc 0>;
> >>> +
> >>> +                     trips {
> >>> +                             package_crit: package-crit {
> >>> +                                     temperature = <115000>;
> >>> +                                     hysteresis = <0>;
> >>> +                                     type = "critical";
> >>> +                             };
> >>> +                     };
> >>> +             };
> >>> +
> >>> +             /* sensor between A76 cores 0 and 1 */
> >>> +             bigcore0_thermal: bigcore0-thermal {
> >>> +                     polling-delay-passive = <20>;
> >>
> >> 20ms seems very short, is this value on purpose? Or just picked up
> >> arbitrarily?
> >
> > Frankly, I simply used the value that Radxa's downstream DTS sets for
> > my board. 100ms seem to work just as well.
> >
> >> If it is possible, perhaps you should profile the temperature of these
> >> thermal zones (CPUs ones). There is a tool in
> >> <linuxdir>/tools/thermal/thermometer to do that.
> >>
> >> You can measure with 10ms sampling rate when running for instance
> >> dhrystone pinned on b0 and b1, then on b2 and b3. And finally on the
> >> small cluster.
> >
> > It seems tricky to isolate the effects from just one of the CPU
> > clusters, as their individual thermal outputs are not that high.
> >
> > For my testing I disabled the fan (but didn't remove the heatsink to
> > avoid wasting the thermal interface tape),
>
> It is ok but the system will have more heat capacity and it will be
> necessary to saturate it before running the tests. IOW warm up the
> system by running thermal stress tests several times.
>
> > and tried loading CPUs with
> > stress-ng. Here are the observations:
>
> Usually I use drhystone to thermal stress the cores (e. one minute).

Hmm, could you please point to the source package or repo to get the
version of dhrystone you use? I could only find the old shar [1] and a
Debian version with added Makefile [2], but neither seems to produce
multiple threads.

It doesn't seem to be packaged for either Gentoo or Fedora unfortunately.

Indeed, I'm getting higher thermal load (vs. stress-ng --cpu) even by
simply compiling kernel sources, although I'd expect it to wait for
memory and/or IO quite a lot.

[1] https://www.netlib.org/benchmark/dhry-c
[2] https://github.com/qris/dhrystone-deb

Thanks a lot,
Alexey

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/4] arm64: dts: rockchip: enable built-in thermal monitoring on rk3588
  2024-01-25  8:26     ` Alexey Charkov
  2024-01-25 10:02       ` Daniel Lezcano
@ 2024-01-25 22:19       ` Dragan Simic
  1 sibling, 0 replies; 34+ messages in thread
From: Dragan Simic @ 2024-01-25 22:19 UTC (permalink / raw)
  To: Alexey Charkov
  Cc: Daniel Lezcano, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiko Stuebner, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel

On 2024-01-25 09:26, Alexey Charkov wrote:
> On Thu, Jan 25, 2024 at 1:56 AM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>> 
>> On 24/01/2024 21:30, Alexey Charkov wrote:
>> > Include thermal zones information in device tree for rk3588 variants
>> 
>> There is an energy model for the CPUs. But finding out the sustainable
>> power may be a bit tricky. So I suggest to remove everything related 
>> to
>> the power allocator in this change and propose a dedicated change with
>> all the power configuration (which includes proper k_p* coefficients 
>> to
>> be set from userspace to have a flat mitigation figure).
>> 
>> That implies removing the "contribution" properties in this 
>> description.
> 
> Alright, I'll just drop those "contribution" properties, thanks!
> 
>> Some comments below but definitively this version is close to be ok.
> 
> Yay! :)
> 
>> > Signed-off-by: Alexey Charkov <alchark@gmail.com>
>> > ---
>> >   arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 165 ++++++++++++++++++++++++++++++
>> >   1 file changed, 165 insertions(+)
>> >
>> > diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> > index 36b1b7acfe6a..131b9eb21398 100644
>> > --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> > +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> > @@ -10,6 +10,7 @@
>> >   #include <dt-bindings/reset/rockchip,rk3588-cru.h>
>> >   #include <dt-bindings/phy/phy.h>
>> >   #include <dt-bindings/ata/ahci.h>
>> > +#include <dt-bindings/thermal/thermal.h>
>> >
>> >   / {
>> >       compatible = "rockchip,rk3588";
>> > @@ -2228,6 +2229,170 @@ tsadc: tsadc@fec00000 {
>> >               status = "disabled";
>> >       };
>> >
>> > +     thermal_zones: thermal-zones {
>> > +             /* sensor near the center of the whole chip */
>> > +             package_thermal: package-thermal {
>> > +                     polling-delay-passive = <0>;
>> > +                     polling-delay = <0>;
>> > +                     thermal-sensors = <&tsadc 0>;
>> > +
>> > +                     trips {
>> > +                             package_crit: package-crit {
>> > +                                     temperature = <115000>;
>> > +                                     hysteresis = <0>;
>> > +                                     type = "critical";
>> > +                             };
>> > +                     };
>> > +             };
>> > +
>> > +             /* sensor between A76 cores 0 and 1 */
>> > +             bigcore0_thermal: bigcore0-thermal {
>> > +                     polling-delay-passive = <20>;
>> 
>> 20ms seems very short, is this value on purpose? Or just picked up
>> arbitrarily?
> 
> Frankly, I simply used the value that Radxa's downstream DTS sets for
> my board. 100ms seem to work just as well.

I think that 100 ms should be just fine (and should lower the resulting
CPU usage a bit), especially because the resolution of the TSADCs is
quite low at 1 oC, according to the RK3588 TRM.  If the resolution were
higher, sampling the temperatures more often might make more sense.

>> If it is possible, perhaps you should profile the temperature of these
>> thermal zones (CPUs ones). There is a tool in
>> <linuxdir>/tools/thermal/thermometer to do that.
>> 
>> You can measure with 10ms sampling rate when running for instance
>> dhrystone pinned on b0 and b1, then on b2 and b3. And finally on the
>> small cluster.
> 
> It seems tricky to isolate the effects from just one of the CPU
> clusters, as their individual thermal outputs are not that high.
> 
> For my testing I disabled the fan (but didn't remove the heatsink to
> avoid wasting the thermal interface tape), and tried loading CPUs with
> stress-ng. Here are the observations:
>  - Little cores alone stressed with 4 threads pegged to them with
> taskset never reach the throttling temperature (85C), and balance out
> around 60C
>  - Either of the big core clusters stressed individually with 2
> threads pegged to them with taskset never reach the throttling
> temperature either
>  - Four big cores with 4 threads heat up very slowly (>30 minutes to
> reach throttling temperature, I didn't have enough patience to let
> them actually reach it - maybe they never do)
>  - Eight cores with 8 threads heat up to the throttling temperature
> within ~5 minutes (again, with the fan off), and then, as soon as just
> one of the big core clusters gets throttled, the temperature of all
> cores balances out just below the throttling threshold. In my
> observation cores 6,7 go from 2.4GHz down to 1.8GHz while the rest
> stay at their respective top performance states (2.4GHz for big cores
> 4,5 and 1.8GHz for little cores 0-3)
> 
> Adding to it the fact that the temperature measurement resolution is
> not very granular (almost 1C) it's somewhat difficult to estimate how
> fast throttling action on a single cluster really brings its
> temperature within bounds, as they all affect each other at relevant
> temperature-load combinations. Perhaps it means that too granular
> polling doesn't add much value.
> 
>> But if you don't have spare time and 20 is ok for you. Then it is fine
>> for me too.
> 
> I guess I'll go for 100 as other upstream Rockchip .dtsi's do, given
> all of the above. Thanks for pointing this out!
> 
>> Some nits below.
>> 
>> > +                     polling-delay = <0>;
>> > +                     thermal-sensors = <&tsadc 1>;
>> > +
>> > +                     trips {
>> > +                             bigcore0_alert0: bigcore0-alert0 {
>> > +                                     temperature = <75000>;
>> > +                                     hysteresis = <2000>;
>> > +                                     type = "passive";
>> > +                             };
>> > +                             bigcore0_alert1: bigcore0-alert1 {
>> > +                                     temperature = <85000>;
>> > +                                     hysteresis = <2000>;
>> > +                                     type = "passive";
>> > +                             };
>> > +                             bigcore0_crit: bigcore0-crit {
>> > +                                     temperature = <115000>;
>> > +                                     hysteresis = <0>;
>> > +                                     type = "critical";
>> > +                             };
>> > +                     };
>> > +                     cooling-maps {
>> > +                             map0 {
>> > +                                     trip = <&bigcore0_alert1>;
>> > +                                     cooling-device =
>> > +                                             <&cpu_b0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>> > +                                             <&cpu_b1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>> > +                                     contribution = <1024>;
>> > +                             };
>> > +                     };
>> > +             };
>> > +
>> > +             /* sensor between A76 cores 2 and 3 */
>> > +             bigcore2_thermal: bigcore2-thermal {
>> > +                     polling-delay-passive = <20>;
>> > +                     polling-delay = <0>;
>> > +                     thermal-sensors = <&tsadc 2>;
>> > +
>> > +                     trips {
>> > +                             bigcore2_alert0: bigcore2-alert0 {
>> > +                                     temperature = <75000>;
>> > +                                     hysteresis = <2000>;
>> > +                                     type = "passive";
>> > +                             };
>> > +                             bigcore2_alert1: bigcore2-alert1 {
>> > +                                     temperature = <85000>;
>> > +                                     hysteresis = <2000>;
>> > +                                     type = "passive";
>> > +                             };
>> > +                             bigcore2_crit: bigcore2-crit {
>> > +                                     temperature = <115000>;
>> > +                                     hysteresis = <0>;
>> > +                                     type = "critical";
>> > +                             };
>> > +                     };
>> > +                     cooling-maps {
>> > +                             map1 {
>> 
>> s/map1/mpa0/
> 
> Noted, thanks!
> 
>> > +                                     trip = <&bigcore2_alert1>;
>> > +                                     cooling-device =
>> > +                                             <&cpu_b2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>> > +                                             <&cpu_b3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>> > +                                     contribution = <1024>;
>> > +                             };
>> > +                     };
>> > +             };
>> > +
>> > +             /* sensor between the four A55 cores */
>> > +             little_core_thermal: littlecore-thermal {
>> > +                     polling-delay-passive = <20>;
>> > +                     polling-delay = <0>;
>> > +                     thermal-sensors = <&tsadc 3>;
>> > +
>> > +                     trips {
>> > +                             littlecore_alert0: littlecore-alert0 {
>> > +                                     temperature = <75000>;
>> > +                                     hysteresis = <2000>;
>> > +                                     type = "passive";
>> > +                             };
>> > +                             littlecore_alert1: littlecore-alert1 {
>> > +                                     temperature = <85000>;
>> > +                                     hysteresis = <2000>;
>> > +                                     type = "passive";
>> > +                             };
>> > +                             littlecore_crit: littlecore-crit {
>> > +                                     temperature = <115000>;
>> > +                                     hysteresis = <0>;
>> > +                                     type = "critical";
>> > +                             };
>> > +                     };
>> > +                     cooling-maps {
>> > +                             map2 {
>> 
>> s/map2/map0/
> 
> Noted, thanks!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/4] arm64: dts: rockchip: enable built-in thermal monitoring on rk3588
  2024-01-24 21:56   ` Daniel Lezcano
  2024-01-25  8:26     ` Alexey Charkov
@ 2024-01-25 13:34     ` Diederik de Haas
  1 sibling, 0 replies; 34+ messages in thread
From: Diederik de Haas @ 2024-01-25 13:34 UTC (permalink / raw)
  To: Alexey Charkov, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiko Stuebner, linux-rockchip
  Cc: Dragan Simic, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel, Daniel Lezcano

[-- Attachment #1: Type: text/plain, Size: 237 bytes --]

On Wednesday, 24 January 2024 22:56:21 CET Daniel Lezcano wrote:
> > +                       cooling-maps {
> > +                               map1 {
> 
> s/map1/mpa0/

FTR: that should be ``s/map1/map0/``
(verified with Daniel Lezcano)

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 3/4] arm64: dts: rockchip: enable temperature driven fan control on Rock 5B
  2024-01-24 20:30 [PATCH 0/4] RK3588 and Rock 5B dts additions: thermal, OPP, rfkill and fan Alexey Charkov
  2024-01-24 20:30 ` [PATCH 1/4] arm64: dts: rockchip: add rfkill node for M.2 Key E WiFi on rock-5b Alexey Charkov
  2024-01-24 20:30 ` [PATCH 2/4] arm64: dts: rockchip: enable built-in thermal monitoring on rk3588 Alexey Charkov
@ 2024-01-24 20:30 ` Alexey Charkov
  2024-01-24 21:59   ` Daniel Lezcano
  2024-01-25 23:13   ` Dragan Simic
  2024-01-24 20:30 ` [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588 Alexey Charkov
  3 siblings, 2 replies; 34+ messages in thread
From: Alexey Charkov @ 2024-01-24 20:30 UTC (permalink / raw)
  To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner
  Cc: Daniel Lezcano, Dragan Simic, devicetree, linux-arm-kernel,
	linux-rockchip, linux-kernel, Alexey Charkov

This enables thermal monitoring on Radxa Rock 5B and links the PWM
fan as an active cooling device managed automatically by the thermal
subsystem, with a target SoC temperature of 55C

Signed-off-by: Alexey Charkov <alchark@gmail.com>
---
 arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
index 9b7bf6cec8bd..c4c94e0b6163 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
@@ -52,7 +52,7 @@ led_rgb_b {
 
 	fan: pwm-fan {
 		compatible = "pwm-fan";
-		cooling-levels = <0 95 145 195 255>;
+		cooling-levels = <0 120 150 180 210 240 255>;
 		fan-supply = <&vcc5v0_sys>;
 		pwms = <&pwm1 0 50000 0>;
 		#cooling-cells = <2>;
@@ -180,6 +180,25 @@ &cpu_l3 {
 	cpu-supply = <&vdd_cpu_lit_s0>;
 };
 
+&package_thermal {
+	polling-delay = <1000>;
+
+	trips {
+		package_fan: package-fan {
+			temperature = <55000>;
+			hysteresis = <2000>;
+			type = "active";
+		};
+	};
+
+	cooling-maps {
+		map-fan {
+			trip = <&package_fan>;
+			cooling-device = <&fan THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+		};
+	};
+};
+
 &i2c0 {
 	pinctrl-names = "default";
 	pinctrl-0 = <&i2c0m2_xfer>;
@@ -738,6 +757,10 @@ regulator-state-mem {
 	};
 };
 
+&tsadc {
+	status = "okay";
+};
+
 &uart2 {
 	pinctrl-0 = <&uart2m0_xfer>;
 	status = "okay";

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/4] arm64: dts: rockchip: enable temperature driven fan control on Rock 5B
  2024-01-24 20:30 ` [PATCH 3/4] arm64: dts: rockchip: enable temperature driven fan control on Rock 5B Alexey Charkov
@ 2024-01-24 21:59   ` Daniel Lezcano
  2024-01-25 23:13   ` Dragan Simic
  1 sibling, 0 replies; 34+ messages in thread
From: Daniel Lezcano @ 2024-01-24 21:59 UTC (permalink / raw)
  To: Alexey Charkov, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiko Stuebner
  Cc: Dragan Simic, devicetree, linux-arm-kernel, linux-rockchip, linux-kernel

On 24/01/2024 21:30, Alexey Charkov wrote:
> This enables thermal monitoring on Radxa Rock 5B and links the PWM
> fan as an active cooling device managed automatically by the thermal
> subsystem, with a target SoC temperature of 55C
> 
> Signed-off-by: Alexey Charkov <alchark@gmail.com>

Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>

> ---
>   arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 25 ++++++++++++++++++++++++-
>   1 file changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> index 9b7bf6cec8bd..c4c94e0b6163 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> +++ b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> @@ -52,7 +52,7 @@ led_rgb_b {
>   
>   	fan: pwm-fan {
>   		compatible = "pwm-fan";
> -		cooling-levels = <0 95 145 195 255>;
> +		cooling-levels = <0 120 150 180 210 240 255>;
>   		fan-supply = <&vcc5v0_sys>;
>   		pwms = <&pwm1 0 50000 0>;
>   		#cooling-cells = <2>;
> @@ -180,6 +180,25 @@ &cpu_l3 {
>   	cpu-supply = <&vdd_cpu_lit_s0>;
>   };
>   
> +&package_thermal {
> +	polling-delay = <1000>;
> +
> +	trips {
> +		package_fan: package-fan {
> +			temperature = <55000>;
> +			hysteresis = <2000>;
> +			type = "active";
> +		};
> +	};
> +
> +	cooling-maps {
> +		map-fan {
> +			trip = <&package_fan>;
> +			cooling-device = <&fan THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> +		};
> +	};
> +};
> +
>   &i2c0 {
>   	pinctrl-names = "default";
>   	pinctrl-0 = <&i2c0m2_xfer>;
> @@ -738,6 +757,10 @@ regulator-state-mem {
>   	};
>   };
>   
> +&tsadc {
> +	status = "okay";
> +};
> +
>   &uart2 {
>   	pinctrl-0 = <&uart2m0_xfer>;
>   	status = "okay";
> 

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/4] arm64: dts: rockchip: enable temperature driven fan control on Rock 5B
  2024-01-24 20:30 ` [PATCH 3/4] arm64: dts: rockchip: enable temperature driven fan control on Rock 5B Alexey Charkov
  2024-01-24 21:59   ` Daniel Lezcano
@ 2024-01-25 23:13   ` Dragan Simic
  2024-01-27 20:27     ` Dragan Simic
  1 sibling, 1 reply; 34+ messages in thread
From: Dragan Simic @ 2024-01-25 23:13 UTC (permalink / raw)
  To: Alexey Charkov
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
	Daniel Lezcano, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel

Hello Alexey,

On 2024-01-24 21:30, Alexey Charkov wrote:
> This enables thermal monitoring on Radxa Rock 5B and links the PWM
> fan as an active cooling device managed automatically by the thermal
> subsystem, with a target SoC temperature of 55C
> 
> Signed-off-by: Alexey Charkov <alchark@gmail.com>
> ---
>  arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 25 
> ++++++++++++++++++++++++-
>  1 file changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> index 9b7bf6cec8bd..c4c94e0b6163 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> +++ b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> @@ -52,7 +52,7 @@ led_rgb_b {
> 
>  	fan: pwm-fan {
>  		compatible = "pwm-fan";
> -		cooling-levels = <0 95 145 195 255>;
> +		cooling-levels = <0 120 150 180 210 240 255>;
>  		fan-supply = <&vcc5v0_sys>;
>  		pwms = <&pwm1 0 50000 0>;
>  		#cooling-cells = <2>;
> @@ -180,6 +180,25 @@ &cpu_l3 {
>  	cpu-supply = <&vdd_cpu_lit_s0>;
>  };
> 
> +&package_thermal {
> +	polling-delay = <1000>;
> +
> +	trips {
> +		package_fan: package-fan {
> +			temperature = <55000>;
> +			hysteresis = <2000>;
> +			type = "active";
> +		};
> +	};
> +
> +	cooling-maps {
> +		map-fan {
> +			trip = <&package_fan>;
> +			cooling-device = <&fan THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> +		};
> +	};
> +};

It should be better to have two new trips and two new cooling maps
defined, instead of having just one trip/map pair, like this:

&package_thermal {
	polling-delay = <1000>;

	trips {
		package_warm: package-warm {
			temperature = <55000>;
			hysteresis = <2000>;
			type = "active";
		};

		package_hot: package-hot {
			temperature = <65000>;
			hysteresis = <2000>;
			type = "active";
		};
	};

	cooling-maps {
		mapX {
			trip = <&package_warm>;
			cooling-device = <&fan THERMAL_NO_LIMIT 1>;
		};

		mapY {
			trip = <&package_hot>;
			cooling-device = <&fan 2 THERMAL_NO_LIMIT>;
		};
	};
};

The idea behind this approach is to keep the fan spinning at the lowest
available speed until the package temperature reaches the second trip's
temperature level, at which point the fan starts ramping up.  An 
approach
like this is already employed by the Pine64 RockPro64 SBC.

This way, we'll be doing our best to keep the fan noise down;  of 
course,
it will depend on the particular heatsink and fan combo how long the fan
can be kept at the lowest speed, but we should aim at supporting as many
different cooling setups as possible, and as well as possible, out of 
the
box and with no additional tweaking required.

Please notice "mapX" and "mapY" as the names of the additional cooling 
maps,
where X and Y are simply the next lowest available indices, which is 
pretty
much the usual way to name the additional cooling maps.

>  &i2c0 {
>  	pinctrl-names = "default";
>  	pinctrl-0 = <&i2c0m2_xfer>;
> @@ -738,6 +757,10 @@ regulator-state-mem {
>  	};
>  };
> 
> +&tsadc {
> +	status = "okay";
> +};
> +
>  &uart2 {
>  	pinctrl-0 = <&uart2m0_xfer>;
>  	status = "okay";

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/4] arm64: dts: rockchip: enable temperature driven fan control on Rock 5B
  2024-01-25 23:13   ` Dragan Simic
@ 2024-01-27 20:27     ` Dragan Simic
  2024-01-28 20:08       ` Alexey Charkov
  0 siblings, 1 reply; 34+ messages in thread
From: Dragan Simic @ 2024-01-27 20:27 UTC (permalink / raw)
  To: Alexey Charkov
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
	Daniel Lezcano, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel

Hello Alexey,

On 2024-01-26 00:13, Dragan Simic wrote:
> On 2024-01-24 21:30, Alexey Charkov wrote:
>> This enables thermal monitoring on Radxa Rock 5B and links the PWM
>> fan as an active cooling device managed automatically by the thermal
>> subsystem, with a target SoC temperature of 55C
>> 
>> Signed-off-by: Alexey Charkov <alchark@gmail.com>
>> ---
>>  arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 25 
>> ++++++++++++++++++++++++-
>>  1 file changed, 24 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
>> b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
>> index 9b7bf6cec8bd..c4c94e0b6163 100644
>> --- a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
>> +++ b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
>> @@ -52,7 +52,7 @@ led_rgb_b {
>> 
>>  	fan: pwm-fan {
>>  		compatible = "pwm-fan";
>> -		cooling-levels = <0 95 145 195 255>;
>> +		cooling-levels = <0 120 150 180 210 240 255>;
>>  		fan-supply = <&vcc5v0_sys>;
>>  		pwms = <&pwm1 0 50000 0>;
>>  		#cooling-cells = <2>;
>> @@ -180,6 +180,25 @@ &cpu_l3 {
>>  	cpu-supply = <&vdd_cpu_lit_s0>;
>>  };
>> 
>> +&package_thermal {
>> +	polling-delay = <1000>;
>> +
>> +	trips {
>> +		package_fan: package-fan {
>> +			temperature = <55000>;
>> +			hysteresis = <2000>;
>> +			type = "active";
>> +		};
>> +	};
>> +
>> +	cooling-maps {
>> +		map-fan {
>> +			trip = <&package_fan>;
>> +			cooling-device = <&fan THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>> +		};
>> +	};
>> +};
> 
> It should be better to have two new trips and two new cooling maps
> defined, instead of having just one trip/map pair, like this:
> 
> &package_thermal {
> 	polling-delay = <1000>;
> 
> 	trips {
> 		package_warm: package-warm {
> 			temperature = <55000>;
> 			hysteresis = <2000>;
> 			type = "active";
> 		};
> 
> 		package_hot: package-hot {
> 			temperature = <65000>;
> 			hysteresis = <2000>;
> 			type = "active";
> 		};
> 	};
> 
> 	cooling-maps {
> 		mapX {
> 			trip = <&package_warm>;
> 			cooling-device = <&fan THERMAL_NO_LIMIT 1>;
> 		};
> 
> 		mapY {
> 			trip = <&package_hot>;
> 			cooling-device = <&fan 2 THERMAL_NO_LIMIT>;
> 		};
> 	};
> };
> 
> The idea behind this approach is to keep the fan spinning at the lowest
> available speed until the package temperature reaches the second trip's
> temperature level, at which point the fan starts ramping up.  An 
> approach
> like this is already employed by the Pine64 RockPro64 SBC.
> 
> This way, we'll be doing our best to keep the fan noise down;  of 
> course,
> it will depend on the particular heatsink and fan combo how long the 
> fan
> can be kept at the lowest speed, but we should aim at supporting as 
> many
> different cooling setups as possible, and as well as possible, out of 
> the
> box and with no additional tweaking required.
> 
> Please notice "mapX" and "mapY" as the names of the additional cooling 
> maps,
> where X and Y are simply the next lowest available indices, which is 
> pretty
> much the usual way to name the additional cooling maps.

Just checking, have you seen this?  Quite a few messages were exchanged
on the same day, so just wanted to make sure you didn't miss this one.

>>  &i2c0 {
>>  	pinctrl-names = "default";
>>  	pinctrl-0 = <&i2c0m2_xfer>;
>> @@ -738,6 +757,10 @@ regulator-state-mem {
>>  	};
>>  };
>> 
>> +&tsadc {
>> +	status = "okay";
>> +};
>> +
>>  &uart2 {
>>  	pinctrl-0 = <&uart2m0_xfer>;
>>  	status = "okay";

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/4] arm64: dts: rockchip: enable temperature driven fan control on Rock 5B
  2024-01-27 20:27     ` Dragan Simic
@ 2024-01-28 20:08       ` Alexey Charkov
  2024-01-29  0:46         ` Dragan Simic
  0 siblings, 1 reply; 34+ messages in thread
From: Alexey Charkov @ 2024-01-28 20:08 UTC (permalink / raw)
  To: Dragan Simic
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
	Daniel Lezcano, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel

On Sun, Jan 28, 2024 at 12:27 AM Dragan Simic <dsimic@manjaro.org> wrote:
>
> Hello Alexey,

Hello Dragan,

> On 2024-01-26 00:13, Dragan Simic wrote:
> > On 2024-01-24 21:30, Alexey Charkov wrote:
> >> This enables thermal monitoring on Radxa Rock 5B and links the PWM
> >> fan as an active cooling device managed automatically by the thermal
> >> subsystem, with a target SoC temperature of 55C
> >>
> >> Signed-off-by: Alexey Charkov <alchark@gmail.com>
> >> ---
> >>  arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 25
> >> ++++++++++++++++++++++++-
> >>  1 file changed, 24 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> >> b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> >> index 9b7bf6cec8bd..c4c94e0b6163 100644
> >> --- a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> >> +++ b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> >> @@ -52,7 +52,7 @@ led_rgb_b {
> >>
> >>      fan: pwm-fan {
> >>              compatible = "pwm-fan";
> >> -            cooling-levels = <0 95 145 195 255>;
> >> +            cooling-levels = <0 120 150 180 210 240 255>;
> >>              fan-supply = <&vcc5v0_sys>;
> >>              pwms = <&pwm1 0 50000 0>;
> >>              #cooling-cells = <2>;
> >> @@ -180,6 +180,25 @@ &cpu_l3 {
> >>      cpu-supply = <&vdd_cpu_lit_s0>;
> >>  };
> >>
> >> +&package_thermal {
> >> +    polling-delay = <1000>;
> >> +
> >> +    trips {
> >> +            package_fan: package-fan {
> >> +                    temperature = <55000>;
> >> +                    hysteresis = <2000>;
> >> +                    type = "active";
> >> +            };
> >> +    };
> >> +
> >> +    cooling-maps {
> >> +            map-fan {
> >> +                    trip = <&package_fan>;
> >> +                    cooling-device = <&fan THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> >> +            };
> >> +    };
> >> +};
> >
> > It should be better to have two new trips and two new cooling maps
> > defined, instead of having just one trip/map pair, like this:
> >
> > &package_thermal {
> >       polling-delay = <1000>;
> >
> >       trips {
> >               package_warm: package-warm {
> >                       temperature = <55000>;
> >                       hysteresis = <2000>;
> >                       type = "active";
> >               };
> >
> >               package_hot: package-hot {
> >                       temperature = <65000>;
> >                       hysteresis = <2000>;
> >                       type = "active";
> >               };
> >       };
> >
> >       cooling-maps {
> >               mapX {
> >                       trip = <&package_warm>;
> >                       cooling-device = <&fan THERMAL_NO_LIMIT 1>;
> >               };
> >
> >               mapY {
> >                       trip = <&package_hot>;
> >                       cooling-device = <&fan 2 THERMAL_NO_LIMIT>;
> >               };
> >       };
> > };
> >
> > The idea behind this approach is to keep the fan spinning at the lowest
> > available speed until the package temperature reaches the second trip's
> > temperature level, at which point the fan starts ramping up.  An
> > approach
> > like this is already employed by the Pine64 RockPro64 SBC.
> >
> > This way, we'll be doing our best to keep the fan noise down;  of
> > course,
> > it will depend on the particular heatsink and fan combo how long the
> > fan
> > can be kept at the lowest speed, but we should aim at supporting as
> > many
> > different cooling setups as possible, and as well as possible, out of
> > the
> > box and with no additional tweaking required.
> >
> > Please notice "mapX" and "mapY" as the names of the additional cooling
> > maps,
> > where X and Y are simply the next lowest available indices, which is
> > pretty
> > much the usual way to name the additional cooling maps.
>
> Just checking, have you seen this?  Quite a few messages were exchanged
> on the same day, so just wanted to make sure you didn't miss this one.

Yes, thank you for pointing it out and following up.

I've been testing different setups to get my thoughts together on this
one. Long story short, your suggested setup indeed makes the system
quieter most of the time while still being safely far from hitting the
throttling threshold, though it appears that the main influence is
from the higher temperature value in the second trip (after which the
fan accelerates) rather than from the presence of the first trip and
the corresponding cooling map capped at the minimum-speed fan action.

In my observation, the system rarely crosses the 55C threshold under
partial load, and when the load is high (e.g. compiling stuff with 8
concurrent jobs) it takes ~2 seconds to go from below the first trip
point to above the second trip point, so the fan doesn't really get
the chance to stay at its leisurely first state.

So frankly I'm inclined to leave one trip point here, and simply
change its temperature threshold from 55C to 65C - just to keep it
simple.

What do you think?

Best regards,
Alexey

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/4] arm64: dts: rockchip: enable temperature driven fan control on Rock 5B
  2024-01-28 20:08       ` Alexey Charkov
@ 2024-01-29  0:46         ` Dragan Simic
  0 siblings, 0 replies; 34+ messages in thread
From: Dragan Simic @ 2024-01-29  0:46 UTC (permalink / raw)
  To: Alexey Charkov
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
	Daniel Lezcano, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel

Hello Alexey,

On 2024-01-28 21:08, Alexey Charkov wrote:
> On Sun, Jan 28, 2024 at 12:27 AM Dragan Simic <dsimic@manjaro.org> 
> wrote:
>> On 2024-01-26 00:13, Dragan Simic wrote:
>> > On 2024-01-24 21:30, Alexey Charkov wrote:
>> >> This enables thermal monitoring on Radxa Rock 5B and links the PWM
>> >> fan as an active cooling device managed automatically by the thermal
>> >> subsystem, with a target SoC temperature of 55C
>> >>
>> >> Signed-off-by: Alexey Charkov <alchark@gmail.com>
>> >> ---
>> >>  arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 25
>> >> ++++++++++++++++++++++++-
>> >>  1 file changed, 24 insertions(+), 1 deletion(-)
>> >>
>> >> diff --git a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
>> >> b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
>> >> index 9b7bf6cec8bd..c4c94e0b6163 100644
>> >> --- a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
>> >> +++ b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
>> >> @@ -52,7 +52,7 @@ led_rgb_b {
>> >>
>> >>      fan: pwm-fan {
>> >>              compatible = "pwm-fan";
>> >> -            cooling-levels = <0 95 145 195 255>;
>> >> +            cooling-levels = <0 120 150 180 210 240 255>;
>> >>              fan-supply = <&vcc5v0_sys>;
>> >>              pwms = <&pwm1 0 50000 0>;
>> >>              #cooling-cells = <2>;
>> >> @@ -180,6 +180,25 @@ &cpu_l3 {
>> >>      cpu-supply = <&vdd_cpu_lit_s0>;
>> >>  };
>> >>
>> >> +&package_thermal {
>> >> +    polling-delay = <1000>;
>> >> +
>> >> +    trips {
>> >> +            package_fan: package-fan {
>> >> +                    temperature = <55000>;
>> >> +                    hysteresis = <2000>;
>> >> +                    type = "active";
>> >> +            };
>> >> +    };
>> >> +
>> >> +    cooling-maps {
>> >> +            map-fan {
>> >> +                    trip = <&package_fan>;
>> >> +                    cooling-device = <&fan THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>> >> +            };
>> >> +    };
>> >> +};
>> >
>> > It should be better to have two new trips and two new cooling maps
>> > defined, instead of having just one trip/map pair, like this:
>> >
>> > &package_thermal {
>> >       polling-delay = <1000>;
>> >
>> >       trips {
>> >               package_warm: package-warm {
>> >                       temperature = <55000>;
>> >                       hysteresis = <2000>;
>> >                       type = "active";
>> >               };
>> >
>> >               package_hot: package-hot {
>> >                       temperature = <65000>;
>> >                       hysteresis = <2000>;
>> >                       type = "active";
>> >               };
>> >       };
>> >
>> >       cooling-maps {
>> >               mapX {
>> >                       trip = <&package_warm>;
>> >                       cooling-device = <&fan THERMAL_NO_LIMIT 1>;
>> >               };
>> >
>> >               mapY {
>> >                       trip = <&package_hot>;
>> >                       cooling-device = <&fan 2 THERMAL_NO_LIMIT>;
>> >               };
>> >       };
>> > };
>> >
>> > The idea behind this approach is to keep the fan spinning at the lowest
>> > available speed until the package temperature reaches the second trip's
>> > temperature level, at which point the fan starts ramping up.  An
>> > approach
>> > like this is already employed by the Pine64 RockPro64 SBC.
>> >
>> > This way, we'll be doing our best to keep the fan noise down;  of
>> > course, it will depend on the particular heatsink and fan combo how
>> > long the fan can be kept at the lowest speed, but we should aim at
>> > supporting as many different cooling setups as possible, and as
>> > well as possible, out of the box and with no additional tweaking
>> > required.
>> >
>> > Please notice "mapX" and "mapY" as the names of the additional
>> > cooling maps, where X and Y are simply the next lowest available
>> > indices, which is pretty much the usual way to name the additional
>> > cooling maps.
>> 
>> Just checking, have you seen this?  Quite a few messages were 
>> exchanged
>> on the same day, so just wanted to make sure you didn't miss this one.
> 
> Yes, thank you for pointing it out and following up.
> 
> I've been testing different setups to get my thoughts together on this
> one. Long story short, your suggested setup indeed makes the system
> quieter most of the time while still being safely far from hitting the
> throttling threshold, though it appears that the main influence is
> from the higher temperature value in the second trip (after which the
> fan accelerates) rather than from the presence of the first trip and
> the corresponding cooling map capped at the minimum-speed fan action.

Thank you for testing all this!

I see, but having a higher temperature defined in the second active
thermal trip is exactly the trick that should make cooling setups
  more quiet.  More precisely, the intention is to define a dual-trip
configuration that should make as many different active cooling setups
as quiet as possible, simply because some active cooling setups (and
some CPU loads) may result in crossing the second trip's temperature
less frequently than with the other setups.

> In my observation, the system rarely crosses the 55C threshold under
> partial load, and when the load is high (e.g. compiling stuff with 8
> concurrent jobs) it takes ~2 seconds to go from below the first trip
> point to above the second trip point, so the fan doesn't really get
> the chance to stay at its leisurely first state.
> 
> So frankly I'm inclined to leave one trip point here, and simply
> change its temperature threshold from 55C to 65C - just to keep it
> simple.
> 
> What do you think?

I'd much rather have two active thermal trips defined, simply because
a beefier heatsink, with much larger fin surface, may completely change
the behavior of the Rock 5B's active cooling.  Radxa already sells a
much larger heatsink, linked below, to which a fan can rather easily be
attached, or a fan can be used to provide some airflow inside the case
into which the board is mounted.

- 
https://shop.allnetchina.cn/products/rock5-b-passive-heat-sink?variant=39895250239590

In other words, having two active thermal trips at 55 oC and 65 oC does
not hurt the acoustics and the thermal performance of the active
cooling setup you're using, compared with having just one active trip
at 65 oC, while the dual-trip configuration can help with other active
cooling setups that are different from yours.  It's a win-win.

I hope you agree.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
  2024-01-24 20:30 [PATCH 0/4] RK3588 and Rock 5B dts additions: thermal, OPP, rfkill and fan Alexey Charkov
                   ` (2 preceding siblings ...)
  2024-01-24 20:30 ` [PATCH 3/4] arm64: dts: rockchip: enable temperature driven fan control on Rock 5B Alexey Charkov
@ 2024-01-24 20:30 ` Alexey Charkov
  2024-01-25  9:30   ` Daniel Lezcano
  3 siblings, 1 reply; 34+ messages in thread
From: Alexey Charkov @ 2024-01-24 20:30 UTC (permalink / raw)
  To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner
  Cc: Daniel Lezcano, Dragan Simic, devicetree, linux-arm-kernel,
	linux-rockchip, linux-kernel, Alexey Charkov

By default the CPUs on RK3588 start up in a conservative performance
mode. Add frequency and voltage mappings to the device tree to enable
dynamic scaling via cpufreq

Signed-off-by: Alexey Charkov <alchark@gmail.com>
---
 arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 209 ++++++++++++++++++++++++++++++
 1 file changed, 209 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
index 131b9eb21398..e605be531a0f 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
@@ -97,6 +97,7 @@ cpu_l0: cpu@0 {
 			clocks = <&scmi_clk SCMI_CLK_CPUL>;
 			assigned-clocks = <&scmi_clk SCMI_CLK_CPUL>;
 			assigned-clock-rates = <816000000>;
+			operating-points-v2 = <&cluster0_opp_table>;
 			cpu-idle-states = <&CPU_SLEEP>;
 			i-cache-size = <32768>;
 			i-cache-line-size = <64>;
@@ -116,6 +117,7 @@ cpu_l1: cpu@100 {
 			enable-method = "psci";
 			capacity-dmips-mhz = <530>;
 			clocks = <&scmi_clk SCMI_CLK_CPUL>;
+			operating-points-v2 = <&cluster0_opp_table>;
 			cpu-idle-states = <&CPU_SLEEP>;
 			i-cache-size = <32768>;
 			i-cache-line-size = <64>;
@@ -135,6 +137,7 @@ cpu_l2: cpu@200 {
 			enable-method = "psci";
 			capacity-dmips-mhz = <530>;
 			clocks = <&scmi_clk SCMI_CLK_CPUL>;
+			operating-points-v2 = <&cluster0_opp_table>;
 			cpu-idle-states = <&CPU_SLEEP>;
 			i-cache-size = <32768>;
 			i-cache-line-size = <64>;
@@ -154,6 +157,7 @@ cpu_l3: cpu@300 {
 			enable-method = "psci";
 			capacity-dmips-mhz = <530>;
 			clocks = <&scmi_clk SCMI_CLK_CPUL>;
+			operating-points-v2 = <&cluster0_opp_table>;
 			cpu-idle-states = <&CPU_SLEEP>;
 			i-cache-size = <32768>;
 			i-cache-line-size = <64>;
@@ -175,6 +179,7 @@ cpu_b0: cpu@400 {
 			clocks = <&scmi_clk SCMI_CLK_CPUB01>;
 			assigned-clocks = <&scmi_clk SCMI_CLK_CPUB01>;
 			assigned-clock-rates = <816000000>;
+			operating-points-v2 = <&cluster1_opp_table>;
 			cpu-idle-states = <&CPU_SLEEP>;
 			i-cache-size = <65536>;
 			i-cache-line-size = <64>;
@@ -194,6 +199,7 @@ cpu_b1: cpu@500 {
 			enable-method = "psci";
 			capacity-dmips-mhz = <1024>;
 			clocks = <&scmi_clk SCMI_CLK_CPUB01>;
+			operating-points-v2 = <&cluster1_opp_table>;
 			cpu-idle-states = <&CPU_SLEEP>;
 			i-cache-size = <65536>;
 			i-cache-line-size = <64>;
@@ -215,6 +221,7 @@ cpu_b2: cpu@600 {
 			clocks = <&scmi_clk SCMI_CLK_CPUB23>;
 			assigned-clocks = <&scmi_clk SCMI_CLK_CPUB23>;
 			assigned-clock-rates = <816000000>;
+			operating-points-v2 = <&cluster2_opp_table>;
 			cpu-idle-states = <&CPU_SLEEP>;
 			i-cache-size = <65536>;
 			i-cache-line-size = <64>;
@@ -234,6 +241,7 @@ cpu_b3: cpu@700 {
 			enable-method = "psci";
 			capacity-dmips-mhz = <1024>;
 			clocks = <&scmi_clk SCMI_CLK_CPUB23>;
+			operating-points-v2 = <&cluster2_opp_table>;
 			cpu-idle-states = <&CPU_SLEEP>;
 			i-cache-size = <65536>;
 			i-cache-line-size = <64>;
@@ -348,6 +356,207 @@ l3_cache: l3-cache {
 		};
 	};
 
+	cluster0_opp_table: opp-table-cluster0 {
+		compatible = "operating-points-v2";
+		opp-shared;
+
+		opp-408000000 {
+			opp-hz = /bits/ 64 <408000000>;
+			opp-microvolt = <675000 675000 950000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-600000000 {
+			opp-hz = /bits/ 64 <600000000>;
+			opp-microvolt = <675000 675000 950000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-816000000 {
+			opp-hz = /bits/ 64 <816000000>;
+			opp-microvolt = <675000 675000 950000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-1008000000 {
+			opp-hz = /bits/ 64 <1008000000>;
+			opp-microvolt = <675000 675000 950000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-1200000000 {
+			opp-hz = /bits/ 64 <1200000000>;
+			opp-microvolt = <712500 712500 950000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-1416000000 {
+			opp-hz = /bits/ 64 <1416000000>;
+			opp-microvolt = <762500 762500 950000>;
+			clock-latency-ns = <40000>;
+			opp-suspend;
+		};
+		opp-1608000000 {
+			opp-hz = /bits/ 64 <1608000000>;
+			opp-microvolt = <850000 850000 950000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-1800000000 {
+			opp-hz = /bits/ 64 <1800000000>;
+			opp-microvolt = <950000 950000 950000>;
+			clock-latency-ns = <40000>;
+		};
+	};
+
+	cluster1_opp_table: opp-table-cluster1 {
+		compatible = "operating-points-v2";
+		opp-shared;
+
+		opp-408000000 {
+			opp-hz = /bits/ 64 <408000000>;
+			opp-microvolt = <675000 675000 1000000>;
+			clock-latency-ns = <40000>;
+			opp-suspend;
+		};
+		opp-600000000 {
+			opp-hz = /bits/ 64 <600000000>;
+			opp-microvolt = <675000 675000 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-816000000 {
+			opp-hz = /bits/ 64 <816000000>;
+			opp-microvolt = <675000 675000 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-1008000000 {
+			opp-hz = /bits/ 64 <1008000000>;
+			opp-microvolt = <675000 675000 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-1200000000 {
+			opp-hz = /bits/ 64 <1200000000>;
+			opp-microvolt = <675000 675000 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-1416000000 {
+			opp-hz = /bits/ 64 <1416000000>;
+			opp-microvolt = <725000 725000 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-1608000000 {
+			opp-hz = /bits/ 64 <1608000000>;
+			opp-microvolt = <762500 762500 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-1800000000 {
+			opp-hz = /bits/ 64 <1800000000>;
+			opp-microvolt = <850000 850000 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-2016000000 {
+			opp-hz = /bits/ 64 <2016000000>;
+			opp-microvolt = <925000 925000 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-2208000000 {
+			opp-hz = /bits/ 64 <2208000000>;
+			opp-microvolt = <987500 987500 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-2256000000 {
+			opp-hz = /bits/ 64 <2256000000>;
+			opp-microvolt = <1000000 1000000 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-2304000000 {
+			opp-hz = /bits/ 64 <2304000000>;
+			opp-microvolt = <1000000 1000000 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-2352000000 {
+			opp-hz = /bits/ 64 <2352000000>;
+			opp-microvolt = <1000000 1000000 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-2400000000 {
+			opp-hz = /bits/ 64 <2400000000>;
+			opp-microvolt = <1000000 1000000 1000000>;
+			clock-latency-ns = <40000>;
+		};
+	};
+
+	cluster2_opp_table: opp-table-cluster2 {
+		compatible = "operating-points-v2";
+		opp-shared;
+
+		opp-408000000 {
+			opp-hz = /bits/ 64 <408000000>;
+			opp-microvolt = <675000 675000 1000000>;
+			clock-latency-ns = <40000>;
+			opp-suspend;
+		};
+		opp-600000000 {
+			opp-hz = /bits/ 64 <600000000>;
+			opp-microvolt = <675000 675000 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-816000000 {
+			opp-hz = /bits/ 64 <816000000>;
+			opp-microvolt = <675000 675000 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-1008000000 {
+			opp-hz = /bits/ 64 <1008000000>;
+			opp-microvolt = <675000 675000 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-1200000000 {
+			opp-hz = /bits/ 64 <1200000000>;
+			opp-microvolt = <675000 675000 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-1416000000 {
+			opp-hz = /bits/ 64 <1416000000>;
+			opp-microvolt = <725000 725000 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-1608000000 {
+			opp-hz = /bits/ 64 <1608000000>;
+			opp-microvolt = <762500 762500 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-1800000000 {
+			opp-hz = /bits/ 64 <1800000000>;
+			opp-microvolt = <850000 850000 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-2016000000 {
+			opp-hz = /bits/ 64 <2016000000>;
+			opp-microvolt = <925000 925000 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-2208000000 {
+			opp-hz = /bits/ 64 <2208000000>;
+			opp-microvolt = <987500 987500 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-2256000000 {
+			opp-hz = /bits/ 64 <2256000000>;
+			opp-microvolt = <1000000 1000000 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-2304000000 {
+			opp-hz = /bits/ 64 <2304000000>;
+			opp-microvolt = <1000000 1000000 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-2352000000 {
+			opp-hz = /bits/ 64 <2352000000>;
+			opp-microvolt = <1000000 1000000 1000000>;
+			clock-latency-ns = <40000>;
+		};
+		opp-2400000000 {
+			opp-hz = /bits/ 64 <2400000000>;
+			opp-microvolt = <1000000 1000000 1000000>;
+			clock-latency-ns = <40000>;
+		};
+	};
+
 	firmware {
 		optee: optee {
 			compatible = "linaro,optee-tz";

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
  2024-01-24 20:30 ` [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588 Alexey Charkov
@ 2024-01-25  9:30   ` Daniel Lezcano
  2024-01-25 10:17     ` Alexey Charkov
  2024-01-26  6:32     ` Dragan Simic
  0 siblings, 2 replies; 34+ messages in thread
From: Daniel Lezcano @ 2024-01-25  9:30 UTC (permalink / raw)
  To: Alexey Charkov, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiko Stuebner
  Cc: Dragan Simic, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel, Viresh Kumar


Hi Alexey,

Adding Viresh

On 24/01/2024 21:30, Alexey Charkov wrote:
> By default the CPUs on RK3588 start up in a conservative performance
> mode. Add frequency and voltage mappings to the device tree to enable
> dynamic scaling via cpufreq
> 
> Signed-off-by: Alexey Charkov <alchark@gmail.com>
> ---
>   arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 209 ++++++++++++++++++++++++++++++
>   1 file changed, 209 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> index 131b9eb21398..e605be531a0f 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> @@ -97,6 +97,7 @@ cpu_l0: cpu@0 {
>   			clocks = <&scmi_clk SCMI_CLK_CPUL>;
>   			assigned-clocks = <&scmi_clk SCMI_CLK_CPUL>;
>   			assigned-clock-rates = <816000000>;
> +			operating-points-v2 = <&cluster0_opp_table>;
>   			cpu-idle-states = <&CPU_SLEEP>;
>   			i-cache-size = <32768>;
>   			i-cache-line-size = <64>;
> @@ -116,6 +117,7 @@ cpu_l1: cpu@100 {
>   			enable-method = "psci";
>   			capacity-dmips-mhz = <530>;
>   			clocks = <&scmi_clk SCMI_CLK_CPUL>;
> +			operating-points-v2 = <&cluster0_opp_table>;
>   			cpu-idle-states = <&CPU_SLEEP>;
>   			i-cache-size = <32768>;
>   			i-cache-line-size = <64>;
> @@ -135,6 +137,7 @@ cpu_l2: cpu@200 {
>   			enable-method = "psci";
>   			capacity-dmips-mhz = <530>;
>   			clocks = <&scmi_clk SCMI_CLK_CPUL>;
> +			operating-points-v2 = <&cluster0_opp_table>;
>   			cpu-idle-states = <&CPU_SLEEP>;
>   			i-cache-size = <32768>;
>   			i-cache-line-size = <64>;
> @@ -154,6 +157,7 @@ cpu_l3: cpu@300 {
>   			enable-method = "psci";
>   			capacity-dmips-mhz = <530>;
>   			clocks = <&scmi_clk SCMI_CLK_CPUL>;
> +			operating-points-v2 = <&cluster0_opp_table>;
>   			cpu-idle-states = <&CPU_SLEEP>;
>   			i-cache-size = <32768>;
>   			i-cache-line-size = <64>;
> @@ -175,6 +179,7 @@ cpu_b0: cpu@400 {
>   			clocks = <&scmi_clk SCMI_CLK_CPUB01>;
>   			assigned-clocks = <&scmi_clk SCMI_CLK_CPUB01>;
>   			assigned-clock-rates = <816000000>;
> +			operating-points-v2 = <&cluster1_opp_table>;
>   			cpu-idle-states = <&CPU_SLEEP>;
>   			i-cache-size = <65536>;
>   			i-cache-line-size = <64>;
> @@ -194,6 +199,7 @@ cpu_b1: cpu@500 {
>   			enable-method = "psci";
>   			capacity-dmips-mhz = <1024>;
>   			clocks = <&scmi_clk SCMI_CLK_CPUB01>;
> +			operating-points-v2 = <&cluster1_opp_table>;
>   			cpu-idle-states = <&CPU_SLEEP>;
>   			i-cache-size = <65536>;
>   			i-cache-line-size = <64>;
> @@ -215,6 +221,7 @@ cpu_b2: cpu@600 {
>   			clocks = <&scmi_clk SCMI_CLK_CPUB23>;
>   			assigned-clocks = <&scmi_clk SCMI_CLK_CPUB23>;
>   			assigned-clock-rates = <816000000>;
> +			operating-points-v2 = <&cluster2_opp_table>;
>   			cpu-idle-states = <&CPU_SLEEP>;
>   			i-cache-size = <65536>;
>   			i-cache-line-size = <64>;
> @@ -234,6 +241,7 @@ cpu_b3: cpu@700 {
>   			enable-method = "psci";
>   			capacity-dmips-mhz = <1024>;
>   			clocks = <&scmi_clk SCMI_CLK_CPUB23>;
> +			operating-points-v2 = <&cluster2_opp_table>;
>   			cpu-idle-states = <&CPU_SLEEP>;
>   			i-cache-size = <65536>;
>   			i-cache-line-size = <64>;
> @@ -348,6 +356,207 @@ l3_cache: l3-cache {
>   		};
>   	};
>   
> +	cluster0_opp_table: opp-table-cluster0 {
> +		compatible = "operating-points-v2";
> +		opp-shared;
> +
> +		opp-408000000 {
> +			opp-hz = /bits/ 64 <408000000>;
> +			opp-microvolt = <675000 675000 950000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-600000000 {
> +			opp-hz = /bits/ 64 <600000000>;
> +			opp-microvolt = <675000 675000 950000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-816000000 {
> +			opp-hz = /bits/ 64 <816000000>;
> +			opp-microvolt = <675000 675000 950000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-1008000000 {
> +			opp-hz = /bits/ 64 <1008000000>;
> +			opp-microvolt = <675000 675000 950000>;
> +			clock-latency-ns = <40000>;
> +		};

It is not useful to introduce OPP with the same voltage. There is no 
gain in terms of energy efficiency as the compute capacity is linearly 
tied with power consumption (P=CxFxV²) in this case.

For example, opp-408 consumes 2 bogoWatts and opp-816 consumes 4 
bogoWatts (because of the same voltage).

For a workload, opp-408 takes 10 sec and opp-816 takes 5 sec because it 
is twice faster.

The energy consumption is:

opp-408 = 10 x 2 = 20 BogoJoules
opp-816 = 5 x 4 = 20 BogoJoules


> +		opp-1200000000 {
> +			opp-hz = /bits/ 64 <1200000000>;
> +			opp-microvolt = <712500 712500 950000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-1416000000 {
> +			opp-hz = /bits/ 64 <1416000000>;
> +			opp-microvolt = <762500 762500 950000>;
> +			clock-latency-ns = <40000>;
> +			opp-suspend;
> +		};
> +		opp-1608000000 {
> +			opp-hz = /bits/ 64 <1608000000>;
> +			opp-microvolt = <850000 850000 950000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-1800000000 {
> +			opp-hz = /bits/ 64 <1800000000>;
> +			opp-microvolt = <950000 950000 950000>;
> +			clock-latency-ns = <40000>;
> +		};
> +	};
> +
> +	cluster1_opp_table: opp-table-cluster1 {
> +		compatible = "operating-points-v2";
> +		opp-shared;
> +
> +		opp-408000000 {
> +			opp-hz = /bits/ 64 <408000000>;
> +			opp-microvolt = <675000 675000 1000000>;
> +			clock-latency-ns = <40000>;
> +			opp-suspend;
> +		};
> +		opp-600000000 {
> +			opp-hz = /bits/ 64 <600000000>;
> +			opp-microvolt = <675000 675000 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-816000000 {
> +			opp-hz = /bits/ 64 <816000000>;
> +			opp-microvolt = <675000 675000 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-1008000000 {
> +			opp-hz = /bits/ 64 <1008000000>;
> +			opp-microvolt = <675000 675000 1000000>;
> +			clock-latency-ns = <40000>;
> +		};

same comment

> +		opp-1200000000 {
> +			opp-hz = /bits/ 64 <1200000000>;
> +			opp-microvolt = <675000 675000 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-1416000000 {
> +			opp-hz = /bits/ 64 <1416000000>;
> +			opp-microvolt = <725000 725000 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-1608000000 {
> +			opp-hz = /bits/ 64 <1608000000>;
> +			opp-microvolt = <762500 762500 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-1800000000 {
> +			opp-hz = /bits/ 64 <1800000000>;
> +			opp-microvolt = <850000 850000 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-2016000000 {
> +			opp-hz = /bits/ 64 <2016000000>;
> +			opp-microvolt = <925000 925000 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-2208000000 {
> +			opp-hz = /bits/ 64 <2208000000>;
> +			opp-microvolt = <987500 987500 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-2256000000 {
> +			opp-hz = /bits/ 64 <2256000000>;
> +			opp-microvolt = <1000000 1000000 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-2304000000 {
> +			opp-hz = /bits/ 64 <2304000000>;
> +			opp-microvolt = <1000000 1000000 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-2352000000 {
> +			opp-hz = /bits/ 64 <2352000000>;
> +			opp-microvolt = <1000000 1000000 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-2400000000 {
> +			opp-hz = /bits/ 64 <2400000000>;
> +			opp-microvolt = <1000000 1000000 1000000>;
> +			clock-latency-ns = <40000>;
> +		};

Same comment

> +	};
> +
> +	cluster2_opp_table: opp-table-cluster2 {
> +		compatible = "operating-points-v2";
> +		opp-shared;
> +
> +		opp-408000000 {
> +			opp-hz = /bits/ 64 <408000000>;
> +			opp-microvolt = <675000 675000 1000000>;
> +			clock-latency-ns = <40000>;
> +			opp-suspend;
> +		};
> +		opp-600000000 {
> +			opp-hz = /bits/ 64 <600000000>;
> +			opp-microvolt = <675000 675000 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-816000000 {
> +			opp-hz = /bits/ 64 <816000000>;
> +			opp-microvolt = <675000 675000 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-1008000000 {
> +			opp-hz = /bits/ 64 <1008000000>;
> +			opp-microvolt = <675000 675000 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-1200000000 {
> +			opp-hz = /bits/ 64 <1200000000>;
> +			opp-microvolt = <675000 675000 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-1416000000 {
> +			opp-hz = /bits/ 64 <1416000000>;
> +			opp-microvolt = <725000 725000 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-1608000000 {
> +			opp-hz = /bits/ 64 <1608000000>;
> +			opp-microvolt = <762500 762500 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-1800000000 {
> +			opp-hz = /bits/ 64 <1800000000>;
> +			opp-microvolt = <850000 850000 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-2016000000 {
> +			opp-hz = /bits/ 64 <2016000000>;
> +			opp-microvolt = <925000 925000 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-2208000000 {
> +			opp-hz = /bits/ 64 <2208000000>;
> +			opp-microvolt = <987500 987500 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-2256000000 {
> +			opp-hz = /bits/ 64 <2256000000>;
> +			opp-microvolt = <1000000 1000000 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-2304000000 {
> +			opp-hz = /bits/ 64 <2304000000>;
> +			opp-microvolt = <1000000 1000000 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-2352000000 {
> +			opp-hz = /bits/ 64 <2352000000>;
> +			opp-microvolt = <1000000 1000000 1000000>;
> +			clock-latency-ns = <40000>;
> +		};
> +		opp-2400000000 {
> +			opp-hz = /bits/ 64 <2400000000>;
> +			opp-microvolt = <1000000 1000000 1000000>;
> +			clock-latency-ns = <40000>;
> +		};

Same comment

> +	};
> +
>   	firmware {
>   		optee: optee {
>   			compatible = "linaro,optee-tz";
> 

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
  2024-01-25  9:30   ` Daniel Lezcano
@ 2024-01-25 10:17     ` Alexey Charkov
  2024-01-26  6:32     ` Dragan Simic
  1 sibling, 0 replies; 34+ messages in thread
From: Alexey Charkov @ 2024-01-25 10:17 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
	Dragan Simic, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel, Viresh Kumar

On Thu, Jan 25, 2024 at 1:30 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
>
> Hi Alexey,
>
> Adding Viresh
>
> On 24/01/2024 21:30, Alexey Charkov wrote:
> > By default the CPUs on RK3588 start up in a conservative performance
> > mode. Add frequency and voltage mappings to the device tree to enable
> > dynamic scaling via cpufreq
> >
> > Signed-off-by: Alexey Charkov <alchark@gmail.com>
> > ---
> >   arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 209 ++++++++++++++++++++++++++++++
> >   1 file changed, 209 insertions(+)
> >
> > diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> > index 131b9eb21398..e605be531a0f 100644
> > --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> > +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> > @@ -97,6 +97,7 @@ cpu_l0: cpu@0 {
> >                       clocks = <&scmi_clk SCMI_CLK_CPUL>;
> >                       assigned-clocks = <&scmi_clk SCMI_CLK_CPUL>;
> >                       assigned-clock-rates = <816000000>;
> > +                     operating-points-v2 = <&cluster0_opp_table>;
> >                       cpu-idle-states = <&CPU_SLEEP>;
> >                       i-cache-size = <32768>;
> >                       i-cache-line-size = <64>;
> > @@ -116,6 +117,7 @@ cpu_l1: cpu@100 {
> >                       enable-method = "psci";
> >                       capacity-dmips-mhz = <530>;
> >                       clocks = <&scmi_clk SCMI_CLK_CPUL>;
> > +                     operating-points-v2 = <&cluster0_opp_table>;
> >                       cpu-idle-states = <&CPU_SLEEP>;
> >                       i-cache-size = <32768>;
> >                       i-cache-line-size = <64>;
> > @@ -135,6 +137,7 @@ cpu_l2: cpu@200 {
> >                       enable-method = "psci";
> >                       capacity-dmips-mhz = <530>;
> >                       clocks = <&scmi_clk SCMI_CLK_CPUL>;
> > +                     operating-points-v2 = <&cluster0_opp_table>;
> >                       cpu-idle-states = <&CPU_SLEEP>;
> >                       i-cache-size = <32768>;
> >                       i-cache-line-size = <64>;
> > @@ -154,6 +157,7 @@ cpu_l3: cpu@300 {
> >                       enable-method = "psci";
> >                       capacity-dmips-mhz = <530>;
> >                       clocks = <&scmi_clk SCMI_CLK_CPUL>;
> > +                     operating-points-v2 = <&cluster0_opp_table>;
> >                       cpu-idle-states = <&CPU_SLEEP>;
> >                       i-cache-size = <32768>;
> >                       i-cache-line-size = <64>;
> > @@ -175,6 +179,7 @@ cpu_b0: cpu@400 {
> >                       clocks = <&scmi_clk SCMI_CLK_CPUB01>;
> >                       assigned-clocks = <&scmi_clk SCMI_CLK_CPUB01>;
> >                       assigned-clock-rates = <816000000>;
> > +                     operating-points-v2 = <&cluster1_opp_table>;
> >                       cpu-idle-states = <&CPU_SLEEP>;
> >                       i-cache-size = <65536>;
> >                       i-cache-line-size = <64>;
> > @@ -194,6 +199,7 @@ cpu_b1: cpu@500 {
> >                       enable-method = "psci";
> >                       capacity-dmips-mhz = <1024>;
> >                       clocks = <&scmi_clk SCMI_CLK_CPUB01>;
> > +                     operating-points-v2 = <&cluster1_opp_table>;
> >                       cpu-idle-states = <&CPU_SLEEP>;
> >                       i-cache-size = <65536>;
> >                       i-cache-line-size = <64>;
> > @@ -215,6 +221,7 @@ cpu_b2: cpu@600 {
> >                       clocks = <&scmi_clk SCMI_CLK_CPUB23>;
> >                       assigned-clocks = <&scmi_clk SCMI_CLK_CPUB23>;
> >                       assigned-clock-rates = <816000000>;
> > +                     operating-points-v2 = <&cluster2_opp_table>;
> >                       cpu-idle-states = <&CPU_SLEEP>;
> >                       i-cache-size = <65536>;
> >                       i-cache-line-size = <64>;
> > @@ -234,6 +241,7 @@ cpu_b3: cpu@700 {
> >                       enable-method = "psci";
> >                       capacity-dmips-mhz = <1024>;
> >                       clocks = <&scmi_clk SCMI_CLK_CPUB23>;
> > +                     operating-points-v2 = <&cluster2_opp_table>;
> >                       cpu-idle-states = <&CPU_SLEEP>;
> >                       i-cache-size = <65536>;
> >                       i-cache-line-size = <64>;
> > @@ -348,6 +356,207 @@ l3_cache: l3-cache {
> >               };
> >       };
> >
> > +     cluster0_opp_table: opp-table-cluster0 {
> > +             compatible = "operating-points-v2";
> > +             opp-shared;
> > +
> > +             opp-408000000 {
> > +                     opp-hz = /bits/ 64 <408000000>;
> > +                     opp-microvolt = <675000 675000 950000>;
> > +                     clock-latency-ns = <40000>;
> > +             };
> > +             opp-600000000 {
> > +                     opp-hz = /bits/ 64 <600000000>;
> > +                     opp-microvolt = <675000 675000 950000>;
> > +                     clock-latency-ns = <40000>;
> > +             };
> > +             opp-816000000 {
> > +                     opp-hz = /bits/ 64 <816000000>;
> > +                     opp-microvolt = <675000 675000 950000>;
> > +                     clock-latency-ns = <40000>;
> > +             };
> > +             opp-1008000000 {
> > +                     opp-hz = /bits/ 64 <1008000000>;
> > +                     opp-microvolt = <675000 675000 950000>;
> > +                     clock-latency-ns = <40000>;
> > +             };
>
> It is not useful to introduce OPP with the same voltage. There is no
> gain in terms of energy efficiency as the compute capacity is linearly
> tied with power consumption (P=CxFxV²) in this case.
>
> For example, opp-408 consumes 2 bogoWatts and opp-816 consumes 4
> bogoWatts (because of the same voltage).
>
> For a workload, opp-408 takes 10 sec and opp-816 takes 5 sec because it
> is twice faster.
>
> The energy consumption is:
>
> opp-408 = 10 x 2 = 20 BogoJoules
> opp-816 = 5 x 4 = 20 BogoJoules

I see, thank you. Will drop all "lower frequency - same voltage"
instances and resubmit in the next iteration.

Best regards,
Alexey

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
  2024-01-25  9:30   ` Daniel Lezcano
  2024-01-25 10:17     ` Alexey Charkov
@ 2024-01-26  6:32     ` Dragan Simic
  2024-01-26  6:44       ` Alexey Charkov
  1 sibling, 1 reply; 34+ messages in thread
From: Dragan Simic @ 2024-01-26  6:32 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Alexey Charkov, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiko Stuebner, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel, Viresh Kumar

Hello Daniel,

On 2024-01-25 10:30, Daniel Lezcano wrote:
> On 24/01/2024 21:30, Alexey Charkov wrote:
>> By default the CPUs on RK3588 start up in a conservative performance
>> mode. Add frequency and voltage mappings to the device tree to enable
>> dynamic scaling via cpufreq
>> 
>> Signed-off-by: Alexey Charkov <alchark@gmail.com>
>> ---
>>   arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 209 
>> ++++++++++++++++++++++++++++++
>>   1 file changed, 209 insertions(+)
>> 
>> diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi 
>> b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> index 131b9eb21398..e605be531a0f 100644
>> --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> @@ -97,6 +97,7 @@ cpu_l0: cpu@0 {
>>   			clocks = <&scmi_clk SCMI_CLK_CPUL>;
>>   			assigned-clocks = <&scmi_clk SCMI_CLK_CPUL>;
>>   			assigned-clock-rates = <816000000>;
>> +			operating-points-v2 = <&cluster0_opp_table>;
>>   			cpu-idle-states = <&CPU_SLEEP>;
>>   			i-cache-size = <32768>;
>>   			i-cache-line-size = <64>;
>> @@ -116,6 +117,7 @@ cpu_l1: cpu@100 {
>>   			enable-method = "psci";
>>   			capacity-dmips-mhz = <530>;
>>   			clocks = <&scmi_clk SCMI_CLK_CPUL>;
>> +			operating-points-v2 = <&cluster0_opp_table>;
>>   			cpu-idle-states = <&CPU_SLEEP>;
>>   			i-cache-size = <32768>;
>>   			i-cache-line-size = <64>;
>> @@ -135,6 +137,7 @@ cpu_l2: cpu@200 {
>>   			enable-method = "psci";
>>   			capacity-dmips-mhz = <530>;
>>   			clocks = <&scmi_clk SCMI_CLK_CPUL>;
>> +			operating-points-v2 = <&cluster0_opp_table>;
>>   			cpu-idle-states = <&CPU_SLEEP>;
>>   			i-cache-size = <32768>;
>>   			i-cache-line-size = <64>;
>> @@ -154,6 +157,7 @@ cpu_l3: cpu@300 {
>>   			enable-method = "psci";
>>   			capacity-dmips-mhz = <530>;
>>   			clocks = <&scmi_clk SCMI_CLK_CPUL>;
>> +			operating-points-v2 = <&cluster0_opp_table>;
>>   			cpu-idle-states = <&CPU_SLEEP>;
>>   			i-cache-size = <32768>;
>>   			i-cache-line-size = <64>;
>> @@ -175,6 +179,7 @@ cpu_b0: cpu@400 {
>>   			clocks = <&scmi_clk SCMI_CLK_CPUB01>;
>>   			assigned-clocks = <&scmi_clk SCMI_CLK_CPUB01>;
>>   			assigned-clock-rates = <816000000>;
>> +			operating-points-v2 = <&cluster1_opp_table>;
>>   			cpu-idle-states = <&CPU_SLEEP>;
>>   			i-cache-size = <65536>;
>>   			i-cache-line-size = <64>;
>> @@ -194,6 +199,7 @@ cpu_b1: cpu@500 {
>>   			enable-method = "psci";
>>   			capacity-dmips-mhz = <1024>;
>>   			clocks = <&scmi_clk SCMI_CLK_CPUB01>;
>> +			operating-points-v2 = <&cluster1_opp_table>;
>>   			cpu-idle-states = <&CPU_SLEEP>;
>>   			i-cache-size = <65536>;
>>   			i-cache-line-size = <64>;
>> @@ -215,6 +221,7 @@ cpu_b2: cpu@600 {
>>   			clocks = <&scmi_clk SCMI_CLK_CPUB23>;
>>   			assigned-clocks = <&scmi_clk SCMI_CLK_CPUB23>;
>>   			assigned-clock-rates = <816000000>;
>> +			operating-points-v2 = <&cluster2_opp_table>;
>>   			cpu-idle-states = <&CPU_SLEEP>;
>>   			i-cache-size = <65536>;
>>   			i-cache-line-size = <64>;
>> @@ -234,6 +241,7 @@ cpu_b3: cpu@700 {
>>   			enable-method = "psci";
>>   			capacity-dmips-mhz = <1024>;
>>   			clocks = <&scmi_clk SCMI_CLK_CPUB23>;
>> +			operating-points-v2 = <&cluster2_opp_table>;
>>   			cpu-idle-states = <&CPU_SLEEP>;
>>   			i-cache-size = <65536>;
>>   			i-cache-line-size = <64>;
>> @@ -348,6 +356,207 @@ l3_cache: l3-cache {
>>   		};
>>   	};
>>   +	cluster0_opp_table: opp-table-cluster0 {
>> +		compatible = "operating-points-v2";
>> +		opp-shared;
>> +
>> +		opp-408000000 {
>> +			opp-hz = /bits/ 64 <408000000>;
>> +			opp-microvolt = <675000 675000 950000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-600000000 {
>> +			opp-hz = /bits/ 64 <600000000>;
>> +			opp-microvolt = <675000 675000 950000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-816000000 {
>> +			opp-hz = /bits/ 64 <816000000>;
>> +			opp-microvolt = <675000 675000 950000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-1008000000 {
>> +			opp-hz = /bits/ 64 <1008000000>;
>> +			opp-microvolt = <675000 675000 950000>;
>> +			clock-latency-ns = <40000>;
>> +		};
> 
> It is not useful to introduce OPP with the same voltage. There is no
> gain in terms of energy efficiency as the compute capacity is linearly
> tied with power consumption (P=CxFxV²) in this case.
> 
> For example, opp-408 consumes 2 bogoWatts and opp-816 consumes 4
> bogoWatts (because of the same voltage).
> 
> For a workload, opp-408 takes 10 sec and opp-816 takes 5 sec because
> it is twice faster.
> 
> The energy consumption is:
> 
> opp-408 = 10 x 2 = 20 BogoJoules
> opp-816 = 5 x 4 = 20 BogoJoules

I'd respectfully disagree that including multiple OPPs with the same 
voltage
but different frequencies isn't useful.  Please allow me to explain.

See, the total amount of consumed energy is, in general, the same for 
such
OPPs and the same CPU task(s), if we ignore the static leakage current 
and
such stuff, which isn't important here.  Though, the emphasis here is on
"total", i.e. without taking into account the actual amount of time 
required
for the exemplified CPU task(s) to complete.  If the total amount of 
time
is quite short, we aren't going to heat up the package and the board 
enough
to hit the CPU thermal throttling;  this approach is also sometimes 
referred
to as "race to idle", which is actually quite effective for 
battery-powered
mobile devices that tend to load their CPU cores in bursts, while 
remaining
kind of inactive for the remaining time.

However, if the CPU task(s) last long enough to actually saturate the 
thermal
capacities of the package and the board or the device, we're getting 
into the
CPU throttling territory, in which running the CPU cores slower, but 
still as
fast as possible, may actually be beneficial for the overall CPU 
performance.
By running the CPU cores slower, we're lowering the power and 
"spreading" the
total energy consumption over time, i.e. we're making some time to allow 
the
generated heat to dissipate into the surroundings.  As we know, having 
more
energy consumed by the SoC means more heat generated by the SoC, but the
resulting temperature of the SoC depends on how fast the energy is 
consumed,
which equals to how fast the CPUs run;  of course, all that is valid 
under
the reasonable assumption that the entire cooling setup, including the 
board
surroundings, remains unchanged all the time.

Having all that in mind, having a few OPPs with the same voltage but 
different
frequencies can actually help us achieve better CPU performance.  That 
way,
throttling won't have to slow the CPUs more than it's actually needed to 
hit
and maintain the desired thermal trip temperatures.

>> +		opp-1200000000 {
>> +			opp-hz = /bits/ 64 <1200000000>;
>> +			opp-microvolt = <712500 712500 950000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-1416000000 {
>> +			opp-hz = /bits/ 64 <1416000000>;
>> +			opp-microvolt = <762500 762500 950000>;
>> +			clock-latency-ns = <40000>;
>> +			opp-suspend;
>> +		};
>> +		opp-1608000000 {
>> +			opp-hz = /bits/ 64 <1608000000>;
>> +			opp-microvolt = <850000 850000 950000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-1800000000 {
>> +			opp-hz = /bits/ 64 <1800000000>;
>> +			opp-microvolt = <950000 950000 950000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +	};
>> +
>> +	cluster1_opp_table: opp-table-cluster1 {
>> +		compatible = "operating-points-v2";
>> +		opp-shared;
>> +
>> +		opp-408000000 {
>> +			opp-hz = /bits/ 64 <408000000>;
>> +			opp-microvolt = <675000 675000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +			opp-suspend;
>> +		};
>> +		opp-600000000 {
>> +			opp-hz = /bits/ 64 <600000000>;
>> +			opp-microvolt = <675000 675000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-816000000 {
>> +			opp-hz = /bits/ 64 <816000000>;
>> +			opp-microvolt = <675000 675000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-1008000000 {
>> +			opp-hz = /bits/ 64 <1008000000>;
>> +			opp-microvolt = <675000 675000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
> 
> same comment
> 
>> +		opp-1200000000 {
>> +			opp-hz = /bits/ 64 <1200000000>;
>> +			opp-microvolt = <675000 675000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-1416000000 {
>> +			opp-hz = /bits/ 64 <1416000000>;
>> +			opp-microvolt = <725000 725000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-1608000000 {
>> +			opp-hz = /bits/ 64 <1608000000>;
>> +			opp-microvolt = <762500 762500 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-1800000000 {
>> +			opp-hz = /bits/ 64 <1800000000>;
>> +			opp-microvolt = <850000 850000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-2016000000 {
>> +			opp-hz = /bits/ 64 <2016000000>;
>> +			opp-microvolt = <925000 925000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-2208000000 {
>> +			opp-hz = /bits/ 64 <2208000000>;
>> +			opp-microvolt = <987500 987500 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-2256000000 {
>> +			opp-hz = /bits/ 64 <2256000000>;
>> +			opp-microvolt = <1000000 1000000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-2304000000 {
>> +			opp-hz = /bits/ 64 <2304000000>;
>> +			opp-microvolt = <1000000 1000000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-2352000000 {
>> +			opp-hz = /bits/ 64 <2352000000>;
>> +			opp-microvolt = <1000000 1000000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-2400000000 {
>> +			opp-hz = /bits/ 64 <2400000000>;
>> +			opp-microvolt = <1000000 1000000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
> 
> Same comment
> 
>> +	};
>> +
>> +	cluster2_opp_table: opp-table-cluster2 {
>> +		compatible = "operating-points-v2";
>> +		opp-shared;
>> +
>> +		opp-408000000 {
>> +			opp-hz = /bits/ 64 <408000000>;
>> +			opp-microvolt = <675000 675000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +			opp-suspend;
>> +		};
>> +		opp-600000000 {
>> +			opp-hz = /bits/ 64 <600000000>;
>> +			opp-microvolt = <675000 675000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-816000000 {
>> +			opp-hz = /bits/ 64 <816000000>;
>> +			opp-microvolt = <675000 675000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-1008000000 {
>> +			opp-hz = /bits/ 64 <1008000000>;
>> +			opp-microvolt = <675000 675000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-1200000000 {
>> +			opp-hz = /bits/ 64 <1200000000>;
>> +			opp-microvolt = <675000 675000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-1416000000 {
>> +			opp-hz = /bits/ 64 <1416000000>;
>> +			opp-microvolt = <725000 725000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-1608000000 {
>> +			opp-hz = /bits/ 64 <1608000000>;
>> +			opp-microvolt = <762500 762500 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-1800000000 {
>> +			opp-hz = /bits/ 64 <1800000000>;
>> +			opp-microvolt = <850000 850000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-2016000000 {
>> +			opp-hz = /bits/ 64 <2016000000>;
>> +			opp-microvolt = <925000 925000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-2208000000 {
>> +			opp-hz = /bits/ 64 <2208000000>;
>> +			opp-microvolt = <987500 987500 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-2256000000 {
>> +			opp-hz = /bits/ 64 <2256000000>;
>> +			opp-microvolt = <1000000 1000000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-2304000000 {
>> +			opp-hz = /bits/ 64 <2304000000>;
>> +			opp-microvolt = <1000000 1000000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-2352000000 {
>> +			opp-hz = /bits/ 64 <2352000000>;
>> +			opp-microvolt = <1000000 1000000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
>> +		opp-2400000000 {
>> +			opp-hz = /bits/ 64 <2400000000>;
>> +			opp-microvolt = <1000000 1000000 1000000>;
>> +			clock-latency-ns = <40000>;
>> +		};
> 
> Same comment
> 
>> +	};
>> +
>>   	firmware {
>>   		optee: optee {
>>   			compatible = "linaro,optee-tz";
>> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
  2024-01-26  6:32     ` Dragan Simic
@ 2024-01-26  6:44       ` Alexey Charkov
  2024-01-26  7:04         ` Dragan Simic
  0 siblings, 1 reply; 34+ messages in thread
From: Alexey Charkov @ 2024-01-26  6:44 UTC (permalink / raw)
  To: Dragan Simic
  Cc: Daniel Lezcano, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiko Stuebner, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel, Viresh Kumar

On Fri, Jan 26, 2024 at 10:32 AM Dragan Simic <dsimic@manjaro.org> wrote:
>
> Hello Daniel,
>
> On 2024-01-25 10:30, Daniel Lezcano wrote:
> > On 24/01/2024 21:30, Alexey Charkov wrote:
> >> By default the CPUs on RK3588 start up in a conservative performance
> >> mode. Add frequency and voltage mappings to the device tree to enable
> >> dynamic scaling via cpufreq
> >>
> >> Signed-off-by: Alexey Charkov <alchark@gmail.com>
> >> ---
> >>   arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 209
> >> ++++++++++++++++++++++++++++++
> >>   1 file changed, 209 insertions(+)
> >>
> >> diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> >> b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> >> index 131b9eb21398..e605be531a0f 100644
> >> --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> >> +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> >> @@ -97,6 +97,7 @@ cpu_l0: cpu@0 {
> >>                      clocks = <&scmi_clk SCMI_CLK_CPUL>;
> >>                      assigned-clocks = <&scmi_clk SCMI_CLK_CPUL>;
> >>                      assigned-clock-rates = <816000000>;
> >> +                    operating-points-v2 = <&cluster0_opp_table>;
> >>                      cpu-idle-states = <&CPU_SLEEP>;
> >>                      i-cache-size = <32768>;
> >>                      i-cache-line-size = <64>;
> >> @@ -116,6 +117,7 @@ cpu_l1: cpu@100 {
> >>                      enable-method = "psci";
> >>                      capacity-dmips-mhz = <530>;
> >>                      clocks = <&scmi_clk SCMI_CLK_CPUL>;
> >> +                    operating-points-v2 = <&cluster0_opp_table>;
> >>                      cpu-idle-states = <&CPU_SLEEP>;
> >>                      i-cache-size = <32768>;
> >>                      i-cache-line-size = <64>;
> >> @@ -135,6 +137,7 @@ cpu_l2: cpu@200 {
> >>                      enable-method = "psci";
> >>                      capacity-dmips-mhz = <530>;
> >>                      clocks = <&scmi_clk SCMI_CLK_CPUL>;
> >> +                    operating-points-v2 = <&cluster0_opp_table>;
> >>                      cpu-idle-states = <&CPU_SLEEP>;
> >>                      i-cache-size = <32768>;
> >>                      i-cache-line-size = <64>;
> >> @@ -154,6 +157,7 @@ cpu_l3: cpu@300 {
> >>                      enable-method = "psci";
> >>                      capacity-dmips-mhz = <530>;
> >>                      clocks = <&scmi_clk SCMI_CLK_CPUL>;
> >> +                    operating-points-v2 = <&cluster0_opp_table>;
> >>                      cpu-idle-states = <&CPU_SLEEP>;
> >>                      i-cache-size = <32768>;
> >>                      i-cache-line-size = <64>;
> >> @@ -175,6 +179,7 @@ cpu_b0: cpu@400 {
> >>                      clocks = <&scmi_clk SCMI_CLK_CPUB01>;
> >>                      assigned-clocks = <&scmi_clk SCMI_CLK_CPUB01>;
> >>                      assigned-clock-rates = <816000000>;
> >> +                    operating-points-v2 = <&cluster1_opp_table>;
> >>                      cpu-idle-states = <&CPU_SLEEP>;
> >>                      i-cache-size = <65536>;
> >>                      i-cache-line-size = <64>;
> >> @@ -194,6 +199,7 @@ cpu_b1: cpu@500 {
> >>                      enable-method = "psci";
> >>                      capacity-dmips-mhz = <1024>;
> >>                      clocks = <&scmi_clk SCMI_CLK_CPUB01>;
> >> +                    operating-points-v2 = <&cluster1_opp_table>;
> >>                      cpu-idle-states = <&CPU_SLEEP>;
> >>                      i-cache-size = <65536>;
> >>                      i-cache-line-size = <64>;
> >> @@ -215,6 +221,7 @@ cpu_b2: cpu@600 {
> >>                      clocks = <&scmi_clk SCMI_CLK_CPUB23>;
> >>                      assigned-clocks = <&scmi_clk SCMI_CLK_CPUB23>;
> >>                      assigned-clock-rates = <816000000>;
> >> +                    operating-points-v2 = <&cluster2_opp_table>;
> >>                      cpu-idle-states = <&CPU_SLEEP>;
> >>                      i-cache-size = <65536>;
> >>                      i-cache-line-size = <64>;
> >> @@ -234,6 +241,7 @@ cpu_b3: cpu@700 {
> >>                      enable-method = "psci";
> >>                      capacity-dmips-mhz = <1024>;
> >>                      clocks = <&scmi_clk SCMI_CLK_CPUB23>;
> >> +                    operating-points-v2 = <&cluster2_opp_table>;
> >>                      cpu-idle-states = <&CPU_SLEEP>;
> >>                      i-cache-size = <65536>;
> >>                      i-cache-line-size = <64>;
> >> @@ -348,6 +356,207 @@ l3_cache: l3-cache {
> >>              };
> >>      };
> >>   +  cluster0_opp_table: opp-table-cluster0 {
> >> +            compatible = "operating-points-v2";
> >> +            opp-shared;
> >> +
> >> +            opp-408000000 {
> >> +                    opp-hz = /bits/ 64 <408000000>;
> >> +                    opp-microvolt = <675000 675000 950000>;
> >> +                    clock-latency-ns = <40000>;
> >> +            };
> >> +            opp-600000000 {
> >> +                    opp-hz = /bits/ 64 <600000000>;
> >> +                    opp-microvolt = <675000 675000 950000>;
> >> +                    clock-latency-ns = <40000>;
> >> +            };
> >> +            opp-816000000 {
> >> +                    opp-hz = /bits/ 64 <816000000>;
> >> +                    opp-microvolt = <675000 675000 950000>;
> >> +                    clock-latency-ns = <40000>;
> >> +            };
> >> +            opp-1008000000 {
> >> +                    opp-hz = /bits/ 64 <1008000000>;
> >> +                    opp-microvolt = <675000 675000 950000>;
> >> +                    clock-latency-ns = <40000>;
> >> +            };
> >
> > It is not useful to introduce OPP with the same voltage. There is no
> > gain in terms of energy efficiency as the compute capacity is linearly
> > tied with power consumption (P=CxFxV²) in this case.
> >
> > For example, opp-408 consumes 2 bogoWatts and opp-816 consumes 4
> > bogoWatts (because of the same voltage).
> >
> > For a workload, opp-408 takes 10 sec and opp-816 takes 5 sec because
> > it is twice faster.
> >
> > The energy consumption is:
> >
> > opp-408 = 10 x 2 = 20 BogoJoules
> > opp-816 = 5 x 4 = 20 BogoJoules
>
> I'd respectfully disagree that including multiple OPPs with the same
> voltage
> but different frequencies isn't useful.  Please allow me to explain.
>
> See, the total amount of consumed energy is, in general, the same for
> such
> OPPs and the same CPU task(s), if we ignore the static leakage current
> and
> such stuff, which isn't important here.  Though, the emphasis here is on
> "total", i.e. without taking into account the actual amount of time
> required
> for the exemplified CPU task(s) to complete.  If the total amount of
> time
> is quite short, we aren't going to heat up the package and the board
> enough
> to hit the CPU thermal throttling;  this approach is also sometimes
> referred
> to as "race to idle", which is actually quite effective for
> battery-powered
> mobile devices that tend to load their CPU cores in bursts, while
> remaining
> kind of inactive for the remaining time.
>
> However, if the CPU task(s) last long enough to actually saturate the
> thermal
> capacities of the package and the board or the device, we're getting
> into the
> CPU throttling territory, in which running the CPU cores slower, but
> still as
> fast as possible, may actually be beneficial for the overall CPU
> performance.
> By running the CPU cores slower, we're lowering the power and
> "spreading" the
> total energy consumption over time, i.e. we're making some time to allow
> the
> generated heat to dissipate into the surroundings.  As we know, having
> more
> energy consumed by the SoC means more heat generated by the SoC, but the
> resulting temperature of the SoC depends on how fast the energy is
> consumed,
> which equals to how fast the CPUs run;  of course, all that is valid
> under
> the reasonable assumption that the entire cooling setup, including the
> board
> surroundings, remains unchanged all the time.

On the other hand, convective heat dissipation is approximately
proportional to the temperature differential, therefore heating up the
core to a higher temperature over a shorter period of time would let
it dissipate the same joule amount faster. Given that total joules
generated for a particular load are approximately the same for
different frequencies as long as voltage remains the same (as Daniel
pointed out), higher frequency seems to lead to better heat transfer
to the environment for the same load. And also the task completes
sooner, which is probably always good, ceteris paribus.

Not sure how that all changes when throttling enters the game though :)

Best regards,
Alexey

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
  2024-01-26  6:44       ` Alexey Charkov
@ 2024-01-26  7:04         ` Dragan Simic
  2024-01-26  7:30           ` Alexey Charkov
  0 siblings, 1 reply; 34+ messages in thread
From: Dragan Simic @ 2024-01-26  7:04 UTC (permalink / raw)
  To: Alexey Charkov
  Cc: Daniel Lezcano, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiko Stuebner, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel, Viresh Kumar

Hello Alexey,

On 2024-01-26 07:44, Alexey Charkov wrote:
> On Fri, Jan 26, 2024 at 10:32 AM Dragan Simic <dsimic@manjaro.org> 
> wrote:
>> On 2024-01-25 10:30, Daniel Lezcano wrote:
>> > On 24/01/2024 21:30, Alexey Charkov wrote:
>> >> By default the CPUs on RK3588 start up in a conservative performance
>> >> mode. Add frequency and voltage mappings to the device tree to enable
>> >> dynamic scaling via cpufreq
>> >>
>> >> Signed-off-by: Alexey Charkov <alchark@gmail.com>
>> >> ---
>> >>   arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 209
>> >> ++++++++++++++++++++++++++++++
>> >>   1 file changed, 209 insertions(+)
>> >>
>> >> diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> >> b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> >> index 131b9eb21398..e605be531a0f 100644
>> >> --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> >> +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> >> @@ -97,6 +97,7 @@ cpu_l0: cpu@0 {
>> >>                      clocks = <&scmi_clk SCMI_CLK_CPUL>;
>> >>                      assigned-clocks = <&scmi_clk SCMI_CLK_CPUL>;
>> >>                      assigned-clock-rates = <816000000>;
>> >> +                    operating-points-v2 = <&cluster0_opp_table>;
>> >>                      cpu-idle-states = <&CPU_SLEEP>;
>> >>                      i-cache-size = <32768>;
>> >>                      i-cache-line-size = <64>;
>> >> @@ -116,6 +117,7 @@ cpu_l1: cpu@100 {
>> >>                      enable-method = "psci";
>> >>                      capacity-dmips-mhz = <530>;
>> >>                      clocks = <&scmi_clk SCMI_CLK_CPUL>;
>> >> +                    operating-points-v2 = <&cluster0_opp_table>;
>> >>                      cpu-idle-states = <&CPU_SLEEP>;
>> >>                      i-cache-size = <32768>;
>> >>                      i-cache-line-size = <64>;
>> >> @@ -135,6 +137,7 @@ cpu_l2: cpu@200 {
>> >>                      enable-method = "psci";
>> >>                      capacity-dmips-mhz = <530>;
>> >>                      clocks = <&scmi_clk SCMI_CLK_CPUL>;
>> >> +                    operating-points-v2 = <&cluster0_opp_table>;
>> >>                      cpu-idle-states = <&CPU_SLEEP>;
>> >>                      i-cache-size = <32768>;
>> >>                      i-cache-line-size = <64>;
>> >> @@ -154,6 +157,7 @@ cpu_l3: cpu@300 {
>> >>                      enable-method = "psci";
>> >>                      capacity-dmips-mhz = <530>;
>> >>                      clocks = <&scmi_clk SCMI_CLK_CPUL>;
>> >> +                    operating-points-v2 = <&cluster0_opp_table>;
>> >>                      cpu-idle-states = <&CPU_SLEEP>;
>> >>                      i-cache-size = <32768>;
>> >>                      i-cache-line-size = <64>;
>> >> @@ -175,6 +179,7 @@ cpu_b0: cpu@400 {
>> >>                      clocks = <&scmi_clk SCMI_CLK_CPUB01>;
>> >>                      assigned-clocks = <&scmi_clk SCMI_CLK_CPUB01>;
>> >>                      assigned-clock-rates = <816000000>;
>> >> +                    operating-points-v2 = <&cluster1_opp_table>;
>> >>                      cpu-idle-states = <&CPU_SLEEP>;
>> >>                      i-cache-size = <65536>;
>> >>                      i-cache-line-size = <64>;
>> >> @@ -194,6 +199,7 @@ cpu_b1: cpu@500 {
>> >>                      enable-method = "psci";
>> >>                      capacity-dmips-mhz = <1024>;
>> >>                      clocks = <&scmi_clk SCMI_CLK_CPUB01>;
>> >> +                    operating-points-v2 = <&cluster1_opp_table>;
>> >>                      cpu-idle-states = <&CPU_SLEEP>;
>> >>                      i-cache-size = <65536>;
>> >>                      i-cache-line-size = <64>;
>> >> @@ -215,6 +221,7 @@ cpu_b2: cpu@600 {
>> >>                      clocks = <&scmi_clk SCMI_CLK_CPUB23>;
>> >>                      assigned-clocks = <&scmi_clk SCMI_CLK_CPUB23>;
>> >>                      assigned-clock-rates = <816000000>;
>> >> +                    operating-points-v2 = <&cluster2_opp_table>;
>> >>                      cpu-idle-states = <&CPU_SLEEP>;
>> >>                      i-cache-size = <65536>;
>> >>                      i-cache-line-size = <64>;
>> >> @@ -234,6 +241,7 @@ cpu_b3: cpu@700 {
>> >>                      enable-method = "psci";
>> >>                      capacity-dmips-mhz = <1024>;
>> >>                      clocks = <&scmi_clk SCMI_CLK_CPUB23>;
>> >> +                    operating-points-v2 = <&cluster2_opp_table>;
>> >>                      cpu-idle-states = <&CPU_SLEEP>;
>> >>                      i-cache-size = <65536>;
>> >>                      i-cache-line-size = <64>;
>> >> @@ -348,6 +356,207 @@ l3_cache: l3-cache {
>> >>              };
>> >>      };
>> >>   +  cluster0_opp_table: opp-table-cluster0 {
>> >> +            compatible = "operating-points-v2";
>> >> +            opp-shared;
>> >> +
>> >> +            opp-408000000 {
>> >> +                    opp-hz = /bits/ 64 <408000000>;
>> >> +                    opp-microvolt = <675000 675000 950000>;
>> >> +                    clock-latency-ns = <40000>;
>> >> +            };
>> >> +            opp-600000000 {
>> >> +                    opp-hz = /bits/ 64 <600000000>;
>> >> +                    opp-microvolt = <675000 675000 950000>;
>> >> +                    clock-latency-ns = <40000>;
>> >> +            };
>> >> +            opp-816000000 {
>> >> +                    opp-hz = /bits/ 64 <816000000>;
>> >> +                    opp-microvolt = <675000 675000 950000>;
>> >> +                    clock-latency-ns = <40000>;
>> >> +            };
>> >> +            opp-1008000000 {
>> >> +                    opp-hz = /bits/ 64 <1008000000>;
>> >> +                    opp-microvolt = <675000 675000 950000>;
>> >> +                    clock-latency-ns = <40000>;
>> >> +            };
>> >
>> > It is not useful to introduce OPP with the same voltage. There is no
>> > gain in terms of energy efficiency as the compute capacity is linearly
>> > tied with power consumption (P=CxFxV²) in this case.
>> >
>> > For example, opp-408 consumes 2 bogoWatts and opp-816 consumes 4
>> > bogoWatts (because of the same voltage).
>> >
>> > For a workload, opp-408 takes 10 sec and opp-816 takes 5 sec because
>> > it is twice faster.
>> >
>> > The energy consumption is:
>> >
>> > opp-408 = 10 x 2 = 20 BogoJoules
>> > opp-816 = 5 x 4 = 20 BogoJoules
>> 
>> I'd respectfully disagree that including multiple OPPs with the same
>> voltage
>> but different frequencies isn't useful.  Please allow me to explain.
>> 
>> See, the total amount of consumed energy is, in general, the same for
>> such
>> OPPs and the same CPU task(s), if we ignore the static leakage current
>> and
>> such stuff, which isn't important here.  Though, the emphasis here is 
>> on
>> "total", i.e. without taking into account the actual amount of time
>> required
>> for the exemplified CPU task(s) to complete.  If the total amount of
>> time
>> is quite short, we aren't going to heat up the package and the board
>> enough
>> to hit the CPU thermal throttling;  this approach is also sometimes
>> referred
>> to as "race to idle", which is actually quite effective for
>> battery-powered
>> mobile devices that tend to load their CPU cores in bursts, while
>> remaining
>> kind of inactive for the remaining time.
>> 
>> However, if the CPU task(s) last long enough to actually saturate the
>> thermal
>> capacities of the package and the board or the device, we're getting
>> into the
>> CPU throttling territory, in which running the CPU cores slower, but
>> still as
>> fast as possible, may actually be beneficial for the overall CPU
>> performance.
>> By running the CPU cores slower, we're lowering the power and
>> "spreading" the
>> total energy consumption over time, i.e. we're making some time to 
>> allow
>> the
>> generated heat to dissipate into the surroundings.  As we know, having
>> more
>> energy consumed by the SoC means more heat generated by the SoC, but 
>> the
>> resulting temperature of the SoC depends on how fast the energy is
>> consumed,
>> which equals to how fast the CPUs run;  of course, all that is valid
>> under
>> the reasonable assumption that the entire cooling setup, including the
>> board
>> surroundings, remains unchanged all the time.
> 
> On the other hand, convective heat dissipation is approximately
> proportional to the temperature differential, therefore heating up the
> core to a higher temperature over a shorter period of time would let
> it dissipate the same joule amount faster. Given that total joules

Let me point out that the emphasis is again on "shorter period". :)
Yes, when the CPU load is bursty, having multiple same-voltage OPPs
almost surely won't help us at all, as I already noted.  However,
the things will surely change when the CPU cores are loaded for
longer amounts of time and, as a result, the defined thermal trips
are reached, because the cooling system gets saturated.

> generated for a particular load are approximately the same for
> different frequencies as long as voltage remains the same (as Daniel
> pointed out), higher frequency seems to lead to better heat transfer
> to the environment for the same load. And also the task completes
> sooner, which is probably always good, ceteris paribus.
> 
> Not sure how that all changes when throttling enters the game though :)

As I already noted above, the things are quite different when the CPU
load isn't bursty.  Once the cooling setup is saturated, the heat no
longer gets transferred effectively to the surroundings, while the CPU
cores keep producing the heat, which cannot continue indefinitely.  As
a result, the CPU cores need to run slower and "spread" the total amount
of joules over time, but they still should run as fast as possible.
Another option is to introduce active cooling, which also comes with
its own set of limits, but the initial assumption is that the cooling
setup remains unchanged.

In the end, if all that weren't the case, we wouldn't need CPU thermal
throttling at all, or not as much. :)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
  2024-01-26  7:04         ` Dragan Simic
@ 2024-01-26  7:30           ` Alexey Charkov
  2024-01-26  7:49             ` Dragan Simic
  0 siblings, 1 reply; 34+ messages in thread
From: Alexey Charkov @ 2024-01-26  7:30 UTC (permalink / raw)
  To: Dragan Simic
  Cc: Daniel Lezcano, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiko Stuebner, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel, Viresh Kumar

On Fri, Jan 26, 2024 at 11:05 AM Dragan Simic <dsimic@manjaro.org> wrote:
>
> Hello Alexey,
>
> On 2024-01-26 07:44, Alexey Charkov wrote:
> > On Fri, Jan 26, 2024 at 10:32 AM Dragan Simic <dsimic@manjaro.org>
> > wrote:
> >> On 2024-01-25 10:30, Daniel Lezcano wrote:
> >> > On 24/01/2024 21:30, Alexey Charkov wrote:
> >> >> By default the CPUs on RK3588 start up in a conservative performance
> >> >> mode. Add frequency and voltage mappings to the device tree to enable
> >> >> dynamic scaling via cpufreq
> >> >>
> >> >> Signed-off-by: Alexey Charkov <alchark@gmail.com>
> >> >> ---
> >> >>   arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 209
> >> >> ++++++++++++++++++++++++++++++
> >> >>   1 file changed, 209 insertions(+)
> >> >>
> >> >> diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> >> >> b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> >> >> index 131b9eb21398..e605be531a0f 100644
> >> >> --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> >> >> +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> >> >> @@ -97,6 +97,7 @@ cpu_l0: cpu@0 {
> >> >>                      clocks = <&scmi_clk SCMI_CLK_CPUL>;
> >> >>                      assigned-clocks = <&scmi_clk SCMI_CLK_CPUL>;
> >> >>                      assigned-clock-rates = <816000000>;
> >> >> +                    operating-points-v2 = <&cluster0_opp_table>;
> >> >>                      cpu-idle-states = <&CPU_SLEEP>;
> >> >>                      i-cache-size = <32768>;
> >> >>                      i-cache-line-size = <64>;
> >> >> @@ -116,6 +117,7 @@ cpu_l1: cpu@100 {
> >> >>                      enable-method = "psci";
> >> >>                      capacity-dmips-mhz = <530>;
> >> >>                      clocks = <&scmi_clk SCMI_CLK_CPUL>;
> >> >> +                    operating-points-v2 = <&cluster0_opp_table>;
> >> >>                      cpu-idle-states = <&CPU_SLEEP>;
> >> >>                      i-cache-size = <32768>;
> >> >>                      i-cache-line-size = <64>;
> >> >> @@ -135,6 +137,7 @@ cpu_l2: cpu@200 {
> >> >>                      enable-method = "psci";
> >> >>                      capacity-dmips-mhz = <530>;
> >> >>                      clocks = <&scmi_clk SCMI_CLK_CPUL>;
> >> >> +                    operating-points-v2 = <&cluster0_opp_table>;
> >> >>                      cpu-idle-states = <&CPU_SLEEP>;
> >> >>                      i-cache-size = <32768>;
> >> >>                      i-cache-line-size = <64>;
> >> >> @@ -154,6 +157,7 @@ cpu_l3: cpu@300 {
> >> >>                      enable-method = "psci";
> >> >>                      capacity-dmips-mhz = <530>;
> >> >>                      clocks = <&scmi_clk SCMI_CLK_CPUL>;
> >> >> +                    operating-points-v2 = <&cluster0_opp_table>;
> >> >>                      cpu-idle-states = <&CPU_SLEEP>;
> >> >>                      i-cache-size = <32768>;
> >> >>                      i-cache-line-size = <64>;
> >> >> @@ -175,6 +179,7 @@ cpu_b0: cpu@400 {
> >> >>                      clocks = <&scmi_clk SCMI_CLK_CPUB01>;
> >> >>                      assigned-clocks = <&scmi_clk SCMI_CLK_CPUB01>;
> >> >>                      assigned-clock-rates = <816000000>;
> >> >> +                    operating-points-v2 = <&cluster1_opp_table>;
> >> >>                      cpu-idle-states = <&CPU_SLEEP>;
> >> >>                      i-cache-size = <65536>;
> >> >>                      i-cache-line-size = <64>;
> >> >> @@ -194,6 +199,7 @@ cpu_b1: cpu@500 {
> >> >>                      enable-method = "psci";
> >> >>                      capacity-dmips-mhz = <1024>;
> >> >>                      clocks = <&scmi_clk SCMI_CLK_CPUB01>;
> >> >> +                    operating-points-v2 = <&cluster1_opp_table>;
> >> >>                      cpu-idle-states = <&CPU_SLEEP>;
> >> >>                      i-cache-size = <65536>;
> >> >>                      i-cache-line-size = <64>;
> >> >> @@ -215,6 +221,7 @@ cpu_b2: cpu@600 {
> >> >>                      clocks = <&scmi_clk SCMI_CLK_CPUB23>;
> >> >>                      assigned-clocks = <&scmi_clk SCMI_CLK_CPUB23>;
> >> >>                      assigned-clock-rates = <816000000>;
> >> >> +                    operating-points-v2 = <&cluster2_opp_table>;
> >> >>                      cpu-idle-states = <&CPU_SLEEP>;
> >> >>                      i-cache-size = <65536>;
> >> >>                      i-cache-line-size = <64>;
> >> >> @@ -234,6 +241,7 @@ cpu_b3: cpu@700 {
> >> >>                      enable-method = "psci";
> >> >>                      capacity-dmips-mhz = <1024>;
> >> >>                      clocks = <&scmi_clk SCMI_CLK_CPUB23>;
> >> >> +                    operating-points-v2 = <&cluster2_opp_table>;
> >> >>                      cpu-idle-states = <&CPU_SLEEP>;
> >> >>                      i-cache-size = <65536>;
> >> >>                      i-cache-line-size = <64>;
> >> >> @@ -348,6 +356,207 @@ l3_cache: l3-cache {
> >> >>              };
> >> >>      };
> >> >>   +  cluster0_opp_table: opp-table-cluster0 {
> >> >> +            compatible = "operating-points-v2";
> >> >> +            opp-shared;
> >> >> +
> >> >> +            opp-408000000 {
> >> >> +                    opp-hz = /bits/ 64 <408000000>;
> >> >> +                    opp-microvolt = <675000 675000 950000>;
> >> >> +                    clock-latency-ns = <40000>;
> >> >> +            };
> >> >> +            opp-600000000 {
> >> >> +                    opp-hz = /bits/ 64 <600000000>;
> >> >> +                    opp-microvolt = <675000 675000 950000>;
> >> >> +                    clock-latency-ns = <40000>;
> >> >> +            };
> >> >> +            opp-816000000 {
> >> >> +                    opp-hz = /bits/ 64 <816000000>;
> >> >> +                    opp-microvolt = <675000 675000 950000>;
> >> >> +                    clock-latency-ns = <40000>;
> >> >> +            };
> >> >> +            opp-1008000000 {
> >> >> +                    opp-hz = /bits/ 64 <1008000000>;
> >> >> +                    opp-microvolt = <675000 675000 950000>;
> >> >> +                    clock-latency-ns = <40000>;
> >> >> +            };
> >> >
> >> > It is not useful to introduce OPP with the same voltage. There is no
> >> > gain in terms of energy efficiency as the compute capacity is linearly
> >> > tied with power consumption (P=CxFxV²) in this case.
> >> >
> >> > For example, opp-408 consumes 2 bogoWatts and opp-816 consumes 4
> >> > bogoWatts (because of the same voltage).
> >> >
> >> > For a workload, opp-408 takes 10 sec and opp-816 takes 5 sec because
> >> > it is twice faster.
> >> >
> >> > The energy consumption is:
> >> >
> >> > opp-408 = 10 x 2 = 20 BogoJoules
> >> > opp-816 = 5 x 4 = 20 BogoJoules
> >>
> >> I'd respectfully disagree that including multiple OPPs with the same
> >> voltage
> >> but different frequencies isn't useful.  Please allow me to explain.
> >>
> >> See, the total amount of consumed energy is, in general, the same for
> >> such
> >> OPPs and the same CPU task(s), if we ignore the static leakage current
> >> and
> >> such stuff, which isn't important here.  Though, the emphasis here is
> >> on
> >> "total", i.e. without taking into account the actual amount of time
> >> required
> >> for the exemplified CPU task(s) to complete.  If the total amount of
> >> time
> >> is quite short, we aren't going to heat up the package and the board
> >> enough
> >> to hit the CPU thermal throttling;  this approach is also sometimes
> >> referred
> >> to as "race to idle", which is actually quite effective for
> >> battery-powered
> >> mobile devices that tend to load their CPU cores in bursts, while
> >> remaining
> >> kind of inactive for the remaining time.
> >>
> >> However, if the CPU task(s) last long enough to actually saturate the
> >> thermal
> >> capacities of the package and the board or the device, we're getting
> >> into the
> >> CPU throttling territory, in which running the CPU cores slower, but
> >> still as
> >> fast as possible, may actually be beneficial for the overall CPU
> >> performance.
> >> By running the CPU cores slower, we're lowering the power and
> >> "spreading" the
> >> total energy consumption over time, i.e. we're making some time to
> >> allow
> >> the
> >> generated heat to dissipate into the surroundings.  As we know, having
> >> more
> >> energy consumed by the SoC means more heat generated by the SoC, but
> >> the
> >> resulting temperature of the SoC depends on how fast the energy is
> >> consumed,
> >> which equals to how fast the CPUs run;  of course, all that is valid
> >> under
> >> the reasonable assumption that the entire cooling setup, including the
> >> board
> >> surroundings, remains unchanged all the time.
> >
> > On the other hand, convective heat dissipation is approximately
> > proportional to the temperature differential, therefore heating up the
> > core to a higher temperature over a shorter period of time would let
> > it dissipate the same joule amount faster. Given that total joules
>
> Let me point out that the emphasis is again on "shorter period". :)
> Yes, when the CPU load is bursty, having multiple same-voltage OPPs
> almost surely won't help us at all, as I already noted.  However,
> the things will surely change when the CPU cores are loaded for
> longer amounts of time and, as a result, the defined thermal trips
> are reached, because the cooling system gets saturated.
>
> > generated for a particular load are approximately the same for
> > different frequencies as long as voltage remains the same (as Daniel
> > pointed out), higher frequency seems to lead to better heat transfer
> > to the environment for the same load. And also the task completes
> > sooner, which is probably always good, ceteris paribus.
> >
> > Not sure how that all changes when throttling enters the game though :)
>
> As I already noted above, the things are quite different when the CPU
> load isn't bursty.  Once the cooling setup is saturated, the heat no
> longer gets transferred effectively to the surroundings, while the CPU
> cores keep producing the heat, which cannot continue indefinitely.  As
> a result, the CPU cores need to run slower and "spread" the total amount
> of joules over time, but they still should run as fast as possible.

Wouldn't in this "non-bursty" case the total thermal production be
driven by how fast the system generates tasks, which is independent of
the thermal and frequency state? If joules per task are constant
(under steady state load), then it shouldn't matter whether they are
generated in short bursts or in a slower continuous flow - as long as
the time to dissipate the heat is longer than the time between tasks
at the high frequency state, in which case we are back to the "bursty"
scenario

> Another option is to introduce active cooling, which also comes with
> its own set of limits, but the initial assumption is that the cooling
> setup remains unchanged.
>
> In the end, if all that weren't the case, we wouldn't need CPU thermal
> throttling at all, or not as much. :)

Throttling would also lower the voltage at some point, which cools it
down much faster!

Best regards,
Alexey

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
  2024-01-26  7:30           ` Alexey Charkov
@ 2024-01-26  7:49             ` Dragan Simic
  2024-01-26 12:56               ` Daniel Lezcano
  0 siblings, 1 reply; 34+ messages in thread
From: Dragan Simic @ 2024-01-26  7:49 UTC (permalink / raw)
  To: Alexey Charkov
  Cc: Daniel Lezcano, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiko Stuebner, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel, Viresh Kumar

On 2024-01-26 08:30, Alexey Charkov wrote:
> On Fri, Jan 26, 2024 at 11:05 AM Dragan Simic <dsimic@manjaro.org> 
> wrote:
>> On 2024-01-26 07:44, Alexey Charkov wrote:
>> > On Fri, Jan 26, 2024 at 10:32 AM Dragan Simic <dsimic@manjaro.org>
>> > wrote:
>> >> On 2024-01-25 10:30, Daniel Lezcano wrote:
>> >> > On 24/01/2024 21:30, Alexey Charkov wrote:
>> >> >> By default the CPUs on RK3588 start up in a conservative performance
>> >> >> mode. Add frequency and voltage mappings to the device tree to enable
>> >> >> dynamic scaling via cpufreq
>> >> >>
>> >> >> Signed-off-by: Alexey Charkov <alchark@gmail.com>
>> >> >> ---
>> >> >>   arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 209
>> >> >> ++++++++++++++++++++++++++++++
>> >> >>   1 file changed, 209 insertions(+)
>> >> >>
>> >> >> diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> >> >> b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> >> >> index 131b9eb21398..e605be531a0f 100644
>> >> >> --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> >> >> +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> >> >> @@ -97,6 +97,7 @@ cpu_l0: cpu@0 {
>> >> >>                      clocks = <&scmi_clk SCMI_CLK_CPUL>;
>> >> >>                      assigned-clocks = <&scmi_clk SCMI_CLK_CPUL>;
>> >> >>                      assigned-clock-rates = <816000000>;
>> >> >> +                    operating-points-v2 = <&cluster0_opp_table>;
>> >> >>                      cpu-idle-states = <&CPU_SLEEP>;
>> >> >>                      i-cache-size = <32768>;
>> >> >>                      i-cache-line-size = <64>;
>> >> >> @@ -116,6 +117,7 @@ cpu_l1: cpu@100 {
>> >> >>                      enable-method = "psci";
>> >> >>                      capacity-dmips-mhz = <530>;
>> >> >>                      clocks = <&scmi_clk SCMI_CLK_CPUL>;
>> >> >> +                    operating-points-v2 = <&cluster0_opp_table>;
>> >> >>                      cpu-idle-states = <&CPU_SLEEP>;
>> >> >>                      i-cache-size = <32768>;
>> >> >>                      i-cache-line-size = <64>;
>> >> >> @@ -135,6 +137,7 @@ cpu_l2: cpu@200 {
>> >> >>                      enable-method = "psci";
>> >> >>                      capacity-dmips-mhz = <530>;
>> >> >>                      clocks = <&scmi_clk SCMI_CLK_CPUL>;
>> >> >> +                    operating-points-v2 = <&cluster0_opp_table>;
>> >> >>                      cpu-idle-states = <&CPU_SLEEP>;
>> >> >>                      i-cache-size = <32768>;
>> >> >>                      i-cache-line-size = <64>;
>> >> >> @@ -154,6 +157,7 @@ cpu_l3: cpu@300 {
>> >> >>                      enable-method = "psci";
>> >> >>                      capacity-dmips-mhz = <530>;
>> >> >>                      clocks = <&scmi_clk SCMI_CLK_CPUL>;
>> >> >> +                    operating-points-v2 = <&cluster0_opp_table>;
>> >> >>                      cpu-idle-states = <&CPU_SLEEP>;
>> >> >>                      i-cache-size = <32768>;
>> >> >>                      i-cache-line-size = <64>;
>> >> >> @@ -175,6 +179,7 @@ cpu_b0: cpu@400 {
>> >> >>                      clocks = <&scmi_clk SCMI_CLK_CPUB01>;
>> >> >>                      assigned-clocks = <&scmi_clk SCMI_CLK_CPUB01>;
>> >> >>                      assigned-clock-rates = <816000000>;
>> >> >> +                    operating-points-v2 = <&cluster1_opp_table>;
>> >> >>                      cpu-idle-states = <&CPU_SLEEP>;
>> >> >>                      i-cache-size = <65536>;
>> >> >>                      i-cache-line-size = <64>;
>> >> >> @@ -194,6 +199,7 @@ cpu_b1: cpu@500 {
>> >> >>                      enable-method = "psci";
>> >> >>                      capacity-dmips-mhz = <1024>;
>> >> >>                      clocks = <&scmi_clk SCMI_CLK_CPUB01>;
>> >> >> +                    operating-points-v2 = <&cluster1_opp_table>;
>> >> >>                      cpu-idle-states = <&CPU_SLEEP>;
>> >> >>                      i-cache-size = <65536>;
>> >> >>                      i-cache-line-size = <64>;
>> >> >> @@ -215,6 +221,7 @@ cpu_b2: cpu@600 {
>> >> >>                      clocks = <&scmi_clk SCMI_CLK_CPUB23>;
>> >> >>                      assigned-clocks = <&scmi_clk SCMI_CLK_CPUB23>;
>> >> >>                      assigned-clock-rates = <816000000>;
>> >> >> +                    operating-points-v2 = <&cluster2_opp_table>;
>> >> >>                      cpu-idle-states = <&CPU_SLEEP>;
>> >> >>                      i-cache-size = <65536>;
>> >> >>                      i-cache-line-size = <64>;
>> >> >> @@ -234,6 +241,7 @@ cpu_b3: cpu@700 {
>> >> >>                      enable-method = "psci";
>> >> >>                      capacity-dmips-mhz = <1024>;
>> >> >>                      clocks = <&scmi_clk SCMI_CLK_CPUB23>;
>> >> >> +                    operating-points-v2 = <&cluster2_opp_table>;
>> >> >>                      cpu-idle-states = <&CPU_SLEEP>;
>> >> >>                      i-cache-size = <65536>;
>> >> >>                      i-cache-line-size = <64>;
>> >> >> @@ -348,6 +356,207 @@ l3_cache: l3-cache {
>> >> >>              };
>> >> >>      };
>> >> >>   +  cluster0_opp_table: opp-table-cluster0 {
>> >> >> +            compatible = "operating-points-v2";
>> >> >> +            opp-shared;
>> >> >> +
>> >> >> +            opp-408000000 {
>> >> >> +                    opp-hz = /bits/ 64 <408000000>;
>> >> >> +                    opp-microvolt = <675000 675000 950000>;
>> >> >> +                    clock-latency-ns = <40000>;
>> >> >> +            };
>> >> >> +            opp-600000000 {
>> >> >> +                    opp-hz = /bits/ 64 <600000000>;
>> >> >> +                    opp-microvolt = <675000 675000 950000>;
>> >> >> +                    clock-latency-ns = <40000>;
>> >> >> +            };
>> >> >> +            opp-816000000 {
>> >> >> +                    opp-hz = /bits/ 64 <816000000>;
>> >> >> +                    opp-microvolt = <675000 675000 950000>;
>> >> >> +                    clock-latency-ns = <40000>;
>> >> >> +            };
>> >> >> +            opp-1008000000 {
>> >> >> +                    opp-hz = /bits/ 64 <1008000000>;
>> >> >> +                    opp-microvolt = <675000 675000 950000>;
>> >> >> +                    clock-latency-ns = <40000>;
>> >> >> +            };
>> >> >
>> >> > It is not useful to introduce OPP with the same voltage. There is no
>> >> > gain in terms of energy efficiency as the compute capacity is linearly
>> >> > tied with power consumption (P=CxFxV²) in this case.
>> >> >
>> >> > For example, opp-408 consumes 2 bogoWatts and opp-816 consumes 4
>> >> > bogoWatts (because of the same voltage).
>> >> >
>> >> > For a workload, opp-408 takes 10 sec and opp-816 takes 5 sec because
>> >> > it is twice faster.
>> >> >
>> >> > The energy consumption is:
>> >> >
>> >> > opp-408 = 10 x 2 = 20 BogoJoules
>> >> > opp-816 = 5 x 4 = 20 BogoJoules
>> >>
>> >> I'd respectfully disagree that including multiple OPPs with the same
>> >> voltage
>> >> but different frequencies isn't useful.  Please allow me to explain.
>> >>
>> >> See, the total amount of consumed energy is, in general, the same for
>> >> such
>> >> OPPs and the same CPU task(s), if we ignore the static leakage current
>> >> and
>> >> such stuff, which isn't important here.  Though, the emphasis here is
>> >> on
>> >> "total", i.e. without taking into account the actual amount of time
>> >> required
>> >> for the exemplified CPU task(s) to complete.  If the total amount of
>> >> time
>> >> is quite short, we aren't going to heat up the package and the board
>> >> enough
>> >> to hit the CPU thermal throttling;  this approach is also sometimes
>> >> referred
>> >> to as "race to idle", which is actually quite effective for
>> >> battery-powered
>> >> mobile devices that tend to load their CPU cores in bursts, while
>> >> remaining
>> >> kind of inactive for the remaining time.
>> >>
>> >> However, if the CPU task(s) last long enough to actually saturate the
>> >> thermal
>> >> capacities of the package and the board or the device, we're getting
>> >> into the
>> >> CPU throttling territory, in which running the CPU cores slower, but
>> >> still as
>> >> fast as possible, may actually be beneficial for the overall CPU
>> >> performance.
>> >> By running the CPU cores slower, we're lowering the power and
>> >> "spreading" the
>> >> total energy consumption over time, i.e. we're making some time to
>> >> allow
>> >> the
>> >> generated heat to dissipate into the surroundings.  As we know, having
>> >> more
>> >> energy consumed by the SoC means more heat generated by the SoC, but
>> >> the
>> >> resulting temperature of the SoC depends on how fast the energy is
>> >> consumed,
>> >> which equals to how fast the CPUs run;  of course, all that is valid
>> >> under
>> >> the reasonable assumption that the entire cooling setup, including the
>> >> board
>> >> surroundings, remains unchanged all the time.
>> >
>> > On the other hand, convective heat dissipation is approximately
>> > proportional to the temperature differential, therefore heating up the
>> > core to a higher temperature over a shorter period of time would let
>> > it dissipate the same joule amount faster. Given that total joules
>> 
>> Let me point out that the emphasis is again on "shorter period". :)
>> Yes, when the CPU load is bursty, having multiple same-voltage OPPs
>> almost surely won't help us at all, as I already noted.  However,
>> the things will surely change when the CPU cores are loaded for
>> longer amounts of time and, as a result, the defined thermal trips
>> are reached, because the cooling system gets saturated.
>> 
>> > generated for a particular load are approximately the same for
>> > different frequencies as long as voltage remains the same (as Daniel
>> > pointed out), higher frequency seems to lead to better heat transfer
>> > to the environment for the same load. And also the task completes
>> > sooner, which is probably always good, ceteris paribus.
>> >
>> > Not sure how that all changes when throttling enters the game though :)
>> 
>> As I already noted above, the things are quite different when the CPU
>> load isn't bursty.  Once the cooling setup is saturated, the heat no
>> longer gets transferred effectively to the surroundings, while the CPU
>> cores keep producing the heat, which cannot continue indefinitely.  As
>> a result, the CPU cores need to run slower and "spread" the total 
>> amount
>> of joules over time, but they still should run as fast as possible.
> 
> Wouldn't in this "non-bursty" case the total thermal production be
> driven by how fast the system generates tasks, which is independent of
> the thermal and frequency state? If joules per task are constant
> (under steady state load), then it shouldn't matter whether they are
> generated in short bursts or in a slower continuous flow - as long as
> the time to dissipate the heat is longer than the time between tasks
> at the high frequency state, in which case we are back to the "bursty"
> scenario.

You're right, if there's enough time for the generated heat to 
dissipate,
between scheduled CPU tasks, there should be no significant OPP dips,
and how bursty (or not) the CPU load is shouldn't matter much.  
Actually,
if you agree, exactly this might be our definition of bursty CPU load.

The things change when there isn't enough time.

>> Another option is to introduce active cooling, which also comes with
>> its own set of limits, but the initial assumption is that the cooling
>> setup remains unchanged.
>> 
>> In the end, if all that weren't the case, we wouldn't need CPU thermal
>> throttling at all, or not as much. :)
> 
> Throttling would also lower the voltage at some point, which cools it
> down much faster!

Of course, but the key is not to cool (and slow down) the CPU cores too
much, but just enough to stay within the available thermal envelope,
which is where the same-voltage, lower-frequency OPPs should shine.

When the CPU load isn't bursty but steady and high, we don't race to
idle, but run a marathon instead, so to speak. :)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
  2024-01-26  7:49             ` Dragan Simic
@ 2024-01-26 12:56               ` Daniel Lezcano
  2024-01-26 13:44                 ` Alexey Charkov
  2024-01-26 20:04                 ` Dragan Simic
  0 siblings, 2 replies; 34+ messages in thread
From: Daniel Lezcano @ 2024-01-26 12:56 UTC (permalink / raw)
  To: Dragan Simic, Alexey Charkov
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
	devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
	Viresh Kumar

On 26/01/2024 08:49, Dragan Simic wrote:
> On 2024-01-26 08:30, Alexey Charkov wrote:
>> On Fri, Jan 26, 2024 at 11:05 AM Dragan Simic <dsimic@manjaro.org> wrote:
>>> On 2024-01-26 07:44, Alexey Charkov wrote:
>>> > On Fri, Jan 26, 2024 at 10:32 AM Dragan Simic <dsimic@manjaro.org>
>>> > wrote:
>>> >> On 2024-01-25 10:30, Daniel Lezcano wrote:
>>> >> > On 24/01/2024 21:30, Alexey Charkov wrote:
>>> >> >> By default the CPUs on RK3588 start up in a conservative 
>>> performance
>>> >> >> mode. Add frequency and voltage mappings to the device tree to 
>>> enable

[ ... ]

>> Throttling would also lower the voltage at some point, which cools it
>> down much faster!
> 
> Of course, but the key is not to cool (and slow down) the CPU cores too
> much, but just enough to stay within the available thermal envelope,
> which is where the same-voltage, lower-frequency OPPs should shine.

That implies the resulting power is sustainable which I doubt it is the 
case.

The voltage scaling makes the cooling effect efficient not the frequency.

For example:
	opp5 = opp(2GHz, 1V) => 2 BogoWatt
	opp4 = opp(1.9GHz, 1V) => 1.9 BogoWatt
	opp3 = opp(1.8GHz, 0.9V) => 1.458 BogoWatt
	[ other states but we focus on these 3 ]

opp5->opp4 => -5% compute capacity, -5% power, ratio=1
opp4->opp3 => -5% compute capacity, -23.1% power, ratio=21,6

opp5->opp3 => -10% compute capacity, -27.1% power, ratio=36.9

In burst operation (no thermal throttling), opp4 is pointless we agree 
on that.

IMO the following will happen: in burst operation with thermal 
throttling we hit the trip point and then the step wise governor reduces 
opp5 -> opp4. We have slight power reduction but the temperature does 
not decrease, so at the next iteration, it is throttle at opp3. And at 
the end we have opp4 <-> opp3 back and forth instead of opp5 <-> opp3.

It is probable we end up with an equivalent frequency average (or 
compute capacity avg).

opp4 <-> opp3 (longer duration in states, less transitions)
opp5 <-> opp3 (shorter duration in states, more transitions)

Some platforms had their higher OPPs with the same voltage and they 
failed to cool down the CPU in the long run.

Anyway, there is only one way to check it out :)

Alexey, is it possible to compare the compute duration for 'dhrystone' 
with these voltage OPP and without ? (with a period of cool down between 
the test in order to start at the same thermal condition) ?



> When the CPU load isn't bursty but steady and high, we don't race to
> idle, but run a marathon instead, so to speak. :)

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
  2024-01-26 12:56               ` Daniel Lezcano
@ 2024-01-26 13:44                 ` Alexey Charkov
  2024-01-26 20:33                   ` Dragan Simic
  2024-01-26 20:04                 ` Dragan Simic
  1 sibling, 1 reply; 34+ messages in thread
From: Alexey Charkov @ 2024-01-26 13:44 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Dragan Simic, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiko Stuebner, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel, Viresh Kumar

On Fri, Jan 26, 2024 at 4:56 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 26/01/2024 08:49, Dragan Simic wrote:
> > On 2024-01-26 08:30, Alexey Charkov wrote:
> >> On Fri, Jan 26, 2024 at 11:05 AM Dragan Simic <dsimic@manjaro.org> wrote:
> >>> On 2024-01-26 07:44, Alexey Charkov wrote:
> >>> > On Fri, Jan 26, 2024 at 10:32 AM Dragan Simic <dsimic@manjaro.org>
> >>> > wrote:
> >>> >> On 2024-01-25 10:30, Daniel Lezcano wrote:
> >>> >> > On 24/01/2024 21:30, Alexey Charkov wrote:
> >>> >> >> By default the CPUs on RK3588 start up in a conservative
> >>> performance
> >>> >> >> mode. Add frequency and voltage mappings to the device tree to
> >>> enable
>
> [ ... ]
>
> >> Throttling would also lower the voltage at some point, which cools it
> >> down much faster!
> >
> > Of course, but the key is not to cool (and slow down) the CPU cores too
> > much, but just enough to stay within the available thermal envelope,
> > which is where the same-voltage, lower-frequency OPPs should shine.
>
> That implies the resulting power is sustainable which I doubt it is the
> case.
>
> The voltage scaling makes the cooling effect efficient not the frequency.
>
> For example:
>         opp5 = opp(2GHz, 1V) => 2 BogoWatt
>         opp4 = opp(1.9GHz, 1V) => 1.9 BogoWatt
>         opp3 = opp(1.8GHz, 0.9V) => 1.458 BogoWatt
>         [ other states but we focus on these 3 ]
>
> opp5->opp4 => -5% compute capacity, -5% power, ratio=1
> opp4->opp3 => -5% compute capacity, -23.1% power, ratio=21,6
>
> opp5->opp3 => -10% compute capacity, -27.1% power, ratio=36.9
>
> In burst operation (no thermal throttling), opp4 is pointless we agree
> on that.
>
> IMO the following will happen: in burst operation with thermal
> throttling we hit the trip point and then the step wise governor reduces
> opp5 -> opp4. We have slight power reduction but the temperature does
> not decrease, so at the next iteration, it is throttle at opp3. And at
> the end we have opp4 <-> opp3 back and forth instead of opp5 <-> opp3.
>
> It is probable we end up with an equivalent frequency average (or
> compute capacity avg).
>
> opp4 <-> opp3 (longer duration in states, less transitions)
> opp5 <-> opp3 (shorter duration in states, more transitions)
>
> Some platforms had their higher OPPs with the same voltage and they
> failed to cool down the CPU in the long run.
>
> Anyway, there is only one way to check it out :)
>
> Alexey, is it possible to compare the compute duration for 'dhrystone'
> with these voltage OPP and without ? (with a period of cool down between
> the test in order to start at the same thermal condition) ?

Sure, let me try that - would be interesting to see the results. In my
previous tinkering there were cases when the system stayed at 2.35GHz
for all big cores for non-trivial time (using the step-wise thermal
governor), and that's an example of "same voltage, lower frequency".
Other times though it throttled one cluster down to 1.8GHz and kept
the other at 2.4GHz, and was also stationary at those parameters for
extended time. This probably indicates that both of those states use
sustainable power in my cooling setup.

Note though that I still have that tiny heatsink installed (even
though I disable the fan during tests), and in this setup the
temperature drops from 85C to around 70C in a matter of seconds as
soon as the load stops. And if I enable the fan then it balances the
temperature at the control setpoint of 55C using less than full fan
speed with 8 threads of dhrystone running for extended time (and the
PWM value chosen by the step-wise governor stabilizes at 240 out of
255). Looks like my prior assessment that "the fan is not super mighty
vs. the total thermal output" was wrong after all, despite its modest
size :)

Best regards,
Alexey

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
  2024-01-26 13:44                 ` Alexey Charkov
@ 2024-01-26 20:33                   ` Dragan Simic
  2024-01-27 19:41                     ` Alexey Charkov
  0 siblings, 1 reply; 34+ messages in thread
From: Dragan Simic @ 2024-01-26 20:33 UTC (permalink / raw)
  To: Alexey Charkov
  Cc: Daniel Lezcano, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiko Stuebner, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel, Viresh Kumar

On 2024-01-26 14:44, Alexey Charkov wrote:
> On Fri, Jan 26, 2024 at 4:56 PM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>> On 26/01/2024 08:49, Dragan Simic wrote:
>> > On 2024-01-26 08:30, Alexey Charkov wrote:
>> >> On Fri, Jan 26, 2024 at 11:05 AM Dragan Simic <dsimic@manjaro.org> wrote:
>> >>> On 2024-01-26 07:44, Alexey Charkov wrote:
>> >>> > On Fri, Jan 26, 2024 at 10:32 AM Dragan Simic <dsimic@manjaro.org>
>> >>> > wrote:
>> >>> >> On 2024-01-25 10:30, Daniel Lezcano wrote:
>> >>> >> > On 24/01/2024 21:30, Alexey Charkov wrote:
>> >>> >> >> By default the CPUs on RK3588 start up in a conservative
>> >>> performance
>> >>> >> >> mode. Add frequency and voltage mappings to the device tree to
>> >>> enable
>> 
>> [ ... ]
>> 
>> >> Throttling would also lower the voltage at some point, which cools it
>> >> down much faster!
>> >
>> > Of course, but the key is not to cool (and slow down) the CPU cores too
>> > much, but just enough to stay within the available thermal envelope,
>> > which is where the same-voltage, lower-frequency OPPs should shine.
>> 
>> That implies the resulting power is sustainable which I doubt it is 
>> the
>> case.
>> 
>> The voltage scaling makes the cooling effect efficient not the 
>> frequency.
>> 
>> For example:
>>         opp5 = opp(2GHz, 1V) => 2 BogoWatt
>>         opp4 = opp(1.9GHz, 1V) => 1.9 BogoWatt
>>         opp3 = opp(1.8GHz, 0.9V) => 1.458 BogoWatt
>>         [ other states but we focus on these 3 ]
>> 
>> opp5->opp4 => -5% compute capacity, -5% power, ratio=1
>> opp4->opp3 => -5% compute capacity, -23.1% power, ratio=21,6
>> 
>> opp5->opp3 => -10% compute capacity, -27.1% power, ratio=36.9
>> 
>> In burst operation (no thermal throttling), opp4 is pointless we agree
>> on that.
>> 
>> IMO the following will happen: in burst operation with thermal
>> throttling we hit the trip point and then the step wise governor 
>> reduces
>> opp5 -> opp4. We have slight power reduction but the temperature does
>> not decrease, so at the next iteration, it is throttle at opp3. And at
>> the end we have opp4 <-> opp3 back and forth instead of opp5 <-> opp3.
>> 
>> It is probable we end up with an equivalent frequency average (or
>> compute capacity avg).
>> 
>> opp4 <-> opp3 (longer duration in states, less transitions)
>> opp5 <-> opp3 (shorter duration in states, more transitions)
>> 
>> Some platforms had their higher OPPs with the same voltage and they
>> failed to cool down the CPU in the long run.
>> 
>> Anyway, there is only one way to check it out :)
>> 
>> Alexey, is it possible to compare the compute duration for 'dhrystone'
>> with these voltage OPP and without ? (with a period of cool down 
>> between
>> the test in order to start at the same thermal condition) ?
> 
> Sure, let me try that - would be interesting to see the results. In my
> previous tinkering there were cases when the system stayed at 2.35GHz
> for all big cores for non-trivial time (using the step-wise thermal
> governor), and that's an example of "same voltage, lower frequency".
> Other times though it throttled one cluster down to 1.8GHz and kept
> the other at 2.4GHz, and was also stationary at those parameters for
> extended time. This probably indicates that both of those states use
> sustainable power in my cooling setup.

IMHO, there are simply too many factors at play, including different
possible cooling setups, so providing additional CPU throttling
granularity can only be helpful.  Of course, testing and recording
data is the way to move forward, but I think we should use a few
different tests.

> Note though that I still have that tiny heatsink installed (even
> though I disable the fan during tests), and in this setup the
> temperature drops from 85C to around 70C in a matter of seconds as
> soon as the load stops. And if I enable the fan then it balances the
> temperature at the control setpoint of 55C using less than full fan
> speed with 8 threads of dhrystone running for extended time (and the
> PWM value chosen by the step-wise governor stabilizes at 240 out of
> 255). Looks like my prior assessment that "the fan is not super mighty
> vs. the total thermal output" was wrong after all, despite its modest
> size :)

Quite frankly, I wouldn't dare to run an RK3588 (or an RK3399) with
absolutely no heatsink attached, so I think your minimal cooling setup
(i.e. with the fan disconnected) is fine.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
  2024-01-26 20:33                   ` Dragan Simic
@ 2024-01-27 19:41                     ` Alexey Charkov
  2024-01-28  3:35                       ` Dragan Simic
  2024-01-28 15:06                       ` Daniel Lezcano
  0 siblings, 2 replies; 34+ messages in thread
From: Alexey Charkov @ 2024-01-27 19:41 UTC (permalink / raw)
  To: Dragan Simic
  Cc: Daniel Lezcano, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiko Stuebner, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel, Viresh Kumar

On Sat, Jan 27, 2024 at 12:33 AM Dragan Simic <dsimic@manjaro.org> wrote:
>
> On 2024-01-26 14:44, Alexey Charkov wrote:
> > On Fri, Jan 26, 2024 at 4:56 PM Daniel Lezcano
> > <daniel.lezcano@linaro.org> wrote:
> >> On 26/01/2024 08:49, Dragan Simic wrote:
> >> > On 2024-01-26 08:30, Alexey Charkov wrote:
> >> >> On Fri, Jan 26, 2024 at 11:05 AM Dragan Simic <dsimic@manjaro.org> wrote:
> >> >>> On 2024-01-26 07:44, Alexey Charkov wrote:
> >> >>> > On Fri, Jan 26, 2024 at 10:32 AM Dragan Simic <dsimic@manjaro.org>
> >> >>> > wrote:
> >> >>> >> On 2024-01-25 10:30, Daniel Lezcano wrote:
> >> >>> >> > On 24/01/2024 21:30, Alexey Charkov wrote:
> >> >>> >> >> By default the CPUs on RK3588 start up in a conservative
> >> >>> performance
> >> >>> >> >> mode. Add frequency and voltage mappings to the device tree to
> >> >>> enable
> >>
> >> [ ... ]
> >>
> >> >> Throttling would also lower the voltage at some point, which cools it
> >> >> down much faster!
> >> >
> >> > Of course, but the key is not to cool (and slow down) the CPU cores too
> >> > much, but just enough to stay within the available thermal envelope,
> >> > which is where the same-voltage, lower-frequency OPPs should shine.
> >>
> >> That implies the resulting power is sustainable which I doubt it is
> >> the
> >> case.
> >>
> >> The voltage scaling makes the cooling effect efficient not the
> >> frequency.
> >>
> >> For example:
> >>         opp5 = opp(2GHz, 1V) => 2 BogoWatt
> >>         opp4 = opp(1.9GHz, 1V) => 1.9 BogoWatt
> >>         opp3 = opp(1.8GHz, 0.9V) => 1.458 BogoWatt
> >>         [ other states but we focus on these 3 ]
> >>
> >> opp5->opp4 => -5% compute capacity, -5% power, ratio=1
> >> opp4->opp3 => -5% compute capacity, -23.1% power, ratio=21,6
> >>
> >> opp5->opp3 => -10% compute capacity, -27.1% power, ratio=36.9
> >>
> >> In burst operation (no thermal throttling), opp4 is pointless we agree
> >> on that.
> >>
> >> IMO the following will happen: in burst operation with thermal
> >> throttling we hit the trip point and then the step wise governor
> >> reduces
> >> opp5 -> opp4. We have slight power reduction but the temperature does
> >> not decrease, so at the next iteration, it is throttle at opp3. And at
> >> the end we have opp4 <-> opp3 back and forth instead of opp5 <-> opp3.
> >>
> >> It is probable we end up with an equivalent frequency average (or
> >> compute capacity avg).
> >>
> >> opp4 <-> opp3 (longer duration in states, less transitions)
> >> opp5 <-> opp3 (shorter duration in states, more transitions)
> >>
> >> Some platforms had their higher OPPs with the same voltage and they
> >> failed to cool down the CPU in the long run.
> >>
> >> Anyway, there is only one way to check it out :)
> >>
> >> Alexey, is it possible to compare the compute duration for 'dhrystone'
> >> with these voltage OPP and without ? (with a period of cool down
> >> between
> >> the test in order to start at the same thermal condition) ?
> >
> > Sure, let me try that - would be interesting to see the results. In my
> > previous tinkering there were cases when the system stayed at 2.35GHz
> > for all big cores for non-trivial time (using the step-wise thermal
> > governor), and that's an example of "same voltage, lower frequency".
> > Other times though it throttled one cluster down to 1.8GHz and kept
> > the other at 2.4GHz, and was also stationary at those parameters for
> > extended time. This probably indicates that both of those states use
> > sustainable power in my cooling setup.
>
> IMHO, there are simply too many factors at play, including different
> possible cooling setups, so providing additional CPU throttling
> granularity can only be helpful.  Of course, testing and recording
> data is the way to move forward, but I think we should use a few
> different tests.

Soooo, benchmarking these turned out a bit trickier than I had hoped
for. Apparently, dhrystone uses an unsigned int rather than an
unsigned long for the loops count (or something of that sort), which
means that I can't get it to run enough loops to heat up my chip from
a stable idle state to the throttling state (due to counter
wraparound). So I ended up with a couple of crutches, namely:
 - run dhrystone continuously on 6 out of 8 cores to make the chip
warm enough (`taskset -c 0-5 ./dhrystone -t 6 -r 6000` - note that on
my machine cores 6-7 are usually the first ones to get throttled, due
to whatever thermal peculiarities)
 - wait for the temperature to stabilize (which happens at 79.5C)
 - then run timed dhrystone on the remaining 2 out of 6 cores (big
ones) to see how throttling with different OPP tables affects overall
performance.

In the end, here's what I got with the 'original' OPP table (including
"same voltage - different frequencies" states):
alchark@rock-5b ~ $ taskset -c 6-7 ./dhrystone -t 2 -l 4000000000
duration: 0 seconds
number of threads: 2
number of loops: 4000000000000000
delay between starting threads: 0 seconds

Dhrystone(1.1) time for 1233977344 passes = 29.7
This machine benchmarks at 41481539 dhrystones/second
                           23609 DMIPS
Dhrystone(1.1) time for 1233977344 passes = 29.8
This machine benchmarks at 41476618 dhrystones/second
                           23606 DMIPS

Total dhrystone run time: 30.864492 seconds.

And here's what I got with the 'reduced' OPP table (keeping only the
highest frequency state for each voltage):
alchark@rock-5b ~ $ taskset -c 6-7 ./dhrystone -t 2 -l 4000000000
duration: 0 seconds
number of threads: 2
number of loops: 4000000000000000
delay between starting threads: 0 seconds

Dhrystone(1.1) time for 1233977344 passes = 30.9
This machine benchmarks at 39968549 dhrystones/second
                          22748 DMIPS
Dhrystone(1.1) time for 1233977344 passes = 31.0
This machine benchmarks at 39817431 dhrystones/second
                          22662 DMIPS

Total dhrystone run time: 31.995136 seconds.

Bottomline: removing the lower-frequency OPPs led to a 3.8% drop in
performance in this setup. This is probably far from a reliable
estimate, but I guess it indeed indicates that having lower-frequency
states might be beneficial in some load scenarios.

Note though that several seconds after hitting the throttling
threshold cores 6-7 were oscillating between 1.608GHz and 1.8GHz in
both runs, which implies that the whole difference in performance was
due to different speed of initial throttling (i.e. it might be a
peculiarity of the step-wise thermal governor operation when it has to
go through more cooling states to reach the "steady-state" one). Given
that both 1.608GHz and 1.8GHz have no lower-frequency same-voltage
siblings in either of the OPP tables, it implies that under prolonged
constant load there should be no performance difference at all.

Best regards,
Alexey

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
  2024-01-27 19:41                     ` Alexey Charkov
@ 2024-01-28  3:35                       ` Dragan Simic
  2024-01-28 19:14                         ` Alexey Charkov
  2024-01-28 15:06                       ` Daniel Lezcano
  1 sibling, 1 reply; 34+ messages in thread
From: Dragan Simic @ 2024-01-28  3:35 UTC (permalink / raw)
  To: Alexey Charkov
  Cc: Daniel Lezcano, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiko Stuebner, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel, Viresh Kumar

On 2024-01-27 20:41, Alexey Charkov wrote:
> On Sat, Jan 27, 2024 at 12:33 AM Dragan Simic <dsimic@manjaro.org> 
> wrote:
>> On 2024-01-26 14:44, Alexey Charkov wrote:
>> > On Fri, Jan 26, 2024 at 4:56 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>> >> On 26/01/2024 08:49, Dragan Simic wrote:
>> >> > On 2024-01-26 08:30, Alexey Charkov wrote:
>> >> >> On Fri, Jan 26, 2024 at 11:05 AM Dragan Simic <dsimic@manjaro.org> wrote:
>> >> >>> On 2024-01-26 07:44, Alexey Charkov wrote:
>> >> >>> > On Fri, Jan 26, 2024 at 10:32 AM Dragan Simic <dsimic@manjaro.org> wrote:
>> >> >>> >> On 2024-01-25 10:30, Daniel Lezcano wrote:
>> >> >> Throttling would also lower the voltage at some point, which cools it
>> >> >> down much faster!
>> >> >
>> >> > Of course, but the key is not to cool (and slow down) the CPU cores too
>> >> > much, but just enough to stay within the available thermal envelope,
>> >> > which is where the same-voltage, lower-frequency OPPs should shine.
>> >>
>> >> That implies the resulting power is sustainable which I doubt it is
>> >> the
>> >> case.
>> >>
>> >> The voltage scaling makes the cooling effect efficient not the
>> >> frequency.
>> >>
>> >> For example:
>> >>         opp5 = opp(2GHz, 1V) => 2 BogoWatt
>> >>         opp4 = opp(1.9GHz, 1V) => 1.9 BogoWatt
>> >>         opp3 = opp(1.8GHz, 0.9V) => 1.458 BogoWatt
>> >>         [ other states but we focus on these 3 ]
>> >>
>> >> opp5->opp4 => -5% compute capacity, -5% power, ratio=1
>> >> opp4->opp3 => -5% compute capacity, -23.1% power, ratio=21,6
>> >>
>> >> opp5->opp3 => -10% compute capacity, -27.1% power, ratio=36.9
>> >>
>> >> In burst operation (no thermal throttling), opp4 is pointless we agree
>> >> on that.
>> >>
>> >> IMO the following will happen: in burst operation with thermal
>> >> throttling we hit the trip point and then the step wise governor
>> >> reduces
>> >> opp5 -> opp4. We have slight power reduction but the temperature does
>> >> not decrease, so at the next iteration, it is throttle at opp3. And at
>> >> the end we have opp4 <-> opp3 back and forth instead of opp5 <-> opp3.
>> >>
>> >> It is probable we end up with an equivalent frequency average (or
>> >> compute capacity avg).
>> >>
>> >> opp4 <-> opp3 (longer duration in states, less transitions)
>> >> opp5 <-> opp3 (shorter duration in states, more transitions)
>> >>
>> >> Some platforms had their higher OPPs with the same voltage and they
>> >> failed to cool down the CPU in the long run.
>> >>
>> >> Anyway, there is only one way to check it out :)
>> >>
>> >> Alexey, is it possible to compare the compute duration for 'dhrystone'
>> >> with these voltage OPP and without ? (with a period of cool down
>> >> between
>> >> the test in order to start at the same thermal condition) ?
>> >
>> > Sure, let me try that - would be interesting to see the results. In my
>> > previous tinkering there were cases when the system stayed at 2.35GHz
>> > for all big cores for non-trivial time (using the step-wise thermal
>> > governor), and that's an example of "same voltage, lower frequency".
>> > Other times though it throttled one cluster down to 1.8GHz and kept
>> > the other at 2.4GHz, and was also stationary at those parameters for
>> > extended time. This probably indicates that both of those states use
>> > sustainable power in my cooling setup.
>> 
>> IMHO, there are simply too many factors at play, including different
>> possible cooling setups, so providing additional CPU throttling
>> granularity can only be helpful.  Of course, testing and recording
>> data is the way to move forward, but I think we should use a few
>> different tests.
> 
> Soooo, benchmarking these turned out a bit trickier than I had hoped
> for. Apparently, dhrystone uses an unsigned int rather than an
> unsigned long for the loops count (or something of that sort), which
> means that I can't get it to run enough loops to heat up my chip from
> a stable idle state to the throttling state (due to counter
> wraparound). So I ended up with a couple of crutches, namely:

Huh, it seems that recent SBCs may have become a bit too fast for it,
which is great. :)  Thank you for the benchmarking.

>  - run dhrystone continuously on 6 out of 8 cores to make the chip
> warm enough (`taskset -c 0-5 ./dhrystone -t 6 -r 6000` - note that on
> my machine cores 6-7 are usually the first ones to get throttled, due
> to whatever thermal peculiarities)
>  - wait for the temperature to stabilize (which happens at 79.5C)
>  - then run timed dhrystone on the remaining 2 out of 6 cores (big
> ones) to see how throttling with different OPP tables affects overall
> performance.

Just checking, running the test on just two CPU cores was enough to
keep the package temperature at around 80 oC?

> In the end, here's what I got with the 'original' OPP table (including
> "same voltage - different frequencies" states):
> alchark@rock-5b ~ $ taskset -c 6-7 ./dhrystone -t 2 -l 4000000000
> duration: 0 seconds
> number of threads: 2
> number of loops: 4000000000000000
> delay between starting threads: 0 seconds
> 
> Dhrystone(1.1) time for 1233977344 passes = 29.7
> This machine benchmarks at 41481539 dhrystones/second
>                            23609 DMIPS
> Dhrystone(1.1) time for 1233977344 passes = 29.8
> This machine benchmarks at 41476618 dhrystones/second
>                            23606 DMIPS
> 
> Total dhrystone run time: 30.864492 seconds.
> 
> And here's what I got with the 'reduced' OPP table (keeping only the
> highest frequency state for each voltage):
> alchark@rock-5b ~ $ taskset -c 6-7 ./dhrystone -t 2 -l 4000000000
> duration: 0 seconds
> number of threads: 2
> number of loops: 4000000000000000
> delay between starting threads: 0 seconds
> 
> Dhrystone(1.1) time for 1233977344 passes = 30.9
> This machine benchmarks at 39968549 dhrystones/second
>                           22748 DMIPS
> Dhrystone(1.1) time for 1233977344 passes = 31.0
> This machine benchmarks at 39817431 dhrystones/second
>                           22662 DMIPS
> 
> Total dhrystone run time: 31.995136 seconds.
> 
> Bottomline: removing the lower-frequency OPPs led to a 3.8% drop in
> performance in this setup. This is probably far from a reliable
> estimate, but I guess it indeed indicates that having lower-frequency
> states might be beneficial in some load scenarios.

Measuring a difference of about 4% may be attributed to some unknown
inaccuracy or test deviation, but again, a performance improvement of
about 4% that comes free of charge is nothing to be sneezed at, IMHO.

> Note though that several seconds after hitting the throttling
> threshold cores 6-7 were oscillating between 1.608GHz and 1.8GHz in
> both runs, which implies that the whole difference in performance was
> due to different speed of initial throttling (i.e. it might be a
> peculiarity of the step-wise thermal governor operation when it has to
> go through more cooling states to reach the "steady-state" one). Given
> that both 1.608GHz and 1.8GHz have no lower-frequency same-voltage
> siblings in either of the OPP tables, it implies that under prolonged
> constant load there should be no performance difference at all.

... all that with one possible cooling setup, and with one synthetic
test.  We simply can't know in advance how would a different cooling
setup on the same or on a different board behave, if you agree.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
  2024-01-28  3:35                       ` Dragan Simic
@ 2024-01-28 19:14                         ` Alexey Charkov
  2024-01-29  0:09                           ` Dragan Simic
  0 siblings, 1 reply; 34+ messages in thread
From: Alexey Charkov @ 2024-01-28 19:14 UTC (permalink / raw)
  To: Dragan Simic
  Cc: Daniel Lezcano, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiko Stuebner, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel, Viresh Kumar

On Sun, Jan 28, 2024 at 7:35 AM Dragan Simic <dsimic@manjaro.org> wrote:
>
> On 2024-01-27 20:41, Alexey Charkov wrote:
> > On Sat, Jan 27, 2024 at 12:33 AM Dragan Simic <dsimic@manjaro.org>
> > wrote:
> >> On 2024-01-26 14:44, Alexey Charkov wrote:
> >> > On Fri, Jan 26, 2024 at 4:56 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
> >> >> On 26/01/2024 08:49, Dragan Simic wrote:
> >> >> > On 2024-01-26 08:30, Alexey Charkov wrote:
> >> >> >> On Fri, Jan 26, 2024 at 11:05 AM Dragan Simic <dsimic@manjaro.org> wrote:
> >> >> >>> On 2024-01-26 07:44, Alexey Charkov wrote:
> >> >> >>> > On Fri, Jan 26, 2024 at 10:32 AM Dragan Simic <dsimic@manjaro.org> wrote:
> >> >> >>> >> On 2024-01-25 10:30, Daniel Lezcano wrote:
> >> >> >> Throttling would also lower the voltage at some point, which cools it
> >> >> >> down much faster!
> >> >> >
> >> >> > Of course, but the key is not to cool (and slow down) the CPU cores too
> >> >> > much, but just enough to stay within the available thermal envelope,
> >> >> > which is where the same-voltage, lower-frequency OPPs should shine.
> >> >>
> >> >> That implies the resulting power is sustainable which I doubt it is
> >> >> the
> >> >> case.
> >> >>
> >> >> The voltage scaling makes the cooling effect efficient not the
> >> >> frequency.
> >> >>
> >> >> For example:
> >> >>         opp5 = opp(2GHz, 1V) => 2 BogoWatt
> >> >>         opp4 = opp(1.9GHz, 1V) => 1.9 BogoWatt
> >> >>         opp3 = opp(1.8GHz, 0.9V) => 1.458 BogoWatt
> >> >>         [ other states but we focus on these 3 ]
> >> >>
> >> >> opp5->opp4 => -5% compute capacity, -5% power, ratio=1
> >> >> opp4->opp3 => -5% compute capacity, -23.1% power, ratio=21,6
> >> >>
> >> >> opp5->opp3 => -10% compute capacity, -27.1% power, ratio=36.9
> >> >>
> >> >> In burst operation (no thermal throttling), opp4 is pointless we agree
> >> >> on that.
> >> >>
> >> >> IMO the following will happen: in burst operation with thermal
> >> >> throttling we hit the trip point and then the step wise governor
> >> >> reduces
> >> >> opp5 -> opp4. We have slight power reduction but the temperature does
> >> >> not decrease, so at the next iteration, it is throttle at opp3. And at
> >> >> the end we have opp4 <-> opp3 back and forth instead of opp5 <-> opp3.
> >> >>
> >> >> It is probable we end up with an equivalent frequency average (or
> >> >> compute capacity avg).
> >> >>
> >> >> opp4 <-> opp3 (longer duration in states, less transitions)
> >> >> opp5 <-> opp3 (shorter duration in states, more transitions)
> >> >>
> >> >> Some platforms had their higher OPPs with the same voltage and they
> >> >> failed to cool down the CPU in the long run.
> >> >>
> >> >> Anyway, there is only one way to check it out :)
> >> >>
> >> >> Alexey, is it possible to compare the compute duration for 'dhrystone'
> >> >> with these voltage OPP and without ? (with a period of cool down
> >> >> between
> >> >> the test in order to start at the same thermal condition) ?
> >> >
> >> > Sure, let me try that - would be interesting to see the results. In my
> >> > previous tinkering there were cases when the system stayed at 2.35GHz
> >> > for all big cores for non-trivial time (using the step-wise thermal
> >> > governor), and that's an example of "same voltage, lower frequency".
> >> > Other times though it throttled one cluster down to 1.8GHz and kept
> >> > the other at 2.4GHz, and was also stationary at those parameters for
> >> > extended time. This probably indicates that both of those states use
> >> > sustainable power in my cooling setup.
> >>
> >> IMHO, there are simply too many factors at play, including different
> >> possible cooling setups, so providing additional CPU throttling
> >> granularity can only be helpful.  Of course, testing and recording
> >> data is the way to move forward, but I think we should use a few
> >> different tests.
> >
> > Soooo, benchmarking these turned out a bit trickier than I had hoped
> > for. Apparently, dhrystone uses an unsigned int rather than an
> > unsigned long for the loops count (or something of that sort), which
> > means that I can't get it to run enough loops to heat up my chip from
> > a stable idle state to the throttling state (due to counter
> > wraparound). So I ended up with a couple of crutches, namely:
>
> Huh, it seems that recent SBCs may have become a bit too fast for it,
> which is great. :)  Thank you for the benchmarking.
>
> >  - run dhrystone continuously on 6 out of 8 cores to make the chip
> > warm enough (`taskset -c 0-5 ./dhrystone -t 6 -r 6000` - note that on
> > my machine cores 6-7 are usually the first ones to get throttled, due
> > to whatever thermal peculiarities)
> >  - wait for the temperature to stabilize (which happens at 79.5C)
> >  - then run timed dhrystone on the remaining 2 out of 6 cores (big
> > ones) to see how throttling with different OPP tables affects overall
> > performance.
>
> Just checking, running the test on just two CPU cores was enough to
> keep the package temperature at around 80 oC?

No, not even remotely.

I kept the continuous 6 dhrystone threads running on all the other
cores (`taskset -c 0-5 ./dhrystone -t 6 -r 6000`) to let it reach the
throttling temperature. This adds further imprecision to the benchmark
of course, because the governor might choose to throttle some of the
cores that do not participate in the timed benchmarking run, and thus
lend some thermal headroom to the latter. That didn't seem to happen
from my naked-eye observation via `watch "cpupower -c 0,4,6
frequency-info | grep current"`, although I admit that I didn't record
more granular logs of frequency states, and some quick transitions to
lower frequencies could also have happened on the other cores. Don't
think it's a major influence though, because a quick transition back
and forth shouldn't have contributed much to the thermal output.

> > In the end, here's what I got with the 'original' OPP table (including
> > "same voltage - different frequencies" states):
> > alchark@rock-5b ~ $ taskset -c 6-7 ./dhrystone -t 2 -l 4000000000
> > duration: 0 seconds
> > number of threads: 2
> > number of loops: 4000000000000000
> > delay between starting threads: 0 seconds
> >
> > Dhrystone(1.1) time for 1233977344 passes = 29.7
> > This machine benchmarks at 41481539 dhrystones/second
> >                            23609 DMIPS
> > Dhrystone(1.1) time for 1233977344 passes = 29.8
> > This machine benchmarks at 41476618 dhrystones/second
> >                            23606 DMIPS
> >
> > Total dhrystone run time: 30.864492 seconds.
> >
> > And here's what I got with the 'reduced' OPP table (keeping only the
> > highest frequency state for each voltage):
> > alchark@rock-5b ~ $ taskset -c 6-7 ./dhrystone -t 2 -l 4000000000
> > duration: 0 seconds
> > number of threads: 2
> > number of loops: 4000000000000000
> > delay between starting threads: 0 seconds
> >
> > Dhrystone(1.1) time for 1233977344 passes = 30.9
> > This machine benchmarks at 39968549 dhrystones/second
> >                           22748 DMIPS
> > Dhrystone(1.1) time for 1233977344 passes = 31.0
> > This machine benchmarks at 39817431 dhrystones/second
> >                           22662 DMIPS
> >
> > Total dhrystone run time: 31.995136 seconds.
> >
> > Bottomline: removing the lower-frequency OPPs led to a 3.8% drop in
> > performance in this setup. This is probably far from a reliable
> > estimate, but I guess it indeed indicates that having lower-frequency
> > states might be beneficial in some load scenarios.
>
> Measuring a difference of about 4% may be attributed to some unknown
> inaccuracy or test deviation, but again, a performance improvement of
> about 4% that comes free of charge is nothing to be sneezed at, IMHO.

True :)

> > Note though that several seconds after hitting the throttling
> > threshold cores 6-7 were oscillating between 1.608GHz and 1.8GHz in
> > both runs, which implies that the whole difference in performance was
> > due to different speed of initial throttling (i.e. it might be a
> > peculiarity of the step-wise thermal governor operation when it has to
> > go through more cooling states to reach the "steady-state" one). Given
> > that both 1.608GHz and 1.8GHz have no lower-frequency same-voltage
> > siblings in either of the OPP tables, it implies that under prolonged
> > constant load there should be no performance difference at all.
>
> ... all that with one possible cooling setup, and with one synthetic
> test.  We simply can't know in advance how would a different cooling
> setup on the same or on a different board behave, if you agree.

Of course. My only concern is whether we might be somewhat deceiving
ourselves by that benchmarked performance boost: after all, it's also
entirely possible that by going through multiple intermediate
frequency states, the step-wise governor simply didn't cool the core
just enough over some fraction of the benchmarking run, which we would
have observed in a detailed temperature log as a higher peak
temperature and longer residence above the throttling threshold
temperature (and that would be the case if intermediate frequency
states were "unsustainable" as Daniel pointed out, which they probably
were given that the throttling didn't stop at any of them).

Attributing a performance increase in this case to a benefit from
additional intermediate OPPs is not fully fair, because then we can
also simply move the throttling threshold higher. And it would be
super tricky to separate the effects from greater system throughput at
intermediate frequency states vs. greater effective thermal budget we
allow the governor to use before it even considers throttling.

Best regards,
Alexey

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
  2024-01-28 19:14                         ` Alexey Charkov
@ 2024-01-29  0:09                           ` Dragan Simic
  2024-01-29  7:39                             ` Dragan Simic
  0 siblings, 1 reply; 34+ messages in thread
From: Dragan Simic @ 2024-01-29  0:09 UTC (permalink / raw)
  To: Alexey Charkov
  Cc: Daniel Lezcano, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiko Stuebner, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel, Viresh Kumar

On 2024-01-28 20:14, Alexey Charkov wrote:
> On Sun, Jan 28, 2024 at 7:35 AM Dragan Simic <dsimic@manjaro.org> 
> wrote:
>> On 2024-01-27 20:41, Alexey Charkov wrote:
>> > On Sat, Jan 27, 2024 at 12:33 AM Dragan Simic <dsimic@manjaro.org> wrote:
>> >> On 2024-01-26 14:44, Alexey Charkov wrote:
>> >> > On Fri, Jan 26, 2024 at 4:56 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>> >> >> On 26/01/2024 08:49, Dragan Simic wrote:
>> >> >> > On 2024-01-26 08:30, Alexey Charkov wrote:
>> >> >> >> On Fri, Jan 26, 2024 at 11:05 AM Dragan Simic <dsimic@manjaro.org> wrote:
>> >> >> >>> On 2024-01-26 07:44, Alexey Charkov wrote:
>> >> >> >>> > On Fri, Jan 26, 2024 at 10:32 AM Dragan Simic <dsimic@manjaro.org> wrote:
>> >> >> >>> >> On 2024-01-25 10:30, Daniel Lezcano wrote:
>> >> >> >> Throttling would also lower the voltage at some point, which cools it
>> >> >> >> down much faster!
>> >> >> >
>> >> >> > Of course, but the key is not to cool (and slow down) the CPU cores too
>> >> >> > much, but just enough to stay within the available thermal envelope,
>> >> >> > which is where the same-voltage, lower-frequency OPPs should shine.
>> >> >>
>> >> >> That implies the resulting power is sustainable which I doubt it is
>> >> >> the
>> >> >> case.
>> >> >>
>> >> >> The voltage scaling makes the cooling effect efficient not the
>> >> >> frequency.
>> >> >>
>> >> >> For example:
>> >> >>         opp5 = opp(2GHz, 1V) => 2 BogoWatt
>> >> >>         opp4 = opp(1.9GHz, 1V) => 1.9 BogoWatt
>> >> >>         opp3 = opp(1.8GHz, 0.9V) => 1.458 BogoWatt
>> >> >>         [ other states but we focus on these 3 ]
>> >> >>
>> >> >> opp5->opp4 => -5% compute capacity, -5% power, ratio=1
>> >> >> opp4->opp3 => -5% compute capacity, -23.1% power, ratio=21,6
>> >> >>
>> >> >> opp5->opp3 => -10% compute capacity, -27.1% power, ratio=36.9
>> >> >>
>> >> >> In burst operation (no thermal throttling), opp4 is pointless we agree
>> >> >> on that.
>> >> >>
>> >> >> IMO the following will happen: in burst operation with thermal
>> >> >> throttling we hit the trip point and then the step wise governor
>> >> >> reduces
>> >> >> opp5 -> opp4. We have slight power reduction but the temperature does
>> >> >> not decrease, so at the next iteration, it is throttle at opp3. And at
>> >> >> the end we have opp4 <-> opp3 back and forth instead of opp5 <-> opp3.
>> >> >>
>> >> >> It is probable we end up with an equivalent frequency average (or
>> >> >> compute capacity avg).
>> >> >>
>> >> >> opp4 <-> opp3 (longer duration in states, less transitions)
>> >> >> opp5 <-> opp3 (shorter duration in states, more transitions)
>> >> >>
>> >> >> Some platforms had their higher OPPs with the same voltage and they
>> >> >> failed to cool down the CPU in the long run.
>> >> >>
>> >> >> Anyway, there is only one way to check it out :)
>> >> >>
>> >> >> Alexey, is it possible to compare the compute duration for 'dhrystone'
>> >> >> with these voltage OPP and without ? (with a period of cool down
>> >> >> between
>> >> >> the test in order to start at the same thermal condition) ?
>> >> >
>> >> > Sure, let me try that - would be interesting to see the results. In my
>> >> > previous tinkering there were cases when the system stayed at 2.35GHz
>> >> > for all big cores for non-trivial time (using the step-wise thermal
>> >> > governor), and that's an example of "same voltage, lower frequency".
>> >> > Other times though it throttled one cluster down to 1.8GHz and kept
>> >> > the other at 2.4GHz, and was also stationary at those parameters for
>> >> > extended time. This probably indicates that both of those states use
>> >> > sustainable power in my cooling setup.
>> >>
>> >> IMHO, there are simply too many factors at play, including different
>> >> possible cooling setups, so providing additional CPU throttling
>> >> granularity can only be helpful.  Of course, testing and recording
>> >> data is the way to move forward, but I think we should use a few
>> >> different tests.
>> >
>> > Soooo, benchmarking these turned out a bit trickier than I had hoped
>> > for. Apparently, dhrystone uses an unsigned int rather than an
>> > unsigned long for the loops count (or something of that sort), which
>> > means that I can't get it to run enough loops to heat up my chip from
>> > a stable idle state to the throttling state (due to counter
>> > wraparound). So I ended up with a couple of crutches, namely:
>> 
>> Huh, it seems that recent SBCs may have become a bit too fast for it,
>> which is great. :)  Thank you for the benchmarking.
>> 
>> >  - run dhrystone continuously on 6 out of 8 cores to make the chip
>> > warm enough (`taskset -c 0-5 ./dhrystone -t 6 -r 6000` - note that on
>> > my machine cores 6-7 are usually the first ones to get throttled, due
>> > to whatever thermal peculiarities)
>> >  - wait for the temperature to stabilize (which happens at 79.5C)
>> >  - then run timed dhrystone on the remaining 2 out of 6 cores (big
>> > ones) to see how throttling with different OPP tables affects overall
>> > performance.
>> 
>> Just checking, running the test on just two CPU cores was enough to
>> keep the package temperature at around 80 oC?
> 
> No, not even remotely.
> 
> I kept the continuous 6 dhrystone threads running on all the other
> cores (`taskset -c 0-5 ./dhrystone -t 6 -r 6000`) to let it reach the
> throttling temperature. This adds further imprecision to the benchmark
> of course, because the governor might choose to throttle some of the
> cores that do not participate in the timed benchmarking run, and thus
> lend some thermal headroom to the latter. That didn't seem to happen
> from my naked-eye observation via `watch "cpupower -c 0,4,6
> frequency-info | grep current"`, although I admit that I didn't record
> more granular logs of frequency states, and some quick transitions to
> lower frequencies could also have happened on the other cores. Don't
> think it's a major influence though, because a quick transition back
> and forth shouldn't have contributed much to the thermal output.

Thank you for the clarification!

You're right, that might have introduced some inaccuracy into the test
results, and it also made the tests kind of hardly repeatable.  On the
other hand, that way the synthetic CPU test feels a bit more like some
real-world CPU load, in which multiple resource-hungry tasks usually
compete for the CPU cores and the thermal budget.

Though, as we know repeatability is the key for a scientific approach,
but it also usually contradicts with simulating real-world loads that
are of rather random nature.  Well, testing is hard. :)

I'll think a bit more about all this, and I'll come back with an update.
Maybe I'll also be able to join the testing.

>> > In the end, here's what I got with the 'original' OPP table (including
>> > "same voltage - different frequencies" states):
>> > alchark@rock-5b ~ $ taskset -c 6-7 ./dhrystone -t 2 -l 4000000000
>> > duration: 0 seconds
>> > number of threads: 2
>> > number of loops: 4000000000000000
>> > delay between starting threads: 0 seconds
>> >
>> > Dhrystone(1.1) time for 1233977344 passes = 29.7
>> > This machine benchmarks at 41481539 dhrystones/second
>> >                            23609 DMIPS
>> > Dhrystone(1.1) time for 1233977344 passes = 29.8
>> > This machine benchmarks at 41476618 dhrystones/second
>> >                            23606 DMIPS
>> >
>> > Total dhrystone run time: 30.864492 seconds.
>> >
>> > And here's what I got with the 'reduced' OPP table (keeping only the
>> > highest frequency state for each voltage):
>> > alchark@rock-5b ~ $ taskset -c 6-7 ./dhrystone -t 2 -l 4000000000
>> > duration: 0 seconds
>> > number of threads: 2
>> > number of loops: 4000000000000000
>> > delay between starting threads: 0 seconds
>> >
>> > Dhrystone(1.1) time for 1233977344 passes = 30.9
>> > This machine benchmarks at 39968549 dhrystones/second
>> >                           22748 DMIPS
>> > Dhrystone(1.1) time for 1233977344 passes = 31.0
>> > This machine benchmarks at 39817431 dhrystones/second
>> >                           22662 DMIPS
>> >
>> > Total dhrystone run time: 31.995136 seconds.
>> >
>> > Bottomline: removing the lower-frequency OPPs led to a 3.8% drop in
>> > performance in this setup. This is probably far from a reliable
>> > estimate, but I guess it indeed indicates that having lower-frequency
>> > states might be beneficial in some load scenarios.
>> 
>> Measuring a difference of about 4% may be attributed to some unknown
>> inaccuracy or test deviation, but again, a performance improvement of
>> about 4% that comes free of charge is nothing to be sneezed at, IMHO.
> 
> True :)
> 
>> > Note though that several seconds after hitting the throttling
>> > threshold cores 6-7 were oscillating between 1.608GHz and 1.8GHz in
>> > both runs, which implies that the whole difference in performance was
>> > due to different speed of initial throttling (i.e. it might be a
>> > peculiarity of the step-wise thermal governor operation when it has to
>> > go through more cooling states to reach the "steady-state" one). Given
>> > that both 1.608GHz and 1.8GHz have no lower-frequency same-voltage
>> > siblings in either of the OPP tables, it implies that under prolonged
>> > constant load there should be no performance difference at all.
>> 
>> ... all that with one possible cooling setup, and with one synthetic
>> test.  We simply can't know in advance how would a different cooling
>> setup on the same or on a different board behave, if you agree.
> 
> Of course. My only concern is whether we might be somewhat deceiving
> ourselves by that benchmarked performance boost: after all, it's also
> entirely possible that by going through multiple intermediate
> frequency states, the step-wise governor simply didn't cool the core
> just enough over some fraction of the benchmarking run, which we would
> have observed in a detailed temperature log as a higher peak
> temperature and longer residence above the throttling threshold
> temperature (and that would be the case if intermediate frequency
> states were "unsustainable" as Daniel pointed out, which they probably
> were given that the throttling didn't stop at any of them).

Well, the observed CPU frequency scaling didn't stop at any particular
OPP for an extended period of time, if I'm not mistaken?  That's even
more not to be expected under any kind of unpredictable real-world CPU
load, if you agree.

> Attributing a performance increase in this case to a benefit from
> additional intermediate OPPs is not fully fair, because then we can
> also simply move the throttling threshold higher. And it would be
> super tricky to separate the effects from greater system throughput at
> intermediate frequency states vs. greater effective thermal budget we
> allow the governor to use before it even considers throttling.

I'd agree that drawing such conclusions (or the conclusions from the
opposite end of the spectrum) wouldn't be exactly fair at this point.
Though, IMHO it's important that the test results observed so far
weren't worse with the additional same-voltage, lower-frequency OPPs.

In other words, all I ever said, basically, is that having more granular
OPPs should be helpful at best, and not harmful at worst.  How helpful
is very hard to predict, because it depends on all kinds of things.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
  2024-01-29  0:09                           ` Dragan Simic
@ 2024-01-29  7:39                             ` Dragan Simic
  0 siblings, 0 replies; 34+ messages in thread
From: Dragan Simic @ 2024-01-29  7:39 UTC (permalink / raw)
  To: Alexey Charkov
  Cc: Daniel Lezcano, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiko Stuebner, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel, Viresh Kumar

Hello Alexey,

On 2024-01-29 01:09, Dragan Simic wrote:
> On 2024-01-28 20:14, Alexey Charkov wrote:
>> On Sun, Jan 28, 2024 at 7:35 AM Dragan Simic <dsimic@manjaro.org> 
>> wrote:
>>> Just checking, running the test on just two CPU cores was enough to
>>> keep the package temperature at around 80 oC?
>> 
>> No, not even remotely.
>> 
>> I kept the continuous 6 dhrystone threads running on all the other
>> cores (`taskset -c 0-5 ./dhrystone -t 6 -r 6000`) to let it reach the
>> throttling temperature. This adds further imprecision to the benchmark
>> of course, because the governor might choose to throttle some of the
>> cores that do not participate in the timed benchmarking run, and thus
>> lend some thermal headroom to the latter. That didn't seem to happen
>> from my naked-eye observation via `watch "cpupower -c 0,4,6
>> frequency-info | grep current"`, although I admit that I didn't record
>> more granular logs of frequency states, and some quick transitions to
>> lower frequencies could also have happened on the other cores. Don't
>> think it's a major influence though, because a quick transition back
>> and forth shouldn't have contributed much to the thermal output.
> 
> Thank you for the clarification!
> 
> You're right, that might have introduced some inaccuracy into the test
> results, and it also made the tests kind of hardly repeatable.  On the
> other hand, that way the synthetic CPU test feels a bit more like some
> real-world CPU load, in which multiple resource-hungry tasks usually
> compete for the CPU cores and the thermal budget.
> 
> Though, as we know repeatability is the key for a scientific approach,
> but it also usually contradicts with simulating real-world loads that
> are of rather random nature.  Well, testing is hard. :)
> 
> I'll think a bit more about all this, and I'll come back with an 
> update.
> Maybe I'll also be able to join the testing.

Good news! :)  Thanks to Radxa, I'll be able to do the testing on my
side, perhaps in about a week or two.  If you agree, it might be the 
best
to wait for those test results as well;  of course, I'll keep thinking
about some kind of a test suite in the meantime, which we should be able
to use on other boards as well.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
  2024-01-27 19:41                     ` Alexey Charkov
  2024-01-28  3:35                       ` Dragan Simic
@ 2024-01-28 15:06                       ` Daniel Lezcano
  2024-01-28 19:32                         ` Alexey Charkov
  1 sibling, 1 reply; 34+ messages in thread
From: Daniel Lezcano @ 2024-01-28 15:06 UTC (permalink / raw)
  To: Alexey Charkov, Dragan Simic
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
	devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
	Viresh Kumar


Hi Alexey,

On 27/01/2024 20:41, Alexey Charkov wrote:
> On Sat, Jan 27, 2024 at 12:33 AM Dragan Simic <dsimic@manjaro.org> wrote:
>>
>> On 2024-01-26 14:44, Alexey Charkov wrote:
>>> On Fri, Jan 26, 2024 at 4:56 PM Daniel Lezcano
>>> <daniel.lezcano@linaro.org> wrote:
>>>> On 26/01/2024 08:49, Dragan Simic wrote:
>>>>> On 2024-01-26 08:30, Alexey Charkov wrote:
>>>>>> On Fri, Jan 26, 2024 at 11:05 AM Dragan Simic <dsimic@manjaro.org> wrote:
>>>>>>> On 2024-01-26 07:44, Alexey Charkov wrote:
>>>>>>>> On Fri, Jan 26, 2024 at 10:32 AM Dragan Simic <dsimic@manjaro.org>
>>>>>>>> wrote:
>>>>>>>>> On 2024-01-25 10:30, Daniel Lezcano wrote:
>>>>>>>>>> On 24/01/2024 21:30, Alexey Charkov wrote:
>>>>>>>>>>> By default the CPUs on RK3588 start up in a conservative
>>>>>>> performance
>>>>>>>>>>> mode. Add frequency and voltage mappings to the device tree to
>>>>>>> enable
>>>>
>>>> [ ... ]
>>>>
>>>>>> Throttling would also lower the voltage at some point, which cools it
>>>>>> down much faster!
>>>>>
>>>>> Of course, but the key is not to cool (and slow down) the CPU cores too
>>>>> much, but just enough to stay within the available thermal envelope,
>>>>> which is where the same-voltage, lower-frequency OPPs should shine.
>>>>
>>>> That implies the resulting power is sustainable which I doubt it is
>>>> the
>>>> case.
>>>>
>>>> The voltage scaling makes the cooling effect efficient not the
>>>> frequency.
>>>>
>>>> For example:
>>>>          opp5 = opp(2GHz, 1V) => 2 BogoWatt
>>>>          opp4 = opp(1.9GHz, 1V) => 1.9 BogoWatt
>>>>          opp3 = opp(1.8GHz, 0.9V) => 1.458 BogoWatt
>>>>          [ other states but we focus on these 3 ]
>>>>
>>>> opp5->opp4 => -5% compute capacity, -5% power, ratio=1
>>>> opp4->opp3 => -5% compute capacity, -23.1% power, ratio=21,6
>>>>
>>>> opp5->opp3 => -10% compute capacity, -27.1% power, ratio=36.9
>>>>
>>>> In burst operation (no thermal throttling), opp4 is pointless we agree
>>>> on that.
>>>>
>>>> IMO the following will happen: in burst operation with thermal
>>>> throttling we hit the trip point and then the step wise governor
>>>> reduces
>>>> opp5 -> opp4. We have slight power reduction but the temperature does
>>>> not decrease, so at the next iteration, it is throttle at opp3. And at
>>>> the end we have opp4 <-> opp3 back and forth instead of opp5 <-> opp3.
>>>>
>>>> It is probable we end up with an equivalent frequency average (or
>>>> compute capacity avg).
>>>>
>>>> opp4 <-> opp3 (longer duration in states, less transitions)
>>>> opp5 <-> opp3 (shorter duration in states, more transitions)
>>>>
>>>> Some platforms had their higher OPPs with the same voltage and they
>>>> failed to cool down the CPU in the long run.
>>>>
>>>> Anyway, there is only one way to check it out :)
>>>>
>>>> Alexey, is it possible to compare the compute duration for 'dhrystone'
>>>> with these voltage OPP and without ? (with a period of cool down
>>>> between
>>>> the test in order to start at the same thermal condition) ?
>>>
>>> Sure, let me try that - would be interesting to see the results. In my
>>> previous tinkering there were cases when the system stayed at 2.35GHz
>>> for all big cores for non-trivial time (using the step-wise thermal
>>> governor), and that's an example of "same voltage, lower frequency".
>>> Other times though it throttled one cluster down to 1.8GHz and kept
>>> the other at 2.4GHz, and was also stationary at those parameters for
>>> extended time. This probably indicates that both of those states use
>>> sustainable power in my cooling setup.
>>
>> IMHO, there are simply too many factors at play, including different
>> possible cooling setups, so providing additional CPU throttling
>> granularity can only be helpful.  Of course, testing and recording
>> data is the way to move forward, but I think we should use a few
>> different tests.
> 
> Soooo, benchmarking these turned out a bit trickier than I had hoped
> for. Apparently, dhrystone uses an unsigned int rather than an
> unsigned long for the loops count (or something of that sort), which
> means that I can't get it to run enough loops to heat up my chip from
> a stable idle state to the throttling state (due to counter
> wraparound). So I ended up with a couple of crutches, namely:
>   - run dhrystone continuously on 6 out of 8 cores to make the chip
> warm enough (`taskset -c 0-5 ./dhrystone -t 6 -r 6000` - note that on
> my machine cores 6-7 are usually the first ones to get throttled, due
> to whatever thermal peculiarities)
>   - wait for the temperature to stabilize (which happens at 79.5C)
>   - then run timed dhrystone on the remaining 2 out of 6 cores (big
> ones) to see how throttling with different OPP tables affects overall
> performance.

Thanks for taking the time to test.

> In the end, here's what I got with the 'original' OPP table (including
> "same voltage - different frequencies" states):
> alchark@rock-5b ~ $ taskset -c 6-7 ./dhrystone -t 2 -l 4000000000
> duration: 0 seconds
> number of threads: 2
> number of loops: 4000000000000000
> delay between starting threads: 0 seconds
> 
> Dhrystone(1.1) time for 1233977344 passes = 29.7
> This machine benchmarks at 41481539 dhrystones/second
>                             23609 DMIPS
> Dhrystone(1.1) time for 1233977344 passes = 29.8
> This machine benchmarks at 41476618 dhrystones/second
>                             23606 DMIPS
> 
> Total dhrystone run time: 30.864492 seconds.
> 
> And here's what I got with the 'reduced' OPP table (keeping only the
> highest frequency state for each voltage):
> alchark@rock-5b ~ $ taskset -c 6-7 ./dhrystone -t 2 -l 4000000000
> duration: 0 seconds
> number of threads: 2
> number of loops: 4000000000000000
> delay between starting threads: 0 seconds
> 
> Dhrystone(1.1) time for 1233977344 passes = 30.9
> This machine benchmarks at 39968549 dhrystones/second
>                            22748 DMIPS
> Dhrystone(1.1) time for 1233977344 passes = 31.0
> This machine benchmarks at 39817431 dhrystones/second
>                            22662 DMIPS
> 
> Total dhrystone run time: 31.995136 seconds.
> 
> Bottomline: removing the lower-frequency OPPs led to a 3.8% drop in
> performance in this setup. This is probably far from a reliable
> estimate, but I guess it indeed indicates that having lower-frequency
> states might be beneficial in some load scenarios.

What is the duration between these two tests?

I would be curious if it is repeatable by inverting the setup (reduced 
OPP table and then original OPP table).

BTW: I used -l 10000 for a ~30 seconds workload more or less on the 
rk3399, may be -l 20000 will be ok for the rk3588.

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
  2024-01-28 15:06                       ` Daniel Lezcano
@ 2024-01-28 19:32                         ` Alexey Charkov
  0 siblings, 0 replies; 34+ messages in thread
From: Alexey Charkov @ 2024-01-28 19:32 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Dragan Simic, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiko Stuebner, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel, Viresh Kumar

On Sun, Jan 28, 2024 at 7:06 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
>
> Hi Alexey,

Hi Daniel,

> On 27/01/2024 20:41, Alexey Charkov wrote:
> > On Sat, Jan 27, 2024 at 12:33 AM Dragan Simic <dsimic@manjaro.org> wrote:
> >>
> >> On 2024-01-26 14:44, Alexey Charkov wrote:
> >>> On Fri, Jan 26, 2024 at 4:56 PM Daniel Lezcano
> >>> <daniel.lezcano@linaro.org> wrote:
> >>>> On 26/01/2024 08:49, Dragan Simic wrote:
> >>>>> On 2024-01-26 08:30, Alexey Charkov wrote:
> >>>>>> On Fri, Jan 26, 2024 at 11:05 AM Dragan Simic <dsimic@manjaro.org> wrote:
> >>>>>>> On 2024-01-26 07:44, Alexey Charkov wrote:
> >>>>>>>> On Fri, Jan 26, 2024 at 10:32 AM Dragan Simic <dsimic@manjaro.org>
> >>>>>>>> wrote:
> >>>>>>>>> On 2024-01-25 10:30, Daniel Lezcano wrote:
> >>>>>>>>>> On 24/01/2024 21:30, Alexey Charkov wrote:
> >>>>>>>>>>> By default the CPUs on RK3588 start up in a conservative
> >>>>>>> performance
> >>>>>>>>>>> mode. Add frequency and voltage mappings to the device tree to
> >>>>>>> enable
> >>>>
> >>>> [ ... ]
> >>>>
> >>>>>> Throttling would also lower the voltage at some point, which cools it
> >>>>>> down much faster!
> >>>>>
> >>>>> Of course, but the key is not to cool (and slow down) the CPU cores too
> >>>>> much, but just enough to stay within the available thermal envelope,
> >>>>> which is where the same-voltage, lower-frequency OPPs should shine.
> >>>>
> >>>> That implies the resulting power is sustainable which I doubt it is
> >>>> the
> >>>> case.
> >>>>
> >>>> The voltage scaling makes the cooling effect efficient not the
> >>>> frequency.
> >>>>
> >>>> For example:
> >>>>          opp5 = opp(2GHz, 1V) => 2 BogoWatt
> >>>>          opp4 = opp(1.9GHz, 1V) => 1.9 BogoWatt
> >>>>          opp3 = opp(1.8GHz, 0.9V) => 1.458 BogoWatt
> >>>>          [ other states but we focus on these 3 ]
> >>>>
> >>>> opp5->opp4 => -5% compute capacity, -5% power, ratio=1
> >>>> opp4->opp3 => -5% compute capacity, -23.1% power, ratio=21,6
> >>>>
> >>>> opp5->opp3 => -10% compute capacity, -27.1% power, ratio=36.9
> >>>>
> >>>> In burst operation (no thermal throttling), opp4 is pointless we agree
> >>>> on that.
> >>>>
> >>>> IMO the following will happen: in burst operation with thermal
> >>>> throttling we hit the trip point and then the step wise governor
> >>>> reduces
> >>>> opp5 -> opp4. We have slight power reduction but the temperature does
> >>>> not decrease, so at the next iteration, it is throttle at opp3. And at
> >>>> the end we have opp4 <-> opp3 back and forth instead of opp5 <-> opp3.
> >>>>
> >>>> It is probable we end up with an equivalent frequency average (or
> >>>> compute capacity avg).
> >>>>
> >>>> opp4 <-> opp3 (longer duration in states, less transitions)
> >>>> opp5 <-> opp3 (shorter duration in states, more transitions)
> >>>>
> >>>> Some platforms had their higher OPPs with the same voltage and they
> >>>> failed to cool down the CPU in the long run.
> >>>>
> >>>> Anyway, there is only one way to check it out :)
> >>>>
> >>>> Alexey, is it possible to compare the compute duration for 'dhrystone'
> >>>> with these voltage OPP and without ? (with a period of cool down
> >>>> between
> >>>> the test in order to start at the same thermal condition) ?
> >>>
> >>> Sure, let me try that - would be interesting to see the results. In my
> >>> previous tinkering there were cases when the system stayed at 2.35GHz
> >>> for all big cores for non-trivial time (using the step-wise thermal
> >>> governor), and that's an example of "same voltage, lower frequency".
> >>> Other times though it throttled one cluster down to 1.8GHz and kept
> >>> the other at 2.4GHz, and was also stationary at those parameters for
> >>> extended time. This probably indicates that both of those states use
> >>> sustainable power in my cooling setup.
> >>
> >> IMHO, there are simply too many factors at play, including different
> >> possible cooling setups, so providing additional CPU throttling
> >> granularity can only be helpful.  Of course, testing and recording
> >> data is the way to move forward, but I think we should use a few
> >> different tests.
> >
> > Soooo, benchmarking these turned out a bit trickier than I had hoped
> > for. Apparently, dhrystone uses an unsigned int rather than an
> > unsigned long for the loops count (or something of that sort), which
> > means that I can't get it to run enough loops to heat up my chip from
> > a stable idle state to the throttling state (due to counter
> > wraparound). So I ended up with a couple of crutches, namely:
> >   - run dhrystone continuously on 6 out of 8 cores to make the chip
> > warm enough (`taskset -c 0-5 ./dhrystone -t 6 -r 6000` - note that on
> > my machine cores 6-7 are usually the first ones to get throttled, due
> > to whatever thermal peculiarities)
> >   - wait for the temperature to stabilize (which happens at 79.5C)
> >   - then run timed dhrystone on the remaining 2 out of 6 cores (big
> > ones) to see how throttling with different OPP tables affects overall
> > performance.
>
> Thanks for taking the time to test.
>
> > In the end, here's what I got with the 'original' OPP table (including
> > "same voltage - different frequencies" states):
> > alchark@rock-5b ~ $ taskset -c 6-7 ./dhrystone -t 2 -l 4000000000
> > duration: 0 seconds
> > number of threads: 2
> > number of loops: 4000000000000000
> > delay between starting threads: 0 seconds
> >
> > Dhrystone(1.1) time for 1233977344 passes = 29.7
> > This machine benchmarks at 41481539 dhrystones/second
> >                             23609 DMIPS
> > Dhrystone(1.1) time for 1233977344 passes = 29.8
> > This machine benchmarks at 41476618 dhrystones/second
> >                             23606 DMIPS
> >
> > Total dhrystone run time: 30.864492 seconds.
> >
> > And here's what I got with the 'reduced' OPP table (keeping only the
> > highest frequency state for each voltage):
> > alchark@rock-5b ~ $ taskset -c 6-7 ./dhrystone -t 2 -l 4000000000
> > duration: 0 seconds
> > number of threads: 2
> > number of loops: 4000000000000000
> > delay between starting threads: 0 seconds
> >
> > Dhrystone(1.1) time for 1233977344 passes = 30.9
> > This machine benchmarks at 39968549 dhrystones/second
> >                            22748 DMIPS
> > Dhrystone(1.1) time for 1233977344 passes = 31.0
> > This machine benchmarks at 39817431 dhrystones/second
> >                            22662 DMIPS
> >
> > Total dhrystone run time: 31.995136 seconds.
> >
> > Bottomline: removing the lower-frequency OPPs led to a 3.8% drop in
> > performance in this setup. This is probably far from a reliable
> > estimate, but I guess it indeed indicates that having lower-frequency
> > states might be beneficial in some load scenarios.
>
> What is the duration between these two tests?

Several hours and a couple of reboots. I did the first one, recorded
the results and the temperatures, then rebuilt the dtb the next day,
rebooted with it and did everything again with the other OPP table.

> I would be curious if it is repeatable by inverting the setup (reduced
> OPP table and then original OPP table).

Frankly, I can't see how ordering could have mattered, given that I
let the system cool down completely, and also rebooted it to use a
different dtb, so there shouldn't have been any caching effects. Maybe
there is some outside randomness in the results though - perhaps 5-10
repetitions in each case would have been more statistically
meaningful. But then again to make it statistically meaningful I'd
have to peg the other (non-benchmarked) cores to a static OPP to
ensure the thermal governor doesn't play with them when not asked to -
and it all starts to sound like a rabbit hole :)

> BTW: I used -l 10000 for a ~30 seconds workload more or less on the
> rk3399, may be -l 20000 will be ok for the rk3588.

-l 20000 with two threads also gives me about ~30 seconds runtime...
While -l 200000 completed in 25 seconds *facepalm*

Best regards,
Alexey

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
  2024-01-26 12:56               ` Daniel Lezcano
  2024-01-26 13:44                 ` Alexey Charkov
@ 2024-01-26 20:04                 ` Dragan Simic
  1 sibling, 0 replies; 34+ messages in thread
From: Dragan Simic @ 2024-01-26 20:04 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Alexey Charkov, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiko Stuebner, devicetree, linux-arm-kernel, linux-rockchip,
	linux-kernel, Viresh Kumar

On 2024-01-26 13:56, Daniel Lezcano wrote:
> On 26/01/2024 08:49, Dragan Simic wrote:
>> On 2024-01-26 08:30, Alexey Charkov wrote:
>>> On Fri, Jan 26, 2024 at 11:05 AM Dragan Simic <dsimic@manjaro.org> 
>>> wrote:
>>>> On 2024-01-26 07:44, Alexey Charkov wrote:
>>>> > On Fri, Jan 26, 2024 at 10:32 AM Dragan Simic <dsimic@manjaro.org>
>>>> > wrote:
>>>> >> On 2024-01-25 10:30, Daniel Lezcano wrote:
>>>> >> > On 24/01/2024 21:30, Alexey Charkov wrote:
>>>> >> >> By default the CPUs on RK3588 start up in a conservative performance
>>>> >> >> mode. Add frequency and voltage mappings to the device tree to enable
> 
> [ ... ]
> 
>>> Throttling would also lower the voltage at some point, which cools it
>>> down much faster!
>> 
>> Of course, but the key is not to cool (and slow down) the CPU cores 
>> too
>> much, but just enough to stay within the available thermal envelope,
>> which is where the same-voltage, lower-frequency OPPs should shine.
> 
> That implies the resulting power is sustainable which I doubt it is the 
> case.

Hmm, why wouldn't it be sustainable?  Would you elaborate a bit, please?
I mean, there are so many factors that can't be known for sure in 
advance,
so providing additional CPU throttling granularity can only be helpful.

> The voltage scaling makes the cooling effect efficient not the 
> frequency.
> 
> For example:
> 	opp5 = opp(2GHz, 1V) => 2 BogoWatt
> 	opp4 = opp(1.9GHz, 1V) => 1.9 BogoWatt
> 	opp3 = opp(1.8GHz, 0.9V) => 1.458 BogoWatt
> 	[ other states but we focus on these 3 ]
> 
> opp5->opp4 => -5% compute capacity, -5% power, ratio=1
> opp4->opp3 => -5% compute capacity, -23.1% power, ratio=21,6
> 
> opp5->opp3 => -10% compute capacity, -27.1% power, ratio=36.9
> 
> In burst operation (no thermal throttling), opp4 is pointless we agree 
> on that.

Well, if there's no thermal throtting at all, the opp3 is also not
needed.  In an unlikely scenario like that, the opp5 is all we need.

> IMO the following will happen: in burst operation with thermal
> throttling we hit the trip point and then the step wise governor
> reduces opp5 -> opp4. We have slight power reduction but the
> temperature does not decrease, so at the next iteration, it is
> throttle at opp3. And at the end we have opp4 <-> opp3 back and forth
> instead of opp5 <-> opp3.

Why should the temperature not decrease when switching from the opp5
to the opp4?  See, we can't assume or know in advance that reducing
the power consumption by 5% wouldn't do anything;  5% is actually
quite a lot.  If that would do absolutely nothing, then something
else would probably be wrong or not as expected.

Also, for some workloads it might be better to have rather frequent
transitions between the opp4 and the opp3, instead of staying at the
opp3 for longer priods of time.  Running 100 MHz faster can be quite
significant, especially on two CPU cores.

> It is probable we end up with an equivalent frequency average (or
> compute capacity avg).
> 
> opp4 <-> opp3 (longer duration in states, less transitions)
> opp5 <-> opp3 (shorter duration in states, more transitions)
> 
> Some platforms had their higher OPPs with the same voltage and they
> failed to cool down the CPU in the long run.
> 
> Anyway, there is only one way to check it out :)
> 
> Alexey, is it possible to compare the compute duration for 'dhrystone'
> with these voltage OPP and without ? (with a period of cool down
> between the test in order to start at the same thermal condition) ?

I agree that testing and recording as much data as possible is the best
approach.  However, quite frankly, we should run more different tests,
not only one synthetic test.

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2024-01-29  7:39 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-24 20:30 [PATCH 0/4] RK3588 and Rock 5B dts additions: thermal, OPP, rfkill and fan Alexey Charkov
2024-01-24 20:30 ` [PATCH 1/4] arm64: dts: rockchip: add rfkill node for M.2 Key E WiFi on rock-5b Alexey Charkov
2024-01-24 20:30 ` [PATCH 2/4] arm64: dts: rockchip: enable built-in thermal monitoring on rk3588 Alexey Charkov
2024-01-24 21:56   ` Daniel Lezcano
2024-01-25  8:26     ` Alexey Charkov
2024-01-25 10:02       ` Daniel Lezcano
2024-01-25 14:46         ` Alexey Charkov
2024-01-25 22:19       ` Dragan Simic
2024-01-25 13:34     ` Diederik de Haas
2024-01-24 20:30 ` [PATCH 3/4] arm64: dts: rockchip: enable temperature driven fan control on Rock 5B Alexey Charkov
2024-01-24 21:59   ` Daniel Lezcano
2024-01-25 23:13   ` Dragan Simic
2024-01-27 20:27     ` Dragan Simic
2024-01-28 20:08       ` Alexey Charkov
2024-01-29  0:46         ` Dragan Simic
2024-01-24 20:30 ` [PATCH 4/4] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588 Alexey Charkov
2024-01-25  9:30   ` Daniel Lezcano
2024-01-25 10:17     ` Alexey Charkov
2024-01-26  6:32     ` Dragan Simic
2024-01-26  6:44       ` Alexey Charkov
2024-01-26  7:04         ` Dragan Simic
2024-01-26  7:30           ` Alexey Charkov
2024-01-26  7:49             ` Dragan Simic
2024-01-26 12:56               ` Daniel Lezcano
2024-01-26 13:44                 ` Alexey Charkov
2024-01-26 20:33                   ` Dragan Simic
2024-01-27 19:41                     ` Alexey Charkov
2024-01-28  3:35                       ` Dragan Simic
2024-01-28 19:14                         ` Alexey Charkov
2024-01-29  0:09                           ` Dragan Simic
2024-01-29  7:39                             ` Dragan Simic
2024-01-28 15:06                       ` Daniel Lezcano
2024-01-28 19:32                         ` Alexey Charkov
2024-01-26 20:04                 ` Dragan Simic

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).