All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A
@ 2021-04-22 20:30 ` Sowjanya Komatineni
  0 siblings, 0 replies; 30+ messages in thread
From: Sowjanya Komatineni @ 2021-04-22 20:30 UTC (permalink / raw)
  To: sudeep.holla, souvik.chakravarty, thierry.reding, skomatineni,
	mark.rutland, lorenzo.pieralisi, daniel.lezcano, robh+dt
  Cc: jonathanh, ksitaraman, sanjayc, linux-arm-kernel, linux-tegra,
	linux-kernel, linux-pm, devicetree

Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs which is
in charge of deciding on state transition based on target state, state idle
time, and some other Tegra CPU core cluster states information.

Current PSCI specification don't have function defined for passing runtime
state idle time predicted by governor (based on next events and state target
residency) to ARM trusted firmware.

With the support of adding new PSCI function to allow passing runtime state
idle time from kernel to ARM trusted firmware, Tegra194 platforms can use
generic psci cpuidle driver rather than having Tegra specific cpuidle driver.

During Tegra specific cpuidle driver V1 review, Sudeep Holla from ARM also
suggested to use generic cpuidle driver by generalizing the need of runtime
state idle time.

So had internal discussion between ARM and NVIDIA on adding new PSCI function
to allow passing runtime state idle time from kernel to TF-A through PSCI and
once this implementation is accepted by upstream, ARM will look into further
to update PSCI specification for this new PSCI function.

So sending these patches as RFC as new PSCI function added in this series is
not part of PSCI specification and once this implementation is accepted by ARM
and upstream community, ARM can help to take this forward to add to PSCI
specification.

To keep the backward compatibility we can't update CPU_SUSPEND function to pass
state idle time argument. So added seperate function for passing state idle time
and serializing this with cpu suspend state enter.

Once this approach is agreed, we can either use this way of separate PSCI
function for passing state idle time or with PSCI specification update we can
use same CPU_SUSPEND function with extra argument for state idle time which can
be decided later for final patches based on discussion with ARM.


Sowjanya Komatineni (4):
  firmware/psci: add support for PSCI function SET_STATE_IDLE_TIME
  cpuidle: menu: add idle_time to cpuidle_state
  cpuidle: psci: pass state idle time before state enter callback
  arm64: dts: tegra194: Add CPU idle states

 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 19 +++++++++++++++++++
 drivers/cpuidle/cpuidle-psci.c           | 11 +++++++++++
 drivers/cpuidle/governors/menu.c         |  7 ++++++-
 drivers/firmware/psci/psci.c             |  9 +++++++++
 include/linux/cpuidle.h                  |  1 +
 include/linux/psci.h                     |  1 +
 include/uapi/linux/psci.h                |  2 ++
 7 files changed, 49 insertions(+), 1 deletion(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A
@ 2021-04-22 20:30 ` Sowjanya Komatineni
  0 siblings, 0 replies; 30+ messages in thread
From: Sowjanya Komatineni @ 2021-04-22 20:30 UTC (permalink / raw)
  To: sudeep.holla, souvik.chakravarty, thierry.reding, skomatineni,
	mark.rutland, lorenzo.pieralisi, daniel.lezcano, robh+dt
  Cc: jonathanh, ksitaraman, sanjayc, linux-arm-kernel, linux-tegra,
	linux-kernel, linux-pm, devicetree

Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs which is
in charge of deciding on state transition based on target state, state idle
time, and some other Tegra CPU core cluster states information.

Current PSCI specification don't have function defined for passing runtime
state idle time predicted by governor (based on next events and state target
residency) to ARM trusted firmware.

With the support of adding new PSCI function to allow passing runtime state
idle time from kernel to ARM trusted firmware, Tegra194 platforms can use
generic psci cpuidle driver rather than having Tegra specific cpuidle driver.

During Tegra specific cpuidle driver V1 review, Sudeep Holla from ARM also
suggested to use generic cpuidle driver by generalizing the need of runtime
state idle time.

So had internal discussion between ARM and NVIDIA on adding new PSCI function
to allow passing runtime state idle time from kernel to TF-A through PSCI and
once this implementation is accepted by upstream, ARM will look into further
to update PSCI specification for this new PSCI function.

So sending these patches as RFC as new PSCI function added in this series is
not part of PSCI specification and once this implementation is accepted by ARM
and upstream community, ARM can help to take this forward to add to PSCI
specification.

To keep the backward compatibility we can't update CPU_SUSPEND function to pass
state idle time argument. So added seperate function for passing state idle time
and serializing this with cpu suspend state enter.

Once this approach is agreed, we can either use this way of separate PSCI
function for passing state idle time or with PSCI specification update we can
use same CPU_SUSPEND function with extra argument for state idle time which can
be decided later for final patches based on discussion with ARM.


Sowjanya Komatineni (4):
  firmware/psci: add support for PSCI function SET_STATE_IDLE_TIME
  cpuidle: menu: add idle_time to cpuidle_state
  cpuidle: psci: pass state idle time before state enter callback
  arm64: dts: tegra194: Add CPU idle states

 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 19 +++++++++++++++++++
 drivers/cpuidle/cpuidle-psci.c           | 11 +++++++++++
 drivers/cpuidle/governors/menu.c         |  7 ++++++-
 drivers/firmware/psci/psci.c             |  9 +++++++++
 include/linux/cpuidle.h                  |  1 +
 include/linux/psci.h                     |  1 +
 include/uapi/linux/psci.h                |  2 ++
 7 files changed, 49 insertions(+), 1 deletion(-)

-- 
2.7.4


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [RFC PATCH 1/4] firmware/psci: add support for PSCI function SET_STATE_IDLE_TIME
  2021-04-22 20:30 ` Sowjanya Komatineni
@ 2021-04-22 20:30   ` Sowjanya Komatineni
  -1 siblings, 0 replies; 30+ messages in thread
From: Sowjanya Komatineni @ 2021-04-22 20:30 UTC (permalink / raw)
  To: sudeep.holla, souvik.chakravarty, thierry.reding, skomatineni,
	mark.rutland, lorenzo.pieralisi, daniel.lezcano, robh+dt
  Cc: jonathanh, ksitaraman, sanjayc, linux-arm-kernel, linux-tegra,
	linux-kernel, linux-pm, devicetree

This patch adds support for new PSCI function ID SET_STATE_IDLE_TIME.

Some platforms use separate CPU firmware to handle all state transition
based on state enter request from kernel and may need runtime state
idle time of the corresponding state from kernel to pass to TF-A through
PSCI.

Current PSCI specification has no way of passing runtime state information
from kernel to TF-A.

So, this patch adds implementation for new PSCI function ID for this
purpose.

Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com>
---
 drivers/firmware/psci/psci.c | 9 +++++++++
 include/linux/psci.h         | 1 +
 include/uapi/linux/psci.h    | 2 ++
 3 files changed, 12 insertions(+)

diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index f5bd0dc..3bd63d7 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -180,6 +180,14 @@ static int psci_0_1_cpu_suspend(u32 state, unsigned long entry_point)
 				  state, entry_point);
 }
 
+static int psci_set_state_idle_time(u32 idle_time)
+{
+	int err;
+
+	err = invoke_psci_fn(PSCI_1_1_FN_SET_STATE_IDLE_TIME, idle_time, 0, 0);
+	return psci_to_linux_errno(err);
+}
+
 static int psci_0_2_cpu_suspend(u32 state, unsigned long entry_point)
 {
 	return __psci_cpu_suspend(PSCI_FN_NATIVE(0_2, CPU_SUSPEND),
@@ -470,6 +478,7 @@ static void __init psci_0_2_set_functions(void)
 		.migrate = psci_0_2_migrate,
 		.affinity_info = psci_affinity_info,
 		.migrate_info_type = psci_migrate_info_type,
+		.set_state_idle_time = psci_set_state_idle_time,
 	};
 
 	arm_pm_restart = psci_sys_reset;
diff --git a/include/linux/psci.h b/include/linux/psci.h
index 4ca0060..6643bfd 100644
--- a/include/linux/psci.h
+++ b/include/linux/psci.h
@@ -30,6 +30,7 @@ struct psci_operations {
 	int (*affinity_info)(unsigned long target_affinity,
 			unsigned long lowest_affinity_level);
 	int (*migrate_info_type)(void);
+	int (*set_state_idle_time)(u32 idle_time);
 };
 
 extern struct psci_operations psci_ops;
diff --git a/include/uapi/linux/psci.h b/include/uapi/linux/psci.h
index 2fcad1d..0013db74 100644
--- a/include/uapi/linux/psci.h
+++ b/include/uapi/linux/psci.h
@@ -55,6 +55,8 @@
 #define PSCI_1_0_FN64_SYSTEM_SUSPEND		PSCI_0_2_FN64(14)
 #define PSCI_1_1_FN64_SYSTEM_RESET2		PSCI_0_2_FN64(18)
 
+#define PSCI_1_1_FN_SET_STATE_IDLE_TIME		PSCI_0_2_FN(25)
+
 /* PSCI v0.2 power state encoding for CPU_SUSPEND function */
 #define PSCI_0_2_POWER_STATE_ID_MASK		0xffff
 #define PSCI_0_2_POWER_STATE_ID_SHIFT		0
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH 1/4] firmware/psci: add support for PSCI function SET_STATE_IDLE_TIME
@ 2021-04-22 20:30   ` Sowjanya Komatineni
  0 siblings, 0 replies; 30+ messages in thread
From: Sowjanya Komatineni @ 2021-04-22 20:30 UTC (permalink / raw)
  To: sudeep.holla, souvik.chakravarty, thierry.reding, skomatineni,
	mark.rutland, lorenzo.pieralisi, daniel.lezcano, robh+dt
  Cc: jonathanh, ksitaraman, sanjayc, linux-arm-kernel, linux-tegra,
	linux-kernel, linux-pm, devicetree

This patch adds support for new PSCI function ID SET_STATE_IDLE_TIME.

Some platforms use separate CPU firmware to handle all state transition
based on state enter request from kernel and may need runtime state
idle time of the corresponding state from kernel to pass to TF-A through
PSCI.

Current PSCI specification has no way of passing runtime state information
from kernel to TF-A.

So, this patch adds implementation for new PSCI function ID for this
purpose.

Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com>
---
 drivers/firmware/psci/psci.c | 9 +++++++++
 include/linux/psci.h         | 1 +
 include/uapi/linux/psci.h    | 2 ++
 3 files changed, 12 insertions(+)

diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index f5bd0dc..3bd63d7 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -180,6 +180,14 @@ static int psci_0_1_cpu_suspend(u32 state, unsigned long entry_point)
 				  state, entry_point);
 }
 
+static int psci_set_state_idle_time(u32 idle_time)
+{
+	int err;
+
+	err = invoke_psci_fn(PSCI_1_1_FN_SET_STATE_IDLE_TIME, idle_time, 0, 0);
+	return psci_to_linux_errno(err);
+}
+
 static int psci_0_2_cpu_suspend(u32 state, unsigned long entry_point)
 {
 	return __psci_cpu_suspend(PSCI_FN_NATIVE(0_2, CPU_SUSPEND),
@@ -470,6 +478,7 @@ static void __init psci_0_2_set_functions(void)
 		.migrate = psci_0_2_migrate,
 		.affinity_info = psci_affinity_info,
 		.migrate_info_type = psci_migrate_info_type,
+		.set_state_idle_time = psci_set_state_idle_time,
 	};
 
 	arm_pm_restart = psci_sys_reset;
diff --git a/include/linux/psci.h b/include/linux/psci.h
index 4ca0060..6643bfd 100644
--- a/include/linux/psci.h
+++ b/include/linux/psci.h
@@ -30,6 +30,7 @@ struct psci_operations {
 	int (*affinity_info)(unsigned long target_affinity,
 			unsigned long lowest_affinity_level);
 	int (*migrate_info_type)(void);
+	int (*set_state_idle_time)(u32 idle_time);
 };
 
 extern struct psci_operations psci_ops;
diff --git a/include/uapi/linux/psci.h b/include/uapi/linux/psci.h
index 2fcad1d..0013db74 100644
--- a/include/uapi/linux/psci.h
+++ b/include/uapi/linux/psci.h
@@ -55,6 +55,8 @@
 #define PSCI_1_0_FN64_SYSTEM_SUSPEND		PSCI_0_2_FN64(14)
 #define PSCI_1_1_FN64_SYSTEM_RESET2		PSCI_0_2_FN64(18)
 
+#define PSCI_1_1_FN_SET_STATE_IDLE_TIME		PSCI_0_2_FN(25)
+
 /* PSCI v0.2 power state encoding for CPU_SUSPEND function */
 #define PSCI_0_2_POWER_STATE_ID_MASK		0xffff
 #define PSCI_0_2_POWER_STATE_ID_SHIFT		0
-- 
2.7.4


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH 2/4] cpuidle: menu: add idle_time to cpuidle_state
  2021-04-22 20:30 ` Sowjanya Komatineni
@ 2021-04-22 20:30   ` Sowjanya Komatineni
  -1 siblings, 0 replies; 30+ messages in thread
From: Sowjanya Komatineni @ 2021-04-22 20:30 UTC (permalink / raw)
  To: sudeep.holla, souvik.chakravarty, thierry.reding, skomatineni,
	mark.rutland, lorenzo.pieralisi, daniel.lezcano, robh+dt
  Cc: jonathanh, ksitaraman, sanjayc, linux-arm-kernel, linux-tegra,
	linux-kernel, linux-pm, devicetree

Some platforms use separate CPU firmware running in background to
handle state transitions which may need runtime idle time of the
corresponding target state from the kernel.

This patch adds idle_time to cpuidle state to expose to cpuidle driver the
idle time that the governor menu predicts based on next events and states
target residency for selecting proper idle state.

CPU idle driver passes this runtime state idle time to TF-A.

Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com>
---
 drivers/cpuidle/governors/menu.c | 7 ++++++-
 include/linux/cpuidle.h          | 1 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index c3aa8d6..0da5bc5 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -382,8 +382,10 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 			 * stuck in the shallow one for too long.
 			 */
 			if (drv->states[idx].target_residency_ns < TICK_NSEC &&
-			    s->target_residency_ns <= delta_tick)
+			    s->target_residency_ns <= delta_tick) {
+				drv->states[idx].idle_time = delta_tick / NSEC_PER_USEC;
 				idx = i;
+			}
 
 			return idx;
 		}
@@ -393,6 +395,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 		idx = i;
 	}
 
+	drv->states[idx].idle_time = predicted_ns / NSEC_PER_USEC;
 	if (idx == -1)
 		idx = 0; /* No states enabled. Must use 0. */
 
@@ -419,6 +422,8 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 				if (drv->states[i].target_residency_ns <= delta_tick)
 					break;
 			}
+
+			drv->states[idx].idle_time = delta_tick / NSEC_PER_USEC;
 		}
 	}
 
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index fce4762..12db2e9 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -55,6 +55,7 @@ struct cpuidle_state {
 	unsigned int	exit_latency; /* in US */
 	int		power_usage; /* in mW */
 	unsigned int	target_residency; /* in US */
+	unsigned int	idle_time; /* in US */
 
 	int (*enter)	(struct cpuidle_device *dev,
 			struct cpuidle_driver *drv,
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH 2/4] cpuidle: menu: add idle_time to cpuidle_state
@ 2021-04-22 20:30   ` Sowjanya Komatineni
  0 siblings, 0 replies; 30+ messages in thread
From: Sowjanya Komatineni @ 2021-04-22 20:30 UTC (permalink / raw)
  To: sudeep.holla, souvik.chakravarty, thierry.reding, skomatineni,
	mark.rutland, lorenzo.pieralisi, daniel.lezcano, robh+dt
  Cc: jonathanh, ksitaraman, sanjayc, linux-arm-kernel, linux-tegra,
	linux-kernel, linux-pm, devicetree

Some platforms use separate CPU firmware running in background to
handle state transitions which may need runtime idle time of the
corresponding target state from the kernel.

This patch adds idle_time to cpuidle state to expose to cpuidle driver the
idle time that the governor menu predicts based on next events and states
target residency for selecting proper idle state.

CPU idle driver passes this runtime state idle time to TF-A.

Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com>
---
 drivers/cpuidle/governors/menu.c | 7 ++++++-
 include/linux/cpuidle.h          | 1 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index c3aa8d6..0da5bc5 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -382,8 +382,10 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 			 * stuck in the shallow one for too long.
 			 */
 			if (drv->states[idx].target_residency_ns < TICK_NSEC &&
-			    s->target_residency_ns <= delta_tick)
+			    s->target_residency_ns <= delta_tick) {
+				drv->states[idx].idle_time = delta_tick / NSEC_PER_USEC;
 				idx = i;
+			}
 
 			return idx;
 		}
@@ -393,6 +395,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 		idx = i;
 	}
 
+	drv->states[idx].idle_time = predicted_ns / NSEC_PER_USEC;
 	if (idx == -1)
 		idx = 0; /* No states enabled. Must use 0. */
 
@@ -419,6 +422,8 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 				if (drv->states[i].target_residency_ns <= delta_tick)
 					break;
 			}
+
+			drv->states[idx].idle_time = delta_tick / NSEC_PER_USEC;
 		}
 	}
 
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index fce4762..12db2e9 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -55,6 +55,7 @@ struct cpuidle_state {
 	unsigned int	exit_latency; /* in US */
 	int		power_usage; /* in mW */
 	unsigned int	target_residency; /* in US */
+	unsigned int	idle_time; /* in US */
 
 	int (*enter)	(struct cpuidle_device *dev,
 			struct cpuidle_driver *drv,
-- 
2.7.4


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH 3/4] cpuidle: psci: pass state idle time before state enter callback
  2021-04-22 20:30 ` Sowjanya Komatineni
@ 2021-04-22 20:30   ` Sowjanya Komatineni
  -1 siblings, 0 replies; 30+ messages in thread
From: Sowjanya Komatineni @ 2021-04-22 20:30 UTC (permalink / raw)
  To: sudeep.holla, souvik.chakravarty, thierry.reding, skomatineni,
	mark.rutland, lorenzo.pieralisi, daniel.lezcano, robh+dt
  Cc: jonathanh, ksitaraman, sanjayc, linux-arm-kernel, linux-tegra,
	linux-kernel, linux-pm, devicetree

Some platforms use separate CPU firmware to handle all state transition
decisions and may need runtime state idle time of the target state it is
going to enter.

This patch calls set_state_idle_time psci operation callback to pass the
state idle time to TF-A through PSCI before calling psci_enter_state.

Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com>
---
 drivers/cpuidle/cpuidle-psci.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
index b51b5df..22a5003 100644
--- a/drivers/cpuidle/cpuidle-psci.c
+++ b/drivers/cpuidle/cpuidle-psci.c
@@ -151,6 +151,17 @@ static int psci_enter_idle_state(struct cpuidle_device *dev,
 {
 	u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);
 
+	/*
+	 * Some platforms use separate CPU firmware running in background to
+	 * handle state transition which may need runtime idle time of the
+	 * corresponding state from kernel.
+	 * So, pass the state idle time from kernel to TF-A firmware through
+	 * PSCI. Platform specific TF-A firmware may update this to CPU
+	 * firmware.
+	 */
+	if (idx)
+		psci_ops.set_state_idle_time(drv->states[idx].idle_time);
+
 	return psci_enter_state(idx, state[idx]);
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH 3/4] cpuidle: psci: pass state idle time before state enter callback
@ 2021-04-22 20:30   ` Sowjanya Komatineni
  0 siblings, 0 replies; 30+ messages in thread
From: Sowjanya Komatineni @ 2021-04-22 20:30 UTC (permalink / raw)
  To: sudeep.holla, souvik.chakravarty, thierry.reding, skomatineni,
	mark.rutland, lorenzo.pieralisi, daniel.lezcano, robh+dt
  Cc: jonathanh, ksitaraman, sanjayc, linux-arm-kernel, linux-tegra,
	linux-kernel, linux-pm, devicetree

Some platforms use separate CPU firmware to handle all state transition
decisions and may need runtime state idle time of the target state it is
going to enter.

This patch calls set_state_idle_time psci operation callback to pass the
state idle time to TF-A through PSCI before calling psci_enter_state.

Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com>
---
 drivers/cpuidle/cpuidle-psci.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
index b51b5df..22a5003 100644
--- a/drivers/cpuidle/cpuidle-psci.c
+++ b/drivers/cpuidle/cpuidle-psci.c
@@ -151,6 +151,17 @@ static int psci_enter_idle_state(struct cpuidle_device *dev,
 {
 	u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);
 
+	/*
+	 * Some platforms use separate CPU firmware running in background to
+	 * handle state transition which may need runtime idle time of the
+	 * corresponding state from kernel.
+	 * So, pass the state idle time from kernel to TF-A firmware through
+	 * PSCI. Platform specific TF-A firmware may update this to CPU
+	 * firmware.
+	 */
+	if (idx)
+		psci_ops.set_state_idle_time(drv->states[idx].idle_time);
+
 	return psci_enter_state(idx, state[idx]);
 }
 
-- 
2.7.4


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH 4/4] arm64: dts: tegra194: Add CPU idle states
  2021-04-22 20:30 ` Sowjanya Komatineni
@ 2021-04-22 20:30   ` Sowjanya Komatineni
  -1 siblings, 0 replies; 30+ messages in thread
From: Sowjanya Komatineni @ 2021-04-22 20:30 UTC (permalink / raw)
  To: sudeep.holla, souvik.chakravarty, thierry.reding, skomatineni,
	mark.rutland, lorenzo.pieralisi, daniel.lezcano, robh+dt
  Cc: jonathanh, ksitaraman, sanjayc, linux-arm-kernel, linux-tegra,
	linux-kernel, linux-pm, devicetree

This patch adds CPU core and cluster idle states to Tegra194
device tree

Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com>
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 9449156..c3b478e 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -2161,6 +2161,7 @@
 			device_type = "cpu";
 			reg = <0x000>;
 			enable-method = "psci";
+			cpu-idle-states = <&C6>;
 			i-cache-size = <131072>;
 			i-cache-line-size = <64>;
 			i-cache-sets = <512>;
@@ -2175,6 +2176,7 @@
 			device_type = "cpu";
 			reg = <0x001>;
 			enable-method = "psci";
+			cpu-idle-states = <&C6>;
 			i-cache-size = <131072>;
 			i-cache-line-size = <64>;
 			i-cache-sets = <512>;
@@ -2189,6 +2191,7 @@
 			device_type = "cpu";
 			reg = <0x100>;
 			enable-method = "psci";
+			cpu-idle-states = <&C6>;
 			i-cache-size = <131072>;
 			i-cache-line-size = <64>;
 			i-cache-sets = <512>;
@@ -2203,6 +2206,7 @@
 			device_type = "cpu";
 			reg = <0x101>;
 			enable-method = "psci";
+			cpu-idle-states = <&C6>;
 			i-cache-size = <131072>;
 			i-cache-line-size = <64>;
 			i-cache-sets = <512>;
@@ -2217,6 +2221,7 @@
 			device_type = "cpu";
 			reg = <0x200>;
 			enable-method = "psci";
+			cpu-idle-states = <&C6>;
 			i-cache-size = <131072>;
 			i-cache-line-size = <64>;
 			i-cache-sets = <512>;
@@ -2231,6 +2236,7 @@
 			device_type = "cpu";
 			reg = <0x201>;
 			enable-method = "psci";
+			cpu-idle-states = <&C6>;
 			i-cache-size = <131072>;
 			i-cache-line-size = <64>;
 			i-cache-sets = <512>;
@@ -2245,6 +2251,7 @@
 			device_type = "cpu";
 			reg = <0x300>;
 			enable-method = "psci";
+			cpu-idle-states = <&C6>;
 			i-cache-size = <131072>;
 			i-cache-line-size = <64>;
 			i-cache-sets = <512>;
@@ -2259,6 +2266,7 @@
 			device_type = "cpu";
 			reg = <0x301>;
 			enable-method = "psci";
+			cpu-idle-states = <&C6>;
 			i-cache-size = <131072>;
 			i-cache-line-size = <64>;
 			i-cache-sets = <512>;
@@ -2343,6 +2351,17 @@
 			cache-line-size = <64>;
 			cache-sets = <4096>;
 		};
+
+		idle-states {
+			entry-method = "arm,psci";
+			C6: c6 {
+				compatible = "arm,idle-state";
+				wakeup-latency-us = <2000>;
+				min-residency-us = <30000>;
+				arm,psci-suspend-param = <0x6>;
+				status = "okay";
+			};
+		};
 	};
 
 	psci {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH 4/4] arm64: dts: tegra194: Add CPU idle states
@ 2021-04-22 20:30   ` Sowjanya Komatineni
  0 siblings, 0 replies; 30+ messages in thread
From: Sowjanya Komatineni @ 2021-04-22 20:30 UTC (permalink / raw)
  To: sudeep.holla, souvik.chakravarty, thierry.reding, skomatineni,
	mark.rutland, lorenzo.pieralisi, daniel.lezcano, robh+dt
  Cc: jonathanh, ksitaraman, sanjayc, linux-arm-kernel, linux-tegra,
	linux-kernel, linux-pm, devicetree

This patch adds CPU core and cluster idle states to Tegra194
device tree

Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com>
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 9449156..c3b478e 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -2161,6 +2161,7 @@
 			device_type = "cpu";
 			reg = <0x000>;
 			enable-method = "psci";
+			cpu-idle-states = <&C6>;
 			i-cache-size = <131072>;
 			i-cache-line-size = <64>;
 			i-cache-sets = <512>;
@@ -2175,6 +2176,7 @@
 			device_type = "cpu";
 			reg = <0x001>;
 			enable-method = "psci";
+			cpu-idle-states = <&C6>;
 			i-cache-size = <131072>;
 			i-cache-line-size = <64>;
 			i-cache-sets = <512>;
@@ -2189,6 +2191,7 @@
 			device_type = "cpu";
 			reg = <0x100>;
 			enable-method = "psci";
+			cpu-idle-states = <&C6>;
 			i-cache-size = <131072>;
 			i-cache-line-size = <64>;
 			i-cache-sets = <512>;
@@ -2203,6 +2206,7 @@
 			device_type = "cpu";
 			reg = <0x101>;
 			enable-method = "psci";
+			cpu-idle-states = <&C6>;
 			i-cache-size = <131072>;
 			i-cache-line-size = <64>;
 			i-cache-sets = <512>;
@@ -2217,6 +2221,7 @@
 			device_type = "cpu";
 			reg = <0x200>;
 			enable-method = "psci";
+			cpu-idle-states = <&C6>;
 			i-cache-size = <131072>;
 			i-cache-line-size = <64>;
 			i-cache-sets = <512>;
@@ -2231,6 +2236,7 @@
 			device_type = "cpu";
 			reg = <0x201>;
 			enable-method = "psci";
+			cpu-idle-states = <&C6>;
 			i-cache-size = <131072>;
 			i-cache-line-size = <64>;
 			i-cache-sets = <512>;
@@ -2245,6 +2251,7 @@
 			device_type = "cpu";
 			reg = <0x300>;
 			enable-method = "psci";
+			cpu-idle-states = <&C6>;
 			i-cache-size = <131072>;
 			i-cache-line-size = <64>;
 			i-cache-sets = <512>;
@@ -2259,6 +2266,7 @@
 			device_type = "cpu";
 			reg = <0x301>;
 			enable-method = "psci";
+			cpu-idle-states = <&C6>;
 			i-cache-size = <131072>;
 			i-cache-line-size = <64>;
 			i-cache-sets = <512>;
@@ -2343,6 +2351,17 @@
 			cache-line-size = <64>;
 			cache-sets = <4096>;
 		};
+
+		idle-states {
+			entry-method = "arm,psci";
+			C6: c6 {
+				compatible = "arm,idle-state";
+				wakeup-latency-us = <2000>;
+				min-residency-us = <30000>;
+				arm,psci-suspend-param = <0x6>;
+				status = "okay";
+			};
+		};
 	};
 
 	psci {
-- 
2.7.4


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/4] cpuidle: menu: add idle_time to cpuidle_state
  2021-04-22 20:30   ` Sowjanya Komatineni
  (?)
@ 2021-04-23  0:46   ` kernel test robot
  -1 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2021-04-23  0:46 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 8588 bytes --]

Hi Sowjanya,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on pm/linux-next]
[also build test ERROR on next-20210422]
[cannot apply to robh/for-next linus/master daniel.lezcano/clockevents/next v5.12-rc8]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Sowjanya-Komatineni/Support-for-passing-runtime-state-idle-time-to-TF-A/20210423-043308
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: i386-randconfig-s001-20210421 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-341-g8af24329-dirty
        # https://github.com/0day-ci/linux/commit/072299f9d8ef2e6ed90ddd7ecb17598ff1618b0d
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Sowjanya-Komatineni/Support-for-passing-runtime-state-idle-time-to-TF-A/20210423-043308
        git checkout 072299f9d8ef2e6ed90ddd7ecb17598ff1618b0d
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ld: drivers/cpuidle/governors/menu.o: in function `menu_select':
>> drivers/cpuidle/governors/menu.c:398: undefined reference to `__udivdi3'
>> ld: drivers/cpuidle/governors/menu.c:398: undefined reference to `__udivdi3'
>> ld: drivers/cpuidle/governors/menu.c:398: undefined reference to `__udivdi3'
>> ld: drivers/cpuidle/governors/menu.c:398: undefined reference to `__udivdi3'


vim +398 drivers/cpuidle/governors/menu.c

   258	
   259	/**
   260	 * menu_select - selects the next idle state to enter
   261	 * @drv: cpuidle driver containing state data
   262	 * @dev: the CPU
   263	 * @stop_tick: indication on whether or not to stop the tick
   264	 */
   265	static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
   266			       bool *stop_tick)
   267	{
   268		struct menu_device *data = this_cpu_ptr(&menu_devices);
   269		s64 latency_req = cpuidle_governor_latency_req(dev->cpu);
   270		unsigned int predicted_us;
   271		u64 predicted_ns;
   272		u64 interactivity_req;
   273		unsigned long nr_iowaiters;
   274		ktime_t delta, delta_tick;
   275		int i, idx;
   276	
   277		if (data->needs_update) {
   278			menu_update(drv, dev);
   279			data->needs_update = 0;
   280		}
   281	
   282		/* determine the expected residency time, round up */
   283		delta = tick_nohz_get_sleep_length(&delta_tick);
   284		if (unlikely(delta < 0)) {
   285			delta = 0;
   286			delta_tick = 0;
   287		}
   288		data->next_timer_ns = delta;
   289	
   290		nr_iowaiters = nr_iowait_cpu(dev->cpu);
   291		data->bucket = which_bucket(data->next_timer_ns, nr_iowaiters);
   292	
   293		if (unlikely(drv->state_count <= 1 || latency_req == 0) ||
   294		    ((data->next_timer_ns < drv->states[1].target_residency_ns ||
   295		      latency_req < drv->states[1].exit_latency_ns) &&
   296		     !dev->states_usage[0].disable)) {
   297			/*
   298			 * In this case state[0] will be used no matter what, so return
   299			 * it right away and keep the tick running if state[0] is a
   300			 * polling one.
   301			 */
   302			*stop_tick = !(drv->states[0].flags & CPUIDLE_FLAG_POLLING);
   303			return 0;
   304		}
   305	
   306		/* Round up the result for half microseconds. */
   307		predicted_us = div_u64(data->next_timer_ns *
   308				       data->correction_factor[data->bucket] +
   309				       (RESOLUTION * DECAY * NSEC_PER_USEC) / 2,
   310				       RESOLUTION * DECAY * NSEC_PER_USEC);
   311		/* Use the lowest expected idle interval to pick the idle state. */
   312		predicted_ns = (u64)min(predicted_us,
   313					get_typical_interval(data, predicted_us)) *
   314					NSEC_PER_USEC;
   315	
   316		if (tick_nohz_tick_stopped()) {
   317			/*
   318			 * If the tick is already stopped, the cost of possible short
   319			 * idle duration misprediction is much higher, because the CPU
   320			 * may be stuck in a shallow idle state for a long time as a
   321			 * result of it.  In that case say we might mispredict and use
   322			 * the known time till the closest timer event for the idle
   323			 * state selection.
   324			 */
   325			if (predicted_ns < TICK_NSEC)
   326				predicted_ns = data->next_timer_ns;
   327		} else {
   328			/*
   329			 * Use the performance multiplier and the user-configurable
   330			 * latency_req to determine the maximum exit latency.
   331			 */
   332			interactivity_req = div64_u64(predicted_ns,
   333						      performance_multiplier(nr_iowaiters));
   334			if (latency_req > interactivity_req)
   335				latency_req = interactivity_req;
   336		}
   337	
   338		/*
   339		 * Find the idle state with the lowest power while satisfying
   340		 * our constraints.
   341		 */
   342		idx = -1;
   343		for (i = 0; i < drv->state_count; i++) {
   344			struct cpuidle_state *s = &drv->states[i];
   345	
   346			if (dev->states_usage[i].disable)
   347				continue;
   348	
   349			if (idx == -1)
   350				idx = i; /* first enabled state */
   351	
   352			if (s->target_residency_ns > predicted_ns) {
   353				/*
   354				 * Use a physical idle state, not busy polling, unless
   355				 * a timer is going to trigger soon enough.
   356				 */
   357				if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
   358				    s->exit_latency_ns <= latency_req &&
   359				    s->target_residency_ns <= data->next_timer_ns) {
   360					predicted_ns = s->target_residency_ns;
   361					idx = i;
   362					break;
   363				}
   364				if (predicted_ns < TICK_NSEC)
   365					break;
   366	
   367				if (!tick_nohz_tick_stopped()) {
   368					/*
   369					 * If the state selected so far is shallow,
   370					 * waking up early won't hurt, so retain the
   371					 * tick in that case and let the governor run
   372					 * again in the next iteration of the loop.
   373					 */
   374					predicted_ns = drv->states[idx].target_residency_ns;
   375					break;
   376				}
   377	
   378				/*
   379				 * If the state selected so far is shallow and this
   380				 * state's target residency matches the time till the
   381				 * closest timer event, select this one to avoid getting
   382				 * stuck in the shallow one for too long.
   383				 */
   384				if (drv->states[idx].target_residency_ns < TICK_NSEC &&
   385				    s->target_residency_ns <= delta_tick) {
   386					drv->states[idx].idle_time = delta_tick / NSEC_PER_USEC;
   387					idx = i;
   388				}
   389	
   390				return idx;
   391			}
   392			if (s->exit_latency_ns > latency_req)
   393				break;
   394	
   395			idx = i;
   396		}
   397	
 > 398		drv->states[idx].idle_time = predicted_ns / NSEC_PER_USEC;
   399		if (idx == -1)
   400			idx = 0; /* No states enabled. Must use 0. */
   401	
   402		/*
   403		 * Don't stop the tick if the selected state is a polling one or if the
   404		 * expected idle duration is shorter than the tick period length.
   405		 */
   406		if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
   407		     predicted_ns < TICK_NSEC) && !tick_nohz_tick_stopped()) {
   408			*stop_tick = false;
   409	
   410			if (idx > 0 && drv->states[idx].target_residency_ns > delta_tick) {
   411				/*
   412				 * The tick is not going to be stopped and the target
   413				 * residency of the state to be returned is not within
   414				 * the time until the next timer event including the
   415				 * tick, so try to correct that.
   416				 */
   417				for (i = idx - 1; i >= 0; i--) {
   418					if (dev->states_usage[i].disable)
   419						continue;
   420	
   421					idx = i;
   422					if (drv->states[i].target_residency_ns <= delta_tick)
   423						break;
   424				}
   425	
   426				drv->states[idx].idle_time = delta_tick / NSEC_PER_USEC;
   427			}
   428		}
   429	
   430		return idx;
   431	}
   432	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 31502 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A
  2021-04-22 20:30 ` Sowjanya Komatineni
@ 2021-04-23  1:03   ` Sowjanya Komatineni
  -1 siblings, 0 replies; 30+ messages in thread
From: Sowjanya Komatineni @ 2021-04-23  1:03 UTC (permalink / raw)
  To: sudeep.holla, souvik.chakravarty, thierry.reding, mark.rutland,
	lorenzo.pieralisi, daniel.lezcano, robh+dt
  Cc: jonathanh, ksitaraman, sanjayc, linux-arm-kernel, linux-tegra,
	linux-kernel, linux-pm, devicetree

kernel test robot reported below errors for patch-2 of this series. Will 
fix in next version to use div_u64() for nsec to usec conversion along 
with the other feedback I may be receiving.

drivers/cpuidle/governors/menu.c:398: undefined reference to `__udivdi3'
ld: drivers/cpuidle/governors/menu.c:398: undefined reference to `__udivdi3'
ld: drivers/cpuidle/governors/menu.c:398: undefined reference to `__udivdi3'
ld: drivers/cpuidle/governors/menu.c:398: undefined reference to `__udivdi3'

Thanks

Sowjanya

On 4/22/21 1:30 PM, Sowjanya Komatineni wrote:
> Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs which is
> in charge of deciding on state transition based on target state, state idle
> time, and some other Tegra CPU core cluster states information.
>
> Current PSCI specification don't have function defined for passing runtime
> state idle time predicted by governor (based on next events and state target
> residency) to ARM trusted firmware.
>
> With the support of adding new PSCI function to allow passing runtime state
> idle time from kernel to ARM trusted firmware, Tegra194 platforms can use
> generic psci cpuidle driver rather than having Tegra specific cpuidle driver.
>
> During Tegra specific cpuidle driver V1 review, Sudeep Holla from ARM also
> suggested to use generic cpuidle driver by generalizing the need of runtime
> state idle time.
>
> So had internal discussion between ARM and NVIDIA on adding new PSCI function
> to allow passing runtime state idle time from kernel to TF-A through PSCI and
> once this implementation is accepted by upstream, ARM will look into further
> to update PSCI specification for this new PSCI function.
>
> So sending these patches as RFC as new PSCI function added in this series is
> not part of PSCI specification and once this implementation is accepted by ARM
> and upstream community, ARM can help to take this forward to add to PSCI
> specification.
>
> To keep the backward compatibility we can't update CPU_SUSPEND function to pass
> state idle time argument. So added seperate function for passing state idle time
> and serializing this with cpu suspend state enter.
>
> Once this approach is agreed, we can either use this way of separate PSCI
> function for passing state idle time or with PSCI specification update we can
> use same CPU_SUSPEND function with extra argument for state idle time which can
> be decided later for final patches based on discussion with ARM.
>
>
> Sowjanya Komatineni (4):
>    firmware/psci: add support for PSCI function SET_STATE_IDLE_TIME
>    cpuidle: menu: add idle_time to cpuidle_state
>    cpuidle: psci: pass state idle time before state enter callback
>    arm64: dts: tegra194: Add CPU idle states
>
>   arch/arm64/boot/dts/nvidia/tegra194.dtsi | 19 +++++++++++++++++++
>   drivers/cpuidle/cpuidle-psci.c           | 11 +++++++++++
>   drivers/cpuidle/governors/menu.c         |  7 ++++++-
>   drivers/firmware/psci/psci.c             |  9 +++++++++
>   include/linux/cpuidle.h                  |  1 +
>   include/linux/psci.h                     |  1 +
>   include/uapi/linux/psci.h                |  2 ++
>   7 files changed, 49 insertions(+), 1 deletion(-)
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A
@ 2021-04-23  1:03   ` Sowjanya Komatineni
  0 siblings, 0 replies; 30+ messages in thread
From: Sowjanya Komatineni @ 2021-04-23  1:03 UTC (permalink / raw)
  To: sudeep.holla, souvik.chakravarty, thierry.reding, mark.rutland,
	lorenzo.pieralisi, daniel.lezcano, robh+dt
  Cc: jonathanh, ksitaraman, sanjayc, linux-arm-kernel, linux-tegra,
	linux-kernel, linux-pm, devicetree

kernel test robot reported below errors for patch-2 of this series. Will 
fix in next version to use div_u64() for nsec to usec conversion along 
with the other feedback I may be receiving.

drivers/cpuidle/governors/menu.c:398: undefined reference to `__udivdi3'
ld: drivers/cpuidle/governors/menu.c:398: undefined reference to `__udivdi3'
ld: drivers/cpuidle/governors/menu.c:398: undefined reference to `__udivdi3'
ld: drivers/cpuidle/governors/menu.c:398: undefined reference to `__udivdi3'

Thanks

Sowjanya

On 4/22/21 1:30 PM, Sowjanya Komatineni wrote:
> Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs which is
> in charge of deciding on state transition based on target state, state idle
> time, and some other Tegra CPU core cluster states information.
>
> Current PSCI specification don't have function defined for passing runtime
> state idle time predicted by governor (based on next events and state target
> residency) to ARM trusted firmware.
>
> With the support of adding new PSCI function to allow passing runtime state
> idle time from kernel to ARM trusted firmware, Tegra194 platforms can use
> generic psci cpuidle driver rather than having Tegra specific cpuidle driver.
>
> During Tegra specific cpuidle driver V1 review, Sudeep Holla from ARM also
> suggested to use generic cpuidle driver by generalizing the need of runtime
> state idle time.
>
> So had internal discussion between ARM and NVIDIA on adding new PSCI function
> to allow passing runtime state idle time from kernel to TF-A through PSCI and
> once this implementation is accepted by upstream, ARM will look into further
> to update PSCI specification for this new PSCI function.
>
> So sending these patches as RFC as new PSCI function added in this series is
> not part of PSCI specification and once this implementation is accepted by ARM
> and upstream community, ARM can help to take this forward to add to PSCI
> specification.
>
> To keep the backward compatibility we can't update CPU_SUSPEND function to pass
> state idle time argument. So added seperate function for passing state idle time
> and serializing this with cpu suspend state enter.
>
> Once this approach is agreed, we can either use this way of separate PSCI
> function for passing state idle time or with PSCI specification update we can
> use same CPU_SUSPEND function with extra argument for state idle time which can
> be decided later for final patches based on discussion with ARM.
>
>
> Sowjanya Komatineni (4):
>    firmware/psci: add support for PSCI function SET_STATE_IDLE_TIME
>    cpuidle: menu: add idle_time to cpuidle_state
>    cpuidle: psci: pass state idle time before state enter callback
>    arm64: dts: tegra194: Add CPU idle states
>
>   arch/arm64/boot/dts/nvidia/tegra194.dtsi | 19 +++++++++++++++++++
>   drivers/cpuidle/cpuidle-psci.c           | 11 +++++++++++
>   drivers/cpuidle/governors/menu.c         |  7 ++++++-
>   drivers/firmware/psci/psci.c             |  9 +++++++++
>   include/linux/cpuidle.h                  |  1 +
>   include/linux/psci.h                     |  1 +
>   include/uapi/linux/psci.h                |  2 ++
>   7 files changed, 49 insertions(+), 1 deletion(-)
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/4] cpuidle: menu: add idle_time to cpuidle_state
  2021-04-22 20:30   ` Sowjanya Komatineni
  (?)
  (?)
@ 2021-04-23  1:28   ` kernel test robot
  -1 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2021-04-23  1:28 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2403 bytes --]

Hi Sowjanya,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on pm/linux-next]
[also build test ERROR on next-20210422]
[cannot apply to robh/for-next linus/master daniel.lezcano/clockevents/next v5.12-rc8]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Sowjanya-Komatineni/Support-for-passing-runtime-state-idle-time-to-TF-A/20210423-043308
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: arm-omap2plus_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/072299f9d8ef2e6ed90ddd7ecb17598ff1618b0d
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Sowjanya-Komatineni/Support-for-passing-runtime-state-idle-time-to-TF-A/20210423-043308
        git checkout 072299f9d8ef2e6ed90ddd7ecb17598ff1618b0d
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross W=1 ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   arm-linux-gnueabi-ld: drivers/cpuidle/governors/menu.o: in function `menu_select':
>> menu.c:(.text+0x53c): undefined reference to `__aeabi_uldivmod'
>> arm-linux-gnueabi-ld: menu.c:(.text+0x618): undefined reference to `__aeabi_ldivmod'
>> arm-linux-gnueabi-ld: menu.c:(.text+0x944): undefined reference to `__aeabi_uldivmod'
   arm-linux-gnueabi-ld: menu.c:(.text+0x9b8): undefined reference to `__aeabi_uldivmod'
   arm-linux-gnueabi-ld: menu.c:(.text+0xa24): undefined reference to `__aeabi_uldivmod'
   arm-linux-gnueabi-ld: menu.c:(.text+0xaf8): undefined reference to `__aeabi_ldivmod'
   arm-linux-gnueabi-ld: menu.c:(.text+0xb24): undefined reference to `__aeabi_uldivmod'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 42365 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/4] cpuidle: menu: add idle_time to cpuidle_state
  2021-04-22 20:30   ` Sowjanya Komatineni
@ 2021-04-23 12:22     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 30+ messages in thread
From: Rafael J. Wysocki @ 2021-04-23 12:22 UTC (permalink / raw)
  To: Sowjanya Komatineni
  Cc: Sudeep Holla, Souvik Chakravarty, Thierry Reding, Mark Rutland,
	Lorenzo Pieralisi, Daniel Lezcano, Rob Herring, Jon Hunter,
	ksitaraman, sanjayc, Linux ARM, linux-tegra,
	Linux Kernel Mailing List, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS

On Thu, Apr 22, 2021 at 10:31 PM Sowjanya Komatineni
<skomatineni@nvidia.com> wrote:
>
> Some platforms use separate CPU firmware running in background to
> handle state transitions which may need runtime idle time of the
> corresponding target state from the kernel.

How exactly does this work?

> This patch adds idle_time to cpuidle state to expose to cpuidle driver the
> idle time that the governor menu predicts based on next events and states
> target residency for selecting proper idle state.
>
> CPU idle driver passes this runtime state idle time to TF-A.
>
> Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com>
> ---
>  drivers/cpuidle/governors/menu.c | 7 ++++++-
>  include/linux/cpuidle.h          | 1 +
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index c3aa8d6..0da5bc5 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -382,8 +382,10 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>                          * stuck in the shallow one for too long.
>                          */
>                         if (drv->states[idx].target_residency_ns < TICK_NSEC &&
> -                           s->target_residency_ns <= delta_tick)
> +                           s->target_residency_ns <= delta_tick) {
> +                               drv->states[idx].idle_time = delta_tick / NSEC_PER_USEC;
>                                 idx = i;
> +                       }
>
>                         return idx;
>                 }
> @@ -393,6 +395,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>                 idx = i;
>         }
>
> +       drv->states[idx].idle_time = predicted_ns / NSEC_PER_USEC;
>         if (idx == -1)
>                 idx = 0; /* No states enabled. Must use 0. */
>
> @@ -419,6 +422,8 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>                                 if (drv->states[i].target_residency_ns <= delta_tick)
>                                         break;
>                         }
> +
> +                       drv->states[idx].idle_time = delta_tick / NSEC_PER_USEC;
>                 }
>         }
>
> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> index fce4762..12db2e9 100644
> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -55,6 +55,7 @@ struct cpuidle_state {
>         unsigned int    exit_latency; /* in US */
>         int             power_usage; /* in mW */
>         unsigned int    target_residency; /* in US */
> +       unsigned int    idle_time; /* in US */

No way.

This structure holds idle state properties of and not some random data
passed between cpuidle components.  And it is not per-CPU, while the
governors work on the per-CPU basis.

state_usage might be more suitable, but that only if I'm convinced
that this is really necessary.

>
>         int (*enter)    (struct cpuidle_device *dev,
>                         struct cpuidle_driver *drv,
> --

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/4] cpuidle: menu: add idle_time to cpuidle_state
@ 2021-04-23 12:22     ` Rafael J. Wysocki
  0 siblings, 0 replies; 30+ messages in thread
From: Rafael J. Wysocki @ 2021-04-23 12:22 UTC (permalink / raw)
  To: Sowjanya Komatineni
  Cc: Sudeep Holla, Souvik Chakravarty, Thierry Reding, Mark Rutland,
	Lorenzo Pieralisi, Daniel Lezcano, Rob Herring, Jon Hunter,
	ksitaraman, sanjayc, Linux ARM, linux-tegra,
	Linux Kernel Mailing List, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS

On Thu, Apr 22, 2021 at 10:31 PM Sowjanya Komatineni
<skomatineni@nvidia.com> wrote:
>
> Some platforms use separate CPU firmware running in background to
> handle state transitions which may need runtime idle time of the
> corresponding target state from the kernel.

How exactly does this work?

> This patch adds idle_time to cpuidle state to expose to cpuidle driver the
> idle time that the governor menu predicts based on next events and states
> target residency for selecting proper idle state.
>
> CPU idle driver passes this runtime state idle time to TF-A.
>
> Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com>
> ---
>  drivers/cpuidle/governors/menu.c | 7 ++++++-
>  include/linux/cpuidle.h          | 1 +
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index c3aa8d6..0da5bc5 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -382,8 +382,10 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>                          * stuck in the shallow one for too long.
>                          */
>                         if (drv->states[idx].target_residency_ns < TICK_NSEC &&
> -                           s->target_residency_ns <= delta_tick)
> +                           s->target_residency_ns <= delta_tick) {
> +                               drv->states[idx].idle_time = delta_tick / NSEC_PER_USEC;
>                                 idx = i;
> +                       }
>
>                         return idx;
>                 }
> @@ -393,6 +395,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>                 idx = i;
>         }
>
> +       drv->states[idx].idle_time = predicted_ns / NSEC_PER_USEC;
>         if (idx == -1)
>                 idx = 0; /* No states enabled. Must use 0. */
>
> @@ -419,6 +422,8 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>                                 if (drv->states[i].target_residency_ns <= delta_tick)
>                                         break;
>                         }
> +
> +                       drv->states[idx].idle_time = delta_tick / NSEC_PER_USEC;
>                 }
>         }
>
> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> index fce4762..12db2e9 100644
> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -55,6 +55,7 @@ struct cpuidle_state {
>         unsigned int    exit_latency; /* in US */
>         int             power_usage; /* in mW */
>         unsigned int    target_residency; /* in US */
> +       unsigned int    idle_time; /* in US */

No way.

This structure holds idle state properties of and not some random data
passed between cpuidle components.  And it is not per-CPU, while the
governors work on the per-CPU basis.

state_usage might be more suitable, but that only if I'm convinced
that this is really necessary.

>
>         int (*enter)    (struct cpuidle_device *dev,
>                         struct cpuidle_driver *drv,
> --

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A
  2021-04-22 20:30 ` Sowjanya Komatineni
@ 2021-04-23 12:27   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 30+ messages in thread
From: Rafael J. Wysocki @ 2021-04-23 12:27 UTC (permalink / raw)
  To: Sowjanya Komatineni
  Cc: Sudeep Holla, Souvik Chakravarty, Thierry Reding, Mark Rutland,
	Lorenzo Pieralisi, Daniel Lezcano, Rob Herring, Jon Hunter,
	ksitaraman, sanjayc, Linux ARM, linux-tegra,
	Linux Kernel Mailing List, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS

On Thu, Apr 22, 2021 at 10:31 PM Sowjanya Komatineni
<skomatineni@nvidia.com> wrote:
>
> Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs which is
> in charge of deciding on state transition based on target state, state idle
> time, and some other Tegra CPU core cluster states information.
>
> Current PSCI specification don't have function defined for passing runtime
> state idle time predicted by governor (based on next events and state target
> residency) to ARM trusted firmware.

Presumably that's because this is not a good idea.

A basic design principle of cpuidle is that it should be possible to
use every governor with every driver and the changes in this series
make the platforms in question only work with menu AFAICS.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A
@ 2021-04-23 12:27   ` Rafael J. Wysocki
  0 siblings, 0 replies; 30+ messages in thread
From: Rafael J. Wysocki @ 2021-04-23 12:27 UTC (permalink / raw)
  To: Sowjanya Komatineni
  Cc: Sudeep Holla, Souvik Chakravarty, Thierry Reding, Mark Rutland,
	Lorenzo Pieralisi, Daniel Lezcano, Rob Herring, Jon Hunter,
	ksitaraman, sanjayc, Linux ARM, linux-tegra,
	Linux Kernel Mailing List, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS

On Thu, Apr 22, 2021 at 10:31 PM Sowjanya Komatineni
<skomatineni@nvidia.com> wrote:
>
> Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs which is
> in charge of deciding on state transition based on target state, state idle
> time, and some other Tegra CPU core cluster states information.
>
> Current PSCI specification don't have function defined for passing runtime
> state idle time predicted by governor (based on next events and state target
> residency) to ARM trusted firmware.

Presumably that's because this is not a good idea.

A basic design principle of cpuidle is that it should be possible to
use every governor with every driver and the changes in this series
make the platforms in question only work with menu AFAICS.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A
  2021-04-23 12:27   ` Rafael J. Wysocki
@ 2021-04-23 18:32     ` Sowjanya Komatineni
  -1 siblings, 0 replies; 30+ messages in thread
From: Sowjanya Komatineni @ 2021-04-23 18:32 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Sudeep Holla, Souvik Chakravarty, Thierry Reding, Mark Rutland,
	Lorenzo Pieralisi, Daniel Lezcano, Rob Herring, Jon Hunter,
	ksitaraman, sanjayc, Linux ARM, linux-tegra,
	Linux Kernel Mailing List, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS


On 4/23/21 5:27 AM, Rafael J. Wysocki wrote:
> On Thu, Apr 22, 2021 at 10:31 PM Sowjanya Komatineni
> <skomatineni@nvidia.com> wrote:
>> Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs which is
>> in charge of deciding on state transition based on target state, state idle
>> time, and some other Tegra CPU core cluster states information.
>>
>> Current PSCI specification don't have function defined for passing runtime
>> state idle time predicted by governor (based on next events and state target
>> residency) to ARM trusted firmware.
> Presumably that's because this is not a good idea.
Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs.

MCE firmware handles CPU complex power management states entry/exit 
based on its background tasks and uses expected wake time of the core to 
decide on state transition.

Expected wake time is based on next event and allowed  min residency of 
the state which governor predicts in kernel which MCE is not aware of.

So need a way to pass this info from kernel through PSCI to TF-A and 
TF-A will update this to MCE along with requested state of entry.

For example, When C6 core idle state is requested, MCE notes the time at 
which core is likely to wake up. There could be background tasks running 
on core which kernel is not aware of.

When those tasks are completed it will check the remaining time against 
states crossover thresholds (programmed by ARM trusted FW) to determine 
if its still have enough time to enter into C6 state.

While a core is power gates, it could be woken up for background tasks 
and put back to power gated state again by MCE based on expected wake 
time of the corresponding core.


So, Tegra194/Tegra186 CPU idle support, we need this runtime state 
expected wake time predicted by governor to be passed from kernel to TF-A.

Thanks

Sowjanya

>
> A basic design principle of cpuidle is that it should be possible to
> use every governor with every driver and the changes in this series
> make the platforms in question only work with menu AFAICS.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A
@ 2021-04-23 18:32     ` Sowjanya Komatineni
  0 siblings, 0 replies; 30+ messages in thread
From: Sowjanya Komatineni @ 2021-04-23 18:32 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Sudeep Holla, Souvik Chakravarty, Thierry Reding, Mark Rutland,
	Lorenzo Pieralisi, Daniel Lezcano, Rob Herring, Jon Hunter,
	ksitaraman, sanjayc, Linux ARM, linux-tegra,
	Linux Kernel Mailing List, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS


On 4/23/21 5:27 AM, Rafael J. Wysocki wrote:
> On Thu, Apr 22, 2021 at 10:31 PM Sowjanya Komatineni
> <skomatineni@nvidia.com> wrote:
>> Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs which is
>> in charge of deciding on state transition based on target state, state idle
>> time, and some other Tegra CPU core cluster states information.
>>
>> Current PSCI specification don't have function defined for passing runtime
>> state idle time predicted by governor (based on next events and state target
>> residency) to ARM trusted firmware.
> Presumably that's because this is not a good idea.
Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs.

MCE firmware handles CPU complex power management states entry/exit 
based on its background tasks and uses expected wake time of the core to 
decide on state transition.

Expected wake time is based on next event and allowed  min residency of 
the state which governor predicts in kernel which MCE is not aware of.

So need a way to pass this info from kernel through PSCI to TF-A and 
TF-A will update this to MCE along with requested state of entry.

For example, When C6 core idle state is requested, MCE notes the time at 
which core is likely to wake up. There could be background tasks running 
on core which kernel is not aware of.

When those tasks are completed it will check the remaining time against 
states crossover thresholds (programmed by ARM trusted FW) to determine 
if its still have enough time to enter into C6 state.

While a core is power gates, it could be woken up for background tasks 
and put back to power gated state again by MCE based on expected wake 
time of the corresponding core.


So, Tegra194/Tegra186 CPU idle support, we need this runtime state 
expected wake time predicted by governor to be passed from kernel to TF-A.

Thanks

Sowjanya

>
> A basic design principle of cpuidle is that it should be possible to
> use every governor with every driver and the changes in this series
> make the platforms in question only work with menu AFAICS.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/4] cpuidle: menu: add idle_time to cpuidle_state
  2021-04-23 12:22     ` Rafael J. Wysocki
@ 2021-04-23 18:33       ` Sowjanya Komatineni
  -1 siblings, 0 replies; 30+ messages in thread
From: Sowjanya Komatineni @ 2021-04-23 18:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Sudeep Holla, Souvik Chakravarty, Thierry Reding, Mark Rutland,
	Lorenzo Pieralisi, Daniel Lezcano, Rob Herring, Jon Hunter,
	ksitaraman, sanjayc, Linux ARM, linux-tegra,
	Linux Kernel Mailing List, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS


On 4/23/21 5:22 AM, Rafael J. Wysocki wrote:
> On Thu, Apr 22, 2021 at 10:31 PM Sowjanya Komatineni
> <skomatineni@nvidia.com>  wrote:
>> Some platforms use separate CPU firmware running in background to
>> handle state transitions which may need runtime idle time of the
>> corresponding target state from the kernel.
> How exactly does this work?
>
Explained this as part of other feedback in Patch-0

Thanks

Sowjanya


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/4] cpuidle: menu: add idle_time to cpuidle_state
@ 2021-04-23 18:33       ` Sowjanya Komatineni
  0 siblings, 0 replies; 30+ messages in thread
From: Sowjanya Komatineni @ 2021-04-23 18:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Sudeep Holla, Souvik Chakravarty, Thierry Reding, Mark Rutland,
	Lorenzo Pieralisi, Daniel Lezcano, Rob Herring, Jon Hunter,
	ksitaraman, sanjayc, Linux ARM, linux-tegra,
	Linux Kernel Mailing List, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS


On 4/23/21 5:22 AM, Rafael J. Wysocki wrote:
> On Thu, Apr 22, 2021 at 10:31 PM Sowjanya Komatineni
> <skomatineni@nvidia.com>  wrote:
>> Some platforms use separate CPU firmware running in background to
>> handle state transitions which may need runtime idle time of the
>> corresponding target state from the kernel.
> How exactly does this work?
>
Explained this as part of other feedback in Patch-0

Thanks

Sowjanya


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A
  2021-04-22 20:30 ` Sowjanya Komatineni
@ 2021-04-23 20:16   ` Lukasz Luba
  -1 siblings, 0 replies; 30+ messages in thread
From: Lukasz Luba @ 2021-04-23 20:16 UTC (permalink / raw)
  To: Sowjanya Komatineni
  Cc: sudeep.holla, souvik.chakravarty, thierry.reding, mark.rutland,
	lorenzo.pieralisi, daniel.lezcano, robh+dt, jonathanh,
	ksitaraman, sanjayc, linux-arm-kernel, linux-tegra, linux-kernel,
	linux-pm, devicetree

Hi Sowjanya,

On 4/22/21 9:30 PM, Sowjanya Komatineni wrote:
> Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs which is
> in charge of deciding on state transition based on target state, state idle
> time, and some other Tegra CPU core cluster states information.
> 
> Current PSCI specification don't have function defined for passing runtime
> state idle time predicted by governor (based on next events and state target
> residency) to ARM trusted firmware.

Do you have some numbers from experiments showing that these idle
governor prediction values, which are passed from kernel to MCE
firmware, are making a good 'guess'?
How much precision (1us? 1ms?) in the values do you need there?

IIRC (probably Rafael's presentations) predicting in the kernel
something like CPU idle time residency is not a trivial thing.

Another idea (depending on DT structure and PSCI bits):
Could this be solved differently, but just having a knowledge that if
the governor requested some C-state, this means governor 'predicted'
an idle residency to be greater that min_residency attached to this
C-state?
Then, when that request shows up in your FW, you know that it must be at
least min_residency because of this C-state id.
It would depend on number of available states, max_residency, scale
that you would choose while assigning values from [0, max_residency]
to each state.
IIRC there can be many state IDs for idle, so it would depend on
number of bits encoding this state, and your needs. Example of
linear scale:
4-bits encoding idle state and max predicted residency 10msec,
that means 10000us / 16 states = 625us/state.
The max_residency might be split differently, using different than
linear function, to have some rage more precised.

Open question is if these idle states must be all represented
in DT, or there is a way of describing a 'set of idle states'
automatically.

Regards,
Lukasz

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A
@ 2021-04-23 20:16   ` Lukasz Luba
  0 siblings, 0 replies; 30+ messages in thread
From: Lukasz Luba @ 2021-04-23 20:16 UTC (permalink / raw)
  To: Sowjanya Komatineni
  Cc: sudeep.holla, souvik.chakravarty, thierry.reding, mark.rutland,
	lorenzo.pieralisi, daniel.lezcano, robh+dt, jonathanh,
	ksitaraman, sanjayc, linux-arm-kernel, linux-tegra, linux-kernel,
	linux-pm, devicetree

Hi Sowjanya,

On 4/22/21 9:30 PM, Sowjanya Komatineni wrote:
> Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs which is
> in charge of deciding on state transition based on target state, state idle
> time, and some other Tegra CPU core cluster states information.
> 
> Current PSCI specification don't have function defined for passing runtime
> state idle time predicted by governor (based on next events and state target
> residency) to ARM trusted firmware.

Do you have some numbers from experiments showing that these idle
governor prediction values, which are passed from kernel to MCE
firmware, are making a good 'guess'?
How much precision (1us? 1ms?) in the values do you need there?

IIRC (probably Rafael's presentations) predicting in the kernel
something like CPU idle time residency is not a trivial thing.

Another idea (depending on DT structure and PSCI bits):
Could this be solved differently, but just having a knowledge that if
the governor requested some C-state, this means governor 'predicted'
an idle residency to be greater that min_residency attached to this
C-state?
Then, when that request shows up in your FW, you know that it must be at
least min_residency because of this C-state id.
It would depend on number of available states, max_residency, scale
that you would choose while assigning values from [0, max_residency]
to each state.
IIRC there can be many state IDs for idle, so it would depend on
number of bits encoding this state, and your needs. Example of
linear scale:
4-bits encoding idle state and max predicted residency 10msec,
that means 10000us / 16 states = 625us/state.
The max_residency might be split differently, using different than
linear function, to have some rage more precised.

Open question is if these idle states must be all represented
in DT, or there is a way of describing a 'set of idle states'
automatically.

Regards,
Lukasz

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A
  2021-04-23 20:16   ` Lukasz Luba
@ 2021-04-23 22:24     ` Sowjanya Komatineni
  -1 siblings, 0 replies; 30+ messages in thread
From: Sowjanya Komatineni @ 2021-04-23 22:24 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: sudeep.holla, souvik.chakravarty, thierry.reding, mark.rutland,
	lorenzo.pieralisi, daniel.lezcano, robh+dt, jonathanh,
	ksitaraman, sanjayc, linux-arm-kernel, linux-tegra, linux-kernel,
	linux-pm, devicetree


On 4/23/21 1:16 PM, Lukasz Luba wrote:
> Hi Sowjanya,
>
> On 4/22/21 9:30 PM, Sowjanya Komatineni wrote:
>> Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs 
>> which is
>> in charge of deciding on state transition based on target state, 
>> state idle
>> time, and some other Tegra CPU core cluster states information.
>>
>> Current PSCI specification don't have function defined for passing 
>> runtime
>> state idle time predicted by governor (based on next events and state 
>> target
>> residency) to ARM trusted firmware.
>
> Do you have some numbers from experiments showing that these idle
> governor prediction values, which are passed from kernel to MCE
> firmware, are making a good 'guess'?
> How much precision (1us? 1ms?) in the values do you need there?

it could also be in few ms depending on when next cpu event/activity 
might happen which is not transparent to MCE firmware.

>
> IIRC (probably Rafael's presentations) predicting in the kernel
> something like CPU idle time residency is not a trivial thing.
>
> Another idea (depending on DT structure and PSCI bits):
> Could this be solved differently, but just having a knowledge that if
> the governor requested some C-state, this means governor 'predicted'
> an idle residency to be greater that min_residency attached to this
> C-state?
> Then, when that request shows up in your FW, you know that it must be at
> least min_residency because of this C-state id.
C6 is the only deepest state for Tegra194 Carmel CPU that we support in 
addition to C1 (WFI) idle state.

MCE firmware gets state crossover thresholds for C1 to C6 transition 
from TF-A and uses it along with state idle time to decide on C6 state 
entry based on its background work.

Assuming for now if we use min_residency as state idle time which is 
static value from DT, then it enters into deepest state C6 always as we 
use min_residency value we use is always higher than state crossover 
threshold.

But MCE firmware is not aware of when next cpu event can happen to 
predict if next event can take longer than state min_residency time.

Using min residency in such case is very conservative where MCE firmware 
exits C6 state early where we may not have better power saving.

But with MCE firmware being aware of when next event can happen it can 
use that to stay in C6 state without early exit for better power savings.

> It would depend on number of available states, max_residency, scale
> that you would choose while assigning values from [0, max_residency]
> to each state.
> IIRC there can be many state IDs for idle, so it would depend on
> number of bits encoding this state, and your needs. Example of
> linear scale:
> 4-bits encoding idle state and max predicted residency 10msec,
> that means 10000us / 16 states = 625us/state.
> The max_residency might be split differently, using different than
> linear function, to have some rage more precised.
>
> Open question is if these idle states must be all represented
> in DT, or there is a way of describing a 'set of idle states'
> automatically.
We only support C6 state through DT as C6 is the only deepest state for 
Tegra194 carmel CPU. WFI idle state is completely handled by kernel and 
does not require MCE sequences for entry/exit.
>
> Regards,
> Lukasz

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A
@ 2021-04-23 22:24     ` Sowjanya Komatineni
  0 siblings, 0 replies; 30+ messages in thread
From: Sowjanya Komatineni @ 2021-04-23 22:24 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: sudeep.holla, souvik.chakravarty, thierry.reding, mark.rutland,
	lorenzo.pieralisi, daniel.lezcano, robh+dt, jonathanh,
	ksitaraman, sanjayc, linux-arm-kernel, linux-tegra, linux-kernel,
	linux-pm, devicetree


On 4/23/21 1:16 PM, Lukasz Luba wrote:
> Hi Sowjanya,
>
> On 4/22/21 9:30 PM, Sowjanya Komatineni wrote:
>> Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs 
>> which is
>> in charge of deciding on state transition based on target state, 
>> state idle
>> time, and some other Tegra CPU core cluster states information.
>>
>> Current PSCI specification don't have function defined for passing 
>> runtime
>> state idle time predicted by governor (based on next events and state 
>> target
>> residency) to ARM trusted firmware.
>
> Do you have some numbers from experiments showing that these idle
> governor prediction values, which are passed from kernel to MCE
> firmware, are making a good 'guess'?
> How much precision (1us? 1ms?) in the values do you need there?

it could also be in few ms depending on when next cpu event/activity 
might happen which is not transparent to MCE firmware.

>
> IIRC (probably Rafael's presentations) predicting in the kernel
> something like CPU idle time residency is not a trivial thing.
>
> Another idea (depending on DT structure and PSCI bits):
> Could this be solved differently, but just having a knowledge that if
> the governor requested some C-state, this means governor 'predicted'
> an idle residency to be greater that min_residency attached to this
> C-state?
> Then, when that request shows up in your FW, you know that it must be at
> least min_residency because of this C-state id.
C6 is the only deepest state for Tegra194 Carmel CPU that we support in 
addition to C1 (WFI) idle state.

MCE firmware gets state crossover thresholds for C1 to C6 transition 
from TF-A and uses it along with state idle time to decide on C6 state 
entry based on its background work.

Assuming for now if we use min_residency as state idle time which is 
static value from DT, then it enters into deepest state C6 always as we 
use min_residency value we use is always higher than state crossover 
threshold.

But MCE firmware is not aware of when next cpu event can happen to 
predict if next event can take longer than state min_residency time.

Using min residency in such case is very conservative where MCE firmware 
exits C6 state early where we may not have better power saving.

But with MCE firmware being aware of when next event can happen it can 
use that to stay in C6 state without early exit for better power savings.

> It would depend on number of available states, max_residency, scale
> that you would choose while assigning values from [0, max_residency]
> to each state.
> IIRC there can be many state IDs for idle, so it would depend on
> number of bits encoding this state, and your needs. Example of
> linear scale:
> 4-bits encoding idle state and max predicted residency 10msec,
> that means 10000us / 16 states = 625us/state.
> The max_residency might be split differently, using different than
> linear function, to have some rage more precised.
>
> Open question is if these idle states must be all represented
> in DT, or there is a way of describing a 'set of idle states'
> automatically.
We only support C6 state through DT as C6 is the only deepest state for 
Tegra194 carmel CPU. WFI idle state is completely handled by kernel and 
does not require MCE sequences for entry/exit.
>
> Regards,
> Lukasz

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A
  2021-04-23 22:24     ` Sowjanya Komatineni
@ 2021-04-26 10:10       ` Souvik Chakravarty
  -1 siblings, 0 replies; 30+ messages in thread
From: Souvik Chakravarty @ 2021-04-26 10:10 UTC (permalink / raw)
  To: Sowjanya Komatineni, Lukasz Luba
  Cc: Sudeep Holla, thierry.reding, Mark Rutland, Lorenzo Pieralisi,
	daniel.lezcano, robh+dt, jonathanh, ksitaraman, sanjayc,
	linux-arm-kernel, linux-tegra, linux-kernel, linux-pm,
	devicetree

Hi Sowjanya,

> From: Sowjanya Komatineni
> Sent: Friday, April 23, 2021 11:25 PM
> 
> On 4/23/21 1:16 PM, Lukasz Luba wrote:
> > Hi Sowjanya,
> >
> > On 4/22/21 9:30 PM, Sowjanya Komatineni wrote:
> >> Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs
> >> which is in charge of deciding on state transition based on target
> >> state, state idle time, and some other Tegra CPU core cluster states
> >> information.
> >>
> >> Current PSCI specification don't have function defined for passing
> >> runtime state idle time predicted by governor (based on next events
> >> and state target
> >> residency) to ARM trusted firmware.
> >
> > Do you have some numbers from experiments showing that these idle
> > governor prediction values, which are passed from kernel to MCE
> > firmware, are making a good 'guess'?
> > How much precision (1us? 1ms?) in the values do you need there?
> 
> it could also be in few ms depending on when next cpu event/activity might
> happen which is not transparent to MCE firmware.
> 
> >
> > IIRC (probably Rafael's presentations) predicting in the kernel
> > something like CPU idle time residency is not a trivial thing.
> >
> > Another idea (depending on DT structure and PSCI bits):
> > Could this be solved differently, but just having a knowledge that if
> > the governor requested some C-state, this means governor 'predicted'
> > an idle residency to be greater that min_residency attached to this
> > C-state?
> > Then, when that request shows up in your FW, you know that it must be
> > at least min_residency because of this C-state id.
> C6 is the only deepest state for Tegra194 Carmel CPU that we support in
> addition to C1 (WFI) idle state.
> 
> MCE firmware gets state crossover thresholds for C1 to C6 transition from TF-
> A and uses it along with state idle time to decide on C6 state entry based on
> its background work.
> 
> Assuming for now if we use min_residency as state idle time which is static
> value from DT, then it enters into deepest state C6 always as we use
> min_residency value we use is always higher than state crossover threshold.
> 
> But MCE firmware is not aware of when next cpu event can happen to
> predict if next event can take longer than state min_residency time.
> 
> Using min residency in such case is very conservative where MCE firmware
> exits C6 state early where we may not have better power saving.
> 
> But with MCE firmware being aware of when next event can happen it can
> use that to stay in C6 state without early exit for better power savings.

This part confuses me. Are you saying that the firmware will forcefully wake up
the core, even if no wakeups are pending, when min residency for C6 expires?

Regards,
Souvik

> 
> > It would depend on number of available states, max_residency, scale
> > that you would choose while assigning values from [0, max_residency]
> > to each state.
> > IIRC there can be many state IDs for idle, so it would depend on
> > number of bits encoding this state, and your needs. Example of linear
> > scale:
> > 4-bits encoding idle state and max predicted residency 10msec, that
> > means 10000us / 16 states = 625us/state.
> > The max_residency might be split differently, using different than
> > linear function, to have some rage more precised.
> >
> > Open question is if these idle states must be all represented in DT,
> > or there is a way of describing a 'set of idle states'
> > automatically.
> We only support C6 state through DT as C6 is the only deepest state for
> Tegra194 carmel CPU. WFI idle state is completely handled by kernel and
> does not require MCE sequences for entry/exit.
> >
> > Regards,
> > Lukasz

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A
@ 2021-04-26 10:10       ` Souvik Chakravarty
  0 siblings, 0 replies; 30+ messages in thread
From: Souvik Chakravarty @ 2021-04-26 10:10 UTC (permalink / raw)
  To: Sowjanya Komatineni, Lukasz Luba
  Cc: Sudeep Holla, thierry.reding, Mark Rutland, Lorenzo Pieralisi,
	daniel.lezcano, robh+dt, jonathanh, ksitaraman, sanjayc,
	linux-arm-kernel, linux-tegra, linux-kernel, linux-pm,
	devicetree

Hi Sowjanya,

> From: Sowjanya Komatineni
> Sent: Friday, April 23, 2021 11:25 PM
> 
> On 4/23/21 1:16 PM, Lukasz Luba wrote:
> > Hi Sowjanya,
> >
> > On 4/22/21 9:30 PM, Sowjanya Komatineni wrote:
> >> Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs
> >> which is in charge of deciding on state transition based on target
> >> state, state idle time, and some other Tegra CPU core cluster states
> >> information.
> >>
> >> Current PSCI specification don't have function defined for passing
> >> runtime state idle time predicted by governor (based on next events
> >> and state target
> >> residency) to ARM trusted firmware.
> >
> > Do you have some numbers from experiments showing that these idle
> > governor prediction values, which are passed from kernel to MCE
> > firmware, are making a good 'guess'?
> > How much precision (1us? 1ms?) in the values do you need there?
> 
> it could also be in few ms depending on when next cpu event/activity might
> happen which is not transparent to MCE firmware.
> 
> >
> > IIRC (probably Rafael's presentations) predicting in the kernel
> > something like CPU idle time residency is not a trivial thing.
> >
> > Another idea (depending on DT structure and PSCI bits):
> > Could this be solved differently, but just having a knowledge that if
> > the governor requested some C-state, this means governor 'predicted'
> > an idle residency to be greater that min_residency attached to this
> > C-state?
> > Then, when that request shows up in your FW, you know that it must be
> > at least min_residency because of this C-state id.
> C6 is the only deepest state for Tegra194 Carmel CPU that we support in
> addition to C1 (WFI) idle state.
> 
> MCE firmware gets state crossover thresholds for C1 to C6 transition from TF-
> A and uses it along with state idle time to decide on C6 state entry based on
> its background work.
> 
> Assuming for now if we use min_residency as state idle time which is static
> value from DT, then it enters into deepest state C6 always as we use
> min_residency value we use is always higher than state crossover threshold.
> 
> But MCE firmware is not aware of when next cpu event can happen to
> predict if next event can take longer than state min_residency time.
> 
> Using min residency in such case is very conservative where MCE firmware
> exits C6 state early where we may not have better power saving.
> 
> But with MCE firmware being aware of when next event can happen it can
> use that to stay in C6 state without early exit for better power savings.

This part confuses me. Are you saying that the firmware will forcefully wake up
the core, even if no wakeups are pending, when min residency for C6 expires?

Regards,
Souvik

> 
> > It would depend on number of available states, max_residency, scale
> > that you would choose while assigning values from [0, max_residency]
> > to each state.
> > IIRC there can be many state IDs for idle, so it would depend on
> > number of bits encoding this state, and your needs. Example of linear
> > scale:
> > 4-bits encoding idle state and max predicted residency 10msec, that
> > means 10000us / 16 states = 625us/state.
> > The max_residency might be split differently, using different than
> > linear function, to have some rage more precised.
> >
> > Open question is if these idle states must be all represented in DT,
> > or there is a way of describing a 'set of idle states'
> > automatically.
> We only support C6 state through DT as C6 is the only deepest state for
> Tegra194 carmel CPU. WFI idle state is completely handled by kernel and
> does not require MCE sequences for entry/exit.
> >
> > Regards,
> > Lukasz
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A
  2021-04-23 22:24     ` Sowjanya Komatineni
@ 2021-04-26 13:11       ` Morten Rasmussen
  -1 siblings, 0 replies; 30+ messages in thread
From: Morten Rasmussen @ 2021-04-26 13:11 UTC (permalink / raw)
  To: Sowjanya Komatineni
  Cc: Lukasz Luba, sudeep.holla, souvik.chakravarty, thierry.reding,
	mark.rutland, lorenzo.pieralisi, daniel.lezcano, robh+dt,
	jonathanh, ksitaraman, sanjayc, linux-arm-kernel, linux-tegra,
	linux-kernel, linux-pm, devicetree

Hi,

On Fri, Apr 23, 2021 at 03:24:51PM -0700, Sowjanya Komatineni wrote:
> On 4/23/21 1:16 PM, Lukasz Luba wrote:
> > Hi Sowjanya,
> > 
> > On 4/22/21 9:30 PM, Sowjanya Komatineni wrote:
> > > Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs
> > > which is
> > > in charge of deciding on state transition based on target state,
> > > state idle
> > > time, and some other Tegra CPU core cluster states information.
> > > 
> > > Current PSCI specification don't have function defined for passing
> > > runtime
> > > state idle time predicted by governor (based on next events and
> > > state target
> > > residency) to ARM trusted firmware.
> > 
> > Do you have some numbers from experiments showing that these idle
> > governor prediction values, which are passed from kernel to MCE
> > firmware, are making a good 'guess'?
> > How much precision (1us? 1ms?) in the values do you need there?
> 
> it could also be in few ms depending on when next cpu event/activity might
> happen which is not transparent to MCE firmware.
> 
> > 
> > IIRC (probably Rafael's presentations) predicting in the kernel
> > something like CPU idle time residency is not a trivial thing.
> > 
> > Another idea (depending on DT structure and PSCI bits):
> > Could this be solved differently, but just having a knowledge that if
> > the governor requested some C-state, this means governor 'predicted'
> > an idle residency to be greater that min_residency attached to this
> > C-state?
> > Then, when that request shows up in your FW, you know that it must be at
> > least min_residency because of this C-state id.
> C6 is the only deepest state for Tegra194 Carmel CPU that we support in
> addition to C1 (WFI) idle state.
> 
> MCE firmware gets state crossover thresholds for C1 to C6 transition from
> TF-A and uses it along with state idle time to decide on C6 state entry
> based on its background work.
> 
> Assuming for now if we use min_residency as state idle time which is static
> value from DT, then it enters into deepest state C6 always as we use
> min_residency value we use is always higher than state crossover threshold.
> 
> But MCE firmware is not aware of when next cpu event can happen to predict
> if next event can take longer than state min_residency time.
> 
> Using min residency in such case is very conservative where MCE firmware
> exits C6 state early where we may not have better power saving.
> 
> But with MCE firmware being aware of when next event can happen it can use
> that to stay in C6 state without early exit for better power savings.
> 
> > It would depend on number of available states, max_residency, scale
> > that you would choose while assigning values from [0, max_residency]
> > to each state.
> > IIRC there can be many state IDs for idle, so it would depend on
> > number of bits encoding this state, and your needs. Example of
> > linear scale:
> > 4-bits encoding idle state and max predicted residency 10msec,
> > that means 10000us / 16 states = 625us/state.
> > The max_residency might be split differently, using different than
> > linear function, to have some rage more precised.
> > 
> > Open question is if these idle states must be all represented
> > in DT, or there is a way of describing a 'set of idle states'
> > automatically.
> We only support C6 state through DT as C6 is the only deepest state for
> Tegra194 carmel CPU. WFI idle state is completely handled by kernel and does
> not require MCE sequences for entry/exit.

I think Lukasz's point is that you can encode the predicted idle time by
having multiple idle_state entries with different min_residency mapping
to the same actual idle-state. So you would several variants of C6 with
different min_residencies and if the OS picks one with longer
min_residency firmware would have a better estimate of the predicted
idle residency.

I'm not convinced it is the right way to work around passing this
information on to firmware. I would rather see an example of how well
this works (best with numbers) and have a proper solution.

Morten

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A
@ 2021-04-26 13:11       ` Morten Rasmussen
  0 siblings, 0 replies; 30+ messages in thread
From: Morten Rasmussen @ 2021-04-26 13:11 UTC (permalink / raw)
  To: Sowjanya Komatineni
  Cc: Lukasz Luba, sudeep.holla, souvik.chakravarty, thierry.reding,
	mark.rutland, lorenzo.pieralisi, daniel.lezcano, robh+dt,
	jonathanh, ksitaraman, sanjayc, linux-arm-kernel, linux-tegra,
	linux-kernel, linux-pm, devicetree

Hi,

On Fri, Apr 23, 2021 at 03:24:51PM -0700, Sowjanya Komatineni wrote:
> On 4/23/21 1:16 PM, Lukasz Luba wrote:
> > Hi Sowjanya,
> > 
> > On 4/22/21 9:30 PM, Sowjanya Komatineni wrote:
> > > Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs
> > > which is
> > > in charge of deciding on state transition based on target state,
> > > state idle
> > > time, and some other Tegra CPU core cluster states information.
> > > 
> > > Current PSCI specification don't have function defined for passing
> > > runtime
> > > state idle time predicted by governor (based on next events and
> > > state target
> > > residency) to ARM trusted firmware.
> > 
> > Do you have some numbers from experiments showing that these idle
> > governor prediction values, which are passed from kernel to MCE
> > firmware, are making a good 'guess'?
> > How much precision (1us? 1ms?) in the values do you need there?
> 
> it could also be in few ms depending on when next cpu event/activity might
> happen which is not transparent to MCE firmware.
> 
> > 
> > IIRC (probably Rafael's presentations) predicting in the kernel
> > something like CPU idle time residency is not a trivial thing.
> > 
> > Another idea (depending on DT structure and PSCI bits):
> > Could this be solved differently, but just having a knowledge that if
> > the governor requested some C-state, this means governor 'predicted'
> > an idle residency to be greater that min_residency attached to this
> > C-state?
> > Then, when that request shows up in your FW, you know that it must be at
> > least min_residency because of this C-state id.
> C6 is the only deepest state for Tegra194 Carmel CPU that we support in
> addition to C1 (WFI) idle state.
> 
> MCE firmware gets state crossover thresholds for C1 to C6 transition from
> TF-A and uses it along with state idle time to decide on C6 state entry
> based on its background work.
> 
> Assuming for now if we use min_residency as state idle time which is static
> value from DT, then it enters into deepest state C6 always as we use
> min_residency value we use is always higher than state crossover threshold.
> 
> But MCE firmware is not aware of when next cpu event can happen to predict
> if next event can take longer than state min_residency time.
> 
> Using min residency in such case is very conservative where MCE firmware
> exits C6 state early where we may not have better power saving.
> 
> But with MCE firmware being aware of when next event can happen it can use
> that to stay in C6 state without early exit for better power savings.
> 
> > It would depend on number of available states, max_residency, scale
> > that you would choose while assigning values from [0, max_residency]
> > to each state.
> > IIRC there can be many state IDs for idle, so it would depend on
> > number of bits encoding this state, and your needs. Example of
> > linear scale:
> > 4-bits encoding idle state and max predicted residency 10msec,
> > that means 10000us / 16 states = 625us/state.
> > The max_residency might be split differently, using different than
> > linear function, to have some rage more precised.
> > 
> > Open question is if these idle states must be all represented
> > in DT, or there is a way of describing a 'set of idle states'
> > automatically.
> We only support C6 state through DT as C6 is the only deepest state for
> Tegra194 carmel CPU. WFI idle state is completely handled by kernel and does
> not require MCE sequences for entry/exit.

I think Lukasz's point is that you can encode the predicted idle time by
having multiple idle_state entries with different min_residency mapping
to the same actual idle-state. So you would several variants of C6 with
different min_residencies and if the OS picks one with longer
min_residency firmware would have a better estimate of the predicted
idle residency.

I'm not convinced it is the right way to work around passing this
information on to firmware. I would rather see an example of how well
this works (best with numbers) and have a proper solution.

Morten

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2021-04-26 13:14 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-22 20:30 [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A Sowjanya Komatineni
2021-04-22 20:30 ` Sowjanya Komatineni
2021-04-22 20:30 ` [RFC PATCH 1/4] firmware/psci: add support for PSCI function SET_STATE_IDLE_TIME Sowjanya Komatineni
2021-04-22 20:30   ` Sowjanya Komatineni
2021-04-22 20:30 ` [RFC PATCH 2/4] cpuidle: menu: add idle_time to cpuidle_state Sowjanya Komatineni
2021-04-22 20:30   ` Sowjanya Komatineni
2021-04-23  0:46   ` kernel test robot
2021-04-23  1:28   ` kernel test robot
2021-04-23 12:22   ` Rafael J. Wysocki
2021-04-23 12:22     ` Rafael J. Wysocki
2021-04-23 18:33     ` Sowjanya Komatineni
2021-04-23 18:33       ` Sowjanya Komatineni
2021-04-22 20:30 ` [RFC PATCH 3/4] cpuidle: psci: pass state idle time before state enter callback Sowjanya Komatineni
2021-04-22 20:30   ` Sowjanya Komatineni
2021-04-22 20:30 ` [RFC PATCH 4/4] arm64: dts: tegra194: Add CPU idle states Sowjanya Komatineni
2021-04-22 20:30   ` Sowjanya Komatineni
2021-04-23  1:03 ` [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A Sowjanya Komatineni
2021-04-23  1:03   ` Sowjanya Komatineni
2021-04-23 12:27 ` Rafael J. Wysocki
2021-04-23 12:27   ` Rafael J. Wysocki
2021-04-23 18:32   ` Sowjanya Komatineni
2021-04-23 18:32     ` Sowjanya Komatineni
2021-04-23 20:16 ` Lukasz Luba
2021-04-23 20:16   ` Lukasz Luba
2021-04-23 22:24   ` Sowjanya Komatineni
2021-04-23 22:24     ` Sowjanya Komatineni
2021-04-26 10:10     ` Souvik Chakravarty
2021-04-26 10:10       ` Souvik Chakravarty
2021-04-26 13:11     ` Morten Rasmussen
2021-04-26 13:11       ` Morten Rasmussen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.