linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v11 0/8] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM)
@ 2019-02-26 14:54 Ulf Hansson
  2019-02-26 14:54 ` [PATCH v11 1/8] PM / Domains: Add a generic data pointer to the genpd_power_state struct Ulf Hansson
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Ulf Hansson @ 2019-02-26 14:54 UTC (permalink / raw)
  To: Rafael J . Wysocki, linux-pm
  Cc: Frederic Weisbecker, Thomas Gleixner, Sudeep Holla,
	Lorenzo Pieralisi, Mark Rutland, Daniel Lezcano,
	Raju P . L . S . S . S . N, Stephen Boyd, Tony Lindgren,
	Kevin Hilman, Lina Iyer, Ulf Hansson, Viresh Kumar,
	Vincent Guittot, Geert Uytterhoeven, linux-arm-kernel,
	linux-arm-msm, linux-kernel

Changes in v11:
 - This version contains only the infrastructure changes that is needed for
deployment. The PSCI/ARM changes have also been updated and tested, but I will
post them separately. Still, to provide completeness, I have published a branch
containing everything to a git tree [1], feel free to have a look and test.
 - The v10 series contained a patch, "timer: Export next wakeup time of a CPU",
which has been replaced by a couple of new patches, whom reworks the existing
tick_nohz_get_sleep_length() function, to provide the next timer expiration
instead of the duration.
 - More changelogs are available per patch.

Changes in v10:
 - Quite significant changes have been to the PSCI driver deployment. According
   to an agreement with Lorenzo, the hierarchical CPU layout for PSCI should be
   orthogonal to whether the PSCI FW supports OSI or not. This has been taken
   care of in this version.
 - Drop the generic attach/detach helpers of CPUs to genpd, instead make that
   related code internal to PSCI, for now.
 - Fix "BUG: sleeping for invalid context" for hotplug, as reported by Raju.
 - Addressed various comments from version 8 and 9.
 - Clarified changelogs and re-wrote the cover-letter to better explain the
   motivations behind these changes.

Background:

For ARM64/ARM based platforms CPUs are often arranged in a hierarchical manner.
From a CPU idle state perspective, this means some states may be shared among a
group of CPUs (aka CPU cluster).

To deal with idle management of a group of CPUs, sometimes the kernel needs to
be involved to manage the last-man standing algorithm, simply because it can't
rely solely on power management FWs to deal with this. Depending on the
platform, of course.

There are a couple of typical scenarios for when the kernel needs to be in
control, dealing with synchronization of when the last CPU in a cluster is about
to enter a deep idle state.

1)
The kernel needs to carry out so called last-man activities before the
CPU cluster can enter a deep idle state. This may for example involve to
configure external logics for wakeups, as the GIC may no longer be functional
once a deep cluster idle state have been entered. Likewise, these operations
may need to be restored, when the first CPU wakes up.

2)
Other more generic I/O devices, such as an MMC controller for example, may be a
part of the same power domain as the CPU cluster, due to a shared power-rail.
For these scenarios, when the MMC controller is in use dealing with an MMC
request, a deeper idle state of the CPU cluster may needs to be temporarily
disabled. This is needed to retain the MMC controller in a functional state,
else it may loose its register-context in the middle of serving a request.

In this series, we are extending the generic PM domain (aka genpd) to be used
for also CPU devices. Hence the goal is to re-use much of its current code to
help us manage the last-man standing synchronization. Moreover, as we already
use genpd to model power domains for generic I/O devices, both 1) and 2) can be
address with its help.

Moreover, to address these problems for ARM64 DT based platforms, we are
deploying support for genpd and runtime PM to the PSCI FW driver - and finally
we make some updates to two ARM64 DTBs, as to deploy the new PSCI CPU topology
layout.

The series has been tested on a Qcom 410c dragonboard and on a Hisilicon Hikey
board. The first one uses PSCI OS-initiated mode, while the second uses the PSCI
Platform-Coordinated mode.

Kind regards
Ulf Hansson

[1]
git.linaro.org/people/ulf.hansson/linux-pm.git next_v11

Daniel Lezcano (4):
  time: tick-sched: Provide helpers to get the next timer expiration
  cpuidle: menu: Convert to tick_nohz_get_next_timer|hrtimer()
  cpuidle: teo: Convert to tick_nohz_get_next_timer()
  time: tick-sched: Remove tick_nohz_get_sleep_length()

Ulf Hansson (4):
  PM / Domains: Add a generic data pointer to the genpd_power_state
    struct
  PM / Domains: Add support for CPU devices to genpd
  cpuidle: Pre-store next timer/tick before selecting an idle state
  PM / Domains: Add genpd governor for CPUs

 drivers/base/power/domain.c          | 78 ++++++++++++++++++++++++++--
 drivers/base/power/domain_governor.c | 62 +++++++++++++++++++++-
 drivers/cpuidle/cpuidle.c            |  2 +
 drivers/cpuidle/governors/menu.c     | 14 ++++-
 drivers/cpuidle/governors/teo.c      |  4 +-
 include/linux/cpuidle.h              |  2 +
 include/linux/pm_domain.h            | 20 ++++++-
 include/linux/tick.h                 | 11 ++--
 kernel/time/tick-sched.c             | 71 +++++++++++++++++--------
 9 files changed, 229 insertions(+), 35 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v11 1/8] PM / Domains: Add a generic data pointer to the genpd_power_state struct
  2019-02-26 14:54 [PATCH v11 0/8] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM) Ulf Hansson
@ 2019-02-26 14:54 ` Ulf Hansson
  2019-02-26 14:54 ` [PATCH v11 2/8] PM / Domains: Add support for CPU devices to genpd Ulf Hansson
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Ulf Hansson @ 2019-02-26 14:54 UTC (permalink / raw)
  To: Rafael J . Wysocki, linux-pm
  Cc: Frederic Weisbecker, Thomas Gleixner, Sudeep Holla,
	Lorenzo Pieralisi, Mark Rutland, Daniel Lezcano,
	Raju P . L . S . S . S . N, Stephen Boyd, Tony Lindgren,
	Kevin Hilman, Lina Iyer, Ulf Hansson, Viresh Kumar,
	Vincent Guittot, Geert Uytterhoeven, linux-arm-kernel,
	linux-arm-msm, linux-kernel

Let's add a data pointer to the genpd_power_state struct, to allow a genpd
backend driver to store per state specific data. To introduce the pointer,
we need to change the way genpd deals with freeing of the corresponding
allocated data.

More precisely, let's clarify the responsibility of whom that shall free
the data, by adding a ->free_states() callback to the struct
generic_pm_domain. The one allocating the data shall assign the callback,
to allow genpd to invoke it from genpd_remove().

Cc: Lina Iyer <ilina@codeaurora.org>
Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---

Changes in v11:
	- Add a callback for freeing allocated power states.

---
 drivers/base/power/domain.c | 12 ++++++++++--
 include/linux/pm_domain.h   |  4 +++-
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 2c334c01fc43..03885c003c6a 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -1685,6 +1685,12 @@ int pm_genpd_remove_subdomain(struct generic_pm_domain *genpd,
 }
 EXPORT_SYMBOL_GPL(pm_genpd_remove_subdomain);
 
+static void genpd_free_default_power_state(struct genpd_power_state *states,
+					   unsigned int state_count)
+{
+	kfree(states);
+}
+
 static int genpd_set_default_power_state(struct generic_pm_domain *genpd)
 {
 	struct genpd_power_state *state;
@@ -1695,7 +1701,7 @@ static int genpd_set_default_power_state(struct generic_pm_domain *genpd)
 
 	genpd->states = state;
 	genpd->state_count = 1;
-	genpd->free = state;
+	genpd->free_states = genpd_free_default_power_state;
 
 	return 0;
 }
@@ -1811,7 +1817,9 @@ static int genpd_remove(struct generic_pm_domain *genpd)
 	list_del(&genpd->gpd_list_node);
 	genpd_unlock(genpd);
 	cancel_work_sync(&genpd->power_off_work);
-	kfree(genpd->free);
+	if (genpd->free_states)
+		genpd->free_states(genpd->states, genpd->state_count);
+
 	pr_debug("%s: removed %s\n", __func__, genpd->name);
 
 	return 0;
diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
index 1ed5874bcee0..8e1399231753 100644
--- a/include/linux/pm_domain.h
+++ b/include/linux/pm_domain.h
@@ -69,6 +69,7 @@ struct genpd_power_state {
 	s64 residency_ns;
 	struct fwnode_handle *fwnode;
 	ktime_t idle_time;
+	void *data;
 };
 
 struct genpd_lock_ops;
@@ -110,9 +111,10 @@ struct generic_pm_domain {
 			   struct device *dev);
 	unsigned int flags;		/* Bit field of configs for genpd */
 	struct genpd_power_state *states;
+	void (*free_states)(struct genpd_power_state *states,
+			    unsigned int state_count);
 	unsigned int state_count; /* number of states */
 	unsigned int state_idx; /* state that genpd will go to when off */
-	void *free; /* Free the state that was allocated for default */
 	ktime_t on_time;
 	ktime_t accounting_time;
 	const struct genpd_lock_ops *lock_ops;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v11 2/8] PM / Domains: Add support for CPU devices to genpd
  2019-02-26 14:54 [PATCH v11 0/8] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM) Ulf Hansson
  2019-02-26 14:54 ` [PATCH v11 1/8] PM / Domains: Add a generic data pointer to the genpd_power_state struct Ulf Hansson
@ 2019-02-26 14:54 ` Ulf Hansson
  2019-02-26 14:54 ` [PATCH v11 3/8] time: tick-sched: Provide helpers to get the next timer expiration Ulf Hansson
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Ulf Hansson @ 2019-02-26 14:54 UTC (permalink / raw)
  To: Rafael J . Wysocki, linux-pm
  Cc: Frederic Weisbecker, Thomas Gleixner, Sudeep Holla,
	Lorenzo Pieralisi, Mark Rutland, Daniel Lezcano,
	Raju P . L . S . S . S . N, Stephen Boyd, Tony Lindgren,
	Kevin Hilman, Lina Iyer, Ulf Hansson, Viresh Kumar,
	Vincent Guittot, Geert Uytterhoeven, linux-arm-kernel,
	linux-arm-msm, linux-kernel

To enable a device belonging to a CPU to be attached to a PM domain managed
by genpd, let's do a few changes to it, as to make it convenient to manage
the specifics around CPUs.

To be able to quickly find out what CPUs that are attached to a genpd,
which typically becomes useful from a genpd governor as following changes
is about to show, let's add a cpumask to the struct generic_pm_domain. At
the point when a CPU device gets attached to a genpd, let's update the
genpd's cpumask. Moreover, let's also propagate changes to the cpumask
upwards in the topology to the master PM domains. In this way, the cpumask
for a genpd hierarchically reflects all CPUs attached to the topology below
it.

Finally, let's make this an opt-in feature, to avoid having to manage CPUs
and the cpumask for a genpd that doesn't need it. For that reason, let's
add a new genpd configuration bit, GENPD_FLAG_CPU_DOMAIN.

Cc: Lina Iyer <ilina@codeaurora.org>
Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---

Changes in v11:
	- None.

---
 drivers/base/power/domain.c | 66 ++++++++++++++++++++++++++++++++++++-
 include/linux/pm_domain.h   | 13 ++++++++
 2 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 03885c003c6a..da10052e6427 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -20,6 +20,7 @@
 #include <linux/sched.h>
 #include <linux/suspend.h>
 #include <linux/export.h>
+#include <linux/cpu.h>
 
 #include "power.h"
 
@@ -126,6 +127,7 @@ static const struct genpd_lock_ops genpd_spin_ops = {
 #define genpd_is_irq_safe(genpd)	(genpd->flags & GENPD_FLAG_IRQ_SAFE)
 #define genpd_is_always_on(genpd)	(genpd->flags & GENPD_FLAG_ALWAYS_ON)
 #define genpd_is_active_wakeup(genpd)	(genpd->flags & GENPD_FLAG_ACTIVE_WAKEUP)
+#define genpd_is_cpu_domain(genpd)	(genpd->flags & GENPD_FLAG_CPU_DOMAIN)
 
 static inline bool irq_safe_dev_in_no_sleep_domain(struct device *dev,
 		const struct generic_pm_domain *genpd)
@@ -1452,6 +1454,56 @@ static void genpd_free_dev_data(struct device *dev,
 	dev_pm_put_subsys_data(dev);
 }
 
+static void __genpd_update_cpumask(struct generic_pm_domain *genpd,
+				   int cpu, bool set, unsigned int depth)
+{
+	struct gpd_link *link;
+
+	if (!genpd_is_cpu_domain(genpd))
+		return;
+
+	list_for_each_entry(link, &genpd->slave_links, slave_node) {
+		struct generic_pm_domain *master = link->master;
+
+		genpd_lock_nested(master, depth + 1);
+		__genpd_update_cpumask(master, cpu, set, depth + 1);
+		genpd_unlock(master);
+	}
+
+	if (set)
+		cpumask_set_cpu(cpu, genpd->cpus);
+	else
+		cpumask_clear_cpu(cpu, genpd->cpus);
+}
+
+static void genpd_update_cpumask(struct generic_pm_domain *genpd,
+				 struct device *dev, bool set)
+{
+	int cpu;
+
+	if (!genpd_is_cpu_domain(genpd))
+		return;
+
+	for_each_possible_cpu(cpu) {
+		if (get_cpu_device(cpu) == dev) {
+			__genpd_update_cpumask(genpd, cpu, set, 0);
+			return;
+		}
+	}
+}
+
+static void genpd_set_cpumask(struct generic_pm_domain *genpd,
+			      struct device *dev)
+{
+	genpd_update_cpumask(genpd, dev, true);
+}
+
+static void genpd_clear_cpumask(struct generic_pm_domain *genpd,
+				struct device *dev)
+{
+	genpd_update_cpumask(genpd, dev, false);
+}
+
 static int genpd_add_device(struct generic_pm_domain *genpd, struct device *dev,
 			    struct gpd_timing_data *td)
 {
@@ -1473,6 +1525,8 @@ static int genpd_add_device(struct generic_pm_domain *genpd, struct device *dev,
 	if (ret)
 		goto out;
 
+	genpd_set_cpumask(genpd, dev);
+
 	dev_pm_domain_set(dev, &genpd->domain);
 
 	genpd->device_count++;
@@ -1534,6 +1588,7 @@ static int genpd_remove_device(struct generic_pm_domain *genpd,
 	if (genpd->detach_dev)
 		genpd->detach_dev(genpd, dev);
 
+	genpd_clear_cpumask(genpd, dev);
 	dev_pm_domain_set(dev, NULL);
 
 	list_del_init(&pdd->list_node);
@@ -1767,11 +1822,18 @@ int pm_genpd_init(struct generic_pm_domain *genpd,
 	if (genpd_is_always_on(genpd) && !genpd_status_on(genpd))
 		return -EINVAL;
 
+	if (genpd_is_cpu_domain(genpd) &&
+	    !zalloc_cpumask_var(&genpd->cpus, GFP_KERNEL))
+		return -ENOMEM;
+
 	/* Use only one "off" state if there were no states declared */
 	if (genpd->state_count == 0) {
 		ret = genpd_set_default_power_state(genpd);
-		if (ret)
+		if (ret) {
+			if (genpd_is_cpu_domain(genpd))
+				free_cpumask_var(genpd->cpus);
 			return ret;
+		}
 	} else if (!gov) {
 		pr_warn("%s : no governor for states\n", genpd->name);
 	}
@@ -1817,6 +1879,8 @@ static int genpd_remove(struct generic_pm_domain *genpd)
 	list_del(&genpd->gpd_list_node);
 	genpd_unlock(genpd);
 	cancel_work_sync(&genpd->power_off_work);
+	if (genpd_is_cpu_domain(genpd))
+		free_cpumask_var(genpd->cpus);
 	if (genpd->free_states)
 		genpd->free_states(genpd->states, genpd->state_count);
 
diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
index 8e1399231753..a6e251fe9deb 100644
--- a/include/linux/pm_domain.h
+++ b/include/linux/pm_domain.h
@@ -16,6 +16,7 @@
 #include <linux/of.h>
 #include <linux/notifier.h>
 #include <linux/spinlock.h>
+#include <linux/cpumask.h>
 
 /*
  * Flags to control the behaviour of a genpd.
@@ -42,11 +43,22 @@
  * GENPD_FLAG_ACTIVE_WAKEUP:	Instructs genpd to keep the PM domain powered
  *				on, in case any of its attached devices is used
  *				in the wakeup path to serve system wakeups.
+ *
+ * GENPD_FLAG_CPU_DOMAIN:	Instructs genpd that it should expect to get
+ *				devices attached, which may belong to CPUs or
+ *				possibly have subdomains with CPUs attached.
+ *				This flag enables the genpd backend driver to
+ *				deploy idle power management support for CPUs
+ *				and groups of CPUs. Note that, the backend
+ *				driver must then comply with the so called,
+ *				last-man-standing algorithm, for the CPUs in the
+ *				PM domain.
  */
 #define GENPD_FLAG_PM_CLK	 (1U << 0)
 #define GENPD_FLAG_IRQ_SAFE	 (1U << 1)
 #define GENPD_FLAG_ALWAYS_ON	 (1U << 2)
 #define GENPD_FLAG_ACTIVE_WAKEUP (1U << 3)
+#define GENPD_FLAG_CPU_DOMAIN	 (1U << 4)
 
 enum gpd_status {
 	GPD_STATE_ACTIVE = 0,	/* PM domain is active */
@@ -94,6 +106,7 @@ struct generic_pm_domain {
 	unsigned int suspended_count;	/* System suspend device counter */
 	unsigned int prepared_count;	/* Suspend counter of prepared devices */
 	unsigned int performance_state;	/* Aggregated max performance state */
+	cpumask_var_t cpus;		/* A cpumask of the attached CPUs */
 	int (*power_off)(struct generic_pm_domain *domain);
 	int (*power_on)(struct generic_pm_domain *domain);
 	struct opp_table *opp_table;	/* OPP table of the genpd */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v11 3/8] time: tick-sched: Provide helpers to get the next timer expiration
  2019-02-26 14:54 [PATCH v11 0/8] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM) Ulf Hansson
  2019-02-26 14:54 ` [PATCH v11 1/8] PM / Domains: Add a generic data pointer to the genpd_power_state struct Ulf Hansson
  2019-02-26 14:54 ` [PATCH v11 2/8] PM / Domains: Add support for CPU devices to genpd Ulf Hansson
@ 2019-02-26 14:54 ` Ulf Hansson
  2019-02-26 14:54 ` [PATCH v11 4/8] cpuidle: menu: Convert to tick_nohz_get_next_timer|hrtimer() Ulf Hansson
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Ulf Hansson @ 2019-02-26 14:54 UTC (permalink / raw)
  To: Rafael J . Wysocki, linux-pm
  Cc: Frederic Weisbecker, Thomas Gleixner, Sudeep Holla,
	Lorenzo Pieralisi, Mark Rutland, Daniel Lezcano,
	Raju P . L . S . S . S . N, Stephen Boyd, Tony Lindgren,
	Kevin Hilman, Lina Iyer, Ulf Hansson, Viresh Kumar,
	Vincent Guittot, Geert Uytterhoeven, linux-arm-kernel,
	linux-arm-msm, linux-kernel

From: Daniel Lezcano <daniel.lezcano@linaro.org>

In the power control path with the interrupts disabled, cpuidle governors
calls tick_nohz_get_sleep_length() to get the duration until the next timer
expires.

This is not always convenient when going forward, as the duration is a
relative time to "now". To make it more flexible, let's instead provide two
new functions tick_nohz_get_next_timer|hrtimer(), which provides the actual
time for when the next timer is about to expire. The computation of the
duration can then be done by the caller.

For now, leave tick_nohz_get_sleep_length() as is. When users of it has
converted to the new APIs, then we can remove it.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
[Ulf: Clarified information in changelog]
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---

Changes in v11:
	- New patch, replace "timer: Export next wakeup time of a CPU" from v10.

---
 include/linux/tick.h     |  7 +++++
 kernel/time/tick-sched.c | 68 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 55388ab45fd4..5b10a0e4acbb 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -111,6 +111,8 @@ enum tick_dep_bits {
 #define TICK_DEP_MASK_SCHED		(1 << TICK_DEP_BIT_SCHED)
 #define TICK_DEP_MASK_CLOCK_UNSTABLE	(1 << TICK_DEP_BIT_CLOCK_UNSTABLE)
 
+extern ktime_t tick_nohz_get_next_hrtimer(void);
+
 #ifdef CONFIG_NO_HZ_COMMON
 extern bool tick_nohz_enabled;
 extern bool tick_nohz_tick_stopped(void);
@@ -123,6 +125,7 @@ extern void tick_nohz_idle_exit(void);
 extern void tick_nohz_irq_exit(void);
 extern bool tick_nohz_idle_got_tick(void);
 extern ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next);
+extern ktime_t tick_nohz_get_next_timer(void);
 extern unsigned long tick_nohz_get_idle_calls(void);
 extern unsigned long tick_nohz_get_idle_calls_cpu(int cpu);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
@@ -145,6 +148,10 @@ static inline void tick_nohz_idle_restart_tick(void) { }
 static inline void tick_nohz_idle_enter(void) { }
 static inline void tick_nohz_idle_exit(void) { }
 static inline bool tick_nohz_idle_got_tick(void) { return false; }
+static inline ktime_t tick_nohz_get_next_timer(void)
+{
+	return tick_nohz_get_next_hrtimer();
+}
 
 static inline ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next)
 {
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6fa52cd6df0b..9966be665074 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -443,6 +443,20 @@ void __init tick_nohz_init(void)
 }
 #endif
 
+/**
+ * tick_nohz_get_next_hrtimer - return the expected deadline for the
+ * next hrtimer
+ *
+ * Called from power state control code with interrupts disabled
+ *
+ * Returns a ktime_t based value giving the next deadline for the tick
+ * or an earlier hrtimer
+ */
+ktime_t tick_nohz_get_next_hrtimer(void)
+{
+	return __this_cpu_read(tick_cpu_device.evtdev)->next_event;
+}
+
 /*
  * NOHZ - aka dynamic tick functionality
  */
@@ -918,7 +932,7 @@ static void __tick_nohz_idle_stop_tick(struct tick_sched *ts)
 	int cpu = smp_processor_id();
 
 	/*
-	 * If tick_nohz_get_sleep_length() ran tick_nohz_next_event(), the
+	 * If tick_nohz_get_next_timer() ran tick_nohz_next_event(), the
 	 * tick timer expiration time is known already.
 	 */
 	if (ts->timer_expires_base)
@@ -1022,6 +1036,58 @@ bool tick_nohz_idle_got_tick(void)
 	return false;
 }
 
+/**
+ * tick_nohz_get_next_timer - return the earliest timer deadline
+ *
+ * Called from power state control code with interrupts disabled
+ *
+ * Returns a ktime_t based value giving the earliest timer
+ */
+ktime_t tick_nohz_get_next_timer(void)
+{
+
+	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
+	ktime_t next_tick, next_timer, next_hrtimer;
+	int cpu = smp_processor_id();
+
+	/*
+	 * Actually the function returns the next hrtimer which can
+	 * be the next tick or another hrtimer expiring before the tick.
+	 */
+	next_tick = tick_nohz_get_next_hrtimer();
+
+	/*
+	 * If we can not stop the tick, then in any case the next
+	 * timer is the earliest one expiring with the hrtimer.
+	 */
+	if (!can_stop_idle_tick(cpu, ts))
+		return next_tick;
+
+	/*
+	 * Get the next low resolution timer led by the hrtimer. If
+	 * there is no timer in this case, then we can rely on the
+	 * hrtimer next event only coming from the call above.
+	 */
+	next_timer = tick_nohz_next_event(ts, cpu);
+	if (!next_timer)
+		return next_tick;
+
+	/*
+	 * This function will return the next timer after the
+	 * sched_timer aka the tick. We know there is another hrtimer
+	 * because the tick_nohz_next_event() returned a non zero
+	 * deadline and we need a tick to make the lowres to expire.
+	 */
+	next_hrtimer = hrtimer_next_event_without(&ts->sched_timer);
+
+	/*
+	 * At this point, we have the next lowres timer and the next
+	 * hrtimer which is not the tick. We need to figure out which
+	 * one expires first.
+	 */
+	return min_t(u64, next_hrtimer, next_timer);
+}
+
 /**
  * tick_nohz_get_sleep_length - return the expected length of the current sleep
  * @delta_next: duration until the next event if the tick cannot be stopped
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v11 4/8] cpuidle: menu: Convert to tick_nohz_get_next_timer|hrtimer()
  2019-02-26 14:54 [PATCH v11 0/8] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM) Ulf Hansson
                   ` (2 preceding siblings ...)
  2019-02-26 14:54 ` [PATCH v11 3/8] time: tick-sched: Provide helpers to get the next timer expiration Ulf Hansson
@ 2019-02-26 14:54 ` Ulf Hansson
  2019-02-26 14:54 ` [PATCH v11 5/8] cpuidle: teo: Convert to tick_nohz_get_next_timer() Ulf Hansson
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Ulf Hansson @ 2019-02-26 14:54 UTC (permalink / raw)
  To: Rafael J . Wysocki, linux-pm
  Cc: Frederic Weisbecker, Thomas Gleixner, Sudeep Holla,
	Lorenzo Pieralisi, Mark Rutland, Daniel Lezcano,
	Raju P . L . S . S . S . N, Stephen Boyd, Tony Lindgren,
	Kevin Hilman, Lina Iyer, Ulf Hansson, Viresh Kumar,
	Vincent Guittot, Geert Uytterhoeven, linux-arm-kernel,
	linux-arm-msm, linux-kernel

From: Daniel Lezcano <daniel.lezcano@linaro.org>

Rather than using tick_nohz_get_sleep_length(), let's convert to use the
more flexible tick_nohz_get_next_timer|hrtimer() APIs. This should have no
functional change, but allows following changes to later drop the
tick_nohz_get_sleep_length() API.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
[Ulf: Clarified information in changelog]
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---

Changes in v11:
	- New patch.

---
 drivers/cpuidle/governors/menu.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 61316fc51548..95e9122d6047 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -286,14 +286,26 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	unsigned int predicted_us;
 	unsigned long nr_iowaiters;
 	ktime_t delta_next;
+	ktime_t now = ktime_get();
+	ktime_t next_hrtimer = tick_nohz_get_next_hrtimer();
+	ktime_t next_timer = tick_nohz_get_next_timer();
 
 	if (data->needs_update) {
 		menu_update(drv, dev);
 		data->needs_update = 0;
 	}
 
-	/* determine the expected residency time, round up */
-	data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next));
+	/*
+	 * Compute the duration before the next timer, whatever the origin
+	 */
+	delta_next = ktime_sub(next_timer, now);
+	data->next_timer_us = ktime_to_us(delta_next);
+
+	/*
+	 * Compute the duration before next hrtimer which is the tick
+	 * or an earliest hrtimer
+	 */
+	delta_next = ktime_sub(next_hrtimer, now);
 
 	nr_iowaiters = nr_iowait_cpu(dev->cpu);
 	data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v11 5/8] cpuidle: teo: Convert to tick_nohz_get_next_timer()
  2019-02-26 14:54 [PATCH v11 0/8] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM) Ulf Hansson
                   ` (3 preceding siblings ...)
  2019-02-26 14:54 ` [PATCH v11 4/8] cpuidle: menu: Convert to tick_nohz_get_next_timer|hrtimer() Ulf Hansson
@ 2019-02-26 14:54 ` Ulf Hansson
  2019-02-26 14:54 ` [PATCH v11 6/8] time: tick-sched: Remove tick_nohz_get_sleep_length() Ulf Hansson
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Ulf Hansson @ 2019-02-26 14:54 UTC (permalink / raw)
  To: Rafael J . Wysocki, linux-pm
  Cc: Frederic Weisbecker, Thomas Gleixner, Sudeep Holla,
	Lorenzo Pieralisi, Mark Rutland, Daniel Lezcano,
	Raju P . L . S . S . S . N, Stephen Boyd, Tony Lindgren,
	Kevin Hilman, Lina Iyer, Ulf Hansson, Viresh Kumar,
	Vincent Guittot, Geert Uytterhoeven, linux-arm-kernel,
	linux-arm-msm, linux-kernel

From: Daniel Lezcano <daniel.lezcano@linaro.org>

Rather than using tick_nohz_get_sleep_length(), let's convert to use the
more flexible tick_nohz_get_next_timer|hrtimer() APIs. This should have no
functional change, but allows following changes to later drop the
tick_nohz_get_sleep_length() API.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
[Ulf: Clarified information in changelog]
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---

Changes in v11:
	- New patch.

---
 drivers/cpuidle/governors/teo.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
index 7d05efdbd3c6..bef1e95c597e 100644
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -244,6 +244,9 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	unsigned int duration_us, count;
 	int max_early_idx, idx, i;
 	ktime_t delta_tick;
+	ktime_t now = ktime_get();
+	ktime_t next_hrtimer = tick_nohz_get_next_hrtimer();
+	ktime_t next_timer = tick_nohz_get_next_timer();
 
 	if (cpu_data->last_state >= 0) {
 		teo_update(drv, dev);
@@ -252,7 +255,8 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 
 	cpu_data->time_span_ns = local_clock();
 
-	cpu_data->sleep_length_ns = tick_nohz_get_sleep_length(&delta_tick);
+	cpu_data->sleep_length_ns = ktime_sub(next_timer, now);
+	delta_tick = ktime_sub(next_hrtimer, now);
 	duration_us = ktime_to_us(cpu_data->sleep_length_ns);
 
 	count = 0;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v11 6/8] time: tick-sched: Remove tick_nohz_get_sleep_length()
  2019-02-26 14:54 [PATCH v11 0/8] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM) Ulf Hansson
                   ` (4 preceding siblings ...)
  2019-02-26 14:54 ` [PATCH v11 5/8] cpuidle: teo: Convert to tick_nohz_get_next_timer() Ulf Hansson
@ 2019-02-26 14:54 ` Ulf Hansson
  2019-02-26 14:54 ` [PATCH v11 7/8] cpuidle: Pre-store next timer/tick before selecting an idle state Ulf Hansson
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Ulf Hansson @ 2019-02-26 14:54 UTC (permalink / raw)
  To: Rafael J . Wysocki, linux-pm
  Cc: Frederic Weisbecker, Thomas Gleixner, Sudeep Holla,
	Lorenzo Pieralisi, Mark Rutland, Daniel Lezcano,
	Raju P . L . S . S . S . N, Stephen Boyd, Tony Lindgren,
	Kevin Hilman, Lina Iyer, Ulf Hansson, Viresh Kumar,
	Vincent Guittot, Geert Uytterhoeven, linux-arm-kernel,
	linux-arm-msm, linux-kernel

From: Daniel Lezcano <daniel.lezcano@linaro.org>

There are no longer any users of tick_nohz_get_sleep_length(), so let's
drop it.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
[Ulf: Clarified information in changelog]
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---

Changes in v11:
	- New patch.

---
 include/linux/tick.h     |  6 ------
 kernel/time/tick-sched.c | 39 ---------------------------------------
 2 files changed, 45 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 5b10a0e4acbb..1a0158736072 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -124,7 +124,6 @@ extern void tick_nohz_idle_enter(void);
 extern void tick_nohz_idle_exit(void);
 extern void tick_nohz_irq_exit(void);
 extern bool tick_nohz_idle_got_tick(void);
-extern ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next);
 extern ktime_t tick_nohz_get_next_timer(void);
 extern unsigned long tick_nohz_get_idle_calls(void);
 extern unsigned long tick_nohz_get_idle_calls_cpu(int cpu);
@@ -153,11 +152,6 @@ static inline ktime_t tick_nohz_get_next_timer(void)
 	return tick_nohz_get_next_hrtimer();
 }
 
-static inline ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next)
-{
-	*delta_next = TICK_NSEC;
-	return *delta_next;
-}
 static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; }
 static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
 
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 9966be665074..a376c7db26ce 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1088,45 +1088,6 @@ ktime_t tick_nohz_get_next_timer(void)
 	return min_t(u64, next_hrtimer, next_timer);
 }
 
-/**
- * tick_nohz_get_sleep_length - return the expected length of the current sleep
- * @delta_next: duration until the next event if the tick cannot be stopped
- *
- * Called from power state control code with interrupts disabled
- */
-ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next)
-{
-	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
-	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
-	int cpu = smp_processor_id();
-	/*
-	 * The idle entry time is expected to be a sufficient approximation of
-	 * the current time at this point.
-	 */
-	ktime_t now = ts->idle_entrytime;
-	ktime_t next_event;
-
-	WARN_ON_ONCE(!ts->inidle);
-
-	*delta_next = ktime_sub(dev->next_event, now);
-
-	if (!can_stop_idle_tick(cpu, ts))
-		return *delta_next;
-
-	next_event = tick_nohz_next_event(ts, cpu);
-	if (!next_event)
-		return *delta_next;
-
-	/*
-	 * If the next highres timer to expire is earlier than next_event, the
-	 * idle governor needs to know that.
-	 */
-	next_event = min_t(u64, next_event,
-			   hrtimer_next_event_without(&ts->sched_timer));
-
-	return ktime_sub(next_event, now);
-}
-
 /**
  * tick_nohz_get_idle_calls_cpu - return the current idle calls counter value
  * for a particular CPU.
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v11 7/8] cpuidle: Pre-store next timer/tick before selecting an idle state
  2019-02-26 14:54 [PATCH v11 0/8] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM) Ulf Hansson
                   ` (5 preceding siblings ...)
  2019-02-26 14:54 ` [PATCH v11 6/8] time: tick-sched: Remove tick_nohz_get_sleep_length() Ulf Hansson
@ 2019-02-26 14:54 ` Ulf Hansson
  2019-02-26 22:08   ` Rafael J. Wysocki
  2019-02-26 14:54 ` [PATCH v11 8/8] PM / Domains: Add genpd governor for CPUs Ulf Hansson
  2019-02-26 17:50 ` [PATCH v11 0/8] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM) Rafael J. Wysocki
  8 siblings, 1 reply; 19+ messages in thread
From: Ulf Hansson @ 2019-02-26 14:54 UTC (permalink / raw)
  To: Rafael J . Wysocki, linux-pm
  Cc: Frederic Weisbecker, Thomas Gleixner, Sudeep Holla,
	Lorenzo Pieralisi, Mark Rutland, Daniel Lezcano,
	Raju P . L . S . S . S . N, Stephen Boyd, Tony Lindgren,
	Kevin Hilman, Lina Iyer, Ulf Hansson, Viresh Kumar,
	Vincent Guittot, Geert Uytterhoeven, linux-arm-kernel,
	linux-arm-msm, linux-kernel

A common piece of data used by cpuidle governors, is the information about
when the next timer/tick is going to fire. Rather than having each governor
calling tick_nohz_get_next_timer|hrtimer() separately, let's consolidate
the code by calling these functions before invoking the ->select() callback
of the governor - and store the output data in the struct cpuidle_device.

Besides the consolidation benefit, the purpose of this change is also to
make the information about the next timer/tick, available outside the
cpuidle framework. Following changes that implements a new genpd governor,
makes use of this.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---

Changes in v11:
	- New patch.

---
 drivers/cpuidle/cpuidle.c        | 2 ++
 drivers/cpuidle/governors/menu.c | 6 ++----
 drivers/cpuidle/governors/teo.c  | 6 ++----
 include/linux/cpuidle.h          | 2 ++
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 7f108309e871..3b148253036b 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -312,6 +312,8 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 		   bool *stop_tick)
 {
+	dev->next_timer = tick_nohz_get_next_timer();
+	dev->next_hrtimer = tick_nohz_get_next_hrtimer();
 	return cpuidle_curr_governor->select(drv, dev, stop_tick);
 }
 
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 95e9122d6047..cdbe434e783d 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -287,8 +287,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	unsigned long nr_iowaiters;
 	ktime_t delta_next;
 	ktime_t now = ktime_get();
-	ktime_t next_hrtimer = tick_nohz_get_next_hrtimer();
-	ktime_t next_timer = tick_nohz_get_next_timer();
 
 	if (data->needs_update) {
 		menu_update(drv, dev);
@@ -298,14 +296,14 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	/*
 	 * Compute the duration before the next timer, whatever the origin
 	 */
-	delta_next = ktime_sub(next_timer, now);
+	delta_next = ktime_sub(dev->next_timer, now);
 	data->next_timer_us = ktime_to_us(delta_next);
 
 	/*
 	 * Compute the duration before next hrtimer which is the tick
 	 * or an earliest hrtimer
 	 */
-	delta_next = ktime_sub(next_hrtimer, now);
+	delta_next = ktime_sub(dev->next_hrtimer, now);
 
 	nr_iowaiters = nr_iowait_cpu(dev->cpu);
 	data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
index bef1e95c597e..7af9851d9d40 100644
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -245,8 +245,6 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	int max_early_idx, idx, i;
 	ktime_t delta_tick;
 	ktime_t now = ktime_get();
-	ktime_t next_hrtimer = tick_nohz_get_next_hrtimer();
-	ktime_t next_timer = tick_nohz_get_next_timer();
 
 	if (cpu_data->last_state >= 0) {
 		teo_update(drv, dev);
@@ -255,8 +253,8 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 
 	cpu_data->time_span_ns = local_clock();
 
-	cpu_data->sleep_length_ns = ktime_sub(next_timer, now);
-	delta_tick = ktime_sub(next_hrtimer, now);
+	cpu_data->sleep_length_ns = ktime_sub(dev->next_timer, now);
+	delta_tick = ktime_sub(dev->next_hrtimer, now);
 	duration_us = ktime_to_us(cpu_data->sleep_length_ns);
 
 	count = 0;
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 3b39472324a3..81ec924206f0 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -83,6 +83,8 @@ struct cpuidle_device {
 	unsigned int		use_deepest_state:1;
 	unsigned int		poll_time_limit:1;
 	unsigned int		cpu;
+	ktime_t			next_timer;
+	ktime_t			next_hrtimer;
 
 	int			last_residency;
 	struct cpuidle_state_usage	states_usage[CPUIDLE_STATE_MAX];
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v11 8/8] PM / Domains: Add genpd governor for CPUs
  2019-02-26 14:54 [PATCH v11 0/8] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM) Ulf Hansson
                   ` (6 preceding siblings ...)
  2019-02-26 14:54 ` [PATCH v11 7/8] cpuidle: Pre-store next timer/tick before selecting an idle state Ulf Hansson
@ 2019-02-26 14:54 ` Ulf Hansson
  2019-02-26 17:50 ` [PATCH v11 0/8] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM) Rafael J. Wysocki
  8 siblings, 0 replies; 19+ messages in thread
From: Ulf Hansson @ 2019-02-26 14:54 UTC (permalink / raw)
  To: Rafael J . Wysocki, linux-pm
  Cc: Frederic Weisbecker, Thomas Gleixner, Sudeep Holla,
	Lorenzo Pieralisi, Mark Rutland, Daniel Lezcano,
	Raju P . L . S . S . S . N, Stephen Boyd, Tony Lindgren,
	Kevin Hilman, Lina Iyer, Ulf Hansson, Viresh Kumar,
	Vincent Guittot, Geert Uytterhoeven, linux-arm-kernel,
	linux-arm-msm, linux-kernel

As it's now perfectly possible that a PM domain managed by genpd contains
devices belonging to CPUs, we should start to take into account the
residency values for the idle states during the state selection process.
The residency value specifies the minimum duration of time, the CPU or a
group of CPUs, needs to spend in an idle state to not waste energy entering
it.

To deal with this, let's add a new genpd governor, pm_domain_cpu_gov, that
may be used for a PM domain that have CPU devices attached or if the CPUs
are attached through subdomains.

The new governor computes the minimum expected idle duration time for the
online CPUs being attached to the PM domain and its subdomains. Then in the
state selection process, trying the deepest state first, it verifies that
the idle duration time satisfies the state's residency value.

It should be noted that, when computing the minimum expected idle duration
time, we use the information about the next timer/tick that is stored in
the per CPU variable, cpuidle_devices, for the related CPUs. Future wise,
this deserves to be improved, as there are obviously more reasons to why a
CPU may be woken up from idle.

Cc: Lina Iyer <ilina@codeaurora.org>
Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---

Changes in v11:
	- Updated to use the data about the next timer/tick, pre-stored in the
	per CPU variable, cpuidle_devices.

---
 drivers/base/power/domain_governor.c | 62 +++++++++++++++++++++++++++-
 include/linux/pm_domain.h            |  3 ++
 2 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/drivers/base/power/domain_governor.c b/drivers/base/power/domain_governor.c
index 99896fbf18e4..d950ec24ed0e 100644
--- a/drivers/base/power/domain_governor.c
+++ b/drivers/base/power/domain_governor.c
@@ -10,6 +10,9 @@
 #include <linux/pm_domain.h>
 #include <linux/pm_qos.h>
 #include <linux/hrtimer.h>
+#include <linux/cpuidle.h>
+#include <linux/cpumask.h>
+#include <linux/ktime.h>
 
 static int dev_update_qos_constraint(struct device *dev, void *data)
 {
@@ -211,8 +214,10 @@ static bool default_power_down_ok(struct dev_pm_domain *pd)
 	struct generic_pm_domain *genpd = pd_to_genpd(pd);
 	struct gpd_link *link;
 
-	if (!genpd->max_off_time_changed)
+	if (!genpd->max_off_time_changed) {
+		genpd->state_idx = genpd->cached_power_down_state_idx;
 		return genpd->cached_power_down_ok;
+	}
 
 	/*
 	 * We have to invalidate the cached results for the masters, so
@@ -237,6 +242,7 @@ static bool default_power_down_ok(struct dev_pm_domain *pd)
 		genpd->state_idx--;
 	}
 
+	genpd->cached_power_down_state_idx = genpd->state_idx;
 	return genpd->cached_power_down_ok;
 }
 
@@ -245,6 +251,55 @@ static bool always_on_power_down_ok(struct dev_pm_domain *domain)
 	return false;
 }
 
+static bool cpu_power_down_ok(struct dev_pm_domain *pd)
+{
+	struct generic_pm_domain *genpd = pd_to_genpd(pd);
+	struct cpuidle_device *dev;
+	ktime_t domain_wakeup;
+	s64 idle_duration_ns;
+	int cpu, i;
+
+	/* Validate dev PM QoS constraints. */
+	if (!default_power_down_ok(pd))
+		return false;
+
+	if (!(genpd->flags & GENPD_FLAG_CPU_DOMAIN))
+		return true;
+
+	/*
+	 * Find the next wakeup for any of the online CPUs within the PM domain
+	 * and its subdomains. Note, we only need the genpd->cpus, as it already
+	 * contains a mask of all CPUs from subdomains.
+	 */
+	domain_wakeup = ktime_set(KTIME_SEC_MAX, 0);
+	for_each_cpu_and(cpu, genpd->cpus, cpu_online_mask) {
+		dev = per_cpu(cpuidle_devices, cpu);
+		if (dev && ktime_before(dev->next_timer, domain_wakeup))
+			domain_wakeup = dev->next_timer;
+	}
+
+	/* The minimum idle duration is from now - until the next wakeup. */
+	idle_duration_ns = ktime_to_ns(ktime_sub(domain_wakeup, ktime_get()));
+	if (idle_duration_ns <= 0)
+		return false;
+
+	/*
+	 * Find the deepest idle state that has its residency value satisfied
+	 * and by also taking into account the power off latency for the state.
+	 * Start at the state picked by the dev PM QoS constraint validation.
+	 */
+	i = genpd->state_idx;
+	do {
+		if (idle_duration_ns >= (genpd->states[i].residency_ns +
+		    genpd->states[i].power_off_latency_ns)) {
+			genpd->state_idx = i;
+			return true;
+		}
+	} while (--i >= 0);
+
+	return false;
+}
+
 struct dev_power_governor simple_qos_governor = {
 	.suspend_ok = default_suspend_ok,
 	.power_down_ok = default_power_down_ok,
@@ -257,3 +312,8 @@ struct dev_power_governor pm_domain_always_on_gov = {
 	.power_down_ok = always_on_power_down_ok,
 	.suspend_ok = default_suspend_ok,
 };
+
+struct dev_power_governor pm_domain_cpu_gov = {
+	.suspend_ok = default_suspend_ok,
+	.power_down_ok = cpu_power_down_ok,
+};
diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
index a6e251fe9deb..ae7061556a26 100644
--- a/include/linux/pm_domain.h
+++ b/include/linux/pm_domain.h
@@ -118,6 +118,7 @@ struct generic_pm_domain {
 	s64 max_off_time_ns;	/* Maximum allowed "suspended" time. */
 	bool max_off_time_changed;
 	bool cached_power_down_ok;
+	bool cached_power_down_state_idx;
 	int (*attach_dev)(struct generic_pm_domain *domain,
 			  struct device *dev);
 	void (*detach_dev)(struct generic_pm_domain *domain,
@@ -202,6 +203,7 @@ int dev_pm_genpd_set_performance_state(struct device *dev, unsigned int state);
 
 extern struct dev_power_governor simple_qos_governor;
 extern struct dev_power_governor pm_domain_always_on_gov;
+extern struct dev_power_governor pm_domain_cpu_gov;
 #else
 
 static inline struct generic_pm_domain_data *dev_gpd_data(struct device *dev)
@@ -245,6 +247,7 @@ static inline int dev_pm_genpd_set_performance_state(struct device *dev,
 
 #define simple_qos_governor		(*(struct dev_power_governor *)(NULL))
 #define pm_domain_always_on_gov		(*(struct dev_power_governor *)(NULL))
+#define pm_domain_cpu_gov		(*(struct dev_power_governor *)(NULL))
 #endif
 
 #ifdef CONFIG_PM_GENERIC_DOMAINS_SLEEP
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v11 0/8] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM)
  2019-02-26 14:54 [PATCH v11 0/8] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM) Ulf Hansson
                   ` (7 preceding siblings ...)
  2019-02-26 14:54 ` [PATCH v11 8/8] PM / Domains: Add genpd governor for CPUs Ulf Hansson
@ 2019-02-26 17:50 ` Rafael J. Wysocki
  2019-02-26 21:31   ` Ulf Hansson
  8 siblings, 1 reply; 19+ messages in thread
From: Rafael J. Wysocki @ 2019-02-26 17:50 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J . Wysocki, Linux PM, Frederic Weisbecker,
	Thomas Gleixner, Sudeep Holla, Lorenzo Pieralisi, Mark Rutland,
	Daniel Lezcano, Raju P . L . S . S . S . N, Stephen Boyd,
	Tony Lindgren, Kevin Hilman, Lina Iyer, Viresh Kumar,
	Vincent Guittot, Geert Uytterhoeven, Linux ARM, linux-arm-msm,
	Linux Kernel Mailing List

On Tue, Feb 26, 2019 at 3:55 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> Changes in v11:
>  - This version contains only the infrastructure changes that is needed for
> deployment. The PSCI/ARM changes have also been updated and tested, but I will
> post them separately. Still, to provide completeness, I have published a branch
> containing everything to a git tree [1], feel free to have a look and test.
>  - The v10 series contained a patch, "timer: Export next wakeup time of a CPU",
> which has been replaced by a couple of new patches, whom reworks the existing
> tick_nohz_get_sleep_length() function, to provide the next timer expiration
> instead of the duration.
>  - More changelogs are available per patch.

NAK for patches [4-6/8].

The code as is specifically avoids calling ktime_get() from the
governors as that can be quite expensive, so these patches potentially
make things worse.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v11 0/8] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM)
  2019-02-26 17:50 ` [PATCH v11 0/8] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM) Rafael J. Wysocki
@ 2019-02-26 21:31   ` Ulf Hansson
  2019-02-26 21:52     ` Rafael J. Wysocki
  0 siblings, 1 reply; 19+ messages in thread
From: Ulf Hansson @ 2019-02-26 21:31 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J . Wysocki, Linux PM, Frederic Weisbecker,
	Thomas Gleixner, Sudeep Holla, Lorenzo Pieralisi, Mark Rutland,
	Daniel Lezcano, Raju P . L . S . S . S . N, Stephen Boyd,
	Tony Lindgren, Kevin Hilman, Lina Iyer, Viresh Kumar,
	Vincent Guittot, Geert Uytterhoeven, Linux ARM, linux-arm-msm,
	Linux Kernel Mailing List

On Tue, 26 Feb 2019 at 18:50, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Feb 26, 2019 at 3:55 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > Changes in v11:
> >  - This version contains only the infrastructure changes that is needed for
> > deployment. The PSCI/ARM changes have also been updated and tested, but I will
> > post them separately. Still, to provide completeness, I have published a branch
> > containing everything to a git tree [1], feel free to have a look and test.
> >  - The v10 series contained a patch, "timer: Export next wakeup time of a CPU",
> > which has been replaced by a couple of new patches, whom reworks the existing
> > tick_nohz_get_sleep_length() function, to provide the next timer expiration
> > instead of the duration.
> >  - More changelogs are available per patch.
>
> NAK for patches [4-6/8].
>
> The code as is specifically avoids calling ktime_get() from the
> governors as that can be quite expensive, so these patches potentially
> make things worse.

Yeah, good point! What do you think about folding in a patch into the
series, like below, and then let the cpuidle governors use it?

One questions about when CONFIG_NO_HZ_COMMON is unset for the below
suggested code, does it make sense to "return -1" for that case, or
should I return ktime_get()? Does it matter?

Thanks for reviewing!

Kind regards
Uffe

From: Ulf Hansson <ulf.hansson@linaro.org>
Date: Tue, 26 Feb 2019 21:43:46 +0100
Subject: [PATCH] time: tick-sched: Add a helper function returning the idle
 entry time

To avoid calling ktime_get() unnecessary times during the idle path, let's
export the timestamp we stored in the per CPU variable,
tick_cpu_sched.idle_entrytime, at the point when we entered idle.

Following changes to the cpuidle governors makes use of this.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 include/linux/tick.h     |  2 ++
 kernel/time/tick-sched.c | 10 ++++++++++
 2 files changed, 12 insertions(+)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 5b10a0e4acbb..b641f6e4a50f 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -126,6 +126,7 @@ extern void tick_nohz_irq_exit(void);
 extern bool tick_nohz_idle_got_tick(void);
 extern ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next);
 extern ktime_t tick_nohz_get_next_timer(void);
+extern ktime_t tick_nohz_get_idle_entrytime(void);
 extern unsigned long tick_nohz_get_idle_calls(void);
 extern unsigned long tick_nohz_get_idle_calls_cpu(int cpu);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
@@ -158,6 +159,7 @@ static inline ktime_t
tick_nohz_get_sleep_length(ktime_t *delta_next)
        *delta_next = TICK_NSEC;
        return *delta_next;
 }
+static inline ktime_t tick_nohz_get_idle_entrytime(void) { return -1; }
 static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; }
 static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 9966be665074..e5d66b618bfa 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1127,6 +1127,16 @@ ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next)
        return ktime_sub(next_event, now);
 }

+/**
+ * tick_nohz_get_idle_entrytime - return the time when we entered idle
+ *
+ * Called from power state control code with interrupts disabled
+ */
+ktime_t tick_nohz_get_idle_entrytime(void)
+{
+       return __this_cpu_read(tick_cpu_sched.idle_entrytime);
+}
+
 /**
  * tick_nohz_get_idle_calls_cpu - return the current idle calls counter value
  * for a particular CPU.
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v11 0/8] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM)
  2019-02-26 21:31   ` Ulf Hansson
@ 2019-02-26 21:52     ` Rafael J. Wysocki
  2019-02-26 22:06       ` Ulf Hansson
  0 siblings, 1 reply; 19+ messages in thread
From: Rafael J. Wysocki @ 2019-02-26 21:52 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, Rafael J . Wysocki, Linux PM,
	Frederic Weisbecker, Thomas Gleixner, Sudeep Holla,
	Lorenzo Pieralisi, Mark Rutland, Daniel Lezcano,
	Raju P . L . S . S . S . N, Stephen Boyd, Tony Lindgren,
	Kevin Hilman, Lina Iyer, Viresh Kumar, Vincent Guittot,
	Geert Uytterhoeven, Linux ARM, linux-arm-msm,
	Linux Kernel Mailing List

On Tue, Feb 26, 2019 at 10:31 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Tue, 26 Feb 2019 at 18:50, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Tue, Feb 26, 2019 at 3:55 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > >
> > > Changes in v11:
> > >  - This version contains only the infrastructure changes that is needed for
> > > deployment. The PSCI/ARM changes have also been updated and tested, but I will
> > > post them separately. Still, to provide completeness, I have published a branch
> > > containing everything to a git tree [1], feel free to have a look and test.
> > >  - The v10 series contained a patch, "timer: Export next wakeup time of a CPU",
> > > which has been replaced by a couple of new patches, whom reworks the existing
> > > tick_nohz_get_sleep_length() function, to provide the next timer expiration
> > > instead of the duration.
> > >  - More changelogs are available per patch.
> >
> > NAK for patches [4-6/8].
> >
> > The code as is specifically avoids calling ktime_get() from the
> > governors as that can be quite expensive, so these patches potentially
> > make things worse.
>
> Yeah, good point! What do you think about folding in a patch into the
> series, like below, and then let the cpuidle governors use it?

It is not objectionable as it stands, but that also depends on what
the new function is used for.

In particular, I don't really think that the menu and teo governors
need to call it at all.

> One questions about when CONFIG_NO_HZ_COMMON is unset for the below
> suggested code, does it make sense to "return -1" for that case, or
> should I return ktime_get()? Does it matter?

Again, it may or may not matter depending on what the caller is going
to do with the value returned by it.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v11 0/8] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM)
  2019-02-26 21:52     ` Rafael J. Wysocki
@ 2019-02-26 22:06       ` Ulf Hansson
  2019-02-26 22:11         ` Rafael J. Wysocki
  0 siblings, 1 reply; 19+ messages in thread
From: Ulf Hansson @ 2019-02-26 22:06 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J . Wysocki, Linux PM, Frederic Weisbecker,
	Thomas Gleixner, Sudeep Holla, Lorenzo Pieralisi, Mark Rutland,
	Daniel Lezcano, Raju P . L . S . S . S . N, Stephen Boyd,
	Tony Lindgren, Kevin Hilman, Lina Iyer, Viresh Kumar,
	Vincent Guittot, Geert Uytterhoeven, Linux ARM, linux-arm-msm,
	Linux Kernel Mailing List

On Tue, 26 Feb 2019 at 22:52, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Feb 26, 2019 at 10:31 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > On Tue, 26 Feb 2019 at 18:50, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Tue, Feb 26, 2019 at 3:55 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > >
> > > > Changes in v11:
> > > >  - This version contains only the infrastructure changes that is needed for
> > > > deployment. The PSCI/ARM changes have also been updated and tested, but I will
> > > > post them separately. Still, to provide completeness, I have published a branch
> > > > containing everything to a git tree [1], feel free to have a look and test.
> > > >  - The v10 series contained a patch, "timer: Export next wakeup time of a CPU",
> > > > which has been replaced by a couple of new patches, whom reworks the existing
> > > > tick_nohz_get_sleep_length() function, to provide the next timer expiration
> > > > instead of the duration.
> > > >  - More changelogs are available per patch.
> > >
> > > NAK for patches [4-6/8].
> > >
> > > The code as is specifically avoids calling ktime_get() from the
> > > governors as that can be quite expensive, so these patches potentially
> > > make things worse.
> >
> > Yeah, good point! What do you think about folding in a patch into the
> > series, like below, and then let the cpuidle governors use it?
>
> It is not objectionable as it stands, but that also depends on what
> the new function is used for.
>
> In particular, I don't really think that the menu and teo governors
> need to call it at all.

Well, if we are going to re-work the code as suggested in the series,
then how would you suggest to get rid of the calls to ktime_get() that
is introduced in patch 4 and patch5?

BTW, at a closer look I am even tempted to squash patch3 to patch6
(including the part I attached earlier) as this part of the series is
really a re-work of tick_nohz_get_sleep_length() and its users.

>
> > One questions about when CONFIG_NO_HZ_COMMON is unset for the below
> > suggested code, does it make sense to "return -1" for that case, or
> > should I return ktime_get()? Does it matter?
>
> Again, it may or may not matter depending on what the caller is going
> to do with the value returned by it.

Well, so far this is only for teo/menu - instead of calling ktime_get().

Any other suggestions of how to move forward here is welcome, of course!

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v11 7/8] cpuidle: Pre-store next timer/tick before selecting an idle state
  2019-02-26 14:54 ` [PATCH v11 7/8] cpuidle: Pre-store next timer/tick before selecting an idle state Ulf Hansson
@ 2019-02-26 22:08   ` Rafael J. Wysocki
  2019-02-26 22:17     ` Rafael J. Wysocki
  2019-02-26 23:15     ` Ulf Hansson
  0 siblings, 2 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2019-02-26 22:08 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J . Wysocki, Linux PM, Frederic Weisbecker,
	Thomas Gleixner, Sudeep Holla, Lorenzo Pieralisi, Mark Rutland,
	Daniel Lezcano, Raju P . L . S . S . S . N, Stephen Boyd,
	Tony Lindgren, Kevin Hilman, Lina Iyer, Viresh Kumar,
	Vincent Guittot, Geert Uytterhoeven, Linux ARM, linux-arm-msm,
	Linux Kernel Mailing List

On Tue, Feb 26, 2019 at 3:54 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> A common piece of data used by cpuidle governors, is the information about
> when the next timer/tick is going to fire. Rather than having each governor
> calling tick_nohz_get_next_timer|hrtimer() separately, let's consolidate
> the code by calling these functions before invoking the ->select() callback
> of the governor - and store the output data in the struct cpuidle_device.

That misses the point IMO.

You don't need to store two values in struct cpuidle_device, but just
one, and not before running ->select(), but before invoking the
driver's ->enter() callback.

At that point, the decision on whether or not to stop the scheduler
tick has been made already and it should be sufficient to store the
return value of tick_nohz_get_next_hrtimer() introduced by patch
[3/8], because that value represents the next timer regardless of what
has been decided with respect to the tick.

And you won't need the tick_nohz_get_next_timer() any more then.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v11 0/8] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM)
  2019-02-26 22:06       ` Ulf Hansson
@ 2019-02-26 22:11         ` Rafael J. Wysocki
  0 siblings, 0 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2019-02-26 22:11 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, Rafael J . Wysocki, Linux PM,
	Frederic Weisbecker, Thomas Gleixner, Sudeep Holla,
	Lorenzo Pieralisi, Mark Rutland, Daniel Lezcano,
	Raju P . L . S . S . S . N, Stephen Boyd, Tony Lindgren,
	Kevin Hilman, Lina Iyer, Viresh Kumar, Vincent Guittot,
	Geert Uytterhoeven, Linux ARM, linux-arm-msm,
	Linux Kernel Mailing List

On Tue, Feb 26, 2019 at 11:06 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Tue, 26 Feb 2019 at 22:52, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Tue, Feb 26, 2019 at 10:31 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > >
> > > On Tue, 26 Feb 2019 at 18:50, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > >
> > > > On Tue, Feb 26, 2019 at 3:55 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > >
> > > > > Changes in v11:
> > > > >  - This version contains only the infrastructure changes that is needed for
> > > > > deployment. The PSCI/ARM changes have also been updated and tested, but I will
> > > > > post them separately. Still, to provide completeness, I have published a branch
> > > > > containing everything to a git tree [1], feel free to have a look and test.
> > > > >  - The v10 series contained a patch, "timer: Export next wakeup time of a CPU",
> > > > > which has been replaced by a couple of new patches, whom reworks the existing
> > > > > tick_nohz_get_sleep_length() function, to provide the next timer expiration
> > > > > instead of the duration.
> > > > >  - More changelogs are available per patch.
> > > >
> > > > NAK for patches [4-6/8].
> > > >
> > > > The code as is specifically avoids calling ktime_get() from the
> > > > governors as that can be quite expensive, so these patches potentially
> > > > make things worse.
> > >
> > > Yeah, good point! What do you think about folding in a patch into the
> > > series, like below, and then let the cpuidle governors use it?
> >
> > It is not objectionable as it stands, but that also depends on what
> > the new function is used for.
> >
> > In particular, I don't really think that the menu and teo governors
> > need to call it at all.
>
> Well, if we are going to re-work the code as suggested in the series,
> then how would you suggest to get rid of the calls to ktime_get() that
> is introduced in patch 4 and patch5?
>
> BTW, at a closer look I am even tempted to squash patch3 to patch6
> (including the part I attached earlier) as this part of the series is
> really a re-work of tick_nohz_get_sleep_length() and its users.

Which is unnecessary IMO - see my reply to patch [7/8].

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v11 7/8] cpuidle: Pre-store next timer/tick before selecting an idle state
  2019-02-26 22:08   ` Rafael J. Wysocki
@ 2019-02-26 22:17     ` Rafael J. Wysocki
  2019-02-26 23:15     ` Ulf Hansson
  1 sibling, 0 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2019-02-26 22:17 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J . Wysocki, Linux PM, Frederic Weisbecker,
	Thomas Gleixner, Sudeep Holla, Lorenzo Pieralisi, Mark Rutland,
	Daniel Lezcano, Raju P . L . S . S . S . N, Stephen Boyd,
	Tony Lindgren, Kevin Hilman, Lina Iyer, Viresh Kumar,
	Vincent Guittot, Geert Uytterhoeven, Linux ARM, linux-arm-msm,
	Linux Kernel Mailing List

On Tue, Feb 26, 2019 at 11:08 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Feb 26, 2019 at 3:54 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > A common piece of data used by cpuidle governors, is the information about
> > when the next timer/tick is going to fire. Rather than having each governor
> > calling tick_nohz_get_next_timer|hrtimer() separately, let's consolidate
> > the code by calling these functions before invoking the ->select() callback
> > of the governor - and store the output data in the struct cpuidle_device.
>
> That misses the point IMO.
>
> You don't need to store two values in struct cpuidle_device, but just
> one, and not before running ->select(), but before invoking the
> driver's ->enter() callback.
>
> At that point, the decision on whether or not to stop the scheduler
> tick has been made already and it should be sufficient to store the
> return value of tick_nohz_get_next_hrtimer() introduced by patch
> [3/8],

Or the difference between in and ts->idle_entrytime for that matter,
whichever is more useful.

> because that value represents the next timer regardless of what
> has been decided with respect to the tick.
>
> And you won't need the tick_nohz_get_next_timer() any more then.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v11 7/8] cpuidle: Pre-store next timer/tick before selecting an idle state
  2019-02-26 22:08   ` Rafael J. Wysocki
  2019-02-26 22:17     ` Rafael J. Wysocki
@ 2019-02-26 23:15     ` Ulf Hansson
  2019-02-26 23:40       ` Rafael J. Wysocki
  1 sibling, 1 reply; 19+ messages in thread
From: Ulf Hansson @ 2019-02-26 23:15 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J . Wysocki, Linux PM, Frederic Weisbecker,
	Thomas Gleixner, Sudeep Holla, Lorenzo Pieralisi, Mark Rutland,
	Daniel Lezcano, Raju P . L . S . S . S . N, Stephen Boyd,
	Tony Lindgren, Kevin Hilman, Lina Iyer, Viresh Kumar,
	Vincent Guittot, Geert Uytterhoeven, Linux ARM, linux-arm-msm,
	Linux Kernel Mailing List

On Tue, 26 Feb 2019 at 23:08, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Feb 26, 2019 at 3:54 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > A common piece of data used by cpuidle governors, is the information about
> > when the next timer/tick is going to fire. Rather than having each governor
> > calling tick_nohz_get_next_timer|hrtimer() separately, let's consolidate
> > the code by calling these functions before invoking the ->select() callback
> > of the governor - and store the output data in the struct cpuidle_device.
>
> That misses the point IMO.
>
> You don't need to store two values in struct cpuidle_device, but just
> one, and not before running ->select(), but before invoking the
> driver's ->enter() callback.

Okay! Thanks for letting me know!

>
> At that point, the decision on whether or not to stop the scheduler
> tick has been made already and it should be sufficient to store the
> return value of tick_nohz_get_next_hrtimer() introduced by patch
> [3/8], because that value represents the next timer regardless of what
> has been decided with respect to the tick.

Just to make sure I get this correctly, because it seems like I have
missed a few points here....

If we decided to keep the tick running, then
tick_nohz_get_next_hrtimer() gives the next tick or the next hrtimer,
whatever that comes first. There are no other timer that can expire
earlier than this, right!?

If we decided to stop the tick, then tick_nohz_get_next_hrtimer() will
give us the next hrtimer. Again, then there are no other timer that
can't expire earlier than this, right!?

>
> And you won't need the tick_nohz_get_next_timer() any more then.

Alright, this kind of brings this hole thing back closer to v10 - and
then we should stick to use tick_nohz_get_sleep_length() as is for the
cpuidle governors. That is what you are saying?

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v11 7/8] cpuidle: Pre-store next timer/tick before selecting an idle state
  2019-02-26 23:15     ` Ulf Hansson
@ 2019-02-26 23:40       ` Rafael J. Wysocki
  2019-02-27  0:07         ` Ulf Hansson
  0 siblings, 1 reply; 19+ messages in thread
From: Rafael J. Wysocki @ 2019-02-26 23:40 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, Rafael J . Wysocki, Linux PM,
	Frederic Weisbecker, Thomas Gleixner, Sudeep Holla,
	Lorenzo Pieralisi, Mark Rutland, Daniel Lezcano,
	Raju P . L . S . S . S . N, Stephen Boyd, Tony Lindgren,
	Kevin Hilman, Lina Iyer, Viresh Kumar, Vincent Guittot,
	Geert Uytterhoeven, Linux ARM, linux-arm-msm,
	Linux Kernel Mailing List

On Wed, Feb 27, 2019 at 12:16 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Tue, 26 Feb 2019 at 23:08, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Tue, Feb 26, 2019 at 3:54 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > >
> > > A common piece of data used by cpuidle governors, is the information about
> > > when the next timer/tick is going to fire. Rather than having each governor
> > > calling tick_nohz_get_next_timer|hrtimer() separately, let's consolidate
> > > the code by calling these functions before invoking the ->select() callback
> > > of the governor - and store the output data in the struct cpuidle_device.
> >
> > That misses the point IMO.
> >
> > You don't need to store two values in struct cpuidle_device, but just
> > one, and not before running ->select(), but before invoking the
> > driver's ->enter() callback.
>
> Okay! Thanks for letting me know!
>
> >
> > At that point, the decision on whether or not to stop the scheduler
> > tick has been made already and it should be sufficient to store the
> > return value of tick_nohz_get_next_hrtimer() introduced by patch
> > [3/8], because that value represents the next timer regardless of what
> > has been decided with respect to the tick.
>
> Just to make sure I get this correctly, because it seems like I have
> missed a few points here....
>
> If we decided to keep the tick running, then
> tick_nohz_get_next_hrtimer() gives the next tick or the next hrtimer,
> whatever that comes first. There are no other timer that can expire
> earlier than this, right!?
>
> If we decided to stop the tick, then tick_nohz_get_next_hrtimer() will
> give us the next hrtimer. Again, then there are no other timer that
> can't expire earlier than this, right!?

Right in both cases.

IOW, that is the event that will wake up the CPU unless any other
(non-timer) interrupts (or equivalent events) occur in the meantime.

> >
> > And you won't need the tick_nohz_get_next_timer() any more then.
>
> Alright, this kind of brings this hole thing back closer to v10 - and
> then we should stick to use tick_nohz_get_sleep_length() as is for the
> cpuidle governors. That is what you are saying?

For the menu and teo governors - yes.  IMO
tick_nohz_get_sleep_length() is as good as it gets in there.

Cheers,
Rafael

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v11 7/8] cpuidle: Pre-store next timer/tick before selecting an idle state
  2019-02-26 23:40       ` Rafael J. Wysocki
@ 2019-02-27  0:07         ` Ulf Hansson
  0 siblings, 0 replies; 19+ messages in thread
From: Ulf Hansson @ 2019-02-27  0:07 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J . Wysocki, Linux PM, Frederic Weisbecker,
	Thomas Gleixner, Sudeep Holla, Lorenzo Pieralisi, Mark Rutland,
	Daniel Lezcano, Raju P . L . S . S . S . N, Stephen Boyd,
	Tony Lindgren, Kevin Hilman, Lina Iyer, Viresh Kumar,
	Vincent Guittot, Geert Uytterhoeven, Linux ARM, linux-arm-msm,
	Linux Kernel Mailing List

On Wed, 27 Feb 2019 at 00:41, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Feb 27, 2019 at 12:16 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > On Tue, 26 Feb 2019 at 23:08, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Tue, Feb 26, 2019 at 3:54 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > >
> > > > A common piece of data used by cpuidle governors, is the information about
> > > > when the next timer/tick is going to fire. Rather than having each governor
> > > > calling tick_nohz_get_next_timer|hrtimer() separately, let's consolidate
> > > > the code by calling these functions before invoking the ->select() callback
> > > > of the governor - and store the output data in the struct cpuidle_device.
> > >
> > > That misses the point IMO.
> > >
> > > You don't need to store two values in struct cpuidle_device, but just
> > > one, and not before running ->select(), but before invoking the
> > > driver's ->enter() callback.
> >
> > Okay! Thanks for letting me know!
> >
> > >
> > > At that point, the decision on whether or not to stop the scheduler
> > > tick has been made already and it should be sufficient to store the
> > > return value of tick_nohz_get_next_hrtimer() introduced by patch
> > > [3/8], because that value represents the next timer regardless of what
> > > has been decided with respect to the tick.
> >
> > Just to make sure I get this correctly, because it seems like I have
> > missed a few points here....
> >
> > If we decided to keep the tick running, then
> > tick_nohz_get_next_hrtimer() gives the next tick or the next hrtimer,
> > whatever that comes first. There are no other timer that can expire
> > earlier than this, right!?
> >
> > If we decided to stop the tick, then tick_nohz_get_next_hrtimer() will
> > give us the next hrtimer. Again, then there are no other timer that
> > can't expire earlier than this, right!?
>
> Right in both cases.
>
> IOW, that is the event that will wake up the CPU unless any other
> (non-timer) interrupts (or equivalent events) occur in the meantime.
>
> > >
> > > And you won't need the tick_nohz_get_next_timer() any more then.
> >
> > Alright, this kind of brings this hole thing back closer to v10 - and
> > then we should stick to use tick_nohz_get_sleep_length() as is for the
> > cpuidle governors. That is what you are saying?
>
> For the menu and teo governors - yes.  IMO
> tick_nohz_get_sleep_length() is as good as it gets in there.

Okay, got it! Thanks for your helpful answers!

I will post a v12 as soon as I can, so we can look into the other
parts of the series.

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2019-02-27  0:08 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-26 14:54 [PATCH v11 0/8] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM) Ulf Hansson
2019-02-26 14:54 ` [PATCH v11 1/8] PM / Domains: Add a generic data pointer to the genpd_power_state struct Ulf Hansson
2019-02-26 14:54 ` [PATCH v11 2/8] PM / Domains: Add support for CPU devices to genpd Ulf Hansson
2019-02-26 14:54 ` [PATCH v11 3/8] time: tick-sched: Provide helpers to get the next timer expiration Ulf Hansson
2019-02-26 14:54 ` [PATCH v11 4/8] cpuidle: menu: Convert to tick_nohz_get_next_timer|hrtimer() Ulf Hansson
2019-02-26 14:54 ` [PATCH v11 5/8] cpuidle: teo: Convert to tick_nohz_get_next_timer() Ulf Hansson
2019-02-26 14:54 ` [PATCH v11 6/8] time: tick-sched: Remove tick_nohz_get_sleep_length() Ulf Hansson
2019-02-26 14:54 ` [PATCH v11 7/8] cpuidle: Pre-store next timer/tick before selecting an idle state Ulf Hansson
2019-02-26 22:08   ` Rafael J. Wysocki
2019-02-26 22:17     ` Rafael J. Wysocki
2019-02-26 23:15     ` Ulf Hansson
2019-02-26 23:40       ` Rafael J. Wysocki
2019-02-27  0:07         ` Ulf Hansson
2019-02-26 14:54 ` [PATCH v11 8/8] PM / Domains: Add genpd governor for CPUs Ulf Hansson
2019-02-26 17:50 ` [PATCH v11 0/8] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM) Rafael J. Wysocki
2019-02-26 21:31   ` Ulf Hansson
2019-02-26 21:52     ` Rafael J. Wysocki
2019-02-26 22:06       ` Ulf Hansson
2019-02-26 22:11         ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).