[PATCH V3 0/2] cpufreq: cppc: Add support for frequency invariance

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH V3 0/2] cpufreq: cppc: Add support for frequency invariance
@ 2021-01-28 10:48 Viresh Kumar
  2021-01-28 10:48 ` [PATCH V3 1/2] topology: Allow multiple entities to provide sched_freq_tick() callback Viresh Kumar
  2021-01-28 10:48 ` [PATCH V3 2/2] cpufreq: cppc: Add support for frequency invariance Viresh Kumar
  0 siblings, 2 replies; 19+ messages in thread
From: Viresh Kumar @ 2021-01-28 10:48 UTC (permalink / raw)
  To: Ionela Voinescu, Rafael Wysocki, Catalin Marinas, Will Deacon
  Cc: Viresh Kumar, Vincent Guittot, linux-kernel, linux-arm-kernel,
	linux-pm, Sudeep Holla, Greg Kroah-Hartman

Hello,

CPPC cpufreq driver is used for ARM servers and this patch series tries
to provide counter-based frequency invariance support for them in the
absence for architecture specific counters (like AMUs).

This is tested with some hacks, as I didn't have access to the right
hardware, on the ARM64 hikey board to check the overall functionality
and that works fine.

Vincent/Ionela, it would be nice if you guys can give this a shot on
some real hardware where counters work.

This is based of pm/linux-next (need some stuff from the arm64 tree).
These patches should get merged via the arm64 tree only for the same
reason.

Changes since V2:
- Not sending as an RFC anymore.
- Several renames, reordering of code in 1/2 based on Ionela's comments.
- Several rebase changes for 2/2.
- The freq_scale calculations are optimized a bit.
- Better overall commenting and commit logs.

Changes since V1:
- The interface for setting the callbacks is improved, so different
  parts looking to provide their callbacks don't need to think about
  each other.

- Moved to per-cpu storage for storing the callback related data, AMU
  counters have higher priority with this.

--
viresh

Viresh Kumar (2):
  topology: Allow multiple entities to provide sched_freq_tick()
    callback
  cpufreq: cppc: Add support for frequency invariance

 arch/arm64/include/asm/topology.h |  10 +-
 arch/arm64/kernel/topology.c      |  89 +++++++--------
 drivers/base/arch_topology.c      |  56 +++++++++-
 drivers/cpufreq/cppc_cpufreq.c    | 179 ++++++++++++++++++++++++++++--
 include/linux/arch_topology.h     |  15 ++-
 kernel/sched/core.c               |   1 +
 6 files changed, 274 insertions(+), 76 deletions(-)

-- 
2.25.0.rc1.19.g042ed3e048af


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH V3 1/2] topology: Allow multiple entities to provide sched_freq_tick() callback
  2021-01-28 10:48 [PATCH V3 0/2] cpufreq: cppc: Add support for frequency invariance Viresh Kumar
@ 2021-01-28 10:48 ` Viresh Kumar
  2021-02-03 11:45   ` Ionela Voinescu
  2021-01-28 10:48 ` [PATCH V3 2/2] cpufreq: cppc: Add support for frequency invariance Viresh Kumar
  1 sibling, 1 reply; 19+ messages in thread
From: Viresh Kumar @ 2021-01-28 10:48 UTC (permalink / raw)
  To: Ionela Voinescu, Rafael Wysocki, Catalin Marinas, Will Deacon
  Cc: Viresh Kumar, Vincent Guittot, linux-kernel, linux-arm-kernel,
	linux-pm, Sudeep Holla, Greg Kroah-Hartman

This patch attempts to make it generic enough so other parts of the
kernel can also provide their own implementation of scale_freq_tick()
callback, which is called by the scheduler periodically to update the
per-cpu freq_scale variable.

The implementations now need to provide struct scale_freq_data for the
CPUs for which they have hardware counters available, and a callback
gets registered for each possible CPU in a per-cpu variable.

The arch specific (or ARM AMU) counters are updated to adapt to this and
they take the highest priority if they are available, i.e. they will be
used instead of CPPC based counters for example.

Note that this also defines SCALE_FREQ_SOURCE_CPUFREQ but doesn't use it
and it is added to show that cpufreq is also acts as source of
information for FIE and will be used by default if no other counters are
supported for a platform.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 arch/arm64/include/asm/topology.h | 10 +---
 arch/arm64/kernel/topology.c      | 89 ++++++++++++++-----------------
 drivers/base/arch_topology.c      | 56 +++++++++++++++++--
 include/linux/arch_topology.h     | 14 ++++-
 4 files changed, 105 insertions(+), 64 deletions(-)

diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index 3b8dca4eb08d..ec2db3419c41 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -17,17 +17,9 @@ int pcibus_to_node(struct pci_bus *bus);
 #include <linux/arch_topology.h>
 
 void update_freq_counters_refs(void);
-void topology_scale_freq_tick(void);
-
-#ifdef CONFIG_ARM64_AMU_EXTN
-/*
- * Replace task scheduler's default counter-based
- * frequency-invariance scale factor setting.
- */
-#define arch_scale_freq_tick topology_scale_freq_tick
-#endif /* CONFIG_ARM64_AMU_EXTN */
 
 /* Replace task scheduler's default frequency-invariant accounting */
+#define arch_scale_freq_tick topology_scale_freq_tick
 #define arch_set_freq_scale topology_set_freq_scale
 #define arch_scale_freq_capacity topology_get_freq_scale
 #define arch_scale_freq_invariant topology_scale_freq_invariant
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index e08a4126453a..1e47dfd465f8 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -199,8 +199,44 @@ static int freq_inv_set_max_ratio(int cpu, u64 max_rate, u64 ref_rate)
 	return 0;
 }
 
-static DEFINE_STATIC_KEY_FALSE(amu_fie_key);
-#define amu_freq_invariant() static_branch_unlikely(&amu_fie_key)
+static void amu_scale_freq_tick(void)
+{
+	u64 prev_core_cnt, prev_const_cnt;
+	u64 core_cnt, const_cnt, scale;
+
+	prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
+	prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
+
+	update_freq_counters_refs();
+
+	const_cnt = this_cpu_read(arch_const_cycles_prev);
+	core_cnt = this_cpu_read(arch_core_cycles_prev);
+
+	if (unlikely(core_cnt <= prev_core_cnt ||
+		     const_cnt <= prev_const_cnt))
+		return;
+
+	/*
+	 *	    /\core    arch_max_freq_scale
+	 * scale =  ------- * --------------------
+	 *	    /\const   SCHED_CAPACITY_SCALE
+	 *
+	 * See validate_cpu_freq_invariance_counters() for details on
+	 * arch_max_freq_scale and the use of SCHED_CAPACITY_SHIFT.
+	 */
+	scale = core_cnt - prev_core_cnt;
+	scale *= this_cpu_read(arch_max_freq_scale);
+	scale = div64_u64(scale >> SCHED_CAPACITY_SHIFT,
+			  const_cnt - prev_const_cnt);
+
+	scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
+	this_cpu_write(freq_scale, (unsigned long)scale);
+}
+
+static struct scale_freq_data amu_sfd = {
+	.source = SCALE_FREQ_SOURCE_ARCH,
+	.set_freq_scale = amu_scale_freq_tick,
+};
 
 static void amu_fie_setup(const struct cpumask *cpus)
 {
@@ -227,7 +263,7 @@ static void amu_fie_setup(const struct cpumask *cpus)
 	if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask))
 		return;
 
-	static_branch_enable(&amu_fie_key);
+	topology_set_scale_freq_source(&amu_sfd, amu_fie_cpus);
 
 	pr_debug("CPUs[%*pbl]: counters will be used for FIE.",
 		 cpumask_pr_args(cpus));
@@ -283,53 +319,6 @@ static int __init init_amu_fie(void)
 }
 core_initcall(init_amu_fie);
 
-bool arch_freq_counters_available(const struct cpumask *cpus)
-{
-	return amu_freq_invariant() &&
-	       cpumask_subset(cpus, amu_fie_cpus);
-}
-
-void topology_scale_freq_tick(void)
-{
-	u64 prev_core_cnt, prev_const_cnt;
-	u64 core_cnt, const_cnt, scale;
-	int cpu = smp_processor_id();
-
-	if (!amu_freq_invariant())
-		return;
-
-	if (!cpumask_test_cpu(cpu, amu_fie_cpus))
-		return;
-
-	prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
-	prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
-
-	update_freq_counters_refs();
-
-	const_cnt = this_cpu_read(arch_const_cycles_prev);
-	core_cnt = this_cpu_read(arch_core_cycles_prev);
-
-	if (unlikely(core_cnt <= prev_core_cnt ||
-		     const_cnt <= prev_const_cnt))
-		return;
-
-	/*
-	 *	    /\core    arch_max_freq_scale
-	 * scale =  ------- * --------------------
-	 *	    /\const   SCHED_CAPACITY_SCALE
-	 *
-	 * See validate_cpu_freq_invariance_counters() for details on
-	 * arch_max_freq_scale and the use of SCHED_CAPACITY_SHIFT.
-	 */
-	scale = core_cnt - prev_core_cnt;
-	scale *= this_cpu_read(arch_max_freq_scale);
-	scale = div64_u64(scale >> SCHED_CAPACITY_SHIFT,
-			  const_cnt - prev_const_cnt);
-
-	scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
-	this_cpu_write(freq_scale, (unsigned long)scale);
-}
-
 #ifdef CONFIG_ACPI_CPPC_LIB
 #include <acpi/cppc_acpi.h>
 
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index de8587cc119e..20b511949cd8 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -21,17 +21,65 @@
 #include <linux/sched.h>
 #include <linux/smp.h>
 
+static DEFINE_PER_CPU(struct scale_freq_data *, sft_data);
+static struct cpumask scale_freq_counters_mask;
+
+static bool supports_scale_freq_counters(const struct cpumask *cpus)
+{
+	return cpumask_subset(cpus, &scale_freq_counters_mask);
+}
+
 bool topology_scale_freq_invariant(void)
 {
 	return cpufreq_supports_freq_invariance() ||
-	       arch_freq_counters_available(cpu_online_mask);
+	       supports_scale_freq_counters(cpu_online_mask);
+}
+
+void topology_set_scale_freq_source(struct scale_freq_data *data,
+				    const struct cpumask *cpus)
+{
+	struct scale_freq_data *sfd;
+	int cpu;
+
+	for_each_cpu(cpu, cpus) {
+		sfd = per_cpu(sft_data, cpu);
+
+		/* Use ARCH provided counters whenever possible */
+		if (!sfd || sfd->source != SCALE_FREQ_SOURCE_ARCH) {
+			per_cpu(sft_data, cpu) = data;
+			cpumask_set_cpu(cpu, &scale_freq_counters_mask);
+		}
+	}
 }
+EXPORT_SYMBOL_GPL(topology_set_scale_freq_source);
 
-__weak bool arch_freq_counters_available(const struct cpumask *cpus)
+void topology_clear_scale_freq_source(enum scale_freq_source source,
+				      const struct cpumask *cpus)
 {
-	return false;
+	struct scale_freq_data *sfd;
+	int cpu;
+
+	for_each_cpu(cpu, cpus) {
+		sfd = per_cpu(sft_data, cpu);
+
+		if (sfd && sfd->source == source) {
+			per_cpu(sft_data, cpu) = NULL;
+			cpumask_clear_cpu(cpu, &scale_freq_counters_mask);
+		}
+	}
 }
+EXPORT_SYMBOL_GPL(topology_clear_scale_freq_source);
+
+void topology_scale_freq_tick(void)
+{
+	struct scale_freq_data *sfd = *this_cpu_ptr(&sft_data);
+
+	if (sfd)
+		sfd->set_freq_scale();
+}
+
 DEFINE_PER_CPU(unsigned long, freq_scale) = SCHED_CAPACITY_SCALE;
+EXPORT_SYMBOL_GPL(freq_scale);
 
 void topology_set_freq_scale(const struct cpumask *cpus, unsigned long cur_freq,
 			     unsigned long max_freq)
@@ -47,7 +95,7 @@ void topology_set_freq_scale(const struct cpumask *cpus, unsigned long cur_freq,
 	 * want to update the scale factor with information from CPUFREQ.
 	 * Instead the scale factor will be updated from arch_scale_freq_tick.
 	 */
-	if (arch_freq_counters_available(cpus))
+	if (supports_scale_freq_counters(cpus))
 		return;
 
 	scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index 0f6cd6b73a61..3bcfba5c21a7 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -34,7 +34,19 @@ void topology_set_freq_scale(const struct cpumask *cpus, unsigned long cur_freq,
 			     unsigned long max_freq);
 bool topology_scale_freq_invariant(void);
 
-bool arch_freq_counters_available(const struct cpumask *cpus);
+enum scale_freq_source {
+	SCALE_FREQ_SOURCE_CPUFREQ = 0,
+	SCALE_FREQ_SOURCE_ARCH,
+};
+
+struct scale_freq_data {
+	enum scale_freq_source source;
+	void (*set_freq_scale)(void);
+};
+
+void topology_scale_freq_tick(void);
+void topology_set_scale_freq_source(struct scale_freq_data *data, const struct cpumask *cpus);
+void topology_clear_scale_freq_source(enum scale_freq_source source, const struct cpumask *cpus);
 
 DECLARE_PER_CPU(unsigned long, thermal_pressure);
 
-- 
2.25.0.rc1.19.g042ed3e048af


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH V3 2/2] cpufreq: cppc: Add support for frequency invariance
  2021-01-28 10:48 [PATCH V3 0/2] cpufreq: cppc: Add support for frequency invariance Viresh Kumar
  2021-01-28 10:48 ` [PATCH V3 1/2] topology: Allow multiple entities to provide sched_freq_tick() callback Viresh Kumar
@ 2021-01-28 10:48 ` Viresh Kumar
  2021-02-18 16:35   ` Ionela Voinescu
  1 sibling, 1 reply; 19+ messages in thread
From: Viresh Kumar @ 2021-01-28 10:48 UTC (permalink / raw)
  To: Ionela Voinescu, Rafael Wysocki, Catalin Marinas, Will Deacon
  Cc: Viresh Kumar, Vincent Guittot, linux-kernel, linux-arm-kernel,
	linux-pm, Sudeep Holla, Greg Kroah-Hartman

The Frequency Invariance Engine (FIE) is providing a frequency scaling
correction factor that helps achieve more accurate load-tracking.

Normally, this scaling factor can be obtained directly with the help of
the cpufreq drivers as they know the exact frequency the hardware is
running at. But that isn't the case for CPPC cpufreq driver.

Another way of obtaining that is using the arch specific counter
support, which is already present in kernel, but that hardware is
optional for platforms.

This patch thus obtains this scaling factor using the existing logic
present in the cppc driver. Note that the arch specific counters have
higher priority than CPPC counters if available, though the CPPC driver
doesn't need to have any special handling for that.

This also exports sched_setattr_nocheck() as the CPPC driver can be
built as a module.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cppc_cpufreq.c | 179 ++++++++++++++++++++++++++++++---
 include/linux/arch_topology.h  |   1 +
 kernel/sched/core.c            |   1 +
 3 files changed, 169 insertions(+), 12 deletions(-)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index 8a482c434ea6..53815f6d2797 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -10,14 +10,18 @@
 
 #define pr_fmt(fmt)	"CPPC Cpufreq:"	fmt
 
+#include <linux/arch_topology.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/delay.h>
 #include <linux/cpu.h>
 #include <linux/cpufreq.h>
 #include <linux/dmi.h>
+#include <linux/irq_work.h>
+#include <linux/kthread.h>
 #include <linux/time.h>
 #include <linux/vmalloc.h>
+#include <uapi/linux/sched/types.h>
 
 #include <asm/unaligned.h>
 
@@ -39,6 +43,17 @@ static LIST_HEAD(cpu_data_list);
 
 static bool boost_supported;
 
+struct cppc_freq_invariance {
+	struct kthread_worker *worker;
+	struct irq_work irq_work;
+	struct kthread_work work;
+	struct cppc_perf_fb_ctrs prev_perf_fb_ctrs;
+	struct cppc_cpudata *cpu_data;
+	unsigned int max_freq;
+};
+
+static DEFINE_PER_CPU(struct cppc_freq_invariance, cppc_f_i);
+
 struct cppc_workaround_oem_info {
 	char oem_id[ACPI_OEM_ID_SIZE + 1];
 	char oem_table_id[ACPI_OEM_TABLE_ID_SIZE + 1];
@@ -292,7 +307,7 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
 	unsigned int cpu = policy->cpu;
 	struct cppc_cpudata *cpu_data;
 	struct cppc_perf_caps *caps;
-	int ret;
+	int ret, i;
 
 	cpu_data = cppc_cpufreq_get_cpu_data(cpu);
 	if (!cpu_data) {
@@ -343,6 +358,11 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
 		return -EFAULT;
 	}
 
+	for_each_cpu(i, policy->cpus) {
+		per_cpu(cppc_f_i, i).max_freq = policy->cpuinfo.max_freq;
+		per_cpu(cppc_f_i, i).cpu_data = cpu_data;
+	}
+
 	/*
 	 * If 'highest_perf' is greater than 'nominal_perf', we assume CPU Boost
 	 * is supported.
@@ -370,12 +390,12 @@ static inline u64 get_delta(u64 t1, u64 t0)
 	return (u32)t1 - (u32)t0;
 }
 
-static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu_data,
-				     struct cppc_perf_fb_ctrs fb_ctrs_t0,
-				     struct cppc_perf_fb_ctrs fb_ctrs_t1)
+static int cppc_perf_from_fbctrs(struct cppc_cpudata *cpu_data,
+				 struct cppc_perf_fb_ctrs fb_ctrs_t0,
+				 struct cppc_perf_fb_ctrs fb_ctrs_t1)
 {
 	u64 delta_reference, delta_delivered;
-	u64 reference_perf, delivered_perf;
+	u64 reference_perf;
 
 	reference_perf = fb_ctrs_t0.reference_perf;
 
@@ -385,11 +405,20 @@ static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu_data,
 				    fb_ctrs_t0.delivered);
 
 	/* Check to avoid divide-by zero */
-	if (delta_reference || delta_delivered)
-		delivered_perf = (reference_perf * delta_delivered) /
-					delta_reference;
-	else
-		delivered_perf = cpu_data->perf_ctrls.desired_perf;
+	if (!delta_reference && !delta_delivered)
+		return cpu_data->perf_ctrls.desired_perf;
+
+	return (reference_perf * delta_delivered) / delta_reference;
+}
+
+static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu_data,
+				     struct cppc_perf_fb_ctrs fb_ctrs_t0,
+				     struct cppc_perf_fb_ctrs fb_ctrs_t1)
+{
+	u64 delivered_perf;
+
+	delivered_perf = cppc_perf_from_fbctrs(cpu_data, fb_ctrs_t0,
+					       fb_ctrs_t1);
 
 	return cppc_cpufreq_perf_to_khz(cpu_data, delivered_perf);
 }
@@ -420,7 +449,7 @@ static int cppc_cpufreq_set_boost(struct cpufreq_policy *policy, int state)
 {
 	struct cppc_cpudata *cpu_data = policy->driver_data;
 	struct cppc_perf_caps *caps = &cpu_data->perf_caps;
-	int ret;
+	int ret, i;
 
 	if (!boost_supported) {
 		pr_err("BOOST not supported by CPU or firmware\n");
@@ -435,6 +464,9 @@ static int cppc_cpufreq_set_boost(struct cpufreq_policy *policy, int state)
 						       caps->nominal_perf);
 	policy->cpuinfo.max_freq = policy->max;
 
+	for_each_cpu(i, policy->related_cpus)
+		per_cpu(cppc_f_i, i).max_freq = policy->cpuinfo.max_freq;
+
 	ret = freq_qos_update_request(policy->max_freq_req, policy->max);
 	if (ret < 0)
 		return ret;
@@ -512,8 +544,126 @@ static void cppc_check_hisi_workaround(void)
 	acpi_put_table(tbl);
 }
 
+static void cppc_scale_freq_workfn(struct kthread_work *work)
+{
+	struct cppc_freq_invariance *cppc_fi;
+	struct cppc_perf_fb_ctrs fb_ctrs = {0};
+	int cpu = raw_smp_processor_id();
+	struct cppc_cpudata *cpu_data;
+	u64 perf;
+
+	cppc_fi = container_of(work, struct cppc_freq_invariance, work);
+	cpu_data = cppc_fi->cpu_data;
+
+	if (cppc_get_perf_ctrs(cpu, &fb_ctrs)) {
+		pr_info("%s: cppc_get_perf_ctrs() failed\n", __func__);
+		return;
+	}
+
+	cppc_fi->prev_perf_fb_ctrs = fb_ctrs;
+	perf = cppc_perf_from_fbctrs(cpu_data, cppc_fi->prev_perf_fb_ctrs,
+				     fb_ctrs);
+
+	perf <<= SCHED_CAPACITY_SHIFT;
+	per_cpu(freq_scale, cpu) = div64_u64(perf, cpu_data->perf_caps.highest_perf);
+}
+
+static void cppc_irq_work(struct irq_work *irq_work)
+{
+	struct cppc_freq_invariance *cppc_fi;
+
+	cppc_fi = container_of(irq_work, struct cppc_freq_invariance, irq_work);
+	kthread_queue_work(cppc_fi->worker, &cppc_fi->work);
+}
+
+static void cppc_scale_freq_tick(void)
+{
+	struct cppc_freq_invariance *cppc_fi = &per_cpu(cppc_f_i, raw_smp_processor_id());
+
+	/*
+	 * cppc_get_perf_ctrs() can potentially sleep, call that from the right
+	 * context.
+	 */
+	irq_work_queue(&cppc_fi->irq_work);
+}
+
+static struct scale_freq_data cppc_sftd = {
+	.source = SCALE_FREQ_SOURCE_CPPC,
+	.set_freq_scale = cppc_scale_freq_tick,
+};
+
+static void cppc_freq_invariance_exit(void)
+{
+	struct cppc_freq_invariance *cppc_fi;
+	int i;
+
+	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
+		return;
+
+	topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, cpu_present_mask);
+
+	for_each_possible_cpu(i) {
+		cppc_fi = &per_cpu(cppc_f_i, i);
+		if (cppc_fi->worker) {
+			irq_work_sync(&cppc_fi->irq_work);
+			kthread_destroy_worker(cppc_fi->worker);
+			cppc_fi->worker = NULL;
+		}
+	}
+}
+
+static void __init cppc_freq_invariance_init(void)
+{
+	struct cppc_perf_fb_ctrs fb_ctrs = {0};
+	struct cppc_freq_invariance *cppc_fi;
+	struct sched_attr attr = {
+		.size		= sizeof(struct sched_attr),
+		.sched_policy	= SCHED_DEADLINE,
+		.sched_nice	= 0,
+		.sched_priority	= 0,
+		/*
+		 * Fake (unused) bandwidth; workaround to "fix"
+		 * priority inheritance.
+		 */
+		.sched_runtime	= 1000000,
+		.sched_deadline = 10000000,
+		.sched_period	= 10000000,
+	};
+	struct kthread_worker *worker;
+	int i, ret;
+
+	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
+		return;
+
+	for_each_possible_cpu(i) {
+		cppc_fi = &per_cpu(cppc_f_i, i);
+
+		kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
+		init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
+		worker = kthread_create_worker_on_cpu(i, 0, "cppc:%d", i);
+		if (IS_ERR(worker))
+			return cppc_freq_invariance_exit();
+
+		cppc_fi->worker = worker;
+		ret = sched_setattr_nocheck(worker->task, &attr);
+		if (ret) {
+			pr_warn("%s: failed to set SCHED_DEADLINE\n", __func__);
+			return cppc_freq_invariance_exit();
+		}
+
+		ret = cppc_get_perf_ctrs(i, &fb_ctrs);
+		if (!ret)
+			per_cpu(cppc_fi->prev_perf_fb_ctrs, i) = fb_ctrs;
+	}
+
+	/* Register for freq-invariance */
+	topology_set_scale_freq_source(&cppc_sftd, cpu_present_mask);
+}
+
 static int __init cppc_cpufreq_init(void)
 {
+	int ret;
+
 	if ((acpi_disabled) || !acpi_cpc_valid())
 		return -ENODEV;
 
@@ -521,7 +671,11 @@ static int __init cppc_cpufreq_init(void)
 
 	cppc_check_hisi_workaround();
 
-	return cpufreq_register_driver(&cppc_cpufreq_driver);
+	ret = cpufreq_register_driver(&cppc_cpufreq_driver);
+	if (!ret)
+		cppc_freq_invariance_init();
+
+	return ret;
 }
 
 static inline void free_cpu_data(void)
@@ -538,6 +692,7 @@ static inline void free_cpu_data(void)
 
 static void __exit cppc_cpufreq_exit(void)
 {
+	cppc_freq_invariance_exit();
 	cpufreq_unregister_driver(&cppc_cpufreq_driver);
 
 	free_cpu_data();
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index 3bcfba5c21a7..47ac4b41c28d 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -37,6 +37,7 @@ bool topology_scale_freq_invariant(void);
 enum scale_freq_source {
 	SCALE_FREQ_SOURCE_CPUFREQ = 0,
 	SCALE_FREQ_SOURCE_ARCH,
+	SCALE_FREQ_SOURCE_CPPC,
 };
 
 struct scale_freq_data {
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8c5481077c9c..85d1d23951ba 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6190,6 +6190,7 @@ int sched_setattr_nocheck(struct task_struct *p, const struct sched_attr *attr)
 {
 	return __sched_setscheduler(p, attr, false, true);
 }
+EXPORT_SYMBOL_GPL(sched_setattr_nocheck);
 
 /**
  * sched_setscheduler_nocheck - change the scheduling policy and/or RT priority of a thread from kernelspace.
-- 
2.25.0.rc1.19.g042ed3e048af


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH V3 1/2] topology: Allow multiple entities to provide sched_freq_tick() callback
  2021-01-28 10:48 ` [PATCH V3 1/2] topology: Allow multiple entities to provide sched_freq_tick() callback Viresh Kumar
@ 2021-02-03 11:45   ` Ionela Voinescu
  2021-02-05  9:14     ` Viresh Kumar
  0 siblings, 1 reply; 19+ messages in thread
From: Ionela Voinescu @ 2021-02-03 11:45 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Catalin Marinas, Will Deacon, Vincent Guittot,
	linux-kernel, linux-arm-kernel, linux-pm, Sudeep Holla,
	Greg Kroah-Hartman

Hi Viresh,

Many thanks for the renaming of functions/variables/enums.

I've cropped all the code that looks good to me, and I kept some
portions of interest.

On Thursday 28 Jan 2021 at 16:18:55 (+0530), Viresh Kumar wrote:
> This patch attempts to make it generic enough so other parts of the
> kernel can also provide their own implementation of scale_freq_tick()
> callback, which is called by the scheduler periodically to update the
> per-cpu freq_scale variable.
[..]
>  static void amu_fie_setup(const struct cpumask *cpus)
>  {
> @@ -227,7 +263,7 @@ static void amu_fie_setup(const struct cpumask *cpus)
>  	if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask))
>  		return;
>  
> -	static_branch_enable(&amu_fie_key);
> +	topology_set_scale_freq_source(&amu_sfd, amu_fie_cpus);
>  
>  	pr_debug("CPUs[%*pbl]: counters will be used for FIE.",
>  		 cpumask_pr_args(cpus));
> @@ -283,53 +319,6 @@ static int __init init_amu_fie(void)
>  }
>  core_initcall(init_amu_fie);
[..]
> +void topology_set_scale_freq_source(struct scale_freq_data *data,
> +				    const struct cpumask *cpus)
> +{
> +	struct scale_freq_data *sfd;
> +	int cpu;
> +
> +	for_each_cpu(cpu, cpus) {
> +		sfd = per_cpu(sft_data, cpu);
> +
> +		/* Use ARCH provided counters whenever possible */
> +		if (!sfd || sfd->source != SCALE_FREQ_SOURCE_ARCH) {
> +			per_cpu(sft_data, cpu) = data;
> +			cpumask_set_cpu(cpu, &scale_freq_counters_mask);
> +		}
> +	}
>  }
> +EXPORT_SYMBOL_GPL(topology_set_scale_freq_source);
[..]

I have one single comment left for this patch, and apologies in advance
for the long story.

In general, for frequency invariance, we're interested in whether the full
system is frequency invariant as well, for two reasons:
 - We want to be able to either set a scale factor for all CPUs or none
   at all.
 - If at some point during the boot process the system invariance status
   changes, we want to be able to inform dependents: schedutil and EAS.

This management is currently done on amu_fie_setup(), because before
these patches we only had two sources for frequency invariance: cpufreq
and AMU counters. But that's not enough after these two patches, from
both functional and code design point of view.

I have to mention that your code will work as it is for now, but only
because CPPC is the new other source of counters, and for CPPC we also
have cpufreq invariance available. But this only hides what can become a
problem later: if in the future we won't have cpufreq invariance for
CPPC or if another provider of counters is added and used in a system
without cpufreq invariance, the late initialization of these counters
will either break schedutil/scheduler if not all CPUs support those
counters, or keep EAS disabled, even if all CPUs support these counters,
and frequency invariance is later enabled.

Therefore, I think system level invariance management (checks and
call to rebuild_sched_domains_energy()) also needs to move from arm64
code to arch_topology code.

Thanks,
Ionela.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V3 1/2] topology: Allow multiple entities to provide sched_freq_tick() callback
  2021-02-03 11:45   ` Ionela Voinescu
@ 2021-02-05  9:14     ` Viresh Kumar
  2021-02-17  0:24       ` Ionela Voinescu
  0 siblings, 1 reply; 19+ messages in thread
From: Viresh Kumar @ 2021-02-05  9:14 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: Rafael Wysocki, Catalin Marinas, Will Deacon, Vincent Guittot,
	linux-kernel, linux-arm-kernel, linux-pm, Sudeep Holla,
	Greg Kroah-Hartman

On 03-02-21, 11:45, Ionela Voinescu wrote:
> Therefore, I think system level invariance management (checks and
> call to rebuild_sched_domains_energy()) also needs to move from arm64
> code to arch_topology code.

Here is the 3rd patch of this series then :)

From: Viresh Kumar <viresh.kumar@linaro.org>
Date: Fri, 5 Feb 2021 13:31:53 +0530
Subject: [PATCH] drivers: arch_topology: rebuild sched domains on invariance
 change

We already do this for the arm64, move it to arch_topology.c as we
manage all sched_freq_tick sources here now.

Reported-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 arch/arm64/kernel/topology.c | 16 ----------------
 drivers/base/arch_topology.c | 22 ++++++++++++++++++++++
 2 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 1e47dfd465f8..47fca7376c93 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -240,7 +240,6 @@ static struct scale_freq_data amu_sfd = {
 
 static void amu_fie_setup(const struct cpumask *cpus)
 {
-	bool invariant;
 	int cpu;
 
 	/* We are already set since the last insmod of cpufreq driver */
@@ -257,25 +256,10 @@ static void amu_fie_setup(const struct cpumask *cpus)
 
 	cpumask_or(amu_fie_cpus, amu_fie_cpus, cpus);
 
-	invariant = topology_scale_freq_invariant();
-
-	/* We aren't fully invariant yet */
-	if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask))
-		return;
-
 	topology_set_scale_freq_source(&amu_sfd, amu_fie_cpus);
 
 	pr_debug("CPUs[%*pbl]: counters will be used for FIE.",
 		 cpumask_pr_args(cpus));
-
-	/*
-	 * Task scheduler behavior depends on frequency invariance support,
-	 * either cpufreq or counter driven. If the support status changes as
-	 * a result of counter initialisation and use, retrigger the build of
-	 * scheduling domains to ensure the information is propagated properly.
-	 */
-	if (!invariant)
-		rebuild_sched_domains_energy();
 }
 
 static int init_amu_fie_callback(struct notifier_block *nb, unsigned long val,
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 20b511949cd8..3631877f4440 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -23,6 +23,7 @@
 
 static DEFINE_PER_CPU(struct scale_freq_data *, sft_data);
 static struct cpumask scale_freq_counters_mask;
+static bool scale_freq_invariant;
 
 static bool supports_scale_freq_counters(const struct cpumask *cpus)
 {
@@ -35,6 +36,23 @@ bool topology_scale_freq_invariant(void)
 	       supports_scale_freq_counters(cpu_online_mask);
 }
 
+static void update_scale_freq_invariant(bool status)
+{
+	if (scale_freq_invariant == status)
+		return;
+
+	/*
+	 * Task scheduler behavior depends on frequency invariance support,
+	 * either cpufreq or counter driven. If the support status changes as
+	 * a result of counter initialisation and use, retrigger the build of
+	 * scheduling domains to ensure the information is propagated properly.
+	 */
+	if (topology_scale_freq_invariant() == status) {
+		scale_freq_invariant = status;
+		rebuild_sched_domains_energy();
+	}
+}
+
 void topology_set_scale_freq_source(struct scale_freq_data *data,
 				    const struct cpumask *cpus)
 {
@@ -50,6 +68,8 @@ void topology_set_scale_freq_source(struct scale_freq_data *data,
 			cpumask_set_cpu(cpu, &scale_freq_counters_mask);
 		}
 	}
+
+	update_scale_freq_invariant(true);
 }
 EXPORT_SYMBOL_GPL(topology_set_scale_freq_source);
 
@@ -67,6 +87,8 @@ void topology_clear_scale_freq_source(enum scale_freq_source source,
 			cpumask_clear_cpu(cpu, &scale_freq_counters_mask);
 		}
 	}
+
+	update_scale_freq_invariant(false);
 }
 EXPORT_SYMBOL_GPL(topology_clear_scale_freq_source);
 
-- 
2.25.0.rc1.19.g042ed3e048af

-- 
viresh

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH V3 1/2] topology: Allow multiple entities to provide sched_freq_tick() callback
  2021-02-05  9:14     ` Viresh Kumar
@ 2021-02-17  0:24       ` Ionela Voinescu
  2021-02-17  4:25         ` Viresh Kumar
  2021-02-18  9:33         ` Viresh Kumar
  0 siblings, 2 replies; 19+ messages in thread
From: Ionela Voinescu @ 2021-02-17  0:24 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Catalin Marinas, Will Deacon, Vincent Guittot,
	linux-kernel, linux-arm-kernel, linux-pm, Sudeep Holla,
	Greg Kroah-Hartman

Hey,

On Friday 05 Feb 2021 at 14:44:24 (+0530), Viresh Kumar wrote:
> On 03-02-21, 11:45, Ionela Voinescu wrote:
> > Therefore, I think system level invariance management (checks and
> > call to rebuild_sched_domains_energy()) also needs to move from arm64
> > code to arch_topology code.
> 
> Here is the 3rd patch of this series then :)
> 

I think it could be merged in patch 1/2 as it's part of enabling the use
of multiple sources of information for FIE. Up to you!

> From: Viresh Kumar <viresh.kumar@linaro.org>
> Date: Fri, 5 Feb 2021 13:31:53 +0530
> Subject: [PATCH] drivers: arch_topology: rebuild sched domains on invariance
>  change
> 
> We already do this for the arm64, move it to arch_topology.c as we
> manage all sched_freq_tick sources here now.
> 
> Reported-by: Ionela Voinescu <ionela.voinescu@arm.com>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  arch/arm64/kernel/topology.c | 16 ----------------
>  drivers/base/arch_topology.c | 22 ++++++++++++++++++++++
>  2 files changed, 22 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 1e47dfd465f8..47fca7376c93 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -240,7 +240,6 @@ static struct scale_freq_data amu_sfd = {
>  
>  static void amu_fie_setup(const struct cpumask *cpus)
>  {
> -	bool invariant;
>  	int cpu;
>  
>  	/* We are already set since the last insmod of cpufreq driver */
> @@ -257,25 +256,10 @@ static void amu_fie_setup(const struct cpumask *cpus)
>  
>  	cpumask_or(amu_fie_cpus, amu_fie_cpus, cpus);
>  
> -	invariant = topology_scale_freq_invariant();
> -
> -	/* We aren't fully invariant yet */
> -	if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask))
> -		return;
> -

You still need these checks, otherwise you could end up with only part
of the CPUs setting a scale factor, when only part of the CPUs support
AMUs and there is no cpufreq support for FIE.

>  	topology_set_scale_freq_source(&amu_sfd, amu_fie_cpus);
>  
>  	pr_debug("CPUs[%*pbl]: counters will be used for FIE.",
>  		 cpumask_pr_args(cpus));
> -
> -	/*
> -	 * Task scheduler behavior depends on frequency invariance support,
> -	 * either cpufreq or counter driven. If the support status changes as
> -	 * a result of counter initialisation and use, retrigger the build of
> -	 * scheduling domains to ensure the information is propagated properly.
> -	 */
> -	if (!invariant)
> -		rebuild_sched_domains_energy();
>  }
>  
>  static int init_amu_fie_callback(struct notifier_block *nb, unsigned long val,
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index 20b511949cd8..3631877f4440 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -23,6 +23,7 @@
>  
>  static DEFINE_PER_CPU(struct scale_freq_data *, sft_data);
>  static struct cpumask scale_freq_counters_mask;
> +static bool scale_freq_invariant;
>  
>  static bool supports_scale_freq_counters(const struct cpumask *cpus)
>  {
> @@ -35,6 +36,23 @@ bool topology_scale_freq_invariant(void)
>  	       supports_scale_freq_counters(cpu_online_mask);
>  }
>  
> +static void update_scale_freq_invariant(bool status)
> +{
> +	if (scale_freq_invariant == status)
> +		return;
> +
> +	/*
> +	 * Task scheduler behavior depends on frequency invariance support,
> +	 * either cpufreq or counter driven. If the support status changes as
> +	 * a result of counter initialisation and use, retrigger the build of
> +	 * scheduling domains to ensure the information is propagated properly.
> +	 */
> +	if (topology_scale_freq_invariant() == status) {
> +		scale_freq_invariant = status;
> +		rebuild_sched_domains_energy();
> +	}
> +}
> +
>  void topology_set_scale_freq_source(struct scale_freq_data *data,
>  				    const struct cpumask *cpus)
>  {
> @@ -50,6 +68,8 @@ void topology_set_scale_freq_source(struct scale_freq_data *data,
>  			cpumask_set_cpu(cpu, &scale_freq_counters_mask);
>  		}
>  	}

Small(ish) optimisation at the beginning of this function:

    if (cpumask_empty(&scale_freq_counters_mask))
        scale_freq_invariant = topology_scale_freq_invariant();

This will save you a call to rebuild_sched_domains_energy(), which is
quite expensive, when cpufreq supports FIE and we also have counters.


After comments addressed,

Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>

..for 1/2 and this addition.


Thanks,
Ionela.

> +
> +	update_scale_freq_invariant(true);
>  }
>  EXPORT_SYMBOL_GPL(topology_set_scale_freq_source);
>  
> @@ -67,6 +87,8 @@ void topology_clear_scale_freq_source(enum scale_freq_source source,
>  			cpumask_clear_cpu(cpu, &scale_freq_counters_mask);
>  		}
>  	}
> +
> +	update_scale_freq_invariant(false);
>  }
>  EXPORT_SYMBOL_GPL(topology_clear_scale_freq_source);
>  
> -- 
> 2.25.0.rc1.19.g042ed3e048af
> 
> -- 
> viresh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V3 1/2] topology: Allow multiple entities to provide sched_freq_tick() callback
  2021-02-17  0:24       ` Ionela Voinescu
@ 2021-02-17  4:25         ` Viresh Kumar
  2021-02-17 11:30           ` Ionela Voinescu
  2021-02-18  9:33         ` Viresh Kumar
  1 sibling, 1 reply; 19+ messages in thread
From: Viresh Kumar @ 2021-02-17  4:25 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: Rafael Wysocki, Catalin Marinas, Will Deacon, Vincent Guittot,
	linux-kernel, linux-arm-kernel, linux-pm, Sudeep Holla,
	Greg Kroah-Hartman

On 17-02-21, 00:24, Ionela Voinescu wrote:
> I think it could be merged in patch 1/2 as it's part of enabling the use
> of multiple sources of information for FIE. Up to you!

Sure.

> >  static void amu_fie_setup(const struct cpumask *cpus)
> >  {
> > -	bool invariant;
> >  	int cpu;
> >  
> >  	/* We are already set since the last insmod of cpufreq driver */
> > @@ -257,25 +256,10 @@ static void amu_fie_setup(const struct cpumask *cpus)
> >  
> >  	cpumask_or(amu_fie_cpus, amu_fie_cpus, cpus);
> >  
> > -	invariant = topology_scale_freq_invariant();
> > -
> > -	/* We aren't fully invariant yet */
> > -	if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask))
> > -		return;
> > -
> 
> You still need these checks, otherwise you could end up with only part
> of the CPUs setting a scale factor, when only part of the CPUs support
> AMUs and there is no cpufreq support for FIE.

Both supports_scale_freq_counters() and topology_scale_freq_invariant() take
care of this now and they will keep reporting the system as invariant until the
time all the CPUs have counters (in absence of cpufreq).

The topology_set_scale_freq_source() API is supposed to be called multiple
times, probably once for each policy and so I don't see a need of these checks
anymore.

> Small(ish) optimisation at the beginning of this function:
> 
>     if (cpumask_empty(&scale_freq_counters_mask))
>         scale_freq_invariant = topology_scale_freq_invariant();
> 
> This will save you a call to rebuild_sched_domains_energy(), which is
> quite expensive, when cpufreq supports FIE and we also have counters.

Good Point.
 
> After comments addressed,
> 
> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>

Thanks.

> Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>

Just out of curiosity, what exactly did you test and what was the setup ? :)

-- 
viresh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V3 1/2] topology: Allow multiple entities to provide sched_freq_tick() callback
  2021-02-17  4:25         ` Viresh Kumar
@ 2021-02-17 11:30           ` Ionela Voinescu
  2021-02-17 11:40             ` Viresh Kumar
  0 siblings, 1 reply; 19+ messages in thread
From: Ionela Voinescu @ 2021-02-17 11:30 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Catalin Marinas, Will Deacon, Vincent Guittot,
	linux-kernel, linux-arm-kernel, linux-pm, Sudeep Holla,
	Greg Kroah-Hartman

Hi,

Replying this first as it's going to be relevant below:

> Just out of curiosity, what exactly did you test and what was the setup ? :)

I tested it on:

 - Juno R0 (CPUs [0, 3-5] are littles, CPUs [1-2] are bigs)
   + PMUs faking AMUs
   + userspace/schedutil +
   + cpufreq-FIE/!cpufreq-FIE
   + DT

This testing did not yet cover patch 2/2.

My checklist shows:
 - system invariance status correct - passed
 - scale factor correct (userspace cpufreq governor) - passed
 - arch_set_freq_scale bypassed - passed
 - partial "AMUs" support - failed (see below)
 - EAS enabling - passed

I don't have an automated process for this as many test cases involve
kernel source changes. In time I will automate all of these and
possibly cover all scenarios with FVP (fast models) testing, but for
now human error is possible :).

On Wednesday 17 Feb 2021 at 09:55:58 (+0530), Viresh Kumar wrote:
> On 17-02-21, 00:24, Ionela Voinescu wrote:
> > I think it could be merged in patch 1/2 as it's part of enabling the use
> > of multiple sources of information for FIE. Up to you!
> 
> Sure.
> 
> > >  static void amu_fie_setup(const struct cpumask *cpus)
> > >  {
> > > -	bool invariant;
> > >  	int cpu;
> > >  
> > >  	/* We are already set since the last insmod of cpufreq driver */
> > > @@ -257,25 +256,10 @@ static void amu_fie_setup(const struct cpumask *cpus)
> > >  
> > >  	cpumask_or(amu_fie_cpus, amu_fie_cpus, cpus);
> > >  
> > > -	invariant = topology_scale_freq_invariant();
> > > -
> > > -	/* We aren't fully invariant yet */
> > > -	if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask))
> > > -		return;
> > > -
> > 
> > You still need these checks, otherwise you could end up with only part
> > of the CPUs setting a scale factor, when only part of the CPUs support
> > AMUs and there is no cpufreq support for FIE.
> 
> Both supports_scale_freq_counters() and topology_scale_freq_invariant() take
> care of this now and they will keep reporting the system as invariant until the
> time all the CPUs have counters (in absence of cpufreq).
> 

Correct!

> The topology_set_scale_freq_source() API is supposed to be called multiple
> times, probably once for each policy and so I don't see a need of these checks
> anymore.
> 

The problem is not topology_scale_freq_invariant() but whether a scale
factor is set for some CPUs.

Scenario (test system above):
 - "AMUs" are only supported for [1-2],
 - cpufreq_supports_freq_invariance() -> false

What should happen:
 - topology_scale_freq_invariant() -> false (passed)
 - all CPUs should have their freq_scale unmodified (1024) - (failed)
   because only 2 out of 6 CPUs have a method of setting a scale factor

What does happen:
 - arch_set_freq_tick() -> topology_set_freq_tick() will set a scale
   factor for [1-2] based on AMUs. This should not happen. We will end
   up with invariant signals for bigs and signals that are not freq
   invariant for littles.

Ionela.

> > Small(ish) optimisation at the beginning of this function:
> > 
> >     if (cpumask_empty(&scale_freq_counters_mask))
> >         scale_freq_invariant = topology_scale_freq_invariant();
> > 
> > This will save you a call to rebuild_sched_domains_energy(), which is
> > quite expensive, when cpufreq supports FIE and we also have counters.
> 
> Good Point.
>  
> > After comments addressed,
> > 
> > Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
> 
> Thanks.
> 
> > Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
> 
> 

> -- 
> viresh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V3 1/2] topology: Allow multiple entities to provide sched_freq_tick() callback
  2021-02-17 11:30           ` Ionela Voinescu
@ 2021-02-17 11:40             ` Viresh Kumar
  2021-02-17 11:57               ` Ionela Voinescu
  0 siblings, 1 reply; 19+ messages in thread
From: Viresh Kumar @ 2021-02-17 11:40 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: Rafael Wysocki, Catalin Marinas, Will Deacon, Vincent Guittot,
	linux-kernel, linux-arm-kernel, linux-pm, Sudeep Holla,
	Greg Kroah-Hartman

On 17-02-21, 11:30, Ionela Voinescu wrote:
> The problem is not topology_scale_freq_invariant() but whether a scale
> factor is set for some CPUs.
> 
> Scenario (test system above):
>  - "AMUs" are only supported for [1-2],
>  - cpufreq_supports_freq_invariance() -> false
> 
> What should happen:
>  - topology_scale_freq_invariant() -> false (passed)
>  - all CPUs should have their freq_scale unmodified (1024) - (failed)
>    because only 2 out of 6 CPUs have a method of setting a scale factor
> 
> What does happen:
>  - arch_set_freq_tick() -> topology_set_freq_tick() will set a scale
>    factor for [1-2] based on AMUs. This should not happen. We will end
>    up with invariant signals for bigs and signals that are not freq
>    invariant for littles.

Another case. cpufreq is included as a module and AMU is implemented
partially.

- first time cpufreq driver is inserted, we set up everything and
  freq_scale gets updated on ticks.

- remove cpufreq driver, we are back in same situation.

We can't control it that way.. Or we add another call layer in middle
before the tick-handler gets called for AMU, which will check if we
are fully invariant or not ?

-- 
viresh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V3 1/2] topology: Allow multiple entities to provide sched_freq_tick() callback
  2021-02-17 11:40             ` Viresh Kumar
@ 2021-02-17 11:57               ` Ionela Voinescu
  2021-02-18  7:23                 ` Viresh Kumar
  0 siblings, 1 reply; 19+ messages in thread
From: Ionela Voinescu @ 2021-02-17 11:57 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Catalin Marinas, Will Deacon, Vincent Guittot,
	linux-kernel, linux-arm-kernel, linux-pm, Sudeep Holla,
	Greg Kroah-Hartman

On Wednesday 17 Feb 2021 at 17:10:27 (+0530), Viresh Kumar wrote:
> On 17-02-21, 11:30, Ionela Voinescu wrote:
> > The problem is not topology_scale_freq_invariant() but whether a scale
> > factor is set for some CPUs.
> > 
> > Scenario (test system above):
> >  - "AMUs" are only supported for [1-2],
> >  - cpufreq_supports_freq_invariance() -> false
> > 
> > What should happen:
> >  - topology_scale_freq_invariant() -> false (passed)
> >  - all CPUs should have their freq_scale unmodified (1024) - (failed)
> >    because only 2 out of 6 CPUs have a method of setting a scale factor
> > 
> > What does happen:
> >  - arch_set_freq_tick() -> topology_set_freq_tick() will set a scale
> >    factor for [1-2] based on AMUs. This should not happen. We will end
> >    up with invariant signals for bigs and signals that are not freq
> >    invariant for littles.
> 
> Another case. cpufreq is included as a module and AMU is implemented
> partially.
> 
> - first time cpufreq driver is inserted, we set up everything and
>   freq_scale gets updated on ticks.
> 
> - remove cpufreq driver, we are back in same situation.
> 

Yes, but the littles (lacking AMUs) would have had a scale factor set
through arch_set_freq_scale() which will correspond to the last
frequency change through the cpufreq driver. When removing the driver,
it's unlikely that the frequency of littles will change (no driver).
 - topology_scale_freq_invariant() will still return true.
 - littles would still  have a scale factor set which is likely accurate
 - bigs will continue updating the scale factor through AMUs.

See a very useful comment someone added recently :) :

"""
+	/*
+	 * We don't need to handle CPUFREQ_REMOVE_POLICY event as the AMU
+	 * counters don't have any dependency on cpufreq driver once we have
+	 * initialized AMU support and enabled invariance. The AMU counters will
+	 * keep on working just fine in the absence of the cpufreq driver, and
+	 * for the CPUs for which there are no counters available, the last set
+	 * value of freq_scale will remain valid as that is the frequency those
+	 * CPUs are running at.
+	 */
"""

> We can't control it that way.. Or we add another call layer in middle
> before the tick-handler gets called for AMU, which will check if we
> are fully invariant or not ?
> 

I would avoid additional work done on the tick, especially for a scenario which
is unlikely. If you think this case is worth supporting, it might be best to do
it at CPUFREQ_REMOVE_POLICY event.

Thanks,
Ionela.

> -- 
> viresh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V3 1/2] topology: Allow multiple entities to provide sched_freq_tick() callback
  2021-02-17 11:57               ` Ionela Voinescu
@ 2021-02-18  7:23                 ` Viresh Kumar
  0 siblings, 0 replies; 19+ messages in thread
From: Viresh Kumar @ 2021-02-18  7:23 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: Rafael Wysocki, Catalin Marinas, Will Deacon, Vincent Guittot,
	linux-kernel, linux-arm-kernel, linux-pm, Sudeep Holla,
	Greg Kroah-Hartman

On 17-02-21, 11:57, Ionela Voinescu wrote:
> See a very useful comment someone added recently :) :
> 
> """
> +	/*
> +	 * We don't need to handle CPUFREQ_REMOVE_POLICY event as the AMU
> +	 * counters don't have any dependency on cpufreq driver once we have
> +	 * initialized AMU support and enabled invariance. The AMU counters will
> +	 * keep on working just fine in the absence of the cpufreq driver, and
> +	 * for the CPUs for which there are no counters available, the last set
> +	 * value of freq_scale will remain valid as that is the frequency those
> +	 * CPUs are running at.
> +	 */
> """

Lol... 

-- 
viresh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V3 1/2] topology: Allow multiple entities to provide sched_freq_tick() callback
  2021-02-17  0:24       ` Ionela Voinescu
  2021-02-17  4:25         ` Viresh Kumar
@ 2021-02-18  9:33         ` Viresh Kumar
  2021-02-18 16:36           ` Ionela Voinescu
  1 sibling, 1 reply; 19+ messages in thread
From: Viresh Kumar @ 2021-02-18  9:33 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: Rafael Wysocki, Catalin Marinas, Will Deacon, Vincent Guittot,
	linux-kernel, linux-arm-kernel, linux-pm, Sudeep Holla,
	Greg Kroah-Hartman

On 17-02-21, 00:24, Ionela Voinescu wrote:
> > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> > index 1e47dfd465f8..47fca7376c93 100644
> > --- a/arch/arm64/kernel/topology.c
> > +++ b/arch/arm64/kernel/topology.c
> > @@ -240,7 +240,6 @@ static struct scale_freq_data amu_sfd = {
> >  
> >  static void amu_fie_setup(const struct cpumask *cpus)
> >  {
> > -	bool invariant;
> >  	int cpu;
> >  
> >  	/* We are already set since the last insmod of cpufreq driver */
> > @@ -257,25 +256,10 @@ static void amu_fie_setup(const struct cpumask *cpus)
> >  
> >  	cpumask_or(amu_fie_cpus, amu_fie_cpus, cpus);
> >  
> > -	invariant = topology_scale_freq_invariant();
> > -
> > -	/* We aren't fully invariant yet */
> > -	if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask))
> > -		return;
> > -
> 
> You still need these checks, otherwise you could end up with only part
> of the CPUs setting a scale factor, when only part of the CPUs support
> AMUs and there is no cpufreq support for FIE.

Another look at it and here goes another reason (hope I don't have
another in-code comment somewhere else to kill this one) :)

We don't need to care for the reason you gave (which is a valid reason
otherwise), as we are talking specifically about amu_fie_setup() here
and it gets called from cpufreq policy-notifier. i.e. we won't support
AMUs without cpufreq being there in the first place and the same goes
for cppc-driver.

Does that sound reasonable ?

-- 
viresh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V3 2/2] cpufreq: cppc: Add support for frequency invariance
  2021-01-28 10:48 ` [PATCH V3 2/2] cpufreq: cppc: Add support for frequency invariance Viresh Kumar
@ 2021-02-18 16:35   ` Ionela Voinescu
  2021-02-22 11:00     ` Ionela Voinescu
  0 siblings, 1 reply; 19+ messages in thread
From: Ionela Voinescu @ 2021-02-18 16:35 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Catalin Marinas, Will Deacon, Vincent Guittot,
	linux-kernel, linux-arm-kernel, linux-pm, Sudeep Holla,
	Greg Kroah-Hartman

Hi,

On Thursday 28 Jan 2021 at 16:18:56 (+0530), Viresh Kumar wrote:
> The Frequency Invariance Engine (FIE) is providing a frequency scaling
> correction factor that helps achieve more accurate load-tracking.
> 
> Normally, this scaling factor can be obtained directly with the help of
> the cpufreq drivers as they know the exact frequency the hardware is
> running at. But that isn't the case for CPPC cpufreq driver.
> 
> Another way of obtaining that is using the arch specific counter
> support, which is already present in kernel, but that hardware is
> optional for platforms.
> 
> This patch thus obtains this scaling factor using the existing logic
> present in the cppc driver. Note that the arch specific counters have
> higher priority than CPPC counters if available, though the CPPC driver
> doesn't need to have any special handling for that.
> 
> This also exports sched_setattr_nocheck() as the CPPC driver can be
> built as a module.
> 
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  drivers/cpufreq/cppc_cpufreq.c | 179 ++++++++++++++++++++++++++++++---
>  include/linux/arch_topology.h  |   1 +
>  kernel/sched/core.c            |   1 +
>  3 files changed, 169 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> index 8a482c434ea6..53815f6d2797 100644
> --- a/drivers/cpufreq/cppc_cpufreq.c
> +++ b/drivers/cpufreq/cppc_cpufreq.c
> @@ -10,14 +10,18 @@
>  
>  #define pr_fmt(fmt)	"CPPC Cpufreq:"	fmt
>  
> +#include <linux/arch_topology.h>
>  #include <linux/kernel.h>
>  #include <linux/module.h>
>  #include <linux/delay.h>
>  #include <linux/cpu.h>
>  #include <linux/cpufreq.h>
>  #include <linux/dmi.h>
> +#include <linux/irq_work.h>
> +#include <linux/kthread.h>
>  #include <linux/time.h>
>  #include <linux/vmalloc.h>
> +#include <uapi/linux/sched/types.h>
>  
>  #include <asm/unaligned.h>
>  
> @@ -39,6 +43,17 @@ static LIST_HEAD(cpu_data_list);
>  
>  static bool boost_supported;
>  
> +struct cppc_freq_invariance {
> +	struct kthread_worker *worker;
> +	struct irq_work irq_work;
> +	struct kthread_work work;
> +	struct cppc_perf_fb_ctrs prev_perf_fb_ctrs;
> +	struct cppc_cpudata *cpu_data;
> +	unsigned int max_freq;

You don't really need to store max_freq, actually, as you can (and you
do) use highest_perf directly in the cppc_scale_freq_workfn().

I think in an initial comment I mentioned you should store either
highest_perf or nominal_perf, depending on whether boost is
supported/enabled, but using highest_perf in all cases might be best.

The only difference is when boost is supported but not enabled, which
means the OS's requests will be capped to nominal_perf, but FW/HW
could still give as much as highest_perf. Even if FW/HW never does
exceed nominal_perf, there are mechanisms in the scheduler (after
Vincent's PELT rewrite) that ensure utilization can still reach 1024.

So if you always use highest_perf in cppc_scale_freq_workfn(), you
should not need to store max_freq.

> +};
> +
> +static DEFINE_PER_CPU(struct cppc_freq_invariance, cppc_f_i);
> +

nit: I would like s/cppc_f_i/cppc_freq_inv better

>  struct cppc_workaround_oem_info {
>  	char oem_id[ACPI_OEM_ID_SIZE + 1];
>  	char oem_table_id[ACPI_OEM_TABLE_ID_SIZE + 1];
> @@ -292,7 +307,7 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
>  	unsigned int cpu = policy->cpu;
>  	struct cppc_cpudata *cpu_data;
>  	struct cppc_perf_caps *caps;
> -	int ret;
> +	int ret, i;
>  
>  	cpu_data = cppc_cpufreq_get_cpu_data(cpu);
>  	if (!cpu_data) {
> @@ -343,6 +358,11 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
>  		return -EFAULT;
>  	}
>  
> +	for_each_cpu(i, policy->cpus) {
> +		per_cpu(cppc_f_i, i).max_freq = policy->cpuinfo.max_freq;
> +		per_cpu(cppc_f_i, i).cpu_data = cpu_data;
> +	}
> +
>  	/*
>  	 * If 'highest_perf' is greater than 'nominal_perf', we assume CPU Boost
>  	 * is supported.
> @@ -370,12 +390,12 @@ static inline u64 get_delta(u64 t1, u64 t0)
>  	return (u32)t1 - (u32)t0;
>  }
>  
> -static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu_data,
> -				     struct cppc_perf_fb_ctrs fb_ctrs_t0,
> -				     struct cppc_perf_fb_ctrs fb_ctrs_t1)
> +static int cppc_perf_from_fbctrs(struct cppc_cpudata *cpu_data,
> +				 struct cppc_perf_fb_ctrs fb_ctrs_t0,
> +				 struct cppc_perf_fb_ctrs fb_ctrs_t1)
>  {
>  	u64 delta_reference, delta_delivered;
> -	u64 reference_perf, delivered_perf;
> +	u64 reference_perf;
>  
>  	reference_perf = fb_ctrs_t0.reference_perf;
>  
> @@ -385,11 +405,20 @@ static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu_data,
>  				    fb_ctrs_t0.delivered);
>  
>  	/* Check to avoid divide-by zero */
> -	if (delta_reference || delta_delivered)

This is broken actually, it should have been:

if (delta_reference && delta_delivered)

.. both of them need to be !0 to avoid divide-by 0 and for the
delivered_perf to make sens (!0).

> -		delivered_perf = (reference_perf * delta_delivered) /
> -					delta_reference;
> -	else
> -		delivered_perf = cpu_data->perf_ctrls.desired_perf;
> +	if (!delta_reference && !delta_delivered)
> +		return cpu_data->perf_ctrls.desired_perf;
> +

.. which here turns into:

if (!delta_reference || !delta_delivered)

.. if any is 0, we return desired_perf.

> +	return (reference_perf * delta_delivered) / delta_reference;
> +}
> +
> +static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu_data,
> +				     struct cppc_perf_fb_ctrs fb_ctrs_t0,
> +				     struct cppc_perf_fb_ctrs fb_ctrs_t1)
> +{
> +	u64 delivered_perf;
> +
> +	delivered_perf = cppc_perf_from_fbctrs(cpu_data, fb_ctrs_t0,
> +					       fb_ctrs_t1);
>  
>  	return cppc_cpufreq_perf_to_khz(cpu_data, delivered_perf);
>  }
> @@ -420,7 +449,7 @@ static int cppc_cpufreq_set_boost(struct cpufreq_policy *policy, int state)
>  {
>  	struct cppc_cpudata *cpu_data = policy->driver_data;
>  	struct cppc_perf_caps *caps = &cpu_data->perf_caps;
> -	int ret;
> +	int ret, i;
>  
>  	if (!boost_supported) {
>  		pr_err("BOOST not supported by CPU or firmware\n");
> @@ -435,6 +464,9 @@ static int cppc_cpufreq_set_boost(struct cpufreq_policy *policy, int state)
>  						       caps->nominal_perf);
>  	policy->cpuinfo.max_freq = policy->max;
>  
> +	for_each_cpu(i, policy->related_cpus)
> +		per_cpu(cppc_f_i, i).max_freq = policy->cpuinfo.max_freq;
> +
>  	ret = freq_qos_update_request(policy->max_freq_req, policy->max);
>  	if (ret < 0)
>  		return ret;
> @@ -512,8 +544,126 @@ static void cppc_check_hisi_workaround(void)
>  	acpi_put_table(tbl);
>  }
>  
> +static void cppc_scale_freq_workfn(struct kthread_work *work)
> +{
> +	struct cppc_freq_invariance *cppc_fi;
> +	struct cppc_perf_fb_ctrs fb_ctrs = {0};
> +	int cpu = raw_smp_processor_id();
> +	struct cppc_cpudata *cpu_data;
> +	u64 perf;
> +
> +	cppc_fi = container_of(work, struct cppc_freq_invariance, work);
> +	cpu_data = cppc_fi->cpu_data;
> +
> +	if (cppc_get_perf_ctrs(cpu, &fb_ctrs)) {
> +		pr_info("%s: cppc_get_perf_ctrs() failed\n", __func__);
> +		return;
> +	}
> +
> +	cppc_fi->prev_perf_fb_ctrs = fb_ctrs;
> +	perf = cppc_perf_from_fbctrs(cpu_data, cppc_fi->prev_perf_fb_ctrs,
> +				     fb_ctrs);
> +
> +	perf <<= SCHED_CAPACITY_SHIFT;
> +	per_cpu(freq_scale, cpu) = div64_u64(perf, cpu_data->perf_caps.highest_perf);

s/per_cpu(freq_scale, cpu)=/this_cpu_write

Let's also cap this to 1024, to be on the safe side.

> +}
> +
> +static void cppc_irq_work(struct irq_work *irq_work)
> +{
> +	struct cppc_freq_invariance *cppc_fi;
> +
> +	cppc_fi = container_of(irq_work, struct cppc_freq_invariance, irq_work);
> +	kthread_queue_work(cppc_fi->worker, &cppc_fi->work);
> +}
> +
> +static void cppc_scale_freq_tick(void)
> +{
> +	struct cppc_freq_invariance *cppc_fi = &per_cpu(cppc_f_i, raw_smp_processor_id());
> +

You can use smp_processor_id() here as this is called from interrupt
context.

> +	/*
> +	 * cppc_get_perf_ctrs() can potentially sleep, call that from the right
> +	 * context.
> +	 */
> +	irq_work_queue(&cppc_fi->irq_work);
> +}
> +
> +static struct scale_freq_data cppc_sftd = {
> +	.source = SCALE_FREQ_SOURCE_CPPC,
> +	.set_freq_scale = cppc_scale_freq_tick,
> +};
> +
> +static void cppc_freq_invariance_exit(void)
> +{
> +	struct cppc_freq_invariance *cppc_fi;
> +	int i;
> +
> +	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> +		return;
> +
> +	topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, cpu_present_mask);
> +
> +	for_each_possible_cpu(i) {
> +		cppc_fi = &per_cpu(cppc_f_i, i);
> +		if (cppc_fi->worker) {
> +			irq_work_sync(&cppc_fi->irq_work);
> +			kthread_destroy_worker(cppc_fi->worker);
> +			cppc_fi->worker = NULL;
> +		}
> +	}
> +}
> +
> +static void __init cppc_freq_invariance_init(void)
> +{
> +	struct cppc_perf_fb_ctrs fb_ctrs = {0};
> +	struct cppc_freq_invariance *cppc_fi;
> +	struct sched_attr attr = {
> +		.size		= sizeof(struct sched_attr),
> +		.sched_policy	= SCHED_DEADLINE,
> +		.sched_nice	= 0,
> +		.sched_priority	= 0,
> +		/*
> +		 * Fake (unused) bandwidth; workaround to "fix"
> +		 * priority inheritance.
> +		 */
> +		.sched_runtime	= 1000000,
> +		.sched_deadline = 10000000,
> +		.sched_period	= 10000000,
> +	};
> +	struct kthread_worker *worker;
> +	int i, ret;
> +
> +	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> +		return;
> +
> +	for_each_possible_cpu(i) {
> +		cppc_fi = &per_cpu(cppc_f_i, i);
> +
> +		kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
> +		init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
> +		worker = kthread_create_worker_on_cpu(i, 0, "cppc:%d", i);

I keep going back on forth in my mind on whether we need to create the
worker to run the work on each CPU. Because reading CPPC counters should
be possible from any CPU, for any CPU. So as long as we take the CPU ID
on the tick and pass it to the worker, that work does not have to
specifically run on that CPU. Eliminating this restriction could speed up
things.

What do you think?

> +		if (IS_ERR(worker))
> +			return cppc_freq_invariance_exit();
> +
> +		cppc_fi->worker = worker;
> +		ret = sched_setattr_nocheck(worker->task, &attr);
> +		if (ret) {
> +			pr_warn("%s: failed to set SCHED_DEADLINE\n", __func__);
> +			return cppc_freq_invariance_exit();
> +		}
> +
> +		ret = cppc_get_perf_ctrs(i, &fb_ctrs);
> +		if (!ret)
> +			per_cpu(cppc_fi->prev_perf_fb_ctrs, i) = fb_ctrs;

Why do you ignore the error here? If counters are not supported (reading
the registers returns 0, for example), I think it might be best to
cppc_freq_invariance_exit(). If they read 0 and we ignore this, we might
end up doing all that work on the tick without any benefit, as we'd
always use desired_perf instead of a current performance value obtained
from counters.

> +	}
> +
> +	/* Register for freq-invariance */
> +	topology_set_scale_freq_source(&cppc_sftd, cpu_present_mask);
> +}
> +
>  static int __init cppc_cpufreq_init(void)
>  {
> +	int ret;
> +
>  	if ((acpi_disabled) || !acpi_cpc_valid())
>  		return -ENODEV;
>  
> @@ -521,7 +671,11 @@ static int __init cppc_cpufreq_init(void)
>  
>  	cppc_check_hisi_workaround();
>  
> -	return cpufreq_register_driver(&cppc_cpufreq_driver);
> +	ret = cpufreq_register_driver(&cppc_cpufreq_driver);
> +	if (!ret)
> +		cppc_freq_invariance_init();

Should there be a way to disable this? I'm only thinking about the fact
that reading some of these counters could be very expensive, and doing
it for a large number of CPUs, every few ms on the tick, might be a cost
that some platforms might now want to pay for the benefit of having
somewhat more accurate FIE support.

Thanks,
Ionela.

> +
> +	return ret;
>  }
>  
>  static inline void free_cpu_data(void)
> @@ -538,6 +692,7 @@ static inline void free_cpu_data(void)
>  
>  static void __exit cppc_cpufreq_exit(void)
>  {
> +	cppc_freq_invariance_exit();
>  	cpufreq_unregister_driver(&cppc_cpufreq_driver);
>  
>  	free_cpu_data();
> diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
> index 3bcfba5c21a7..47ac4b41c28d 100644
> --- a/include/linux/arch_topology.h
> +++ b/include/linux/arch_topology.h
> @@ -37,6 +37,7 @@ bool topology_scale_freq_invariant(void);
>  enum scale_freq_source {
>  	SCALE_FREQ_SOURCE_CPUFREQ = 0,
>  	SCALE_FREQ_SOURCE_ARCH,
> +	SCALE_FREQ_SOURCE_CPPC,
>  };
>  
>  struct scale_freq_data {
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 8c5481077c9c..85d1d23951ba 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6190,6 +6190,7 @@ int sched_setattr_nocheck(struct task_struct *p, const struct sched_attr *attr)
>  {
>  	return __sched_setscheduler(p, attr, false, true);
>  }
> +EXPORT_SYMBOL_GPL(sched_setattr_nocheck);
>  
>  /**
>   * sched_setscheduler_nocheck - change the scheduling policy and/or RT priority of a thread from kernelspace.
> -- 
> 2.25.0.rc1.19.g042ed3e048af
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V3 1/2] topology: Allow multiple entities to provide sched_freq_tick() callback
  2021-02-18  9:33         ` Viresh Kumar
@ 2021-02-18 16:36           ` Ionela Voinescu
  2021-02-19  4:58             ` Viresh Kumar
  0 siblings, 1 reply; 19+ messages in thread
From: Ionela Voinescu @ 2021-02-18 16:36 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Catalin Marinas, Will Deacon, Vincent Guittot,
	linux-kernel, linux-arm-kernel, linux-pm, Sudeep Holla,
	Greg Kroah-Hartman

Hey,

On Thursday 18 Feb 2021 at 15:03:04 (+0530), Viresh Kumar wrote:
> On 17-02-21, 00:24, Ionela Voinescu wrote:
> > > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> > > index 1e47dfd465f8..47fca7376c93 100644
> > > --- a/arch/arm64/kernel/topology.c
> > > +++ b/arch/arm64/kernel/topology.c
> > > @@ -240,7 +240,6 @@ static struct scale_freq_data amu_sfd = {
> > >  
> > >  static void amu_fie_setup(const struct cpumask *cpus)
> > >  {
> > > -	bool invariant;
> > >  	int cpu;
> > >  
> > >  	/* We are already set since the last insmod of cpufreq driver */
> > > @@ -257,25 +256,10 @@ static void amu_fie_setup(const struct cpumask *cpus)
> > >  
> > >  	cpumask_or(amu_fie_cpus, amu_fie_cpus, cpus);
> > >  
> > > -	invariant = topology_scale_freq_invariant();
> > > -
> > > -	/* We aren't fully invariant yet */
> > > -	if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask))
> > > -		return;
> > > -
> > 
> > You still need these checks, otherwise you could end up with only part
> > of the CPUs setting a scale factor, when only part of the CPUs support
> > AMUs and there is no cpufreq support for FIE.
> 
> Another look at it and here goes another reason (hope I don't have
> another in-code comment somewhere else to kill this one) :)
> 
> We don't need to care for the reason you gave (which is a valid reason
> otherwise), as we are talking specifically about amu_fie_setup() here
> and it gets called from cpufreq policy-notifier. i.e. we won't support
> AMUs without cpufreq being there in the first place and the same goes
> for cppc-driver.
> 
> Does that sound reasonable ?
> 

Yes, we don't care if there is no cpufreq driver, as the use of AMUs won't
get initialised either. But we do care if there is a cpufreq driver that
does not support frequency invariance, which is the example above.

The intention with the patches that made cpufreq based invariance generic
a while back was for it to be present, seamlessly, for as many drivers as
possible, as a less than accurate invariance default method is still
better than nothing. So only a few drivers today don't support cpufreq
based FI, but it's not a guarantee that it will stay this way.

Hope it makes sense,
Ionela.

> -- 
> viresh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V3 1/2] topology: Allow multiple entities to provide sched_freq_tick() callback
  2021-02-18 16:36           ` Ionela Voinescu
@ 2021-02-19  4:58             ` Viresh Kumar
  2021-02-19  9:44               ` Ionela Voinescu
  0 siblings, 1 reply; 19+ messages in thread
From: Viresh Kumar @ 2021-02-19  4:58 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: Rafael Wysocki, Catalin Marinas, Will Deacon, Vincent Guittot,
	linux-kernel, linux-arm-kernel, linux-pm, Sudeep Holla,
	Greg Kroah-Hartman

On 18-02-21, 16:36, Ionela Voinescu wrote:
> Yes, we don't care if there is no cpufreq driver, as the use of AMUs won't
> get initialised either. But we do care if there is a cpufreq driver that
> does not support frequency invariance, which is the example above.
> 
> The intention with the patches that made cpufreq based invariance generic
> a while back was for it to be present, seamlessly, for as many drivers as
> possible, as a less than accurate invariance default method is still
> better than nothing.

Right.

> So only a few drivers today don't support cpufreq based FI

Only two AFAICT, both x86, and the AMU stuff doesn't conflict with
them.

drivers/cpufreq/intel_pstate.c
drivers/cpufreq/longrun.c

> but it's not a guarantee that it will stay this way.

What do you mean by "no guarantee" here ?

The very core routines (cpufreq_freq_transition_end() and
cpufreq_driver_fast_switch()) of the cpufreq core call
arch_set_freq_scale() today and this isn't going to change anytime
soon. If something gets changed there someone will need to see other
parts of the kernel which may get broken with that.

I don't see any need of complicating other parts of the kernel like,
amu or cppc code for that. They should be kept simple and they should
assume cpufreq invariance will be supported as it is today.

-- 
viresh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V3 1/2] topology: Allow multiple entities to provide sched_freq_tick() callback
  2021-02-19  4:58             ` Viresh Kumar
@ 2021-02-19  9:44               ` Ionela Voinescu
  2021-02-19  9:48                 ` Viresh Kumar
  0 siblings, 1 reply; 19+ messages in thread
From: Ionela Voinescu @ 2021-02-19  9:44 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Catalin Marinas, Will Deacon, Vincent Guittot,
	linux-kernel, linux-arm-kernel, linux-pm, Sudeep Holla,
	Greg Kroah-Hartman

On Friday 19 Feb 2021 at 10:28:23 (+0530), Viresh Kumar wrote:
> On 18-02-21, 16:36, Ionela Voinescu wrote:
> > Yes, we don't care if there is no cpufreq driver, as the use of AMUs won't
> > get initialised either. But we do care if there is a cpufreq driver that
> > does not support frequency invariance, which is the example above.
> > 
> > The intention with the patches that made cpufreq based invariance generic
> > a while back was for it to be present, seamlessly, for as many drivers as
> > possible, as a less than accurate invariance default method is still
> > better than nothing.
> 
> Right.
> 
> > So only a few drivers today don't support cpufreq based FI
> 
> Only two AFAICT, both x86, and the AMU stuff doesn't conflict with
> them.
> 
> drivers/cpufreq/intel_pstate.c
> drivers/cpufreq/longrun.c
> 
> > but it's not a guarantee that it will stay this way.
> 
> What do you mean by "no guarantee" here ?
> 
> The very core routines (cpufreq_freq_transition_end() and
> cpufreq_driver_fast_switch()) of the cpufreq core call
> arch_set_freq_scale() today and this isn't going to change anytime
> soon. If something gets changed there someone will need to see other
> parts of the kernel which may get broken with that.
> 

Yes, but it won't really be straightforward to notice this breakage if
that happens, so in my opinion it was worth to keep that condition.

> I don't see any need of complicating other parts of the kernel like,
> amu or cppc code for that. They should be kept simple and they should
> assume cpufreq invariance will be supported as it is today.
> 

Fair enough! It is a corner case after all.

Thanks,
Ionela.

> -- 
> viresh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V3 1/2] topology: Allow multiple entities to provide sched_freq_tick() callback
  2021-02-19  9:44               ` Ionela Voinescu
@ 2021-02-19  9:48                 ` Viresh Kumar
  0 siblings, 0 replies; 19+ messages in thread
From: Viresh Kumar @ 2021-02-19  9:48 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: Rafael Wysocki, Catalin Marinas, Will Deacon, Vincent Guittot,
	linux-kernel, linux-arm-kernel, linux-pm, Sudeep Holla,
	Greg Kroah-Hartman

On 19-02-21, 09:44, Ionela Voinescu wrote:
> On Friday 19 Feb 2021 at 10:28:23 (+0530), Viresh Kumar wrote:
> > The very core routines (cpufreq_freq_transition_end() and
> > cpufreq_driver_fast_switch()) of the cpufreq core call
> > arch_set_freq_scale() today and this isn't going to change anytime
> > soon. If something gets changed there someone will need to see other
> > parts of the kernel which may get broken with that.
> > 
> 
> Yes, but it won't really be straightforward to notice this breakage if
> that happens, so in my opinion it was worth to keep that condition.

Right, but chances of that happening are close to zero right now. I
don't see any changes being made there in near future and so as we
agreed, lets leave it as is.

Btw, thanks for your feedback, it was indeed very valuable.

-- 
viresh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V3 2/2] cpufreq: cppc: Add support for frequency invariance
  2021-02-18 16:35   ` Ionela Voinescu
@ 2021-02-22 11:00     ` Ionela Voinescu
  2021-02-22 11:04       ` Viresh Kumar
  0 siblings, 1 reply; 19+ messages in thread
From: Ionela Voinescu @ 2021-02-22 11:00 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Catalin Marinas, Will Deacon, Vincent Guittot,
	linux-kernel, linux-arm-kernel, linux-pm, Sudeep Holla,
	Greg Kroah-Hartman

Hey,

Some test results:

On Thursday 18 Feb 2021 at 16:35:38 (+0000), Ionela Voinescu wrote:
[..]
> > +static void __init cppc_freq_invariance_init(void)
> > +{
[..]
> > +
> > +		ret = cppc_get_perf_ctrs(i, &fb_ctrs);
> > +		if (!ret)
> > +			per_cpu(cppc_fi->prev_perf_fb_ctrs, i) = fb_ctrs;
> 

After fixing this one:
			cppc_fi->prev_perf_fb_ctrs = fb_ctrs;

I got the following:

Platform:

 - Juno R2 (CPUs [0-3] are littles, CPUs [4-5] are bigs)
    + PMU counters, used by CPPC through FFH
    + userspace/schedutil


  - Verifying that with userspace governor we see a correct change in
    scale factor:

	root@buildroot:~# dmesg | grep FIE
	[    6.436770] AMU: CPUs[0-3]: AMU counters WON'T be used for FIE.
	[    6.436962] AMU: CPUs[4-5]: AMU counters WON'T be used for FIE.
	[    6.451510] CPPC:CPUs[0-5]: CPPC counters will be used for FIE.

	root@buildroot:~# echo 600000 > policy4/scaling_setspeed
	[  353.939495] CPU4: Invariance(cppc) scale: 512.
	[  353.939497] CPU5: Invariance(cppc) scale: 512.

	root@buildroot:~# echo 1200000 > policy4/scaling_setspeed
	[  372.683511] CPU5: Invariance(cppc) scale: 1024.
	[  372.683518] CPU4: Invariance(cppc) scale: 1024.

	root@buildroot:~# echo 450000 > policy0/scaling_setspeed
	[  641.495513] CPU2: Invariance(cppc) scale: 485.
	[  641.495514] CPU1: Invariance(cppc) scale: 485.
	[  641.495517] CPU0: Invariance(cppc) scale: 485.
	[  641.495542] CPU3: Invariance(cppc) scale: 485.

	root@buildroot:~# echo 950000 > policy0/scaling_setspeed
	[  852.015514] CPU2: Invariance(cppc) scale: 1024.
	[  852.015514] CPU1: Invariance(cppc) scale: 1024.
	[  852.015517] CPU0: Invariance(cppc) scale: 1024.
	[  852.015541] CPU3: Invariance(cppc) scale: 1024.

 - I ran some benchmarks as well (perf, hackbench, dhrystone) on the same
   platform, using the userspace governor at fixed frequency, to evaluate
   the impact of the work we do or don't do on the tick.

   ./perf bench sched pipe
   (10 iterations, higher is better, ops/s, comparisons with
   cpufreq-based FIE)

   cpufreq-based FIE    AMU-based FIE    CPPC-based FIE
   ----------------------------------------------------
   39498.8		40984.7		 38893.4
   std: 3.766%		std: 4.461%	 std: 0.575%
   			diff: 3.625%	 diff: -1.556%

   ./hackbench -l 1000
   (10 iterations, lower is better, seconds, comparison with
   cpufreq-based FIE)

   cpufreq-based FIE    AMU-based FIE    CPPC-based FIE
   ----------------------------------------------------
   6.4207		6.3386		 6.7841
   std: 7.298%		std: 2.252%	 std: 2.460%
   			diff: -1.295%	 diff: 5.356%

   This shows a small regression for the CPPC-based FIE, but within the
   standard deviation.

   I ran some dhrystone benchmarks (./dhrystone -t 2/34/5/6/ -l 5000) as
   well with schedutil governor to understand if an increase in accuracy
   with the AMU/CPPC counters makes a difference. Given the
   characteristics of the platform it's no surprise that the results
   were very similar between the three cases, so I won't bore you with
   the numbers.

Hope it helps,
Ionela.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V3 2/2] cpufreq: cppc: Add support for frequency invariance
  2021-02-22 11:00     ` Ionela Voinescu
@ 2021-02-22 11:04       ` Viresh Kumar
  0 siblings, 0 replies; 19+ messages in thread
From: Viresh Kumar @ 2021-02-22 11:04 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: Rafael Wysocki, Catalin Marinas, Will Deacon, Vincent Guittot,
	linux-kernel, linux-arm-kernel, linux-pm, Sudeep Holla,
	Greg Kroah-Hartman

On 22-02-21, 11:00, Ionela Voinescu wrote:
> Hey,
> 
> Some test results:

Nice, I haven't responded earlier as Vincent was also testing the
stuff out later last week and was planning to do it more this week.

> On Thursday 18 Feb 2021 at 16:35:38 (+0000), Ionela Voinescu wrote:
> [..]
> > > +static void __init cppc_freq_invariance_init(void)
> > > +{
> [..]
> > > +
> > > +		ret = cppc_get_perf_ctrs(i, &fb_ctrs);
> > > +		if (!ret)
> > > +			per_cpu(cppc_fi->prev_perf_fb_ctrs, i) = fb_ctrs;
> > 
> 
> After fixing this one:
> 			cppc_fi->prev_perf_fb_ctrs = fb_ctrs;

Yeah, I already fixed it and made several changes based on your
feedback.

> I got the following:
> 
> Platform:
> 
>  - Juno R2 (CPUs [0-3] are littles, CPUs [4-5] are bigs)
>     + PMU counters, used by CPPC through FFH
>     + userspace/schedutil
> 
> 
>   - Verifying that with userspace governor we see a correct change in
>     scale factor:
> 
> 	root@buildroot:~# dmesg | grep FIE
> 	[    6.436770] AMU: CPUs[0-3]: AMU counters WON'T be used for FIE.
> 	[    6.436962] AMU: CPUs[4-5]: AMU counters WON'T be used for FIE.
> 	[    6.451510] CPPC:CPUs[0-5]: CPPC counters will be used for FIE.
> 
> 	root@buildroot:~# echo 600000 > policy4/scaling_setspeed
> 	[  353.939495] CPU4: Invariance(cppc) scale: 512.
> 	[  353.939497] CPU5: Invariance(cppc) scale: 512.
> 
> 	root@buildroot:~# echo 1200000 > policy4/scaling_setspeed
> 	[  372.683511] CPU5: Invariance(cppc) scale: 1024.
> 	[  372.683518] CPU4: Invariance(cppc) scale: 1024.
> 
> 	root@buildroot:~# echo 450000 > policy0/scaling_setspeed
> 	[  641.495513] CPU2: Invariance(cppc) scale: 485.
> 	[  641.495514] CPU1: Invariance(cppc) scale: 485.
> 	[  641.495517] CPU0: Invariance(cppc) scale: 485.
> 	[  641.495542] CPU3: Invariance(cppc) scale: 485.
> 
> 	root@buildroot:~# echo 950000 > policy0/scaling_setspeed
> 	[  852.015514] CPU2: Invariance(cppc) scale: 1024.
> 	[  852.015514] CPU1: Invariance(cppc) scale: 1024.
> 	[  852.015517] CPU0: Invariance(cppc) scale: 1024.
> 	[  852.015541] CPU3: Invariance(cppc) scale: 1024.

Great.

>  - I ran some benchmarks as well (perf, hackbench, dhrystone) on the same
>    platform, using the userspace governor at fixed frequency, to evaluate
>    the impact of the work we do or don't do on the tick.
> 
>    ./perf bench sched pipe
>    (10 iterations, higher is better, ops/s, comparisons with
>    cpufreq-based FIE)
> 
>    cpufreq-based FIE    AMU-based FIE    CPPC-based FIE
>    ----------------------------------------------------
>    39498.8		40984.7		 38893.4
>    std: 3.766%		std: 4.461%	 std: 0.575%
>    			diff: 3.625%	 diff: -1.556%
> 
>    ./hackbench -l 1000
>    (10 iterations, lower is better, seconds, comparison with
>    cpufreq-based FIE)
> 
>    cpufreq-based FIE    AMU-based FIE    CPPC-based FIE
>    ----------------------------------------------------
>    6.4207		6.3386		 6.7841
>    std: 7.298%		std: 2.252%	 std: 2.460%
>    			diff: -1.295%	 diff: 5.356%
> 
>    This shows a small regression for the CPPC-based FIE, but within the
>    standard deviation.
> 
>    I ran some dhrystone benchmarks (./dhrystone -t 2/34/5/6/ -l 5000) as
>    well with schedutil governor to understand if an increase in accuracy
>    with the AMU/CPPC counters makes a difference. Given the
>    characteristics of the platform it's no surprise that the results
>    were very similar between the three cases, so I won't bore you with
>    the numbers.

Nice, I have much more confidence on this stuff now :)

Thanks a lot Ionela, I will resend the series again today then.

-- 
viresh

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-02-22 11:05 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-28 10:48 [PATCH V3 0/2] cpufreq: cppc: Add support for frequency invariance Viresh Kumar
2021-01-28 10:48 ` [PATCH V3 1/2] topology: Allow multiple entities to provide sched_freq_tick() callback Viresh Kumar
2021-02-03 11:45   ` Ionela Voinescu
2021-02-05  9:14     ` Viresh Kumar
2021-02-17  0:24       ` Ionela Voinescu
2021-02-17  4:25         ` Viresh Kumar
2021-02-17 11:30           ` Ionela Voinescu
2021-02-17 11:40             ` Viresh Kumar
2021-02-17 11:57               ` Ionela Voinescu
2021-02-18  7:23                 ` Viresh Kumar
2021-02-18  9:33         ` Viresh Kumar
2021-02-18 16:36           ` Ionela Voinescu
2021-02-19  4:58             ` Viresh Kumar
2021-02-19  9:44               ` Ionela Voinescu
2021-02-19  9:48                 ` Viresh Kumar
2021-01-28 10:48 ` [PATCH V3 2/2] cpufreq: cppc: Add support for frequency invariance Viresh Kumar
2021-02-18 16:35   ` Ionela Voinescu
2021-02-22 11:00     ` Ionela Voinescu
2021-02-22 11:04       ` Viresh Kumar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).