linux-kernel.vger.kernel.org archive mirror
* [PATCH v1 0/3] arm64: Use AMU counters for measuring CPU frequency
@ 2024-02-29 16:25 Vanshidhar Konda
  2024-02-29 16:25 ` [PATCH v1 1/3] arm64: topology: Add arch_freq_get_on_cpu() support Vanshidhar Konda
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Vanshidhar Konda @ 2024-02-29 16:25 UTC (permalink / raw)
  To: Huisong Li, Beata Michalska
  Cc: Vanshidhar Konda, Ionela Voinescu, linux-kernel, linux-pm,
	linux-arm-kernel, rafael, sumitg, zengheng4, yang, will,
	sudeep.holla, liuyonglong, zhanjie9, linux-acpi

The AMU extension was added to Armv8.4 as an optional extension. It
provides architectural counters - CPU_CYCLES and CNT_CYCLES - that can be
used to measure CPU frequency.
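
Since CNT_CYCLES increments at the rate of the system counter (CNTPCT_EL0)
while CPU_CYCLES increments with the core clock, the average frequency over
a measurement window is roughly:

  frequency ~= system counter rate * delta(CPU_CYCLES) / delta(CNT_CYCLES)

(The exact scaling the kernel uses is in patch 1; the relation above is only
meant to illustrate how the two counters yield a frequency estimate.)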

In the kernel, the Frequency Invariance Engine (FIE) uses these counters to
compute the frequency scale on every tick. The counters can also be used by
the CPPC driver if the firmware publishes support for the reference and
delivered performance counter registers using ACPI FFH.

Using these counters in the CPPC driver currently results in inaccurate
measurements in some cases. This has been discussed in [1] and [2].

Today, the CPPC delivered register and reference register are read in two
separate cpc_read() calls. There can be significant latency between the two
reads if the reading CPU is interrupted in between, leading to an inaccurate
result. Also, when these registers are in the FFH region, reading each
register with cpc_read() results in two IPIs to the core whose registers are
being read, and it wakes up that core even if it is idle just to read the
AMU counters.
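
As a purely illustrative example (all numbers assumed): if one of the two
reads in a snapshot is delayed by 1 ms relative to the other, the delivered
and reference deltas no longer cover the same time window. With a 25 MHz
reference counter and a 10 ms measurement window, a 1 ms skew changes the
reference delta by ~25,000 counts out of ~250,000 - a ~10% error - so a core
running at 3 GHz could be reported at roughly 2.7 GHz (or 3.3 GHz), depending
on which snapshot is affected.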

This patch series makes two changes:
- Implement arch_freq_get_on_cpu() for arm64 to record the AMU counters on
  every clock tick
- Change the CPPC driver to read the delivered and reference registers in a
  single IPI, avoiding waking up an idle core just to read its AMU counters;
  this also allows measuring the CPU frequency of isolated CPUs

Results on an AmpereOne system with 128 cores after applying the series:

When the system is idle:
core   scaling_cur_freq  cpuinfo_cur_freq
[0]:   3068518           3000000
[1]:   1030869           1000000
[2]:   1031296           1000000
[3]:   1032224           1000000
[4]:   1032469           1000000
[5]:   1030987           1000000
...
...
isolcpus = 122-127
[122]: 1030724           1000000
[123]: 1030667           1000000
[124]: 1031888           1000000
[125]: 1031047           1000000
[126]: 1031683           1000000
[127]: 1030794           1000000

With stress applied to cores 122-126:
core   scaling_cur_freq  cpuinfo_cur_freq
[0]:   3050000           3000000
[1]:   1031068           1000000
[2]:   1030699           1000000
[3]:   1031818           1000000
[4]:   1032251           1000000
[5]:   1031282           1000000
...
...
isolcpus = 122-127
[122]: 3000061           3012000
[123]: 3000041           3008000
[124]: 3000038           2998000
[125]: 3000062           2995000
[126]: 3000035           3004000
[127]: 1031440           1000000

[1]: https://lore.kernel.org/all/20230328193846.8757-1-yang@os.amperecomputing.com/
[2]: https://lore.kernel.org/linux-arm-kernel/20231212072617.14756-1-lihuisong@huawei.com/

Vanshidhar Konda (3):
  arm64: topology: Add arch_freq_get_on_cpu() support
  arm64: idle: Cache AMU counters before entering idle
  ACPI: CPPC: Read CPC FFH counters in a single IPI

 arch/arm64/kernel/idle.c     |  10 +++
 arch/arm64/kernel/topology.c | 153 ++++++++++++++++++++++++++++++-----
 drivers/acpi/cppc_acpi.c     |  32 +++++++-
 include/acpi/cppc_acpi.h     |  13 +++
 4 files changed, 186 insertions(+), 22 deletions(-)

-- 
2.43.1



* [PATCH v1 1/3] arm64: topology: Add arch_freq_get_on_cpu() support
  2024-02-29 16:25 [PATCH v1 0/3] arm64: Use AMU counters for measuring CPU frequency Vanshidhar Konda
@ 2024-02-29 16:25 ` Vanshidhar Konda
  2024-03-06  8:24   ` Beata Michalska
  2024-03-07  3:02   ` lihuisong (C)
  2024-02-29 16:25 ` [PATCH v1 2/3] arm64: idle: Cache AMU counters before entering idle Vanshidhar Konda
  2024-02-29 16:25 ` [PATCH v1 3/3] ACPI: CPPC: Read CPC FFH counters in a single IPI Vanshidhar Konda
  2 siblings, 2 replies; 15+ messages in thread
From: Vanshidhar Konda @ 2024-02-29 16:25 UTC (permalink / raw)
  To: Huisong Li, Beata Michalska
  Cc: Vanshidhar Konda, Ionela Voinescu, linux-kernel, linux-pm,
	linux-arm-kernel, rafael, sumitg, zengheng4, yang, will,
	sudeep.holla, liuyonglong, zhanjie9, linux-acpi

AMU counters are used by the Frequency Invariance Engine (FIE) to
estimate the CPU utilization during each tick. The delta of the AMU
counters between two ticks can also be used to estimate the average CPU
frequency of each core over the tick duration. Sample the AMU counters on
each tick, compute the delta and store it. When the frequency of the core is
queried, use the stored delta to determine the frequency.
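
For illustration (numbers assumed): with a 25 MHz system counter and a 4 ms
tick, the constant cycle counter advances by about 100,000 per tick. If the
core cycle counter advances by 12,000,000 over the same tick, the estimated
average frequency is 12,000,000 / 100,000 * 25 MHz = 3 GHz.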

arch_freq_get_on_cpu() is used on x86 systems to estimate the frequency
of each CPU. It can be wired up on arm64 for the same functionality.

Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
---
 arch/arm64/kernel/topology.c | 114 +++++++++++++++++++++++++++++------
 1 file changed, 96 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 1a2c72f3e7f8..db8d14525cf4 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -17,6 +17,8 @@
 #include <linux/cpufreq.h>
 #include <linux/init.h>
 #include <linux/percpu.h>
+#include <linux/sched/isolation.h>
+#include <linux/seqlock_types.h>
 
 #include <asm/cpu.h>
 #include <asm/cputype.h>
@@ -82,20 +84,54 @@ int __init parse_acpi_topology(void)
 #undef pr_fmt
 #define pr_fmt(fmt) "AMU: " fmt
 
+struct amu_counters {
+	seqcount_t	seq;
+	unsigned long	last_update;
+	u64		core_cnt;
+	u64		const_cnt;
+	u64		delta_core_cnt;
+	u64		delta_const_cnt;
+};
+
 /*
  * Ensure that amu_scale_freq_tick() will return SCHED_CAPACITY_SCALE until
  * the CPU capacity and its associated frequency have been correctly
  * initialized.
  */
-static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) =  1UL << (2 * SCHED_CAPACITY_SHIFT);
-static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
-static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
+static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) =
+	1UL << (2 * SCHED_CAPACITY_SHIFT);
+static DEFINE_PER_CPU_SHARED_ALIGNED(struct amu_counters, cpu_samples) = {
+	.seq = SEQCNT_ZERO(cpu_samples.seq)
+};
 static cpumask_var_t amu_fie_cpus;
 
 void update_freq_counters_refs(void)
 {
-	this_cpu_write(arch_core_cycles_prev, read_corecnt());
-	this_cpu_write(arch_const_cycles_prev, read_constcnt());
+	struct amu_counters *cpu_sample = this_cpu_ptr(&cpu_samples);
+	u64 core_cnt, const_cnt, delta_core_cnt, delta_const_cnt;
+
+	const_cnt = read_constcnt();
+	core_cnt = read_corecnt();
+
+	if (unlikely(core_cnt < cpu_sample->core_cnt) ||
+	    unlikely(const_cnt < cpu_sample->const_cnt)) {
+		WARN(1, "AMU counter values should be monotonic.\n");
+		cpu_sample->delta_const_cnt = 0;
+		cpu_sample->delta_core_cnt = 0;
+		return;
+	}
+
+	delta_core_cnt = core_cnt - cpu_sample->core_cnt;
+	delta_const_cnt = const_cnt - cpu_sample->const_cnt;
+
+	cpu_sample->core_cnt = core_cnt;
+	cpu_sample->const_cnt = const_cnt;
+
+	raw_write_seqcount_begin(&cpu_sample->seq);
+	cpu_sample->last_update = jiffies;
+	cpu_sample->delta_const_cnt = delta_const_cnt;
+	cpu_sample->delta_core_cnt = delta_core_cnt;
+	raw_write_seqcount_end(&cpu_sample->seq);
 }
 
 static inline bool freq_counters_valid(int cpu)
@@ -108,8 +144,7 @@ static inline bool freq_counters_valid(int cpu)
 		return false;
 	}
 
-	if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
-		     !per_cpu(arch_core_cycles_prev, cpu))) {
+	if (unlikely(per_cpu_ptr(&cpu_samples, cpu) == NULL)) {
 		pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
 		return false;
 	}
@@ -152,19 +187,15 @@ void freq_inv_set_max_ratio(int cpu, u64 max_rate)
 
 static void amu_scale_freq_tick(void)
 {
-	u64 prev_core_cnt, prev_const_cnt;
-	u64 core_cnt, const_cnt, scale;
-
-	prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
-	prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
+	struct amu_counters *cpu_sample = this_cpu_ptr(&cpu_samples);
+	u64 delta_core_cnt, delta_const_cnt, scale;
 
 	update_freq_counters_refs();
 
-	const_cnt = this_cpu_read(arch_const_cycles_prev);
-	core_cnt = this_cpu_read(arch_core_cycles_prev);
+	delta_const_cnt = cpu_sample->delta_const_cnt;
+	delta_core_cnt = cpu_sample->delta_core_cnt;
 
-	if (unlikely(core_cnt <= prev_core_cnt ||
-		     const_cnt <= prev_const_cnt))
+	if ((delta_const_cnt == 0) || (delta_core_cnt == 0))
 		return;
 
 	/*
@@ -175,15 +206,62 @@ static void amu_scale_freq_tick(void)
 	 * See validate_cpu_freq_invariance_counters() for details on
 	 * arch_max_freq_scale and the use of SCHED_CAPACITY_SHIFT.
 	 */
-	scale = core_cnt - prev_core_cnt;
+	scale = delta_core_cnt;
 	scale *= this_cpu_read(arch_max_freq_scale);
 	scale = div64_u64(scale >> SCHED_CAPACITY_SHIFT,
-			  const_cnt - prev_const_cnt);
+			  delta_const_cnt);
 
 	scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
 	this_cpu_write(arch_freq_scale, (unsigned long)scale);
 }
 
+/*
+ * Discard samples older than the define maximum sample age of 20ms. There
+ * is no point in sending IPIs in such a case. If the scheduler tick was
+ * not running then the CPU is either idle or isolated.
+ */
+#define MAX_SAMPLE_AGE	((unsigned long)HZ / 50)
+
+unsigned int arch_freq_get_on_cpu(int cpu)
+{
+	struct amu_counters *cpu_sample = per_cpu_ptr(&cpu_samples, cpu);
+	u64 delta_const_cnt, delta_core_cnt;
+	unsigned int seq, freq;
+	unsigned long last;
+
+	if (!freq_counters_valid(cpu))
+		goto fallback;
+
+	do {
+		seq = raw_read_seqcount_begin(&cpu_sample->seq);
+		last = cpu_sample->last_update;
+		delta_core_cnt = cpu_sample->delta_core_cnt;
+		delta_const_cnt = cpu_sample->delta_const_cnt;
+	} while (read_seqcount_retry(&cpu_sample->seq, seq));
+
+	/*
+	 * Bail on invalid count and when the last update was too long ago,
+	 * which covers idle and NOHZ full CPUs.
+	 */
+	if (!delta_const_cnt || ((jiffies - last) > MAX_SAMPLE_AGE)) {
+		if (!(housekeeping_cpu(cpu, HK_TYPE_TICK) && idle_cpu(cpu)))
+			goto fallback;
+	}
+
+	/*
+	 * CPU frequency = reference perf (in Hz) * (/\ delivered) / (/\ reference)
+	 * AMU reference performance counter increment rate is equal to the rate
+	 * of increment of the System counter, CNTPCT_EL0 and can be used to
+	 * compute the CPU frequency.
+	 */
+	return div64_u64((delta_core_cnt * (arch_timer_get_rate() / HZ)),
+			 delta_const_cnt);
+
+fallback:
+	freq = cpufreq_quick_get(cpu);
+	return freq ? freq : cpufreq_get_hw_max_freq(cpu);
+}
+
 static struct scale_freq_data amu_sfd = {
 	.source = SCALE_FREQ_SOURCE_ARCH,
 	.set_freq_scale = amu_scale_freq_tick,
-- 
2.43.1



* [PATCH v1 2/3] arm64: idle: Cache AMU counters before entering idle
  2024-02-29 16:25 [PATCH v1 0/3] arm64: Use AMU counters for measuring CPU frequency Vanshidhar Konda
  2024-02-29 16:25 ` [PATCH v1 1/3] arm64: topology: Add arch_freq_get_on_cpu() support Vanshidhar Konda
@ 2024-02-29 16:25 ` Vanshidhar Konda
  2024-03-07  3:17   ` lihuisong (C)
  2024-02-29 16:25 ` [PATCH v1 3/3] ACPI: CPPC: Read CPC FFH counters in a single IPI Vanshidhar Konda
  2 siblings, 1 reply; 15+ messages in thread
From: Vanshidhar Konda @ 2024-02-29 16:25 UTC (permalink / raw)
  To: Huisong Li, Beata Michalska
  Cc: Vanshidhar Konda, Ionela Voinescu, linux-kernel, linux-pm,
	linux-arm-kernel, rafael, sumitg, zengheng4, yang, will,
	sudeep.holla, liuyonglong, zhanjie9, linux-acpi

AMU counters do not increment while a CPU is idle. Saving the values of the
core and constant cycle counters prior to invoking WFI allows FIE to compute
the frequency of a CPU that is idle.

Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
---
 arch/arm64/kernel/idle.c     | 10 ++++++++++
 arch/arm64/kernel/topology.c | 14 ++++++++------
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/idle.c b/arch/arm64/kernel/idle.c
index 05cfb347ec26..5ed2e57188a8 100644
--- a/arch/arm64/kernel/idle.c
+++ b/arch/arm64/kernel/idle.c
@@ -26,6 +26,16 @@ void __cpuidle cpu_do_idle(void)
 
 	arm_cpuidle_save_irq_context(&context);
 
+#ifdef CONFIG_ARM64_AMU_EXTN
+	/* Update the AMU counters before entering WFI. The cached AMU counter
+	 * value is used to determine CPU frequency while the CPU is idle
+	 * without needing to wake up the CPU.
+	 */
+
+	if (cpu_has_amu_feat(smp_processor_id()))
+		update_freq_counters_refs();
+#endif
+
 	dsb(sy);
 	wfi();
 
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index db8d14525cf4..8905eb0c681f 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -240,13 +240,15 @@ unsigned int arch_freq_get_on_cpu(int cpu)
 	} while (read_seqcount_retry(&cpu_sample->seq, seq));
 
 	/*
-	 * Bail on invalid count and when the last update was too long ago,
-	 * which covers idle and NOHZ full CPUs.
+	 * Bail on invalid count and when the last update was too long ago.
+	 * This covers idle, NOHZ full and isolated CPUs.
+	 *
+	 * Idle CPUs don't need to be measured because AMU counters stop
+	 * incrementing during WFI/WFE.
 	 */
-	if (!delta_const_cnt || ((jiffies - last) > MAX_SAMPLE_AGE)) {
-		if (!(housekeeping_cpu(cpu, HK_TYPE_TICK) && idle_cpu(cpu)))
-			goto fallback;
-	}
+	if (!delta_const_cnt ||
+	    ((jiffies - last) > MAX_SAMPLE_AGE && !idle_cpu(cpu)))
+		goto fallback;
 
 	/*
 	 * CPU frequency = reference perf (in Hz) * (/\ delivered) / (/\ reference)
-- 
2.43.1



* [PATCH v1 3/3] ACPI: CPPC: Read CPC FFH counters in a single IPI
  2024-02-29 16:25 [PATCH v1 0/3] arm64: Use AMU counters for measuring CPU frequency Vanshidhar Konda
  2024-02-29 16:25 ` [PATCH v1 1/3] arm64: topology: Add arch_freq_get_on_cpu() support Vanshidhar Konda
  2024-02-29 16:25 ` [PATCH v1 2/3] arm64: idle: Cache AMU counters before entering idle Vanshidhar Konda
@ 2024-02-29 16:25 ` Vanshidhar Konda
  2024-02-29 17:32   ` Rafael J. Wysocki
  2 siblings, 1 reply; 15+ messages in thread
From: Vanshidhar Konda @ 2024-02-29 16:25 UTC (permalink / raw)
  To: Huisong Li, Beata Michalska
  Cc: Vanshidhar Konda, Ionela Voinescu, linux-kernel, linux-pm,
	linux-arm-kernel, rafael, sumitg, zengheng4, yang, will,
	sudeep.holla, liuyonglong, zhanjie9, linux-acpi

The CPPC driver reads delivered and reference counters using cpc_read
one at a time. This leads to inaccurate measurement of CPU frequency
discussed in [1]. If the firmware indicates that both the registers are
in the FFH interface the kernel can read the registers together in a
single IPI. This has two benefits:
1. Reduces the number of IPIs needed to read the two registers
2. The two registers will be read in close proximity resulting in more
   accurate CPU frequency measurement

[1]: https://lore.kernel.org/all/20230328193846.8757-1-yang@os.amperecomputing.com/

Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
---
 arch/arm64/kernel/topology.c | 37 ++++++++++++++++++++++++++++++++++++
 drivers/acpi/cppc_acpi.c     | 32 +++++++++++++++++++++++++++----
 include/acpi/cppc_acpi.h     | 13 +++++++++++++
 3 files changed, 78 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 8905eb0c681f..8207565f43ee 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -421,6 +421,43 @@ int cpc_read_ffh(int cpu, struct cpc_reg *reg, u64 *val)
 	return ret;
 }
 
+static void cpc_update_freq_counters(void *info)
+{
+	update_freq_counters_refs();
+}
+
+int cpc_read_regs_ffh(int cpu, struct ffh_cpc_reg_values *ffh_regs)
+{
+	struct amu_counters *ctrs = per_cpu_ptr(&cpu_samples, cpu);
+	int idx;
+
+	if (!cpc_ffh_supported() || !freq_counters_valid(cpu))
+		return -EOPNOTSUPP;
+
+	if (WARN_ON_ONCE(irqs_disabled()))
+		return -EPERM;
+
+	if (!idle_cpu(cpu))
+		smp_call_function_single(cpu, cpc_update_freq_counters, NULL, 1);
+
+	for (idx = 0; idx < MAX_NUM_CPC_REGS_FFH; idx++) {
+
+		if (!ffh_regs->regs[idx].reg)
+			continue;
+
+		switch ((u64)(ffh_regs->regs[idx].reg->address)) {
+		case 0x0:
+			ffh_regs->regs[idx].value = ctrs->core_cnt;
+			break;
+		case 0x1:
+			ffh_regs->regs[idx].value = ctrs->const_cnt;
+			break;
+		}
+	}
+
+	return 0;
+}
+
 int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val)
 {
 	return -EOPNOTSUPP;
diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index d155a86a8614..55ffb1915e4f 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -113,6 +113,10 @@ static DEFINE_PER_CPU(struct cpc_desc *, cpc_desc_ptr);
 				(cpc)->cpc_entry.reg.space_id ==	\
 				ACPI_ADR_SPACE_SYSTEM_IO)
 
+#define CPC_IN_FFH(cpc) ((cpc)->type == ACPI_TYPE_BUFFER &&		\
+				(cpc)->cpc_entry.reg.space_id ==	\
+				ACPI_ADR_SPACE_FIXED_HARDWARE)
+
 /* Evaluates to True if reg is a NULL register descriptor */
 #define IS_NULL_REG(reg) ((reg)->space_id ==  ACPI_ADR_SPACE_SYSTEM_MEMORY && \
 				(reg)->address == 0 &&			\
@@ -974,6 +978,11 @@ int __weak cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val)
 	return -ENOTSUPP;
 }
 
+int __weak cpc_read_regs_ffh(int cpu, struct ffh_cpc_reg_values *regs)
+{
+	return -ENOTSUPP;
+}
+
 /*
  * Since cpc_read and cpc_write are called while holding pcc_lock, it should be
  * as fast as possible. We have already mapped the PCC subspace during init, so
@@ -1317,7 +1326,7 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
 	int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum);
 	struct cppc_pcc_data *pcc_ss_data = NULL;
 	u64 delivered, reference, ref_perf, ctr_wrap_time;
-	int ret = 0, regs_in_pcc = 0;
+	int ret = 0, regs_in_pcc = 0, regs_read_in_ffh = 0;
 
 	if (!cpc_desc) {
 		pr_debug("No CPC descriptor for CPU:%d\n", cpunum);
@@ -1353,8 +1362,23 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
 		}
 	}
 
-	cpc_read(cpunum, delivered_reg, &delivered);
-	cpc_read(cpunum, reference_reg, &reference);
+	if (CPC_IN_FFH(delivered_reg) && CPC_IN_FFH(reference_reg)) {
+		struct ffh_cpc_reg_values ffh_regs;
+
+		ffh_regs.regs[0].reg = &(delivered_reg->cpc_entry.reg);
+		ffh_regs.regs[1].reg = &(reference_reg->cpc_entry.reg);
+		ret = cpc_read_regs_ffh(cpunum, &ffh_regs);
+		if (!ret) {
+			delivered = ffh_regs.regs[0].value;
+			reference = ffh_regs.regs[1].value;
+			regs_read_in_ffh = 1;
+		}
+	}
+
+	if (!regs_read_in_ffh) {
+		cpc_read(cpunum, delivered_reg, &delivered);
+		cpc_read(cpunum, reference_reg, &reference);
+	}
 	cpc_read(cpunum, ref_perf_reg, &ref_perf);
 
 	/*
@@ -1366,7 +1390,7 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
 	if (CPC_SUPPORTED(ctr_wrap_reg))
 		cpc_read(cpunum, ctr_wrap_reg, &ctr_wrap_time);
 
-	if (!delivered || !reference ||	!ref_perf) {
+	if (!delivered || !reference || !ref_perf) {
 		ret = -EFAULT;
 		goto out_err;
 	}
diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
index 3a0995f8bce8..0da614a50edd 100644
--- a/include/acpi/cppc_acpi.h
+++ b/include/acpi/cppc_acpi.h
@@ -137,6 +137,18 @@ struct cppc_cpudata {
 };
 
 #ifdef CONFIG_ACPI_CPPC_LIB
+
+#define MAX_NUM_CPC_REGS_FFH 2
+
+struct ffh_cpc_reg {
+	struct cpc_reg *reg;
+	u64 value;
+};
+
+struct ffh_cpc_reg_values {
+	struct ffh_cpc_reg regs[MAX_NUM_CPC_REGS_FFH];
+};
+
 extern int cppc_get_desired_perf(int cpunum, u64 *desired_perf);
 extern int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf);
 extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs);
@@ -153,6 +165,7 @@ extern unsigned int cppc_get_transition_latency(int cpu);
 extern bool cpc_ffh_supported(void);
 extern bool cpc_supported_by_cpu(void);
 extern int cpc_read_ffh(int cpunum, struct cpc_reg *reg, u64 *val);
+extern int cpc_read_regs_ffh(int cpu, struct ffh_cpc_reg_values *regs);
 extern int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val);
 extern int cppc_get_epp_perf(int cpunum, u64 *epp_perf);
 extern int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls, bool enable);
-- 
2.43.1



* Re: [PATCH v1 3/3] ACPI: CPPC: Read CPC FFH counters in a single IPI
  2024-02-29 16:25 ` [PATCH v1 3/3] ACPI: CPPC: Read CPC FFH counters in a single IPI Vanshidhar Konda
@ 2024-02-29 17:32   ` Rafael J. Wysocki
  2024-02-29 18:00     ` Vanshidhar Konda
  0 siblings, 1 reply; 15+ messages in thread
From: Rafael J. Wysocki @ 2024-02-29 17:32 UTC (permalink / raw)
  To: Vanshidhar Konda
  Cc: Huisong Li, Beata Michalska, Ionela Voinescu, linux-kernel,
	linux-pm, linux-arm-kernel, rafael, sumitg, zengheng4, yang,
	will, sudeep.holla, liuyonglong, zhanjie9, linux-acpi

On Thu, Feb 29, 2024 at 5:25 PM Vanshidhar Konda
<vanshikonda@os.amperecomputing.com> wrote:
>
> The CPPC driver reads delivered and reference counters using cpc_read
> one at a time. This leads to inaccurate measurement of CPU frequency
> discussed in [1]. If the firmware indicates that both the registers are
> in the FFH interface the kernel can read the registers together in a
> single IPI. This has two benefits:
> 1. Reduces the number of IPIs needed to read the two registers
> 2. The two registers will be read in close proximity resulting in more
>    accurate CPU frequency measurement
>
> [1]: https://lore.kernel.org/all/20230328193846.8757-1-yang@os.amperecomputing.com/
>
> Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
> ---
>  arch/arm64/kernel/topology.c | 37 ++++++++++++++++++++++++++++++++++++
>  drivers/acpi/cppc_acpi.c     | 32 +++++++++++++++++++++++++++----
>  include/acpi/cppc_acpi.h     | 13 +++++++++++++
>  3 files changed, 78 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 8905eb0c681f..8207565f43ee 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -421,6 +421,43 @@ int cpc_read_ffh(int cpu, struct cpc_reg *reg, u64 *val)
>         return ret;
>  }
>
> +static void cpc_update_freq_counters(void *info)
> +{
> +       update_freq_counters_refs();
> +}
> +
> +int cpc_read_regs_ffh(int cpu, struct ffh_cpc_reg_values *ffh_regs)
> +{
> +       struct amu_counters *ctrs = per_cpu_ptr(&cpu_samples, cpu);
> +       int idx;
> +
> +       if (!cpc_ffh_supported() || !freq_counters_valid(cpu))
> +               return -EOPNOTSUPP;
> +
> +       if (WARN_ON_ONCE(irqs_disabled()))
> +               return -EPERM;
> +
> +       if (!idle_cpu(cpu))
> +               smp_call_function_single(cpu, cpc_update_freq_counters, NULL, 1);
> +
> +       for (idx = 0; idx < MAX_NUM_CPC_REGS_FFH; idx++) {
> +
> +               if (!ffh_regs->regs[idx].reg)
> +                       continue;
> +
> +               switch ((u64)(ffh_regs->regs[idx].reg->address)) {
> +               case 0x0:
> +                       ffh_regs->regs[idx].value = ctrs->core_cnt;
> +                       break;
> +               case 0x1:
> +                       ffh_regs->regs[idx].value = ctrs->const_cnt;
> +                       break;
> +               }
> +       }
> +
> +       return 0;
> +}
> +
>  int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val)
>  {
>         return -EOPNOTSUPP;
> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> index d155a86a8614..55ffb1915e4f 100644
> --- a/drivers/acpi/cppc_acpi.c
> +++ b/drivers/acpi/cppc_acpi.c
> @@ -113,6 +113,10 @@ static DEFINE_PER_CPU(struct cpc_desc *, cpc_desc_ptr);
>                                 (cpc)->cpc_entry.reg.space_id ==        \
>                                 ACPI_ADR_SPACE_SYSTEM_IO)
>
> +#define CPC_IN_FFH(cpc) ((cpc)->type == ACPI_TYPE_BUFFER &&            \
> +                               (cpc)->cpc_entry.reg.space_id ==        \
> +                               ACPI_ADR_SPACE_FIXED_HARDWARE)
> +
>  /* Evaluates to True if reg is a NULL register descriptor */
>  #define IS_NULL_REG(reg) ((reg)->space_id ==  ACPI_ADR_SPACE_SYSTEM_MEMORY && \
>                                 (reg)->address == 0 &&                  \
> @@ -974,6 +978,11 @@ int __weak cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val)
>         return -ENOTSUPP;
>  }
>
> +int __weak cpc_read_regs_ffh(int cpu, struct ffh_cpc_reg_values *regs)
> +{
> +       return -ENOTSUPP;
> +}

This might return a bool value.

Is there any case in which the caller would handle different error
codes differently?

> +
>  /*
>   * Since cpc_read and cpc_write are called while holding pcc_lock, it should be
>   * as fast as possible. We have already mapped the PCC subspace during init, so
> @@ -1317,7 +1326,7 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
>         int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum);
>         struct cppc_pcc_data *pcc_ss_data = NULL;
>         u64 delivered, reference, ref_perf, ctr_wrap_time;
> -       int ret = 0, regs_in_pcc = 0;
> +       int ret = 0, regs_in_pcc = 0, regs_read_in_ffh = 0;

Please use bool as the type for boolean variables.

>
>         if (!cpc_desc) {
>                 pr_debug("No CPC descriptor for CPU:%d\n", cpunum);
> @@ -1353,8 +1362,23 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
>                 }
>         }
>
> -       cpc_read(cpunum, delivered_reg, &delivered);
> -       cpc_read(cpunum, reference_reg, &reference);
> +       if (CPC_IN_FFH(delivered_reg) && CPC_IN_FFH(reference_reg)) {
> +               struct ffh_cpc_reg_values ffh_regs;
> +
> +               ffh_regs.regs[0].reg = &(delivered_reg->cpc_entry.reg);
> +               ffh_regs.regs[1].reg = &(reference_reg->cpc_entry.reg);
> +               ret = cpc_read_regs_ffh(cpunum, &ffh_regs);
> +               if (!ret) {

If cpc_read_regs_ffh() returned 'true' on success, the above could be written as

if (cpc_read_regs_ffh(cpunum, &ffh_regs)) {

> +                       delivered = ffh_regs.regs[0].value;
> +                       reference = ffh_regs.regs[1].value;
> +                       regs_read_in_ffh = 1;
> +               }
> +       }
> +
> +       if (!regs_read_in_ffh) {
> +               cpc_read(cpunum, delivered_reg, &delivered);
> +               cpc_read(cpunum, reference_reg, &reference);
> +       }
>         cpc_read(cpunum, ref_perf_reg, &ref_perf);
>
>         /*
> @@ -1366,7 +1390,7 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
>         if (CPC_SUPPORTED(ctr_wrap_reg))
>                 cpc_read(cpunum, ctr_wrap_reg, &ctr_wrap_time);
>
> -       if (!delivered || !reference || !ref_perf) {
> +       if (!delivered || !reference || !ref_perf) {
>                 ret = -EFAULT;
>                 goto out_err;
>         }
> diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
> index 3a0995f8bce8..0da614a50edd 100644
> --- a/include/acpi/cppc_acpi.h
> +++ b/include/acpi/cppc_acpi.h
> @@ -137,6 +137,18 @@ struct cppc_cpudata {
>  };
>
>  #ifdef CONFIG_ACPI_CPPC_LIB
> +
> +#define MAX_NUM_CPC_REGS_FFH 2
> +
> +struct ffh_cpc_reg {
> +       struct cpc_reg *reg;
> +       u64 value;
> +};
> +
> +struct ffh_cpc_reg_values {
> +       struct ffh_cpc_reg regs[MAX_NUM_CPC_REGS_FFH];
> +};
> +
>  extern int cppc_get_desired_perf(int cpunum, u64 *desired_perf);
>  extern int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf);
>  extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs);
> @@ -153,6 +165,7 @@ extern unsigned int cppc_get_transition_latency(int cpu);
>  extern bool cpc_ffh_supported(void);
>  extern bool cpc_supported_by_cpu(void);
>  extern int cpc_read_ffh(int cpunum, struct cpc_reg *reg, u64 *val);
> +extern int cpc_read_regs_ffh(int cpu, struct ffh_cpc_reg_values *regs);
>  extern int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val);
>  extern int cppc_get_epp_perf(int cpunum, u64 *epp_perf);
>  extern int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls, bool enable);
> --


* Re: Re: [PATCH v1 3/3] ACPI: CPPC: Read CPC FFH counters in a single IPI
  2024-02-29 17:32   ` Rafael J. Wysocki
@ 2024-02-29 18:00     ` Vanshidhar Konda
  2024-02-29 18:02       ` Rafael J. Wysocki
  0 siblings, 1 reply; 15+ messages in thread
From: Vanshidhar Konda @ 2024-02-29 18:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Huisong Li, Beata Michalska, Ionela Voinescu, linux-kernel,
	linux-pm, linux-arm-kernel, sumitg, zengheng4, yang, will,
	sudeep.holla, liuyonglong, zhanjie9, linux-acpi

On Thu, Feb 29, 2024 at 06:32:59PM +0100, Rafael J. Wysocki wrote:
>On Thu, Feb 29, 2024 at 5:25 PM Vanshidhar Konda
><vanshikonda@os.amperecomputing.com> wrote:
>>
>> The CPPC driver reads delivered and reference counters using cpc_read
>> one at a time. This leads to inaccurate measurement of CPU frequency
>> discussed in [1]. If the firmware indicates that both the registers are
>> in the FFH interface the kernel can read the registers together in a
>> single IPI. This has two benefits:
>> 1. Reduces the number of IPIs needed to read the two registers
>> 2. The two registers will be read in close proximity resulting in more
>>    accurate CPU frequency measurement
>>
>> [1]: https://lore.kernel.org/all/20230328193846.8757-1-yang@os.amperecomputing.com/
>>
>> Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
>> ---
>>  arch/arm64/kernel/topology.c | 37 ++++++++++++++++++++++++++++++++++++
>>  drivers/acpi/cppc_acpi.c     | 32 +++++++++++++++++++++++++++----
>>  include/acpi/cppc_acpi.h     | 13 +++++++++++++
>>  3 files changed, 78 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>> index 8905eb0c681f..8207565f43ee 100644
>> --- a/arch/arm64/kernel/topology.c
>> +++ b/arch/arm64/kernel/topology.c
>> @@ -421,6 +421,43 @@ int cpc_read_ffh(int cpu, struct cpc_reg *reg, u64 *val)
>>         return ret;
>>  }
>>
>> +static void cpc_update_freq_counters(void *info)
>> +{
>> +       update_freq_counters_refs();
>> +}
>> +
>> +int cpc_read_regs_ffh(int cpu, struct ffh_cpc_reg_values *ffh_regs)
>> +{
>> +       struct amu_counters *ctrs = per_cpu_ptr(&cpu_samples, cpu);
>> +       int idx;
>> +
>> +       if (!cpc_ffh_supported() || !freq_counters_valid(cpu))
>> +               return -EOPNOTSUPP;
>> +
>> +       if (WARN_ON_ONCE(irqs_disabled()))
>> +               return -EPERM;
>> +
>> +       if (!idle_cpu(cpu))
>> +               smp_call_function_single(cpu, cpc_update_freq_counters, NULL, 1);
>> +
>> +       for (idx = 0; idx < MAX_NUM_CPC_REGS_FFH; idx++) {
>> +
>> +               if (!ffh_regs->regs[idx].reg)
>> +                       continue;
>> +
>> +               switch ((u64)(ffh_regs->regs[idx].reg->address)) {
>> +               case 0x0:
>> +                       ffh_regs->regs[idx].value = ctrs->core_cnt;
>> +                       break;
>> +               case 0x1:
>> +                       ffh_regs->regs[idx].value = ctrs->const_cnt;
>> +                       break;
>> +               }
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>>  int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val)
>>  {
>>         return -EOPNOTSUPP;
>> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
>> index d155a86a8614..55ffb1915e4f 100644
>> --- a/drivers/acpi/cppc_acpi.c
>> +++ b/drivers/acpi/cppc_acpi.c
>> @@ -113,6 +113,10 @@ static DEFINE_PER_CPU(struct cpc_desc *, cpc_desc_ptr);
>>                                 (cpc)->cpc_entry.reg.space_id ==        \
>>                                 ACPI_ADR_SPACE_SYSTEM_IO)
>>
>> +#define CPC_IN_FFH(cpc) ((cpc)->type == ACPI_TYPE_BUFFER &&            \
>> +                               (cpc)->cpc_entry.reg.space_id ==        \
>> +                               ACPI_ADR_SPACE_FIXED_HARDWARE)
>> +
>>  /* Evaluates to True if reg is a NULL register descriptor */
>>  #define IS_NULL_REG(reg) ((reg)->space_id ==  ACPI_ADR_SPACE_SYSTEM_MEMORY && \
>>                                 (reg)->address == 0 &&                  \
>> @@ -974,6 +978,11 @@ int __weak cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val)
>>         return -ENOTSUPP;
>>  }
>>
>> +int __weak cpc_read_regs_ffh(int cpu, struct ffh_cpc_reg_values *regs)
>> +{
>> +       return -ENOTSUPP;
>> +}
>
>This might return a bool value.
>
>Is there any case in which the caller would handle different error
>codes differently?
>
I kept this similar to cpc_read_ffh(). I didn't see any usage of the error
codes. The return value of cpc_read_ffh() is returned only from cpc_read(),
but I didn't see anyone consuming the return value of cpc_read().
Additionally, checkpatch complains about using -ENOTSUPP and suggests
replacing it with -EOPNOTSUPP. Does it make sense to update
cpc_read/write_ffh() as well to switch to return type bool?

>> +
>>  /*
>>   * Since cpc_read and cpc_write are called while holding pcc_lock, it should be
>>   * as fast as possible. We have already mapped the PCC subspace during init, so
>> @@ -1317,7 +1326,7 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
>>         int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum);
>>         struct cppc_pcc_data *pcc_ss_data = NULL;
>>         u64 delivered, reference, ref_perf, ctr_wrap_time;
>> -       int ret = 0, regs_in_pcc = 0;
>> +       int ret = 0, regs_in_pcc = 0, regs_read_in_ffh = 0;
>
>Please use bool as the type for boolean variables.
>
Thanks for pointing that out. I'll do that for the next version.

Thanks,
Vanshi

>>
>>         if (!cpc_desc) {
>>                 pr_debug("No CPC descriptor for CPU:%d\n", cpunum);
>> @@ -1353,8 +1362,23 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
>>                 }
>>         }
>>
>> -       cpc_read(cpunum, delivered_reg, &delivered);
>> -       cpc_read(cpunum, reference_reg, &reference);
>> +       if (CPC_IN_FFH(delivered_reg) && CPC_IN_FFH(reference_reg)) {
>> +               struct ffh_cpc_reg_values ffh_regs;
>> +
>> +               ffh_regs.regs[0].reg = &(delivered_reg->cpc_entry.reg);
>> +               ffh_regs.regs[1].reg = &(reference_reg->cpc_entry.reg);
>> +               ret = cpc_read_regs_ffh(cpunum, &ffh_regs);
>> +               if (!ret) {
>
>If cpc_read_regs_ffh() returned 'true' on success, the above could be written as
>
>if (cpc_read_regs_ffh(cpunum, &ffh_regs)) {
>
>> +                       delivered = ffh_regs.regs[0].value;
>> +                       reference = ffh_regs.regs[1].value;
>> +                       regs_read_in_ffh = 1;
>> +               }
>> +       }
>> +
>> +       if (!regs_read_in_ffh) {
>> +               cpc_read(cpunum, delivered_reg, &delivered);
>> +               cpc_read(cpunum, reference_reg, &reference);
>> +       }
>>         cpc_read(cpunum, ref_perf_reg, &ref_perf);
>>
>>         /*
>> @@ -1366,7 +1390,7 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
>>         if (CPC_SUPPORTED(ctr_wrap_reg))
>>                 cpc_read(cpunum, ctr_wrap_reg, &ctr_wrap_time);
>>
>> -       if (!delivered || !reference || !ref_perf) {
>> +       if (!delivered || !reference || !ref_perf) {
>>                 ret = -EFAULT;
>>                 goto out_err;
>>         }
>> diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
>> index 3a0995f8bce8..0da614a50edd 100644
>> --- a/include/acpi/cppc_acpi.h
>> +++ b/include/acpi/cppc_acpi.h
>> @@ -137,6 +137,18 @@ struct cppc_cpudata {
>>  };
>>
>>  #ifdef CONFIG_ACPI_CPPC_LIB
>> +
>> +#define MAX_NUM_CPC_REGS_FFH 2
>> +
>> +struct ffh_cpc_reg {
>> +       struct cpc_reg *reg;
>> +       u64 value;
>> +};
>> +
>> +struct ffh_cpc_reg_values {
>> +       struct ffh_cpc_reg regs[MAX_NUM_CPC_REGS_FFH];
>> +};
>> +
>>  extern int cppc_get_desired_perf(int cpunum, u64 *desired_perf);
>>  extern int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf);
>>  extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs);
>> @@ -153,6 +165,7 @@ extern unsigned int cppc_get_transition_latency(int cpu);
>>  extern bool cpc_ffh_supported(void);
>>  extern bool cpc_supported_by_cpu(void);
>>  extern int cpc_read_ffh(int cpunum, struct cpc_reg *reg, u64 *val);
>> +extern int cpc_read_regs_ffh(int cpu, struct ffh_cpc_reg_values *regs);
>>  extern int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val);
>>  extern int cppc_get_epp_perf(int cpunum, u64 *epp_perf);
>>  extern int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls, bool enable);
>> --


* Re: Re: [PATCH v1 3/3] ACPI: CPPC: Read CPC FFH counters in a single IPI
  2024-02-29 18:00     ` Vanshidhar Konda
@ 2024-02-29 18:02       ` Rafael J. Wysocki
  0 siblings, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2024-02-29 18:02 UTC (permalink / raw)
  To: Vanshidhar Konda
  Cc: Rafael J. Wysocki, Huisong Li, Beata Michalska, Ionela Voinescu,
	linux-kernel, linux-pm, linux-arm-kernel, sumitg, zengheng4,
	yang, will, sudeep.holla, liuyonglong, zhanjie9, linux-acpi

On Thu, Feb 29, 2024 at 7:01 PM Vanshidhar Konda
<vanshikonda@os.amperecomputing.com> wrote:
>
> On Thu, Feb 29, 2024 at 06:32:59PM +0100, Rafael J. Wysocki wrote:
> >On Thu, Feb 29, 2024 at 5:25 PM Vanshidhar Konda
> ><vanshikonda@os.amperecomputing.com> wrote:
> >>
> >> The CPPC driver reads delivered and reference counters using cpc_read
> >> one at a time. This leads to inaccurate measurement of CPU frequency
> >> discussed in [1]. If the firmware indicates that both the registers are
> >> in the FFH interface the kernel can read the registers together in a
> >> single IPI. This has two benefits:
> >> 1. Reduces the number of IPIs needed to read the two registers
> >> 2. The two registers will be read in close proximity resulting in more
> >>    accurate CPU frequency measurement
> >>
> >> [1]: https://lore.kernel.org/all/20230328193846.8757-1-yang@os.amperecomputing.com/
> >>
> >> Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
> >> ---
> >>  arch/arm64/kernel/topology.c | 37 ++++++++++++++++++++++++++++++++++++
> >>  drivers/acpi/cppc_acpi.c     | 32 +++++++++++++++++++++++++++----
> >>  include/acpi/cppc_acpi.h     | 13 +++++++++++++
> >>  3 files changed, 78 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> >> index 8905eb0c681f..8207565f43ee 100644
> >> --- a/arch/arm64/kernel/topology.c
> >> +++ b/arch/arm64/kernel/topology.c
> >> @@ -421,6 +421,43 @@ int cpc_read_ffh(int cpu, struct cpc_reg *reg, u64 *val)
> >>         return ret;
> >>  }
> >>
> >> +static void cpc_update_freq_counters(void *info)
> >> +{
> >> +       update_freq_counters_refs();
> >> +}
> >> +
> >> +int cpc_read_regs_ffh(int cpu, struct ffh_cpc_reg_values *ffh_regs)
> >> +{
> >> +       struct amu_counters *ctrs = per_cpu_ptr(&cpu_samples, cpu);
> >> +       int idx;
> >> +
> >> +       if (!cpc_ffh_supported() || !freq_counters_valid(cpu))
> >> +               return -EOPNOTSUPP;
> >> +
> >> +       if (WARN_ON_ONCE(irqs_disabled()))
> >> +               return -EPERM;
> >> +
> >> +       if (!idle_cpu(cpu))
> >> +               smp_call_function_single(cpu, cpc_update_freq_counters, NULL, 1);
> >> +
> >> +       for (idx = 0; idx < MAX_NUM_CPC_REGS_FFH; idx++) {
> >> +
> >> +               if (!ffh_regs->regs[idx].reg)
> >> +                       continue;
> >> +
> >> +               switch ((u64)(ffh_regs->regs[idx].reg->address)) {
> >> +               case 0x0:
> >> +                       ffh_regs->regs[idx].value = ctrs->core_cnt;
> >> +                       break;
> >> +               case 0x1:
> >> +                       ffh_regs->regs[idx].value = ctrs->const_cnt;
> >> +                       break;
> >> +               }
> >> +       }
> >> +
> >> +       return 0;
> >> +}
> >> +
> >>  int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val)
> >>  {
> >>         return -EOPNOTSUPP;
> >> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> >> index d155a86a8614..55ffb1915e4f 100644
> >> --- a/drivers/acpi/cppc_acpi.c
> >> +++ b/drivers/acpi/cppc_acpi.c
> >> @@ -113,6 +113,10 @@ static DEFINE_PER_CPU(struct cpc_desc *, cpc_desc_ptr);
> >>                                 (cpc)->cpc_entry.reg.space_id ==        \
> >>                                 ACPI_ADR_SPACE_SYSTEM_IO)
> >>
> >> +#define CPC_IN_FFH(cpc) ((cpc)->type == ACPI_TYPE_BUFFER &&            \
> >> +                               (cpc)->cpc_entry.reg.space_id ==        \
> >> +                               ACPI_ADR_SPACE_FIXED_HARDWARE)
> >> +
> >>  /* Evaluates to True if reg is a NULL register descriptor */
> >>  #define IS_NULL_REG(reg) ((reg)->space_id ==  ACPI_ADR_SPACE_SYSTEM_MEMORY && \
> >>                                 (reg)->address == 0 &&                  \
> >> @@ -974,6 +978,11 @@ int __weak cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val)
> >>         return -ENOTSUPP;
> >>  }
> >>
> >> +int __weak cpc_read_regs_ffh(int cpu, struct ffh_cpc_reg_values *regs)
> >> +{
> >> +       return -ENOTSUPP;
> >> +}
> >
> >This might return a bool value.
> >
> >Is there any case in which the caller would handle different error
> >codes differently?
> >
> I kept this similar to cpc_read_ffh(). I didn't see any usage of the error
> codes. The return value of cpc_read_ffh() is returned only from cpc_read(),
> but I didn't see anyone consuming the return value of cpc_read().
> Additionally, checkpatch complains about using -ENOTSUPP and suggests
> replacing it with -EOPNOTSUPP. Does it make sense to update
> cpc_read/write_ffh() as well to switch to return type bool?

Probably, but in a separate patch.
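
For illustration only (this sketch is not part of the posted series; it just
mirrors the names used in the patch above), the bool-returning variant being
discussed could look roughly like:

	/* Weak stub: no arch support, caller falls back to cpc_read(). */
	bool __weak cpc_read_regs_ffh(int cpu, struct ffh_cpc_reg_values *regs)
	{
		return false;
	}

	/* In cppc_get_perf_ctrs(), success can then be tested directly: */
	if (CPC_IN_FFH(delivered_reg) && CPC_IN_FFH(reference_reg) &&
	    cpc_read_regs_ffh(cpunum, &ffh_regs)) {
		delivered = ffh_regs.regs[0].value;
		reference = ffh_regs.regs[1].value;
		regs_read_in_ffh = true;
	}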


* Re: [PATCH v1 1/3] arm64: topology: Add arch_freq_get_on_cpu() support
  2024-02-29 16:25 ` [PATCH v1 1/3] arm64: topology: Add arch_freq_get_on_cpu() support Vanshidhar Konda
@ 2024-03-06  8:24   ` Beata Michalska
  2024-03-06 22:04     ` Vanshidhar Konda
  2024-03-07  3:02   ` lihuisong (C)
  1 sibling, 1 reply; 15+ messages in thread
From: Beata Michalska @ 2024-03-06  8:24 UTC (permalink / raw)
  To: Vanshidhar Konda
  Cc: Huisong Li, Ionela Voinescu, linux-kernel, linux-pm,
	linux-arm-kernel, rafael, sumitg, zengheng4, yang, will,
	sudeep.holla, liuyonglong, zhanjie9, linux-acpi


Hi Vanshidhar,

On Thu, Feb 29, 2024 at 08:25:13AM -0800, Vanshidhar Konda wrote:
> AMU counters are used by the Frequency Invariance Engine (FIE) to
> estimate the CPU utilization during each tick. The delta of the AMU
> counters between two ticks can also be used to estimate the average CPU
> frequency of each core over the tick duration. Measure the AMU counters
> during tick, compute the delta and store it. When the frequency of the
> core is queried, use the stored delta to determine the frequency.
> 
> arch_freq_get_on_cpu() is used on x86 systems to estimate the frequency
> of each CPU. It can be wired up on arm64 for the same functionality.
> 
> Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
> ---
>  arch/arm64/kernel/topology.c | 114 +++++++++++++++++++++++++++++------
>  1 file changed, 96 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 1a2c72f3e7f8..db8d14525cf4 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -17,6 +17,8 @@
>  #include <linux/cpufreq.h>
>  #include <linux/init.h>
>  #include <linux/percpu.h>
> +#include <linux/sched/isolation.h>
> +#include <linux/seqlock_types.h>
>  
>  #include <asm/cpu.h>
>  #include <asm/cputype.h>
> @@ -82,20 +84,54 @@ int __init parse_acpi_topology(void)
>  #undef pr_fmt
>  #define pr_fmt(fmt) "AMU: " fmt
>  
> +struct amu_counters {
> +	seqcount_t	seq;
> +	unsigned long	last_update;
> +	u64		core_cnt;
> +	u64		const_cnt;
> +	u64		delta_core_cnt;
> +	u64		delta_const_cnt;
> +};
It still might not be necessary to track both the last taken sample and the
deltas from previous ones, see [1].
I could send v3 of [1] and take into account the changes you have suggested
here, namely the last tick recorded. Otherwise, a few comments below.
> +
>  /*
>   * Ensure that amu_scale_freq_tick() will return SCHED_CAPACITY_SCALE until
>   * the CPU capacity and its associated frequency have been correctly
>   * initialized.
>   */
> -static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) =  1UL << (2 * SCHED_CAPACITY_SHIFT);
> -static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
> -static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
> +static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) =
> +	1UL << (2 * SCHED_CAPACITY_SHIFT);
> +static DEFINE_PER_CPU_SHARED_ALIGNED(struct amu_counters, cpu_samples) = {
> +	.seq = SEQCNT_ZERO(cpu_samples.seq)
> +};
>  static cpumask_var_t amu_fie_cpus;
>  
>  void update_freq_counters_refs(void)
>  {
> -	this_cpu_write(arch_core_cycles_prev, read_corecnt());
> -	this_cpu_write(arch_const_cycles_prev, read_constcnt());
> +	struct amu_counters *cpu_sample = this_cpu_ptr(&cpu_samples);
> +	u64 core_cnt, const_cnt, delta_core_cnt, delta_const_cnt;
> +
> +	const_cnt = read_constcnt();
> +	core_cnt = read_corecnt();
> +
> +	if (unlikely(core_cnt < cpu_sample->core_cnt) ||
> +	    unlikely(const_cnt < cpu_sample->const_cnt)) {
> +		WARN(1, "AMU counter values should be monotonic.\n");
> +		cpu_sample->delta_const_cnt = 0;
> +		cpu_sample->delta_core_cnt = 0;
Not sure if zero-ing is really necessary here
> +		return;
> +	}
> +
> +	delta_core_cnt = core_cnt - cpu_sample->core_cnt;
> +	delta_const_cnt = const_cnt - cpu_sample->const_cnt;
> +
> +	cpu_sample->core_cnt = core_cnt;
> +	cpu_sample->const_cnt = const_cnt;
> +
> +	raw_write_seqcount_begin(&cpu_sample->seq);
> +	cpu_sample->last_update = jiffies;
> +	cpu_sample->delta_const_cnt = delta_const_cnt;
> +	cpu_sample->delta_core_cnt = delta_core_cnt;
> +	raw_write_seqcount_end(&cpu_sample->seq);
>  }
>  
>  static inline bool freq_counters_valid(int cpu)
> @@ -108,8 +144,7 @@ static inline bool freq_counters_valid(int cpu)
>  		return false;
>  	}
>  
> -	if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
> -		     !per_cpu(arch_core_cycles_prev, cpu))) {
> +	if (unlikely(per_cpu_ptr(&cpu_samples, cpu) == NULL)) {
>  		pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
>  		return false;
>  	}
> @@ -152,19 +187,15 @@ void freq_inv_set_max_ratio(int cpu, u64 max_rate)
>  
>  static void amu_scale_freq_tick(void)
>  {
> -	u64 prev_core_cnt, prev_const_cnt;
> -	u64 core_cnt, const_cnt, scale;
> -
> -	prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
> -	prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
> +	struct amu_counters *cpu_sample = this_cpu_ptr(&cpu_samples);
> +	u64 delta_core_cnt, delta_const_cnt, scale;
>  
>  	update_freq_counters_refs();
>  
> -	const_cnt = this_cpu_read(arch_const_cycles_prev);
> -	core_cnt = this_cpu_read(arch_core_cycles_prev);
> +	delta_const_cnt = cpu_sample->delta_const_cnt;
> +	delta_core_cnt = cpu_sample->delta_core_cnt;
>  
> -	if (unlikely(core_cnt <= prev_core_cnt ||
> -		     const_cnt <= prev_const_cnt))
> +	if ((delta_const_cnt == 0) || (delta_core_cnt == 0))
>  		return;
>  
>  	/*
> @@ -175,15 +206,62 @@ static void amu_scale_freq_tick(void)
>  	 * See validate_cpu_freq_invariance_counters() for details on
>  	 * arch_max_freq_scale and the use of SCHED_CAPACITY_SHIFT.
>  	 */
> -	scale = core_cnt - prev_core_cnt;
> +	scale = delta_core_cnt;
>  	scale *= this_cpu_read(arch_max_freq_scale);
>  	scale = div64_u64(scale >> SCHED_CAPACITY_SHIFT,
> -			  const_cnt - prev_const_cnt);
> +			  delta_const_cnt);
>  
>  	scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
>  	this_cpu_write(arch_freq_scale, (unsigned long)scale);
>  }
>  
> +/*
> + * Discard samples older than the define maximum sample age of 20ms. There
> + * is no point in sending IPIs in such a case. If the scheduler tick was
> + * not running then the CPU is either idle or isolated.
> + */
> +#define MAX_SAMPLE_AGE	((unsigned long)HZ / 50)
This depends on the config, so for HZ_1000 it will indeed give 20ms,
for CONFIG_250 that will be 5ms. It might be better to set it to number of
expected missed ticks instead ? Or amend the comment.
> +
> +unsigned int arch_freq_get_on_cpu(int cpu)
> +{
> +	struct amu_counters *cpu_sample = per_cpu_ptr(&cpu_samples, cpu);
> +	u64 delta_const_cnt, delta_core_cnt;
> +	unsigned int seq, freq;
> +	unsigned long last;
> +
> +	if (!freq_counters_valid(cpu))
> +		goto fallback;
> +
> +	do {
> +		seq = raw_read_seqcount_begin(&cpu_sample->seq);
> +		last = cpu_sample->last_update;
> +		delta_core_cnt = cpu_sample->delta_core_cnt;
> +		delta_const_cnt = cpu_sample->delta_const_cnt;
> +	} while (read_seqcount_retry(&cpu_sample->seq, seq));
> +
This seems to be taken from APERF/MPERF relevant code. Including the comments.
> +	/*
> +	 * Bail on invalid count and when the last update was too long ago,
> +	 * which covers idle and NOHZ full CPUs.
> +	 */
> +	if (!delta_const_cnt || ((jiffies - last) > MAX_SAMPLE_AGE)) {
Shouldn't the first condition (non-zero increase of cnt_cycles counter)
disqualify the sample taken altogether ?
> +		if (!(housekeeping_cpu(cpu, HK_TYPE_TICK) && idle_cpu(cpu)))
> +			goto fallback;
Not entirely convinced that this condition is what is expected ?
For housekeeping cpu that is not idle it will still resolve to AMU counters,
not sure if that's what was intended ?
Also, for cases when given cpufreq policy spans more than a single core, the
frequency might be queried based on relevant CPU that might have seen the tick
within specified timeframe (see [1])

> +	}
> +
> +	/*
> +	 * CPU frequency = reference perf (in Hz) * (/\ delivered) / (/\ reference)
> +	 * AMU reference performance counter increment rate is equal to the rate
> +	 * of increment of the System counter, CNTPCT_EL0 and can be used to
> +	 * compute the CPU frequency.
> +	 */
> +	return div64_u64((delta_core_cnt * (arch_timer_get_rate() / HZ)),
/HZ/HZ_PER_KHZ ?
> +			 delta_const_cnt);
> +
> +fallback:
> +	freq = cpufreq_quick_get(cpu);
> +	return freq ? freq : cpufreq_get_hw_max_freq(cpu);
If the arch specific code cannot determine the frequency it should actually make
it clear by returning '0' instead of trying to patch things up by itself (?)

Overall I'd prefer to revive [1] and amend it accordingly instead.

---
[1] https://lore.kernel.org/all/20231127160838.1403404-1-beata.michalska@arm.com/
---
Best Regards
Beata
> +}
> +
>  static struct scale_freq_data amu_sfd = {
>  	.source = SCALE_FREQ_SOURCE_ARCH,
>  	.set_freq_scale = amu_scale_freq_tick,
> -- 
> 2.43.1
> 


* Re: Re: [PATCH v1 1/3] arm64: topology: Add arch_freq_get_on_cpu() support
  2024-03-06  8:24   ` Beata Michalska
@ 2024-03-06 22:04     ` Vanshidhar Konda
  2024-03-07  3:20       ` lihuisong (C)
  0 siblings, 1 reply; 15+ messages in thread
From: Vanshidhar Konda @ 2024-03-06 22:04 UTC (permalink / raw)
  To: Beata Michalska
  Cc: Huisong Li, Ionela Voinescu, linux-kernel, linux-pm,
	linux-arm-kernel, rafael, sumitg, zengheng4, yang, will,
	sudeep.holla, liuyonglong, zhanjie9, linux-acpi

On Wed, Mar 06, 2024 at 09:24:19AM +0100, Beata Michalska wrote:
>
>Hi Vanshidhar,
>
>On Thu, Feb 29, 2024 at 08:25:13AM -0800, Vanshidhar Konda wrote:
>> AMU counters are used by the Frequency Invariance Engine (FIE) to
>> estimate the CPU utilization during each tick. The delta of the AMU
>> counters between two ticks can also be used to estimate the average CPU
>> frequency of each core over the tick duration. Measure the AMU counters
>> during tick, compute the delta and store it. When the frequency of the
>> core is queried, use the stored delta to determine the frequency.
>>
>> arch_freq_get_on_cpu() is used on x86 systems to estimate the frequency
>> of each CPU. It can be wired up on arm64 for the same functionality.
>>
>> Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
>> ---
>>  arch/arm64/kernel/topology.c | 114 +++++++++++++++++++++++++++++------
>>  1 file changed, 96 insertions(+), 18 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>> index 1a2c72f3e7f8..db8d14525cf4 100644
>> --- a/arch/arm64/kernel/topology.c
>> +++ b/arch/arm64/kernel/topology.c
>> @@ -17,6 +17,8 @@
>>  #include <linux/cpufreq.h>
>>  #include <linux/init.h>
>>  #include <linux/percpu.h>
>> +#include <linux/sched/isolation.h>
>> +#include <linux/seqlock_types.h>
>>
>>  #include <asm/cpu.h>
>>  #include <asm/cputype.h>
>> @@ -82,20 +84,54 @@ int __init parse_acpi_topology(void)
>>  #undef pr_fmt
>>  #define pr_fmt(fmt) "AMU: " fmt
>>
>> +struct amu_counters {
>> +	seqcount_t	seq;
>> +	unsigned long	last_update;
>> +	u64		core_cnt;
>> +	u64		const_cnt;
>> +	u64		delta_core_cnt;
>> +	u64		delta_const_cnt;
>> +};
>It still might not be necessary to track both last taken sample and deltas from
>previous ones, see[1].
>I could send v3 of [1] and take into account the changes you have suggested here,
>namely the last tick recorded. Otherwise few comments below.

For this specific patch it might suffice to just track the deltas. The
reason for storing the core_cnt/const_cnt values is for the case where the
CPU is idle and we are trying to read the counters through CPPC FFH - see
patch 3 of this series.

One of the drawbacks of updating/merging just [1] as a standalone patch is
that it doesn't cover idle or isolated CPUs. This patch series accounts
for those cases as well. I'm open to suggestions on how we can make that
happen with [1].

I tested v2 of [1] on an AmpereOne system and noticed that there were some
inconsistent measurements reported - see [2]. I think that might be due to
the frequency scale not being updated when the CPU goes idle.

>> +
>>  /*
>>   * Ensure that amu_scale_freq_tick() will return SCHED_CAPACITY_SCALE until
>>   * the CPU capacity and its associated frequency have been correctly
>>   * initialized.
>>   */
>> -static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) =  1UL << (2 * SCHED_CAPACITY_SHIFT);
>> -static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
>> -static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
>> +static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) =
>> +	1UL << (2 * SCHED_CAPACITY_SHIFT);
>> +static DEFINE_PER_CPU_SHARED_ALIGNED(struct amu_counters, cpu_samples) = {
>> +	.seq = SEQCNT_ZERO(cpu_samples.seq)
>> +};
>>  static cpumask_var_t amu_fie_cpus;
>>
>>  void update_freq_counters_refs(void)
>>  {
>> -	this_cpu_write(arch_core_cycles_prev, read_corecnt());
>> -	this_cpu_write(arch_const_cycles_prev, read_constcnt());
>> +	struct amu_counters *cpu_sample = this_cpu_ptr(&cpu_samples);
>> +	u64 core_cnt, const_cnt, delta_core_cnt, delta_const_cnt;
>> +
>> +	const_cnt = read_constcnt();
>> +	core_cnt = read_corecnt();
>> +
>> +	if (unlikely(core_cnt < cpu_sample->core_cnt) ||
>> +	    unlikely(const_cnt < cpu_sample->const_cnt)) {
>> +		WARN(1, "AMU counter values should be monotonic.\n");
>> +		cpu_sample->delta_const_cnt = 0;
>> +		cpu_sample->delta_core_cnt = 0;
>Not sure if zero-ing is really necessary here

I can remove that for the next version.

>> +		return;
>> +	}
>> +
>> +	delta_core_cnt = core_cnt - cpu_sample->core_cnt;
>> +	delta_const_cnt = const_cnt - cpu_sample->const_cnt;
>> +
>> +	cpu_sample->core_cnt = core_cnt;
>> +	cpu_sample->const_cnt = const_cnt;
>> +
>> +	raw_write_seqcount_begin(&cpu_sample->seq);
>> +	cpu_sample->last_update = jiffies;
>> +	cpu_sample->delta_const_cnt = delta_const_cnt;
>> +	cpu_sample->delta_core_cnt = delta_core_cnt;
>> +	raw_write_seqcount_end(&cpu_sample->seq);
>>  }
>>
>>  static inline bool freq_counters_valid(int cpu)
>> @@ -108,8 +144,7 @@ static inline bool freq_counters_valid(int cpu)
>>  		return false;
>>  	}
>>
>> -	if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
>> -		     !per_cpu(arch_core_cycles_prev, cpu))) {
>> +	if (unlikely(per_cpu_ptr(&cpu_samples, cpu) == NULL)) {
>>  		pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
>>  		return false;
>>  	}
>> @@ -152,19 +187,15 @@ void freq_inv_set_max_ratio(int cpu, u64 max_rate)
>>
>>  static void amu_scale_freq_tick(void)
>>  {
>> -	u64 prev_core_cnt, prev_const_cnt;
>> -	u64 core_cnt, const_cnt, scale;
>> -
>> -	prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
>> -	prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
>> +	struct amu_counters *cpu_sample = this_cpu_ptr(&cpu_samples);
>> +	u64 delta_core_cnt, delta_const_cnt, scale;
>>
>>  	update_freq_counters_refs();
>>
>> -	const_cnt = this_cpu_read(arch_const_cycles_prev);
>> -	core_cnt = this_cpu_read(arch_core_cycles_prev);
>> +	delta_const_cnt = cpu_sample->delta_const_cnt;
>> +	delta_core_cnt = cpu_sample->delta_core_cnt;
>>
>> -	if (unlikely(core_cnt <= prev_core_cnt ||
>> -		     const_cnt <= prev_const_cnt))
>> +	if ((delta_const_cnt == 0) || (delta_core_cnt == 0))
>>  		return;
>>
>>  	/*
>> @@ -175,15 +206,62 @@ static void amu_scale_freq_tick(void)
>>  	 * See validate_cpu_freq_invariance_counters() for details on
>>  	 * arch_max_freq_scale and the use of SCHED_CAPACITY_SHIFT.
>>  	 */
>> -	scale = core_cnt - prev_core_cnt;
>> +	scale = delta_core_cnt;
>>  	scale *= this_cpu_read(arch_max_freq_scale);
>>  	scale = div64_u64(scale >> SCHED_CAPACITY_SHIFT,
>> -			  const_cnt - prev_const_cnt);
>> +			  delta_const_cnt);
>>
>>  	scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
>>  	this_cpu_write(arch_freq_scale, (unsigned long)scale);
>>  }
>>
>> +/*
>> + * Discard samples older than the define maximum sample age of 20ms. There
>> + * is no point in sending IPIs in such a case. If the scheduler tick was
>> + * not running then the CPU is either idle or isolated.
>> + */
>> +#define MAX_SAMPLE_AGE	((unsigned long)HZ / 50)
>This depends on the config, so for HZ_1000 it will indeed give 20ms,
>for CONFIG_250 that will be 5ms. It might be better to set it to number of
>expected missed ticks instead ? Or amend the comment.

I think using jiffies/missed ticks is probably better. I'll update for
next version.
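
For reference, a minimal sketch of the jiffies/missed-ticks variant
(the names and the two-tick budget below are illustrative assumptions,
not the final code). Since last_update is recorded in jiffies, the age
check already counts scheduler ticks, so the threshold can be expressed
as a tick budget that does not depend on CONFIG_HZ:

/* Tolerate up to two missed scheduler ticks before calling a sample stale. */
#define AMU_MAX_MISSED_TICKS	2

static inline bool amu_sample_is_stale(unsigned long last_update)
{
	return time_after(jiffies, last_update + AMU_MAX_MISSED_TICKS);
}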

>> +
>> +unsigned int arch_freq_get_on_cpu(int cpu)
>> +{
>> +	struct amu_counters *cpu_sample = per_cpu_ptr(&cpu_samples, cpu);
>> +	u64 delta_const_cnt, delta_core_cnt;
>> +	unsigned int seq, freq;
>> +	unsigned long last;
>> +
>> +	if (!freq_counters_valid(cpu))
>> +		goto fallback;
>> +
>> +	do {
>> +		seq = raw_read_seqcount_begin(&cpu_sample->seq);
>> +		last = cpu_sample->last_update;
>> +		delta_core_cnt = cpu_sample->delta_core_cnt;
>> +		delta_const_cnt = cpu_sample->delta_const_cnt;
>> +	} while (read_seqcount_retry(&cpu_sample->seq, seq));
>> +
>This seems to be taken from APERF/MPERF relevant code. Including the comments.

Yes. The idea for this patch series is based on APERF/MPERF which are
quite similar to AMU counters.

>> +	/*
>> +	 * Bail on invalid count and when the last update was too long ago,
>> +	 * which covers idle and NOHZ full CPUs.
>> +	 */
>> +	if (!delta_const_cnt || ((jiffies - last) > MAX_SAMPLE_AGE)) {
>Shouldn't the first condition (non-zero increase of cnt_cycles counter)
>disqualify the sample taken altogether ?

I was updating delta_*_cnt values to 0 in one case above. If we just
drop that sample and don't set delta_*_cnt values to 0 we wouldn't need
this check. I will remove that in the next version.

>> +		if (!(housekeeping_cpu(cpu, HK_TYPE_TICK) && idle_cpu(cpu)))
>> +			goto fallback;
>Not entirely convinced that this condition is what is expected ?
>For housekeeping cpu that is not idle it will still resolve to AMU counters,
>not sure if that's what was intended ?

For a CPU that is not idle my preference is that it uses AMU counters
for frequency measurement. For idle and isolcpus we fall back to the CPPC
mechanism - which could be through MMIO or PCC.

>Also, for cases when given cpufreq policy spans more than a single core, the
>frequency might be queried based on relevant CPU that might have seen the tick
>within specified timeframe (see [1])
>

This would not be ideal. On Ampere systems I've not come across a
cpufreq policy that spans multiple cores, so I overlooked that
configuration.
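
For completeness, a rough sketch of what the per-policy handling could
look like here (purely illustrative; reusing the patch's cpu_samples
data and MAX_SAMPLE_AGE for the freshness check is my assumption):

	struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
	int ref_cpu = cpu;

	if (policy) {
		int i;

		/* Prefer any CPU in the same policy whose tick sample is still fresh. */
		for_each_cpu(i, policy->cpus) {
			struct amu_counters *s = per_cpu_ptr(&cpu_samples, i);

			if (time_before(jiffies, READ_ONCE(s->last_update) + MAX_SAMPLE_AGE)) {
				ref_cpu = i;
				break;
			}
		}
		cpufreq_cpu_put(policy);
	}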

>> +	}
>> +
>> +	/*
>> +	 * CPU frequency = reference perf (in Hz) * (/\ delivered) / (/\ reference)
>> +	 * AMU reference performance counter increment rate is equal to the rate
>> +	 * of increment of the System counter, CNTPCT_EL0 and can be used to
>> +	 * compute the CPU frequency.
>> +	 */
>> +	return div64_u64((delta_core_cnt * (arch_timer_get_rate() / HZ)),
>/HZ/HZ_PER_KHZ ?
>> +			 delta_const_cnt);
>> +
>> +fallback:
>> +	freq = cpufreq_quick_get(cpu);
>> +	return freq ? freq : cpufreq_get_hw_max_freq(cpu);
>If the arch specific code cannot determine the frequency it should actually make
>it clear by returning '0' instead of trying to patch things up by itself (?)
>

This was following the same logic as the APERF/MPERF code. Returning 0 from
here would result in a call to cpufreq_driver->get() vs a call to
cpufreq_quick_get(). The only difference I can tell is that
cpufreq_quick_get() will call read_lock_irqsave() before calling
cpufreq_driver->get(). I don't have enough knowledge to point out
which one is more appropriate. Following the same logic as the x86
implementation seemed more prudent.

>Overall I'd prefer to revive [1] and amend it accordingly instead.
>

As mentioned earlier, I'm ok with whatever leads to a coherent solution
for all the configurations we have - isolcpus, idle and active
housekeeping CPUs.

This issue impacts existing Ampere products. It would be great if we
could arrive at a solution soon.

>---
>[1] https://lore.kernel.org/all/20231127160838.1403404-1-beata.michalska@arm.com/
[2] - https://lore.kernel.org/all/7eozim2xnepacnnkzxlbx34hib4otycnbn4dqymfziqou5lw5u@5xzpv3t7sxo3/

Thanks for the discussion.

Vanshi

>---
>Best Regards
>Beata
>> +}
>> +
>>  static struct scale_freq_data amu_sfd = {
>>  	.source = SCALE_FREQ_SOURCE_ARCH,
>>  	.set_freq_scale = amu_scale_freq_tick,
>> --
>> 2.43.1
>>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 1/3] arm64: topology: Add arch_freq_get_on_cpu() support
  2024-02-29 16:25 ` [PATCH v1 1/3] arm64: topology: Add arch_freq_get_on_cpu() support Vanshidhar Konda
  2024-03-06  8:24   ` Beata Michalska
@ 2024-03-07  3:02   ` lihuisong (C)
  1 sibling, 0 replies; 15+ messages in thread
From: lihuisong (C) @ 2024-03-07  3:02 UTC (permalink / raw)
  To: Vanshidhar Konda, Beata Michalska
  Cc: Ionela Voinescu, linux-kernel, linux-pm, linux-arm-kernel,
	rafael, sumitg, zengheng4, yang, will, sudeep.holla, liuyonglong,
	zhanjie9, linux-acpi

Hi Vanshidhar,

On 2024/3/1 0:25, Vanshidhar Konda wrote:
> AMU counters are used by the Frequency Invariance Engine (FIE) to
> estimate the CPU utilization during each tick. The delta of the AMU
> counters between two ticks can also be used to estimate the average CPU
> frequency of each core over the tick duration. Measure the AMU counters
> during tick, compute the delta and store it. When the frequency of the
> core is queried, use the stored delta to determine the frequency.
>
> arch_freq_get_on_cpu() is used on x86 systems to estimate the frequency
> of each CPU. It can be wired up on arm64 for the same functionality.
>
> Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
> ---
>   arch/arm64/kernel/topology.c | 114 +++++++++++++++++++++++++++++------
>   1 file changed, 96 insertions(+), 18 deletions(-)
>
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 1a2c72f3e7f8..db8d14525cf4 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -17,6 +17,8 @@
>   #include <linux/cpufreq.h>
>   #include <linux/init.h>
>   #include <linux/percpu.h>
> +#include <linux/sched/isolation.h>
> +#include <linux/seqlock_types.h>
>   
>   #include <asm/cpu.h>
>   #include <asm/cputype.h>
> @@ -82,20 +84,54 @@ int __init parse_acpi_topology(void)
>   #undef pr_fmt
>   #define pr_fmt(fmt) "AMU: " fmt
>   
> +struct amu_counters {
> +	seqcount_t	seq;
> +	unsigned long	last_update;
> +	u64		core_cnt;
> +	u64		const_cnt;
> +	u64		delta_core_cnt;
> +	u64		delta_const_cnt;
> +};
> +
>   /*
>    * Ensure that amu_scale_freq_tick() will return SCHED_CAPACITY_SCALE until
>    * the CPU capacity and its associated frequency have been correctly
>    * initialized.
>    */
> -static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) =  1UL << (2 * SCHED_CAPACITY_SHIFT);
> -static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
> -static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
> +static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) =
> +	1UL << (2 * SCHED_CAPACITY_SHIFT);
> +static DEFINE_PER_CPU_SHARED_ALIGNED(struct amu_counters, cpu_samples) = {
> +	.seq = SEQCNT_ZERO(cpu_samples.seq)
> +};
>   static cpumask_var_t amu_fie_cpus;
>   
>   void update_freq_counters_refs(void)
>   {
> -	this_cpu_write(arch_core_cycles_prev, read_corecnt());
> -	this_cpu_write(arch_const_cycles_prev, read_constcnt());
> +	struct amu_counters *cpu_sample = this_cpu_ptr(&cpu_samples);
> +	u64 core_cnt, const_cnt, delta_core_cnt, delta_const_cnt;
> +
> +	const_cnt = read_constcnt();
> +	core_cnt = read_corecnt();
> +
> +	if (unlikely(core_cnt < cpu_sample->core_cnt) ||
> +	    unlikely(const_cnt < cpu_sample->const_cnt)) {

The two counter registers might wrap around, so this is not always
true, right?

If we don't consider that case, the warning below should be removed.

> +		WARN(1, "AMU counter values should be monotonic.\n");
> +		cpu_sample->delta_const_cnt = 0;
> +		cpu_sample->delta_core_cnt = 0;
> +		return;
> +	}
> +
> +	delta_core_cnt = core_cnt - cpu_sample->core_cnt;
> +	delta_const_cnt = const_cnt - cpu_sample->const_cnt;
> +
> +	cpu_sample->core_cnt = core_cnt;
> +	cpu_sample->const_cnt = const_cnt;
> +
> +	raw_write_seqcount_begin(&cpu_sample->seq);
> +	cpu_sample->last_update = jiffies;
> +	cpu_sample->delta_const_cnt = delta_const_cnt;
> +	cpu_sample->delta_core_cnt = delta_core_cnt;
> +	raw_write_seqcount_end(&cpu_sample->seq);
>   }
<...>
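
As a side note on this: since the fields are u64, plain unsigned
subtraction still produces a correct delta even across a single wrap,
so a wrap-tolerant variant (a minimal sketch, assuming 64-bit AMU
counters; not the patch code) could simply drop the comparison and the
WARN:

	/* u64 modular arithmetic gives the right delta even across a wrap */
	delta_core_cnt = core_cnt - cpu_sample->core_cnt;
	delta_const_cnt = const_cnt - cpu_sample->const_cnt;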

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 2/3] arm64: idle: Cache AMU counters before entering idle
  2024-02-29 16:25 ` [PATCH v1 2/3] arm64: idle: Cache AMU counters before entering idle Vanshidhar Konda
@ 2024-03-07  3:17   ` lihuisong (C)
  2024-03-11 18:27     ` Vanshidhar Konda
  0 siblings, 1 reply; 15+ messages in thread
From: lihuisong (C) @ 2024-03-07  3:17 UTC (permalink / raw)
  To: Vanshidhar Konda, Beata Michalska
  Cc: Ionela Voinescu, linux-kernel, linux-pm, linux-arm-kernel,
	rafael, sumitg, zengheng4, yang, will, sudeep.holla, liuyonglong,
	zhanjie9, linux-acpi


On 2024/3/1 0:25, Vanshidhar Konda wrote:
> AMU counters do not increment while a CPU is in idle. Saving the value
> of the core and constant counters prior to invoking WFI allows FIE to
> compute the frequency of a CPU that is idle.
>
> Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
> ---
>   arch/arm64/kernel/idle.c     | 10 ++++++++++
>   arch/arm64/kernel/topology.c | 14 ++++++++------
>   2 files changed, 18 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/kernel/idle.c b/arch/arm64/kernel/idle.c
> index 05cfb347ec26..5ed2e57188a8 100644
> --- a/arch/arm64/kernel/idle.c
> +++ b/arch/arm64/kernel/idle.c
> @@ -26,6 +26,16 @@ void __cpuidle cpu_do_idle(void)
>   
>   	arm_cpuidle_save_irq_context(&context);
>   
> +#ifdef CONFIG_ARM64_AMU_EXTN
> +	/* Update the AMU counters before entering WFI. The cached AMU counter
> +	 * value is used to determine CPU frequency while the CPU is idle
> +	 * without needing to wake up the CPU.
> +	 */
> +
> +	if (cpu_has_amu_feat(smp_processor_id()))
> +		update_freq_counters_refs();
> +#endif
I already mentioned the point below in [1].
This is just for the WFI state.
What about other deeper idle states, like retention and power down?
The path to enter idle state is different for them. We should do this 
for all idle states.

> +
>   	dsb(sy);
>   	wfi();
>   
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index db8d14525cf4..8905eb0c681f 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -240,13 +240,15 @@ unsigned int arch_freq_get_on_cpu(int cpu)
>   	} while (read_seqcount_retry(&cpu_sample->seq, seq));
>   
>   	/*
> -	 * Bail on invalid count and when the last update was too long ago,
> -	 * which covers idle and NOHZ full CPUs.
> +	 * Bail on invalid count and when the last update was too long ago.
> +	 * This covers idle, NOHZ full and isolated CPUs.
> +	 *
> +	 * Idle CPUs don't need to be measured because AMU counters stop
> +	 * incrementing during WFI/WFE.
>   	 */
> -	if (!delta_const_cnt || ((jiffies - last) > MAX_SAMPLE_AGE)) {
> -		if (!(housekeeping_cpu(cpu, HK_TYPE_TICK) && idle_cpu(cpu)))
> -			goto fallback;
> -	}
> +	if (!delta_const_cnt ||
> +	    ((jiffies - last) > MAX_SAMPLE_AGE && !idle_cpu(cpu)))
> +		goto fallback;
>   
>   	/*
>   	 * CPU frequency = reference perf (in Hz) * (/\ delivered) / (/\ reference)
[1] 
https://lore.kernel.org/linux-arm-kernel/20231212072617.14756-1-lihuisong@huawei.com/ 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 1/3] arm64: topology: Add arch_freq_get_on_cpu() support
  2024-03-06 22:04     ` Vanshidhar Konda
@ 2024-03-07  3:20       ` lihuisong (C)
  2024-03-07  7:21         ` Beata Michalska
  0 siblings, 1 reply; 15+ messages in thread
From: lihuisong (C) @ 2024-03-07  3:20 UTC (permalink / raw)
  To: Vanshidhar Konda, Beata Michalska
  Cc: Ionela Voinescu, linux-kernel, linux-pm, linux-arm-kernel,
	rafael, sumitg, zengheng4, yang, will, sudeep.holla, liuyonglong,
	zhanjie9, linux-acpi


On 2024/3/7 6:04, Vanshidhar Konda wrote:
> On Wed, Mar 06, 2024 at 09:24:19AM +0100, Beata Michalska wrote:
>>
>> Hi Vanshidhar,
>>
>> On Thu, Feb 29, 2024 at 08:25:13AM -0800, Vanshidhar Konda wrote:
>>> AMU counters are used by the Frequency Invariance Engine (FIE) to
>>> estimate the CPU utilization during each tick. The delta of the AMU
>>> counters between two ticks can also be used to estimate the average CPU
>>> frequency of each core over the tick duration. Measure the AMU counters
>>> during tick, compute the delta and store it. When the frequency of the
>>> core is queried, use the stored delta to determine the frequency.
>>>
>>> arch_freq_get_on_cpu() is used on x86 systems to estimate the frequency
>>> of each CPU. It can be wired up on arm64 for the same functionality.
>>>
>>> Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
>>> ---
>>>  arch/arm64/kernel/topology.c | 114 +++++++++++++++++++++++++++++------
>>>  1 file changed, 96 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/arch/arm64/kernel/topology.c 
>>> b/arch/arm64/kernel/topology.c
>>> index 1a2c72f3e7f8..db8d14525cf4 100644
>>> --- a/arch/arm64/kernel/topology.c
>>> +++ b/arch/arm64/kernel/topology.c
>>> @@ -17,6 +17,8 @@
>>>  #include <linux/cpufreq.h>
>>>  #include <linux/init.h>
>>>  #include <linux/percpu.h>
>>> +#include <linux/sched/isolation.h>
>>> +#include <linux/seqlock_types.h>
>>>
>>>  #include <asm/cpu.h>
>>>  #include <asm/cputype.h>
>>> @@ -82,20 +84,54 @@ int __init parse_acpi_topology(void)
>>>  #undef pr_fmt
>>>  #define pr_fmt(fmt) "AMU: " fmt
>>>
>>> +struct amu_counters {
>>> +    seqcount_t    seq;
>>> +    unsigned long    last_update;
>>> +    u64        core_cnt;
>>> +    u64        const_cnt;
>>> +    u64        delta_core_cnt;
>>> +    u64        delta_const_cnt;
>>> +};
>> It still might not be necessary to track both last taken sample and 
>> deltas from
>> previous ones, see[1].
>> I could send v3 of [1] and take into account the changes you have 
>> suggested here,
>> namely the last tick recorded. Otherwise few comments below.
>
> For this specific patch it might suffice to just track the deltas. The
> reason for storing the core_cnt/const_cnt values is for the case where
> the CPU is idle and we are trying to read the counters through CPPC
> FFH - see patch 3 of this series.
>
> One of the drawbacks of updating/merging just [1] as a standalone 
> patch is
> that it doesn't cover idle or isolated CPUs. This patch series accounts
> for those cases as well. I'm open to suggestions on how we can make that
> happen with [1].
>
> I tested v2 of [1] on AmpereOne system and noticed that there were some
> inconsistent measurements reported - see [2]. I think that might be 
> due to
> frequency scale not being updated when CPU goes idle.
>
>>> +
>>>  /*
>>>   * Ensure that amu_scale_freq_tick() will return 
>>> SCHED_CAPACITY_SCALE until
>>>   * the CPU capacity and its associated frequency have been correctly
>>>   * initialized.
>>>   */
>>> -static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, 
>>> arch_max_freq_scale) =  1UL << (2 * SCHED_CAPACITY_SHIFT);
>>> -static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
>>> -static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
>>> +static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, 
>>> arch_max_freq_scale) =
>>> +    1UL << (2 * SCHED_CAPACITY_SHIFT);
>>> +static DEFINE_PER_CPU_SHARED_ALIGNED(struct amu_counters, 
>>> cpu_samples) = {
>>> +    .seq = SEQCNT_ZERO(cpu_samples.seq)
>>> +};
>>>  static cpumask_var_t amu_fie_cpus;
>>>
>>>  void update_freq_counters_refs(void)
>>>  {
>>> -    this_cpu_write(arch_core_cycles_prev, read_corecnt());
>>> -    this_cpu_write(arch_const_cycles_prev, read_constcnt());
>>> +    struct amu_counters *cpu_sample = this_cpu_ptr(&cpu_samples);
>>> +    u64 core_cnt, const_cnt, delta_core_cnt, delta_const_cnt;
>>> +
>>> +    const_cnt = read_constcnt();
>>> +    core_cnt = read_corecnt();
>>> +
>>> +    if (unlikely(core_cnt < cpu_sample->core_cnt) ||
>>> +        unlikely(const_cnt < cpu_sample->const_cnt)) {
>>> +        WARN(1, "AMU counter values should be monotonic.\n");
>>> +        cpu_sample->delta_const_cnt = 0;
>>> +        cpu_sample->delta_core_cnt = 0;
>> Not sure if zero-ing is really necessary here
>
> I can remove that for the next version.
>
>>> +        return;
>>> +    }
>>> +
>>> +    delta_core_cnt = core_cnt - cpu_sample->core_cnt;
>>> +    delta_const_cnt = const_cnt - cpu_sample->const_cnt;
>>> +
>>> +    cpu_sample->core_cnt = core_cnt;
>>> +    cpu_sample->const_cnt = const_cnt;
>>> +
>>> +    raw_write_seqcount_begin(&cpu_sample->seq);
>>> +    cpu_sample->last_update = jiffies;
>>> +    cpu_sample->delta_const_cnt = delta_const_cnt;
>>> +    cpu_sample->delta_core_cnt = delta_core_cnt;
>>> +    raw_write_seqcount_end(&cpu_sample->seq);
>>>  }
>>>
>>>  static inline bool freq_counters_valid(int cpu)
>>> @@ -108,8 +144,7 @@ static inline bool freq_counters_valid(int cpu)
>>>          return false;
>>>      }
>>>
>>> -    if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
>>> -             !per_cpu(arch_core_cycles_prev, cpu))) {
>>> +    if (unlikely(per_cpu_ptr(&cpu_samples, cpu) == NULL)) {
>>>          pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
>>>          return false;
>>>      }
>>> @@ -152,19 +187,15 @@ void freq_inv_set_max_ratio(int cpu, u64 
>>> max_rate)
>>>
>>>  static void amu_scale_freq_tick(void)
>>>  {
>>> -    u64 prev_core_cnt, prev_const_cnt;
>>> -    u64 core_cnt, const_cnt, scale;
>>> -
>>> -    prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
>>> -    prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
>>> +    struct amu_counters *cpu_sample = this_cpu_ptr(&cpu_samples);
>>> +    u64 delta_core_cnt, delta_const_cnt, scale;
>>>
>>>      update_freq_counters_refs();
>>>
>>> -    const_cnt = this_cpu_read(arch_const_cycles_prev);
>>> -    core_cnt = this_cpu_read(arch_core_cycles_prev);
>>> +    delta_const_cnt = cpu_sample->delta_const_cnt;
>>> +    delta_core_cnt = cpu_sample->delta_core_cnt;
>>>
>>> -    if (unlikely(core_cnt <= prev_core_cnt ||
>>> -             const_cnt <= prev_const_cnt))
>>> +    if ((delta_const_cnt == 0) || (delta_core_cnt == 0))
>>>          return;
>>>
>>>      /*
>>> @@ -175,15 +206,62 @@ static void amu_scale_freq_tick(void)
>>>       * See validate_cpu_freq_invariance_counters() for details on
>>>       * arch_max_freq_scale and the use of SCHED_CAPACITY_SHIFT.
>>>       */
>>> -    scale = core_cnt - prev_core_cnt;
>>> +    scale = delta_core_cnt;
>>>      scale *= this_cpu_read(arch_max_freq_scale);
>>>      scale = div64_u64(scale >> SCHED_CAPACITY_SHIFT,
>>> -              const_cnt - prev_const_cnt);
>>> +              delta_const_cnt);
>>>
>>>      scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
>>>      this_cpu_write(arch_freq_scale, (unsigned long)scale);
>>>  }
>>>
>>> +/*
>>> + * Discard samples older than the define maximum sample age of 
>>> 20ms. There
>>> + * is no point in sending IPIs in such a case. If the scheduler 
>>> tick was
>>> + * not running then the CPU is either idle or isolated.
>>> + */
>>> +#define MAX_SAMPLE_AGE    ((unsigned long)HZ / 50)
>> This depends on the config, so for HZ_1000 it will indeed give 20ms,
>> for CONFIG_250 that will be 5ms. It might be better to set it to 
>> number of
>> expected missed ticks instead ? Or amend the comment.
>
> I think using jiffies/missed ticks is probably better. I'll update for
> next version.
>
>>> +
>>> +unsigned int arch_freq_get_on_cpu(int cpu)
>>> +{
>>> +    struct amu_counters *cpu_sample = per_cpu_ptr(&cpu_samples, cpu);
>>> +    u64 delta_const_cnt, delta_core_cnt;
>>> +    unsigned int seq, freq;
>>> +    unsigned long last;
>>> +
>>> +    if (!freq_counters_valid(cpu))
>>> +        goto fallback;
>>> +
>>> +    do {
>>> +        seq = raw_read_seqcount_begin(&cpu_sample->seq);
>>> +        last = cpu_sample->last_update;
>>> +        delta_core_cnt = cpu_sample->delta_core_cnt;
>>> +        delta_const_cnt = cpu_sample->delta_const_cnt;
>>> +    } while (read_seqcount_retry(&cpu_sample->seq, seq));
>>> +
>> This seems to be taken from APERF/MPERF relevant code. Including the 
>> comments.
>
> Yes. The idea for this patch series is based on APERF/MPERF which are
> quite similar to AMU counters.
>
>>> +    /*
>>> +     * Bail on invalid count and when the last update was too long 
>>> ago,
>>> +     * which covers idle and NOHZ full CPUs.
>>> +     */
>>> +    if (!delta_const_cnt || ((jiffies - last) > MAX_SAMPLE_AGE)) {
>> Shouldn't the first condition (non-zero increase of cnt_cycles counter)
>> disqualify the sample taken altogether ?
>
> I was updating delta_*_cnt values to 0 in one case above. If we just
> drop that sample and don't set delta_*_cnt values to 0 we wouldn't need
> this check. I will remove that in the next version.
>
>>> +        if (!(housekeeping_cpu(cpu, HK_TYPE_TICK) && idle_cpu(cpu)))
>>> +            goto fallback;
>> Not entirely convinced that this condition is what is expected ?
>> For housekeeping cpu that is not idle it will still resolve to AMU 
>> counters,
>> not sure if that's what was intended ?
>
> For a CPU that is not idle my preference is that it uses AMU counters
> for frequency measurement. For idle and isolcpus we fall back to the CPPC
> mechanism - which could be through MMIO or PCC.
>
>> Also, for cases when given cpufreq policy spans more than a single 
>> core, the
>> frequency might be queried based on relevant CPU that might have seen 
>> the tick
>> within specified timeframe (see [1])
>>
>
> This would not be ideal. On Ampere systems I've not come across a
> cpufreq policy that spans multiple cores, so I overlooked that
> configuration.
>
>>> +    }
>>> +
>>> +    /*
>>> +     * CPU frequency = reference perf (in Hz) * (/\ delivered) / 
>>> (/\ reference)
>>> +     * AMU reference performance counter increment rate is equal to 
>>> the rate
>>> +     * of increment of the System counter, CNTPCT_EL0 and can be 
>>> used to
>>> +     * compute the CPU frequency.
>>> +     */
>>> +    return div64_u64((delta_core_cnt * (arch_timer_get_rate() / HZ)),
>> /HZ/HZ_PER_KHZ ?
>>> +             delta_const_cnt);
>>> +
>>> +fallback:
>>> +    freq = cpufreq_quick_get(cpu);
>>> +    return freq ? freq : cpufreq_get_hw_max_freq(cpu);
>> If the arch specific code cannot determine the frequency it should 
>> actually make
>> it clear by returning '0' instead of trying to patch things up by 
>> itself (?)
>>
>
> This was following the same logic as the APERF/MPERF code. Returning 0 from
> here would result in a call to cpufreq_driver->get() vs a call to
> cpufreq_quick_get(). The only difference I can tell is that
> cpufreq_quick_get() will call read_lock_irqsave() before calling
> cpufreq_driver->get(). I don't have enough knowledge to point out
> which one is more appropriate. Following the same logic as the x86
> implementation seemed more prudent.
>
>> Overall I'd prefer to revive [1] and amend it accordingly instead.
>>
>
> As mentioned earlier, I'm ok with whatever leads to a coherent solution
> for all the configurations we have - isolcpus, idle and active
> housekeeping CPUs.
>
> This issue impacts existing Ampere products. It would be great if we
> could arrive at a solution soon.
+1 to having a solution soon; looking forward to it. @Vanshi and @Beata.
>
>> ---
>> [1] 
>> https://lore.kernel.org/all/20231127160838.1403404-1-beata.michalska@arm.com/
> [2] - 
> https://lore.kernel.org/all/7eozim2xnepacnnkzxlbx34hib4otycnbn4dqymfziqou5lw5u@5xzpv3t7sxo3/
>
> Thanks for the discussion.
>
> Vanshi
>
>> ---
>> Best Regards
>> Beata
>>> +}
>>> +
>>>  static struct scale_freq_data amu_sfd = {
>>>      .source = SCALE_FREQ_SOURCE_ARCH,
>>>      .set_freq_scale = amu_scale_freq_tick,
>>> -- 
>>> 2.43.1
>>>
> .

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 1/3] arm64: topology: Add arch_freq_get_on_cpu() support
  2024-03-07  3:20       ` lihuisong (C)
@ 2024-03-07  7:21         ` Beata Michalska
  0 siblings, 0 replies; 15+ messages in thread
From: Beata Michalska @ 2024-03-07  7:21 UTC (permalink / raw)
  To: lihuisong (C)
  Cc: Vanshidhar Konda, Ionela Voinescu, linux-kernel, linux-pm,
	linux-arm-kernel, rafael, sumitg, zengheng4, yang, will,
	sudeep.holla, liuyonglong, zhanjie9, linux-acpi

On Thu, Mar 07, 2024 at 11:20:46AM +0800, lihuisong (C) wrote:
> 
> On 2024/3/7 6:04, Vanshidhar Konda wrote:
> > On Wed, Mar 06, 2024 at 09:24:19AM +0100, Beata Michalska wrote:
> > > 
> > > Hi Vanshidhar,
> > > 
> > > On Thu, Feb 29, 2024 at 08:25:13AM -0800, Vanshidhar Konda wrote:
> > > > AMU counters are used by the Frequency Invariance Engine (FIE) to
> > > > estimate the CPU utilization during each tick. The delta of the AMU
> > > > counters between two ticks can also be used to estimate the average CPU
> > > > frequency of each core over the tick duration. Measure the AMU counters
> > > > during tick, compute the delta and store it. When the frequency of the
> > > > core is queried, use the stored delta to determine the frequency.
> > > > 
> > > > arch_freq_get_on_cpu() is used on x86 systems to estimate the frequency
> > > > of each CPU. It can be wired up on arm64 for the same functionality.
> > > > 
> > > > Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
> > > > ---
> > > >  arch/arm64/kernel/topology.c | 114 +++++++++++++++++++++++++++++------
> > > >  1 file changed, 96 insertions(+), 18 deletions(-)
> > > > 
> > > > diff --git a/arch/arm64/kernel/topology.c
> > > > b/arch/arm64/kernel/topology.c
> > > > index 1a2c72f3e7f8..db8d14525cf4 100644
> > > > --- a/arch/arm64/kernel/topology.c
> > > > +++ b/arch/arm64/kernel/topology.c
> > > > @@ -17,6 +17,8 @@
> > > >  #include <linux/cpufreq.h>
> > > >  #include <linux/init.h>
> > > >  #include <linux/percpu.h>
> > > > +#include <linux/sched/isolation.h>
> > > > +#include <linux/seqlock_types.h>
> > > > 
> > > >  #include <asm/cpu.h>
> > > >  #include <asm/cputype.h>
> > > > @@ -82,20 +84,54 @@ int __init parse_acpi_topology(void)
> > > >  #undef pr_fmt
> > > >  #define pr_fmt(fmt) "AMU: " fmt
> > > > 
> > > > +struct amu_counters {
> > > > +    seqcount_t    seq;
> > > > +    unsigned long    last_update;
> > > > +    u64        core_cnt;
> > > > +    u64        const_cnt;
> > > > +    u64        delta_core_cnt;
> > > > +    u64        delta_const_cnt;
> > > > +};
> > > It still might not be necessary to track both last taken sample and
> > > deltas from
> > > previous ones, see[1].
> > > I could send v3 of [1] and take into account the changes you have
> > > suggested here,
> > > namely the last tick recorded. Otherwise few comments below.
> > 
> > For this specific patch it might suffice to just track the deltas. The
> > reason for storing the core_cnt/const_cnt values is for the case where
> > the CPU is idle and we are trying to read the counters through CPPC
> > FFH - see patch 3 of this series.
> > 
> > One of the drawbacks of updating/merging just [1] as a standalone patch
> > is
> > that it doesn't cover idle or isolated CPUs. This patch series accounts
> > for those cases as well. I'm open to suggestions on how we can make that
> > happen with [1].
> > 
> > I tested v2 of [1] on AmpereOne system and noticed that there were some
> > inconsistent measurements reported - see [2]. I think that might be due
> > to
> > frequency scale not being updated when CPU goes idle.
> > 
> > > > +
> > > >  /*
> > > >   * Ensure that amu_scale_freq_tick() will return
> > > > SCHED_CAPACITY_SCALE until
> > > >   * the CPU capacity and its associated frequency have been correctly
> > > >   * initialized.
> > > >   */
> > > > -static DEFINE_PER_CPU_READ_MOSTLY(unsigned long,
> > > > arch_max_freq_scale) =  1UL << (2 * SCHED_CAPACITY_SHIFT);
> > > > -static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
> > > > -static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
> > > > +static DEFINE_PER_CPU_READ_MOSTLY(unsigned long,
> > > > arch_max_freq_scale) =
> > > > +    1UL << (2 * SCHED_CAPACITY_SHIFT);
> > > > +static DEFINE_PER_CPU_SHARED_ALIGNED(struct amu_counters,
> > > > cpu_samples) = {
> > > > +    .seq = SEQCNT_ZERO(cpu_samples.seq)
> > > > +};
> > > >  static cpumask_var_t amu_fie_cpus;
> > > > 
> > > >  void update_freq_counters_refs(void)
> > > >  {
> > > > -    this_cpu_write(arch_core_cycles_prev, read_corecnt());
> > > > -    this_cpu_write(arch_const_cycles_prev, read_constcnt());
> > > > +    struct amu_counters *cpu_sample = this_cpu_ptr(&cpu_samples);
> > > > +    u64 core_cnt, const_cnt, delta_core_cnt, delta_const_cnt;
> > > > +
> > > > +    const_cnt = read_constcnt();
> > > > +    core_cnt = read_corecnt();
> > > > +
> > > > +    if (unlikely(core_cnt < cpu_sample->core_cnt) ||
> > > > +        unlikely(const_cnt < cpu_sample->const_cnt)) {
> > > > +        WARN(1, "AMU counter values should be monotonic.\n");
> > > > +        cpu_sample->delta_const_cnt = 0;
> > > > +        cpu_sample->delta_core_cnt = 0;
> > > Not sure if zero-ing is really necessary here
> > 
> > I can remove that for the next version.
> > 
> > > > +        return;
> > > > +    }
> > > > +
> > > > +    delta_core_cnt = core_cnt - cpu_sample->core_cnt;
> > > > +    delta_const_cnt = const_cnt - cpu_sample->const_cnt;
> > > > +
> > > > +    cpu_sample->core_cnt = core_cnt;
> > > > +    cpu_sample->const_cnt = const_cnt;
> > > > +
> > > > +    raw_write_seqcount_begin(&cpu_sample->seq);
> > > > +    cpu_sample->last_update = jiffies;
> > > > +    cpu_sample->delta_const_cnt = delta_const_cnt;
> > > > +    cpu_sample->delta_core_cnt = delta_core_cnt;
> > > > +    raw_write_seqcount_end(&cpu_sample->seq);
> > > >  }
> > > > 
> > > >  static inline bool freq_counters_valid(int cpu)
> > > > @@ -108,8 +144,7 @@ static inline bool freq_counters_valid(int cpu)
> > > >          return false;
> > > >      }
> > > > 
> > > > -    if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
> > > > -             !per_cpu(arch_core_cycles_prev, cpu))) {
> > > > +    if (unlikely(per_cpu_ptr(&cpu_samples, cpu) == NULL)) {
> > > >          pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
> > > >          return false;
> > > >      }
> > > > @@ -152,19 +187,15 @@ void freq_inv_set_max_ratio(int cpu, u64
> > > > max_rate)
> > > > 
> > > >  static void amu_scale_freq_tick(void)
> > > >  {
> > > > -    u64 prev_core_cnt, prev_const_cnt;
> > > > -    u64 core_cnt, const_cnt, scale;
> > > > -
> > > > -    prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
> > > > -    prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
> > > > +    struct amu_counters *cpu_sample = this_cpu_ptr(&cpu_samples);
> > > > +    u64 delta_core_cnt, delta_const_cnt, scale;
> > > > 
> > > >      update_freq_counters_refs();
> > > > 
> > > > -    const_cnt = this_cpu_read(arch_const_cycles_prev);
> > > > -    core_cnt = this_cpu_read(arch_core_cycles_prev);
> > > > +    delta_const_cnt = cpu_sample->delta_const_cnt;
> > > > +    delta_core_cnt = cpu_sample->delta_core_cnt;
> > > > 
> > > > -    if (unlikely(core_cnt <= prev_core_cnt ||
> > > > -             const_cnt <= prev_const_cnt))
> > > > +    if ((delta_const_cnt == 0) || (delta_core_cnt == 0))
> > > >          return;
> > > > 
> > > >      /*
> > > > @@ -175,15 +206,62 @@ static void amu_scale_freq_tick(void)
> > > >       * See validate_cpu_freq_invariance_counters() for details on
> > > >       * arch_max_freq_scale and the use of SCHED_CAPACITY_SHIFT.
> > > >       */
> > > > -    scale = core_cnt - prev_core_cnt;
> > > > +    scale = delta_core_cnt;
> > > >      scale *= this_cpu_read(arch_max_freq_scale);
> > > >      scale = div64_u64(scale >> SCHED_CAPACITY_SHIFT,
> > > > -              const_cnt - prev_const_cnt);
> > > > +              delta_const_cnt);
> > > > 
> > > >      scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
> > > >      this_cpu_write(arch_freq_scale, (unsigned long)scale);
> > > >  }
> > > > 
> > > > +/*
> > > > + * Discard samples older than the define maximum sample age of
> > > > 20ms. There
> > > > + * is no point in sending IPIs in such a case. If the scheduler
> > > > tick was
> > > > + * not running then the CPU is either idle or isolated.
> > > > + */
> > > > +#define MAX_SAMPLE_AGE    ((unsigned long)HZ / 50)
> > > This depends on the config, so for HZ_1000 it will indeed give 20ms,
> > > for CONFIG_250 that will be 5ms. It might be better to set it to
> > > number of
> > > expected missed ticks instead ? Or amend the comment.
> > 
> > I think using jiffies/missed ticks is probably better. I'll update for
> > next version.
> > 
> > > > +
> > > > +unsigned int arch_freq_get_on_cpu(int cpu)
> > > > +{
> > > > +    struct amu_counters *cpu_sample = per_cpu_ptr(&cpu_samples, cpu);
> > > > +    u64 delta_const_cnt, delta_core_cnt;
> > > > +    unsigned int seq, freq;
> > > > +    unsigned long last;
> > > > +
> > > > +    if (!freq_counters_valid(cpu))
> > > > +        goto fallback;
> > > > +
> > > > +    do {
> > > > +        seq = raw_read_seqcount_begin(&cpu_sample->seq);
> > > > +        last = cpu_sample->last_update;
> > > > +        delta_core_cnt = cpu_sample->delta_core_cnt;
> > > > +        delta_const_cnt = cpu_sample->delta_const_cnt;
> > > > +    } while (read_seqcount_retry(&cpu_sample->seq, seq));
> > > > +
> > > This seems to be taken from APERF/MPERF relevant code. Including the
> > > comments.
> > 
> > Yes. The idea for this patch series is based on APERF/MPERF which are
> > quite similar to AMU counters.
> > 
> > > > +    /*
> > > > +     * Bail on invalid count and when the last update was too
> > > > long ago,
> > > > +     * which covers idle and NOHZ full CPUs.
> > > > +     */
> > > > +    if (!delta_const_cnt || ((jiffies - last) > MAX_SAMPLE_AGE)) {
> > > Shouldn't the first condition (non-zero increase of cnt_cycles counter)
> > > disqualify the sample taken altogether ?
> > 
> > I was updating delta_*_cnt values to 0 in one case above. If we just
> > drop that sample and don't set delta_*_cnt values to 0 we wouldn't need
> > this check. I will remove that in the next version.
> > 
> > > > +        if (!(housekeeping_cpu(cpu, HK_TYPE_TICK) && idle_cpu(cpu)))
> > > > +            goto fallback;
> > > Not entirely convinced that this condition is what is expected ?
> > > For housekeeping cpu that is not idle it will still resolve to AMU
> > > counters,
> > > not sure if that's what was intended ?
> > 
> > For a CPU that is not idle my preference is that it uses AMU counters
> > for frequency measurement. For idle and isolcpus we fall back to the CPPC
> > mechanism - which could be through MMIO or PCC.
> > 
> > > Also, for cases when given cpufreq policy spans more than a single
> > > core, the
> > > frequency might be queried based on relevant CPU that might have
> > > seen the tick
> > > within specified timeframe (see [1])
> > > 
> > 
> > This would not be ideal. On Ampere systems I've not come across a
> > cpufreq policy that spans multiple cores, so I overlooked that
> > configuration.
> > 
> > > > +    }
> > > > +
> > > > +    /*
> > > > +     * CPU frequency = reference perf (in Hz) * (/\ delivered)
> > > > / (/\ reference)
> > > > +     * AMU reference performance counter increment rate is
> > > > equal to the rate
> > > > +     * of increment of the System counter, CNTPCT_EL0 and can
> > > > be used to
> > > > +     * compute the CPU frequency.
> > > > +     */
> > > > +    return div64_u64((delta_core_cnt * (arch_timer_get_rate() / HZ)),
> > > /HZ/HZ_PER_KHZ ?
> > > > +             delta_const_cnt);
> > > > +
> > > > +fallback:
> > > > +    freq = cpufreq_quick_get(cpu);
> > > > +    return freq ? freq : cpufreq_get_hw_max_freq(cpu);
> > > If the arch specific code cannot determine the frequency it should
> > > actually make
> > > it clear by returning '0' instead of trying to patch things up by
> > > itself (?)
> > > 
> > 
> > This was following the same logic as the APERF/MPERF code. Returning 0 from
> > here would result in a call to cpufreq_driver->get() vs a call to
> > cpufreq_quick_get(). The only difference I can tell is that
> > cpufreq_quick_get() will call read_lock_irqsave() before calling
> > cpufreq_driver->get(). I don't have enough knowledge to point out
> > which one is more appropriate. Following the same logic as the x86
> > implementation seemed more prudent.
> > 
> > > Overall I'd prefer to revive [1] and amend it accordingly instead.
> > > 
> > 
> > As mentioned earlier, I'm ok with whatever leads to a coherent solution
> > for all the configurations we have - isolcpus, idle and active
> > housekeeping CPUs.
> > 
> > This issue impacts existing Ampere products. It would be great if we
> > could arrive at a solution soon.
> +1 to having a solution soon; looking forward to it. @Vanshi and @Beata.

Understood.
Will prepare new version - should be out by Monday.
Apologies for delays on my side.

---
BR
Beata
> > 
> > > ---
> > > [1] https://lore.kernel.org/all/20231127160838.1403404-1-beata.michalska@arm.com/
> > [2] - https://lore.kernel.org/all/7eozim2xnepacnnkzxlbx34hib4otycnbn4dqymfziqou5lw5u@5xzpv3t7sxo3/
> > 
> > Thanks for the discussion.
> > 
> > Vanshi
> > 
> > > ---
> > > Best Regards
> > > Beata
> > > > +}
> > > > +
> > > >  static struct scale_freq_data amu_sfd = {
> > > >      .source = SCALE_FREQ_SOURCE_ARCH,
> > > >      .set_freq_scale = amu_scale_freq_tick,
> > > > -- 
> > > > 2.43.1
> > > > 
> > .

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Re: [PATCH v1 2/3] arm64: idle: Cache AMU counters before entering idle
  2024-03-07  3:17   ` lihuisong (C)
@ 2024-03-11 18:27     ` Vanshidhar Konda
  2024-03-12  8:44       ` Beata Michalska
  0 siblings, 1 reply; 15+ messages in thread
From: Vanshidhar Konda @ 2024-03-11 18:27 UTC (permalink / raw)
  To: lihuisong (C)
  Cc: Beata Michalska, Ionela Voinescu, linux-kernel, linux-pm,
	linux-arm-kernel, rafael, sumitg, zengheng4, yang, will,
	sudeep.holla, liuyonglong, zhanjie9, linux-acpi

On Thu, Mar 07, 2024 at 11:17:26AM +0800, lihuisong (C) wrote:
>
>On 2024/3/1 0:25, Vanshidhar Konda wrote:
>>AMU counters do not increment while a CPU is in idle. Saving the value
>>of the core and constant counters prior to invoking WFI allows FIE to
>>compute the frequency of a CPU that is idle.
>>
>>Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
>>---
>>  arch/arm64/kernel/idle.c     | 10 ++++++++++
>>  arch/arm64/kernel/topology.c | 14 ++++++++------
>>  2 files changed, 18 insertions(+), 6 deletions(-)
>>
>>diff --git a/arch/arm64/kernel/idle.c b/arch/arm64/kernel/idle.c
>>index 05cfb347ec26..5ed2e57188a8 100644
>>--- a/arch/arm64/kernel/idle.c
>>+++ b/arch/arm64/kernel/idle.c
>>@@ -26,6 +26,16 @@ void __cpuidle cpu_do_idle(void)
>>  	arm_cpuidle_save_irq_context(&context);
>>+#ifdef CONFIG_ARM64_AMU_EXTN
>>+	/* Update the AMU counters before entering WFI. The cached AMU counter
>>+	 * value is used to determine CPU frequency while the CPU is idle
>>+	 * without needing to wake up the CPU.
>>+	 */
>>+
>>+	if (cpu_has_amu_feat(smp_processor_id()))
>>+		update_freq_counters_refs();
>>+#endif
>I already mentioned the point below in [1].
>This is just for the WFI state.
>What about other deeper idle states, like retention and power down?
>The path to enter idle state is different for them. We should do this 
>for all idle states.
>

Yes. That makes sense. I'll account for them in the next version of the
patch, which I'll base on the updated patch from @Beata.
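
As a rough sketch of one way to cover the deeper states (my assumption,
not something settled in this thread): a CPU PM notifier fires on
CPU_PM_ENTER before the context-losing idle states, so it could
complement the plain-WFI hook above:

#include <linux/cpu_pm.h>

static int amu_idle_pm_notify(struct notifier_block *nb,
			      unsigned long action, void *unused)
{
	/* Cache the counters before any idle state that may lose context. */
	if (action == CPU_PM_ENTER && cpu_has_amu_feat(smp_processor_id()))
		update_freq_counters_refs();

	return NOTIFY_OK;
}

static struct notifier_block amu_idle_pm_nb = {
	.notifier_call = amu_idle_pm_notify,
};

/* Registered once at init time, e.g. cpu_pm_register_notifier(&amu_idle_pm_nb); */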

Thanks,
Vanshi

>>+
>>  	dsb(sy);
>>  	wfi();
>>diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>>index db8d14525cf4..8905eb0c681f 100644
>>--- a/arch/arm64/kernel/topology.c
>>+++ b/arch/arm64/kernel/topology.c
>>@@ -240,13 +240,15 @@ unsigned int arch_freq_get_on_cpu(int cpu)
>>  	} while (read_seqcount_retry(&cpu_sample->seq, seq));
>>  	/*
>>-	 * Bail on invalid count and when the last update was too long ago,
>>-	 * which covers idle and NOHZ full CPUs.
>>+	 * Bail on invalid count and when the last update was too long ago.
>>+	 * This covers idle, NOHZ full and isolated CPUs.
>>+	 *
>>+	 * Idle CPUs don't need to be measured because AMU counters stop
>>+	 * incrementing during WFI/WFE.
>>  	 */
>>-	if (!delta_const_cnt || ((jiffies - last) > MAX_SAMPLE_AGE)) {
>>-		if (!(housekeeping_cpu(cpu, HK_TYPE_TICK) && idle_cpu(cpu)))
>>-			goto fallback;
>>-	}
>>+	if (!delta_const_cnt ||
>>+	    ((jiffies - last) > MAX_SAMPLE_AGE && !idle_cpu(cpu)))
>>+		goto fallback;
>>  	/*
>>  	 * CPU frequency = reference perf (in Hz) * (/\ delivered) / (/\ reference)
>[1] https://lore.kernel.org/linux-arm-kernel/20231212072617.14756-1-lihuisong@huawei.com/
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Re: [PATCH v1 2/3] arm64: idle: Cache AMU counters before entering idle
  2024-03-11 18:27     ` Vanshidhar Konda
@ 2024-03-12  8:44       ` Beata Michalska
  0 siblings, 0 replies; 15+ messages in thread
From: Beata Michalska @ 2024-03-12  8:44 UTC (permalink / raw)
  To: Vanshidhar Konda
  Cc: lihuisong (C),
	Ionela Voinescu, linux-kernel, linux-pm, linux-arm-kernel,
	rafael, sumitg, zengheng4, yang, will, sudeep.holla, liuyonglong,
	zhanjie9, linux-acpi

On Mon, Mar 11, 2024 at 11:27:27AM -0700, Vanshidhar Konda wrote:
> On Thu, Mar 07, 2024 at 11:17:26AM +0800, lihuisong (C) wrote:
> > 
> > On 2024/3/1 0:25, Vanshidhar Konda wrote:
> > > AMU counters do not increment while a CPU is in idle. Saving the value
> > > of the core and constant counters prior to invoking WFI allows FIE to
> > > compute the frequency of a CPU that is idle.
> > > 
> > > Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
> > > ---
> > >  arch/arm64/kernel/idle.c     | 10 ++++++++++
> > >  arch/arm64/kernel/topology.c | 14 ++++++++------
> > >  2 files changed, 18 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kernel/idle.c b/arch/arm64/kernel/idle.c
> > > index 05cfb347ec26..5ed2e57188a8 100644
> > > --- a/arch/arm64/kernel/idle.c
> > > +++ b/arch/arm64/kernel/idle.c
> > > @@ -26,6 +26,16 @@ void __cpuidle cpu_do_idle(void)
> > >  	arm_cpuidle_save_irq_context(&context);
> > > +#ifdef CONFIG_ARM64_AMU_EXTN
> > > +	/* Update the AMU counters before entering WFI. The cached AMU counter
> > > +	 * value is used to determine CPU frequency while the CPU is idle
> > > +	 * without needing to wake up the CPU.
> > > +	 */
> > > +
> > > +	if (cpu_has_amu_feat(smp_processor_id()))
> > > +		update_freq_counters_refs();
> > > +#endif
> > I already mentioned the point below in [1].
> > This is just for the WFI state.
> > What about other deeper idle states, like retention and power down?
> > The path to enter idle state is different for them. We should do this
> > for all idle states.
> > 
> 
> Yes. That makes sense. I'll account for them in the next version of the
> patch, which I'll base on the updated patch from @Beata.
> 
This should now be covered by [1]

---
[1]https://lore.kernel.org/all/20240312083431.3239989-4-beata.michalska@arm.com/

---
BR
Beata

> Thanks,
> Vanshi
> 
> > > +
> > >  	dsb(sy);
> > >  	wfi();
> > > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> > > index db8d14525cf4..8905eb0c681f 100644
> > > --- a/arch/arm64/kernel/topology.c
> > > +++ b/arch/arm64/kernel/topology.c
> > > @@ -240,13 +240,15 @@ unsigned int arch_freq_get_on_cpu(int cpu)
> > >  	} while (read_seqcount_retry(&cpu_sample->seq, seq));
> > >  	/*
> > > -	 * Bail on invalid count and when the last update was too long ago,
> > > -	 * which covers idle and NOHZ full CPUs.
> > > +	 * Bail on invalid count and when the last update was too long ago.
> > > +	 * This covers idle, NOHZ full and isolated CPUs.
> > > +	 *
> > > +	 * Idle CPUs don't need to be measured because AMU counters stop
> > > +	 * incrementing during WFI/WFE.
> > >  	 */
> > > -	if (!delta_const_cnt || ((jiffies - last) > MAX_SAMPLE_AGE)) {
> > > -		if (!(housekeeping_cpu(cpu, HK_TYPE_TICK) && idle_cpu(cpu)))
> > > -			goto fallback;
> > > -	}
> > > +	if (!delta_const_cnt ||
> > > +	    ((jiffies - last) > MAX_SAMPLE_AGE && !idle_cpu(cpu)))
> > > +		goto fallback;
> > >  	/*
> > >  	 * CPU frequency = reference perf (in Hz) * (/\ delivered) / (/\ reference)
> > [1] https://lore.kernel.org/linux-arm-kernel/20231212072617.14756-1-lihuisong@huawei.com/
> > 

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-03-12  8:45 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-29 16:25 [PATCH v1 0/3] arm64: Use AMU counters for measuring CPU frequency Vanshidhar Konda
2024-02-29 16:25 ` [PATCH v1 1/3] arm64: topology: Add arch_freq_get_on_cpu() support Vanshidhar Konda
2024-03-06  8:24   ` Beata Michalska
2024-03-06 22:04     ` Vanshidhar Konda
2024-03-07  3:20       ` lihuisong (C)
2024-03-07  7:21         ` Beata Michalska
2024-03-07  3:02   ` lihuisong (C)
2024-02-29 16:25 ` [PATCH v1 2/3] arm64: idle: Cache AMU counters before entering idle Vanshidhar Konda
2024-03-07  3:17   ` lihuisong (C)
2024-03-11 18:27     ` Vanshidhar Konda
2024-03-12  8:44       ` Beata Michalska
2024-02-29 16:25 ` [PATCH v1 3/3] ACPI: CPPC: Read CPC FFH counters in a single IPI Vanshidhar Konda
2024-02-29 17:32   ` Rafael J. Wysocki
2024-02-29 18:00     ` Vanshidhar Konda
2024-02-29 18:02       ` Rafael J. Wysocki
