linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] CPUFreq statistics retrieved by drivers
@ 2020-07-29 15:12 Lukasz Luba
  2020-07-29 15:12 ` [PATCH 1/4] cpufreq: Add support for statistics read from drivers Lukasz Luba
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Lukasz Luba @ 2020-07-29 15:12 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-pm
  Cc: sudeep.holla, cristian.marussi, viresh.kumar, lukasz.luba, rjw

Hi all,

The existing CPUFreq framework does not tracks the statistics when the
'fast switch' is used or when firmware changes the frequency independently
due to e.g. thermal reasons. However, the firmware might track the frequency
changes and expose this to the kernel.

This patch set aims to introduce CPUfreq statistics gathered by firmware
and retrieved by CPUFreq driver. It would require a new API functions
in the CPUFreq, which allows to poke drivers to get these stats.

The needed CPUFreq infrastructure is in patch 1/4, patch 2/4 extends
ARM SCMI protocol layer, patches 3/4, 4/4  modify ARM SCMI CPUFreq driver.

Regards,
Lukasz Luba

Lukasz Luba (4):
  cpufreq: Add support for statistics read from drivers
  scmi: perf: Extend protocol to support performance statistics
  cpufreq: scmi: Move scmi_cpufreq_driver structure to the top
  cpufreq: scmi: Read statistics from FW shared memory

 drivers/cpufreq/cpufreq.c        |  22 ++++
 drivers/cpufreq/cpufreq_stats.c  |  38 +++---
 drivers/cpufreq/scmi-cpufreq.c   | 116 ++++++++++++++---
 drivers/firmware/arm_scmi/perf.c | 210 +++++++++++++++++++++++++++++++
 include/linux/cpufreq.h          |  32 +++++
 include/linux/scmi_protocol.h    |  11 ++
 6 files changed, 401 insertions(+), 28 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/4] cpufreq: Add support for statistics read from drivers
  2020-07-29 15:12 [PATCH 0/4] CPUFreq statistics retrieved by drivers Lukasz Luba
@ 2020-07-29 15:12 ` Lukasz Luba
  2020-07-29 15:12 ` [PATCH 2/4] scmi: perf: Extend protocol to support performance statistics Lukasz Luba
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Lukasz Luba @ 2020-07-29 15:12 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-pm
  Cc: sudeep.holla, cristian.marussi, viresh.kumar, lukasz.luba, rjw

The cpufreq statistics are not collected when the 'fast switch' is in use.
The information can be retrieved from firmware by the cpufreq driver, but
the cpufreq interfaces must be extended in order to support it.

Introduce callback in cpufreq driver structure and additional flag in
cpufreq policy to provide support for statistics maintained in firmware.

Since the cpufreq driver structure can be shared by many policy objects,
the flag 'has_driver_stats' in policy makes sure that the framework will
only use the statistics from cpufreq driver for actually supported CPUs.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/cpufreq/cpufreq.c       | 22 +++++++++++++++++++
 drivers/cpufreq/cpufreq_stats.c | 38 +++++++++++++++++++++------------
 include/linux/cpufreq.h         | 32 +++++++++++++++++++++++++++
 3 files changed, 78 insertions(+), 14 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 17c1c3becd92..83098205e739 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2543,6 +2543,28 @@ void cpufreq_update_limits(unsigned int cpu)
 }
 EXPORT_SYMBOL_GPL(cpufreq_update_limits);
 
+#ifdef CONFIG_CPU_FREQ_STAT
+/**
+ * cpufreq_update_statistics - Update statistics for a given policy.
+ * @policy: struct cpufreq_policy into which the current cpufreq_policy
+ *	is written
+ *
+ * Invoke the driver's ->get_statistics callback if present to fetch
+ * newest statistics from the driver.
+ */
+int cpufreq_update_statistics(struct cpufreq_policy *policy)
+{
+	if (!cpufreq_driver || !policy)
+		return -EINVAL;
+
+	if (cpufreq_driver->get_statistics)
+		return cpufreq_driver->get_statistics(policy);
+	else
+		return -EINVAL;
+}
+EXPORT_SYMBOL_GPL(cpufreq_update_statistics);
+#endif
+
 /*********************************************************************
  *               BOOST						     *
  *********************************************************************/
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index 94d959a8e954..d2d975b3cc6d 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -11,19 +11,6 @@
 #include <linux/module.h>
 #include <linux/slab.h>
 
-
-struct cpufreq_stats {
-	unsigned int total_trans;
-	unsigned long long last_time;
-	unsigned int max_state;
-	unsigned int state_num;
-	unsigned int last_index;
-	u64 *time_in_state;
-	spinlock_t lock;
-	unsigned int *freq_table;
-	unsigned int *trans_table;
-};
-
 static void cpufreq_stats_update(struct cpufreq_stats *stats)
 {
 	unsigned long long cur_time = get_jiffies_64();
@@ -50,13 +37,36 @@ static ssize_t show_total_trans(struct cpufreq_policy *policy, char *buf)
 }
 cpufreq_freq_attr_ro(total_trans);
 
+static ssize_t cpufreq_stats_present_driver_data(struct cpufreq_policy *policy, char *buf)
+{
+	struct cpufreq_stats *stats = policy->stats;
+	ssize_t len = 0;
+	int i, ret;
+
+	spin_lock(&stats->lock);
+	ret = cpufreq_update_statistics(policy);
+	spin_unlock(&stats->lock);
+
+	if (ret)
+		return 0;
+
+	for (i = 0; i < stats->state_num; i++) {
+		len += sprintf(buf + len, "%u %llu\n",
+				stats->freq_table[i],
+				stats->time_in_state[i]);
+	}
+	return len;
+}
+
 static ssize_t show_time_in_state(struct cpufreq_policy *policy, char *buf)
 {
 	struct cpufreq_stats *stats = policy->stats;
 	ssize_t len = 0;
 	int i;
 
-	if (policy->fast_switch_enabled)
+	if (policy->has_driver_stats)
+		return cpufreq_stats_present_driver_data(policy, buf);
+	else if (policy->fast_switch_enabled)
 		return 0;
 
 	spin_lock(&stats->lock);
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index e62b022cb07e..21074703e08c 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -138,6 +138,14 @@ struct cpufreq_policy {
 	/* cpufreq-stats */
 	struct cpufreq_stats	*stats;
 
+	/*
+	 * Flag indicating that cpufreq statistics can be taken from device driver.
+	 * The statistics are collected by firmware and the driver is able to retrieve
+	 * them. It is useful when the 'fast switch' is used or the firmware changes
+	 * frequency independently due to e.g. thermal issues.
+	 */
+	bool			has_driver_stats;
+
 	/* For cpufreq driver's internal use */
 	void			*driver_data;
 
@@ -169,6 +177,18 @@ struct cpufreq_freqs {
 	u8 flags;		/* flags of cpufreq_driver, see below. */
 };
 
+struct cpufreq_stats {
+	unsigned int total_trans;
+	unsigned long long last_time;
+	unsigned int max_state;
+	unsigned int state_num;
+	unsigned int last_index;
+	u64 *time_in_state;
+	spinlock_t lock;
+	unsigned int *freq_table;
+	unsigned int *trans_table;
+};
+
 /* Only for ACPI */
 #define CPUFREQ_SHARED_TYPE_NONE (0) /* None */
 #define CPUFREQ_SHARED_TYPE_HW	 (1) /* HW does needed coordination */
@@ -245,11 +265,17 @@ void cpufreq_stats_create_table(struct cpufreq_policy *policy);
 void cpufreq_stats_free_table(struct cpufreq_policy *policy);
 void cpufreq_stats_record_transition(struct cpufreq_policy *policy,
 				     unsigned int new_freq);
+int cpufreq_update_statistics(struct cpufreq_policy *policy);
 #else
 static inline void cpufreq_stats_create_table(struct cpufreq_policy *policy) { }
 static inline void cpufreq_stats_free_table(struct cpufreq_policy *policy) { }
 static inline void cpufreq_stats_record_transition(struct cpufreq_policy *policy,
 						   unsigned int new_freq) { }
+static int __maybe_unused
+cpufreq_update_statistics(struct cpufreq_policy *policy)
+{
+	return -EINVAL;
+}
 #endif /* CONFIG_CPU_FREQ_STAT */
 
 /*********************************************************************
@@ -350,6 +376,12 @@ struct cpufreq_driver {
 	/* Called to update policy limits on firmware notifications. */
 	void		(*update_limits)(unsigned int cpu);
 
+	/*
+	 * Optional, used when driver can fetch statistics from firmware.
+	 * This callback function cannot sleep.
+	 */
+	int		(*get_statistics)(struct cpufreq_policy *policy);
+
 	/* optional */
 	int		(*bios_limit)(int cpu, unsigned int *limit);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/4] scmi: perf: Extend protocol to support performance statistics
  2020-07-29 15:12 [PATCH 0/4] CPUFreq statistics retrieved by drivers Lukasz Luba
  2020-07-29 15:12 ` [PATCH 1/4] cpufreq: Add support for statistics read from drivers Lukasz Luba
@ 2020-07-29 15:12 ` Lukasz Luba
  2020-07-31  1:50   ` kernel test robot
  2020-07-31 15:15   ` Cristian Marussi
  2020-07-29 15:12 ` [PATCH 3/4] cpufreq: scmi: Move scmi_cpufreq_driver structure to the top Lukasz Luba
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 25+ messages in thread
From: Lukasz Luba @ 2020-07-29 15:12 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-pm
  Cc: sudeep.holla, cristian.marussi, viresh.kumar, lukasz.luba, rjw

The firmware is able to maintain performance statistics and share with OS
via shared memory. The memory region can be interpreted by the SCMI perf
protocol after receiving and mapping the proper addresses from FW.
This patch aims to provide needed infrastructure and setup necessary
mechanisms in the protocol layer.

It also extends API functions for the upper layer (cpufreq, devfreq)
with a new callback, which allows to retrieve the statistics for a
particular performance domain. The new structure scmi_perf_domain_stats
added in the header works as a glue for these two layers.

The data processing code for the shared memory is aligned with SCMI v2
specification (DEN0056B) and handles the required endianness. It can
be changed in future not disturbing the upper layer.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/firmware/arm_scmi/perf.c | 210 +++++++++++++++++++++++++++++++
 include/linux/scmi_protocol.h    |  11 ++
 2 files changed, 221 insertions(+)

diff --git a/drivers/firmware/arm_scmi/perf.c b/drivers/firmware/arm_scmi/perf.c
index 3e1e87012c95..761067bb6237 100644
--- a/drivers/firmware/arm_scmi/perf.c
+++ b/drivers/firmware/arm_scmi/perf.c
@@ -19,6 +19,9 @@
 #include "common.h"
 #include "notify.h"
 
+#define PERF_DOMAIN_STATS_OFFSETS_BASE		0x10
+#define PERF_DOMAIN_COUNT_BASE			0x8
+
 enum scmi_performance_protocol_cmd {
 	PERF_DOMAIN_ATTRIBUTES = 0x3,
 	PERF_DESCRIBE_LEVELS = 0x4,
@@ -32,11 +35,27 @@ enum scmi_performance_protocol_cmd {
 };
 
 struct scmi_opp {
+	u32 id;
 	u32 perf;
 	u32 power;
 	u32 trans_latency_us;
 };
 
+struct scmi_perf_level_raw_stats {
+	__le32 perf_level_id;
+	__le32 reserved;
+	__le64 usage_count;
+	__le64 total_residency_us;
+};
+
+struct scmi_perf_domain_raw_stats {
+	__le16 perf_level_count;
+	__le16 curr_perf_level_id;
+	__le32 extended_stats_offset;
+	__le64 ts_last_change_us;
+	struct scmi_perf_level_raw_stats perf_level[];
+};
+
 struct scmi_msg_resp_perf_attributes {
 	__le16 num_domains;
 	__le16 flags;
@@ -161,13 +180,26 @@ struct perf_dom_info {
 	struct scmi_fc_info *fc_info;
 };
 
+struct scmi_perf_domain_stats_desc {
+	void __iomem *addr;
+	int *opp_map;
+	int size;
+};
+
+struct scmi_perf_stats_desc {
+	uint16_t domain_count;
+	struct scmi_perf_domain_stats_desc *domain_stats;
+};
+
 struct scmi_perf_info {
 	u32 version;
 	int num_domains;
 	bool power_scale_mw;
 	u64 stats_addr;
 	u32 stats_size;
+	void __iomem *stats_virt_addr;
 	struct perf_dom_info *dom_info;
+	struct scmi_perf_stats_desc *stats_desc;
 };
 
 static enum scmi_performance_protocol_cmd evt_2_cmd[] = {
@@ -175,6 +207,55 @@ static enum scmi_performance_protocol_cmd evt_2_cmd[] = {
 	PERF_NOTIFY_LEVEL,
 };
 
+static int scmi_perf_stats_init(const struct scmi_handle *handle,
+				struct scmi_perf_info *pi)
+{
+	struct scmi_perf_domain_stats_desc *domain_stats;
+	int i, domain_count;
+	__le32 offset;
+
+	domain_count = le16_to_cpu(ioread16(pi->stats_virt_addr +
+					    PERF_DOMAIN_COUNT_BASE));
+
+	pi->stats_desc = devm_kzalloc(handle->dev,
+				      sizeof(struct scmi_perf_stats_desc),
+				      GFP_KERNEL);
+	if (!pi->stats_desc)
+		return -ENOMEM;
+
+	pi->stats_desc->domain_stats = devm_kzalloc(handle->dev, domain_count *
+				sizeof(struct scmi_perf_domain_stats_desc),
+				GFP_KERNEL);
+	if (!pi->stats_desc->domain_stats)
+		return -ENOMEM;
+
+	pi->stats_desc->domain_count = domain_count;
+
+	for (i = 0; i < domain_count; i++) {
+		int stats_size;
+		__le16 opp_count;
+
+		offset = ioread32(pi->stats_virt_addr +
+				PERF_DOMAIN_STATS_OFFSETS_BASE + i * 4);
+		if (!offset)
+			continue;
+
+		domain_stats = &pi->stats_desc->domain_stats[i];
+
+		domain_stats->addr = pi->stats_virt_addr + le32_to_cpu(offset);
+
+		/* The first field is the performance level count. */
+		opp_count = le16_to_cpu(ioread16(domain_stats->addr));
+		stats_size = sizeof(struct scmi_perf_domain_raw_stats);
+		stats_size += sizeof(struct scmi_perf_level_raw_stats) *
+				opp_count;
+
+		domain_stats->size = stats_size;
+	}
+
+	return 0;
+}
+
 static int scmi_perf_attributes_get(const struct scmi_handle *handle,
 				    struct scmi_perf_info *pi)
 {
@@ -198,6 +279,14 @@ static int scmi_perf_attributes_get(const struct scmi_handle *handle,
 		pi->stats_addr = le32_to_cpu(attr->stats_addr_low) |
 				(u64)le32_to_cpu(attr->stats_addr_high) << 32;
 		pi->stats_size = le32_to_cpu(attr->stats_size);
+		if (pi->stats_addr && pi->stats_size) {
+			pi->stats_virt_addr = devm_ioremap(handle->dev,
+					pi->stats_addr, pi->stats_size);
+			if (pi->stats_virt_addr)
+				ret = scmi_perf_stats_init(handle, pi);
+			else
+				ret = -ENOMEM;
+		}
 	}
 
 	scmi_xfer_put(handle, t);
@@ -298,6 +387,7 @@ scmi_perf_describe_levels_get(const struct scmi_handle *handle, u32 domain,
 			opp->power = le32_to_cpu(level_info->opp[cnt].power);
 			opp->trans_latency_us = le16_to_cpu
 				(level_info->opp[cnt].transition_latency_us);
+			opp->id = tot_opp_cnt + cnt;
 
 			dev_dbg(handle->dev, "Level %d Power %d Latency %dus\n",
 				opp->perf, opp->power, opp->trans_latency_us);
@@ -748,6 +838,125 @@ static bool scmi_fast_switch_possible(const struct scmi_handle *handle,
 	return dom->fc_info && dom->fc_info->level_set_addr;
 }
 
+static int scmi_dvfs_setup_opps_mapping(const struct scmi_handle *handle,
+					u32 domain_id)
+{
+	struct scmi_perf_domain_stats_desc *domain_stats;
+	struct scmi_perf_info *pi = handle->perf_priv;
+	struct perf_dom_info *dom;
+	struct scmi_opp *opp;
+	int idx, *mapping;
+
+	dom = pi->dom_info + domain_id;
+	if (!dom)
+		return -EIO;
+
+	mapping = devm_kzalloc(handle->dev, sizeof(int) * dom->opp_count,
+			       GFP_KERNEL);
+	if (!mapping)
+		return -ENOMEM;
+
+	/* Construct LUT with FW OPP ids as an index */
+	for (opp = dom->opp, idx = 0; idx < dom->opp_count; idx++, opp++)
+		mapping[opp->id] = idx;
+
+	domain_stats = &pi->stats_desc->domain_stats[domain_id];
+	domain_stats->opp_map = mapping;
+
+	return 0;
+}
+
+static int
+scmi_dvfs_stats_get(const struct scmi_handle *handle, u32 domain_id,
+		    struct scmi_perf_domain_stats *stats)
+{
+	struct scmi_perf_domain_stats_desc *domain_stats;
+	struct scmi_perf_domain_raw_stats *raw_stats[2];
+	struct scmi_perf_info *pi = handle->perf_priv;
+	struct scmi_perf_level_raw_stats *perf;
+	int i, index, ret = -EINVAL;
+	struct perf_dom_info *dom;
+	u64 transition_count = 0;
+	struct scmi_opp *opp;
+
+	if (!stats)
+		return -EINVAL;
+
+	if (!pi->stats_virt_addr || !pi->stats_desc ||
+		!pi->stats_desc->domain_stats)
+		return -ENOENT;
+
+	if (pi->stats_desc->domain_count <= domain_id ||
+		!pi->stats_desc->domain_stats[domain_id].addr)
+		return -ENOENT;
+
+	dom = pi->dom_info + domain_id;
+	if (!dom)
+		return -EIO;
+
+	domain_stats = &pi->stats_desc->domain_stats[domain_id];
+
+	if (!domain_stats->opp_map) {
+		ret = scmi_dvfs_setup_opps_mapping(handle, domain_id);
+		if (ret)
+			return ret;
+	}
+
+	raw_stats[0] = vmalloc(domain_stats->size);
+	if (!raw_stats[0])
+		return -ENOMEM;
+
+	raw_stats[1] = vmalloc(domain_stats->size);
+	if (!raw_stats[1]) {
+		vfree(raw_stats[0]);
+		return -ENOMEM;
+	}
+
+	/*
+	 * Let's try 10 times. If two consecutive reads are the same - done.
+	 * This approach is aligned with SCMI v2 specification.
+	 */
+	for (i = 0; i < 10; i++) {
+		memcpy_fromio(raw_stats[0], domain_stats->addr,
+			      domain_stats->size);
+		memcpy_fromio(raw_stats[1], domain_stats->addr,
+			      domain_stats->size);
+		if (!memcmp(raw_stats[0], raw_stats[1], domain_stats->size)) {
+			ret = 0;
+			break;
+		}
+	}
+
+	if (ret)
+		goto free_buf;
+
+	for (i = 0; i < dom->opp_count; i++) {
+		perf = &raw_stats[0]->perf_level[i];
+
+		transition_count += __le64_to_cpu(perf->usage_count);
+		stats->time_in_state[i] =
+				__le64_to_cpu(perf->total_residency_us);
+
+		/* Speed-up and initialize the frequencies only once. */
+		if (stats->freq_table[i] == 0) {
+			index = domain_stats->opp_map[i];
+			opp = &dom->opp[index];
+			stats->freq_table[i] = opp->perf * dom->mult_factor;
+		}
+	}
+
+	stats->total_trans = transition_count;
+
+	stats->last_index = __le16_to_cpu(raw_stats[0]->curr_perf_level_id);
+	stats->last_time = __le64_to_cpu(raw_stats[0]->ts_last_change_us);
+
+free_buf:
+	vfree(raw_stats[1]);
+	vfree(raw_stats[0]);
+
+	return ret;
+}
+
 static struct scmi_perf_ops perf_ops = {
 	.limits_set = scmi_perf_limits_set,
 	.limits_get = scmi_perf_limits_get,
@@ -760,6 +969,7 @@ static struct scmi_perf_ops perf_ops = {
 	.freq_get = scmi_dvfs_freq_get,
 	.est_power_get = scmi_dvfs_est_power_get,
 	.fast_switch_possible = scmi_fast_switch_possible,
+	.statistics_get = scmi_dvfs_stats_get,
 };
 
 static int scmi_perf_set_notify_enabled(const struct scmi_handle *handle,
diff --git a/include/linux/scmi_protocol.h b/include/linux/scmi_protocol.h
index 7e5dd7d1e221..3316ff4f9d34 100644
--- a/include/linux/scmi_protocol.h
+++ b/include/linux/scmi_protocol.h
@@ -55,6 +55,15 @@ struct scmi_clock_info {
 	};
 };
 
+struct scmi_perf_domain_stats {
+	unsigned long long last_time;
+	unsigned long long last_index;
+	unsigned int total_trans;
+	unsigned int state_num;
+	u64 *time_in_state;
+	unsigned int *freq_table;
+};
+
 struct scmi_handle;
 
 /**
@@ -121,6 +130,8 @@ struct scmi_perf_ops {
 			     unsigned long *rate, unsigned long *power);
 	bool (*fast_switch_possible)(const struct scmi_handle *handle,
 				     struct device *dev);
+	int (*statistics_get)(const struct scmi_handle *handle, u32 domain_id,
+			      struct scmi_perf_domain_stats *stats);
 };
 
 /**
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/4] cpufreq: scmi: Move scmi_cpufreq_driver structure to the top
  2020-07-29 15:12 [PATCH 0/4] CPUFreq statistics retrieved by drivers Lukasz Luba
  2020-07-29 15:12 ` [PATCH 1/4] cpufreq: Add support for statistics read from drivers Lukasz Luba
  2020-07-29 15:12 ` [PATCH 2/4] scmi: perf: Extend protocol to support performance statistics Lukasz Luba
@ 2020-07-29 15:12 ` Lukasz Luba
  2020-07-29 15:12 ` [PATCH 4/4] cpufreq: scmi: Read statistics from FW shared memory Lukasz Luba
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Lukasz Luba @ 2020-07-29 15:12 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-pm
  Cc: sudeep.holla, cristian.marussi, viresh.kumar, lukasz.luba, rjw

Move the scmi_cpufreq_driver to top of the file in order to prepare for
upcoming changes, which will extend it based on discoverable
functionality.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/cpufreq/scmi-cpufreq.c | 36 +++++++++++++++++++++-------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
index fb42e3390377..fe95350eb844 100644
--- a/drivers/cpufreq/scmi-cpufreq.c
+++ b/drivers/cpufreq/scmi-cpufreq.c
@@ -24,6 +24,28 @@ struct scmi_data {
 	struct device *cpu_dev;
 };
 
+static unsigned int scmi_cpufreq_get_rate(unsigned int cpu);
+static int scmi_cpufreq_set_target(struct cpufreq_policy *policy,
+				   unsigned int index);
+static unsigned int scmi_cpufreq_fast_switch(struct cpufreq_policy *policy,
+					     unsigned int target_freq);
+static int scmi_cpufreq_init(struct cpufreq_policy *policy);
+static int scmi_cpufreq_exit(struct cpufreq_policy *policy);
+
+static struct cpufreq_driver scmi_cpufreq_driver = {
+	.name	= "scmi",
+	.flags	= CPUFREQ_STICKY | CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
+		  CPUFREQ_NEED_INITIAL_FREQ_CHECK |
+		  CPUFREQ_IS_COOLING_DEV,
+	.verify	= cpufreq_generic_frequency_table_verify,
+	.attr	= cpufreq_generic_attr,
+	.target_index	= scmi_cpufreq_set_target,
+	.fast_switch	= scmi_cpufreq_fast_switch,
+	.get	= scmi_cpufreq_get_rate,
+	.init	= scmi_cpufreq_init,
+	.exit	= scmi_cpufreq_exit,
+};
+
 static const struct scmi_handle *handle;
 
 static unsigned int scmi_cpufreq_get_rate(unsigned int cpu)
@@ -219,20 +241,6 @@ static int scmi_cpufreq_exit(struct cpufreq_policy *policy)
 	return 0;
 }
 
-static struct cpufreq_driver scmi_cpufreq_driver = {
-	.name	= "scmi",
-	.flags	= CPUFREQ_STICKY | CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
-		  CPUFREQ_NEED_INITIAL_FREQ_CHECK |
-		  CPUFREQ_IS_COOLING_DEV,
-	.verify	= cpufreq_generic_frequency_table_verify,
-	.attr	= cpufreq_generic_attr,
-	.target_index	= scmi_cpufreq_set_target,
-	.fast_switch	= scmi_cpufreq_fast_switch,
-	.get	= scmi_cpufreq_get_rate,
-	.init	= scmi_cpufreq_init,
-	.exit	= scmi_cpufreq_exit,
-};
-
 static int scmi_cpufreq_probe(struct scmi_device *sdev)
 {
 	int ret;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/4] cpufreq: scmi: Read statistics from FW shared memory
  2020-07-29 15:12 [PATCH 0/4] CPUFreq statistics retrieved by drivers Lukasz Luba
                   ` (2 preceding siblings ...)
  2020-07-29 15:12 ` [PATCH 3/4] cpufreq: scmi: Move scmi_cpufreq_driver structure to the top Lukasz Luba
@ 2020-07-29 15:12 ` Lukasz Luba
  2020-07-30  8:53 ` [PATCH 0/4] CPUFreq statistics retrieved by drivers Viresh Kumar
  2020-08-04 17:27 ` Florian Fainelli
  5 siblings, 0 replies; 25+ messages in thread
From: Lukasz Luba @ 2020-07-29 15:12 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-pm
  Cc: sudeep.holla, cristian.marussi, viresh.kumar, lukasz.luba, rjw

Add support for reading CPUFreq statistics from the firmware. The setup
code initializes needed structures per domain basis. The driver callback
scmi_cpufreq_stats_get is called by the CPUFreq framework and invokes
SCMI performance protocol in order to retrieve latest statistics.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/cpufreq/scmi-cpufreq.c | 80 ++++++++++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)

diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
index fe95350eb844..67ae63171dd0 100644
--- a/drivers/cpufreq/scmi-cpufreq.c
+++ b/drivers/cpufreq/scmi-cpufreq.c
@@ -22,6 +22,7 @@
 struct scmi_data {
 	int domain_id;
 	struct device *cpu_dev;
+	struct scmi_perf_domain_stats *stats;
 };
 
 static unsigned int scmi_cpufreq_get_rate(unsigned int cpu);
@@ -48,6 +49,34 @@ static struct cpufreq_driver scmi_cpufreq_driver = {
 
 static const struct scmi_handle *handle;
 
+static int scmi_cpufreq_stats_get(struct cpufreq_policy *policy)
+{
+	struct scmi_data *priv = policy->driver_data;
+	struct cpufreq_stats *stats = policy->stats;
+	int i, ret;
+
+	/*
+	 * Since the driver callback is global and can be set for all
+	 * policy objects, there is a need to check if that policy has
+	 * statistics in the shared memory.
+	 */
+	if (!policy->has_driver_stats)
+		return -EINVAL;
+
+	ret = handle->perf_ops->statistics_get(handle, priv->domain_id, priv->stats);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < priv->stats->state_num; i++)
+		stats->time_in_state[i] = priv->stats->time_in_state[i];
+
+	stats->total_trans = priv->stats->total_trans;
+	stats->last_index = priv->stats->last_index;
+	stats->last_time = priv->stats->last_time;
+
+	return ret;
+}
+
 static unsigned int scmi_cpufreq_get_rate(unsigned int cpu)
 {
 	struct cpufreq_policy *policy = cpufreq_cpu_get_raw(cpu);
@@ -147,6 +176,45 @@ scmi_get_cpu_power(unsigned long *power, unsigned long *KHz,
 	return 0;
 }
 
+static int scmi_cpufreq_stats_init(struct cpufreq_policy *policy,
+				   struct scmi_data *priv, int nr_opp)
+{
+	struct scmi_perf_domain_stats *stats;
+	int ret;
+
+	stats = kzalloc(sizeof(struct scmi_perf_domain_stats), GFP_KERNEL);
+	if (!stats)
+		return -ENOMEM;
+
+	stats->time_in_state = kcalloc(nr_opp, sizeof(u64), GFP_KERNEL);
+	if (!stats->time_in_state) {
+		kfree(stats);
+		return -ENOMEM;
+	}
+
+	stats->freq_table = kcalloc(nr_opp, sizeof(unsigned int), GFP_KERNEL);
+	if (!stats->freq_table) {
+		kfree(stats->time_in_state);
+		kfree(stats);
+		return -ENOMEM;
+	}
+
+	priv->stats = stats;
+	priv->stats->state_num = nr_opp;
+	policy->has_driver_stats = true;
+	scmi_cpufreq_driver.get_statistics = scmi_cpufreq_stats_get;
+
+	ret = handle->perf_ops->statistics_get(handle, priv->domain_id, stats);
+	if (ret) {
+		kfree(stats->freq_table);
+		kfree(stats->time_in_state);
+		kfree(stats);
+		policy->has_driver_stats = false;
+	}
+
+	return ret;
+}
+
 static int scmi_cpufreq_init(struct cpufreq_policy *policy)
 {
 	int ret, nr_opp;
@@ -209,6 +277,10 @@ static int scmi_cpufreq_init(struct cpufreq_policy *policy)
 	/* SCMI allows DVFS request for any domain from any CPU */
 	policy->dvfs_possible_from_any_cpu = true;
 
+	ret = scmi_cpufreq_stats_init(policy, priv, nr_opp);
+	if (ret)
+		dev_warn(cpu_dev, "failed to init statistics: %d\n", ret);
+
 	latency = handle->perf_ops->transition_latency_get(handle, cpu_dev);
 	if (!latency)
 		latency = CPUFREQ_ETERNAL;
@@ -236,6 +308,14 @@ static int scmi_cpufreq_exit(struct cpufreq_policy *policy)
 
 	dev_pm_opp_free_cpufreq_table(priv->cpu_dev, &policy->freq_table);
 	dev_pm_opp_remove_all_dynamic(priv->cpu_dev);
+
+	if (priv->stats) {
+		policy->has_driver_stats = false;
+		kfree(priv->stats->time_in_state);
+		kfree(priv->stats->freq_table);
+		kfree(priv->stats);
+	}
+
 	kfree(priv);
 
 	return 0;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] CPUFreq statistics retrieved by drivers
  2020-07-29 15:12 [PATCH 0/4] CPUFreq statistics retrieved by drivers Lukasz Luba
                   ` (3 preceding siblings ...)
  2020-07-29 15:12 ` [PATCH 4/4] cpufreq: scmi: Read statistics from FW shared memory Lukasz Luba
@ 2020-07-30  8:53 ` Viresh Kumar
  2020-07-30  9:10   ` Sudeep Holla
  2020-08-04 17:27 ` Florian Fainelli
  5 siblings, 1 reply; 25+ messages in thread
From: Viresh Kumar @ 2020-07-30  8:53 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, linux-arm-kernel, linux-pm, sudeep.holla,
	cristian.marussi, rjw

On 29-07-20, 16:12, Lukasz Luba wrote:
> The existing CPUFreq framework does not tracks the statistics when the
> 'fast switch' is used or when firmware changes the frequency independently
> due to e.g. thermal reasons. However, the firmware might track the frequency
> changes and expose this to the kernel.
> 
> This patch set aims to introduce CPUfreq statistics gathered by firmware
> and retrieved by CPUFreq driver. It would require a new API functions
> in the CPUFreq, which allows to poke drivers to get these stats.
> 
> The needed CPUFreq infrastructure is in patch 1/4, patch 2/4 extends
> ARM SCMI protocol layer, patches 3/4, 4/4  modify ARM SCMI CPUFreq driver.

Are you doing this for the fast switch case or because your platform
actually runs at frequencies which may be different from what cpufreq
core has requested ?

I am also not sure what these tables should represent, what the
cpufreq core has decided for the CPUs or the frequencies we actually
run at, as these two can be very different for example if the hardware
runs at frequencies which don't match exactly to what is there in the
freq table. I believe these are rather to show what cpufreq and its
governors are doing with the CPUs.

Over that I would like the userspace stats to work exactly as the way
they work right now, i.e. capture all transitions from one freq to
other, not just time-in-state. Also resetting of the stats from
userspace for example. All allocation and printing of the data must be
done from stats core, the only thing which the driver would do at the
end is updating the stats structure and nothing more. Instead of
reading all stats from the firmware, it will be much easier if you can
just get the information from the firmware whenever there is a
frequency switch and then we can update the stats the way it is done
right now. And that would be simple.

-- 
viresh

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] CPUFreq statistics retrieved by drivers
  2020-07-30  8:53 ` [PATCH 0/4] CPUFreq statistics retrieved by drivers Viresh Kumar
@ 2020-07-30  9:10   ` Sudeep Holla
  2020-07-30  9:36     ` Lukasz Luba
  0 siblings, 1 reply; 25+ messages in thread
From: Sudeep Holla @ 2020-07-30  9:10 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Lukasz Luba, linux-kernel, linux-arm-kernel, linux-pm,
	cristian.marussi, rjw, Sudeep Holla

On Thu, Jul 30, 2020 at 02:23:33PM +0530, Viresh Kumar wrote:
> On 29-07-20, 16:12, Lukasz Luba wrote:
> > The existing CPUFreq framework does not tracks the statistics when the
> > 'fast switch' is used or when firmware changes the frequency independently
> > due to e.g. thermal reasons. However, the firmware might track the frequency
> > changes and expose this to the kernel.
> >
> > This patch set aims to introduce CPUfreq statistics gathered by firmware
> > and retrieved by CPUFreq driver. It would require a new API functions
> > in the CPUFreq, which allows to poke drivers to get these stats.
> >
> > The needed CPUFreq infrastructure is in patch 1/4, patch 2/4 extends
> > ARM SCMI protocol layer, patches 3/4, 4/4  modify ARM SCMI CPUFreq driver.
>
> Are you doing this for the fast switch case or because your platform
> actually runs at frequencies which may be different from what cpufreq
> core has requested ?
>

I think so.

> I am also not sure what these tables should represent, what the
> cpufreq core has decided for the CPUs or the frequencies we actually
> run at, as these two can be very different for example if the hardware
> runs at frequencies which don't match exactly to what is there in the
> freq table. I believe these are rather to show what cpufreq and its
> governors are doing with the CPUs.
>

Exactly, I raised similar point in internal discussion and asked Lukasz
to take up the same on the list. I assume it was always what cpufreq
requested rather than what was delivered. So will we break the userspace
ABI if we change that is the main question.

> Over that I would like the userspace stats to work exactly as the way
> they work right now, i.e. capture all transitions from one freq to
> other, not just time-in-state. Also resetting of the stats from
> userspace for example. All allocation and printing of the data must be
> done from stats core, the only thing which the driver would do at the
> end is updating the stats structure and nothing more. Instead of
> reading all stats from the firmware, it will be much easier if you can
> just get the information from the firmware whenever there is a
> frequency switch and then we can update the stats the way it is done
> right now. And that would be simple.
>

Good point, but notifications may not be lightweight. If that is no good,
alternatively, I suggested to keep these firmware stats in a separate
debugfs. Thoughts ?

--
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] CPUFreq statistics retrieved by drivers
  2020-07-30  9:10   ` Sudeep Holla
@ 2020-07-30  9:36     ` Lukasz Luba
  2020-07-31 15:56       ` Sudeep Holla
  2020-08-04  5:35       ` Viresh Kumar
  0 siblings, 2 replies; 25+ messages in thread
From: Lukasz Luba @ 2020-07-30  9:36 UTC (permalink / raw)
  To: Sudeep Holla, Viresh Kumar
  Cc: linux-kernel, linux-arm-kernel, linux-pm, cristian.marussi, rjw



On 7/30/20 10:10 AM, Sudeep Holla wrote:
> On Thu, Jul 30, 2020 at 02:23:33PM +0530, Viresh Kumar wrote:
>> On 29-07-20, 16:12, Lukasz Luba wrote:
>>> The existing CPUFreq framework does not tracks the statistics when the
>>> 'fast switch' is used or when firmware changes the frequency independently
>>> due to e.g. thermal reasons. However, the firmware might track the frequency
>>> changes and expose this to the kernel.
>>>
>>> This patch set aims to introduce CPUfreq statistics gathered by firmware
>>> and retrieved by CPUFreq driver. It would require a new API functions
>>> in the CPUFreq, which allows to poke drivers to get these stats.
>>>
>>> The needed CPUFreq infrastructure is in patch 1/4, patch 2/4 extends
>>> ARM SCMI protocol layer, patches 3/4, 4/4  modify ARM SCMI CPUFreq driver.
>>
>> Are you doing this for the fast switch case or because your platform
>> actually runs at frequencies which may be different from what cpufreq
>> core has requested ?
>>
> 
> I think so.

For both cases, but fast switch is major and present. Thermal is not
currently implemented in SCP FW, but might be in future.

> 
>> I am also not sure what these tables should represent, what the
>> cpufreq core has decided for the CPUs or the frequencies we actually
>> run at, as these two can be very different for example if the hardware
>> runs at frequencies which don't match exactly to what is there in the
>> freq table. I believe these are rather to show what cpufreq and its
>> governors are doing with the CPUs.
>>
> 
> Exactly, I raised similar point in internal discussion and asked Lukasz
> to take up the same on the list. I assume it was always what cpufreq
> requested rather than what was delivered. So will we break the userspace
> ABI if we change that is the main question.

Thank you for confirmation. If that is the mechanism for tracking what
cpufreq governors are doing with the CPUs, then is clashes with
presented data in FW memory, because firmware is the governor.

> 
>> Over that I would like the userspace stats to work exactly as the way
>> they work right now, i.e. capture all transitions from one freq to
>> other, not just time-in-state. Also resetting of the stats from
>> userspace for example. All allocation and printing of the data must be
>> done from stats core, the only thing which the driver would do at the
>> end is updating the stats structure and nothing more. Instead of
>> reading all stats from the firmware, it will be much easier if you can
>> just get the information from the firmware whenever there is a
>> frequency switch and then we can update the stats the way it is done
>> right now. And that would be simple.
>>
> 
> Good point, but notifications may not be lightweight. If that is no good,
> alternatively, I suggested to keep these firmware stats in a separate
> debugfs. Thoughts ?

I agree that notifications might not be lightweight. Furthermore I think
this still clashes with the assumption that cpufreq governor decisions
are tracked in these statistics, not the firmware decision.

In this case I think we would have to create debugfs.
Sudeep do you think these debugfs should be exposed from the protocol
layer:
drivers/firmware/arm_scmi/perf.c
or maybe from the cpufreq scmi driver? I would probably be safer to have
it in the cpufreq driver because we have scmi_handle there.

Regards,
Lukasz

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/4] scmi: perf: Extend protocol to support performance statistics
  2020-07-29 15:12 ` [PATCH 2/4] scmi: perf: Extend protocol to support performance statistics Lukasz Luba
@ 2020-07-31  1:50   ` kernel test robot
  2020-07-31 15:15   ` Cristian Marussi
  1 sibling, 0 replies; 25+ messages in thread
From: kernel test robot @ 2020-07-31  1:50 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel, linux-arm-kernel, linux-pm
  Cc: kbuild-all, sudeep.holla, cristian.marussi, viresh.kumar,
	lukasz.luba, rjw

[-- Attachment #1: Type: text/plain, Size: 5215 bytes --]

Hi Lukasz,

I love your patch! Yet something to improve:

[auto build test ERROR on next-20200729]
[cannot apply to pm/linux-next tip/auto-latest linux/master linus/master v5.8-rc7 v5.8-rc6 v5.8-rc5 v5.8-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Lukasz-Luba/CPUFreq-statistics-retrieved-by-drivers/20200729-231539
base:    04b4571786305a76ad81757bbec78eb16a5de582
config: mips-allyesconfig (attached as .config)
compiler: mips-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=mips 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   drivers/firmware/arm_scmi/perf.c: In function 'scmi_dvfs_stats_get':
>> drivers/firmware/arm_scmi/perf.c:905:17: error: implicit declaration of function 'vmalloc' [-Werror=implicit-function-declaration]
     905 |  raw_stats[0] = vmalloc(domain_stats->size);
         |                 ^~~~~~~
>> drivers/firmware/arm_scmi/perf.c:905:15: warning: assignment to 'struct scmi_perf_domain_raw_stats *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
     905 |  raw_stats[0] = vmalloc(domain_stats->size);
         |               ^
   drivers/firmware/arm_scmi/perf.c:909:15: warning: assignment to 'struct scmi_perf_domain_raw_stats *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
     909 |  raw_stats[1] = vmalloc(domain_stats->size);
         |               ^
>> drivers/firmware/arm_scmi/perf.c:911:3: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
     911 |   vfree(raw_stats[0]);
         |   ^~~~~
   cc1: some warnings being treated as errors

vim +/vmalloc +905 drivers/firmware/arm_scmi/perf.c

   868	
   869	static int
   870	scmi_dvfs_stats_get(const struct scmi_handle *handle, u32 domain_id,
   871			    struct scmi_perf_domain_stats *stats)
   872	{
   873		struct scmi_perf_domain_stats_desc *domain_stats;
   874		struct scmi_perf_domain_raw_stats *raw_stats[2];
   875		struct scmi_perf_info *pi = handle->perf_priv;
   876		struct scmi_perf_level_raw_stats *perf;
   877		int i, index, ret = -EINVAL;
   878		struct perf_dom_info *dom;
   879		u64 transition_count = 0;
   880		struct scmi_opp *opp;
   881	
   882		if (!stats)
   883			return -EINVAL;
   884	
   885		if (!pi->stats_virt_addr || !pi->stats_desc ||
   886			!pi->stats_desc->domain_stats)
   887			return -ENOENT;
   888	
   889		if (pi->stats_desc->domain_count <= domain_id ||
   890			!pi->stats_desc->domain_stats[domain_id].addr)
   891			return -ENOENT;
   892	
   893		dom = pi->dom_info + domain_id;
   894		if (!dom)
   895			return -EIO;
   896	
   897		domain_stats = &pi->stats_desc->domain_stats[domain_id];
   898	
   899		if (!domain_stats->opp_map) {
   900			ret = scmi_dvfs_setup_opps_mapping(handle, domain_id);
   901			if (ret)
   902				return ret;
   903		}
   904	
 > 905		raw_stats[0] = vmalloc(domain_stats->size);
   906		if (!raw_stats[0])
   907			return -ENOMEM;
   908	
   909		raw_stats[1] = vmalloc(domain_stats->size);
   910		if (!raw_stats[1]) {
 > 911			vfree(raw_stats[0]);
   912			return -ENOMEM;
   913		}
   914	
   915		/*
   916		 * Let's try 10 times. If two consecutive reads are the same - done.
   917		 * This approach is aligned with SCMI v2 specification.
   918		 */
   919		for (i = 0; i < 10; i++) {
   920			memcpy_fromio(raw_stats[0], domain_stats->addr,
   921				      domain_stats->size);
   922			memcpy_fromio(raw_stats[1], domain_stats->addr,
   923				      domain_stats->size);
   924			if (!memcmp(raw_stats[0], raw_stats[1], domain_stats->size)) {
   925				ret = 0;
   926				break;
   927			}
   928		}
   929	
   930		if (ret)
   931			goto free_buf;
   932	
   933		for (i = 0; i < dom->opp_count; i++) {
   934			perf = &raw_stats[0]->perf_level[i];
   935	
   936			transition_count += __le64_to_cpu(perf->usage_count);
   937			stats->time_in_state[i] =
   938					__le64_to_cpu(perf->total_residency_us);
   939	
   940			/* Speed-up and initialize the frequencies only once. */
   941			if (stats->freq_table[i] == 0) {
   942				index = domain_stats->opp_map[i];
   943				opp = &dom->opp[index];
   944				stats->freq_table[i] = opp->perf * dom->mult_factor;
   945			}
   946		}
   947	
   948		stats->total_trans = transition_count;
   949	
   950		stats->last_index = __le16_to_cpu(raw_stats[0]->curr_perf_level_id);
   951		stats->last_time = __le64_to_cpu(raw_stats[0]->ts_last_change_us);
   952	
   953	free_buf:
   954		vfree(raw_stats[1]);
   955		vfree(raw_stats[0]);
   956	
   957		return ret;
   958	}
   959	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 67601 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/4] scmi: perf: Extend protocol to support performance statistics
  2020-07-29 15:12 ` [PATCH 2/4] scmi: perf: Extend protocol to support performance statistics Lukasz Luba
  2020-07-31  1:50   ` kernel test robot
@ 2020-07-31 15:15   ` Cristian Marussi
  2020-08-04 11:10     ` Lukasz Luba
  1 sibling, 1 reply; 25+ messages in thread
From: Cristian Marussi @ 2020-07-31 15:15 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, linux-arm-kernel, linux-pm, sudeep.holla,
	viresh.kumar, rjw

Hi

On Wed, Jul 29, 2020 at 04:12:06PM +0100, Lukasz Luba wrote:
> The firmware is able to maintain performance statistics and share with OS
> via shared memory. The memory region can be interpreted by the SCMI perf
> protocol after receiving and mapping the proper addresses from FW.
> This patch aims to provide needed infrastructure and setup necessary
> mechanisms in the protocol layer.
> 
> It also extends API functions for the upper layer (cpufreq, devfreq)
> with a new callback, which allows to retrieve the statistics for a
> particular performance domain. The new structure scmi_perf_domain_stats
> added in the header works as a glue for these two layers.
> 
> The data processing code for the shared memory is aligned with SCMI v2
> specification (DEN0056B) and handles the required endianness. It can
> be changed in future not disturbing the upper layer.
> 
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> ---
>  drivers/firmware/arm_scmi/perf.c | 210 +++++++++++++++++++++++++++++++
>  include/linux/scmi_protocol.h    |  11 ++
>  2 files changed, 221 insertions(+)
> 
> diff --git a/drivers/firmware/arm_scmi/perf.c b/drivers/firmware/arm_scmi/perf.c
> index 3e1e87012c95..761067bb6237 100644
> --- a/drivers/firmware/arm_scmi/perf.c
> +++ b/drivers/firmware/arm_scmi/perf.c
> @@ -19,6 +19,9 @@
>  #include "common.h"
>  #include "notify.h"
>  
> +#define PERF_DOMAIN_STATS_OFFSETS_BASE		0x10
> +#define PERF_DOMAIN_COUNT_BASE			0x8
> +
>  enum scmi_performance_protocol_cmd {
>  	PERF_DOMAIN_ATTRIBUTES = 0x3,
>  	PERF_DESCRIBE_LEVELS = 0x4,
> @@ -32,11 +35,27 @@ enum scmi_performance_protocol_cmd {
>  };
>  
>  struct scmi_opp {
> +	u32 id;
>  	u32 perf;
>  	u32 power;
>  	u32 trans_latency_us;
>  };
>  
> +struct scmi_perf_level_raw_stats {
> +	__le32 perf_level_id;
> +	__le32 reserved;
> +	__le64 usage_count;
> +	__le64 total_residency_us;
> +};
> +
> +struct scmi_perf_domain_raw_stats {
> +	__le16 perf_level_count;
> +	__le16 curr_perf_level_id;
> +	__le32 extended_stats_offset;
> +	__le64 ts_last_change_us;
> +	struct scmi_perf_level_raw_stats perf_level[];
> +};
> +
>  struct scmi_msg_resp_perf_attributes {
>  	__le16 num_domains;
>  	__le16 flags;
> @@ -161,13 +180,26 @@ struct perf_dom_info {
>  	struct scmi_fc_info *fc_info;
>  };
>  
> +struct scmi_perf_domain_stats_desc {
> +	void __iomem *addr;
> +	int *opp_map;
> +	int size;
> +};
> +
> +struct scmi_perf_stats_desc {
> +	uint16_t domain_count;
> +	struct scmi_perf_domain_stats_desc *domain_stats;
> +};
> +
>  struct scmi_perf_info {
>  	u32 version;
>  	int num_domains;
>  	bool power_scale_mw;
>  	u64 stats_addr;
>  	u32 stats_size;
> +	void __iomem *stats_virt_addr;
>  	struct perf_dom_info *dom_info;
> +	struct scmi_perf_stats_desc *stats_desc;
>  };
>  
>  static enum scmi_performance_protocol_cmd evt_2_cmd[] = {
> @@ -175,6 +207,55 @@ static enum scmi_performance_protocol_cmd evt_2_cmd[] = {
>  	PERF_NOTIFY_LEVEL,
>  };
>  
> +static int scmi_perf_stats_init(const struct scmi_handle *handle,
> +				struct scmi_perf_info *pi)
> +{
> +	struct scmi_perf_domain_stats_desc *domain_stats;
> +	int i, domain_count;
> +	__le32 offset;
> +

LGTM by I'd add also the check for the Signature field first of all, to rule
out misconfigured/misaligned memory when integrating with fw.
Regarding revision and attributes, they are just zero and, as of now, not exposed
to upper layers but I'm wondering if we should not parse and expose them too already
to be future proof (since future SCMIv3 is near really and it will change this mechanism
and bump revision field.)

> +	domain_count = le16_to_cpu(ioread16(pi->stats_virt_addr +
> +					    PERF_DOMAIN_COUNT_BASE));
> +

Would be worth to check this against pinfo->num_domains ? (real question)
I suppose that if the platform limits the visible domains to this agent
it should be consistent between stats and messages.

> +	pi->stats_desc = devm_kzalloc(handle->dev,
> +				      sizeof(struct scmi_perf_stats_desc),
> +				      GFP_KERNEL);
> +	if (!pi->stats_desc)
> +		return -ENOMEM;
> +
> +	pi->stats_desc->domain_stats = devm_kzalloc(handle->dev, domain_count *
> +				sizeof(struct scmi_perf_domain_stats_desc),

nit: ... sizeof(*domain_stats) ?

> +				GFP_KERNEL);
> +	if (!pi->stats_desc->domain_stats)
> +		return -ENOMEM;
> +
> +	pi->stats_desc->domain_count = domain_count;
> +
> +	for (i = 0; i < domain_count; i++) {
> +		int stats_size;
> +		__le16 opp_count;
> +
> +		offset = ioread32(pi->stats_virt_addr +
> +				PERF_DOMAIN_STATS_OFFSETS_BASE + i * 4);
> +		if (!offset)
> +			continue;
> +
> +		domain_stats = &pi->stats_desc->domain_stats[i];
> +
> +		domain_stats->addr = pi->stats_virt_addr + le32_to_cpu(offset);
> +
> +		/* The first field is the performance level count. */
> +		opp_count = le16_to_cpu(ioread16(domain_stats->addr));
> +		stats_size = sizeof(struct scmi_perf_domain_raw_stats);
> +		stats_size += sizeof(struct scmi_perf_level_raw_stats) *
> +				opp_count;
> +
> +		domain_stats->size = stats_size;
> +	}
> +
> +	return 0;
> +}
> +
>  static int scmi_perf_attributes_get(const struct scmi_handle *handle,
>  				    struct scmi_perf_info *pi)
>  {
> @@ -198,6 +279,14 @@ static int scmi_perf_attributes_get(const struct scmi_handle *handle,
>  		pi->stats_addr = le32_to_cpu(attr->stats_addr_low) |
>  				(u64)le32_to_cpu(attr->stats_addr_high) << 32;
>  		pi->stats_size = le32_to_cpu(attr->stats_size);
> +		if (pi->stats_addr && pi->stats_size) {
> +			pi->stats_virt_addr = devm_ioremap(handle->dev,
> +					pi->stats_addr, pi->stats_size);
> +			if (pi->stats_virt_addr)
> +				ret = scmi_perf_stats_init(handle, pi);
> +			else
> +				ret = -ENOMEM;
> +		}
>  	}
>  
>  	scmi_xfer_put(handle, t);
> @@ -298,6 +387,7 @@ scmi_perf_describe_levels_get(const struct scmi_handle *handle, u32 domain,
>  			opp->power = le32_to_cpu(level_info->opp[cnt].power);
>  			opp->trans_latency_us = le16_to_cpu
>  				(level_info->opp[cnt].transition_latency_us);
> +			opp->id = tot_opp_cnt + cnt;
>  
>  			dev_dbg(handle->dev, "Level %d Power %d Latency %dus\n",
>  				opp->perf, opp->power, opp->trans_latency_us);
> @@ -748,6 +838,125 @@ static bool scmi_fast_switch_possible(const struct scmi_handle *handle,
>  	return dom->fc_info && dom->fc_info->level_set_addr;
>  }
>  
> +static int scmi_dvfs_setup_opps_mapping(const struct scmi_handle *handle,
> +					u32 domain_id)
> +{
> +	struct scmi_perf_domain_stats_desc *domain_stats;
> +	struct scmi_perf_info *pi = handle->perf_priv;
> +	struct perf_dom_info *dom;
> +	struct scmi_opp *opp;
> +	int idx, *mapping;
> +
> +	dom = pi->dom_info + domain_id;
> +	if (!dom)
> +		return -EIO;
This is a bit scary without something like dom < pi->num_domains :D

> +
> +	mapping = devm_kzalloc(handle->dev, sizeof(int) * dom->opp_count,
> +			       GFP_KERNEL);
> +	if (!mapping)
> +		return -ENOMEM;
> +
> +	/* Construct LUT with FW OPP ids as an index */
> +	for (opp = dom->opp, idx = 0; idx < dom->opp_count; idx++, opp++)
> +		mapping[opp->id] = idx;
> +
> +	domain_stats = &pi->stats_desc->domain_stats[domain_id];
> +	domain_stats->opp_map = mapping;
> +
> +	return 0;
> +}
> +
> +static int
> +scmi_dvfs_stats_get(const struct scmi_handle *handle, u32 domain_id,
> +		    struct scmi_perf_domain_stats *stats)
> +{
> +	struct scmi_perf_domain_stats_desc *domain_stats;
> +	struct scmi_perf_domain_raw_stats *raw_stats[2];
> +	struct scmi_perf_info *pi = handle->perf_priv;
> +	struct scmi_perf_level_raw_stats *perf;
> +	int i, index, ret = -EINVAL;
> +	struct perf_dom_info *dom;
> +	u64 transition_count = 0;
> +	struct scmi_opp *opp;
> +
> +	if (!stats)
> +		return -EINVAL;
> +
> +	if (!pi->stats_virt_addr || !pi->stats_desc ||
> +		!pi->stats_desc->domain_stats)
> +		return -ENOENT;
> +
> +	if (pi->stats_desc->domain_count <= domain_id ||
> +		!pi->stats_desc->domain_stats[domain_id].addr)
> +		return -ENOENT;
> +
> +	dom = pi->dom_info + domain_id;

same ....scary without something like dom < pi->num_domains, even more
because this comes from the handle->statisticts_get() straight away

> +	if (!dom)
> +		return -EIO;
> +
> +	domain_stats = &pi->stats_desc->domain_stats[domain_id];
> +
> +	if (!domain_stats->opp_map) {
> +		ret = scmi_dvfs_setup_opps_mapping(handle, domain_id);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	raw_stats[0] = vmalloc(domain_stats->size);
> +	if (!raw_stats[0])
> +		return -ENOMEM;
> +
> +	raw_stats[1] = vmalloc(domain_stats->size);
> +	if (!raw_stats[1]) {
> +		vfree(raw_stats[0]);
> +		return -ENOMEM;
> +	}
> +
> +	/*
> +	 * Let's try 10 times. If two consecutive reads are the same - done.
> +	 * This approach is aligned with SCMI v2 specification.
> +	 */
> +	for (i = 0; i < 10; i++) {
> +		memcpy_fromio(raw_stats[0], domain_stats->addr,
> +			      domain_stats->size);
> +		memcpy_fromio(raw_stats[1], domain_stats->addr,
> +			      domain_stats->size);
> +		if (!memcmp(raw_stats[0], raw_stats[1], domain_stats->size)) {
> +			ret = 0;
> +			break;
> +		}
> +	}
> +
> +	if (ret)
> +		goto free_buf;
> +
> +	for (i = 0; i < dom->opp_count; i++) {
> +		perf = &raw_stats[0]->perf_level[i];
> +
> +		transition_count += __le64_to_cpu(perf->usage_count);
> +		stats->time_in_state[i] =
> +				__le64_to_cpu(perf->total_residency_us);
> +
> +		/* Speed-up and initialize the frequencies only once. */
> +		if (stats->freq_table[i] == 0) {
> +			index = domain_stats->opp_map[i];
> +			opp = &dom->opp[index];
> +			stats->freq_table[i] = opp->perf * dom->mult_factor;
> +		}
> +	}
> +
> +	stats->total_trans = transition_count;
> +
> +	stats->last_index = __le16_to_cpu(raw_stats[0]->curr_perf_level_id);
> +	stats->last_time = __le64_to_cpu(raw_stats[0]->ts_last_change_us);
> +
> +free_buf:
> +	vfree(raw_stats[1]);
> +	vfree(raw_stats[0]);
> +
> +	return ret;
> +}
> +
>  static struct scmi_perf_ops perf_ops = {
>  	.limits_set = scmi_perf_limits_set,
>  	.limits_get = scmi_perf_limits_get,
> @@ -760,6 +969,7 @@ static struct scmi_perf_ops perf_ops = {
>  	.freq_get = scmi_dvfs_freq_get,
>  	.est_power_get = scmi_dvfs_est_power_get,
>  	.fast_switch_possible = scmi_fast_switch_possible,
> +	.statistics_get = scmi_dvfs_stats_get,
>  };
>  
>  static int scmi_perf_set_notify_enabled(const struct scmi_handle *handle,
> diff --git a/include/linux/scmi_protocol.h b/include/linux/scmi_protocol.h
> index 7e5dd7d1e221..3316ff4f9d34 100644
> --- a/include/linux/scmi_protocol.h
> +++ b/include/linux/scmi_protocol.h
> @@ -55,6 +55,15 @@ struct scmi_clock_info {
>  	};
>  };
>  
> +struct scmi_perf_domain_stats {
> +	unsigned long long last_time;
> +	unsigned long long last_index;
> +	unsigned int total_trans;
> +	unsigned int state_num;
> +	u64 *time_in_state;

...got some recent negative feedback on mixing fixed-size fileds like u64 with
generic like unsigned long/int etc...so maybe unsigned long long is better here
since it is big enough; being this a time you could use ktime_t in other scenarios
BUT I suppose here derives from the nice 64bit microseconds fields in shared mem
stats so unsigned long long seems more clear (and ktime_t is signed and nanoseconds).

Thanks

Cristian

> +	unsigned int *freq_table;
> +};
> +
>  struct scmi_handle;
>  
>  /**
> @@ -121,6 +130,8 @@ struct scmi_perf_ops {
>  			     unsigned long *rate, unsigned long *power);
>  	bool (*fast_switch_possible)(const struct scmi_handle *handle,
>  				     struct device *dev);
> +	int (*statistics_get)(const struct scmi_handle *handle, u32 domain_id,
> +			      struct scmi_perf_domain_stats *stats);
>  };
>  
>  /**
> -- 
> 2.17.1


> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] CPUFreq statistics retrieved by drivers
  2020-07-30  9:36     ` Lukasz Luba
@ 2020-07-31 15:56       ` Sudeep Holla
  2020-08-04 17:19         ` Florian Fainelli
  2020-08-04  5:35       ` Viresh Kumar
  1 sibling, 1 reply; 25+ messages in thread
From: Sudeep Holla @ 2020-07-31 15:56 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Viresh Kumar, linux-kernel, linux-arm-kernel, linux-pm,
	cristian.marussi, rjw, Sudeep Holla

On Thu, Jul 30, 2020 at 10:36:51AM +0100, Lukasz Luba wrote:
> 
> In this case I think we would have to create debugfs.
> Sudeep do you think these debugfs should be exposed from the protocol
> layer:
> drivers/firmware/arm_scmi/perf.c

I prefer above over cpufreq as we can support for all the devices not
just cpus which avoids adding similar support elsewhere(mostly devfreq)

> or maybe from the cpufreq scmi driver? I would probably be safer to have
> it in the cpufreq driver because we have scmi_handle there.
>

Cristian was thinking if we can consolidate all such debugfs under one
device may be and that should eliminate your handle restriction. I would
like to see how that works out in implementation but I don't have any 
better suggestion ATM.

--
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] CPUFreq statistics retrieved by drivers
  2020-07-30  9:36     ` Lukasz Luba
  2020-07-31 15:56       ` Sudeep Holla
@ 2020-08-04  5:35       ` Viresh Kumar
  2020-08-04 10:29         ` Lukasz Luba
  1 sibling, 1 reply; 25+ messages in thread
From: Viresh Kumar @ 2020-08-04  5:35 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Sudeep Holla, linux-kernel, linux-arm-kernel, linux-pm,
	cristian.marussi, rjw

On 30-07-20, 10:36, Lukasz Luba wrote:
> On 7/30/20 10:10 AM, Sudeep Holla wrote:
> > On Thu, Jul 30, 2020 at 02:23:33PM +0530, Viresh Kumar wrote:
> > > On 29-07-20, 16:12, Lukasz Luba wrote:
> > > > The existing CPUFreq framework does not tracks the statistics when the
> > > > 'fast switch' is used or when firmware changes the frequency independently
> > > > due to e.g. thermal reasons. However, the firmware might track the frequency
> > > > changes and expose this to the kernel.
> > > > 
> > > > This patch set aims to introduce CPUfreq statistics gathered by firmware
> > > > and retrieved by CPUFreq driver. It would require a new API functions
> > > > in the CPUFreq, which allows to poke drivers to get these stats.
> > > > 
> > > > The needed CPUFreq infrastructure is in patch 1/4, patch 2/4 extends
> > > > ARM SCMI protocol layer, patches 3/4, 4/4  modify ARM SCMI CPUFreq driver.
> > > 
> > > Are you doing this for the fast switch case or because your platform
> > > actually runs at frequencies which may be different from what cpufreq
> > > core has requested ?
> > > 
> > 
> > I think so.
> 
> For both cases, but fast switch is major and present. Thermal is not
> currently implemented in SCP FW, but might be in future.

Okay, lets simplify things a bit and merge things slowly upstream and merge only
what is required right now.

IIUC, the only concern right now is to capture stats with fast switch ? Maybe we
can do something else in that case and brainstorm a bit..

> > > I am also not sure what these tables should represent, what the
> > > cpufreq core has decided for the CPUs or the frequencies we actually
> > > run at, as these two can be very different for example if the hardware
> > > runs at frequencies which don't match exactly to what is there in the
> > > freq table. I believe these are rather to show what cpufreq and its
> > > governors are doing with the CPUs.
> > > 
> > 
> > Exactly, I raised similar point in internal discussion and asked Lukasz
> > to take up the same on the list. I assume it was always what cpufreq
> > requested rather than what was delivered. So will we break the userspace
> > ABI if we change that is the main question.
> 
> Thank you for confirmation. If that is the mechanism for tracking what
> cpufreq governors are doing with the CPUs, then is clashes with
> presented data in FW memory, because firmware is the governor.

Why is firmware the governor here ? Aren't you talking about the simple fast
switch case only ?

Over that, I think this cpufreq stats information isn't parsed by any tool right
now and tweaking it a bit won't hurt anyone (like if we start capturing things a
bit differently). So we may not want to worry about breaking userspace ABI here,
if what we are looking to do is the right thing to do.

> > > Over that I would like the userspace stats to work exactly as the way
> > > they work right now, i.e. capture all transitions from one freq to
> > > other, not just time-in-state. Also resetting of the stats from
> > > userspace for example. All allocation and printing of the data must be
> > > done from stats core, the only thing which the driver would do at the
> > > end is updating the stats structure and nothing more. Instead of
> > > reading all stats from the firmware, it will be much easier if you can
> > > just get the information from the firmware whenever there is a
> > > frequency switch and then we can update the stats the way it is done
> > > right now. And that would be simple.
> > > 
> > 
> > Good point, but notifications may not be lightweight. If that is no good,
> > alternatively, I suggested to keep these firmware stats in a separate
> > debugfs. Thoughts ?
> 
> I agree that notifications might not be lightweight.

I am not sure what notifications are we talking about here.

> Furthermore I think
> this still clashes with the assumption that cpufreq governor decisions
> are tracked in these statistics, not the firmware decision.
> 
> In this case I think we would have to create debugfs.
> Sudeep do you think these debugfs should be exposed from the protocol
> layer:
> drivers/firmware/arm_scmi/perf.c
> or maybe from the cpufreq scmi driver? I would probably be safer to have
> it in the cpufreq driver because we have scmi_handle there.

For the CPUs it would be better if we can keep things in cpufreq only, lets see
how we go about it.

-- 
viresh

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] CPUFreq statistics retrieved by drivers
  2020-08-04  5:35       ` Viresh Kumar
@ 2020-08-04 10:29         ` Lukasz Luba
  2020-08-04 10:38           ` Viresh Kumar
  0 siblings, 1 reply; 25+ messages in thread
From: Lukasz Luba @ 2020-08-04 10:29 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Sudeep Holla, linux-kernel, linux-arm-kernel, linux-pm,
	cristian.marussi, rjw



On 8/4/20 6:35 AM, Viresh Kumar wrote:
> On 30-07-20, 10:36, Lukasz Luba wrote:
>> On 7/30/20 10:10 AM, Sudeep Holla wrote:
>>> On Thu, Jul 30, 2020 at 02:23:33PM +0530, Viresh Kumar wrote:
>>>> On 29-07-20, 16:12, Lukasz Luba wrote:
>>>>> The existing CPUFreq framework does not tracks the statistics when the
>>>>> 'fast switch' is used or when firmware changes the frequency independently
>>>>> due to e.g. thermal reasons. However, the firmware might track the frequency
>>>>> changes and expose this to the kernel.
>>>>>
>>>>> This patch set aims to introduce CPUfreq statistics gathered by firmware
>>>>> and retrieved by CPUFreq driver. It would require a new API functions
>>>>> in the CPUFreq, which allows to poke drivers to get these stats.
>>>>>
>>>>> The needed CPUFreq infrastructure is in patch 1/4, patch 2/4 extends
>>>>> ARM SCMI protocol layer, patches 3/4, 4/4  modify ARM SCMI CPUFreq driver.
>>>>
>>>> Are you doing this for the fast switch case or because your platform
>>>> actually runs at frequencies which may be different from what cpufreq
>>>> core has requested ?
>>>>
>>>
>>> I think so.
>>
>> For both cases, but fast switch is major and present. Thermal is not
>> currently implemented in SCP FW, but might be in future.
> 
> Okay, lets simplify things a bit and merge things slowly upstream and merge only
> what is required right now.
> 
> IIUC, the only concern right now is to capture stats with fast switch ? Maybe we
> can do something else in that case and brainstorm a bit..

Correct, the fast switch is the only concern right now and not tracked. 
We could fill in that information with statistics data from firmware
with a cpufreq driver help.

I could make the if from patch 1/4 covering narrowed case, when
fast switch is present, check for drivers stats.
Something like:
-----------8<------------------------------------------------------------
if (policy->fast_switch_enabled)
	if (policy->has_driver_stats)
		return cpufreq_stats_present_driver_data(policy, buf);
	else
		return 0;
-------------->8----------------------------------------------------------

> 
>>>> I am also not sure what these tables should represent, what the
>>>> cpufreq core has decided for the CPUs or the frequencies we actually
>>>> run at, as these two can be very different for example if the hardware
>>>> runs at frequencies which don't match exactly to what is there in the
>>>> freq table. I believe these are rather to show what cpufreq and its
>>>> governors are doing with the CPUs.
>>>>
>>>
>>> Exactly, I raised similar point in internal discussion and asked Lukasz
>>> to take up the same on the list. I assume it was always what cpufreq
>>> requested rather than what was delivered. So will we break the userspace
>>> ABI if we change that is the main question.
>>
>> Thank you for confirmation. If that is the mechanism for tracking what
>> cpufreq governors are doing with the CPUs, then is clashes with
>> presented data in FW memory, because firmware is the governor.
> 
> Why is firmware the governor here ? Aren't you talking about the simple fast
> switch case only ?

I used a term 'governor' for the firmware because it makes the final
set for the frequency. It (FW) should respect the frequency value
set using the fast switch. I don't know how other firmware (e.g. Intel)
treats this fast switch value or if they even expose FW stats, though.

You can read about this statistics region in [1] at:
4.5.5 Performance domain statistics shared memory region

> 
> Over that, I think this cpufreq stats information isn't parsed by any tool right
> now and tweaking it a bit won't hurt anyone (like if we start capturing things a
> bit differently). So we may not want to worry about breaking userspace ABI here,
> if what we are looking to do is the right thing to do.

So, there is some hope... IMHO it would be better to have this cpufreq
stats in normal location, rather then in scmi debugfs.

> 
>>>> Over that I would like the userspace stats to work exactly as the way
>>>> they work right now, i.e. capture all transitions from one freq to
>>>> other, not just time-in-state. Also resetting of the stats from
>>>> userspace for example. All allocation and printing of the data must be
>>>> done from stats core, the only thing which the driver would do at the
>>>> end is updating the stats structure and nothing more. Instead of
>>>> reading all stats from the firmware, it will be much easier if you can
>>>> just get the information from the firmware whenever there is a
>>>> frequency switch and then we can update the stats the way it is done
>>>> right now. And that would be simple.
>>>>
>>>
>>> Good point, but notifications may not be lightweight. If that is no good,
>>> alternatively, I suggested to keep these firmware stats in a separate
>>> debugfs. Thoughts ?
>>
>> I agree that notifications might not be lightweight.
> 
> I am not sure what notifications are we talking about here.

There is a notification mechanism described in the SCMI spec [1] at
4.5.4 Notifications.
We were referring to that mechanism.

> 
>> Furthermore I think
>> this still clashes with the assumption that cpufreq governor decisions
>> are tracked in these statistics, not the firmware decision.
>>
>> In this case I think we would have to create debugfs.
>> Sudeep do you think these debugfs should be exposed from the protocol
>> layer:
>> drivers/firmware/arm_scmi/perf.c
>> or maybe from the cpufreq scmi driver? I would probably be safer to have
>> it in the cpufreq driver because we have scmi_handle there.
> 
> For the CPUs it would be better if we can keep things in cpufreq only, lets see
> how we go about it.
> 

If that would be only ARM SCMI debugfs directory, then we would like to
keep it in the scmi. We could re-use the code for devfreq (GPU)
device, which is also exposed as performance domain.

Thank you Viresh for your comments.

Regards,
Lukasz

[1] 
https://static.docs.arm.com/den0056/b/DEN0056B_System_Control_and_Management_Interface_v2_0.pdf

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] CPUFreq statistics retrieved by drivers
  2020-08-04 10:29         ` Lukasz Luba
@ 2020-08-04 10:38           ` Viresh Kumar
  2020-08-04 10:44             ` Lukasz Luba
  0 siblings, 1 reply; 25+ messages in thread
From: Viresh Kumar @ 2020-08-04 10:38 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Sudeep Holla, linux-kernel, linux-arm-kernel, linux-pm,
	cristian.marussi, rjw

On 04-08-20, 11:29, Lukasz Luba wrote:
> On 8/4/20 6:35 AM, Viresh Kumar wrote:
> > IIUC, the only concern right now is to capture stats with fast switch ? Maybe we
> > can do something else in that case and brainstorm a bit..
> 
> Correct, the fast switch is the only concern right now and not tracked. We
> could fill in that information with statistics data from firmware
> with a cpufreq driver help.
> 
> I could make the if from patch 1/4 covering narrowed case, when
> fast switch is present, check for drivers stats.
> Something like:
> -----------8<------------------------------------------------------------
> if (policy->fast_switch_enabled)
> 	if (policy->has_driver_stats)
> 		return cpufreq_stats_present_driver_data(policy, buf);
> 	else
> 		return 0;
> -------------->8----------------------------------------------------------

I don't think doing it with help of firmware is the right thing to do
here then. For another platform we may not have a firmware which can
help us, we need something in the opp core itself for that. Lemme see
if I can do something about it.

> > Why is firmware the governor here ? Aren't you talking about the simple fast
> > switch case only ?
> 
> I used a term 'governor' for the firmware because it makes the final
> set for the frequency. It (FW) should respect the frequency value
> set using the fast switch. I don't know how other firmware (e.g. Intel)
> treats this fast switch value or if they even expose FW stats, though.

For Intel I think, Linux is one of the entities that vote for deciding
the frequency of the CPUs and the firmware (after taking all such
factors into account) chooses a frequency by its own, which must be >=
the frequency requested by Linux.

> You can read about this statistics region in [1] at:
> 4.5.5 Performance domain statistics shared memory region
> 
> > 
> > Over that, I think this cpufreq stats information isn't parsed by any tool right
> > now and tweaking it a bit won't hurt anyone (like if we start capturing things a
> > bit differently). So we may not want to worry about breaking userspace ABI here,
> > if what we are looking to do is the right thing to do.
> 
> So, there is some hope... IMHO it would be better to have this cpufreq
> stats in normal location, rather then in scmi debugfs.

I agree.

> > I am not sure what notifications are we talking about here.
> 
> There is a notification mechanism described in the SCMI spec [1] at
> 4.5.4 Notifications.
> We were referring to that mechanism.

Ahh, I see. All I was thinking was about the cpufreq specific
notifiers :)

-- 
viresh

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] CPUFreq statistics retrieved by drivers
  2020-08-04 10:38           ` Viresh Kumar
@ 2020-08-04 10:44             ` Lukasz Luba
  2020-09-02  7:26               ` Viresh Kumar
  0 siblings, 1 reply; 25+ messages in thread
From: Lukasz Luba @ 2020-08-04 10:44 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Sudeep Holla, linux-kernel, linux-arm-kernel, linux-pm,
	cristian.marussi, rjw



On 8/4/20 11:38 AM, Viresh Kumar wrote:
> On 04-08-20, 11:29, Lukasz Luba wrote:
>> On 8/4/20 6:35 AM, Viresh Kumar wrote:
>>> IIUC, the only concern right now is to capture stats with fast switch ? Maybe we
>>> can do something else in that case and brainstorm a bit..
>>
>> Correct, the fast switch is the only concern right now and not tracked. We
>> could fill in that information with statistics data from firmware
>> with a cpufreq driver help.
>>
>> I could make the if from patch 1/4 covering narrowed case, when
>> fast switch is present, check for drivers stats.
>> Something like:
>> -----------8<------------------------------------------------------------
>> if (policy->fast_switch_enabled)
>> 	if (policy->has_driver_stats)
>> 		return cpufreq_stats_present_driver_data(policy, buf);
>> 	else
>> 		return 0;
>> -------------->8----------------------------------------------------------
> 
> I don't think doing it with help of firmware is the right thing to do
> here then. For another platform we may not have a firmware which can
> help us, we need something in the opp core itself for that. Lemme see
> if I can do something about it.

OK, great, I will wait then with this patch series v2 which would change
into debugfs scmi only. Could you please add me on CC, I am very
interested in.

> 
>>> Why is firmware the governor here ? Aren't you talking about the simple fast
>>> switch case only ?
>>
>> I used a term 'governor' for the firmware because it makes the final
>> set for the frequency. It (FW) should respect the frequency value
>> set using the fast switch. I don't know how other firmware (e.g. Intel)
>> treats this fast switch value or if they even expose FW stats, though.
> 
> For Intel I think, Linux is one of the entities that vote for deciding
> the frequency of the CPUs and the firmware (after taking all such
> factors into account) chooses a frequency by its own, which must be >=
> the frequency requested by Linux.
> 
>> You can read about this statistics region in [1] at:
>> 4.5.5 Performance domain statistics shared memory region
>>
>>>
>>> Over that, I think this cpufreq stats information isn't parsed by any tool right
>>> now and tweaking it a bit won't hurt anyone (like if we start capturing things a
>>> bit differently). So we may not want to worry about breaking userspace ABI here,
>>> if what we are looking to do is the right thing to do.
>>
>> So, there is some hope... IMHO it would be better to have this cpufreq
>> stats in normal location, rather then in scmi debugfs.
> 
> I agree.
> 
>>> I am not sure what notifications are we talking about here.
>>
>> There is a notification mechanism described in the SCMI spec [1] at
>> 4.5.4 Notifications.
>> We were referring to that mechanism.
> 
> Ahh, I see. All I was thinking was about the cpufreq specific
> notifiers :)
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/4] scmi: perf: Extend protocol to support performance statistics
  2020-07-31 15:15   ` Cristian Marussi
@ 2020-08-04 11:10     ` Lukasz Luba
  0 siblings, 0 replies; 25+ messages in thread
From: Lukasz Luba @ 2020-08-04 11:10 UTC (permalink / raw)
  To: Cristian Marussi
  Cc: linux-kernel, linux-arm-kernel, linux-pm, sudeep.holla,
	viresh.kumar, rjw

Hi Cristian,

On 7/31/20 4:15 PM, Cristian Marussi wrote:
> Hi
> 
> On Wed, Jul 29, 2020 at 04:12:06PM +0100, Lukasz Luba wrote:
>> The firmware is able to maintain performance statistics and share with OS
>> via shared memory. The memory region can be interpreted by the SCMI perf
>> protocol after receiving and mapping the proper addresses from FW.
>> This patch aims to provide needed infrastructure and setup necessary
>> mechanisms in the protocol layer.
>>
>> It also extends API functions for the upper layer (cpufreq, devfreq)
>> with a new callback, which allows to retrieve the statistics for a
>> particular performance domain. The new structure scmi_perf_domain_stats
>> added in the header works as a glue for these two layers.
>>
>> The data processing code for the shared memory is aligned with SCMI v2
>> specification (DEN0056B) and handles the required endianness. It can
>> be changed in future not disturbing the upper layer.
>>
>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
>> ---
>>   drivers/firmware/arm_scmi/perf.c | 210 +++++++++++++++++++++++++++++++
>>   include/linux/scmi_protocol.h    |  11 ++
>>   2 files changed, 221 insertions(+)
>>
>> diff --git a/drivers/firmware/arm_scmi/perf.c b/drivers/firmware/arm_scmi/perf.c
>> index 3e1e87012c95..761067bb6237 100644
>> --- a/drivers/firmware/arm_scmi/perf.c
>> +++ b/drivers/firmware/arm_scmi/perf.c
>> @@ -19,6 +19,9 @@
>>   #include "common.h"
>>   #include "notify.h"
>>   
>> +#define PERF_DOMAIN_STATS_OFFSETS_BASE		0x10
>> +#define PERF_DOMAIN_COUNT_BASE			0x8
>> +
>>   enum scmi_performance_protocol_cmd {
>>   	PERF_DOMAIN_ATTRIBUTES = 0x3,
>>   	PERF_DESCRIBE_LEVELS = 0x4,
>> @@ -32,11 +35,27 @@ enum scmi_performance_protocol_cmd {
>>   };
>>   
>>   struct scmi_opp {
>> +	u32 id;
>>   	u32 perf;
>>   	u32 power;
>>   	u32 trans_latency_us;
>>   };
>>   
>> +struct scmi_perf_level_raw_stats {
>> +	__le32 perf_level_id;
>> +	__le32 reserved;
>> +	__le64 usage_count;
>> +	__le64 total_residency_us;
>> +};
>> +
>> +struct scmi_perf_domain_raw_stats {
>> +	__le16 perf_level_count;
>> +	__le16 curr_perf_level_id;
>> +	__le32 extended_stats_offset;
>> +	__le64 ts_last_change_us;
>> +	struct scmi_perf_level_raw_stats perf_level[];
>> +};
>> +
>>   struct scmi_msg_resp_perf_attributes {
>>   	__le16 num_domains;
>>   	__le16 flags;
>> @@ -161,13 +180,26 @@ struct perf_dom_info {
>>   	struct scmi_fc_info *fc_info;
>>   };
>>   
>> +struct scmi_perf_domain_stats_desc {
>> +	void __iomem *addr;
>> +	int *opp_map;
>> +	int size;
>> +};
>> +
>> +struct scmi_perf_stats_desc {
>> +	uint16_t domain_count;
>> +	struct scmi_perf_domain_stats_desc *domain_stats;
>> +};
>> +
>>   struct scmi_perf_info {
>>   	u32 version;
>>   	int num_domains;
>>   	bool power_scale_mw;
>>   	u64 stats_addr;
>>   	u32 stats_size;
>> +	void __iomem *stats_virt_addr;
>>   	struct perf_dom_info *dom_info;
>> +	struct scmi_perf_stats_desc *stats_desc;
>>   };
>>   
>>   static enum scmi_performance_protocol_cmd evt_2_cmd[] = {
>> @@ -175,6 +207,55 @@ static enum scmi_performance_protocol_cmd evt_2_cmd[] = {
>>   	PERF_NOTIFY_LEVEL,
>>   };
>>   
>> +static int scmi_perf_stats_init(const struct scmi_handle *handle,
>> +				struct scmi_perf_info *pi)
>> +{
>> +	struct scmi_perf_domain_stats_desc *domain_stats;
>> +	int i, domain_count;
>> +	__le32 offset;
>> +
> 
> LGTM by I'd add also the check for the Signature field first of all, to rule
> out misconfigured/misaligned memory when integrating with fw.
> Regarding revision and attributes, they are just zero and, as of now, not exposed
> to upper layers but I'm wondering if we should not parse and expose them too already
> to be future proof (since future SCMIv3 is near really and it will change this mechanism
> and bump revision field.)

Make sense, I will add that revision check.

> 
>> +	domain_count = le16_to_cpu(ioread16(pi->stats_virt_addr +
>> +					    PERF_DOMAIN_COUNT_BASE));
>> +
> 
> Would be worth to check this against pinfo->num_domains ? (real question)
> I suppose that if the platform limits the visible domains to this agent
> it should be consistent between stats and messages.

Good point. The value above should be less or equal to the previously
returned pi->num_domain. But if the firmware has a bug, then that could
explode. I will change it, and compare with the 'pi->num_domains'.


> 
>> +	pi->stats_desc = devm_kzalloc(handle->dev,
>> +				      sizeof(struct scmi_perf_stats_desc),
>> +				      GFP_KERNEL);
>> +	if (!pi->stats_desc)
>> +		return -ENOMEM;
>> +
>> +	pi->stats_desc->domain_stats = devm_kzalloc(handle->dev, domain_count *
>> +				sizeof(struct scmi_perf_domain_stats_desc),
> 
> nit: ... sizeof(*domain_stats) ?

yes

> 
>> +				GFP_KERNEL);
>> +	if (!pi->stats_desc->domain_stats)
>> +		return -ENOMEM;
>> +
>> +	pi->stats_desc->domain_count = domain_count;
>> +
>> +	for (i = 0; i < domain_count; i++) {
>> +		int stats_size;
>> +		__le16 opp_count;
>> +
>> +		offset = ioread32(pi->stats_virt_addr +
>> +				PERF_DOMAIN_STATS_OFFSETS_BASE + i * 4);
>> +		if (!offset)
>> +			continue;
>> +
>> +		domain_stats = &pi->stats_desc->domain_stats[i];
>> +
>> +		domain_stats->addr = pi->stats_virt_addr + le32_to_cpu(offset);
>> +
>> +		/* The first field is the performance level count. */
>> +		opp_count = le16_to_cpu(ioread16(domain_stats->addr));
>> +		stats_size = sizeof(struct scmi_perf_domain_raw_stats);
>> +		stats_size += sizeof(struct scmi_perf_level_raw_stats) *
>> +				opp_count;
>> +
>> +		domain_stats->size = stats_size;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>   static int scmi_perf_attributes_get(const struct scmi_handle *handle,
>>   				    struct scmi_perf_info *pi)
>>   {
>> @@ -198,6 +279,14 @@ static int scmi_perf_attributes_get(const struct scmi_handle *handle,
>>   		pi->stats_addr = le32_to_cpu(attr->stats_addr_low) |
>>   				(u64)le32_to_cpu(attr->stats_addr_high) << 32;
>>   		pi->stats_size = le32_to_cpu(attr->stats_size);
>> +		if (pi->stats_addr && pi->stats_size) {
>> +			pi->stats_virt_addr = devm_ioremap(handle->dev,
>> +					pi->stats_addr, pi->stats_size);
>> +			if (pi->stats_virt_addr)
>> +				ret = scmi_perf_stats_init(handle, pi);
>> +			else
>> +				ret = -ENOMEM;
>> +		}
>>   	}
>>   
>>   	scmi_xfer_put(handle, t);
>> @@ -298,6 +387,7 @@ scmi_perf_describe_levels_get(const struct scmi_handle *handle, u32 domain,
>>   			opp->power = le32_to_cpu(level_info->opp[cnt].power);
>>   			opp->trans_latency_us = le16_to_cpu
>>   				(level_info->opp[cnt].transition_latency_us);
>> +			opp->id = tot_opp_cnt + cnt;
>>   
>>   			dev_dbg(handle->dev, "Level %d Power %d Latency %dus\n",
>>   				opp->perf, opp->power, opp->trans_latency_us);
>> @@ -748,6 +838,125 @@ static bool scmi_fast_switch_possible(const struct scmi_handle *handle,
>>   	return dom->fc_info && dom->fc_info->level_set_addr;
>>   }
>>   
>> +static int scmi_dvfs_setup_opps_mapping(const struct scmi_handle *handle,
>> +					u32 domain_id)
>> +{
>> +	struct scmi_perf_domain_stats_desc *domain_stats;
>> +	struct scmi_perf_info *pi = handle->perf_priv;
>> +	struct perf_dom_info *dom;
>> +	struct scmi_opp *opp;
>> +	int idx, *mapping;
>> +
>> +	dom = pi->dom_info + domain_id;
>> +	if (!dom)
>> +		return -EIO;
> This is a bit scary without something like dom < pi->num_domains :D

That make sense in case of FW bug described above.

> 
>> +
>> +	mapping = devm_kzalloc(handle->dev, sizeof(int) * dom->opp_count,
>> +			       GFP_KERNEL);
>> +	if (!mapping)
>> +		return -ENOMEM;
>> +
>> +	/* Construct LUT with FW OPP ids as an index */
>> +	for (opp = dom->opp, idx = 0; idx < dom->opp_count; idx++, opp++)
>> +		mapping[opp->id] = idx;
>> +
>> +	domain_stats = &pi->stats_desc->domain_stats[domain_id];
>> +	domain_stats->opp_map = mapping;
>> +
>> +	return 0;
>> +}
>> +
>> +static int
>> +scmi_dvfs_stats_get(const struct scmi_handle *handle, u32 domain_id,
>> +		    struct scmi_perf_domain_stats *stats)
>> +{
>> +	struct scmi_perf_domain_stats_desc *domain_stats;
>> +	struct scmi_perf_domain_raw_stats *raw_stats[2];
>> +	struct scmi_perf_info *pi = handle->perf_priv;
>> +	struct scmi_perf_level_raw_stats *perf;
>> +	int i, index, ret = -EINVAL;
>> +	struct perf_dom_info *dom;
>> +	u64 transition_count = 0;
>> +	struct scmi_opp *opp;
>> +
>> +	if (!stats)
>> +		return -EINVAL;
>> +
>> +	if (!pi->stats_virt_addr || !pi->stats_desc ||
>> +		!pi->stats_desc->domain_stats)
>> +		return -ENOENT;
>> +
>> +	if (pi->stats_desc->domain_count <= domain_id ||
>> +		!pi->stats_desc->domain_stats[domain_id].addr)
>> +		return -ENOENT;
>> +
>> +	dom = pi->dom_info + domain_id;
> 
> same ....scary without something like dom < pi->num_domains, even more
> because this comes from the handle->statisticts_get() straight away

Same here. I will rewrite that bit with more safe checks.

> 
>> +	if (!dom)
>> +		return -EIO;
>> +
>> +	domain_stats = &pi->stats_desc->domain_stats[domain_id];
>> +
>> +	if (!domain_stats->opp_map) {
>> +		ret = scmi_dvfs_setup_opps_mapping(handle, domain_id);
>> +		if (ret)
>> +			return ret;
>> +	}
>> +
>> +	raw_stats[0] = vmalloc(domain_stats->size);
>> +	if (!raw_stats[0])
>> +		return -ENOMEM;
>> +
>> +	raw_stats[1] = vmalloc(domain_stats->size);
>> +	if (!raw_stats[1]) {
>> +		vfree(raw_stats[0]);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	/*
>> +	 * Let's try 10 times. If two consecutive reads are the same - done.
>> +	 * This approach is aligned with SCMI v2 specification.
>> +	 */
>> +	for (i = 0; i < 10; i++) {
>> +		memcpy_fromio(raw_stats[0], domain_stats->addr,
>> +			      domain_stats->size);
>> +		memcpy_fromio(raw_stats[1], domain_stats->addr,
>> +			      domain_stats->size);
>> +		if (!memcmp(raw_stats[0], raw_stats[1], domain_stats->size)) {
>> +			ret = 0;
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (ret)
>> +		goto free_buf;
>> +
>> +	for (i = 0; i < dom->opp_count; i++) {
>> +		perf = &raw_stats[0]->perf_level[i];
>> +
>> +		transition_count += __le64_to_cpu(perf->usage_count);
>> +		stats->time_in_state[i] =
>> +				__le64_to_cpu(perf->total_residency_us);
>> +
>> +		/* Speed-up and initialize the frequencies only once. */
>> +		if (stats->freq_table[i] == 0) {
>> +			index = domain_stats->opp_map[i];
>> +			opp = &dom->opp[index];
>> +			stats->freq_table[i] = opp->perf * dom->mult_factor;
>> +		}
>> +	}
>> +
>> +	stats->total_trans = transition_count;
>> +
>> +	stats->last_index = __le16_to_cpu(raw_stats[0]->curr_perf_level_id);
>> +	stats->last_time = __le64_to_cpu(raw_stats[0]->ts_last_change_us);
>> +
>> +free_buf:
>> +	vfree(raw_stats[1]);
>> +	vfree(raw_stats[0]);
>> +
>> +	return ret;
>> +}
>> +
>>   static struct scmi_perf_ops perf_ops = {
>>   	.limits_set = scmi_perf_limits_set,
>>   	.limits_get = scmi_perf_limits_get,
>> @@ -760,6 +969,7 @@ static struct scmi_perf_ops perf_ops = {
>>   	.freq_get = scmi_dvfs_freq_get,
>>   	.est_power_get = scmi_dvfs_est_power_get,
>>   	.fast_switch_possible = scmi_fast_switch_possible,
>> +	.statistics_get = scmi_dvfs_stats_get,
>>   };
>>   
>>   static int scmi_perf_set_notify_enabled(const struct scmi_handle *handle,
>> diff --git a/include/linux/scmi_protocol.h b/include/linux/scmi_protocol.h
>> index 7e5dd7d1e221..3316ff4f9d34 100644
>> --- a/include/linux/scmi_protocol.h
>> +++ b/include/linux/scmi_protocol.h
>> @@ -55,6 +55,15 @@ struct scmi_clock_info {
>>   	};
>>   };
>>   
>> +struct scmi_perf_domain_stats {
>> +	unsigned long long last_time;
>> +	unsigned long long last_index;
>> +	unsigned int total_trans;
>> +	unsigned int state_num;
>> +	u64 *time_in_state;
> 
> ...got some recent negative feedback on mixing fixed-size fileds like u64 with
> generic like unsigned long/int etc...so maybe unsigned long long is better here
> since it is big enough; being this a time you could use ktime_t in other scenarios
> BUT I suppose here derives from the nice 64bit microseconds fields in shared mem
> stats so unsigned long long seems more clear (and ktime_t is signed and nanoseconds).

I followed the structure cpufreq_stats, which has these fields, but I
agree with you unsigned long long looks better here. I will change it.


Thank you for the review.

Regards,
Lukasz

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] CPUFreq statistics retrieved by drivers
  2020-07-31 15:56       ` Sudeep Holla
@ 2020-08-04 17:19         ` Florian Fainelli
  2020-08-05 12:36           ` Sudeep Holla
  0 siblings, 1 reply; 25+ messages in thread
From: Florian Fainelli @ 2020-08-04 17:19 UTC (permalink / raw)
  To: Sudeep Holla, Lukasz Luba
  Cc: Viresh Kumar, linux-kernel, linux-arm-kernel, linux-pm,
	cristian.marussi, rjw



On 7/31/2020 8:56 AM, Sudeep Holla wrote:
> On Thu, Jul 30, 2020 at 10:36:51AM +0100, Lukasz Luba wrote:
>>
>> In this case I think we would have to create debugfs.
>> Sudeep do you think these debugfs should be exposed from the protocol
>> layer:
>> drivers/firmware/arm_scmi/perf.c
> 
> I prefer above over cpufreq as we can support for all the devices not
> just cpus which avoids adding similar support elsewhere(mostly devfreq)
> 
>> or maybe from the cpufreq scmi driver? I would probably be safer to have
>> it in the cpufreq driver because we have scmi_handle there.
>>
> 
> Cristian was thinking if we can consolidate all such debugfs under one
> device may be and that should eliminate your handle restriction. I would
> like to see how that works out in implementation but I don't have any 
> better suggestion ATM.

debugfs is not enabled in production kernels, and especially not with
Android kernels, so sticking those in sysfs like the existing cpufreq
subsystem statistics may be a better choice.
-- 
Florian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] CPUFreq statistics retrieved by drivers
  2020-07-29 15:12 [PATCH 0/4] CPUFreq statistics retrieved by drivers Lukasz Luba
                   ` (4 preceding siblings ...)
  2020-07-30  8:53 ` [PATCH 0/4] CPUFreq statistics retrieved by drivers Viresh Kumar
@ 2020-08-04 17:27 ` Florian Fainelli
  2020-08-05 11:04   ` Lukasz Luba
  5 siblings, 1 reply; 25+ messages in thread
From: Florian Fainelli @ 2020-08-04 17:27 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel, linux-arm-kernel, linux-pm
  Cc: sudeep.holla, cristian.marussi, viresh.kumar, rjw



On 7/29/2020 8:12 AM, Lukasz Luba wrote:
> Hi all,
> 
> The existing CPUFreq framework does not tracks the statistics when the
> 'fast switch' is used or when firmware changes the frequency independently
> due to e.g. thermal reasons. However, the firmware might track the frequency
> changes and expose this to the kernel.

Or the firmware might have changed the CPU frequency in response to a
request from the secure world for instance.

> 
> This patch set aims to introduce CPUfreq statistics gathered by firmware
> and retrieved by CPUFreq driver. It would require a new API functions
> in the CPUFreq, which allows to poke drivers to get these stats.

From a debugging perspective, it would be helpful if the firmware
maintained statistics were exposed as a super-set of the Linux cpufreq
statistics and aggregated into them such that you could view the normal
world vs. secure world residency of a given frequency point. This would
help because a lot of times, Linux requests freq X, but the secure world
requires freq Y (with X >= Y) and people do not really understand why
the resulting power usage is higher for instance.

What are your thoughts on this?
-- 
Florian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] CPUFreq statistics retrieved by drivers
  2020-08-04 17:27 ` Florian Fainelli
@ 2020-08-05 11:04   ` Lukasz Luba
  2020-08-05 13:04     ` Viresh Kumar
  0 siblings, 1 reply; 25+ messages in thread
From: Lukasz Luba @ 2020-08-05 11:04 UTC (permalink / raw)
  To: Florian Fainelli, linux-kernel, linux-arm-kernel, linux-pm
  Cc: sudeep.holla, cristian.marussi, viresh.kumar, rjw



On 8/4/20 6:27 PM, Florian Fainelli wrote:
> 
> 
> On 7/29/2020 8:12 AM, Lukasz Luba wrote:
>> Hi all,
>>
>> The existing CPUFreq framework does not tracks the statistics when the
>> 'fast switch' is used or when firmware changes the frequency independently
>> due to e.g. thermal reasons. However, the firmware might track the frequency
>> changes and expose this to the kernel.
> 
> Or the firmware might have changed the CPU frequency in response to a
> request from the secure world for instance.

Possible

> 
>>
>> This patch set aims to introduce CPUfreq statistics gathered by firmware
>> and retrieved by CPUFreq driver. It would require a new API functions
>> in the CPUFreq, which allows to poke drivers to get these stats.
> 
>  From a debugging perspective, it would be helpful if the firmware
> maintained statistics were exposed as a super-set of the Linux cpufreq
> statistics and aggregated into them such that you could view the normal
> world vs. secure world residency of a given frequency point. This would
> help because a lot of times, Linux requests freq X, but the secure world
> requires freq Y (with X >= Y) and people do not really understand why
> the resulting power usage is higher for instance.
> 
> What are your thoughts on this?
> 

I know that Viresh is going to develop patches and improve these
cpufreq stats framework. Maybe he also had this 'aggregation' in mind.
I will leave it him.

Thank you for your feedback.

Regards,
Lukasz




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] CPUFreq statistics retrieved by drivers
  2020-08-04 17:19         ` Florian Fainelli
@ 2020-08-05 12:36           ` Sudeep Holla
  0 siblings, 0 replies; 25+ messages in thread
From: Sudeep Holla @ 2020-08-05 12:36 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Lukasz Luba, Viresh Kumar, linux-kernel, linux-arm-kernel,
	linux-pm, cristian.marussi, Sudeep Holla, rjw

On Tue, Aug 04, 2020 at 10:19:23AM -0700, Florian Fainelli wrote:
>
>
> On 7/31/2020 8:56 AM, Sudeep Holla wrote:
> > On Thu, Jul 30, 2020 at 10:36:51AM +0100, Lukasz Luba wrote:
> >>
> >> In this case I think we would have to create debugfs.
> >> Sudeep do you think these debugfs should be exposed from the protocol
> >> layer:
> >> drivers/firmware/arm_scmi/perf.c
> >
> > I prefer above over cpufreq as we can support for all the devices not
> > just cpus which avoids adding similar support elsewhere(mostly devfreq)
> >
> >> or maybe from the cpufreq scmi driver? I would probably be safer to have
> >> it in the cpufreq driver because we have scmi_handle there.
> >>
> >
> > Cristian was thinking if we can consolidate all such debugfs under one
> > device may be and that should eliminate your handle restriction. I would
> > like to see how that works out in implementation but I don't have any
> > better suggestion ATM.
>
> debugfs is not enabled in production kernels, and especially not with
> Android kernels, so sticking those in sysfs like the existing cpufreq
> subsystem statistics may be a better choice.

Fair enough. I was suggesting that only if we can't push this into existing
sysfs support. If we can, then we need not worry about it. If not, I don't
want a user ABI just for SCMI for this firmware stats, I would rather keep
it in debugfs for debug purposes. This will be useless once we start seeing
AMU in the hardware and hence I was pushing for debugfs.

--
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] CPUFreq statistics retrieved by drivers
  2020-08-05 11:04   ` Lukasz Luba
@ 2020-08-05 13:04     ` Viresh Kumar
  2020-08-05 16:03       ` Sudeep Holla
  0 siblings, 1 reply; 25+ messages in thread
From: Viresh Kumar @ 2020-08-05 13:04 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Florian Fainelli, linux-kernel, linux-arm-kernel, linux-pm,
	sudeep.holla, cristian.marussi, rjw

On 05-08-20, 12:04, Lukasz Luba wrote:
> I know that Viresh is going to develop patches and improve these
> cpufreq stats framework. Maybe he also had this 'aggregation' in mind.
> I will leave it him.

I am only going to look at cpufreq's view of stats independently from
the firmware.

-- 
viresh

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] CPUFreq statistics retrieved by drivers
  2020-08-05 13:04     ` Viresh Kumar
@ 2020-08-05 16:03       ` Sudeep Holla
  2020-08-05 17:33         ` Florian Fainelli
  0 siblings, 1 reply; 25+ messages in thread
From: Sudeep Holla @ 2020-08-05 16:03 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Lukasz Luba, Florian Fainelli, linux-kernel, linux-arm-kernel,
	linux-pm, cristian.marussi, rjw, Sudeep Holla

On Wed, Aug 05, 2020 at 06:34:36PM +0530, Viresh Kumar wrote:
> On 05-08-20, 12:04, Lukasz Luba wrote:
> > I know that Viresh is going to develop patches and improve these
> > cpufreq stats framework. Maybe he also had this 'aggregation' in mind.
> > I will leave it him.
>
> I am only going to look at cpufreq's view of stats independently from
> the firmware.
>

+1, I agree with that. Kernel must avoid any logic to aggregate or
interpret the data in a generic way. The userspace tools can manage that
especially if this tend to be platform specific.

--
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] CPUFreq statistics retrieved by drivers
  2020-08-05 16:03       ` Sudeep Holla
@ 2020-08-05 17:33         ` Florian Fainelli
  2020-08-06 13:37           ` Sudeep Holla
  0 siblings, 1 reply; 25+ messages in thread
From: Florian Fainelli @ 2020-08-05 17:33 UTC (permalink / raw)
  To: Sudeep Holla, Viresh Kumar
  Cc: Lukasz Luba, linux-kernel, linux-arm-kernel, linux-pm,
	cristian.marussi, rjw



On 8/5/2020 9:03 AM, Sudeep Holla wrote:
> On Wed, Aug 05, 2020 at 06:34:36PM +0530, Viresh Kumar wrote:
>> On 05-08-20, 12:04, Lukasz Luba wrote:
>>> I know that Viresh is going to develop patches and improve these
>>> cpufreq stats framework. Maybe he also had this 'aggregation' in mind.
>>> I will leave it him.
>>
>> I am only going to look at cpufreq's view of stats independently from
>> the firmware.
>>
> 
> +1, I agree with that. Kernel must avoid any logic to aggregate or
> interpret the data in a generic way. The userspace tools can manage that
> especially if this tend to be platform specific.

We can probably standardize on how to expose the firmware maintained
statistics such that these tools do not have to widely vary from
platform to platform, right?
-- 
Florian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] CPUFreq statistics retrieved by drivers
  2020-08-05 17:33         ` Florian Fainelli
@ 2020-08-06 13:37           ` Sudeep Holla
  0 siblings, 0 replies; 25+ messages in thread
From: Sudeep Holla @ 2020-08-06 13:37 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Viresh Kumar, Lukasz Luba, linux-kernel, linux-arm-kernel,
	linux-pm, cristian.marussi, rjw, Sudeep Holla

On Wed, Aug 05, 2020 at 10:33:02AM -0700, Florian Fainelli wrote:
>
>
> On 8/5/2020 9:03 AM, Sudeep Holla wrote:
> > On Wed, Aug 05, 2020 at 06:34:36PM +0530, Viresh Kumar wrote:
> >> On 05-08-20, 12:04, Lukasz Luba wrote:
> >>> I know that Viresh is going to develop patches and improve these
> >>> cpufreq stats framework. Maybe he also had this 'aggregation' in mind.
> >>> I will leave it him.
> >>
> >> I am only going to look at cpufreq's view of stats independently from
> >> the firmware.
> >>
> >
> > +1, I agree with that. Kernel must avoid any logic to aggregate or
> > interpret the data in a generic way. The userspace tools can manage that
> > especially if this tend to be platform specific.
>
> We can probably standardize on how to expose the firmware maintained
> statistics such that these tools do not have to widely vary from
> platform to platform, right?

Ofcourse. I just don't want any logic that interpret/analyse the stats
comparing with the in-kernel stats or otherwise.

--
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] CPUFreq statistics retrieved by drivers
  2020-08-04 10:44             ` Lukasz Luba
@ 2020-09-02  7:26               ` Viresh Kumar
  0 siblings, 0 replies; 25+ messages in thread
From: Viresh Kumar @ 2020-09-02  7:26 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Sudeep Holla, linux-kernel, linux-arm-kernel, linux-pm,
	cristian.marussi, rjw

On 04-08-20, 11:44, Lukasz Luba wrote:
> On 8/4/20 11:38 AM, Viresh Kumar wrote:
> > I don't think doing it with help of firmware is the right thing to do
> > here then. For another platform we may not have a firmware which can
> > help us, we need something in the opp core itself for that. Lemme see
> > if I can do something about it.
> 
> OK, great, I will wait then with this patch series v2 which would change
> into debugfs scmi only. Could you please add me on CC, I am very
> interested in.

Here is an attempt.

http://lore.kernel.org/lkml/cover.1599031227.git.viresh.kumar@linaro.org

-- 
viresh

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2020-09-02  7:26 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-29 15:12 [PATCH 0/4] CPUFreq statistics retrieved by drivers Lukasz Luba
2020-07-29 15:12 ` [PATCH 1/4] cpufreq: Add support for statistics read from drivers Lukasz Luba
2020-07-29 15:12 ` [PATCH 2/4] scmi: perf: Extend protocol to support performance statistics Lukasz Luba
2020-07-31  1:50   ` kernel test robot
2020-07-31 15:15   ` Cristian Marussi
2020-08-04 11:10     ` Lukasz Luba
2020-07-29 15:12 ` [PATCH 3/4] cpufreq: scmi: Move scmi_cpufreq_driver structure to the top Lukasz Luba
2020-07-29 15:12 ` [PATCH 4/4] cpufreq: scmi: Read statistics from FW shared memory Lukasz Luba
2020-07-30  8:53 ` [PATCH 0/4] CPUFreq statistics retrieved by drivers Viresh Kumar
2020-07-30  9:10   ` Sudeep Holla
2020-07-30  9:36     ` Lukasz Luba
2020-07-31 15:56       ` Sudeep Holla
2020-08-04 17:19         ` Florian Fainelli
2020-08-05 12:36           ` Sudeep Holla
2020-08-04  5:35       ` Viresh Kumar
2020-08-04 10:29         ` Lukasz Luba
2020-08-04 10:38           ` Viresh Kumar
2020-08-04 10:44             ` Lukasz Luba
2020-09-02  7:26               ` Viresh Kumar
2020-08-04 17:27 ` Florian Fainelli
2020-08-05 11:04   ` Lukasz Luba
2020-08-05 13:04     ` Viresh Kumar
2020-08-05 16:03       ` Sudeep Holla
2020-08-05 17:33         ` Florian Fainelli
2020-08-06 13:37           ` Sudeep Holla

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).