linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/5] cpufreq: powernv: Redesign the presentation of throttle notification and solve bug-fixes in the driver
@ 2016-01-22  7:19 Shilpasri G Bhat
  2016-01-22  7:19 ` [PATCH v6 1/5] cpufreq: powernv: Hot-plug safe the kworker thread Shilpasri G Bhat
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Shilpasri G Bhat @ 2016-01-22  7:19 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: rjw, viresh.kumar, linux-pm, pc, anton, ego, shreyas, Shilpasri G Bhat

In POWER8, OCC(On-Chip-Controller) can throttle the frequency of the
CPU when the chip crosses its thermal and power limits. Currently,
powernv-cpufreq driver detects and reports this event as a console
message. Some machines may not sustain the max turbo frequency in all
conditions and can be throttled frequently. This can lead to the
flooding of console with throttle messages. So this patchset aims to
redesign the presentation of this event via sysfs counters and
tracepoints. And it also fixes couple of bugs reported in the driver.

- Patch [1] fixes the cpu hot-plug bug in powernv_cpufreq_work_fn().
- Patch [2] solves a bug in powernv_cpufreq_throttle_check(), which
  calls in to cpu_to_chip_id() in hot path which reads DT every time
  to find the chip id.
- Patches [3] to [5] will add a perf trace point
  "power:powernv_throttle" and sysfs throttle counter stats in
  /sys/devices/system/cpu/cpufreq/chipN.

Changes from v5:
- Fix kbuild error:
drivers/cpufreq/powernv-cpufreq.c:428:2: error: implicit declaration of
function 'get_online_cpus' [-Werror=implicit-function-declaration]

Changes from v4:
- Fix a hot-plug bug in powernv_cpufreq_work_fn()
- Changes wrt Gautham's and Shreyas's comments 

Changes from v3:
- Add a fix to replace cpu_to_chip_id() with simpler PIR shift to 
  obtain the chip id.
- Break patch2 in to two patches separating the tracepoint and sysfs
  attribute changes.

Changes from v2:
- Fixed kbuild test warning.
drivers/cpufreq/powernv-cpufreq.c:609:2: warning: ignoring return
value of 'kstrtoint', declared with attribute warn_unused_result
[-Wunused-result]

Shilpasri G Bhat (5):
  cpufreq: powernv: Hot-plug safe the kworker thread
  cpufreq: powernv: Remove cpu_to_chip_id() from hot-path
  cpufreq: powernv/tracing: Add powernv_throttle tracepoint
  cpufreq: powernv: Replace pr_info with trace print for throttle event
  cpufreq: powernv: Add sysfs attributes to show throttle stats

 drivers/cpufreq/powernv-cpufreq.c | 313 +++++++++++++++++++++++++++++++-------
 include/trace/events/power.h      |  22 +++
 kernel/trace/power-traces.c       |   1 +
 3 files changed, 281 insertions(+), 55 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v6 1/5] cpufreq: powernv: Hot-plug safe the kworker thread
  2016-01-22  7:19 [PATCH v6 0/5] cpufreq: powernv: Redesign the presentation of throttle notification and solve bug-fixes in the driver Shilpasri G Bhat
@ 2016-01-22  7:19 ` Shilpasri G Bhat
  2016-01-25  5:49   ` Viresh Kumar
  2016-01-22  7:19 ` [PATCH v6 2/5] cpufreq: powernv: Remove cpu_to_chip_id() from hot-path Shilpasri G Bhat
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Shilpasri G Bhat @ 2016-01-22  7:19 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: rjw, viresh.kumar, linux-pm, pc, anton, ego, shreyas, Shilpasri G Bhat

In the kworker_thread powernv_cpufreq_work_fn(), we can end up
sending an IPI to a cpu going offline. This is a rare corner case
which is fixed using {get/put}_online_cpus(). Along with this fix,
this patch adds changes to do oneshot cpumask_{clear/and} operation.

Suggested-by: Shreyas B Prabhu <shreyas@linux.vnet.ibm.com>
Suggested-by: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
Changes form v5:
- Fix the kbuild-error:
drivers/cpufreq/powernv-cpufreq.c:428:2: error: implicit declaration of
function 'get_online_cpus' [-Werror=implicit-function-declaration

 drivers/cpufreq/powernv-cpufreq.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 547890f..140c75f 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -28,6 +28,7 @@
 #include <linux/of.h>
 #include <linux/reboot.h>
 #include <linux/slab.h>
+#include <linux/cpu.h>
 
 #include <asm/cputhreads.h>
 #include <asm/firmware.h>
@@ -423,18 +424,19 @@ void powernv_cpufreq_work_fn(struct work_struct *work)
 {
 	struct chip *chip = container_of(work, struct chip, throttle);
 	unsigned int cpu;
-	cpumask_var_t mask;
+	cpumask_t mask;
 
-	smp_call_function_any(&chip->mask,
+	get_online_cpus();
+	cpumask_and(&mask, &chip->mask, cpu_online_mask);
+	smp_call_function_any(&mask,
 			      powernv_cpufreq_throttle_check, NULL, 0);
 
 	if (!chip->restore)
-		return;
+		goto out;
 
 	chip->restore = false;
-	cpumask_copy(mask, &chip->mask);
-	for_each_cpu_and(cpu, mask, cpu_online_mask) {
-		int index, tcpu;
+	for_each_cpu(cpu, &mask) {
+		int index;
 		struct cpufreq_policy policy;
 
 		cpufreq_get_policy(&policy, cpu);
@@ -442,9 +444,10 @@ void powernv_cpufreq_work_fn(struct work_struct *work)
 					       policy.cur,
 					       CPUFREQ_RELATION_C, &index);
 		powernv_cpufreq_target_index(&policy, index);
-		for_each_cpu(tcpu, policy.cpus)
-			cpumask_clear_cpu(tcpu, mask);
+		cpumask_andnot(&mask, &mask, policy.cpus);
 	}
+out:
+	put_online_cpus();
 }
 
 static char throttle_reason[][30] = {
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v6 2/5] cpufreq: powernv: Remove cpu_to_chip_id() from hot-path
  2016-01-22  7:19 [PATCH v6 0/5] cpufreq: powernv: Redesign the presentation of throttle notification and solve bug-fixes in the driver Shilpasri G Bhat
  2016-01-22  7:19 ` [PATCH v6 1/5] cpufreq: powernv: Hot-plug safe the kworker thread Shilpasri G Bhat
@ 2016-01-22  7:19 ` Shilpasri G Bhat
  2016-01-23  8:59   ` Balbir Singh
  2016-01-25  5:53   ` Viresh Kumar
  2016-01-22  7:19 ` [PATCH v6 3/5] cpufreq: powernv/tracing: Add powernv_throttle tracepoint Shilpasri G Bhat
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 14+ messages in thread
From: Shilpasri G Bhat @ 2016-01-22  7:19 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: rjw, viresh.kumar, linux-pm, pc, anton, ego, shreyas, Shilpasri G Bhat

cpu_to_chip_id() does a DT walk through to find out the chip id by
taking a contended device tree lock. This adds an unnecessary overhead
in a hot path. So instead of calling cpu_to_chip_id() everytime cache
the chip ids for all cores in the array 'core_to_chip_map' and use it
in the hotpath.

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
No changes from v5.

Changes from v4:
- Taken care of Shreyas's comments to add a core_to_chip_map array to
  store the chip id.

 drivers/cpufreq/powernv-cpufreq.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 140c75f..6f186dc 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -43,6 +43,7 @@
 
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
 static bool rebooting, throttled, occ_reset;
+static unsigned int *core_to_chip_map;
 
 static struct chip {
 	unsigned int id;
@@ -313,13 +314,14 @@ static inline unsigned int get_nominal_index(void)
 static void powernv_cpufreq_throttle_check(void *data)
 {
 	unsigned int cpu = smp_processor_id();
+	unsigned int chip_id = core_to_chip_map[cpu_core_index_of_thread(cpu)];
 	unsigned long pmsr;
 	int pmsr_pmax, i;
 
 	pmsr = get_pmspr(SPRN_PMSR);
 
 	for (i = 0; i < nr_chips; i++)
-		if (chips[i].id == cpu_to_chip_id(cpu))
+		if (chips[i].id == chip_id)
 			break;
 
 	/* Check for Pmax Capping */
@@ -559,19 +561,29 @@ static int init_chip_info(void)
 	unsigned int chip[256];
 	unsigned int cpu, i;
 	unsigned int prev_chip_id = UINT_MAX;
+	cpumask_t cpu_mask;
+	int ret = -ENOMEM;
 
-	for_each_possible_cpu(cpu) {
+	cpumask_copy(&cpu_mask, cpu_possible_mask);
+	core_to_chip_map = kcalloc(cpu_nr_cores(), sizeof(unsigned int),
+				   GFP_KERNEL);
+	if (!core_to_chip_map)
+		goto out;
+
+	for_each_cpu(cpu, &cpu_mask) {
 		unsigned int id = cpu_to_chip_id(cpu);
 
 		if (prev_chip_id != id) {
 			prev_chip_id = id;
 			chip[nr_chips++] = id;
 		}
+		core_to_chip_map[cpu_core_index_of_thread(cpu)] = id;
+		cpumask_andnot(&cpu_mask, &cpu_mask, cpu_sibling_mask(cpu));
 	}
 
 	chips = kmalloc_array(nr_chips, sizeof(struct chip), GFP_KERNEL);
 	if (!chips)
-		return -ENOMEM;
+		goto free_chip_map;
 
 	for (i = 0; i < nr_chips; i++) {
 		chips[i].id = chip[i];
@@ -582,6 +594,10 @@ static int init_chip_info(void)
 	}
 
 	return 0;
+free_chip_map:
+	kfree(core_to_chip_map);
+out:
+	return ret;
 }
 
 static int __init powernv_cpufreq_init(void)
@@ -615,6 +631,8 @@ static void __exit powernv_cpufreq_exit(void)
 	unregister_reboot_notifier(&powernv_cpufreq_reboot_nb);
 	opal_message_notifier_unregister(OPAL_MSG_OCC,
 					 &powernv_cpufreq_opal_nb);
+	kfree(chips);
+	kfree(core_to_chip_map);
 	cpufreq_unregister_driver(&powernv_cpufreq_driver);
 }
 module_exit(powernv_cpufreq_exit);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v6 3/5] cpufreq: powernv/tracing: Add powernv_throttle tracepoint
  2016-01-22  7:19 [PATCH v6 0/5] cpufreq: powernv: Redesign the presentation of throttle notification and solve bug-fixes in the driver Shilpasri G Bhat
  2016-01-22  7:19 ` [PATCH v6 1/5] cpufreq: powernv: Hot-plug safe the kworker thread Shilpasri G Bhat
  2016-01-22  7:19 ` [PATCH v6 2/5] cpufreq: powernv: Remove cpu_to_chip_id() from hot-path Shilpasri G Bhat
@ 2016-01-22  7:19 ` Shilpasri G Bhat
  2016-01-22  7:19 ` [PATCH v6 4/5] cpufreq: powernv: Replace pr_info with trace print for throttle event Shilpasri G Bhat
  2016-01-22  7:19 ` [PATCH v6 5/5] cpufreq: powernv: Add sysfs attributes to show throttle stats Shilpasri G Bhat
  4 siblings, 0 replies; 14+ messages in thread
From: Shilpasri G Bhat @ 2016-01-22  7:19 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: rjw, viresh.kumar, linux-pm, pc, anton, ego, shreyas,
	Shilpasri G Bhat, Ingo Molnar, Steven Rostedt

This patch adds the powernv_throttle tracepoint to trace the CPU
frequency throttling event, which is used by the powernv-cpufreq
driver in POWER8.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Steven Rostedt <rostedt@goodmis.org>
---
No changes since v2.

 include/trace/events/power.h | 22 ++++++++++++++++++++++
 kernel/trace/power-traces.c  |  1 +
 2 files changed, 23 insertions(+)

diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 284244e..19e5030 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -38,6 +38,28 @@ DEFINE_EVENT(cpu, cpu_idle,
 	TP_ARGS(state, cpu_id)
 );
 
+TRACE_EVENT(powernv_throttle,
+
+	TP_PROTO(int chip_id, const char *reason, int pmax),
+
+	TP_ARGS(chip_id, reason, pmax),
+
+	TP_STRUCT__entry(
+		__field(int, chip_id)
+		__string(reason, reason)
+		__field(int, pmax)
+	),
+
+	TP_fast_assign(
+		__entry->chip_id = chip_id;
+		__assign_str(reason, reason);
+		__entry->pmax = pmax;
+	),
+
+	TP_printk("Chip %d Pmax %d %s", __entry->chip_id,
+		  __entry->pmax, __get_str(reason))
+);
+
 TRACE_EVENT(pstate_sample,
 
 	TP_PROTO(u32 core_busy,
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index eb4220a..81b8745 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -15,4 +15,5 @@
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(suspend_resume);
 EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_idle);
+EXPORT_TRACEPOINT_SYMBOL_GPL(powernv_throttle);
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v6 4/5] cpufreq: powernv: Replace pr_info with trace print for throttle event
  2016-01-22  7:19 [PATCH v6 0/5] cpufreq: powernv: Redesign the presentation of throttle notification and solve bug-fixes in the driver Shilpasri G Bhat
                   ` (2 preceding siblings ...)
  2016-01-22  7:19 ` [PATCH v6 3/5] cpufreq: powernv/tracing: Add powernv_throttle tracepoint Shilpasri G Bhat
@ 2016-01-22  7:19 ` Shilpasri G Bhat
  2016-01-25  5:58   ` Viresh Kumar
  2016-01-22  7:19 ` [PATCH v6 5/5] cpufreq: powernv: Add sysfs attributes to show throttle stats Shilpasri G Bhat
  4 siblings, 1 reply; 14+ messages in thread
From: Shilpasri G Bhat @ 2016-01-22  7:19 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: rjw, viresh.kumar, linux-pm, pc, anton, ego, shreyas, Shilpasri G Bhat

Currently we use printk message to notify the throttle event. But this
can flood the console if the cpu is throttled frequently. So replace the
printk with the tracepoint to notify the throttle event. And also events
like throttle below nominal frequency and OCC_RESET are reduced to
pr_warn/pr_warn_once as pointed by MFG to not mark them as critical
messages. This patch adds 'throt_reason' to struct chip to store the
throttle reason.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
No changes from v5.

Changes from v4:
- Taken care of Gautham's comments to remove the new function
  powernv_cpufreq_check_pmax()
- Modified commit message

Changes from v3:
- Separate this patch to contain trace_point changes
- Move struct chip member 'restore' of type bool above 'mask' to reduce
  structure padding.

No changes from v2.

Changes from v1:
- As suggested by Paul Clarke replaced char * throttle_reason[][30] by 
  const char * const throttle_reason[].

 drivers/cpufreq/powernv-cpufreq.c | 73 ++++++++++++++++++---------------------
 1 file changed, 34 insertions(+), 39 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 6f186dc..2d09274 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -29,6 +29,7 @@
 #include <linux/reboot.h>
 #include <linux/slab.h>
 #include <linux/cpu.h>
+#include <trace/events/power.h>
 
 #include <asm/cputhreads.h>
 #include <asm/firmware.h>
@@ -45,12 +46,22 @@ static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
 static bool rebooting, throttled, occ_reset;
 static unsigned int *core_to_chip_map;
 
+static const char * const throttle_reason[] = {
+	"No throttling",
+	"Power Cap",
+	"Processor Over Temperature",
+	"Power Supply Failure",
+	"Over Current",
+	"OCC Reset"
+};
+
 static struct chip {
 	unsigned int id;
 	bool throttled;
+	bool restore;
+	u8 throt_reason;
 	cpumask_t mask;
 	struct work_struct throttle;
-	bool restore;
 } *chips;
 
 static int nr_chips;
@@ -331,17 +342,17 @@ static void powernv_cpufreq_throttle_check(void *data)
 			goto next;
 		chips[i].throttled = true;
 		if (pmsr_pmax < powernv_pstate_info.nominal)
-			pr_crit("CPU %d on Chip %u has Pmax reduced below nominal frequency (%d < %d)\n",
-				cpu, chips[i].id, pmsr_pmax,
-				powernv_pstate_info.nominal);
-		else
-			pr_info("CPU %d on Chip %u has Pmax reduced below turbo frequency (%d < %d)\n",
-				cpu, chips[i].id, pmsr_pmax,
-				powernv_pstate_info.max);
+			pr_warn_once("CPU %d on Chip %u has Pmax reduced below nominal frequency (%d < %d)\n",
+				     cpu, chips[i].id, pmsr_pmax,
+				     powernv_pstate_info.nominal);
+		trace_powernv_throttle(chips[i].id,
+				       throttle_reason[chips[i].throt_reason],
+				       pmsr_pmax);
 	} else if (chips[i].throttled) {
 		chips[i].throttled = false;
-		pr_info("CPU %d on Chip %u has Pmax restored to %d\n", cpu,
-			chips[i].id, pmsr_pmax);
+		trace_powernv_throttle(chips[i].id,
+				       throttle_reason[chips[i].throt_reason],
+				       pmsr_pmax);
 	}
 
 	/* Check if Psafe_mode_active is set in PMSR. */
@@ -359,7 +370,7 @@ next:
 
 	if (throttled) {
 		pr_info("PMSR = %16lx\n", pmsr);
-		pr_crit("CPU Frequency could be throttled\n");
+		pr_warn("CPU Frequency could be throttled\n");
 	}
 }
 
@@ -452,15 +463,6 @@ out:
 	put_online_cpus();
 }
 
-static char throttle_reason[][30] = {
-					"No throttling",
-					"Power Cap",
-					"Processor Over Temperature",
-					"Power Supply Failure",
-					"Over Current",
-					"OCC Reset"
-				     };
-
 static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
 				   unsigned long msg_type, void *_msg)
 {
@@ -486,7 +488,7 @@ static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
 		 */
 		if (!throttled) {
 			throttled = true;
-			pr_crit("CPU frequency is throttled for duration\n");
+			pr_warn("CPU frequency is throttled for duration\n");
 		}
 
 		break;
@@ -510,23 +512,18 @@ static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
 			return 0;
 		}
 
-		if (omsg.throttle_status &&
+		for (i = 0; i < nr_chips; i++)
+			if (chips[i].id == omsg.chip)
+				break;
+
+		if (omsg.throttle_status >= 0 &&
 		    omsg.throttle_status <= OCC_MAX_THROTTLE_STATUS)
-			pr_info("OCC: Chip %u Pmax reduced due to %s\n",
-				(unsigned int)omsg.chip,
-				throttle_reason[omsg.throttle_status]);
-		else if (!omsg.throttle_status)
-			pr_info("OCC: Chip %u %s\n", (unsigned int)omsg.chip,
-				throttle_reason[omsg.throttle_status]);
-		else
-			return 0;
+			chips[i].throt_reason = omsg.throttle_status;
 
-		for (i = 0; i < nr_chips; i++)
-			if (chips[i].id == omsg.chip) {
-				if (!omsg.throttle_status)
-					chips[i].restore = true;
-				schedule_work(&chips[i].throttle);
-			}
+		if (!omsg.throttle_status)
+			chips[i].restore = true;
+
+		schedule_work(&chips[i].throttle);
 	}
 	return 0;
 }
@@ -581,16 +578,14 @@ static int init_chip_info(void)
 		cpumask_andnot(&cpu_mask, &cpu_mask, cpu_sibling_mask(cpu));
 	}
 
-	chips = kmalloc_array(nr_chips, sizeof(struct chip), GFP_KERNEL);
+	chips = kcalloc(nr_chips, sizeof(struct chip), GFP_KERNEL);
 	if (!chips)
 		goto free_chip_map;
 
 	for (i = 0; i < nr_chips; i++) {
 		chips[i].id = chip[i];
-		chips[i].throttled = false;
 		cpumask_copy(&chips[i].mask, cpumask_of_node(chip[i]));
 		INIT_WORK(&chips[i].throttle, powernv_cpufreq_work_fn);
-		chips[i].restore = false;
 	}
 
 	return 0;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v6 5/5] cpufreq: powernv: Add sysfs attributes to show throttle stats
  2016-01-22  7:19 [PATCH v6 0/5] cpufreq: powernv: Redesign the presentation of throttle notification and solve bug-fixes in the driver Shilpasri G Bhat
                   ` (3 preceding siblings ...)
  2016-01-22  7:19 ` [PATCH v6 4/5] cpufreq: powernv: Replace pr_info with trace print for throttle event Shilpasri G Bhat
@ 2016-01-22  7:19 ` Shilpasri G Bhat
  2016-01-23  8:40   ` Balbir Singh
  2016-01-25  5:43   ` [v6, " Michael Ellerman
  4 siblings, 2 replies; 14+ messages in thread
From: Shilpasri G Bhat @ 2016-01-22  7:19 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: rjw, viresh.kumar, linux-pm, pc, anton, ego, shreyas, Shilpasri G Bhat

Create sysfs attributes to export throttle information in
/sys/devices/system/cpu/cpufreq/chipN. The newly added sysfs files are as
follows:

1)/sys/devices/system/cpu/cpufreq/chip0/throttle_frequencies
  This gives the throttle stats for each of the available frequencies.
  The throttle stat of a frequency is the total number of times the max
  frequency is reduced to that frequency.
  # cat /sys/devices/system/cpu/cpufreq/chip0/throttle_frequencies
  4023000 0
  3990000 0
  3956000 1
  3923000 0
  3890000 0
  3857000 2
  3823000 0
  3790000 0
  3757000 2
  3724000 1
  3690000 1
  ...

2)/sys/devices/system/cpu/cpufreq/chip0/throttle_reasons
  This directory contains throttle reason files. Each file gives the
  total number of times the max frequency is throttled, except for
  'throttle_reset', which gives the total number of times the max
  frequency is unthrottled after being throttled.
  # cd /sys/devices/system/cpu/cpufreq/chip0/throttle_reasons
  # cat cpu_over_temperature
  7
  # cat occ_reset
  0
  # cat over_current
  0
  # cat power_cap
  0
  # cat power_supply_failure
  0
  # cat throttle_reset
  7

3)/sys/devices/system/cpu/cpufreq/chip0/throttle_stat
  This gives the total number of events of max frequency throttling to
  lower frequencies in the turbo range of frequencies and the sub-turbo(at
  and below nominal) range of frequencies.
  # cat /sys/devices/system/cpu/cpufreq/chip0/throttle_stat
  turbo 7
  sub-turbo 0

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
No changes from v5.

Changes from v4:
- Taken care of Gautham's comments to use inline get_chip_index()

Changes from v3:
- Seperate the patch to contain only the throttle sysfs attribute changes.
- Add helper inline function get_chip_index()

Changes from v2:
- Fixed kbuild test warning.
drivers/cpufreq/powernv-cpufreq.c:609:2: warning: ignoring return
value of 'kstrtoint', declared with attribute warn_unused_result
[-Wunused-result]

Changes from v1:
- Added a kobject to struct chip
- Grouped the throttle reasons under a separate attribute_group and
  exported each reason as individual file.
- Moved the sysfs files from /sys/devices/system/node/nodeN to
  /sys/devices/system/cpu/cpufreq/chipN
- As suggested by Paul Clarke replaced 'Nominal' with 'sub-turbo'.

 drivers/cpufreq/powernv-cpufreq.c | 205 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 196 insertions(+), 9 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 2d09274..7d65c82 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -55,6 +55,16 @@ static const char * const throttle_reason[] = {
 	"OCC Reset"
 };
 
+enum throt_reason_type {
+	NO_THROTTLE = 0,
+	POWERCAP,
+	CPU_OVERTEMP,
+	POWER_SUPPLY_FAILURE,
+	OVERCURRENT,
+	OCC_RESET_THROTTLE,
+	OCC_MAX_REASON
+};
+
 static struct chip {
 	unsigned int id;
 	bool throttled;
@@ -62,6 +72,11 @@ static struct chip {
 	u8 throt_reason;
 	cpumask_t mask;
 	struct work_struct throttle;
+	int throt_turbo;
+	int throt_nominal;
+	int reason[OCC_MAX_REASON];
+	int *pstate_stat;
+	struct kobject *kobj;
 } *chips;
 
 static int nr_chips;
@@ -196,6 +211,128 @@ static struct freq_attr *powernv_cpu_freq_attr[] = {
 	NULL,
 };
 
+static inline int get_chip_index(unsigned int id)
+{
+	int i;
+
+	for (i = 0; i < nr_chips; i++)
+		if (chips[i].id == id)
+			return i;
+
+	return -EINVAL;
+}
+
+static ssize_t throttle_freq_show(struct kobject *kobj,
+				  struct kobj_attribute *attr, char *buf)
+{
+	int i, count = 0, id;
+
+	i = kstrtoint(kobj->name + 4, 0, &id);
+	if (i)
+		return i;
+
+	id = get_chip_index(id);
+	if (id < 0) {
+		pr_warn_once("%s Matching chip-id not found\n", __func__);
+		return id;
+	}
+
+	for (i = 0; i < powernv_pstate_info.nr_pstates; i++)
+		count += sprintf(&buf[count], "%d %d\n",
+			       powernv_freqs[i].frequency,
+			       chips[id].pstate_stat[i]);
+
+	return count;
+}
+
+static struct kobj_attribute attr_throttle_frequencies =
+__ATTR(throttle_frequencies, 0444, throttle_freq_show, NULL);
+
+static ssize_t throttle_stat_show(struct kobject *kobj,
+				  struct kobj_attribute *attr, char *buf)
+{
+	int ret, id, count = 0;
+
+	ret = kstrtoint(kobj->name + 4, 0, &id);
+	if (ret)
+		return ret;
+
+	id = get_chip_index(id);
+	if (id < 0) {
+		pr_warn_once("%s Matching chip-id not found\n", __func__);
+		return id;
+	}
+
+	count += sprintf(&buf[count], "turbo %d\n", chips[id].throt_turbo);
+	count += sprintf(&buf[count], "sub-turbo %d\n",
+					chips[id].throt_nominal);
+
+	return count;
+}
+
+static struct kobj_attribute attr_throttle_stat =
+__ATTR(throttle_stat, 0444, throttle_stat_show, NULL);
+
+#define define_throttle_reason_attr(attr_name, val)			  \
+static ssize_t attr_name##_show(struct kobject *kobj,			  \
+				  struct kobj_attribute *attr, char *buf) \
+{									  \
+	int ret, id;							  \
+									  \
+	ret = kstrtoint(kobj->name + 4, 0, &id);			  \
+	if (ret)							  \
+		return ret;						  \
+									  \
+	id = get_chip_index(id);					  \
+	if (id < 0) {							  \
+		pr_warn_once("%s Matching chip-id not found\n", __func__);\
+		return id;						  \
+	}								  \
+									  \
+	return sprintf(buf, "%d\n", chips[id].reason[val]);		  \
+}									  \
+									  \
+static struct kobj_attribute attr_##attr_name =				  \
+__ATTR(attr_name, 0444, attr_name##_show, NULL)
+
+define_throttle_reason_attr(throttle_reset, NO_THROTTLE);
+define_throttle_reason_attr(power_cap, POWERCAP);
+define_throttle_reason_attr(cpu_over_temperature, CPU_OVERTEMP);
+define_throttle_reason_attr(power_supply_failure, POWER_SUPPLY_FAILURE);
+define_throttle_reason_attr(over_current, OVERCURRENT);
+define_throttle_reason_attr(occ_reset, OCC_RESET_THROTTLE);
+
+static struct attribute *throttle_reason_attrs[] = {
+	&attr_throttle_reset.attr,
+	&attr_power_cap.attr,
+	&attr_cpu_over_temperature.attr,
+	&attr_power_supply_failure.attr,
+	&attr_over_current.attr,
+	&attr_occ_reset.attr,
+	NULL
+};
+
+static struct attribute *throttle_stat_attrs[] = {
+	&attr_throttle_frequencies.attr,
+	&attr_throttle_stat.attr,
+	NULL
+};
+
+static const struct attribute_group throttle_reason_group = {
+	.name   = "throttle_reasons",
+	.attrs  = throttle_reason_attrs,
+};
+
+static const struct attribute_group throttle_stat_group = {
+	.attrs = throttle_stat_attrs,
+};
+
+static const struct attribute_group *throttle_attr_groups[] = {
+	&throttle_stat_group,
+	&throttle_reason_group,
+	NULL
+};
+
 /* Helper routines */
 
 /* Access helpers to power mgt SPR */
@@ -327,13 +464,15 @@ static void powernv_cpufreq_throttle_check(void *data)
 	unsigned int cpu = smp_processor_id();
 	unsigned int chip_id = core_to_chip_map[cpu_core_index_of_thread(cpu)];
 	unsigned long pmsr;
-	int pmsr_pmax, i;
+	int pmsr_pmax, i, index;
 
 	pmsr = get_pmspr(SPRN_PMSR);
 
-	for (i = 0; i < nr_chips; i++)
-		if (chips[i].id == chip_id)
-			break;
+	i = get_chip_index(chip_id);
+	if (unlikely(i < 0)) {
+		pr_warn_once("%s Matching chip-id not found\n", __func__);
+		return;
+	}
 
 	/* Check for Pmax Capping */
 	pmsr_pmax = (s8)PMSR_MAX(pmsr);
@@ -341,10 +480,19 @@ static void powernv_cpufreq_throttle_check(void *data)
 		if (chips[i].throttled)
 			goto next;
 		chips[i].throttled = true;
-		if (pmsr_pmax < powernv_pstate_info.nominal)
+		if (pmsr_pmax < powernv_pstate_info.nominal) {
 			pr_warn_once("CPU %d on Chip %u has Pmax reduced below nominal frequency (%d < %d)\n",
 				     cpu, chips[i].id, pmsr_pmax,
 				     powernv_pstate_info.nominal);
+			chips[i].throt_nominal++;
+		} else {
+			chips[i].throt_turbo++;
+		}
+
+		index  = powernv_pstate_info.max - pmsr_pmax;
+		if (index >= 0 && index < powernv_pstate_info.nr_pstates)
+			chips[i].pstate_stat[index]++;
+
 		trace_powernv_throttle(chips[i].id,
 				       throttle_reason[chips[i].throt_reason],
 				       pmsr_pmax);
@@ -512,13 +660,18 @@ static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
 			return 0;
 		}
 
-		for (i = 0; i < nr_chips; i++)
-			if (chips[i].id == omsg.chip)
-				break;
+		i = get_chip_index(omsg.chip);
+		if (i < 0) {
+			pr_warn_once("%s Matching chip-id not found\n",
+				     __func__);
+			return i;
+		}
 
 		if (omsg.throttle_status >= 0 &&
-		    omsg.throttle_status <= OCC_MAX_THROTTLE_STATUS)
+		    omsg.throttle_status <= OCC_MAX_THROTTLE_STATUS) {
 			chips[i].throt_reason = omsg.throttle_status;
+			chips[i].reason[omsg.throttle_status]++;
+		}
 
 		if (!omsg.throttle_status)
 			chips[i].restore = true;
@@ -583,12 +736,38 @@ static int init_chip_info(void)
 		goto free_chip_map;
 
 	for (i = 0; i < nr_chips; i++) {
+		char name[10];
+
 		chips[i].id = chip[i];
 		cpumask_copy(&chips[i].mask, cpumask_of_node(chip[i]));
 		INIT_WORK(&chips[i].throttle, powernv_cpufreq_work_fn);
+		chips[i].pstate_stat = kcalloc(powernv_pstate_info.nr_pstates,
+						sizeof(int), GFP_KERNEL);
+		if (!chips[i].pstate_stat)
+			goto free;
+
+		sprintf(name, "chip%d", chips[i].id);
+		chips[i].kobj = kobject_create_and_add(name,
+						       cpufreq_global_kobject);
+		if (!chips[i].kobj)
+			goto free;
+
+		ret = sysfs_create_groups(chips[i].kobj, throttle_attr_groups);
+		if (ret) {
+			pr_info("Chip %d failed to create throttle sysfs group\n",
+				chips[i].id);
+			goto free;
+		}
 	}
 
 	return 0;
+free:
+	nr_chips = i;
+	for (i = 0; i <= nr_chips; i++) {
+		kobject_put(chips[i].kobj);
+		kfree(chips[i].pstate_stat);
+	}
+	kfree(chips);
 free_chip_map:
 	kfree(core_to_chip_map);
 out:
@@ -623,9 +802,17 @@ module_init(powernv_cpufreq_init);
 
 static void __exit powernv_cpufreq_exit(void)
 {
+	int i;
+
 	unregister_reboot_notifier(&powernv_cpufreq_reboot_nb);
 	opal_message_notifier_unregister(OPAL_MSG_OCC,
 					 &powernv_cpufreq_opal_nb);
+
+	for (i = 0; i < nr_chips; i++) {
+		kobject_put(chips[i].kobj);
+		kfree(chips[i].pstate_stat);
+	}
+
 	kfree(chips);
 	kfree(core_to_chip_map);
 	cpufreq_unregister_driver(&powernv_cpufreq_driver);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 5/5] cpufreq: powernv: Add sysfs attributes to show throttle stats
  2016-01-22  7:19 ` [PATCH v6 5/5] cpufreq: powernv: Add sysfs attributes to show throttle stats Shilpasri G Bhat
@ 2016-01-23  8:40   ` Balbir Singh
  2016-01-25  6:46     ` Shilpasri G Bhat
  2016-01-25  5:43   ` [v6, " Michael Ellerman
  1 sibling, 1 reply; 14+ messages in thread
From: Balbir Singh @ 2016-01-23  8:40 UTC (permalink / raw)
  To: Shilpasri G Bhat
  Cc: linuxppc-dev, linux-kernel, rjw, viresh.kumar, linux-pm, pc,
	anton, ego, shreyas

On Fri, 22 Jan 2016 12:49:05 +0530
Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> wrote:

> Create sysfs attributes to export throttle information in
> /sys/devices/system/cpu/cpufreq/chipN. The newly added sysfs files are as
> follows:
> 
> 1)/sys/devices/system/cpu/cpufreq/chip0/throttle_frequencies
>   This gives the throttle stats for each of the available frequencies.
>   The throttle stat of a frequency is the total number of times the max
>   frequency is reduced to that frequency.
>   # cat /sys/devices/system/cpu/cpufreq/chip0/throttle_frequencies
>   4023000 0
>   3990000 0
>   3956000 1
>   3923000 0
>   3890000 0
>   3857000 2
>   3823000 0
>   3790000 0
>   3757000 2
>   3724000 1
>   3690000 1
>   ...
> 
> 2)/sys/devices/system/cpu/cpufreq/chip0/throttle_reasons
>   This directory contains throttle reason files. Each file gives the
>   total number of times the max frequency is throttled, except for
>   'throttle_reset', which gives the total number of times the max

Is reset a good name? Ideally a reset, reset's stats.

>   frequency is unthrottled after being throttled.
>   # cd /sys/devices/system/cpu/cpufreq/chip0/throttle_reasons
>   # cat cpu_over_temperature
>   7
>   # cat occ_reset
>   0
>   # cat over_current
>   0
>   # cat power_cap
>   0
>   # cat power_supply_failure
>   0
>   # cat throttle_reset
>   7
> 
> 3)/sys/devices/system/cpu/cpufreq/chip0/throttle_stat
>   This gives the total number of events of max frequency throttling to
>   lower frequencies in the turbo range of frequencies and the sub-turbo(at
>   and below nominal) range of frequencies.
>   # cat /sys/devices/system/cpu/cpufreq/chip0/throttle_stat
>   turbo 7
>   sub-turbo 0
> 
> Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
snip

>  
> +enum throt_reason_type {
> +	NO_THROTTLE = 0,
NO is not throttled or number_throttle?

> +	POWERCAP,
> +	CPU_OVERTEMP,
> +	POWER_SUPPLY_FAILURE,
> +	OVERCURRENT,
> +	OCC_RESET_THROTTLE,
> +	OCC_MAX_REASON
> +};
> +
>  static struct chip {
>  	unsigned int id;
>  	bool throttled;
> @@ -62,6 +72,11 @@ static struct chip {
>  	u8 throt_reason;
>  	cpumask_t mask;
>  	struct work_struct throttle;
> +	int throt_turbo;

The enum uses _THROTTLE so why can't the struct member
be throttle_nominal? throttle_turbo?

> +	int throt_nominal;
> +	int reason[OCC_MAX_REASON];
> +	int *pstate_stat;
> +	struct kobject *kobj;
>  } *chips;
>  
>  static int nr_chips;
> @@ -196,6 +211,128 @@ static struct freq_attr *powernv_cpu_freq_attr[] = {
>  	NULL,
>  };
>  
> +static inline int get_chip_index(unsigned int id)
> +{
> +	int i;
> +
> +	for (i = 0; i < nr_chips; i++)
> +		if (chips[i].id == id)
> +			return i;
> +
> +	return -EINVAL;
> +}
> +
> +static ssize_t throttle_freq_show(struct kobject *kobj,
> +				  struct kobj_attribute *attr, char *buf)
> +{
> +	int i, count = 0, id;
> +
> +	i = kstrtoint(kobj->name + 4, 0, &id);

Why the +4 magic, make it more readable?

> +	if (i)
> +		return i;
> +
> +	id = get_chip_index(id);
> +	if (id < 0) {
> +		pr_warn_once("%s Matching chip-id not found\n", __func__);

The pr_warn_once should also print which chip-id was not found, please
add that to the print

> +		return id;
> +	}
> +
> +	for (i = 0; i < powernv_pstate_info.nr_pstates; i++)
> +		count += sprintf(&buf[count], "%d %d\n",
> +			       powernv_freqs[i].frequency,
> +			       chips[id].pstate_stat[i]);
> +
> +	return count;
> +}
> +
> +static struct kobj_attribute attr_throttle_frequencies =
> +__ATTR(throttle_frequencies, 0444, throttle_freq_show, NULL);
> +
> +static ssize_t throttle_stat_show(struct kobject *kobj,
> +				  struct kobj_attribute *attr, char *buf)
> +{
> +	int ret, id, count = 0;
> +
> +	ret = kstrtoint(kobj->name + 4, 0, &id);
> +	if (ret)
> +		return ret;
> +
> +	id = get_chip_index(id);
> +	if (id < 0) {
> +		pr_warn_once("%s Matching chip-id not found\n", __func__);
> +		return id;
> +	}
> +

The above 9 lines look like common code, you can easily collapse it
instead of repeating it

> +	count += sprintf(&buf[count], "turbo %d\n", chips[id].throt_turbo);
> +	count += sprintf(&buf[count], "sub-turbo %d\n",
> +					chips[id].throt_nominal);
> +
> +	return count;
> +}
> +
> +static struct kobj_attribute attr_throttle_stat =
> +__ATTR(throttle_stat, 0444, throttle_stat_show, NULL);
> +
> +#define define_throttle_reason_attr(attr_name, val)			  \
> +static ssize_t attr_name##_show(struct kobject *kobj,			  \
> +				  struct kobj_attribute *attr, char *buf) \
> +{									  \
> +	int ret, id;							  \
> +									  \
> +	ret = kstrtoint(kobj->name + 4, 0, &id);			  \
> +	if (ret)							  \
> +		return ret;						  \
> +									  \
> +	id = get_chip_index(id);					  \
> +	if (id < 0) {							  \
> +		pr_warn_once("%s Matching chip-id not found\n", __func__);\
> +		return id;						  \
> +	}								  \
> +									  \
> +	return sprintf(buf, "%d\n", chips[id].reason[val]);		  \
> +}									  \
> +									  \
> +static struct kobj_attribute attr_##attr_name =				  \
> +__ATTR(attr_name, 0444, attr_name##_show, NULL)
> +
> +define_throttle_reason_attr(throttle_reset, NO_THROTTLE);
> +define_throttle_reason_attr(power_cap, POWERCAP);
> +define_throttle_reason_attr(cpu_over_temperature, CPU_OVERTEMP);
> +define_throttle_reason_attr(power_supply_failure, POWER_SUPPLY_FAILURE);
> +define_throttle_reason_attr(over_current, OVERCURRENT);
> +define_throttle_reason_attr(occ_reset, OCC_RESET_THROTTLE);
> +
> +static struct attribute *throttle_reason_attrs[] = {
> +	&attr_throttle_reset.attr,
> +	&attr_power_cap.attr,
> +	&attr_cpu_over_temperature.attr,
> +	&attr_power_supply_failure.attr,
> +	&attr_over_current.attr,
> +	&attr_occ_reset.attr,
> +	NULL
> +};
> +
> +static struct attribute *throttle_stat_attrs[] = {
> +	&attr_throttle_frequencies.attr,
> +	&attr_throttle_stat.attr,
> +	NULL
> +};
> +
> +static const struct attribute_group throttle_reason_group = {
> +	.name   = "throttle_reasons",
> +	.attrs  = throttle_reason_attrs,
> +};
> +
> +static const struct attribute_group throttle_stat_group = {
> +	.attrs = throttle_stat_attrs,
> +};
> +
> +static const struct attribute_group *throttle_attr_groups[] = {
> +	&throttle_stat_group,
> +	&throttle_reason_group,
> +	NULL
> +};
> +
>  /* Helper routines */
>  
>  /* Access helpers to power mgt SPR */
> @@ -327,13 +464,15 @@ static void powernv_cpufreq_throttle_check(void *data)
>  	unsigned int cpu = smp_processor_id();
>  	unsigned int chip_id = core_to_chip_map[cpu_core_index_of_thread(cpu)];
>  	unsigned long pmsr;
> -	int pmsr_pmax, i;
> +	int pmsr_pmax, i, index;
>  
>  	pmsr = get_pmspr(SPRN_PMSR);
>  
> -	for (i = 0; i < nr_chips; i++)
> -		if (chips[i].id == chip_id)
> -			break;
> +	i = get_chip_index(chip_id);
> +	if (unlikely(i < 0)) {
> +		pr_warn_once("%s Matching chip-id not found\n", __func__);
> +		return;
> +	}
>  
>  	/* Check for Pmax Capping */
>  	pmsr_pmax = (s8)PMSR_MAX(pmsr);
> @@ -341,10 +480,19 @@ static void powernv_cpufreq_throttle_check(void *data)
>  		if (chips[i].throttled)
>  			goto next;
>  		chips[i].throttled = true;
> -		if (pmsr_pmax < powernv_pstate_info.nominal)
> +		if (pmsr_pmax < powernv_pstate_info.nominal) {
>  			pr_warn_once("CPU %d on Chip %u has Pmax reduced below nominal frequency (%d < %d)\n",
>  				     cpu, chips[i].id, pmsr_pmax,
>  				     powernv_pstate_info.nominal);
> +			chips[i].throt_nominal++;
> +		} else {
> +			chips[i].throt_turbo++;
> +		}
> +
> +		index  = powernv_pstate_info.max - pmsr_pmax;
> +		if (index >= 0 && index < powernv_pstate_info.nr_pstates)
> +			chips[i].pstate_stat[index]++;
> +
>  		trace_powernv_throttle(chips[i].id,
>  				       throttle_reason[chips[i].throt_reason],
>  				       pmsr_pmax);
> @@ -512,13 +660,18 @@ static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
>  			return 0;
>  		}
>  
> -		for (i = 0; i < nr_chips; i++)
> -			if (chips[i].id == omsg.chip)
> -				break;
> +		i = get_chip_index(omsg.chip);
> +		if (i < 0) {
> +			pr_warn_once("%s Matching chip-id not found\n",
> +				     __func__);
> +			return i;
> +		}
>  
>  		if (omsg.throttle_status >= 0 &&
> -		    omsg.throttle_status <= OCC_MAX_THROTTLE_STATUS)
> +		    omsg.throttle_status <= OCC_MAX_THROTTLE_STATUS) {
>  			chips[i].throt_reason = omsg.throttle_status;
> +			chips[i].reason[omsg.throttle_status]++;
> +		}
>  
>  		if (!omsg.throttle_status)
>  			chips[i].restore = true;
> @@ -583,12 +736,38 @@ static int init_chip_info(void)
>  		goto free_chip_map;
>  
>  	for (i = 0; i < nr_chips; i++) {
> +		char name[10];
> +
>  		chips[i].id = chip[i];
>  		cpumask_copy(&chips[i].mask, cpumask_of_node(chip[i]));
>  		INIT_WORK(&chips[i].throttle, powernv_cpufreq_work_fn);
> +		chips[i].pstate_stat = kcalloc(powernv_pstate_info.nr_pstates,
> +						sizeof(int), GFP_KERNEL);
> +		if (!chips[i].pstate_stat)
> +			goto free;
> +
> +		sprintf(name, "chip%d", chips[i].id);
> +		chips[i].kobj = kobject_create_and_add(name,
> +						       cpufreq_global_kobject);
> +		if (!chips[i].kobj)
> +			goto free;
> +
> +		ret = sysfs_create_groups(chips[i].kobj, throttle_attr_groups);
> +		if (ret) {
> +			pr_info("Chip %d failed to create throttle sysfs group\n",
> +				chips[i].id);
> +			goto free;
> +		}
>  	}
>  
>  	return 0;
> +free:
> +	nr_chips = i;
> +	for (i = 0; i <= nr_chips; i++) {
> +		kobject_put(chips[i].kobj);
> +		kfree(chips[i].pstate_stat);
> +	}
> +	kfree(chips);
>  free_chip_map:
>  	kfree(core_to_chip_map);
>  out:
> @@ -623,9 +802,17 @@ module_init(powernv_cpufreq_init);
>  
>  static void __exit powernv_cpufreq_exit(void)
>  {
> +	int i;
> +
>  	unregister_reboot_notifier(&powernv_cpufreq_reboot_nb);
>  	opal_message_notifier_unregister(OPAL_MSG_OCC,
>  					 &powernv_cpufreq_opal_nb);
> +
> +	for (i = 0; i < nr_chips; i++) {
> +		kobject_put(chips[i].kobj);
> +		kfree(chips[i].pstate_stat);
> +	}
> +
>  	kfree(chips);
>  	kfree(core_to_chip_map);
>  	cpufreq_unregister_driver(&powernv_cpufreq_driver);

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 2/5] cpufreq: powernv: Remove cpu_to_chip_id() from hot-path
  2016-01-22  7:19 ` [PATCH v6 2/5] cpufreq: powernv: Remove cpu_to_chip_id() from hot-path Shilpasri G Bhat
@ 2016-01-23  8:59   ` Balbir Singh
  2016-01-25  5:38     ` Gautham R Shenoy
  2016-01-25  5:53   ` Viresh Kumar
  1 sibling, 1 reply; 14+ messages in thread
From: Balbir Singh @ 2016-01-23  8:59 UTC (permalink / raw)
  To: Shilpasri G Bhat
  Cc: linuxppc-dev, linux-kernel, rjw, viresh.kumar, linux-pm, pc,
	anton, ego, shreyas

On Fri, 22 Jan 2016 12:49:02 +0530
Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> wrote:

> cpu_to_chip_id() does a DT walk through to find out the chip id by
> taking a contended device tree lock. This adds an unnecessary overhead
> in a hot path. So instead of calling cpu_to_chip_id() everytime cache
> the chip ids for all cores in the array 'core_to_chip_map' and use it
> in the hotpath.
> 
> Reported-by: Anton Blanchard <anton@samba.org>
> Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

snip

Does the core_to_chip_map need to be updated/refreshed on/after/ a
cpu (core) hotplug? I presume id's don't change

Balbir

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 2/5] cpufreq: powernv: Remove cpu_to_chip_id() from hot-path
  2016-01-23  8:59   ` Balbir Singh
@ 2016-01-25  5:38     ` Gautham R Shenoy
  0 siblings, 0 replies; 14+ messages in thread
From: Gautham R Shenoy @ 2016-01-25  5:38 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Shilpasri G Bhat, linuxppc-dev, linux-kernel, rjw, viresh.kumar,
	linux-pm, pc, anton, ego, shreyas

Hello Balbir,

On Sat, Jan 23, 2016 at 07:59:20PM +1100, Balbir Singh wrote:
> On Fri, 22 Jan 2016 12:49:02 +0530
> Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> wrote:
> 
> > cpu_to_chip_id() does a DT walk through to find out the chip id by
> > taking a contended device tree lock. This adds an unnecessary overhead
> > in a hot path. So instead of calling cpu_to_chip_id() everytime cache
> > the chip ids for all cores in the array 'core_to_chip_map' and use it
> > in the hotpath.
> > 
> > Reported-by: Anton Blanchard <anton@samba.org>
> > Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
> > Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> 
> snip
> 
> Does the core_to_chip_map need to be updated/refreshed on/after/ a
> cpu (core) hotplug? I presume id's don't change

No, the id's don't change on cpu/core hotplug. The core_to_chip_map is
initialized in init_chip_info() where we use for_each_possible_cpu().

Thanks for the review!
> 
> Balbir
> 
--
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [v6, 5/5] cpufreq: powernv: Add sysfs attributes to show throttle stats
  2016-01-22  7:19 ` [PATCH v6 5/5] cpufreq: powernv: Add sysfs attributes to show throttle stats Shilpasri G Bhat
  2016-01-23  8:40   ` Balbir Singh
@ 2016-01-25  5:43   ` Michael Ellerman
  1 sibling, 0 replies; 14+ messages in thread
From: Michael Ellerman @ 2016-01-25  5:43 UTC (permalink / raw)
  To: Shilpasri G Bhat, linuxppc-dev, linux-kernel
  Cc: ego, linux-pm, viresh.kumar, rjw, pc, Shilpasri G Bhat, shreyas, anton

On Fri, 2016-22-01 at 07:19:05 UTC, Shilpasri G Bhat wrote:
> Create sysfs attributes to export throttle information in
> /sys/devices/system/cpu/cpufreq/chipN. The newly added sysfs files are as
> follows:
> 
> 1)/sys/devices/system/cpu/cpufreq/chip0/throttle_frequencies
..
> 2)/sys/devices/system/cpu/cpufreq/chip0/throttle_reasons
..
> 3)/sys/devices/system/cpu/cpufreq/chip0/throttle_stat
..

These should all be added to Documentation/ABI/testing/sysfs-devices-system-cpu.

cheers

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 1/5] cpufreq: powernv: Hot-plug safe the kworker thread
  2016-01-22  7:19 ` [PATCH v6 1/5] cpufreq: powernv: Hot-plug safe the kworker thread Shilpasri G Bhat
@ 2016-01-25  5:49   ` Viresh Kumar
  0 siblings, 0 replies; 14+ messages in thread
From: Viresh Kumar @ 2016-01-25  5:49 UTC (permalink / raw)
  To: Shilpasri G Bhat
  Cc: linuxppc-dev, linux-kernel, rjw, linux-pm, pc, anton, ego, shreyas

On 22-01-16, 12:49, Shilpasri G Bhat wrote:
> In the kworker_thread powernv_cpufreq_work_fn(), we can end up
> sending an IPI to a cpu going offline. This is a rare corner case
> which is fixed using {get/put}_online_cpus(). Along with this fix,
> this patch adds changes to do oneshot cpumask_{clear/and} operation.
> 
> Suggested-by: Shreyas B Prabhu <shreyas@linux.vnet.ibm.com>
> Suggested-by: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> ---
> Changes form v5:
> - Fix the kbuild-error:
> drivers/cpufreq/powernv-cpufreq.c:428:2: error: implicit declaration of
> function 'get_online_cpus' [-Werror=implicit-function-declaration
> 
>  drivers/cpufreq/powernv-cpufreq.c | 19 +++++++++++--------
>  1 file changed, 11 insertions(+), 8 deletions(-)

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

-- 
viresh

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 2/5] cpufreq: powernv: Remove cpu_to_chip_id() from hot-path
  2016-01-22  7:19 ` [PATCH v6 2/5] cpufreq: powernv: Remove cpu_to_chip_id() from hot-path Shilpasri G Bhat
  2016-01-23  8:59   ` Balbir Singh
@ 2016-01-25  5:53   ` Viresh Kumar
  1 sibling, 0 replies; 14+ messages in thread
From: Viresh Kumar @ 2016-01-25  5:53 UTC (permalink / raw)
  To: Shilpasri G Bhat
  Cc: linuxppc-dev, linux-kernel, rjw, linux-pm, pc, anton, ego, shreyas

On 22-01-16, 12:49, Shilpasri G Bhat wrote:
> cpu_to_chip_id() does a DT walk through to find out the chip id by
> taking a contended device tree lock. This adds an unnecessary overhead
> in a hot path. So instead of calling cpu_to_chip_id() everytime cache
> the chip ids for all cores in the array 'core_to_chip_map' and use it
> in the hotpath.
> 
> Reported-by: Anton Blanchard <anton@samba.org>
> Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> ---
> No changes from v5.
> 
> Changes from v4:
> - Taken care of Shreyas's comments to add a core_to_chip_map array to
>   store the chip id.
> 
>  drivers/cpufreq/powernv-cpufreq.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
> index 140c75f..6f186dc 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -43,6 +43,7 @@
>  
>  static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
>  static bool rebooting, throttled, occ_reset;
> +static unsigned int *core_to_chip_map;
>  
>  static struct chip {
>  	unsigned int id;
> @@ -313,13 +314,14 @@ static inline unsigned int get_nominal_index(void)
>  static void powernv_cpufreq_throttle_check(void *data)
>  {
>  	unsigned int cpu = smp_processor_id();
> +	unsigned int chip_id = core_to_chip_map[cpu_core_index_of_thread(cpu)];
>  	unsigned long pmsr;
>  	int pmsr_pmax, i;
>  
>  	pmsr = get_pmspr(SPRN_PMSR);
>  
>  	for (i = 0; i < nr_chips; i++)
> -		if (chips[i].id == cpu_to_chip_id(cpu))
> +		if (chips[i].id == chip_id)
>  			break;
>  
>  	/* Check for Pmax Capping */
> @@ -559,19 +561,29 @@ static int init_chip_info(void)
>  	unsigned int chip[256];
>  	unsigned int cpu, i;
>  	unsigned int prev_chip_id = UINT_MAX;
> +	cpumask_t cpu_mask;
> +	int ret = -ENOMEM;
>  
> -	for_each_possible_cpu(cpu) {
> +	cpumask_copy(&cpu_mask, cpu_possible_mask);

Shouldn't this copy be done after the following check, so that we
don't do that on failures ?

> +	core_to_chip_map = kcalloc(cpu_nr_cores(), sizeof(unsigned int),
> +				   GFP_KERNEL);
> +	if (!core_to_chip_map)
> +		goto out;
> +
> +	for_each_cpu(cpu, &cpu_mask) {
>  		unsigned int id = cpu_to_chip_id(cpu);
>  
>  		if (prev_chip_id != id) {
>  			prev_chip_id = id;
>  			chip[nr_chips++] = id;
>  		}
> +		core_to_chip_map[cpu_core_index_of_thread(cpu)] = id;
> +		cpumask_andnot(&cpu_mask, &cpu_mask, cpu_sibling_mask(cpu));
>  	}
>  
>  	chips = kmalloc_array(nr_chips, sizeof(struct chip), GFP_KERNEL);
>  	if (!chips)
> -		return -ENOMEM;
> +		goto free_chip_map;
>  
>  	for (i = 0; i < nr_chips; i++) {
>  		chips[i].id = chip[i];
> @@ -582,6 +594,10 @@ static int init_chip_info(void)
>  	}
>  
>  	return 0;
> +free_chip_map:
> +	kfree(core_to_chip_map);
> +out:
> +	return ret;
>  }
>  
>  static int __init powernv_cpufreq_init(void)
> @@ -615,6 +631,8 @@ static void __exit powernv_cpufreq_exit(void)
>  	unregister_reboot_notifier(&powernv_cpufreq_reboot_nb);
>  	opal_message_notifier_unregister(OPAL_MSG_OCC,
>  					 &powernv_cpufreq_opal_nb);
> +	kfree(chips);

Yeah, this is an important fix, but it shouldn't be part of this
patch, isn't it ?

> +	kfree(core_to_chip_map);
>  	cpufreq_unregister_driver(&powernv_cpufreq_driver);
>  }
>  module_exit(powernv_cpufreq_exit);
> -- 
> 1.9.3

-- 
viresh

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 4/5] cpufreq: powernv: Replace pr_info with trace print for throttle event
  2016-01-22  7:19 ` [PATCH v6 4/5] cpufreq: powernv: Replace pr_info with trace print for throttle event Shilpasri G Bhat
@ 2016-01-25  5:58   ` Viresh Kumar
  0 siblings, 0 replies; 14+ messages in thread
From: Viresh Kumar @ 2016-01-25  5:58 UTC (permalink / raw)
  To: Shilpasri G Bhat
  Cc: linuxppc-dev, linux-kernel, rjw, linux-pm, pc, anton, ego, shreyas

On 22-01-16, 12:49, Shilpasri G Bhat wrote:
> Currently we use printk message to notify the throttle event. But this
> can flood the console if the cpu is throttled frequently. So replace the
> printk with the tracepoint to notify the throttle event. And also events
> like throttle below nominal frequency and OCC_RESET are reduced to
> pr_warn/pr_warn_once as pointed by MFG to not mark them as critical
> messages. This patch adds 'throt_reason' to struct chip to store the
> throttle reason.
> 
> Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> ---
> No changes from v5.

You have done too many things in a single patch, its always better to
keep a patch for a special problem. Don't try to solve everything in
the same patch.

Though, I am NOT strongly saying that you respin this patch now.

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

-- 
viresh

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 5/5] cpufreq: powernv: Add sysfs attributes to show throttle stats
  2016-01-23  8:40   ` Balbir Singh
@ 2016-01-25  6:46     ` Shilpasri G Bhat
  0 siblings, 0 replies; 14+ messages in thread
From: Shilpasri G Bhat @ 2016-01-25  6:46 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linuxppc-dev, linux-kernel, rjw, viresh.kumar, linux-pm, pc,
	anton, ego, shreyas

Hi,

On 01/23/2016 02:10 PM, Balbir Singh wrote:
> On Fri, 22 Jan 2016 12:49:05 +0530
> Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> wrote:
> 
>> Create sysfs attributes to export throttle information in
>> /sys/devices/system/cpu/cpufreq/chipN. The newly added sysfs files are as
>> follows:
>>
>> 1)/sys/devices/system/cpu/cpufreq/chip0/throttle_frequencies
>>   This gives the throttle stats for each of the available frequencies.
>>   The throttle stat of a frequency is the total number of times the max
>>   frequency is reduced to that frequency.
>>   # cat /sys/devices/system/cpu/cpufreq/chip0/throttle_frequencies
>>   4023000 0
>>   3990000 0
>>   3956000 1
>>   3923000 0
>>   3890000 0
>>   3857000 2
>>   3823000 0
>>   3790000 0
>>   3757000 2
>>   3724000 1
>>   3690000 1
>>   ...
>>
>> 2)/sys/devices/system/cpu/cpufreq/chip0/throttle_reasons
>>   This directory contains throttle reason files. Each file gives the
>>   total number of times the max frequency is throttled, except for
>>   'throttle_reset', which gives the total number of times the max
> 
> Is reset a good name? Ideally a reset, reset's stats.

Is unthrottle_count a better name?

> 
>>   frequency is unthrottled after being throttled.
>>   # cd /sys/devices/system/cpu/cpufreq/chip0/throttle_reasons
>>   # cat cpu_over_temperature
>>   7
>>   # cat occ_reset
>>   0
>>   # cat over_current
>>   0
>>   # cat power_cap
>>   0
>>   # cat power_supply_failure
>>   0
>>   # cat throttle_reset
>>   7
>>
>> 3)/sys/devices/system/cpu/cpufreq/chip0/throttle_stat
>>   This gives the total number of events of max frequency throttling to
>>   lower frequencies in the turbo range of frequencies and the sub-turbo(at
>>   and below nominal) range of frequencies.
>>   # cat /sys/devices/system/cpu/cpufreq/chip0/throttle_stat
>>   turbo 7
>>   sub-turbo 0
>>
>> Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
>> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> snip
> 
>>  
>> +enum throt_reason_type {
>> +	NO_THROTTLE = 0,
> NO is not throttled or number_throttle?

It stands for not throttled.

> 
>> +	POWERCAP,
>> +	CPU_OVERTEMP,
>> +	POWER_SUPPLY_FAILURE,
>> +	OVERCURRENT,
>> +	OCC_RESET_THROTTLE,
>> +	OCC_MAX_REASON
>> +};
>> +
>>  static struct chip {
>>  	unsigned int id;
>>  	bool throttled;
>> @@ -62,6 +72,11 @@ static struct chip {
>>  	u8 throt_reason;
>>  	cpumask_t mask;
>>  	struct work_struct throttle;
>> +	int throt_turbo;
> 
> The enum uses _THROTTLE so why can't the struct member
> be throttle_nominal? throttle_turbo?

Okay will change the struct members to throttle_*

> 
>> +	int throt_nominal;
>> +	int reason[OCC_MAX_REASON];
>> +	int *pstate_stat;
>> +	struct kobject *kobj;
>>  } *chips;
>>  
>>  static int nr_chips;
>> @@ -196,6 +211,128 @@ static struct freq_attr *powernv_cpu_freq_attr[] = {
>>  	NULL,
>>  };
>>  
>> +static inline int get_chip_index(unsigned int id)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < nr_chips; i++)
>> +		if (chips[i].id == id)
>> +			return i;
>> +
>> +	return -EINVAL;
>> +}
>> +
>> +static ssize_t throttle_freq_show(struct kobject *kobj,
>> +				  struct kobj_attribute *attr, char *buf)
>> +{
>> +	int i, count = 0, id;
>> +
>> +	i = kstrtoint(kobj->name + 4, 0, &id);
> 
> Why the +4 magic, make it more readable?

Okay. Will do.

> 
>> +	if (i)
>> +		return i;
>> +
>> +	id = get_chip_index(id);
>> +	if (id < 0) {
>> +		pr_warn_once("%s Matching chip-id not found\n", __func__);
> 
> The pr_warn_once should also print which chip-id was not found, please
> add that to the print

Okay will do.

> 
>> +		return id;
>> +	}
>> +
>> +	for (i = 0; i < powernv_pstate_info.nr_pstates; i++)
>> +		count += sprintf(&buf[count], "%d %d\n",
>> +			       powernv_freqs[i].frequency,
>> +			       chips[id].pstate_stat[i]);
>> +
>> +	return count;
>> +}
>> +
>> +static struct kobj_attribute attr_throttle_frequencies =
>> +__ATTR(throttle_frequencies, 0444, throttle_freq_show, NULL);
>> +
>> +static ssize_t throttle_stat_show(struct kobject *kobj,
>> +				  struct kobj_attribute *attr, char *buf)
>> +{
>> +	int ret, id, count = 0;
>> +
>> +	ret = kstrtoint(kobj->name + 4, 0, &id);
>> +	if (ret)
>> +		return ret;
>> +
>> +	id = get_chip_index(id);
>> +	if (id < 0) {
>> +		pr_warn_once("%s Matching chip-id not found\n", __func__);
>> +		return id;
>> +	}
>> +
> 
> The above 9 lines look like common code, you can easily collapse it
> instead of repeating it

okay will do.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2016-01-25  6:47 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-22  7:19 [PATCH v6 0/5] cpufreq: powernv: Redesign the presentation of throttle notification and solve bug-fixes in the driver Shilpasri G Bhat
2016-01-22  7:19 ` [PATCH v6 1/5] cpufreq: powernv: Hot-plug safe the kworker thread Shilpasri G Bhat
2016-01-25  5:49   ` Viresh Kumar
2016-01-22  7:19 ` [PATCH v6 2/5] cpufreq: powernv: Remove cpu_to_chip_id() from hot-path Shilpasri G Bhat
2016-01-23  8:59   ` Balbir Singh
2016-01-25  5:38     ` Gautham R Shenoy
2016-01-25  5:53   ` Viresh Kumar
2016-01-22  7:19 ` [PATCH v6 3/5] cpufreq: powernv/tracing: Add powernv_throttle tracepoint Shilpasri G Bhat
2016-01-22  7:19 ` [PATCH v6 4/5] cpufreq: powernv: Replace pr_info with trace print for throttle event Shilpasri G Bhat
2016-01-25  5:58   ` Viresh Kumar
2016-01-22  7:19 ` [PATCH v6 5/5] cpufreq: powernv: Add sysfs attributes to show throttle stats Shilpasri G Bhat
2016-01-23  8:40   ` Balbir Singh
2016-01-25  6:46     ` Shilpasri G Bhat
2016-01-25  5:43   ` [v6, " Michael Ellerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).