All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] powernv: cpufreq: Report frequency throttle by OCC
@ 2015-04-28  6:23 Shilpasri G Bhat
  2015-04-28  6:23 ` [PATCH v2 1/2] powerpc/powernv: Add definition of OPAL_MSG_OCC message type Shilpasri G Bhat
  2015-04-28  6:23   ` Shilpasri G Bhat
  0 siblings, 2 replies; 12+ messages in thread
From: Shilpasri G Bhat @ 2015-04-28  6:23 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: Shilpasri G Bhat

This patchset intends to add frequency throttle reporting mechanism
to powernv-cpufreq driver when OCC throttles the frequency. OCC is an
On-Chip-Controller which takes care of the power and thermal safety of
the chip. The CPU frequency can be throttled during an OCC reset or
when OCC tries to limit the max allowed frequency. The patchset will
report such conditions so as to keep the user informed about reason
for the drop in performance of workloads when frequency is throttled.

Shilpasri G Bhat (2):
  powerpc/powernv: Add definition of OPAL_MSG_OCC message type
  cpufreq: powernv: Register for OCC related opal_message notification

 arch/powerpc/include/asm/opal-api.h |   8 ++
 drivers/cpufreq/powernv-cpufreq.c   | 181 +++++++++++++++++++++++++++++++++---
 2 files changed, 174 insertions(+), 15 deletions(-)

-- 
1.9.3


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/2] powerpc/powernv: Add definition of OPAL_MSG_OCC message type
  2015-04-28  6:23 [PATCH v2 0/2] powernv: cpufreq: Report frequency throttle by OCC Shilpasri G Bhat
@ 2015-04-28  6:23 ` Shilpasri G Bhat
  2015-04-28  6:23   ` Shilpasri G Bhat
  1 sibling, 0 replies; 12+ messages in thread
From: Shilpasri G Bhat @ 2015-04-28  6:23 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: Shilpasri G Bhat

Add OPAL_MSG_OCC message definition to opal_message_type to receive
OCC events like reset, load and throttled. Host performance can be
affected when OCC is reset or OCC throttles the max Pstate.
We can register to opal_message_notifier to receive OPAL_MSG_OCC type
of message and report it to the userspace so as to keep the user
informed about the reason for a performance drop in workloads.

The reset and load OCC events are notified to kernel when FSP sends
OCC_RESET and OCC_LOAD commands.  Both reset and load messages are
sent to kernel on successful completion of reset and load operation
respectively.

The throttle OCC event indicates that the Pmax of the chip is reduced.
The chip_id and throttle reason for reducing Pmax is also queued along
with the message.

Additional opal message type OPAL_MSG_PRD is added to maintain
compatibility between opal and kernel definition of opal_message_type.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---
Changes from v1:
- Update the commit changelog

 arch/powerpc/include/asm/opal-api.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 0321a90..50053b7 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -352,6 +352,14 @@ enum opal_msg_type {
 	OPAL_MSG_SHUTDOWN,		/* params[0] = 1 reboot, 0 shutdown */
 	OPAL_MSG_HMI_EVT,
 	OPAL_MSG_DPO,
+	OPAL_MSG_PRD,
+	OPAL_MSG_OCC,                   /*
+					 * params[0] = 0 reset,
+					 *             1 load,
+					 *             2 throttle
+					 * params[1] = chip_id
+					 * params[2] = throttle_status
+					 */
 	OPAL_MSG_TYPE_MAX,
 };
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/2] cpufreq: powernv: Register for OCC related opal_message notification
  2015-04-28  6:23 [PATCH v2 0/2] powernv: cpufreq: Report frequency throttle by OCC Shilpasri G Bhat
@ 2015-04-28  6:23   ` Shilpasri G Bhat
  2015-04-28  6:23   ` Shilpasri G Bhat
  1 sibling, 0 replies; 12+ messages in thread
From: Shilpasri G Bhat @ 2015-04-28  6:23 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: Shilpasri G Bhat, Rafael J. Wysocki, Viresh Kumar,
	Preeti U Murthy, linux-pm

OCC is an On-Chip-Controller which takes care of power and thermal
safety of the chip. During runtime due to power failure or
overtemperature the OCC may throttle the frequencies of the CPUs to
remain within the power budget.

We want the cpufreq driver to be aware of such situations to be able
to report it to the user. We register to opal_message_notifier to
receive OCC messages from opal.

powernv_cpufreq_throttle_check() reports any frequency throttling and
this patch will report the reason or event that caused throttling. We
can be throttled if OCC is reset or OCC limits Pmax due to power or
thermal reasons. We are also notified of unthrottling after an OCC
reset or if OCC restores Pmax on the chip.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
CC: "Rafael J. Wysocki" <rjw@rjwysocki.net>
CC: Viresh Kumar <viresh.kumar@linaro.org>
CC: Preeti U Murthy <preeti@linux.vnet.ibm.com>
CC: linux-pm@vger.kernel.org
---
Changes from v1:
- Add macros to define OCC_RESET, OCC_LOAD and OCC_THROTTLE
- Define a structure to store chip id, chip mask which has bits set
  for cpus present in the chip, throttled state and a work_struct.
- Modify powernv_cpufreq_throttle_check() to be called via smp_call()
- On Pmax throttling/unthrottling update 'chip.throttled' and not the
  global 'throttled' as Pmax capping is local to the chip.
- Remove the condition which checks if local pstate is less than Pmin
  while checking for Psafe frequency. When OCC becomes active after
  reset we update 'thottled' to false and when the cpufreq governor
  initiates a pstate change, the local pstate will be in Psafe and we
  will be reporting a false positive when we are not throttled.
- Schedule a kworker on receiving throttling/unthrottling OCC message
  for that chip and schedule on all chips after receiving active.
- After an OCC reset all the cpus will be in Psafe frequency. So call
  target() and restore the frequency to policy->cur after OCC_ACTIVE
  and Pmax unthrottling
- Taken care of Viresh and Preeti's comments.

 drivers/cpufreq/powernv-cpufreq.c | 181 ++++++++++++++++++++++++++++++++++----
 1 file changed, 166 insertions(+), 15 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index ebef0d8..b356c9d 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -27,20 +27,33 @@
 #include <linux/smp.h>
 #include <linux/of.h>
 #include <linux/reboot.h>
+#include <linux/slab.h>
 
 #include <asm/cputhreads.h>
 #include <asm/firmware.h>
 #include <asm/reg.h>
 #include <asm/smp.h> /* Required for cpu_sibling_mask() in UP configs */
+#include <asm/opal.h>
 
 #define POWERNV_MAX_PSTATES	256
 #define PMSR_PSAFE_ENABLE	(1UL << 30)
 #define PMSR_SPR_EM_DISABLE	(1UL << 31)
 #define PMSR_MAX(x)		((x >> 32) & 0xFF)
-#define PMSR_LP(x)		((x >> 48) & 0xFF)
+#define OCC_RESET		0
+#define OCC_LOAD		1
+#define OCC_THROTTLE		2
 
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
-static bool rebooting, throttled;
+static bool rebooting, throttled, occ_reset;
+
+static struct chip {
+	int id;
+	bool throttled;
+	cpumask_t mask;
+	struct work_struct throttle;
+} *chips;
+
+static int nr_chips;
 
 /*
  * Note: The set of pstates consists of contiguous integers, the
@@ -298,28 +311,33 @@ static inline unsigned int get_nominal_index(void)
 	return powernv_pstate_info.max - powernv_pstate_info.nominal;
 }
 
-static void powernv_cpufreq_throttle_check(unsigned int cpu)
+static void powernv_cpufreq_throttle_check(void *data)
 {
+	unsigned int cpu = smp_processor_id();
 	unsigned long pmsr;
-	int pmsr_pmax, pmsr_lp;
+	int pmsr_pmax, i;
 
 	pmsr = get_pmspr(SPRN_PMSR);
 
+	for (i = 0; i < nr_chips; i++)
+		if (chips[i].id == cpu_to_chip_id(cpu))
+			break;
+
 	/* Check for Pmax Capping */
 	pmsr_pmax = (s8)PMSR_MAX(pmsr);
 	if (pmsr_pmax != powernv_pstate_info.max) {
-		throttled = true;
-		pr_info("CPU %d Pmax is reduced to %d\n", cpu, pmsr_pmax);
-		pr_info("Max allowed Pstate is capped\n");
+		if (chips[i].throttled)
+			goto next;
+		chips[i].throttled = true;
+		pr_info("CPU %d on chip %d Pmax is reduced to %d\n", cpu,
+			chips[i].id, pmsr_pmax);
+	} else {
+		chips[i].throttled = false;
 	}
 
-	/*
-	 * Check for Psafe by reading LocalPstate
-	 * or check if Psafe_mode_active is set in PMSR.
-	 */
-	pmsr_lp = (s8)PMSR_LP(pmsr);
-	if ((pmsr_lp < powernv_pstate_info.min) ||
-				(pmsr & PMSR_PSAFE_ENABLE)) {
+	/* Check if Psafe_mode_active is set in PMSR. */
+next:
+	if (pmsr & PMSR_PSAFE_ENABLE) {
 		throttled = true;
 		pr_info("Pstate set to safe frequency\n");
 	}
@@ -350,7 +368,7 @@ static int powernv_cpufreq_target_index(struct cpufreq_policy *policy,
 		return 0;
 
 	if (!throttled)
-		powernv_cpufreq_throttle_check(smp_processor_id());
+		powernv_cpufreq_throttle_check(NULL);
 
 	freq_data.pstate_id = powernv_freqs[new_index].driver_data;
 
@@ -395,6 +413,104 @@ static struct notifier_block powernv_cpufreq_reboot_nb = {
 	.notifier_call = powernv_cpufreq_reboot_notifier,
 };
 
+void powernv_cpufreq_work_fn(struct work_struct *work)
+{
+	struct chip *c = container_of(work, struct chip, throttle);
+	unsigned int cpu;
+
+	smp_call_function_any(&c->mask,
+			      powernv_cpufreq_throttle_check, NULL, 0);
+
+	for_each_cpu(cpu, &c->mask) {
+		int index;
+		struct cpufreq_frequency_table *freq_table;
+		struct cpufreq_policy cpu_policy;
+
+		if (!cpu_online(cpu))
+			continue;
+
+		cpufreq_get_policy(&cpu_policy, cpu);
+		freq_table = cpufreq_frequency_get_table(cpu_policy.cpu);
+		cpufreq_frequency_table_target(&cpu_policy, freq_table,
+					       cpu_policy.cur,
+					       CPUFREQ_RELATION_C, &index);
+		powernv_cpufreq_target_index(&cpu_policy, index);
+	}
+}
+
+static char throttle_reason[][30] = {
+					"No throttling",
+					"Power Cap",
+					"Processor Over Temperature",
+					"Power Supply Failure",
+					"Over Current",
+					"OCC Reset"
+				     };
+
+static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
+				   unsigned long msg_type, void *msg)
+{
+	struct opal_msg *occ_msg = msg;
+	uint64_t token;
+	uint64_t chip_id, reason;
+	int i;
+
+	if (msg_type != OPAL_MSG_OCC)
+		return 0;
+
+	token = be64_to_cpu(occ_msg->params[0]);
+
+	switch (token) {
+	case OCC_RESET:
+		occ_reset = true;
+		/*
+		 * powernv_cpufreq_throttle_check() is called in
+		 * target() callback which can detect the throttle state
+		 * for governors like ondemand.
+		 * But static governors will not call target() often thus
+		 * report throttling here.
+		 */
+		if (!throttled) {
+			throttled = true;
+			pr_crit("CPU Frequency is throttled\n");
+		}
+		pr_info("OCC: Reset\n");
+		break;
+	case OCC_LOAD:
+		pr_info("OCC: Loaded\n");
+		break;
+	case OCC_THROTTLE:
+		chip_id = be64_to_cpu(occ_msg->params[1]);
+		reason = be64_to_cpu(occ_msg->params[2]);
+
+		if (occ_reset) {
+			occ_reset = false;
+			throttled = false;
+			pr_info("OCC: Active\n");
+			for (i = 0; i < nr_chips; i++)
+				schedule_work(&chips[i].throttle);
+			return 0;
+		}
+
+		if (reason && reason <= 5)
+			pr_info("OCC: Chip %d Pmax reduced due to %s\n",
+				(int)chip_id, throttle_reason[reason]);
+		else
+			pr_info("OCC: Chip %d %s\n", (int)chip_id,
+				throttle_reason[reason]);
+		for (i = 0; i < nr_chips; i++)
+			if (chips[i].id == (int)chip_id)
+				schedule_work(&chips[i].throttle);
+	}
+	return 0;
+}
+
+static struct notifier_block powernv_cpufreq_opal_nb = {
+	.notifier_call	= powernv_cpufreq_occ_msg,
+	.next		= NULL,
+	.priority	= 0,
+};
+
 static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
 {
 	struct powernv_smp_call_data freq_data;
@@ -414,6 +530,35 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
 	.attr		= powernv_cpu_freq_attr,
 };
 
+static int init_chip_info(void)
+{
+	int chip[256], i = 0, cpu;
+	int prev_chip_id = INT_MAX;
+
+	for_each_possible_cpu(cpu) {
+		int c = cpu_to_chip_id(cpu);
+
+		if (prev_chip_id != c) {
+			prev_chip_id = c;
+			chip[nr_chips++] = c;
+		}
+	}
+
+	chips = kmalloc_array(nr_chips, sizeof(struct chip), GFP_KERNEL);
+
+	if (!chips)
+		return -ENOMEM;
+
+	for (i = 0; i < nr_chips; i++) {
+		chips[i].id = chip[i];
+		cpumask_copy(&chips[i].mask, cpumask_of_node(chip[i]));
+		chips[i].throttled = false;
+		INIT_WORK(&chips[i].throttle, powernv_cpufreq_work_fn);
+	}
+
+	return 0;
+}
+
 static int __init powernv_cpufreq_init(void)
 {
 	int rc = 0;
@@ -429,7 +574,13 @@ static int __init powernv_cpufreq_init(void)
 		return rc;
 	}
 
+	/* Populate chip info */
+	rc = init_chip_info();
+	if (rc)
+		return rc;
+
 	register_reboot_notifier(&powernv_cpufreq_reboot_nb);
+	opal_message_notifier_register(OPAL_MSG_OCC, &powernv_cpufreq_opal_nb);
 	return cpufreq_register_driver(&powernv_cpufreq_driver);
 }
 module_init(powernv_cpufreq_init);
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/2] cpufreq: powernv: Register for OCC related opal_message notification
@ 2015-04-28  6:23   ` Shilpasri G Bhat
  0 siblings, 0 replies; 12+ messages in thread
From: Shilpasri G Bhat @ 2015-04-28  6:23 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: Preeti U Murthy, Viresh Kumar, Rafael J. Wysocki,
	Shilpasri G Bhat, linux-pm

OCC is an On-Chip-Controller which takes care of power and thermal
safety of the chip. During runtime due to power failure or
overtemperature the OCC may throttle the frequencies of the CPUs to
remain within the power budget.

We want the cpufreq driver to be aware of such situations to be able
to report it to the user. We register to opal_message_notifier to
receive OCC messages from opal.

powernv_cpufreq_throttle_check() reports any frequency throttling and
this patch will report the reason or event that caused throttling. We
can be throttled if OCC is reset or OCC limits Pmax due to power or
thermal reasons. We are also notified of unthrottling after an OCC
reset or if OCC restores Pmax on the chip.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
CC: "Rafael J. Wysocki" <rjw@rjwysocki.net>
CC: Viresh Kumar <viresh.kumar@linaro.org>
CC: Preeti U Murthy <preeti@linux.vnet.ibm.com>
CC: linux-pm@vger.kernel.org
---
Changes from v1:
- Add macros to define OCC_RESET, OCC_LOAD and OCC_THROTTLE
- Define a structure to store chip id, chip mask which has bits set
  for cpus present in the chip, throttled state and a work_struct.
- Modify powernv_cpufreq_throttle_check() to be called via smp_call()
- On Pmax throttling/unthrottling update 'chip.throttled' and not the
  global 'throttled' as Pmax capping is local to the chip.
- Remove the condition which checks if local pstate is less than Pmin
  while checking for Psafe frequency. When OCC becomes active after
  reset we update 'thottled' to false and when the cpufreq governor
  initiates a pstate change, the local pstate will be in Psafe and we
  will be reporting a false positive when we are not throttled.
- Schedule a kworker on receiving throttling/unthrottling OCC message
  for that chip and schedule on all chips after receiving active.
- After an OCC reset all the cpus will be in Psafe frequency. So call
  target() and restore the frequency to policy->cur after OCC_ACTIVE
  and Pmax unthrottling
- Taken care of Viresh and Preeti's comments.

 drivers/cpufreq/powernv-cpufreq.c | 181 ++++++++++++++++++++++++++++++++++----
 1 file changed, 166 insertions(+), 15 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index ebef0d8..b356c9d 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -27,20 +27,33 @@
 #include <linux/smp.h>
 #include <linux/of.h>
 #include <linux/reboot.h>
+#include <linux/slab.h>
 
 #include <asm/cputhreads.h>
 #include <asm/firmware.h>
 #include <asm/reg.h>
 #include <asm/smp.h> /* Required for cpu_sibling_mask() in UP configs */
+#include <asm/opal.h>
 
 #define POWERNV_MAX_PSTATES	256
 #define PMSR_PSAFE_ENABLE	(1UL << 30)
 #define PMSR_SPR_EM_DISABLE	(1UL << 31)
 #define PMSR_MAX(x)		((x >> 32) & 0xFF)
-#define PMSR_LP(x)		((x >> 48) & 0xFF)
+#define OCC_RESET		0
+#define OCC_LOAD		1
+#define OCC_THROTTLE		2
 
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
-static bool rebooting, throttled;
+static bool rebooting, throttled, occ_reset;
+
+static struct chip {
+	int id;
+	bool throttled;
+	cpumask_t mask;
+	struct work_struct throttle;
+} *chips;
+
+static int nr_chips;
 
 /*
  * Note: The set of pstates consists of contiguous integers, the
@@ -298,28 +311,33 @@ static inline unsigned int get_nominal_index(void)
 	return powernv_pstate_info.max - powernv_pstate_info.nominal;
 }
 
-static void powernv_cpufreq_throttle_check(unsigned int cpu)
+static void powernv_cpufreq_throttle_check(void *data)
 {
+	unsigned int cpu = smp_processor_id();
 	unsigned long pmsr;
-	int pmsr_pmax, pmsr_lp;
+	int pmsr_pmax, i;
 
 	pmsr = get_pmspr(SPRN_PMSR);
 
+	for (i = 0; i < nr_chips; i++)
+		if (chips[i].id == cpu_to_chip_id(cpu))
+			break;
+
 	/* Check for Pmax Capping */
 	pmsr_pmax = (s8)PMSR_MAX(pmsr);
 	if (pmsr_pmax != powernv_pstate_info.max) {
-		throttled = true;
-		pr_info("CPU %d Pmax is reduced to %d\n", cpu, pmsr_pmax);
-		pr_info("Max allowed Pstate is capped\n");
+		if (chips[i].throttled)
+			goto next;
+		chips[i].throttled = true;
+		pr_info("CPU %d on chip %d Pmax is reduced to %d\n", cpu,
+			chips[i].id, pmsr_pmax);
+	} else {
+		chips[i].throttled = false;
 	}
 
-	/*
-	 * Check for Psafe by reading LocalPstate
-	 * or check if Psafe_mode_active is set in PMSR.
-	 */
-	pmsr_lp = (s8)PMSR_LP(pmsr);
-	if ((pmsr_lp < powernv_pstate_info.min) ||
-				(pmsr & PMSR_PSAFE_ENABLE)) {
+	/* Check if Psafe_mode_active is set in PMSR. */
+next:
+	if (pmsr & PMSR_PSAFE_ENABLE) {
 		throttled = true;
 		pr_info("Pstate set to safe frequency\n");
 	}
@@ -350,7 +368,7 @@ static int powernv_cpufreq_target_index(struct cpufreq_policy *policy,
 		return 0;
 
 	if (!throttled)
-		powernv_cpufreq_throttle_check(smp_processor_id());
+		powernv_cpufreq_throttle_check(NULL);
 
 	freq_data.pstate_id = powernv_freqs[new_index].driver_data;
 
@@ -395,6 +413,104 @@ static struct notifier_block powernv_cpufreq_reboot_nb = {
 	.notifier_call = powernv_cpufreq_reboot_notifier,
 };
 
+void powernv_cpufreq_work_fn(struct work_struct *work)
+{
+	struct chip *c = container_of(work, struct chip, throttle);
+	unsigned int cpu;
+
+	smp_call_function_any(&c->mask,
+			      powernv_cpufreq_throttle_check, NULL, 0);
+
+	for_each_cpu(cpu, &c->mask) {
+		int index;
+		struct cpufreq_frequency_table *freq_table;
+		struct cpufreq_policy cpu_policy;
+
+		if (!cpu_online(cpu))
+			continue;
+
+		cpufreq_get_policy(&cpu_policy, cpu);
+		freq_table = cpufreq_frequency_get_table(cpu_policy.cpu);
+		cpufreq_frequency_table_target(&cpu_policy, freq_table,
+					       cpu_policy.cur,
+					       CPUFREQ_RELATION_C, &index);
+		powernv_cpufreq_target_index(&cpu_policy, index);
+	}
+}
+
+static char throttle_reason[][30] = {
+					"No throttling",
+					"Power Cap",
+					"Processor Over Temperature",
+					"Power Supply Failure",
+					"Over Current",
+					"OCC Reset"
+				     };
+
+static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
+				   unsigned long msg_type, void *msg)
+{
+	struct opal_msg *occ_msg = msg;
+	uint64_t token;
+	uint64_t chip_id, reason;
+	int i;
+
+	if (msg_type != OPAL_MSG_OCC)
+		return 0;
+
+	token = be64_to_cpu(occ_msg->params[0]);
+
+	switch (token) {
+	case OCC_RESET:
+		occ_reset = true;
+		/*
+		 * powernv_cpufreq_throttle_check() is called in
+		 * target() callback which can detect the throttle state
+		 * for governors like ondemand.
+		 * But static governors will not call target() often thus
+		 * report throttling here.
+		 */
+		if (!throttled) {
+			throttled = true;
+			pr_crit("CPU Frequency is throttled\n");
+		}
+		pr_info("OCC: Reset\n");
+		break;
+	case OCC_LOAD:
+		pr_info("OCC: Loaded\n");
+		break;
+	case OCC_THROTTLE:
+		chip_id = be64_to_cpu(occ_msg->params[1]);
+		reason = be64_to_cpu(occ_msg->params[2]);
+
+		if (occ_reset) {
+			occ_reset = false;
+			throttled = false;
+			pr_info("OCC: Active\n");
+			for (i = 0; i < nr_chips; i++)
+				schedule_work(&chips[i].throttle);
+			return 0;
+		}
+
+		if (reason && reason <= 5)
+			pr_info("OCC: Chip %d Pmax reduced due to %s\n",
+				(int)chip_id, throttle_reason[reason]);
+		else
+			pr_info("OCC: Chip %d %s\n", (int)chip_id,
+				throttle_reason[reason]);
+		for (i = 0; i < nr_chips; i++)
+			if (chips[i].id == (int)chip_id)
+				schedule_work(&chips[i].throttle);
+	}
+	return 0;
+}
+
+static struct notifier_block powernv_cpufreq_opal_nb = {
+	.notifier_call	= powernv_cpufreq_occ_msg,
+	.next		= NULL,
+	.priority	= 0,
+};
+
 static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
 {
 	struct powernv_smp_call_data freq_data;
@@ -414,6 +530,35 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
 	.attr		= powernv_cpu_freq_attr,
 };
 
+static int init_chip_info(void)
+{
+	int chip[256], i = 0, cpu;
+	int prev_chip_id = INT_MAX;
+
+	for_each_possible_cpu(cpu) {
+		int c = cpu_to_chip_id(cpu);
+
+		if (prev_chip_id != c) {
+			prev_chip_id = c;
+			chip[nr_chips++] = c;
+		}
+	}
+
+	chips = kmalloc_array(nr_chips, sizeof(struct chip), GFP_KERNEL);
+
+	if (!chips)
+		return -ENOMEM;
+
+	for (i = 0; i < nr_chips; i++) {
+		chips[i].id = chip[i];
+		cpumask_copy(&chips[i].mask, cpumask_of_node(chip[i]));
+		chips[i].throttled = false;
+		INIT_WORK(&chips[i].throttle, powernv_cpufreq_work_fn);
+	}
+
+	return 0;
+}
+
 static int __init powernv_cpufreq_init(void)
 {
 	int rc = 0;
@@ -429,7 +574,13 @@ static int __init powernv_cpufreq_init(void)
 		return rc;
 	}
 
+	/* Populate chip info */
+	rc = init_chip_info();
+	if (rc)
+		return rc;
+
 	register_reboot_notifier(&powernv_cpufreq_reboot_nb);
+	opal_message_notifier_register(OPAL_MSG_OCC, &powernv_cpufreq_opal_nb);
 	return cpufreq_register_driver(&powernv_cpufreq_driver);
 }
 module_init(powernv_cpufreq_init);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] cpufreq: powernv: Register for OCC related opal_message notification
  2015-04-28  6:23   ` Shilpasri G Bhat
@ 2015-04-28  6:48     ` Viresh Kumar
  -1 siblings, 0 replies; 12+ messages in thread
From: Viresh Kumar @ 2015-04-28  6:48 UTC (permalink / raw)
  To: Shilpasri G Bhat
  Cc: linuxppc-dev, Linux Kernel Mailing List, Rafael J. Wysocki,
	Preeti U Murthy, linux-pm

On 28 April 2015 at 11:53, Shilpasri G Bhat
<shilpa.bhat@linux.vnet.ibm.com> wrote:

> Changes from v1:
> - Add macros to define OCC_RESET, OCC_LOAD and OCC_THROTTLE
> - Define a structure to store chip id, chip mask which has bits set
>   for cpus present in the chip, throttled state and a work_struct.
> - Modify powernv_cpufreq_throttle_check() to be called via smp_call()

Why ? I might have missed it but there should be some reasoning behind
what you are changing.

> - On Pmax throttling/unthrottling update 'chip.throttled' and not the
>   global 'throttled' as Pmax capping is local to the chip.
> - Remove the condition which checks if local pstate is less than Pmin
>   while checking for Psafe frequency. When OCC becomes active after
>   reset we update 'thottled' to false and when the cpufreq governor
>   initiates a pstate change, the local pstate will be in Psafe and we
>   will be reporting a false positive when we are not throttled.
> - Schedule a kworker on receiving throttling/unthrottling OCC message
>   for that chip and schedule on all chips after receiving active.
> - After an OCC reset all the cpus will be in Psafe frequency. So call
>   target() and restore the frequency to policy->cur after OCC_ACTIVE
>   and Pmax unthrottling
> - Taken care of Viresh and Preeti's comments.

That's a lot. I am not an expert here and so really can't comment on
the internals of ppc. But, is it patch solving a single problem ? I don't
know, I somehow got the impression that it can be split into multiple
(smaller & review-able) patches. Only if it makes sense. Your call.

> diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c

> +void powernv_cpufreq_work_fn(struct work_struct *work)
> +{
> +       struct chip *c = container_of(work, struct chip, throttle);
> +       unsigned int cpu;
> +
> +       smp_call_function_any(&c->mask,
> +                             powernv_cpufreq_throttle_check, NULL, 0);
> +
> +       for_each_cpu(cpu, &c->mask) {

for_each_online_cpu ?

> +               int index;
> +               struct cpufreq_frequency_table *freq_table;
> +               struct cpufreq_policy cpu_policy;

Name it policy.

> +
> +               if (!cpu_online(cpu))
> +                       continue;

And you can kill this..

> +               cpufreq_get_policy(&cpu_policy, cpu);
> +               freq_table = cpufreq_frequency_get_table(cpu_policy.cpu);

Just do, policy->freq_table.


> +static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
> +                                  unsigned long msg_type, void *msg)
> +{

> +               if (reason && reason <= 5)
> +                       pr_info("OCC: Chip %d Pmax reduced due to %s\n",
> +                               (int)chip_id, throttle_reason[reason]);
> +               else
> +                       pr_info("OCC: Chip %d %s\n", (int)chip_id,
> +                               throttle_reason[reason]);

Blank line here. They are better for readability after blocks and loops.

> +               for (i = 0; i < nr_chips; i++)
> +                       if (chips[i].id == (int)chip_id)

Why isn't .id 64 bit ?

> +                               schedule_work(&chips[i].throttle);
> +       }
> +       return 0;
> +}
> +
> +static struct notifier_block powernv_cpufreq_opal_nb = {
> +       .notifier_call  = powernv_cpufreq_occ_msg,
> +       .next           = NULL,
> +       .priority       = 0,
> +};
> +
>  static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
>  {
>         struct powernv_smp_call_data freq_data;
> @@ -414,6 +530,35 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
>         .attr           = powernv_cpu_freq_attr,
>  };
>
> +static int init_chip_info(void)
> +{
> +       int chip[256], i = 0, cpu;
> +       int prev_chip_id = INT_MAX;
> +
> +       for_each_possible_cpu(cpu) {
> +               int c = cpu_to_chip_id(cpu);

Does 'c' refer to id here ? Name it so then.

> +
> +               if (prev_chip_id != c) {
> +                       prev_chip_id = c;
> +                       chip[nr_chips++] = c;
> +               }
> +       }
> +
> +       chips = kmalloc_array(nr_chips, sizeof(struct chip), GFP_KERNEL);
> +

A blank line isn't preferred much here :). Sorry about these blank lines.

> +       if (!chips)
> +               return -ENOMEM;
> +
> +       for (i = 0; i < nr_chips; i++) {
> +               chips[i].id = chip[i];
> +               cpumask_copy(&chips[i].mask, cpumask_of_node(chip[i]));
> +               chips[i].throttled = false;
> +               INIT_WORK(&chips[i].throttle, powernv_cpufreq_work_fn);
> +       }
> +
> +       return 0;
> +}
> +
>  static int __init powernv_cpufreq_init(void)
>  {
>         int rc = 0;
> @@ -429,7 +574,13 @@ static int __init powernv_cpufreq_init(void)
>                 return rc;
>         }
>
> +       /* Populate chip info */
> +       rc = init_chip_info();
> +       if (rc)
> +               return rc;
> +
>         register_reboot_notifier(&powernv_cpufreq_reboot_nb);
> +       opal_message_notifier_register(OPAL_MSG_OCC, &powernv_cpufreq_opal_nb);
>         return cpufreq_register_driver(&powernv_cpufreq_driver);
>  }
>  module_init(powernv_cpufreq_init);
> --
> 1.9.3
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] cpufreq: powernv: Register for OCC related opal_message notification
@ 2015-04-28  6:48     ` Viresh Kumar
  0 siblings, 0 replies; 12+ messages in thread
From: Viresh Kumar @ 2015-04-28  6:48 UTC (permalink / raw)
  To: Shilpasri G Bhat
  Cc: Preeti U Murthy, linuxppc-dev, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-pm

On 28 April 2015 at 11:53, Shilpasri G Bhat
<shilpa.bhat@linux.vnet.ibm.com> wrote:

> Changes from v1:
> - Add macros to define OCC_RESET, OCC_LOAD and OCC_THROTTLE
> - Define a structure to store chip id, chip mask which has bits set
>   for cpus present in the chip, throttled state and a work_struct.
> - Modify powernv_cpufreq_throttle_check() to be called via smp_call()

Why ? I might have missed it but there should be some reasoning behind
what you are changing.

> - On Pmax throttling/unthrottling update 'chip.throttled' and not the
>   global 'throttled' as Pmax capping is local to the chip.
> - Remove the condition which checks if local pstate is less than Pmin
>   while checking for Psafe frequency. When OCC becomes active after
>   reset we update 'thottled' to false and when the cpufreq governor
>   initiates a pstate change, the local pstate will be in Psafe and we
>   will be reporting a false positive when we are not throttled.
> - Schedule a kworker on receiving throttling/unthrottling OCC message
>   for that chip and schedule on all chips after receiving active.
> - After an OCC reset all the cpus will be in Psafe frequency. So call
>   target() and restore the frequency to policy->cur after OCC_ACTIVE
>   and Pmax unthrottling
> - Taken care of Viresh and Preeti's comments.

That's a lot. I am not an expert here and so really can't comment on
the internals of ppc. But, is it patch solving a single problem ? I don't
know, I somehow got the impression that it can be split into multiple
(smaller & review-able) patches. Only if it makes sense. Your call.

> diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c

> +void powernv_cpufreq_work_fn(struct work_struct *work)
> +{
> +       struct chip *c = container_of(work, struct chip, throttle);
> +       unsigned int cpu;
> +
> +       smp_call_function_any(&c->mask,
> +                             powernv_cpufreq_throttle_check, NULL, 0);
> +
> +       for_each_cpu(cpu, &c->mask) {

for_each_online_cpu ?

> +               int index;
> +               struct cpufreq_frequency_table *freq_table;
> +               struct cpufreq_policy cpu_policy;

Name it policy.

> +
> +               if (!cpu_online(cpu))
> +                       continue;

And you can kill this..

> +               cpufreq_get_policy(&cpu_policy, cpu);
> +               freq_table = cpufreq_frequency_get_table(cpu_policy.cpu);

Just do, policy->freq_table.


> +static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
> +                                  unsigned long msg_type, void *msg)
> +{

> +               if (reason && reason <= 5)
> +                       pr_info("OCC: Chip %d Pmax reduced due to %s\n",
> +                               (int)chip_id, throttle_reason[reason]);
> +               else
> +                       pr_info("OCC: Chip %d %s\n", (int)chip_id,
> +                               throttle_reason[reason]);

Blank line here. They are better for readability after blocks and loops.

> +               for (i = 0; i < nr_chips; i++)
> +                       if (chips[i].id == (int)chip_id)

Why isn't .id 64 bit ?

> +                               schedule_work(&chips[i].throttle);
> +       }
> +       return 0;
> +}
> +
> +static struct notifier_block powernv_cpufreq_opal_nb = {
> +       .notifier_call  = powernv_cpufreq_occ_msg,
> +       .next           = NULL,
> +       .priority       = 0,
> +};
> +
>  static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
>  {
>         struct powernv_smp_call_data freq_data;
> @@ -414,6 +530,35 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
>         .attr           = powernv_cpu_freq_attr,
>  };
>
> +static int init_chip_info(void)
> +{
> +       int chip[256], i = 0, cpu;
> +       int prev_chip_id = INT_MAX;
> +
> +       for_each_possible_cpu(cpu) {
> +               int c = cpu_to_chip_id(cpu);

Does 'c' refer to id here ? Name it so then.

> +
> +               if (prev_chip_id != c) {
> +                       prev_chip_id = c;
> +                       chip[nr_chips++] = c;
> +               }
> +       }
> +
> +       chips = kmalloc_array(nr_chips, sizeof(struct chip), GFP_KERNEL);
> +

A blank line isn't preferred much here :). Sorry about these blank lines.

> +       if (!chips)
> +               return -ENOMEM;
> +
> +       for (i = 0; i < nr_chips; i++) {
> +               chips[i].id = chip[i];
> +               cpumask_copy(&chips[i].mask, cpumask_of_node(chip[i]));
> +               chips[i].throttled = false;
> +               INIT_WORK(&chips[i].throttle, powernv_cpufreq_work_fn);
> +       }
> +
> +       return 0;
> +}
> +
>  static int __init powernv_cpufreq_init(void)
>  {
>         int rc = 0;
> @@ -429,7 +574,13 @@ static int __init powernv_cpufreq_init(void)
>                 return rc;
>         }
>
> +       /* Populate chip info */
> +       rc = init_chip_info();
> +       if (rc)
> +               return rc;
> +
>         register_reboot_notifier(&powernv_cpufreq_reboot_nb);
> +       opal_message_notifier_register(OPAL_MSG_OCC, &powernv_cpufreq_opal_nb);
>         return cpufreq_register_driver(&powernv_cpufreq_driver);
>  }
>  module_init(powernv_cpufreq_init);
> --
> 1.9.3
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] cpufreq: powernv: Register for OCC related opal_message notification
  2015-04-28  6:48     ` Viresh Kumar
@ 2015-04-28  8:18       ` Shilpasri G Bhat
  -1 siblings, 0 replies; 12+ messages in thread
From: Shilpasri G Bhat @ 2015-04-28  8:18 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linuxppc-dev, Linux Kernel Mailing List, Rafael J. Wysocki,
	Preeti U Murthy, linux-pm

Hi Viresh,

On 04/28/2015 12:18 PM, Viresh Kumar wrote:
> On 28 April 2015 at 11:53, Shilpasri G Bhat
> <shilpa.bhat@linux.vnet.ibm.com> wrote:
> 
>> Changes from v1:
>> - Add macros to define OCC_RESET, OCC_LOAD and OCC_THROTTLE
>> - Define a structure to store chip id, chip mask which has bits set
>>   for cpus present in the chip, throttled state and a work_struct.
>> - Modify powernv_cpufreq_throttle_check() to be called via smp_call()
> 
> Why ? I might have missed it but there should be some reasoning behind
> what you are changing.

My bad I haven't added explicit comment to state reason behind this change.

I modified the definition of *throttle_check() to match the function definition
to be called via smp_call() instead of adding an additional wrapper around
*throttle_check().

OCC is a chip entity and any local throttle state changes should be associated
to cpus belonging to that chip. The *throttle_check() will read the core
register PMSR to verify throttling. All the cores in a chip will have the same
throttled state as they are managed by a the same OCC in that chip.

smp_call() is required to ensure *throttle_check() is called on a cpu belonging
to the chip for which we have received throttled/unthrottled notification. We
could be handling throttled/unthrottled notification of 'chip1' in 'chip2' so do
an smp_call() on 'chip1'.

We are irq_disabled in powernv_cpufreq_occ_msg() the notification handler.
Thus the use of kworker to do an smp_call and restore policy->cur.

OCC_RESET is global event it affects frequency of all chips. Pmax capping is
local event, it affects the frequency of a chip.

> 
>> - On Pmax throttling/unthrottling update 'chip.throttled' and not the
>>   global 'throttled' as Pmax capping is local to the chip.
>> - Remove the condition which checks if local pstate is less than Pmin
>>   while checking for Psafe frequency. When OCC becomes active after
>>   reset we update 'thottled' to false and when the cpufreq governor
>>   initiates a pstate change, the local pstate will be in Psafe and we
>>   will be reporting a false positive when we are not throttled.
>> - Schedule a kworker on receiving throttling/unthrottling OCC message
>>   for that chip and schedule on all chips after receiving active.
>> - After an OCC reset all the cpus will be in Psafe frequency. So call
>>   target() and restore the frequency to policy->cur after OCC_ACTIVE
>>   and Pmax unthrottling
>> - Taken care of Viresh and Preeti's comments.
> 
> That's a lot. I am not an expert here and so really can't comment on
> the internals of ppc. But, is it patch solving a single problem ? I don't
> know, I somehow got the impression that it can be split into multiple
> (smaller & review-able) patches. Only if it makes sense. Your call.

All the changes introduced in this patch is centered around opal_message
notification handler powernv_cpufreq_occ_msg(). I can split it into multiple
patches but it all will be relevant only to solve the above problem.

> 
>> diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
> 
>> +void powernv_cpufreq_work_fn(struct work_struct *work)
>> +{
>> +       struct chip *c = container_of(work, struct chip, throttle);
>> +       unsigned int cpu;
>> +
>> +       smp_call_function_any(&c->mask,
>> +                             powernv_cpufreq_throttle_check, NULL, 0);
>> +
>> +       for_each_cpu(cpu, &c->mask) {
> 
> for_each_online_cpu ?

I want to iterate on all the cpus in a chip stored in 'struct chip.mask'.
If you were intending me to avoid 'if(!cpu_online(cpu))' then will the following do:

for_each_cpu_and(cpu, &c->mask, cpu_online_mask)

> 
>> +               int index;
>> +               struct cpufreq_frequency_table *freq_table;
>> +               struct cpufreq_policy cpu_policy;
> 
> Name it policy.

Okay.

> 
>> +
>> +               if (!cpu_online(cpu))
>> +                       continue;
> 
> And you can kill this..
> 
>> +               cpufreq_get_policy(&cpu_policy, cpu);
>> +               freq_table = cpufreq_frequency_get_table(cpu_policy.cpu);
> 
> Just do, policy->freq_table.

Okay.

> 
> 
>> +static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
>> +                                  unsigned long msg_type, void *msg)
>> +{
> 
>> +               if (reason && reason <= 5)
>> +                       pr_info("OCC: Chip %d Pmax reduced due to %s\n",
>> +                               (int)chip_id, throttle_reason[reason]);
>> +               else
>> +                       pr_info("OCC: Chip %d %s\n", (int)chip_id,
>> +                               throttle_reason[reason]);
> 
> Blank line here. They are better for readability after blocks and loops.

Yes will do.

> 
>> +               for (i = 0; i < nr_chips; i++)
>> +                       if (chips[i].id == (int)chip_id)
> 
> Why isn't .id 64 bit ?

I guess 6 bits are sufficient to store chip id given that max number of chips
can be 256. I don't have good reason for defining .id 32 bit.

Yeah 64-bit .id will avoid the typecast.

> 
>> +                               schedule_work(&chips[i].throttle);
>> +       }
>> +       return 0;
>> +}
>> +
>> +static struct notifier_block powernv_cpufreq_opal_nb = {
>> +       .notifier_call  = powernv_cpufreq_occ_msg,
>> +       .next           = NULL,
>> +       .priority       = 0,
>> +};
>> +
>>  static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
>>  {
>>         struct powernv_smp_call_data freq_data;
>> @@ -414,6 +530,35 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
>>         .attr           = powernv_cpu_freq_attr,
>>  };
>>
>> +static int init_chip_info(void)
>> +{
>> +       int chip[256], i = 0, cpu;
>> +       int prev_chip_id = INT_MAX;
>> +
>> +       for_each_possible_cpu(cpu) {
>> +               int c = cpu_to_chip_id(cpu);
> 
> Does 'c' refer to id here ? Name it so then.

Okay.

> 
>> +
>> +               if (prev_chip_id != c) {
>> +                       prev_chip_id = c;
>> +                       chip[nr_chips++] = c;
>> +               }
>> +       }
>> +
>> +       chips = kmalloc_array(nr_chips, sizeof(struct chip), GFP_KERNEL);
>> +
> 
> A blank line isn't preferred much here :). Sorry about these blank lines.

Okay.

Thanks and Regards,
Shilpa


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] cpufreq: powernv: Register for OCC related opal_message notification
@ 2015-04-28  8:18       ` Shilpasri G Bhat
  0 siblings, 0 replies; 12+ messages in thread
From: Shilpasri G Bhat @ 2015-04-28  8:18 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Preeti U Murthy, linuxppc-dev, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-pm

Hi Viresh,

On 04/28/2015 12:18 PM, Viresh Kumar wrote:
> On 28 April 2015 at 11:53, Shilpasri G Bhat
> <shilpa.bhat@linux.vnet.ibm.com> wrote:
> 
>> Changes from v1:
>> - Add macros to define OCC_RESET, OCC_LOAD and OCC_THROTTLE
>> - Define a structure to store chip id, chip mask which has bits set
>>   for cpus present in the chip, throttled state and a work_struct.
>> - Modify powernv_cpufreq_throttle_check() to be called via smp_call()
> 
> Why ? I might have missed it but there should be some reasoning behind
> what you are changing.

My bad I haven't added explicit comment to state reason behind this change.

I modified the definition of *throttle_check() to match the function definition
to be called via smp_call() instead of adding an additional wrapper around
*throttle_check().

OCC is a chip entity and any local throttle state changes should be associated
to cpus belonging to that chip. The *throttle_check() will read the core
register PMSR to verify throttling. All the cores in a chip will have the same
throttled state as they are managed by a the same OCC in that chip.

smp_call() is required to ensure *throttle_check() is called on a cpu belonging
to the chip for which we have received throttled/unthrottled notification. We
could be handling throttled/unthrottled notification of 'chip1' in 'chip2' so do
an smp_call() on 'chip1'.

We are irq_disabled in powernv_cpufreq_occ_msg() the notification handler.
Thus the use of kworker to do an smp_call and restore policy->cur.

OCC_RESET is global event it affects frequency of all chips. Pmax capping is
local event, it affects the frequency of a chip.

> 
>> - On Pmax throttling/unthrottling update 'chip.throttled' and not the
>>   global 'throttled' as Pmax capping is local to the chip.
>> - Remove the condition which checks if local pstate is less than Pmin
>>   while checking for Psafe frequency. When OCC becomes active after
>>   reset we update 'thottled' to false and when the cpufreq governor
>>   initiates a pstate change, the local pstate will be in Psafe and we
>>   will be reporting a false positive when we are not throttled.
>> - Schedule a kworker on receiving throttling/unthrottling OCC message
>>   for that chip and schedule on all chips after receiving active.
>> - After an OCC reset all the cpus will be in Psafe frequency. So call
>>   target() and restore the frequency to policy->cur after OCC_ACTIVE
>>   and Pmax unthrottling
>> - Taken care of Viresh and Preeti's comments.
> 
> That's a lot. I am not an expert here and so really can't comment on
> the internals of ppc. But, is it patch solving a single problem ? I don't
> know, I somehow got the impression that it can be split into multiple
> (smaller & review-able) patches. Only if it makes sense. Your call.

All the changes introduced in this patch is centered around opal_message
notification handler powernv_cpufreq_occ_msg(). I can split it into multiple
patches but it all will be relevant only to solve the above problem.

> 
>> diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
> 
>> +void powernv_cpufreq_work_fn(struct work_struct *work)
>> +{
>> +       struct chip *c = container_of(work, struct chip, throttle);
>> +       unsigned int cpu;
>> +
>> +       smp_call_function_any(&c->mask,
>> +                             powernv_cpufreq_throttle_check, NULL, 0);
>> +
>> +       for_each_cpu(cpu, &c->mask) {
> 
> for_each_online_cpu ?

I want to iterate on all the cpus in a chip stored in 'struct chip.mask'.
If you were intending me to avoid 'if(!cpu_online(cpu))' then will the following do:

for_each_cpu_and(cpu, &c->mask, cpu_online_mask)

> 
>> +               int index;
>> +               struct cpufreq_frequency_table *freq_table;
>> +               struct cpufreq_policy cpu_policy;
> 
> Name it policy.

Okay.

> 
>> +
>> +               if (!cpu_online(cpu))
>> +                       continue;
> 
> And you can kill this..
> 
>> +               cpufreq_get_policy(&cpu_policy, cpu);
>> +               freq_table = cpufreq_frequency_get_table(cpu_policy.cpu);
> 
> Just do, policy->freq_table.

Okay.

> 
> 
>> +static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
>> +                                  unsigned long msg_type, void *msg)
>> +{
> 
>> +               if (reason && reason <= 5)
>> +                       pr_info("OCC: Chip %d Pmax reduced due to %s\n",
>> +                               (int)chip_id, throttle_reason[reason]);
>> +               else
>> +                       pr_info("OCC: Chip %d %s\n", (int)chip_id,
>> +                               throttle_reason[reason]);
> 
> Blank line here. They are better for readability after blocks and loops.

Yes will do.

> 
>> +               for (i = 0; i < nr_chips; i++)
>> +                       if (chips[i].id == (int)chip_id)
> 
> Why isn't .id 64 bit ?

I guess 6 bits are sufficient to store chip id given that max number of chips
can be 256. I don't have good reason for defining .id 32 bit.

Yeah 64-bit .id will avoid the typecast.

> 
>> +                               schedule_work(&chips[i].throttle);
>> +       }
>> +       return 0;
>> +}
>> +
>> +static struct notifier_block powernv_cpufreq_opal_nb = {
>> +       .notifier_call  = powernv_cpufreq_occ_msg,
>> +       .next           = NULL,
>> +       .priority       = 0,
>> +};
>> +
>>  static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
>>  {
>>         struct powernv_smp_call_data freq_data;
>> @@ -414,6 +530,35 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
>>         .attr           = powernv_cpu_freq_attr,
>>  };
>>
>> +static int init_chip_info(void)
>> +{
>> +       int chip[256], i = 0, cpu;
>> +       int prev_chip_id = INT_MAX;
>> +
>> +       for_each_possible_cpu(cpu) {
>> +               int c = cpu_to_chip_id(cpu);
> 
> Does 'c' refer to id here ? Name it so then.

Okay.

> 
>> +
>> +               if (prev_chip_id != c) {
>> +                       prev_chip_id = c;
>> +                       chip[nr_chips++] = c;
>> +               }
>> +       }
>> +
>> +       chips = kmalloc_array(nr_chips, sizeof(struct chip), GFP_KERNEL);
>> +
> 
> A blank line isn't preferred much here :). Sorry about these blank lines.

Okay.

Thanks and Regards,
Shilpa

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] cpufreq: powernv: Register for OCC related opal_message notification
  2015-04-28  8:18       ` Shilpasri G Bhat
@ 2015-04-28  8:53         ` Viresh Kumar
  -1 siblings, 0 replies; 12+ messages in thread
From: Viresh Kumar @ 2015-04-28  8:53 UTC (permalink / raw)
  To: Shilpasri G Bhat
  Cc: linuxppc-dev, Linux Kernel Mailing List, Rafael J. Wysocki,
	Preeti U Murthy, linux-pm

On 28 April 2015 at 13:48, Shilpasri G Bhat
<shilpa.bhat@linux.vnet.ibm.com> wrote:
> My bad I haven't added explicit comment to state reason behind this change.
>
> I modified the definition of *throttle_check() to match the function definition
> to be called via smp_call() instead of adding an additional wrapper around
> *throttle_check().
>
> OCC is a chip entity and any local throttle state changes should be associated
> to cpus belonging to that chip. The *throttle_check() will read the core
> register PMSR to verify throttling. All the cores in a chip will have the same
> throttled state as they are managed by a the same OCC in that chip.
>
> smp_call() is required to ensure *throttle_check() is called on a cpu belonging
> to the chip for which we have received throttled/unthrottled notification. We
> could be handling throttled/unthrottled notification of 'chip1' in 'chip2' so do
> an smp_call() on 'chip1'.

Okay. Lets talk about the code that is already present in mainline. Isn't that
suffering from this issue ? If yes, then you need to bugfix that separately.

> We are irq_disabled in powernv_cpufreq_occ_msg() the notification handler.
> Thus the use of kworker to do an smp_call and restore policy->cur.
>
> OCC_RESET is global event it affects frequency of all chips. Pmax capping is
> local event, it affects the frequency of a chip.
>

>> That's a lot. I am not an expert here and so really can't comment on
>> the internals of ppc. But, is it patch solving a single problem ? I don't
>> know, I somehow got the impression that it can be split into multiple
>> (smaller & review-able) patches. Only if it makes sense. Your call.
>
> All the changes introduced in this patch is centered around opal_message
> notification handler powernv_cpufreq_occ_msg(). I can split it into multiple
> patches but it all will be relevant only to solve the above problem.

And that's what I meant here. Yes, this all is solving a central problem, but
a patch must be divided into separate, independently working, entities.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] cpufreq: powernv: Register for OCC related opal_message notification
@ 2015-04-28  8:53         ` Viresh Kumar
  0 siblings, 0 replies; 12+ messages in thread
From: Viresh Kumar @ 2015-04-28  8:53 UTC (permalink / raw)
  To: Shilpasri G Bhat
  Cc: Preeti U Murthy, linuxppc-dev, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-pm

On 28 April 2015 at 13:48, Shilpasri G Bhat
<shilpa.bhat@linux.vnet.ibm.com> wrote:
> My bad I haven't added explicit comment to state reason behind this change.
>
> I modified the definition of *throttle_check() to match the function definition
> to be called via smp_call() instead of adding an additional wrapper around
> *throttle_check().
>
> OCC is a chip entity and any local throttle state changes should be associated
> to cpus belonging to that chip. The *throttle_check() will read the core
> register PMSR to verify throttling. All the cores in a chip will have the same
> throttled state as they are managed by a the same OCC in that chip.
>
> smp_call() is required to ensure *throttle_check() is called on a cpu belonging
> to the chip for which we have received throttled/unthrottled notification. We
> could be handling throttled/unthrottled notification of 'chip1' in 'chip2' so do
> an smp_call() on 'chip1'.

Okay. Lets talk about the code that is already present in mainline. Isn't that
suffering from this issue ? If yes, then you need to bugfix that separately.

> We are irq_disabled in powernv_cpufreq_occ_msg() the notification handler.
> Thus the use of kworker to do an smp_call and restore policy->cur.
>
> OCC_RESET is global event it affects frequency of all chips. Pmax capping is
> local event, it affects the frequency of a chip.
>

>> That's a lot. I am not an expert here and so really can't comment on
>> the internals of ppc. But, is it patch solving a single problem ? I don't
>> know, I somehow got the impression that it can be split into multiple
>> (smaller & review-able) patches. Only if it makes sense. Your call.
>
> All the changes introduced in this patch is centered around opal_message
> notification handler powernv_cpufreq_occ_msg(). I can split it into multiple
> patches but it all will be relevant only to solve the above problem.

And that's what I meant here. Yes, this all is solving a central problem, but
a patch must be divided into separate, independently working, entities.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] cpufreq: powernv: Register for OCC related opal_message notification
  2015-04-28  8:53         ` Viresh Kumar
@ 2015-04-28  9:07           ` Shilpasri G Bhat
  -1 siblings, 0 replies; 12+ messages in thread
From: Shilpasri G Bhat @ 2015-04-28  9:07 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linuxppc-dev, Linux Kernel Mailing List, Rafael J. Wysocki,
	Preeti U Murthy, linux-pm



On 04/28/2015 02:23 PM, Viresh Kumar wrote:
> On 28 April 2015 at 13:48, Shilpasri G Bhat
> <shilpa.bhat@linux.vnet.ibm.com> wrote:
>> My bad I haven't added explicit comment to state reason behind this change.
>>
>> I modified the definition of *throttle_check() to match the function definition
>> to be called via smp_call() instead of adding an additional wrapper around
>> *throttle_check().
>>
>> OCC is a chip entity and any local throttle state changes should be associated
>> to cpus belonging to that chip. The *throttle_check() will read the core
>> register PMSR to verify throttling. All the cores in a chip will have the same
>> throttled state as they are managed by a the same OCC in that chip.
>>
>> smp_call() is required to ensure *throttle_check() is called on a cpu belonging
>> to the chip for which we have received throttled/unthrottled notification. We
>> could be handling throttled/unthrottled notification of 'chip1' in 'chip2' so do
>> an smp_call() on 'chip1'.
> 
> Okay. Lets talk about the code that is already present in mainline. Isn't that
> suffering from this issue ? If yes, then you need to bugfix that separately.

Nope. The upstream code does not have this issue as it does not have checks to
detect unthrottling state. The unthrottling i.e, 'throttled=false' is being
handled only in this patchset.

Yes this can be fixed separately.

> 
>> We are irq_disabled in powernv_cpufreq_occ_msg() the notification handler.
>> Thus the use of kworker to do an smp_call and restore policy->cur.
>>
>> OCC_RESET is global event it affects frequency of all chips. Pmax capping is
>> local event, it affects the frequency of a chip.
>>
> 
>>> That's a lot. I am not an expert here and so really can't comment on
>>> the internals of ppc. But, is it patch solving a single problem ? I don't
>>> know, I somehow got the impression that it can be split into multiple
>>> (smaller & review-able) patches. Only if it makes sense. Your call.
>>
>> All the changes introduced in this patch is centered around opal_message
>> notification handler powernv_cpufreq_occ_msg(). I can split it into multiple
>> patches but it all will be relevant only to solve the above problem.
> 
> And that's what I meant here. Yes, this all is solving a central problem, but
> a patch must be divided into separate, independently working, entities.
> 

Yup agree. Will do.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] cpufreq: powernv: Register for OCC related opal_message notification
@ 2015-04-28  9:07           ` Shilpasri G Bhat
  0 siblings, 0 replies; 12+ messages in thread
From: Shilpasri G Bhat @ 2015-04-28  9:07 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Preeti U Murthy, linuxppc-dev, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-pm



On 04/28/2015 02:23 PM, Viresh Kumar wrote:
> On 28 April 2015 at 13:48, Shilpasri G Bhat
> <shilpa.bhat@linux.vnet.ibm.com> wrote:
>> My bad I haven't added explicit comment to state reason behind this change.
>>
>> I modified the definition of *throttle_check() to match the function definition
>> to be called via smp_call() instead of adding an additional wrapper around
>> *throttle_check().
>>
>> OCC is a chip entity and any local throttle state changes should be associated
>> to cpus belonging to that chip. The *throttle_check() will read the core
>> register PMSR to verify throttling. All the cores in a chip will have the same
>> throttled state as they are managed by a the same OCC in that chip.
>>
>> smp_call() is required to ensure *throttle_check() is called on a cpu belonging
>> to the chip for which we have received throttled/unthrottled notification. We
>> could be handling throttled/unthrottled notification of 'chip1' in 'chip2' so do
>> an smp_call() on 'chip1'.
> 
> Okay. Lets talk about the code that is already present in mainline. Isn't that
> suffering from this issue ? If yes, then you need to bugfix that separately.

Nope. The upstream code does not have this issue as it does not have checks to
detect unthrottling state. The unthrottling i.e, 'throttled=false' is being
handled only in this patchset.

Yes this can be fixed separately.

> 
>> We are irq_disabled in powernv_cpufreq_occ_msg() the notification handler.
>> Thus the use of kworker to do an smp_call and restore policy->cur.
>>
>> OCC_RESET is global event it affects frequency of all chips. Pmax capping is
>> local event, it affects the frequency of a chip.
>>
> 
>>> That's a lot. I am not an expert here and so really can't comment on
>>> the internals of ppc. But, is it patch solving a single problem ? I don't
>>> know, I somehow got the impression that it can be split into multiple
>>> (smaller & review-able) patches. Only if it makes sense. Your call.
>>
>> All the changes introduced in this patch is centered around opal_message
>> notification handler powernv_cpufreq_occ_msg(). I can split it into multiple
>> patches but it all will be relevant only to solve the above problem.
> 
> And that's what I meant here. Yes, this all is solving a central problem, but
> a patch must be divided into separate, independently working, entities.
> 

Yup agree. Will do.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-04-28  9:08 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-28  6:23 [PATCH v2 0/2] powernv: cpufreq: Report frequency throttle by OCC Shilpasri G Bhat
2015-04-28  6:23 ` [PATCH v2 1/2] powerpc/powernv: Add definition of OPAL_MSG_OCC message type Shilpasri G Bhat
2015-04-28  6:23 ` [PATCH v2 2/2] cpufreq: powernv: Register for OCC related opal_message notification Shilpasri G Bhat
2015-04-28  6:23   ` Shilpasri G Bhat
2015-04-28  6:48   ` Viresh Kumar
2015-04-28  6:48     ` Viresh Kumar
2015-04-28  8:18     ` Shilpasri G Bhat
2015-04-28  8:18       ` Shilpasri G Bhat
2015-04-28  8:53       ` Viresh Kumar
2015-04-28  8:53         ` Viresh Kumar
2015-04-28  9:07         ` Shilpasri G Bhat
2015-04-28  9:07           ` Shilpasri G Bhat

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.