* [PATCH v4 0/6] powernv: cpufreq: Report frequency throttle by OCC
@ 2015-07-13 14:09 Shilpasri G Bhat
  2015-07-13 14:09 ` [PATCH v4 1/6] cpufreq: powernv: Handle throttling due to Pmax capping at chip level Shilpasri G Bhat
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: Shilpasri G Bhat @ 2015-07-13 14:09 UTC (permalink / raw)
  To: rjw; +Cc: viresh.kumar, linuxppc-dev, linux-kernel, linux-pm, Shilpasri G Bhat

This patchset adds a frequency throttle reporting mechanism to the
powernv-cpufreq driver for cases where the OCC throttles the frequency.
The OCC is an On-Chip-Controller which takes care of the power and
thermal safety of the chip. The CPU frequency can be throttled during
an OCC reset or when the OCC limits the max allowed frequency. The
patchset reports such conditions to keep the user informed about the
reason for the drop in performance of workloads when the frequency is
throttled.

Changes from v3:
- Rebased on top of 4.2-rc1
- Minor changes in patches 2, 3, 4 and 6; these do not change the
  functionality of the code
- The build error due to which this series was initially dropped is
  fixed by commit 594fcb9ec9e ("powerpc/powernv: Expose OPAL APIs
  required by PRD interface"):
  ERROR: ".opal_message_notifier_register"
  [drivers/cpufreq/powernv-cpufreq.ko] undefined!

Changes from v2:
- Split into multiple patches
- Semantic fixes

Shilpasri G Bhat (6):
  cpufreq: powernv: Handle throttling due to Pmax capping at chip level
  powerpc/powernv: Add definition of OPAL_MSG_OCC message type
  cpufreq: powernv: Register for OCC related opal_message notification
  cpufreq: powernv: Call throttle_check() on receiving OCC_THROTTLE
  cpufreq: powernv: Report Psafe only if PMSR.psafe_mode_active bit is
    set
  cpufreq: powernv: Restore cpu frequency to policy->cur on unthrottling

 arch/powerpc/include/asm/opal-api.h |  12 +++
 drivers/cpufreq/powernv-cpufreq.c   | 195 +++++++++++++++++++++++++++++++++---
 2 files changed, 192 insertions(+), 15 deletions(-)

-- 
1.9.3



* [PATCH v4 1/6] cpufreq: powernv: Handle throttling due to Pmax capping at chip level
  2015-07-13 14:09 [PATCH v4 0/6] powernv: cpufreq: Report frequency throttle by OCC Shilpasri G Bhat
@ 2015-07-13 14:09 ` Shilpasri G Bhat
  2015-07-13 14:09 ` [PATCH v4 2/6] powerpc/powernv: Add definition of OPAL_MSG_OCC message type Shilpasri G Bhat
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Shilpasri G Bhat @ 2015-07-13 14:09 UTC (permalink / raw)
  To: rjw; +Cc: viresh.kumar, linuxppc-dev, linux-kernel, linux-pm, Shilpasri G Bhat

The On-Chip-Controller (OCC) can throttle the cpu frequency by reducing
the max allowed frequency for a chip if the chip exceeds its power or
temperature limits. As Pmax capping is a chip-level condition, report
this throttling behavior at chip level. Do not set the global
'throttled' on Pmax capping; set the per-chip throttled variable
instead. Report unthrottling if Pmax is restored after throttling.

This patch adds a structure to store chip id and throttled state of
the chip.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---
No change from v3

 drivers/cpufreq/powernv-cpufreq.c | 59 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 55 insertions(+), 4 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index ebef0d8..d0c18c9 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -27,6 +27,7 @@
 #include <linux/smp.h>
 #include <linux/of.h>
 #include <linux/reboot.h>
+#include <linux/slab.h>
 
 #include <asm/cputhreads.h>
 #include <asm/firmware.h>
@@ -42,6 +43,13 @@
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
 static bool rebooting, throttled;
 
+static struct chip {
+	unsigned int id;
+	bool throttled;
+} *chips;
+
+static int nr_chips;
+
 /*
  * Note: The set of pstates consists of contiguous integers, the
  * smallest of which is indicated by powernv_pstate_info.min, the
@@ -301,22 +309,33 @@ static inline unsigned int get_nominal_index(void)
 static void powernv_cpufreq_throttle_check(unsigned int cpu)
 {
 	unsigned long pmsr;
-	int pmsr_pmax, pmsr_lp;
+	int pmsr_pmax, pmsr_lp, i;
 
 	pmsr = get_pmspr(SPRN_PMSR);
 
+	for (i = 0; i < nr_chips; i++)
+		if (chips[i].id == cpu_to_chip_id(cpu))
+			break;
+
 	/* Check for Pmax Capping */
 	pmsr_pmax = (s8)PMSR_MAX(pmsr);
 	if (pmsr_pmax != powernv_pstate_info.max) {
-		throttled = true;
-		pr_info("CPU %d Pmax is reduced to %d\n", cpu, pmsr_pmax);
-		pr_info("Max allowed Pstate is capped\n");
+		if (chips[i].throttled)
+			goto next;
+		chips[i].throttled = true;
+		pr_info("CPU %d on Chip %u has Pmax reduced to %d\n", cpu,
+			chips[i].id, pmsr_pmax);
+	} else if (chips[i].throttled) {
+		chips[i].throttled = false;
+		pr_info("CPU %d on Chip %u has Pmax restored to %d\n", cpu,
+			chips[i].id, pmsr_pmax);
 	}
 
 	/*
 	 * Check for Psafe by reading LocalPstate
 	 * or check if Psafe_mode_active is set in PMSR.
 	 */
+next:
 	pmsr_lp = (s8)PMSR_LP(pmsr);
 	if ((pmsr_lp < powernv_pstate_info.min) ||
 				(pmsr & PMSR_PSAFE_ENABLE)) {
@@ -414,6 +433,33 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
 	.attr		= powernv_cpu_freq_attr,
 };
 
+static int init_chip_info(void)
+{
+	unsigned int chip[256];
+	unsigned int cpu, i;
+	unsigned int prev_chip_id = UINT_MAX;
+
+	for_each_possible_cpu(cpu) {
+		unsigned int id = cpu_to_chip_id(cpu);
+
+		if (prev_chip_id != id) {
+			prev_chip_id = id;
+			chip[nr_chips++] = id;
+		}
+	}
+
+	chips = kmalloc_array(nr_chips, sizeof(struct chip), GFP_KERNEL);
+	if (!chips)
+		return -ENOMEM;
+
+	for (i = 0; i < nr_chips; i++) {
+		chips[i].id = chip[i];
+		chips[i].throttled = false;
+	}
+
+	return 0;
+}
+
 static int __init powernv_cpufreq_init(void)
 {
 	int rc = 0;
@@ -429,6 +475,11 @@ static int __init powernv_cpufreq_init(void)
 		return rc;
 	}
 
+	/* Populate chip info */
+	rc = init_chip_info();
+	if (rc)
+		return rc;
+
 	register_reboot_notifier(&powernv_cpufreq_reboot_nb);
 	return cpufreq_register_driver(&powernv_cpufreq_driver);
 }
-- 
1.9.3
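
The per-chip reporting in patch 1/6 is an edge-triggered check: a
message is logged only when a chip's throttled state changes. A minimal
userspace sketch of that idea follows. The names chip_state and
check_pmax are hypothetical and not part of the driver, and the return
codes exist only to make the transitions observable.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's per-chip bookkeeping. */
struct chip_state {
	unsigned int id;
	bool throttled;
};

/*
 * Compare the Pmax read back from hardware with the expected maximum and
 * report only on a state change.
 * Returns 1 on a throttle event, -1 on an unthrottle event, 0 otherwise.
 */
static int check_pmax(struct chip_state *chip, int pmsr_pmax, int expected_max)
{
	if (pmsr_pmax != expected_max) {
		if (chip->throttled)
			return 0;	/* already reported, stay quiet */
		chip->throttled = true;
		printf("Chip %u has Pmax reduced to %d\n", chip->id, pmsr_pmax);
		return 1;
	}
	if (chip->throttled) {
		chip->throttled = false;
		printf("Chip %u has Pmax restored to %d\n", chip->id, pmsr_pmax);
		return -1;
	}
	return 0;
}
```

Calling check_pmax() repeatedly with the same capped value produces one
log line, not one per call, which is the point of keeping the flag per
chip.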



* [PATCH v4 2/6] powerpc/powernv: Add definition of OPAL_MSG_OCC message type
  2015-07-13 14:09 [PATCH v4 0/6] powernv: cpufreq: Report frequency throttle by OCC Shilpasri G Bhat
  2015-07-13 14:09 ` [PATCH v4 1/6] cpufreq: powernv: Handle throttling due to Pmax capping at chip level Shilpasri G Bhat
@ 2015-07-13 14:09 ` Shilpasri G Bhat
  2015-07-13 14:09 ` [PATCH v4 3/6] cpufreq: powernv: Register for OCC related opal_message notification Shilpasri G Bhat
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Shilpasri G Bhat @ 2015-07-13 14:09 UTC (permalink / raw)
  To: rjw
  Cc: viresh.kumar, linuxppc-dev, linux-kernel, linux-pm,
	Shilpasri G Bhat, Stewart Smith

Add OPAL_MSG_OCC message definition to opal_message_type to receive
OCC events like reset, load and throttled. Host performance can be
affected when the OCC is reset or the OCC throttles the max Pstate.
We can register with opal_message_notifier to receive OPAL_MSG_OCC
messages and report them to userspace to keep the user informed about
the reason for a performance drop in workloads.

The reset and load OCC events are notified to the kernel when the FSP
sends OCC_RESET and OCC_LOAD commands. Both reset and load messages are
sent to the kernel on successful completion of the reset and load
operations respectively.

The throttle OCC event indicates that the Pmax of the chip is reduced.
The chip_id and the throttle reason for reducing Pmax are also queued
along with the message.

CC: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
---
Changes from v3:
- Patch '0d7cd8550d3 powerpc/powernv: Add opal-prd channel' already
  adds the definition of OPAL_MSG_PRD, so remove it from this patch and
  update the changelog.
- Move the definitions of OCC_RESET, OCC_LOAD and OCC_THROTTLE from 
  drivers/cpufreq/powernv-cpufreq.c to arch/powerpc/include/asm/opal-api.h
- Define OCC_MAX_THROTTLE_STATUS 
- Add a wrapper structure 'opal_occ_msg' to copy 'struct opal_msg.params[0..2]'
  This structure will define the parameters received from firmware to
  maintain compatibility for any future additions.

No change from v2

Change from v1:
- Update the commit changelog

 arch/powerpc/include/asm/opal-api.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index e9e4c52..64dc9f5 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -361,6 +361,7 @@ enum opal_msg_type {
 	OPAL_MSG_HMI_EVT,
 	OPAL_MSG_DPO,
 	OPAL_MSG_PRD,
+	OPAL_MSG_OCC,
 	OPAL_MSG_TYPE_MAX,
 };
 
@@ -700,6 +701,17 @@ struct opal_prd_msg_header {
 
 struct opal_prd_msg;
 
+#define OCC_RESET                       0
+#define OCC_LOAD                        1
+#define OCC_THROTTLE                    2
+#define OCC_MAX_THROTTLE_STATUS         5
+
+struct opal_occ_msg {
+	__be64 type;
+	__be64 chip;
+	__be64 throttle_status;
+};
+
 /*
  * SG entries
  *
-- 
1.9.3
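
OCC_MAX_THROTTLE_STATUS in patch 2/6 bounds the throttle_status field.
The sketch below shows one way a consumer might validate that field
before using it as a table index. The reason strings are the ones
listed in patch 3/6 of this series; the helper occ_throttle_reason() is
hypothetical, and the status is assumed to be in host byte order
already.

```c
#include <stddef.h>
#include <stdint.h>

#define OCC_MAX_THROTTLE_STATUS 5

/* Reason strings as listed in patch 3/6; the index is the throttle status. */
static const char *const throttle_reason[OCC_MAX_THROTTLE_STATUS + 1] = {
	"No throttling",
	"Power Cap",
	"Processor Over Temperature",
	"Power Supply Failure",
	"Over Current",
	"OCC Reset",
};

/*
 * Hypothetical helper: map a host-endian throttle status to its reason
 * string, or return NULL when the value is outside the known range.
 */
static const char *occ_throttle_reason(uint64_t status)
{
	if (status > OCC_MAX_THROTTLE_STATUS)
		return NULL;
	return throttle_reason[status];
}
```

Returning NULL for unknown values keeps the table lookup safe if
firmware ever reports a status this kernel does not know about.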



* [PATCH v4 3/6] cpufreq: powernv: Register for OCC related opal_message notification
  2015-07-13 14:09 [PATCH v4 0/6] powernv: cpufreq: Report frequency throttle by OCC Shilpasri G Bhat
  2015-07-13 14:09 ` [PATCH v4 1/6] cpufreq: powernv: Handle throttling due to Pmax capping at chip level Shilpasri G Bhat
  2015-07-13 14:09 ` [PATCH v4 2/6] powerpc/powernv: Add definition of OPAL_MSG_OCC message type Shilpasri G Bhat
@ 2015-07-13 14:09 ` Shilpasri G Bhat
  2015-07-15  6:17   ` Joel Stanley
  2015-07-13 14:10 ` [PATCH v4 4/6] cpufreq: powernv: Call throttle_check() on receiving OCC_THROTTLE Shilpasri G Bhat
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 10+ messages in thread
From: Shilpasri G Bhat @ 2015-07-13 14:09 UTC (permalink / raw)
  To: rjw; +Cc: viresh.kumar, linuxppc-dev, linux-kernel, linux-pm, Shilpasri G Bhat

OCC is an On-Chip-Controller which takes care of power and thermal
safety of the chip. At runtime, due to power failure or
overtemperature, the OCC may throttle the frequencies of the CPUs to
remain within the power budget.

We want the cpufreq driver to be aware of such situations to be able
to report the reason to the user. We register with
opal_message_notifier to receive OCC messages from OPAL.

powernv_cpufreq_throttle_check() reports any frequency throttling and
this patch will report the reason or event that caused throttling. We
can be throttled if OCC is reset or OCC limits Pmax due to power or
thermal reasons. We are also notified of unthrottling after an OCC
reset or if OCC restores Pmax on the chip.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
---
Changes from v3:
- Move the macro definitions of OCC_RESET, OCC_LOAD, OCC_THROTTLE to
  arch/powerpc/include/asm/opal-api.h
- Use 'struct opal_occ_msg' to copy the 'opal_msg->params[]' and refer
  the members of this structure in the code; Replace 'chip_id',
  'token' and 'reason' with omsg.chip, omsg.type, omsg.throttle_status
- Use OCC_MAX_THROTTLE_STATUS instead of the magic number.
- Add opal_message_notifier_unregister()

Changes from v2:
- Patch split in to multiple patches.
- This patch contains only the opal_message notification handler

Changes from v1:
- Add macros to define OCC_RESET, OCC_LOAD and OCC_THROTTLE
- Define a structure to store chip id, chip mask which has bits set
  for cpus present in the chip, throttled state and a work_struct.
- Modify powernv_cpufreq_throttle_check() to be called via smp_call()
- On Pmax throttling/unthrottling update 'chip.throttled' and not the
  global 'throttled' as Pmax capping is local to the chip.
- Remove the condition which checks if local pstate is less than Pmin
  while checking for Psafe frequency. When OCC becomes active after
  reset we update 'throttled' to false and when the cpufreq governor
  initiates a pstate change, the local pstate will be in Psafe and we
  will be reporting a false positive when we are not throttled.
- Schedule a kworker on receiving throttling/unthrottling OCC message
  for that chip and schedule on all chips after receiving active.
- After an OCC reset all the cpus will be in Psafe frequency. So call
  target() and restore the frequency to policy->cur after OCC_ACTIVE
  and Pmax unthrottling
- Taken care of Viresh and Preeti's comments.

 drivers/cpufreq/powernv-cpufreq.c | 71 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 70 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index d0c18c9..1f59958 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -33,6 +33,7 @@
 #include <asm/firmware.h>
 #include <asm/reg.h>
 #include <asm/smp.h> /* Required for cpu_sibling_mask() in UP configs */
+#include <asm/opal.h>
 
 #define POWERNV_MAX_PSTATES	256
 #define PMSR_PSAFE_ENABLE	(1UL << 30)
@@ -41,7 +42,7 @@
 #define PMSR_LP(x)		((x >> 48) & 0xFF)
 
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
-static bool rebooting, throttled;
+static bool rebooting, throttled, occ_reset;
 
 static struct chip {
 	unsigned int id;
@@ -414,6 +415,71 @@ static struct notifier_block powernv_cpufreq_reboot_nb = {
 	.notifier_call = powernv_cpufreq_reboot_notifier,
 };
 
+static char throttle_reason[][30] = {
+					"No throttling",
+					"Power Cap",
+					"Processor Over Temperature",
+					"Power Supply Failure",
+					"Over Current",
+					"OCC Reset"
+				     };
+
+static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
+				   unsigned long msg_type, void *_msg)
+{
+	struct opal_msg *msg = _msg;
+	struct opal_occ_msg omsg;
+
+	if (msg_type != OPAL_MSG_OCC)
+		return 0;
+
+	memcpy(&omsg, msg->params, sizeof(omsg));
+
+	switch (omsg.type) {
+	case OCC_RESET:
+		occ_reset = true;
+		/*
+		 * powernv_cpufreq_throttle_check() is called in
+		 * target() callback which can detect the throttle state
+		 * for governors like ondemand.
+		 * But static governors will not call target() often thus
+		 * report throttling here.
+		 */
+		if (!throttled) {
+			throttled = true;
+			pr_crit("CPU Frequency is throttled\n");
+		}
+		pr_info("OCC: Reset\n");
+		break;
+	case OCC_LOAD:
+		pr_info("OCC: Loaded\n");
+		break;
+	case OCC_THROTTLE:
+		if (occ_reset) {
+			occ_reset = false;
+			throttled = false;
+			pr_info("OCC: Active\n");
+			return 0;
+		}
+
+		if (omsg.throttle_status &&
+		    omsg.throttle_status <= OCC_MAX_THROTTLE_STATUS)
+			pr_info("OCC: Chip %u Pmax reduced due to %s\n",
+				(unsigned int)omsg.chip,
+				throttle_reason[omsg.throttle_status]);
+		else if (!omsg.throttle_status)
+			pr_info("OCC: Chip %u %s\n", (unsigned int)omsg.chip,
+				throttle_reason[omsg.throttle_status]);
+	}
+	return 0;
+}
+
+static struct notifier_block powernv_cpufreq_opal_nb = {
+	.notifier_call	= powernv_cpufreq_occ_msg,
+	.next		= NULL,
+	.priority	= 0,
+};
+
 static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
 {
 	struct powernv_smp_call_data freq_data;
@@ -481,6 +547,7 @@ static int __init powernv_cpufreq_init(void)
 		return rc;
 
 	register_reboot_notifier(&powernv_cpufreq_reboot_nb);
+	opal_message_notifier_register(OPAL_MSG_OCC, &powernv_cpufreq_opal_nb);
 	return cpufreq_register_driver(&powernv_cpufreq_driver);
 }
 module_init(powernv_cpufreq_init);
@@ -488,6 +555,8 @@ module_init(powernv_cpufreq_init);
 static void __exit powernv_cpufreq_exit(void)
 {
 	unregister_reboot_notifier(&powernv_cpufreq_reboot_nb);
+	opal_message_notifier_unregister(OPAL_MSG_OCC,
+					 &powernv_cpufreq_opal_nb);
 	cpufreq_unregister_driver(&powernv_cpufreq_driver);
 }
 module_exit(powernv_cpufreq_exit);
-- 
1.9.3
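
The handler in patch 3/6 treats the first OCC_THROTTLE message after an
OCC_RESET as "OCC active again" and every other OCC_THROTTLE as an
ordinary Pmax update. That sequencing can be sketched as a small state
machine. The struct occ_state, the occ_event values and occ_handle()
are hypothetical names used only for illustration; the message type
values match the definitions in patch 2/6.

```c
#include <stdbool.h>
#include <stdint.h>

enum { OCC_RESET = 0, OCC_LOAD = 1, OCC_THROTTLE = 2 };

/* Hypothetical container for the two global flags used by the handler. */
struct occ_state {
	bool occ_reset;	/* a reset was seen and the OCC is not yet active */
	bool throttled;	/* global throttle flag */
};

enum occ_event {
	EV_NONE,
	EV_RESET,
	EV_LOADED,
	EV_ACTIVE,	/* first OCC_THROTTLE after a reset */
	EV_PMAX_CHANGE,	/* ordinary per-chip throttle update */
};

/* Classify one message and update the flags the way the handler does. */
static enum occ_event occ_handle(struct occ_state *s, uint64_t type)
{
	switch (type) {
	case OCC_RESET:
		s->occ_reset = true;
		s->throttled = true;
		return EV_RESET;
	case OCC_LOAD:
		return EV_LOADED;
	case OCC_THROTTLE:
		if (s->occ_reset) {
			s->occ_reset = false;
			s->throttled = false;
			return EV_ACTIVE;
		}
		return EV_PMAX_CHANGE;
	default:
		return EV_NONE;
	}
}
```

A reset, load, throttle, throttle sequence therefore yields EV_RESET,
EV_LOADED, EV_ACTIVE and EV_PMAX_CHANGE in that order.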



* [PATCH v4 4/6] cpufreq: powernv: Call throttle_check() on receiving OCC_THROTTLE
  2015-07-13 14:09 [PATCH v4 0/6] powernv: cpufreq: Report frequency throttle by OCC Shilpasri G Bhat
                   ` (2 preceding siblings ...)
  2015-07-13 14:09 ` [PATCH v4 3/6] cpufreq: powernv: Register for OCC related opal_message notification Shilpasri G Bhat
@ 2015-07-13 14:10 ` Shilpasri G Bhat
  2015-07-13 14:10 ` [PATCH v4 5/6] cpufreq: powernv: Report Psafe only if PMSR.psafe_mode_active bit is set Shilpasri G Bhat
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Shilpasri G Bhat @ 2015-07-13 14:10 UTC (permalink / raw)
  To: rjw; +Cc: viresh.kumar, linuxppc-dev, linux-kernel, linux-pm, Shilpasri G Bhat

Re-evaluate the chip's throttled state on receiving an OCC_THROTTLE
notification by executing *throttle_check() on any one of the cpus on
the chip. This is a sanity check to verify that we were indeed
throttled/unthrottled after receiving the OCC_THROTTLE notification.

We cannot call *throttle_check() directly from the notification
handler because we could be handling chip1's notification on chip2. So
initiate an smp_call to execute *throttle_check(). We are irq-disabled
in the notification handler, so use a worker thread to smp_call
throttle_check() on any of the cpus in the chip's mask.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
---
Changes from v3:
- Refer to the members of 'struct opal_occ_msg' in the patch.
  Replace 'chip_id' with 'omsg.chip'

 drivers/cpufreq/powernv-cpufreq.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 1f59958..f2da30a 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -47,6 +47,8 @@ static bool rebooting, throttled, occ_reset;
 static struct chip {
 	unsigned int id;
 	bool throttled;
+	cpumask_t mask;
+	struct work_struct throttle;
 } *chips;
 
 static int nr_chips;
@@ -307,8 +309,9 @@ static inline unsigned int get_nominal_index(void)
 	return powernv_pstate_info.max - powernv_pstate_info.nominal;
 }
 
-static void powernv_cpufreq_throttle_check(unsigned int cpu)
+static void powernv_cpufreq_throttle_check(void *data)
 {
+	unsigned int cpu = smp_processor_id();
 	unsigned long pmsr;
 	int pmsr_pmax, pmsr_lp, i;
 
@@ -370,7 +373,7 @@ static int powernv_cpufreq_target_index(struct cpufreq_policy *policy,
 		return 0;
 
 	if (!throttled)
-		powernv_cpufreq_throttle_check(smp_processor_id());
+		powernv_cpufreq_throttle_check(NULL);
 
 	freq_data.pstate_id = powernv_freqs[new_index].driver_data;
 
@@ -415,6 +418,14 @@ static struct notifier_block powernv_cpufreq_reboot_nb = {
 	.notifier_call = powernv_cpufreq_reboot_notifier,
 };
 
+void powernv_cpufreq_work_fn(struct work_struct *work)
+{
+	struct chip *chip = container_of(work, struct chip, throttle);
+
+	smp_call_function_any(&chip->mask,
+			      powernv_cpufreq_throttle_check, NULL, 0);
+}
+
 static char throttle_reason[][30] = {
 					"No throttling",
 					"Power Cap",
@@ -429,6 +440,7 @@ static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
 {
 	struct opal_msg *msg = _msg;
 	struct opal_occ_msg omsg;
+	int i;
 
 	if (msg_type != OPAL_MSG_OCC)
 		return 0;
@@ -459,6 +471,10 @@ static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
 			occ_reset = false;
 			throttled = false;
 			pr_info("OCC: Active\n");
+
+			for (i = 0; i < nr_chips; i++)
+				schedule_work(&chips[i].throttle);
+
 			return 0;
 		}
 
@@ -470,6 +486,12 @@ static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
 		else if (!omsg.throttle_status)
 			pr_info("OCC: Chip %u %s\n", (unsigned int)omsg.chip,
 				throttle_reason[omsg.throttle_status]);
+		else
+			return 0;
+
+		for (i = 0; i < nr_chips; i++)
+			if (chips[i].id == omsg.chip)
+				schedule_work(&chips[i].throttle);
 	}
 	return 0;
 }
@@ -521,6 +543,8 @@ static int init_chip_info(void)
 	for (i = 0; i < nr_chips; i++) {
 		chips[i].id = chip[i];
 		chips[i].throttled = false;
+		cpumask_copy(&chips[i].mask, cpumask_of_node(chip[i]));
+		INIT_WORK(&chips[i].throttle, powernv_cpufreq_work_fn);
 	}
 
 	return 0;
-- 
1.9.3
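
The hand-off in patch 4/6 (the notifier only marks work for a chip, and
a worker performs the check later) can be modeled in plain C. This is a
single-threaded userspace sketch, not kernel code: chip_work,
queue_check() and run_pending() are hypothetical names, and a boolean
flag stands in for the work_struct.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-chip record; 'pending' stands in for a work_struct. */
struct chip_work {
	unsigned int id;
	bool pending;
	unsigned int checks_run;
};

/*
 * "Notifier" side: cheap, never performs the check itself.
 * Returns false when no chip with that id is known.
 */
static bool queue_check(struct chip_work *chips, size_t n, unsigned int chip_id)
{
	for (size_t i = 0; i < n; i++) {
		if (chips[i].id == chip_id) {
			chips[i].pending = true;
			return true;
		}
	}
	return false;
}

/*
 * "Worker" side: runs later, in a context where the per-chip check is
 * allowed. Returns the number of chips that were checked.
 */
static unsigned int run_pending(struct chip_work *chips, size_t n)
{
	unsigned int ran = 0;

	for (size_t i = 0; i < n; i++) {
		if (!chips[i].pending)
			continue;
		chips[i].pending = false;
		chips[i].checks_run++;	/* stands in for the throttle check */
		ran++;
	}
	return ran;
}
```

In this model, two notifications for the same chip that arrive before
the worker runs collapse into a single check.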



* [PATCH v4 5/6] cpufreq: powernv: Report Psafe only if PMSR.psafe_mode_active bit is set
  2015-07-13 14:09 [PATCH v4 0/6] powernv: cpufreq: Report frequency throttle by OCC Shilpasri G Bhat
                   ` (3 preceding siblings ...)
  2015-07-13 14:10 ` [PATCH v4 4/6] cpufreq: powernv: Call throttle_check() on receiving OCC_THROTTLE Shilpasri G Bhat
@ 2015-07-13 14:10 ` Shilpasri G Bhat
  2015-07-13 14:10 ` [PATCH v4 6/6] cpufreq: powernv: Restore cpu frequency to policy->cur on unthrottling Shilpasri G Bhat
  2015-07-16  5:14 ` [PATCH v4 0/6] powernv: cpufreq: Report frequency throttle by OCC Viresh Kumar
  6 siblings, 0 replies; 10+ messages in thread
From: Shilpasri G Bhat @ 2015-07-13 14:10 UTC (permalink / raw)
  To: rjw; +Cc: viresh.kumar, linuxppc-dev, linux-kernel, linux-pm, Shilpasri G Bhat

On a reset cycle of OCC, although the system retires from safe
frequency state the local pstate is not restored to Pmin or last
requested pstate. Now if the cpufreq governor initiates a pstate
change, the local pstate will be in Psafe and we will be reporting a
false positive when we are not throttled.

So in powernv_cpufreq_throttle_check() remove the condition which
checks if local pstate is less than Pmin while checking for Psafe
frequency. If the cpus are forced to Psafe then PMSR.psafe_mode_active
bit will be set. So, when OCCs become active this bit will be cleared.
Let us just rely on this bit for reporting throttling.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---
No changes from v3

 drivers/cpufreq/powernv-cpufreq.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index f2da30a..d6d7e68 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -39,7 +39,6 @@
 #define PMSR_PSAFE_ENABLE	(1UL << 30)
 #define PMSR_SPR_EM_DISABLE	(1UL << 31)
 #define PMSR_MAX(x)		((x >> 32) & 0xFF)
-#define PMSR_LP(x)		((x >> 48) & 0xFF)
 
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
 static bool rebooting, throttled, occ_reset;
@@ -313,7 +312,7 @@ static void powernv_cpufreq_throttle_check(void *data)
 {
 	unsigned int cpu = smp_processor_id();
 	unsigned long pmsr;
-	int pmsr_pmax, pmsr_lp, i;
+	int pmsr_pmax, i;
 
 	pmsr = get_pmspr(SPRN_PMSR);
 
@@ -335,14 +334,9 @@ static void powernv_cpufreq_throttle_check(void *data)
 			chips[i].id, pmsr_pmax);
 	}
 
-	/*
-	 * Check for Psafe by reading LocalPstate
-	 * or check if Psafe_mode_active is set in PMSR.
-	 */
+	/* Check if Psafe_mode_active is set in PMSR. */
 next:
-	pmsr_lp = (s8)PMSR_LP(pmsr);
-	if ((pmsr_lp < powernv_pstate_info.min) ||
-				(pmsr & PMSR_PSAFE_ENABLE)) {
+	if (pmsr & PMSR_PSAFE_ENABLE) {
 		throttled = true;
 		pr_info("Pstate set to safe frequency\n");
 	}
-- 
1.9.3
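
After patch 5/6 the throttle check relies on two PMSR fields only: the
Pmax byte and the Psafe bit. A userspace sketch of extracting them is
shown below. The bit positions are copied from the macros in this
series; the helper names pmsr_pmax() and pmsr_psafe_active() are
hypothetical, and 1ULL is used instead of the driver's 1UL so that the
sketch also builds on 32-bit hosts.

```c
#include <stdbool.h>
#include <stdint.h>

#define PMSR_PSAFE_ENABLE	(1ULL << 30)
#define PMSR_MAX(x)		(((x) >> 32) & 0xFF)

/*
 * The driver casts the extracted byte to s8, so mirror that here.
 * The narrowing cast is implementation-defined in C for values above
 * 127; common compilers wrap, giving two's-complement sign extension.
 */
static int pmsr_pmax(uint64_t pmsr)
{
	return (int8_t)PMSR_MAX(pmsr);
}

/* True when the Psafe bit is set in the given PMSR value. */
static bool pmsr_psafe_active(uint64_t pmsr)
{
	return (pmsr & PMSR_PSAFE_ENABLE) != 0;
}
```

For example, a PMSR value whose Pmax byte is 0xFD decodes to a Pmax of
-3 on such compilers.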



* [PATCH v4 6/6] cpufreq: powernv: Restore cpu frequency to policy->cur on unthrottling
  2015-07-13 14:09 [PATCH v4 0/6] powernv: cpufreq: Report frequency throttle by OCC Shilpasri G Bhat
                   ` (4 preceding siblings ...)
  2015-07-13 14:10 ` [PATCH v4 5/6] cpufreq: powernv: Report Psafe only if PMSR.psafe_mode_active bit is set Shilpasri G Bhat
@ 2015-07-13 14:10 ` Shilpasri G Bhat
  2015-07-16  5:14 ` [PATCH v4 0/6] powernv: cpufreq: Report frequency throttle by OCC Viresh Kumar
  6 siblings, 0 replies; 10+ messages in thread
From: Shilpasri G Bhat @ 2015-07-13 14:10 UTC (permalink / raw)
  To: rjw; +Cc: viresh.kumar, linuxppc-dev, linux-kernel, linux-pm, Shilpasri G Bhat

If the frequency is throttled due to an OCC reset then the cpus will be
in Psafe frequency, so restore the frequency on all cpus to policy->cur
when the OCCs are active again. If the frequency is throttled due to
Pmax capping then restore the frequency of all the cpus in the chip on
unthrottling.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
---
Changes from v3:
- Refer to the members of 'struct opal_occ_msg' in the patch.
  Replace 'reason' with 'omsg.throttle_status'

 drivers/cpufreq/powernv-cpufreq.c | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index d6d7e68..824141a 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -48,6 +48,7 @@ static struct chip {
 	bool throttled;
 	cpumask_t mask;
 	struct work_struct throttle;
+	bool restore;
 } *chips;
 
 static int nr_chips;
@@ -415,9 +416,29 @@ static struct notifier_block powernv_cpufreq_reboot_nb = {
 void powernv_cpufreq_work_fn(struct work_struct *work)
 {
 	struct chip *chip = container_of(work, struct chip, throttle);
+	unsigned int cpu;
+	cpumask_var_t mask;
 
 	smp_call_function_any(&chip->mask,
 			      powernv_cpufreq_throttle_check, NULL, 0);
+
+	if (!chip->restore)
+		return;
+
+	chip->restore = false;
+	cpumask_copy(mask, &chip->mask);
+	for_each_cpu_and(cpu, mask, cpu_online_mask) {
+		int index, tcpu;
+		struct cpufreq_policy policy;
+
+		cpufreq_get_policy(&policy, cpu);
+		cpufreq_frequency_table_target(&policy, policy.freq_table,
+					       policy.cur,
+					       CPUFREQ_RELATION_C, &index);
+		powernv_cpufreq_target_index(&policy, index);
+		for_each_cpu(tcpu, policy.cpus)
+			cpumask_clear_cpu(tcpu, mask);
+	}
 }
 
 static char throttle_reason[][30] = {
@@ -466,8 +487,10 @@ static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
 			throttled = false;
 			pr_info("OCC: Active\n");
 
-			for (i = 0; i < nr_chips; i++)
+			for (i = 0; i < nr_chips; i++) {
+				chips[i].restore = true;
 				schedule_work(&chips[i].throttle);
+			}
 
 			return 0;
 		}
@@ -484,8 +507,11 @@ static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
 			return 0;
 
 		for (i = 0; i < nr_chips; i++)
-			if (chips[i].id == omsg.chip)
+			if (chips[i].id == omsg.chip) {
+				if (!omsg.throttle_status)
+					chips[i].restore = true;
 				schedule_work(&chips[i].throttle);
+			}
 	}
 	return 0;
 }
@@ -539,6 +565,7 @@ static int init_chip_info(void)
 		chips[i].throttled = false;
 		cpumask_copy(&chips[i].mask, cpumask_of_node(chip[i]));
 		INIT_WORK(&chips[i].throttle, powernv_cpufreq_work_fn);
+		chips[i].restore = false;
 	}
 
 	return 0;
-- 
1.9.3
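
The restore loop in patch 6/6 walks the online cpus of a chip and issues
one frequency request per policy, clearing the cpus that share that
policy so they are not visited again. The sketch below models this with
64-bit bitmasks in userspace. restore_chip() and the policy_cpus[]
array are hypothetical stand-ins for the cpumask and policy->cpus
handling, and the sketch is limited to 64 cpus.

```c
#include <stdint.h>

/*
 * policy_cpus[cpu] is the bitmask of cpus sharing a policy with 'cpu'
 * (a stand-in for policy->cpus). Returns the number of restore
 * requests that would be issued for the chip.
 */
static unsigned int restore_chip(uint64_t chip_mask, uint64_t online_mask,
				 const uint64_t *policy_cpus,
				 unsigned int nr_cpus)
{
	uint64_t todo = chip_mask & online_mask;
	unsigned int restores = 0;

	for (unsigned int cpu = 0; cpu < nr_cpus && cpu < 64; cpu++) {
		if (!(todo & (1ULL << cpu)))
			continue;
		restores++;			/* one target request per policy */
		todo &= ~policy_cpus[cpu];	/* skip cpus under the same policy */
	}
	return restores;
}
```

With eight cpus grouped into two policies of four cpus each, a fully
online chip needs two restore requests instead of eight.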



* Re: [PATCH v4 3/6] cpufreq: powernv: Register for OCC related opal_message notification
  2015-07-13 14:09 ` [PATCH v4 3/6] cpufreq: powernv: Register for OCC related opal_message notification Shilpasri G Bhat
@ 2015-07-15  6:17   ` Joel Stanley
  2015-07-15 10:15     ` Shilpasri G Bhat
  0 siblings, 1 reply; 10+ messages in thread
From: Joel Stanley @ 2015-07-15  6:17 UTC (permalink / raw)
  To: Shilpasri G Bhat; +Cc: rjw, viresh.kumar, linux-pm, linux-kernel, linuxppc-dev

Hello,

On Mon, 2015-07-13 at 19:39 +0530, Shilpasri G Bhat wrote:
> diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
> index d0c18c9..1f59958 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -414,6 +415,71 @@ static struct notifier_block powernv_cpufreq_reboot_nb = {
>  	.notifier_call = powernv_cpufreq_reboot_notifier,
>  };
>  
> +static char throttle_reason[][30] = {
> +					"No throttling",
> +					"Power Cap",
> +					"Processor Over Temperature",
> +					"Power Supply Failure",
> +					"Over Current",
> +					"OCC Reset"
> +				     };
> +
> +static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
> +				   unsigned long msg_type, void *_msg)
> +{
> +	struct opal_msg *msg = _msg;
> +	struct opal_occ_msg omsg;
> +
> +	if (msg_type != OPAL_MSG_OCC)
> +		return 0;
> +
> +	memcpy(&omsg, msg->params, sizeof(omsg));

You need to ensure the members of struct opal_occ_msg are in the
correct byte order when copying them over.

Have you tested this code in a little endian configuration?

Do the messages you're sending make sense for a system that has a BMC
instead of a FSP?

Cheers,

Joel

> +
> +	switch (omsg.type) {



* Re: [PATCH v4 3/6] cpufreq: powernv: Register for OCC related opal_message notification
  2015-07-15  6:17   ` Joel Stanley
@ 2015-07-15 10:15     ` Shilpasri G Bhat
  0 siblings, 0 replies; 10+ messages in thread
From: Shilpasri G Bhat @ 2015-07-15 10:15 UTC (permalink / raw)
  To: Joel Stanley; +Cc: rjw, viresh.kumar, linux-pm, linux-kernel, linuxppc-dev

Hi Joel,

On 07/15/2015 11:47 AM, Joel Stanley wrote:
> Hello,
> 
> On Mon, 2015-07-13 at 19:39 +0530, Shilpasri G Bhat wrote:
>> diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
>> index d0c18c9..1f59958 100644
>> --- a/drivers/cpufreq/powernv-cpufreq.c
>> +++ b/drivers/cpufreq/powernv-cpufreq.c
>> @@ -414,6 +415,71 @@ static struct notifier_block powernv_cpufreq_reboot_nb = {
>>  	.notifier_call = powernv_cpufreq_reboot_notifier,
>>  };
>>  
>> +static char throttle_reason[][30] = {
>> +					"No throttling",
>> +					"Power Cap",
>> +					"Processor Over Temperature",
>> +					"Power Supply Failure",
>> +					"Over Current",
>> +					"OCC Reset"
>> +				     };
>> +
>> +static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
>> +				   unsigned long msg_type, void *_msg)
>> +{
>> +	struct opal_msg *msg = _msg;
>> +	struct opal_occ_msg omsg;
>> +
>> +	if (msg_type != OPAL_MSG_OCC)
>> +		return 0;
>> +
>> +	memcpy(&omsg, msg->params, sizeof(omsg));
> 
> You need to ensure the members of struct opal_occ_msg are in the
> correct byte order when copying them over.
> 
> Have you tested this code in a little endian configuration?

Ah yes, this won't work in LE.
I tested the below diff in both BE/LE configurations on a Power8 box which has an FSP.

-       memcpy(&omsg, msg->params, sizeof(omsg));
+       omsg.type = be64_to_cpu(msg->params[0]);
+       omsg.chip = be64_to_cpu(msg->params[1]);
+       omsg.throttle_status = be64_to_cpu(msg->params[2]);
> 
> Do the messages you're sending make sense for a system that has a BMC
> instead of a FSP?

For a system with BMC, only OCC_THROTTLE will be received by the host. The
remaining two (OCC_RESET and OCC_LOAD) are sent only in FSP based systems.
OCC_THROTTLE is sent by opal which polls on the throttle_status byte in the
OPAL-OCC shared memory region.

> 
> Cheers,
> 
> Joel
> 

Thanks and Regards,
Shilpa
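
The byte-order issue discussed in this reply can be illustrated outside
the kernel. The sketch below decodes three big-endian 64-bit parameters
from raw bytes so that the result is the same on little-endian and
big-endian hosts. load_be64(), struct occ_msg_host and decode_occ_msg()
are hypothetical names; in the kernel the equivalent conversion is the
be64_to_cpu() call shown in the diff above.

```c
#include <stdint.h>

/* Read a big-endian 64-bit value from raw bytes, independent of host order. */
static uint64_t load_be64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Hypothetical host-endian view of the three message parameters. */
struct occ_msg_host {
	uint64_t type;
	uint64_t chip;
	uint64_t throttle_status;
};

/* Decode params[0..2], each stored as 8 big-endian bytes. */
static struct occ_msg_host decode_occ_msg(const unsigned char params[24])
{
	struct occ_msg_host m;

	m.type = load_be64(params);
	m.chip = load_be64(params + 8);
	m.throttle_status = load_be64(params + 16);
	return m;
}
```

A plain memcpy() of the same 24 bytes would give these values only on a
big-endian host, which is why the original code misbehaved in LE.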



* Re: [PATCH v4 0/6] powernv: cpufreq: Report frequency throttle by OCC
  2015-07-13 14:09 [PATCH v4 0/6] powernv: cpufreq: Report frequency throttle by OCC Shilpasri G Bhat
                   ` (5 preceding siblings ...)
  2015-07-13 14:10 ` [PATCH v4 6/6] cpufreq: powernv: Restore cpu frequency to policy->cur on unthrottling Shilpasri G Bhat
@ 2015-07-16  5:14 ` Viresh Kumar
  6 siblings, 0 replies; 10+ messages in thread
From: Viresh Kumar @ 2015-07-16  5:14 UTC (permalink / raw)
  To: Shilpasri G Bhat; +Cc: rjw, linuxppc-dev, linux-kernel, linux-pm

On 13-07-15, 19:39, Shilpasri G Bhat wrote:
> This patchset intends to add frequency throttle reporting mechanism
> to powernv-cpufreq driver when OCC throttles the frequency. OCC is an
> On-Chip-Controller which takes care of the power and thermal safety of
> the chip. The CPU frequency can be throttled during an OCC reset or
> when OCC tries to limit the max allowed frequency. The patchset will
> report such conditions so as to keep the user informed about reason
> for the drop in performance of workloads when frequency is throttled.
> 
> Changes from v3:
> - Rebased on top of 4.2-rc1
> - Minor changes in patch 2,3,4,6 this does not change the
>   functionality of the code
> - 594fcb9ec9e powerpc/powernv: Expose OPAL APIs required by PRD
>   interface , this patch fixes the build error due to which this
>   series was initially dropped
>   ERROR: ".opal_message_notifier_register"
>   drivers/cpufreq/powernv-cpufreq.ko] undefined!

I have already Acked v3 of this and that applies to this one as well.

-- 
viresh


Thread overview: 10+ messages
2015-07-13 14:09 [PATCH v4 0/6] powernv: cpufreq: Report frequency throttle by OCC Shilpasri G Bhat
2015-07-13 14:09 ` [PATCH v4 1/6] cpufreq: powernv: Handle throttling due to Pmax capping at chip level Shilpasri G Bhat
2015-07-13 14:09 ` [PATCH v4 2/6] powerpc/powernv: Add definition of OPAL_MSG_OCC message type Shilpasri G Bhat
2015-07-13 14:09 ` [PATCH v4 3/6] cpufreq: powernv: Register for OCC related opal_message notification Shilpasri G Bhat
2015-07-15  6:17   ` Joel Stanley
2015-07-15 10:15     ` Shilpasri G Bhat
2015-07-13 14:10 ` [PATCH v4 4/6] cpufreq: powernv: Call throttle_check() on receiving OCC_THROTTLE Shilpasri G Bhat
2015-07-13 14:10 ` [PATCH v4 5/6] cpufreq: powernv: Report Psafe only if PMSR.psafe_mode_active bit is set Shilpasri G Bhat
2015-07-13 14:10 ` [PATCH v4 6/6] cpufreq: powernv: Restore cpu frequency to policy->cur on unthrottling Shilpasri G Bhat
2015-07-16  5:14 ` [PATCH v4 0/6] powernv: cpufreq: Report frequency throttle by OCC Viresh Kumar
