linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/12] Add throttler driver for non-thermal throttling
@ 2018-06-14 19:47 Matthias Kaehlcke
  2018-06-14 19:47 ` [PATCH v3 01/12] PM / devfreq: Init user limits from OPP limits, not viceversa Matthias Kaehlcke
                   ` (11 more replies)
  0 siblings, 12 replies; 23+ messages in thread
From: Matthias Kaehlcke @ 2018-06-14 19:47 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Kyungmin Park, Chanwoo Choi, Arnd Bergmann, Greg Kroah-Hartman,
	Rob Herring, Mark Rutland, linux-pm, devicetree, linux-kernel,
	Brian Norris, Douglas Anderson, Enric Balletbo i Serra,
	Rafael J . Wysocki, Viresh Kumar, Lee Jones, Matthias Kaehlcke

This series adds the throttler driver, for non-thermal throttling of
CPUs and devfreq devices. A use case for non-thermal throttling could
be the detection of a high battery discharge voltage, close to the
over-current protection (OCP) limit of the battery.

To support throttling of devfreq devices the series introduces the
concept of a devfreq policy and the DEVFREQ_ADJUST notifier (similar
to CPUFREQ_ADJUST). Further it includes some related devfreq bugfixes
and improvements that change some of the code that is also touched
by the policy changes.

Matthias Kaehlcke (12):
  PM / devfreq: Init user limits from OPP limits, not viceversa
  PM / devfreq: Fix handling of min/max_freq == 0
  PM / devfreq: Don't adjust to user limits in governors
  PM / devfreq: Add struct devfreq_policy
  PM / devfreg: Add support for policy notifiers
  PM / devfreq: Make update_devfreq() public
  PM / devfreq: export devfreq_class
  cpufreq: Add stub for cpufreq_update_policy()
  dt-bindings: PM / OPP: add opp-throttlers property
  misc: throttler: Add core support for non-thermal throttling
  misc: throttler: Add Chrome OS EC throttler
  mfd: cros_ec: Add throttler sub-device

 Documentation/devicetree/bindings/opp/opp.txt |   3 +
 MAINTAINERS                                   |   7 +
 drivers/devfreq/devfreq.c                     | 222 +++---
 drivers/devfreq/governor.h                    |   6 +-
 drivers/devfreq/governor_passive.c            |   4 +-
 drivers/devfreq/governor_performance.c        |   5 +-
 drivers/devfreq/governor_powersave.c          |   2 +-
 drivers/devfreq/governor_simpleondemand.c     |  12 +-
 drivers/devfreq/governor_userspace.c          |  16 +-
 drivers/mfd/cros_ec.c                         |  16 +
 drivers/misc/Kconfig                          |   1 +
 drivers/misc/Makefile                         |   1 +
 drivers/misc/throttler/Kconfig                |  33 +
 drivers/misc/throttler/Makefile               |   2 +
 drivers/misc/throttler/core.c                 | 687 ++++++++++++++++++
 drivers/misc/throttler/cros_ec_throttler.c    | 109 +++
 include/linux/cpufreq.h                       |   1 +
 include/linux/devfreq.h                       | 113 ++-
 include/linux/throttler.h                     |  21 +
 19 files changed, 1136 insertions(+), 125 deletions(-)
 create mode 100644 drivers/misc/throttler/Kconfig
 create mode 100644 drivers/misc/throttler/Makefile
 create mode 100644 drivers/misc/throttler/core.c
 create mode 100644 drivers/misc/throttler/cros_ec_throttler.c
 create mode 100644 include/linux/throttler.h

-- 
2.18.0.rc1.242.g61856ae69a-goog


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 01/12] PM / devfreq: Init user limits from OPP limits, not viceversa
  2018-06-14 19:47 [PATCH v3 00/12] Add throttler driver for non-thermal throttling Matthias Kaehlcke
@ 2018-06-14 19:47 ` Matthias Kaehlcke
  2018-06-15 21:06   ` Brian Norris
  2018-06-14 19:47 ` [PATCH v3 02/12] PM / devfreq: Fix handling of min/max_freq == 0 Matthias Kaehlcke
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Matthias Kaehlcke @ 2018-06-14 19:47 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Kyungmin Park, Chanwoo Choi, Arnd Bergmann, Greg Kroah-Hartman,
	Rob Herring, Mark Rutland, linux-pm, devicetree, linux-kernel,
	Brian Norris, Douglas Anderson, Enric Balletbo i Serra,
	Rafael J . Wysocki, Viresh Kumar, Lee Jones, Matthias Kaehlcke

Commit ab8f58ad72c4 ("PM / devfreq: Set min/max_freq when adding
the devfreq device") introduced the initialization of the user
limits min/max_freq from the lowest/highest available OPPs. Later
commit f1d981eaecf8 ("PM / devfreq: Use the available min/max
frequency") added scaling_min/max_freq, which actually represent
the frequencies of the lowest/highest available OPP. scaling_min/
max_freq are initialized with the values from min/max_freq, which
is totally correct in the context, but a bit awkward to read.

Swap the initialization and assign scaling_min/max_freq with the
OPP freqs and then the user limts min/max_freq with scaling_min/
max_freq.

Needless to say that this change is a NOP, intended to improve
readability.
---
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>

Changes in v3:
- none

Changes in v2:
- added 'Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>' tag

 drivers/devfreq/devfreq.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index fe2af6aa88fc..0057ef5b0a98 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -604,21 +604,21 @@ struct devfreq *devfreq_add_device(struct device *dev,
 		mutex_lock(&devfreq->lock);
 	}
 
-	devfreq->min_freq = find_available_min_freq(devfreq);
-	if (!devfreq->min_freq) {
+	devfreq->scaling_min_freq = find_available_min_freq(devfreq);
+	if (!devfreq->scaling_min_freq) {
 		mutex_unlock(&devfreq->lock);
 		err = -EINVAL;
 		goto err_dev;
 	}
-	devfreq->scaling_min_freq = devfreq->min_freq;
+	devfreq->min_freq = devfreq->scaling_min_freq;
 
-	devfreq->max_freq = find_available_max_freq(devfreq);
-	if (!devfreq->max_freq) {
+	devfreq->scaling_max_freq = find_available_max_freq(devfreq);
+	if (!devfreq->scaling_max_freq) {
 		mutex_unlock(&devfreq->lock);
 		err = -EINVAL;
 		goto err_dev;
 	}
-	devfreq->scaling_max_freq = devfreq->max_freq;
+	devfreq->max_freq = devfreq->scaling_max_freq;
 
 	dev_set_name(&devfreq->dev, "devfreq%d",
 				atomic_inc_return(&devfreq_no));
-- 
2.18.0.rc1.242.g61856ae69a-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 02/12] PM / devfreq: Fix handling of min/max_freq == 0
  2018-06-14 19:47 [PATCH v3 00/12] Add throttler driver for non-thermal throttling Matthias Kaehlcke
  2018-06-14 19:47 ` [PATCH v3 01/12] PM / devfreq: Init user limits from OPP limits, not viceversa Matthias Kaehlcke
@ 2018-06-14 19:47 ` Matthias Kaehlcke
  2018-06-18 20:12   ` Brian Norris
  2018-06-14 19:47 ` [PATCH v3 03/12] PM / devfreq: Don't adjust to user limits in governors Matthias Kaehlcke
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Matthias Kaehlcke @ 2018-06-14 19:47 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Kyungmin Park, Chanwoo Choi, Arnd Bergmann, Greg Kroah-Hartman,
	Rob Herring, Mark Rutland, linux-pm, devicetree, linux-kernel,
	Brian Norris, Douglas Anderson, Enric Balletbo i Serra,
	Rafael J . Wysocki, Viresh Kumar, Lee Jones, Matthias Kaehlcke

Commit ab8f58ad72c4 ("PM / devfreq: Set min/max_freq when adding the
devfreq device") initializes df->min/max_freq with the min/max OPP when
the device is added. Later commit f1d981eaecf8 ("PM / devfreq: Use the
available min/max frequency") adds df->scaling_min/max_freq and the
following to the frequency adjustment code:

  max_freq = MIN(devfreq->scaling_max_freq, devfreq->max_freq);

With the current handling of min/max_freq this is incorrect:

Even though df->max_freq is now initialized to a value != 0 user space
can still set it to 0, in this case max_freq would be 0 instead of
df->scaling_max_freq as intended. In consequence the frequency adjustment
is not performed:

  if (max_freq && freq > max_freq) {
	freq = max_freq;

To fix this set df->min/max freq to the min/max OPP in max/max_freq_store,
when the user passes a value of 0. This also prevents df->max_freq from
being set below the min OPP when df->min_freq is 0, and similar for
min_freq. Since it is now guaranteed that df->min/max_freq can't be 0 the
checks for this case can be removed.

Fixes: f1d981eaecf8 ("PM / devfreq: Use the available min/max frequency")
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
---
Changes in v3:
- none

Changes in v2:
- handle freq tables sorted in ascending and descending order in
  min/max_freq_store()
- use same order for conditional statements in min/max_freq_store()

 drivers/devfreq/devfreq.c | 42 ++++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 0057ef5b0a98..6f604f8b2b81 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -283,11 +283,11 @@ int update_devfreq(struct devfreq *devfreq)
 	max_freq = MIN(devfreq->scaling_max_freq, devfreq->max_freq);
 	min_freq = MAX(devfreq->scaling_min_freq, devfreq->min_freq);
 
-	if (min_freq && freq < min_freq) {
+	if (freq < min_freq) {
 		freq = min_freq;
 		flags &= ~DEVFREQ_FLAG_LEAST_UPPER_BOUND; /* Use GLB */
 	}
-	if (max_freq && freq > max_freq) {
+	if (freq > max_freq) {
 		freq = max_freq;
 		flags |= DEVFREQ_FLAG_LEAST_UPPER_BOUND; /* Use LUB */
 	}
@@ -1122,18 +1122,27 @@ static ssize_t min_freq_store(struct device *dev, struct device_attribute *attr,
 {
 	struct devfreq *df = to_devfreq(dev);
 	unsigned long value;
+	unsigned long *freq_table;
 	int ret;
-	unsigned long max;
 
 	ret = sscanf(buf, "%lu", &value);
 	if (ret != 1)
 		return -EINVAL;
 
 	mutex_lock(&df->lock);
-	max = df->max_freq;
-	if (value && max && value > max) {
-		ret = -EINVAL;
-		goto unlock;
+
+	if (value) {
+		if (value > df->max_freq) {
+			ret = -EINVAL;
+			goto unlock;
+		}
+	} else {
+		freq_table = df->profile->freq_table;
+		/* typical order is ascending, some drivers use descending */
+		if (freq_table[0] < freq_table[df->profile->max_state - 1])
+			value = freq_table[0];
+		else
+			value = freq_table[df->profile->max_state - 1];
 	}
 
 	df->min_freq = value;
@@ -1157,18 +1166,27 @@ static ssize_t max_freq_store(struct device *dev, struct device_attribute *attr,
 {
 	struct devfreq *df = to_devfreq(dev);
 	unsigned long value;
+	unsigned long *freq_table;
 	int ret;
-	unsigned long min;
 
 	ret = sscanf(buf, "%lu", &value);
 	if (ret != 1)
 		return -EINVAL;
 
 	mutex_lock(&df->lock);
-	min = df->min_freq;
-	if (value && min && value < min) {
-		ret = -EINVAL;
-		goto unlock;
+
+	if (value) {
+		if (value < df->min_freq) {
+			ret = -EINVAL;
+			goto unlock;
+		}
+	} else {
+		freq_table = df->profile->freq_table;
+		/* typical order is ascending, some drivers use descending */
+		if (freq_table[0] < freq_table[df->profile->max_state - 1])
+			value = freq_table[df->profile->max_state - 1];
+		else
+			value = freq_table[0];
 	}
 
 	df->max_freq = value;
-- 
2.18.0.rc1.242.g61856ae69a-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 03/12] PM / devfreq: Don't adjust to user limits in governors
  2018-06-14 19:47 [PATCH v3 00/12] Add throttler driver for non-thermal throttling Matthias Kaehlcke
  2018-06-14 19:47 ` [PATCH v3 01/12] PM / devfreq: Init user limits from OPP limits, not viceversa Matthias Kaehlcke
  2018-06-14 19:47 ` [PATCH v3 02/12] PM / devfreq: Fix handling of min/max_freq == 0 Matthias Kaehlcke
@ 2018-06-14 19:47 ` Matthias Kaehlcke
  2018-06-14 19:47 ` [PATCH v3 04/12] PM / devfreq: Add struct devfreq_policy Matthias Kaehlcke
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Matthias Kaehlcke @ 2018-06-14 19:47 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Kyungmin Park, Chanwoo Choi, Arnd Bergmann, Greg Kroah-Hartman,
	Rob Herring, Mark Rutland, linux-pm, devicetree, linux-kernel,
	Brian Norris, Douglas Anderson, Enric Balletbo i Serra,
	Rafael J . Wysocki, Viresh Kumar, Lee Jones, Matthias Kaehlcke

Several governors use the user space limits df->min/max_freq to adjust
the target frequency. This is not necessary, since update_devfreq()
already takes care of this. Instead the governor can request the available
min/max frequency by setting the target frequency to DEVFREQ_MIN/MAX_FREQ
and let update_devfreq() take care of any adjustments.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
---
Changes in v3:
- none

Changes in v2:
- squashed "PM / devfreq: Remove redundant frequency adjustment from governors"
  and "PM / devfreq: governors: Return device frequency limits instead of user
  limits"
- updated subject and commit message
- use DEVFREQ_MIN/MAX_FREQ instead of df->scaling_min/max_freq

 drivers/devfreq/governor.h                |  3 +++
 drivers/devfreq/governor_performance.c    |  5 +----
 drivers/devfreq/governor_powersave.c      |  2 +-
 drivers/devfreq/governor_simpleondemand.c | 12 +++---------
 drivers/devfreq/governor_userspace.c      | 16 ++++------------
 5 files changed, 12 insertions(+), 26 deletions(-)

diff --git a/drivers/devfreq/governor.h b/drivers/devfreq/governor.h
index cfc50a61a90d..b81700244ce3 100644
--- a/drivers/devfreq/governor.h
+++ b/drivers/devfreq/governor.h
@@ -25,6 +25,9 @@
 #define DEVFREQ_GOV_SUSPEND			0x4
 #define DEVFREQ_GOV_RESUME			0x5
 
+#define DEVFREQ_MIN_FREQ			0
+#define DEVFREQ_MAX_FREQ			ULONG_MAX
+
 /**
  * struct devfreq_governor - Devfreq policy governor
  * @node:		list node - contains registered devfreq governors
diff --git a/drivers/devfreq/governor_performance.c b/drivers/devfreq/governor_performance.c
index 4d23ecfbd948..ded429fd51be 100644
--- a/drivers/devfreq/governor_performance.c
+++ b/drivers/devfreq/governor_performance.c
@@ -20,10 +20,7 @@ static int devfreq_performance_func(struct devfreq *df,
 	 * target callback should be able to get floor value as
 	 * said in devfreq.h
 	 */
-	if (!df->max_freq)
-		*freq = UINT_MAX;
-	else
-		*freq = df->max_freq;
+	*freq = DEVFREQ_MAX_FREQ;
 	return 0;
 }
 
diff --git a/drivers/devfreq/governor_powersave.c b/drivers/devfreq/governor_powersave.c
index 0c42f23249ef..9e8897f5ac42 100644
--- a/drivers/devfreq/governor_powersave.c
+++ b/drivers/devfreq/governor_powersave.c
@@ -20,7 +20,7 @@ static int devfreq_powersave_func(struct devfreq *df,
 	 * target callback should be able to get ceiling value as
 	 * said in devfreq.h
 	 */
-	*freq = df->min_freq;
+	*freq = DEVFREQ_MIN_FREQ;
 	return 0;
 }
 
diff --git a/drivers/devfreq/governor_simpleondemand.c b/drivers/devfreq/governor_simpleondemand.c
index 28e0f2de7100..c0417f0e081e 100644
--- a/drivers/devfreq/governor_simpleondemand.c
+++ b/drivers/devfreq/governor_simpleondemand.c
@@ -27,7 +27,6 @@ static int devfreq_simple_ondemand_func(struct devfreq *df,
 	unsigned int dfso_upthreshold = DFSO_UPTHRESHOLD;
 	unsigned int dfso_downdifferential = DFSO_DOWNDIFFERENCTIAL;
 	struct devfreq_simple_ondemand_data *data = df->data;
-	unsigned long max = (df->max_freq) ? df->max_freq : UINT_MAX;
 
 	err = devfreq_update_stats(df);
 	if (err)
@@ -47,7 +46,7 @@ static int devfreq_simple_ondemand_func(struct devfreq *df,
 
 	/* Assume MAX if it is going to be divided by zero */
 	if (stat->total_time == 0) {
-		*freq = max;
+		*freq = DEVFREQ_MAX_FREQ;
 		return 0;
 	}
 
@@ -60,13 +59,13 @@ static int devfreq_simple_ondemand_func(struct devfreq *df,
 	/* Set MAX if it's busy enough */
 	if (stat->busy_time * 100 >
 	    stat->total_time * dfso_upthreshold) {
-		*freq = max;
+		*freq = DEVFREQ_MAX_FREQ;
 		return 0;
 	}
 
 	/* Set MAX if we do not know the initial frequency */
 	if (stat->current_frequency == 0) {
-		*freq = max;
+		*freq = DEVFREQ_MAX_FREQ;
 		return 0;
 	}
 
@@ -85,11 +84,6 @@ static int devfreq_simple_ondemand_func(struct devfreq *df,
 	b = div_u64(b, (dfso_upthreshold - dfso_downdifferential / 2));
 	*freq = (unsigned long) b;
 
-	if (df->min_freq && *freq < df->min_freq)
-		*freq = df->min_freq;
-	if (df->max_freq && *freq > df->max_freq)
-		*freq = df->max_freq;
-
 	return 0;
 }
 
diff --git a/drivers/devfreq/governor_userspace.c b/drivers/devfreq/governor_userspace.c
index 080607c3f34d..378d84c011df 100644
--- a/drivers/devfreq/governor_userspace.c
+++ b/drivers/devfreq/governor_userspace.c
@@ -26,19 +26,11 @@ static int devfreq_userspace_func(struct devfreq *df, unsigned long *freq)
 {
 	struct userspace_data *data = df->data;
 
-	if (data->valid) {
-		unsigned long adjusted_freq = data->user_frequency;
-
-		if (df->max_freq && adjusted_freq > df->max_freq)
-			adjusted_freq = df->max_freq;
-
-		if (df->min_freq && adjusted_freq < df->min_freq)
-			adjusted_freq = df->min_freq;
-
-		*freq = adjusted_freq;
-	} else {
+	if (data->valid)
+		*freq = data->user_frequency;
+	else
 		*freq = df->previous_freq; /* No user freq specified yet */
-	}
+
 	return 0;
 }
 
-- 
2.18.0.rc1.242.g61856ae69a-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 04/12] PM / devfreq: Add struct devfreq_policy
  2018-06-14 19:47 [PATCH v3 00/12] Add throttler driver for non-thermal throttling Matthias Kaehlcke
                   ` (2 preceding siblings ...)
  2018-06-14 19:47 ` [PATCH v3 03/12] PM / devfreq: Don't adjust to user limits in governors Matthias Kaehlcke
@ 2018-06-14 19:47 ` Matthias Kaehlcke
  2018-06-14 19:47 ` [PATCH v3 05/12] PM / devfreg: Add support for policy notifiers Matthias Kaehlcke
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Matthias Kaehlcke @ 2018-06-14 19:47 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Kyungmin Park, Chanwoo Choi, Arnd Bergmann, Greg Kroah-Hartman,
	Rob Herring, Mark Rutland, linux-pm, devicetree, linux-kernel,
	Brian Norris, Douglas Anderson, Enric Balletbo i Serra,
	Rafael J . Wysocki, Viresh Kumar, Lee Jones, Matthias Kaehlcke

Move variables related with devfreq policy changes from struct devfreq
to the new struct devfreq_policy and add a policy field to struct devfreq.

The following variables are moved:

df->min/max_freq           =>   p->user.min/max_freq
df->scaling_min/max_freq   =>   p->devinfo.min/max_freq
df->governor               =>   p->governor
df->governor_name          =>   p->governor_name

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
---
Changes in v3:
- none

Changes in v2:
- performance, powersave and simpleondemand governors don't need changes
  with "PM / devfreq: Don't adjust to user limits in governors"
- formatting fixes

 drivers/devfreq/devfreq.c          | 137 ++++++++++++++++-------------
 drivers/devfreq/governor_passive.c |   4 +-
 include/linux/devfreq.h            |  38 +++++---
 3 files changed, 103 insertions(+), 76 deletions(-)

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 6f604f8b2b81..21604d6ae2b8 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -255,6 +255,7 @@ static int devfreq_notify_transition(struct devfreq *devfreq,
  */
 int update_devfreq(struct devfreq *devfreq)
 {
+	struct devfreq_policy *policy = &devfreq->policy;
 	struct devfreq_freqs freqs;
 	unsigned long freq, cur_freq, min_freq, max_freq;
 	int err = 0;
@@ -265,11 +266,11 @@ int update_devfreq(struct devfreq *devfreq)
 		return -EINVAL;
 	}
 
-	if (!devfreq->governor)
+	if (!policy->governor)
 		return -EINVAL;
 
 	/* Reevaluate the proper frequency */
-	err = devfreq->governor->get_target_freq(devfreq, &freq);
+	err = policy->governor->get_target_freq(devfreq, &freq);
 	if (err)
 		return err;
 
@@ -280,8 +281,8 @@ int update_devfreq(struct devfreq *devfreq)
 	 * max_freq
 	 * min_freq
 	 */
-	max_freq = MIN(devfreq->scaling_max_freq, devfreq->max_freq);
-	min_freq = MAX(devfreq->scaling_min_freq, devfreq->min_freq);
+	max_freq = MIN(policy->devinfo.max_freq, policy->user.max_freq);
+	min_freq = MAX(policy->devinfo.min_freq, policy->user.min_freq);
 
 	if (freq < min_freq) {
 		freq = min_freq;
@@ -493,18 +494,19 @@ static int devfreq_notifier_call(struct notifier_block *nb, unsigned long type,
 				 void *devp)
 {
 	struct devfreq *devfreq = container_of(nb, struct devfreq, nb);
+	struct devfreq_policy *policy = &devfreq->policy;
 	int ret;
 
 	mutex_lock(&devfreq->lock);
 
-	devfreq->scaling_min_freq = find_available_min_freq(devfreq);
-	if (!devfreq->scaling_min_freq) {
+	policy->devinfo.min_freq = find_available_min_freq(devfreq);
+	if (!policy->devinfo.min_freq) {
 		mutex_unlock(&devfreq->lock);
 		return -EINVAL;
 	}
 
-	devfreq->scaling_max_freq = find_available_max_freq(devfreq);
-	if (!devfreq->scaling_max_freq) {
+	policy->devinfo.max_freq = find_available_max_freq(devfreq);
+	if (!policy->devinfo.max_freq) {
 		mutex_unlock(&devfreq->lock);
 		return -EINVAL;
 	}
@@ -524,6 +526,7 @@ static int devfreq_notifier_call(struct notifier_block *nb, unsigned long type,
 static void devfreq_dev_release(struct device *dev)
 {
 	struct devfreq *devfreq = to_devfreq(dev);
+	struct devfreq_policy *policy = &devfreq->policy;
 
 	mutex_lock(&devfreq_list_lock);
 	if (IS_ERR(find_device_devfreq(devfreq->dev.parent))) {
@@ -534,9 +537,9 @@ static void devfreq_dev_release(struct device *dev)
 	list_del(&devfreq->node);
 	mutex_unlock(&devfreq_list_lock);
 
-	if (devfreq->governor)
-		devfreq->governor->event_handler(devfreq,
-						 DEVFREQ_GOV_STOP, NULL);
+	if (policy->governor)
+		policy->governor->event_handler(devfreq,
+						DEVFREQ_GOV_STOP, NULL);
 
 	if (devfreq->profile->exit)
 		devfreq->profile->exit(devfreq->dev.parent);
@@ -559,6 +562,7 @@ struct devfreq *devfreq_add_device(struct device *dev,
 				   void *data)
 {
 	struct devfreq *devfreq;
+	struct devfreq_policy *policy;
 	struct devfreq_governor *governor;
 	static atomic_t devfreq_no = ATOMIC_INIT(-1);
 	int err = 0;
@@ -584,13 +588,14 @@ struct devfreq *devfreq_add_device(struct device *dev,
 		goto err_out;
 	}
 
+	policy = &devfreq->policy;
 	mutex_init(&devfreq->lock);
 	mutex_lock(&devfreq->lock);
 	devfreq->dev.parent = dev;
 	devfreq->dev.class = devfreq_class;
 	devfreq->dev.release = devfreq_dev_release;
 	devfreq->profile = profile;
-	strncpy(devfreq->governor_name, governor_name, DEVFREQ_NAME_LEN);
+	strncpy(policy->governor_name, governor_name, DEVFREQ_NAME_LEN);
 	devfreq->previous_freq = profile->initial_freq;
 	devfreq->last_status.current_frequency = profile->initial_freq;
 	devfreq->data = data;
@@ -604,21 +609,21 @@ struct devfreq *devfreq_add_device(struct device *dev,
 		mutex_lock(&devfreq->lock);
 	}
 
-	devfreq->scaling_min_freq = find_available_min_freq(devfreq);
-	if (!devfreq->scaling_min_freq) {
+	policy->devinfo.min_freq = find_available_min_freq(devfreq);
+	if (!policy->devinfo.min_freq) {
 		mutex_unlock(&devfreq->lock);
 		err = -EINVAL;
 		goto err_dev;
 	}
-	devfreq->min_freq = devfreq->scaling_min_freq;
+	policy->user.min_freq = policy->devinfo.min_freq;
 
-	devfreq->scaling_max_freq = find_available_max_freq(devfreq);
-	if (!devfreq->scaling_max_freq) {
+	policy->devinfo.max_freq = find_available_max_freq(devfreq);
+	if (!policy->devinfo.max_freq) {
 		mutex_unlock(&devfreq->lock);
 		err = -EINVAL;
 		goto err_dev;
 	}
-	devfreq->max_freq = devfreq->scaling_max_freq;
+	policy->user.max_freq = policy->devinfo.max_freq;
 
 	dev_set_name(&devfreq->dev, "devfreq%d",
 				atomic_inc_return(&devfreq_no));
@@ -646,7 +651,7 @@ struct devfreq *devfreq_add_device(struct device *dev,
 	mutex_lock(&devfreq_list_lock);
 	list_add(&devfreq->node, &devfreq_list);
 
-	governor = find_devfreq_governor(devfreq->governor_name);
+	governor = find_devfreq_governor(policy->governor_name);
 	if (IS_ERR(governor)) {
 		dev_err(dev, "%s: Unable to find governor for the device\n",
 			__func__);
@@ -654,9 +659,9 @@ struct devfreq *devfreq_add_device(struct device *dev,
 		goto err_init;
 	}
 
-	devfreq->governor = governor;
-	err = devfreq->governor->event_handler(devfreq, DEVFREQ_GOV_START,
-						NULL);
+	policy->governor = governor;
+	err = policy->governor->event_handler(devfreq, DEVFREQ_GOV_START,
+					      NULL);
 	if (err) {
 		dev_err(dev, "%s: Unable to start governor for the device\n",
 			__func__);
@@ -817,10 +822,10 @@ int devfreq_suspend_device(struct devfreq *devfreq)
 	if (!devfreq)
 		return -EINVAL;
 
-	if (!devfreq->governor)
+	if (!devfreq->policy.governor)
 		return 0;
 
-	return devfreq->governor->event_handler(devfreq,
+	return devfreq->policy.governor->event_handler(devfreq,
 				DEVFREQ_GOV_SUSPEND, NULL);
 }
 EXPORT_SYMBOL(devfreq_suspend_device);
@@ -838,10 +843,10 @@ int devfreq_resume_device(struct devfreq *devfreq)
 	if (!devfreq)
 		return -EINVAL;
 
-	if (!devfreq->governor)
+	if (!devfreq->policy.governor)
 		return 0;
 
-	return devfreq->governor->event_handler(devfreq,
+	return devfreq->policy.governor->event_handler(devfreq,
 				DEVFREQ_GOV_RESUME, NULL);
 }
 EXPORT_SYMBOL(devfreq_resume_device);
@@ -875,30 +880,31 @@ int devfreq_add_governor(struct devfreq_governor *governor)
 	list_for_each_entry(devfreq, &devfreq_list, node) {
 		int ret = 0;
 		struct device *dev = devfreq->dev.parent;
+		struct devfreq_policy *policy = &devfreq->policy;
 
-		if (!strncmp(devfreq->governor_name, governor->name,
+		if (!strncmp(policy->governor_name, governor->name,
 			     DEVFREQ_NAME_LEN)) {
 			/* The following should never occur */
-			if (devfreq->governor) {
+			if (policy->governor) {
 				dev_warn(dev,
 					 "%s: Governor %s already present\n",
-					 __func__, devfreq->governor->name);
-				ret = devfreq->governor->event_handler(devfreq,
+					 __func__, policy->governor->name);
+				ret = policy->governor->event_handler(devfreq,
 							DEVFREQ_GOV_STOP, NULL);
 				if (ret) {
 					dev_warn(dev,
 						 "%s: Governor %s stop = %d\n",
 						 __func__,
-						 devfreq->governor->name, ret);
+						 policy->governor->name, ret);
 				}
 				/* Fall through */
 			}
-			devfreq->governor = governor;
-			ret = devfreq->governor->event_handler(devfreq,
+			policy->governor = governor;
+			ret = policy->governor->event_handler(devfreq,
 						DEVFREQ_GOV_START, NULL);
 			if (ret) {
 				dev_warn(dev, "%s: Governor %s start=%d\n",
-					 __func__, devfreq->governor->name,
+					 __func__, policy->governor->name,
 					 ret);
 			}
 		}
@@ -937,24 +943,25 @@ int devfreq_remove_governor(struct devfreq_governor *governor)
 	list_for_each_entry(devfreq, &devfreq_list, node) {
 		int ret;
 		struct device *dev = devfreq->dev.parent;
+		struct devfreq_policy *policy = &devfreq->policy;
 
-		if (!strncmp(devfreq->governor_name, governor->name,
+		if (!strncmp(policy->governor_name, governor->name,
 			     DEVFREQ_NAME_LEN)) {
 			/* we should have a devfreq governor! */
-			if (!devfreq->governor) {
+			if (!policy->governor) {
 				dev_warn(dev, "%s: Governor %s NOT present\n",
 					 __func__, governor->name);
 				continue;
 				/* Fall through */
 			}
-			ret = devfreq->governor->event_handler(devfreq,
+			ret = policy->governor->event_handler(devfreq,
 						DEVFREQ_GOV_STOP, NULL);
 			if (ret) {
 				dev_warn(dev, "%s: Governor %s stop=%d\n",
-					 __func__, devfreq->governor->name,
+					 __func__, policy->governor->name,
 					 ret);
 			}
-			devfreq->governor = NULL;
+			policy->governor = NULL;
 		}
 	}
 
@@ -969,16 +976,17 @@ EXPORT_SYMBOL(devfreq_remove_governor);
 static ssize_t governor_show(struct device *dev,
 			     struct device_attribute *attr, char *buf)
 {
-	if (!to_devfreq(dev)->governor)
+	if (!to_devfreq(dev)->policy.governor)
 		return -EINVAL;
 
-	return sprintf(buf, "%s\n", to_devfreq(dev)->governor->name);
+	return sprintf(buf, "%s\n", to_devfreq(dev)->policy.governor->name);
 }
 
 static ssize_t governor_store(struct device *dev, struct device_attribute *attr,
 			      const char *buf, size_t count)
 {
 	struct devfreq *df = to_devfreq(dev);
+	struct devfreq_policy *policy = &df->policy;
 	int ret;
 	char str_governor[DEVFREQ_NAME_LEN + 1];
 	struct devfreq_governor *governor;
@@ -993,29 +1001,30 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr,
 		ret = PTR_ERR(governor);
 		goto out;
 	}
-	if (df->governor == governor) {
+	if (policy->governor == governor) {
 		ret = 0;
 		goto out;
-	} else if ((df->governor && df->governor->immutable) ||
+	} else if ((policy->governor && policy->governor->immutable) ||
 					governor->immutable) {
 		ret = -EINVAL;
 		goto out;
 	}
 
-	if (df->governor) {
-		ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
+	if (policy->governor) {
+		ret = policy->governor->event_handler(df, DEVFREQ_GOV_STOP,
+						      NULL);
 		if (ret) {
 			dev_warn(dev, "%s: Governor %s not stopped(%d)\n",
-				 __func__, df->governor->name, ret);
+				 __func__, policy->governor->name, ret);
 			goto out;
 		}
 	}
-	df->governor = governor;
-	strncpy(df->governor_name, governor->name, DEVFREQ_NAME_LEN);
-	ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL);
+	policy->governor = governor;
+	strncpy(policy->governor_name, governor->name, DEVFREQ_NAME_LEN);
+	ret = policy->governor->event_handler(df, DEVFREQ_GOV_START, NULL);
 	if (ret)
 		dev_warn(dev, "%s: Governor %s not started(%d)\n",
-			 __func__, df->governor->name, ret);
+			 __func__, policy->governor->name, ret);
 out:
 	mutex_unlock(&devfreq_list_lock);
 
@@ -1030,6 +1039,7 @@ static ssize_t available_governors_show(struct device *d,
 					char *buf)
 {
 	struct devfreq *df = to_devfreq(d);
+	struct devfreq_policy *policy = &df->policy;
 	ssize_t count = 0;
 
 	mutex_lock(&devfreq_list_lock);
@@ -1038,9 +1048,9 @@ static ssize_t available_governors_show(struct device *d,
 	 * The devfreq with immutable governor (e.g., passive) shows
 	 * only own governor.
 	 */
-	if (df->governor->immutable) {
+	if (policy->governor->immutable) {
 		count = scnprintf(&buf[count], DEVFREQ_NAME_LEN,
-				   "%s ", df->governor_name);
+				   "%s ", policy->governor_name);
 	/*
 	 * The devfreq device shows the registered governor except for
 	 * immutable governors such as passive governor .
@@ -1100,17 +1110,18 @@ static ssize_t polling_interval_store(struct device *dev,
 				      const char *buf, size_t count)
 {
 	struct devfreq *df = to_devfreq(dev);
+	struct devfreq_policy *policy = &df->policy;
 	unsigned int value;
 	int ret;
 
-	if (!df->governor)
+	if (!policy->governor)
 		return -EINVAL;
 
 	ret = sscanf(buf, "%u", &value);
 	if (ret != 1)
 		return -EINVAL;
 
-	df->governor->event_handler(df, DEVFREQ_GOV_INTERVAL, &value);
+	policy->governor->event_handler(df, DEVFREQ_GOV_INTERVAL, &value);
 	ret = count;
 
 	return ret;
@@ -1132,7 +1143,7 @@ static ssize_t min_freq_store(struct device *dev, struct device_attribute *attr,
 	mutex_lock(&df->lock);
 
 	if (value) {
-		if (value > df->max_freq) {
+		if (value > df->policy.user.max_freq) {
 			ret = -EINVAL;
 			goto unlock;
 		}
@@ -1145,7 +1156,7 @@ static ssize_t min_freq_store(struct device *dev, struct device_attribute *attr,
 			value = freq_table[df->profile->max_state - 1];
 	}
 
-	df->min_freq = value;
+	df->policy.user.min_freq = value;
 	update_devfreq(df);
 	ret = count;
 unlock:
@@ -1156,9 +1167,10 @@ static ssize_t min_freq_store(struct device *dev, struct device_attribute *attr,
 static ssize_t min_freq_show(struct device *dev, struct device_attribute *attr,
 			     char *buf)
 {
-	struct devfreq *df = to_devfreq(dev);
+	struct devfreq_policy *policy = &to_devfreq(dev)->policy;
 
-	return sprintf(buf, "%lu\n", MAX(df->scaling_min_freq, df->min_freq));
+	return sprintf(buf, "%lu\n",
+		       MAX(policy->devinfo.min_freq, policy->user.min_freq));
 }
 
 static ssize_t max_freq_store(struct device *dev, struct device_attribute *attr,
@@ -1176,7 +1188,7 @@ static ssize_t max_freq_store(struct device *dev, struct device_attribute *attr,
 	mutex_lock(&df->lock);
 
 	if (value) {
-		if (value < df->min_freq) {
+		if (value < df->policy.user.min_freq) {
 			ret = -EINVAL;
 			goto unlock;
 		}
@@ -1189,7 +1201,7 @@ static ssize_t max_freq_store(struct device *dev, struct device_attribute *attr,
 			value = freq_table[0];
 	}
 
-	df->max_freq = value;
+	df->policy.user.max_freq = value;
 	update_devfreq(df);
 	ret = count;
 unlock:
@@ -1201,9 +1213,10 @@ static DEVICE_ATTR_RW(min_freq);
 static ssize_t max_freq_show(struct device *dev, struct device_attribute *attr,
 			     char *buf)
 {
-	struct devfreq *df = to_devfreq(dev);
+	struct devfreq_policy *policy = &to_devfreq(dev)->policy;
 
-	return sprintf(buf, "%lu\n", MIN(df->scaling_max_freq, df->max_freq));
+	return sprintf(buf, "%lu\n",
+		       MIN(policy->devinfo.max_freq, policy->user.max_freq));
 }
 static DEVICE_ATTR_RW(max_freq);
 
diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
index 3bc29acbd54e..e0987c749ec2 100644
--- a/drivers/devfreq/governor_passive.c
+++ b/drivers/devfreq/governor_passive.c
@@ -99,12 +99,12 @@ static int update_devfreq_passive(struct devfreq *devfreq, unsigned long freq)
 {
 	int ret;
 
-	if (!devfreq->governor)
+	if (!devfreq->policy.governor)
 		return -EINVAL;
 
 	mutex_lock_nested(&devfreq->lock, SINGLE_DEPTH_NESTING);
 
-	ret = devfreq->governor->get_target_freq(devfreq, &freq);
+	ret = devfreq->policy.governor->get_target_freq(devfreq, &freq);
 	if (ret < 0)
 		goto out;
 
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index 3aae5b3af87c..9bf23b976f4d 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -109,6 +109,30 @@ struct devfreq_dev_profile {
 	unsigned int max_state;
 };
 
+/**
+ * struct devfreq_freq_limits - Devfreq frequency limits
+ * @min_freq:	minimum frequency
+ * @max_freq:	maximum frequency
+ */
+struct devfreq_freq_limits {
+	unsigned long min_freq;
+	unsigned long max_freq;
+};
+
+/**
+ * struct devfreq_policy - Devfreq policy
+ * @user:	frequency limits requested by the user
+ * @devinfo:	frequency limits of the device (available OPPs)
+ * @governor:	method how to choose frequency based on the usage.
+ * @governor_name:	devfreq governor name for use with this devfreq
+ */
+struct devfreq_policy {
+	struct devfreq_freq_limits user;
+	struct devfreq_freq_limits devinfo;
+	const struct devfreq_governor *governor;
+	char governor_name[DEVFREQ_NAME_LEN];
+};
+
 /**
  * struct devfreq - Device devfreq structure
  * @node:	list node - contains the devices with devfreq that have been
@@ -117,8 +141,6 @@ struct devfreq_dev_profile {
  * @dev:	device registered by devfreq class. dev.parent is the device
  *		using devfreq.
  * @profile:	device-specific devfreq profile
- * @governor:	method how to choose frequency based on the usage.
- * @governor_name:	devfreq governor name for use with this devfreq
  * @nb:		notifier block used to notify devfreq object that it should
  *		reevaluate operable frequencies. Devfreq users may use
  *		devfreq.nb to the corresponding register notifier call chain.
@@ -126,10 +148,7 @@ struct devfreq_dev_profile {
  * @previous_freq:	previously configured frequency value.
  * @data:	Private data of the governor. The devfreq framework does not
  *		touch this.
- * @min_freq:	Limit minimum frequency requested by user (0: none)
- * @max_freq:	Limit maximum frequency requested by user (0: none)
- * @scaling_min_freq:	Limit minimum frequency requested by OPP interface
- * @scaling_max_freq:	Limit maximum frequency requested by OPP interface
+ * @policy:		Policy for frequency adjustments
  * @stop_polling:	 devfreq polling status of a device.
  * @total_trans:	Number of devfreq transitions
  * @trans_table:	Statistics of devfreq transitions
@@ -151,8 +170,6 @@ struct devfreq {
 	struct mutex lock;
 	struct device dev;
 	struct devfreq_dev_profile *profile;
-	const struct devfreq_governor *governor;
-	char governor_name[DEVFREQ_NAME_LEN];
 	struct notifier_block nb;
 	struct delayed_work work;
 
@@ -161,10 +178,7 @@ struct devfreq {
 
 	void *data; /* private data for governors */
 
-	unsigned long min_freq;
-	unsigned long max_freq;
-	unsigned long scaling_min_freq;
-	unsigned long scaling_max_freq;
+	struct devfreq_policy policy;
 	bool stop_polling;
 
 	/* information for device frequency transition */
-- 
2.18.0.rc1.242.g61856ae69a-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 05/12] PM / devfreg: Add support for policy notifiers
  2018-06-14 19:47 [PATCH v3 00/12] Add throttler driver for non-thermal throttling Matthias Kaehlcke
                   ` (3 preceding siblings ...)
  2018-06-14 19:47 ` [PATCH v3 04/12] PM / devfreq: Add struct devfreq_policy Matthias Kaehlcke
@ 2018-06-14 19:47 ` Matthias Kaehlcke
  2018-06-14 19:47 ` [PATCH v3 06/12] PM / devfreq: Make update_devfreq() public Matthias Kaehlcke
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Matthias Kaehlcke @ 2018-06-14 19:47 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Kyungmin Park, Chanwoo Choi, Arnd Bergmann, Greg Kroah-Hartman,
	Rob Herring, Mark Rutland, linux-pm, devicetree, linux-kernel,
	Brian Norris, Douglas Anderson, Enric Balletbo i Serra,
	Rafael J . Wysocki, Viresh Kumar, Lee Jones, Matthias Kaehlcke

Policy notifiers are called before a frequency change and may narrow
the min/max frequency range in devfreq_policy, which is used to adjust
the target frequency if it is beyond this range.

Also add a few helpers:
 - devfreq_verify_within_[dev_]limits()
    - should be used by the notifiers for policy adjustments.
 - dev_to_devfreq()
    - lookup a devfreq strict from a device pointer

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
---
Changes in v3:
- devfreq.h: fixed misspelling of struct devfreq_policy

Changes in v2:
- performance, powersave and simpleondemand governors don't need changes
  with "PM / devfreq: Don't adjust to user limits in governors"
- formatting fixes

 drivers/devfreq/devfreq.c | 48 ++++++++++++++++++++++-------
 include/linux/devfreq.h   | 65 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 102 insertions(+), 11 deletions(-)

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 21604d6ae2b8..4cbaa7ad1972 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -72,6 +72,21 @@ static struct devfreq *find_device_devfreq(struct device *dev)
 	return ERR_PTR(-ENODEV);
 }
 
+/**
+ * dev_to_devfreq() - find devfreq struct using device pointer
+ * @dev:	device pointer used to lookup device devfreq.
+ */
+struct devfreq *dev_to_devfreq(struct device *dev)
+{
+	struct devfreq *devfreq;
+
+	mutex_lock(&devfreq_list_lock);
+	devfreq = find_device_devfreq(dev);
+	mutex_unlock(&devfreq_list_lock);
+
+	return devfreq;
+}
+
 static unsigned long find_available_min_freq(struct devfreq *devfreq)
 {
 	struct dev_pm_opp *opp;
@@ -269,20 +284,21 @@ int update_devfreq(struct devfreq *devfreq)
 	if (!policy->governor)
 		return -EINVAL;
 
+	policy->min = policy->devinfo.min_freq;
+	policy->max = policy->devinfo.max_freq;
+
+	srcu_notifier_call_chain(&devfreq->policy_notifier_list,
+				DEVFREQ_ADJUST, policy);
+
 	/* Reevaluate the proper frequency */
 	err = policy->governor->get_target_freq(devfreq, &freq);
 	if (err)
 		return err;
 
-	/*
-	 * Adjust the frequency with user freq, QoS and available freq.
-	 *
-	 * List from the highest priority
-	 * max_freq
-	 * min_freq
-	 */
-	max_freq = MIN(policy->devinfo.max_freq, policy->user.max_freq);
-	min_freq = MAX(policy->devinfo.min_freq, policy->user.min_freq);
+	/* Adjust the frequency */
+
+	max_freq = MIN(policy->max, policy->user.max_freq);
+	min_freq = MAX(policy->min, policy->user.min_freq);
 
 	if (freq < min_freq) {
 		freq = min_freq;
@@ -645,6 +661,7 @@ struct devfreq *devfreq_add_device(struct device *dev,
 	devfreq->last_stat_updated = jiffies;
 
 	srcu_init_notifier_head(&devfreq->transition_notifier_list);
+	srcu_init_notifier_head(&devfreq->policy_notifier_list);
 
 	mutex_unlock(&devfreq->lock);
 
@@ -1445,7 +1462,7 @@ EXPORT_SYMBOL(devm_devfreq_unregister_opp_notifier);
  * devfreq_register_notifier() - Register a driver with devfreq
  * @devfreq:	The devfreq object.
  * @nb:		The notifier block to register.
- * @list:	DEVFREQ_TRANSITION_NOTIFIER.
+ * @list:	DEVFREQ_TRANSITION_NOTIFIER or DEVFREQ_POLICY_NOTIFIER.
  */
 int devfreq_register_notifier(struct devfreq *devfreq,
 				struct notifier_block *nb,
@@ -1461,6 +1478,10 @@ int devfreq_register_notifier(struct devfreq *devfreq,
 		ret = srcu_notifier_chain_register(
 				&devfreq->transition_notifier_list, nb);
 		break;
+	case DEVFREQ_POLICY_NOTIFIER:
+		ret = srcu_notifier_chain_register(
+				&devfreq->policy_notifier_list, nb);
+		break;
 	default:
 		ret = -EINVAL;
 	}
@@ -1473,7 +1494,7 @@ EXPORT_SYMBOL(devfreq_register_notifier);
  * devfreq_unregister_notifier() - Unregister a driver with devfreq
  * @devfreq:	The devfreq object.
  * @nb:		The notifier block to be unregistered.
- * @list:	DEVFREQ_TRANSITION_NOTIFIER.
+ * @list:	DEVFREQ_TRANSITION_NOTIFIER or DEVFREQ_POLICY_NOTIFIER.
  */
 int devfreq_unregister_notifier(struct devfreq *devfreq,
 				struct notifier_block *nb,
@@ -1489,6 +1510,11 @@ int devfreq_unregister_notifier(struct devfreq *devfreq,
 		ret = srcu_notifier_chain_unregister(
 				&devfreq->transition_notifier_list, nb);
 		break;
+	case DEVFREQ_POLICY_NOTIFIER:
+		ret = srcu_notifier_chain_unregister(
+				&devfreq->policy_notifier_list, nb);
+		break;
+
 	default:
 		ret = -EINVAL;
 	}
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index 9bf23b976f4d..7c8dce96db73 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -33,6 +33,10 @@
 #define	DEVFREQ_PRECHANGE		(0)
 #define DEVFREQ_POSTCHANGE		(1)
 
+#define DEVFREQ_POLICY_NOTIFIER		1
+
+#define	DEVFREQ_ADJUST			0
+
 struct devfreq;
 struct devfreq_governor;
 
@@ -121,12 +125,16 @@ struct devfreq_freq_limits {
 
 /**
  * struct devfreq_policy - Devfreq policy
+ * @min:	minimum frequency (adjustable by policy notifiers)
+ * @min:	maximum frequency (adjustable by policy notifiers)
  * @user:	frequency limits requested by the user
  * @devinfo:	frequency limits of the device (available OPPs)
  * @governor:	method how to choose frequency based on the usage.
  * @governor_name:	devfreq governor name for use with this devfreq
  */
 struct devfreq_policy {
+	unsigned long min;
+	unsigned long max;
 	struct devfreq_freq_limits user;
 	struct devfreq_freq_limits devinfo;
 	const struct devfreq_governor *governor;
@@ -155,6 +163,7 @@ struct devfreq_policy {
  * @time_in_state:	Statistics of devfreq states
  * @last_stat_updated:	The last time stat updated
  * @transition_notifier_list: list head of DEVFREQ_TRANSITION_NOTIFIER notifier
+ * @policy_notifier_list: list head of DEVFREQ_POLICY_NOTIFIER notifier
  *
  * This structure stores the devfreq information for a give device.
  *
@@ -188,6 +197,7 @@ struct devfreq {
 	unsigned long last_stat_updated;
 
 	struct srcu_notifier_head transition_notifier_list;
+	struct srcu_notifier_head policy_notifier_list;
 };
 
 struct devfreq_freqs {
@@ -240,6 +250,45 @@ extern void devm_devfreq_unregister_notifier(struct device *dev,
 extern struct devfreq *devfreq_get_devfreq_by_phandle(struct device *dev,
 						int index);
 
+/**
+ * devfreq_verify_within_limits() - Adjust a devfreq policy if needed to make
+ *                                  sure its min/max values are within a
+ *                                  specified range.
+ * @policy:	the policy
+ * @min:	the minimum frequency
+ * @max:	the maximum frequency
+ */
+static inline void devfreq_verify_within_limits(struct devfreq_policy *policy,
+		unsigned int min, unsigned int max)
+{
+	if (policy->min < min)
+		policy->min = min;
+	if (policy->max < min)
+		policy->max = min;
+	if (policy->min > max)
+		policy->min = max;
+	if (policy->max > max)
+		policy->max = max;
+	if (policy->min > policy->max)
+		policy->min = policy->max;
+}
+
+/**
+ * devfreq_verify_within_dev_limits() - Adjust a devfreq policy if needed to
+ *                                      make sure its min/max values are within
+ *                                      the frequency range supported by the
+ *                                      device.
+ * @policy:	the policy
+ */
+static inline void
+devfreq_verify_within_dev_limits(struct devfreq_policy *policy)
+{
+	devfreq_verify_within_limits(policy, policy->devinfo.min_freq,
+			policy->devinfo.max_freq);
+}
+
+struct devfreq *dev_to_devfreq(struct device *dev);
+
 #if IS_ENABLED(CONFIG_DEVFREQ_GOV_SIMPLE_ONDEMAND)
 /**
  * struct devfreq_simple_ondemand_data - void *data fed to struct devfreq
@@ -394,10 +443,26 @@ static inline struct devfreq *devfreq_get_devfreq_by_phandle(struct device *dev,
 	return ERR_PTR(-ENODEV);
 }
 
+static inline void devfreq_verify_within_limits(struct devfreq_policy *policy,
+		unsigned int min, unsigned int max)
+{
+}
+
+static inline void
+devfreq_verify_within_dev_limits(struct devfreq_policy *policy)
+{
+}
+
 static inline int devfreq_update_stats(struct devfreq *df)
 {
 	return -EINVAL;
 }
+
+static inline struct devfreq *dev_to_devfreq(struct device *dev)
+{
+	return NULL;
+}
+
 #endif /* CONFIG_PM_DEVFREQ */
 
 #endif /* __LINUX_DEVFREQ_H__ */
-- 
2.18.0.rc1.242.g61856ae69a-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 06/12] PM / devfreq: Make update_devfreq() public
  2018-06-14 19:47 [PATCH v3 00/12] Add throttler driver for non-thermal throttling Matthias Kaehlcke
                   ` (4 preceding siblings ...)
  2018-06-14 19:47 ` [PATCH v3 05/12] PM / devfreg: Add support for policy notifiers Matthias Kaehlcke
@ 2018-06-14 19:47 ` Matthias Kaehlcke
  2018-06-14 19:47 ` [PATCH v3 07/12] PM / devfreq: export devfreq_class Matthias Kaehlcke
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Matthias Kaehlcke @ 2018-06-14 19:47 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Kyungmin Park, Chanwoo Choi, Arnd Bergmann, Greg Kroah-Hartman,
	Rob Herring, Mark Rutland, linux-pm, devicetree, linux-kernel,
	Brian Norris, Douglas Anderson, Enric Balletbo i Serra,
	Rafael J . Wysocki, Viresh Kumar, Lee Jones, Matthias Kaehlcke

Currently update_devfreq() is only visible to devfreq governors outside
of devfreq.c. Make it public to allow drivers that adjust devfreq policies
to cause a re-evaluation of the frequency after a policy change.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: MyungJoo Ham <myungjoo.ham@samsung.com>
---
Changes in v3:
- none

Changes in v2:
- added 'Acked-by: MyungJoo Ham <myungjoo.ham@samsung.com>' tag

 drivers/devfreq/governor.h | 3 ---
 include/linux/devfreq.h    | 8 ++++++++
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/devfreq/governor.h b/drivers/devfreq/governor.h
index b81700244ce3..f53339ca610f 100644
--- a/drivers/devfreq/governor.h
+++ b/drivers/devfreq/governor.h
@@ -57,9 +57,6 @@ struct devfreq_governor {
 				unsigned int event, void *data);
 };
 
-/* Caution: devfreq->lock must be locked before calling update_devfreq */
-extern int update_devfreq(struct devfreq *devfreq);
-
 extern void devfreq_monitor_start(struct devfreq *devfreq);
 extern void devfreq_monitor_stop(struct devfreq *devfreq);
 extern void devfreq_monitor_suspend(struct devfreq *devfreq);
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index 7c8dce96db73..c4f84a769cb5 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -222,6 +222,14 @@ extern void devm_devfreq_remove_device(struct device *dev,
 extern int devfreq_suspend_device(struct devfreq *devfreq);
 extern int devfreq_resume_device(struct devfreq *devfreq);
 
+/**
+ * update_devfreq() - Reevaluate the device and configure frequency
+ * @devfreq:	the devfreq device
+ *
+ * Note: devfreq->lock must be held
+ */
+extern int update_devfreq(struct devfreq *devfreq);
+
 /* Helper functions for devfreq user device driver with OPP. */
 extern struct dev_pm_opp *devfreq_recommended_opp(struct device *dev,
 					   unsigned long *freq, u32 flags);
-- 
2.18.0.rc1.242.g61856ae69a-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 07/12] PM / devfreq: export devfreq_class
  2018-06-14 19:47 [PATCH v3 00/12] Add throttler driver for non-thermal throttling Matthias Kaehlcke
                   ` (5 preceding siblings ...)
  2018-06-14 19:47 ` [PATCH v3 06/12] PM / devfreq: Make update_devfreq() public Matthias Kaehlcke
@ 2018-06-14 19:47 ` Matthias Kaehlcke
  2018-06-14 19:47 ` [PATCH v3 08/12] cpufreq: Add stub for cpufreq_update_policy() Matthias Kaehlcke
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Matthias Kaehlcke @ 2018-06-14 19:47 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Kyungmin Park, Chanwoo Choi, Arnd Bergmann, Greg Kroah-Hartman,
	Rob Herring, Mark Rutland, linux-pm, devicetree, linux-kernel,
	Brian Norris, Douglas Anderson, Enric Balletbo i Serra,
	Rafael J . Wysocki, Viresh Kumar, Lee Jones, Matthias Kaehlcke

Exporting the device class allows other parts of the kernel to enumerate
the devfreq devices and receive notification when a devfreq device is
added or removed.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
---
Changes in v3:
- none

Changes in v2:
- patch added to series

 drivers/devfreq/devfreq.c | 3 ++-
 include/linux/devfreq.h   | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 4cbaa7ad1972..38b90b64fc6e 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -31,7 +31,8 @@
 #define MAX(a,b)	((a > b) ? a : b)
 #define MIN(a,b)	((a < b) ? a : b)
 
-static struct class *devfreq_class;
+struct class *devfreq_class;
+EXPORT_SYMBOL_GPL(devfreq_class);
 
 /*
  * devfreq core provides delayed work based load monitoring helper
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index c4f84a769cb5..964e064a951f 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -206,6 +206,8 @@ struct devfreq_freqs {
 };
 
 #if defined(CONFIG_PM_DEVFREQ)
+extern struct class *devfreq_class;
+
 extern struct devfreq *devfreq_add_device(struct device *dev,
 				  struct devfreq_dev_profile *profile,
 				  const char *governor_name,
-- 
2.18.0.rc1.242.g61856ae69a-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 08/12] cpufreq: Add stub for cpufreq_update_policy()
  2018-06-14 19:47 [PATCH v3 00/12] Add throttler driver for non-thermal throttling Matthias Kaehlcke
                   ` (6 preceding siblings ...)
  2018-06-14 19:47 ` [PATCH v3 07/12] PM / devfreq: export devfreq_class Matthias Kaehlcke
@ 2018-06-14 19:47 ` Matthias Kaehlcke
  2018-06-14 19:47 ` [PATCH v3 09/12] dt-bindings: PM / OPP: add opp-throttlers property Matthias Kaehlcke
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Matthias Kaehlcke @ 2018-06-14 19:47 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Kyungmin Park, Chanwoo Choi, Arnd Bergmann, Greg Kroah-Hartman,
	Rob Herring, Mark Rutland, linux-pm, devicetree, linux-kernel,
	Brian Norris, Douglas Anderson, Enric Balletbo i Serra,
	Rafael J . Wysocki, Viresh Kumar, Lee Jones, Matthias Kaehlcke

cpufreq stubs out some functions when CONFIG_CPU_FREQ=n , but
cpufreq_update_policy() is not among them. The throttler driver
(https://patchwork.kernel.org/patch/10453351/) uses cpufreq as one
possible throttling mechanism, but it can still be useful without
cpufreq. Stubbing out cpufreq_update_policy() allows the throttler
driver to be built without ugly #ifdef'ery when cpufreq is disabled.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
---
Changes in v3:
- patch added to series

 include/linux/cpufreq.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 882a9b9e34bc..dba8c4951e2e 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -210,6 +210,7 @@ static inline unsigned int cpufreq_quick_get_max(unsigned int cpu)
 	return 0;
 }
 static inline void disable_cpufreq(void) { }
+static inline void cpufreq_update_policy(unsigned int cpu) { }
 #endif
 
 #ifdef CONFIG_CPU_FREQ_STAT
-- 
2.18.0.rc1.242.g61856ae69a-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 09/12] dt-bindings: PM / OPP: add opp-throttlers property
  2018-06-14 19:47 [PATCH v3 00/12] Add throttler driver for non-thermal throttling Matthias Kaehlcke
                   ` (7 preceding siblings ...)
  2018-06-14 19:47 ` [PATCH v3 08/12] cpufreq: Add stub for cpufreq_update_policy() Matthias Kaehlcke
@ 2018-06-14 19:47 ` Matthias Kaehlcke
  2018-06-14 19:47 ` [PATCH v3 10/12] misc: throttler: Add core support for non-thermal throttling Matthias Kaehlcke
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Matthias Kaehlcke @ 2018-06-14 19:47 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Kyungmin Park, Chanwoo Choi, Arnd Bergmann, Greg Kroah-Hartman,
	Rob Herring, Mark Rutland, linux-pm, devicetree, linux-kernel,
	Brian Norris, Douglas Anderson, Enric Balletbo i Serra,
	Rafael J . Wysocki, Viresh Kumar, Lee Jones, Matthias Kaehlcke

The optional opp-throttlers property is used to configure the throttlers
(see drivers/misc/throttler/*) that use a given OPP to throttle the
corresponding device(s).

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
---
Changes in v3:
- none

Changes in v2:
- patch added to series

 Documentation/devicetree/bindings/opp/opp.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt
index c396c4c0af92..747e79740c75 100644
--- a/Documentation/devicetree/bindings/opp/opp.txt
+++ b/Documentation/devicetree/bindings/opp/opp.txt
@@ -170,6 +170,9 @@ Optional properties:
   functioning of the current device at the current OPP (where this property is
   present).
 
+- opp-throttlers: Array with phandles of throttlers that use this OPP to
+  throttle the corresponding device(s).
+
 Example 1: Single cluster Dual-core ARM cortex A9, switch DVFS states together.
 
 / {
-- 
2.18.0.rc1.242.g61856ae69a-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 10/12] misc: throttler: Add core support for non-thermal throttling
  2018-06-14 19:47 [PATCH v3 00/12] Add throttler driver for non-thermal throttling Matthias Kaehlcke
                   ` (8 preceding siblings ...)
  2018-06-14 19:47 ` [PATCH v3 09/12] dt-bindings: PM / OPP: add opp-throttlers property Matthias Kaehlcke
@ 2018-06-14 19:47 ` Matthias Kaehlcke
  2018-06-18 23:03   ` Brian Norris
  2018-06-14 19:47 ` [PATCH v3 11/12] misc: throttler: Add Chrome OS EC throttler Matthias Kaehlcke
  2018-06-14 19:47 ` [PATCH v3 12/12] mfd: cros_ec: Add throttler sub-device Matthias Kaehlcke
  11 siblings, 1 reply; 23+ messages in thread
From: Matthias Kaehlcke @ 2018-06-14 19:47 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Kyungmin Park, Chanwoo Choi, Arnd Bergmann, Greg Kroah-Hartman,
	Rob Herring, Mark Rutland, linux-pm, devicetree, linux-kernel,
	Brian Norris, Douglas Anderson, Enric Balletbo i Serra,
	Rafael J . Wysocki, Viresh Kumar, Lee Jones, Matthias Kaehlcke

The purpose of the throttler is to provide support for non-thermal
throttling. Throttling is triggered by external event, e.g. the
detection of a high battery discharge current, close to the OCP limit
of the battery. The throttler is only in charge of the throttling, not
the monitoring, which is done by another (possibly platform specific)
driver.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
---
Changes in v3:
- Kconfig: don't select CPU_FREQ and PM_DEVFREQ
- added CONFIG_THROTTLER_DEBUG option
- changed 'level' sysfs attribute to debugfs
- introduced thr_<level> macros for logging
- added debug logs
- added field clamp_freq to struct cpufreq_thrdev and devfreq_thrdev
  to keep track of the current frequency limits and avoid spammy logs

Changes in v2:
- completely reworked the driver to support configuration through OPPs, which
  requires a more dynamic handling
- added sysfs attribute to set the level for debugging and testing
- Makefile: depend on Kconfig option to traverse throttler directory
- Kconfig: removed 'default n'
- added SPDX line instead of license boiler-plate
- added entry to MAINTAINERS file

 MAINTAINERS                     |   7 +
 drivers/misc/Kconfig            |   1 +
 drivers/misc/Makefile           |   1 +
 drivers/misc/throttler/Kconfig  |  23 ++
 drivers/misc/throttler/Makefile |   1 +
 drivers/misc/throttler/core.c   | 687 ++++++++++++++++++++++++++++++++
 include/linux/throttler.h       |  21 +
 7 files changed, 741 insertions(+)
 create mode 100644 drivers/misc/throttler/Kconfig
 create mode 100644 drivers/misc/throttler/Makefile
 create mode 100644 drivers/misc/throttler/core.c
 create mode 100644 include/linux/throttler.h

diff --git a/MAINTAINERS b/MAINTAINERS
index dc241b04d1bd..db359af7cb1c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14090,6 +14090,13 @@ T:	git git://linuxtv.org/mhadli/v4l-dvb-davinci_devices.git
 S:	Maintained
 F:	drivers/media/platform/am437x/
 
+THROTTLER DRIVERS
+M:	Matthias Kaehlcke <mka@chromium.org>
+L:	linux-pm@vger.kernel.org
+S:	Maintained
+F:	drivers/misc/throttler/
+F:	include/linux/throttler.h
+
 TI BANDGAP AND THERMAL DRIVER
 M:	Eduardo Valentin <edubezval@gmail.com>
 M:	Keerthy <j-keerthy@ti.com>
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 3726eacdf65d..717fa3bd0e09 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -527,4 +527,5 @@ source "drivers/misc/echo/Kconfig"
 source "drivers/misc/cxl/Kconfig"
 source "drivers/misc/ocxl/Kconfig"
 source "drivers/misc/cardreader/Kconfig"
+source "drivers/misc/throttler/Kconfig"
 endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index af22bbc3d00c..0f4ecc6a7532 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -58,3 +58,4 @@ obj-$(CONFIG_ASPEED_LPC_SNOOP)	+= aspeed-lpc-snoop.o
 obj-$(CONFIG_PCI_ENDPOINT_TEST)	+= pci_endpoint_test.o
 obj-$(CONFIG_OCXL)		+= ocxl/
 obj-$(CONFIG_MISC_RTSX)		+= cardreader/
+obj-$(CONFIG_THROTTLER)		+= throttler/
diff --git a/drivers/misc/throttler/Kconfig b/drivers/misc/throttler/Kconfig
new file mode 100644
index 000000000000..8b2e63b2ef48
--- /dev/null
+++ b/drivers/misc/throttler/Kconfig
@@ -0,0 +1,23 @@
+# SPDX-License-Identifier: GPL-2.0
+
+menuconfig THROTTLER
+	bool "Throttler support"
+	depends on OF
+	help
+	  This option enables core support for non-thermal throttling of CPUs
+	  and devfreq devices.
+
+	  Note that you also need a event monitor module usually called
+	  *_throttler.
+
+if THROTTLER
+
+menuconfig THROTTLER_DEBUG
+	bool "Enable throttler debugging"
+	help
+	  This option enables throttler debugging features like additional
+	  logging and a debugfs attribute for setting the logging level.
+
+	  Choose N unless you want to debug throttler drivers.
+
+endif # THROTTLER
diff --git a/drivers/misc/throttler/Makefile b/drivers/misc/throttler/Makefile
new file mode 100644
index 000000000000..c8d920cee315
--- /dev/null
+++ b/drivers/misc/throttler/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_THROTTLER)		+= core.o
diff --git a/drivers/misc/throttler/core.c b/drivers/misc/throttler/core.c
new file mode 100644
index 000000000000..52350d846654
--- /dev/null
+++ b/drivers/misc/throttler/core.c
@@ -0,0 +1,687 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Core code for non-thermal throttling
+ *
+ * Copyright (C) 2018 Google, Inc.
+ */
+
+#include <linux/cpu.h>
+#include <linux/cpufreq.h>
+#include <linux/cpumask.h>
+#include <linux/debugfs.h>
+#include <linux/devfreq.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/pm_opp.h>
+#include <linux/slab.h>
+#include <linux/sort.h>
+#include <linux/throttler.h>
+
+/*
+ * Non-thermal throttling: throttling of system components in response to
+ * external events (e.g. high battery discharge current).
+ *
+ * The throttler supports throttling through cpufreq and devfreq. Multiple
+ * levels of throttling can be configured. At level 0 no throttling is
+ * active on behalf of the throttler, for values > 0 throttling is typically
+ * configured to be increasingly aggressive with each level.
+ * The number of throttling levels is not limited by the throttler (though
+ * it is likely limited by the throttling devices). It is not necessary to
+ * configure the same number of levels for all throttling devices. If the
+ * requested throttling level for a device is higher than the maximum level
+ * of the device the throttler will select the maximum throttling level of
+ * the device.
+ *
+ * Non-thermal throttling is split in two parts:
+ *
+ * - throttler core
+ *   - parses the thermal policy
+ *   - applies throttling settings for a requested level of throttling
+ *
+ * - event monitor driver
+ *   - monitors events that trigger throttling
+ *   - determines the throttling level (often limited to on/off)
+ *   - asks throttler core to apply throttling settings
+ *
+ * It is possible for a system to have more than one throttler and the
+ * throttlers may make use of the same throttling devices, in case of
+ * conflicting settings for a device the more aggressive values will be
+ * applied.
+ *
+ */
+
+#define ci_to_throttler(ci) \
+	container_of(ci, struct throttler, devfreq.class_iface)
+
+struct thr_freq_table {
+	uint32_t *freqs;
+	int n_entries;
+};
+
+struct cpufreq_thrdev {
+	uint32_t cpu;
+	struct thr_freq_table freq_table;
+	uint32_t clamp_freq;
+	struct list_head node;
+};
+
+struct devfreq_thrdev {
+	struct devfreq *devfreq;
+	struct thr_freq_table freq_table;
+	uint32_t clamp_freq;
+	struct throttler *thr;
+	struct notifier_block nb;
+	struct list_head node;
+};
+
+struct __thr_cpufreq {
+	struct list_head list;
+	cpumask_t cm_initialized;
+	cpumask_t cm_ignore;
+	struct notifier_block nb;
+};
+
+struct __thr_devfreq {
+	struct list_head list;
+	struct class_interface class_iface;
+};
+
+struct __thr_debugfs {
+	struct dentry *dir;
+	struct dentry *attr_level;
+};
+
+struct throttler {
+	struct device *dev;
+	int level;
+	struct __thr_cpufreq cpufreq;
+	struct __thr_devfreq devfreq;
+	struct mutex lock;
+#ifdef CONFIG_THROTTLER_DEBUG
+	struct __thr_debugfs debugfs;
+#endif
+};
+
+static inline int cmp_freqs(const void *a, const void *b)
+{
+	const uint32_t *pa = a, *pb = b;
+
+	if (*pa < *pb)
+		return 1;
+	else if (*pa > *pb)
+		return -1;
+
+	return 0;
+}
+
+static int thr_handle_devfreq_event(struct notifier_block *nb,
+				    unsigned long event, void *data);
+
+static unsigned long thr_get_throttling_freq(struct thr_freq_table *ft,
+					     int level)
+{
+	if (level == 0) {
+		WARN(true, "level == 0");
+		return ULONG_MAX;
+	}
+
+	if (level <= ft->n_entries)
+		return ft->freqs[level - 1];
+	else
+		return ft->freqs[ft->n_entries - 1];
+}
+
+static int thr_init_freq_table(struct throttler *thr, struct device *opp_dev,
+			       struct thr_freq_table *ft)
+{
+	struct device_node *np_opp_desc, *np_opp;
+	int nchilds;
+	uint32_t *freqs;
+	int nfreqs = 0;
+	int err = 0;
+
+	np_opp_desc = dev_pm_opp_of_get_opp_desc_node(opp_dev);
+	if (!np_opp_desc)
+		return -EINVAL;
+
+	nchilds = of_get_child_count(np_opp_desc);
+	if (!nchilds) {
+		err = -EINVAL;
+		goto out_node_put;
+	}
+
+	freqs = kzalloc(nchilds * sizeof(uint32_t), GFP_KERNEL);
+	if (!freqs) {
+		err = -ENOMEM;
+		goto out_node_put;
+	}
+
+	/* determine which OPPs are used by this throttler (if any) */
+	for_each_child_of_node(np_opp_desc, np_opp) {
+		int num_thr;
+		int i;
+
+		num_thr = of_property_count_u32_elems(np_opp, "opp-throttlers");
+		if (num_thr < 0)
+			continue;
+
+		for (i = 0; i < num_thr; i++) {
+			struct device_node *np_thr;
+
+			np_thr = of_parse_phandle(np_opp, "opp-throttlers", i);
+			if (!np_thr) {
+				thr_err(thr,
+					"failed to parse phandle %d: %s\n", i,
+					np_opp->full_name);
+				continue;
+			}
+
+			if (thr->dev->of_node == np_thr) {
+				u64 rate;
+
+				err = of_property_read_u64(np_opp, "opp-hz",
+							   &rate);
+				if (!err) {
+					freqs[nfreqs] = rate;
+					nfreqs++;
+
+					thr_dbg(thr,
+						"OPP %s (%llu MHz) is used for throttling\n",
+						np_opp->full_name,
+						rate / 1000000);
+
+				} else {
+					thr_err(thr, "opp-hz not found: %s\n",
+						np_opp->full_name);
+				}
+			}
+
+			of_node_put(np_thr);
+		}
+	}
+
+	if (nfreqs > 0) {
+		/* sort frequencies in descending order */
+		sort(freqs, nfreqs, sizeof(*freqs), cmp_freqs, NULL);
+
+		ft->n_entries = nfreqs;
+		ft->freqs = devm_kzalloc(thr->dev,
+				  nfreqs * sizeof(*freqs), GFP_KERNEL);
+		if (!ft->freqs) {
+			err = -ENOMEM;
+			goto out_free;
+		}
+
+		memcpy(ft->freqs, freqs, nfreqs * sizeof(*freqs));
+	} else {
+		err = -ENODEV;
+	}
+
+out_free:
+	kfree(freqs);
+
+out_node_put:
+	of_node_put(np_opp_desc);
+
+	return err;
+}
+
+static void thr_cpufreq_init(struct throttler *thr, int cpu)
+{
+	struct device *cpu_dev;
+	struct thr_freq_table ft;
+	struct cpufreq_thrdev *cpufreq_dev;
+	int err;
+
+	WARN_ON(!mutex_is_locked(&thr->lock));
+
+	cpu_dev = get_cpu_device(cpu);
+	if (!cpu_dev) {
+		dev_err_ratelimited(thr->dev, "failed to get CPU %d\n", cpu);
+		return;
+	}
+
+	err = thr_init_freq_table(thr, cpu_dev, &ft);
+	if (err) {
+		/* CPU is not throttled or initialization failed */
+		if (err != -ENODEV)
+			thr_err(thr, "failed to initialize CPU %d: %d", cpu,
+				err);
+
+		cpumask_set_cpu(cpu, &thr->cpufreq.cm_ignore);
+		return;
+	}
+
+	cpufreq_dev = devm_kzalloc(thr->dev, sizeof(*cpufreq_dev), GFP_KERNEL);
+	if (!cpufreq_dev) {
+		thr_err(thr, "%s: out of memory\n", __func__);
+		return;
+	}
+
+	cpufreq_dev->cpu = cpu;
+	memcpy(&cpufreq_dev->freq_table, &ft, sizeof(ft));
+	list_add_tail(&cpufreq_dev->node, &thr->cpufreq.list);
+
+	cpumask_set_cpu(cpu, &thr->cpufreq.cm_initialized);
+}
+
+static void thr_devfreq_init(struct device *dev, void *data)
+{
+	struct throttler *thr = data;
+	struct thr_freq_table ft;
+	struct devfreq_thrdev *dftd;
+	int err;
+
+	WARN_ON(!mutex_is_locked(&thr->lock));
+
+	err = thr_init_freq_table(thr, dev->parent, &ft);
+	if (err) {
+		if (err == -ENODEV)
+			return;
+
+		thr_err(thr, "failed to init frequency table of device %s: %d",
+			dev_name(dev), err);
+		return;
+	}
+
+	dftd = devm_kzalloc(thr->dev, sizeof(*dftd), GFP_KERNEL);
+	if (!dftd) {
+		thr_err(thr, "%s: out of memory\n", __func__);
+		return;
+	}
+
+	dftd->thr = thr;
+	dftd->devfreq = container_of(dev, struct devfreq, dev);
+	memcpy(&dftd->freq_table, &ft, sizeof(ft));
+
+	dftd->nb.notifier_call = thr_handle_devfreq_event;
+	err = devm_devfreq_register_notifier(thr->dev, dftd->devfreq,
+				     &dftd->nb, DEVFREQ_POLICY_NOTIFIER);
+	if (err < 0) {
+		thr_err(thr, "failed to register devfreq notifier\n");
+		devm_kfree(thr->dev, dftd);
+		return;
+	}
+
+	list_add_tail(&dftd->node, &thr->devfreq.list);
+
+	thr_dbg(thr, "device '%s' is used for throttling\n",
+		dev_name(dev));
+}
+
+static int thr_handle_cpufreq_event(struct notifier_block *nb,
+				unsigned long event, void *data)
+{
+	struct throttler *thr =
+		container_of(nb, struct throttler, cpufreq.nb);
+	struct cpufreq_policy *policy = data;
+	struct cpufreq_thrdev *cftd;
+
+	if (event != CPUFREQ_ADJUST)
+		return 0;
+
+	mutex_lock(&thr->lock);
+
+	if (cpumask_test_cpu(policy->cpu, &thr->cpufreq.cm_ignore))
+		goto out;
+
+	if (!cpumask_test_cpu(policy->cpu, &thr->cpufreq.cm_initialized)) {
+		thr_cpufreq_init(thr, policy->cpu);
+
+		if (cpumask_test_cpu(policy->cpu, &thr->cpufreq.cm_ignore))
+			goto out;
+
+		thr_dbg(thr, "CPU%d is used for throttling\n", policy->cpu);
+	}
+
+	/*
+	 * Can't do this check earlier, otherwise we might miss CPU policies
+	 * that are added after setup().
+	 */
+	if (thr->level == 0) {
+		list_for_each_entry(cftd, &thr->cpufreq.list, node) {
+			if (cftd->cpu != policy->cpu)
+				continue;
+
+			if (cftd->clamp_freq != 0) {
+				thr_dbg(thr, "unthrottling CPU%d\n", cftd->cpu);
+				cftd->clamp_freq = 0;
+			}
+		}
+
+		goto out;
+	}
+
+	list_for_each_entry(cftd, &thr->cpufreq.list, node) {
+		unsigned long clamp_freq;
+
+		if (cftd->cpu != policy->cpu)
+			continue;
+
+		clamp_freq = thr_get_throttling_freq(&cftd->freq_table,
+						     thr->level) / 1000;
+		if (cftd->clamp_freq != clamp_freq) {
+			thr_dbg(thr, "throttling CPU%d to %lu MHz\n", cftd->cpu,
+				clamp_freq / 1000);
+			cftd->clamp_freq = clamp_freq;
+		}
+
+		if (clamp_freq < policy->max)
+			cpufreq_verify_within_limits(policy, 0, clamp_freq);
+	}
+
+out:
+	mutex_unlock(&thr->lock);
+
+	return NOTIFY_DONE;
+}
+
+/*
+ * Notifier called by devfreq. Can't acquire thr->lock since it might
+ * already be held by throttler_set_level(). It isn't necessary to
+ * acquire the lock for the following reasons:
+ *
+ * Only the devfreq_thrdev and thr->level are accessed in this function.
+ * The devfreq device won't go away (or change) during the execution of
+ * this function, since we are called from the devfreq core. Theoretically
+ * thr->level could change and we'd apply an outdated setting, however in
+ * this case the function would run again shortly after and apply the
+ * correct value.
+ */
+static int thr_handle_devfreq_event(struct notifier_block *nb,
+				    unsigned long event, void *data)
+{
+	struct devfreq_thrdev *dftd =
+		container_of(nb, struct devfreq_thrdev, nb);
+	struct throttler *thr = dftd->thr;
+	struct devfreq_policy *policy = data;
+	unsigned long clamp_freq;
+
+	if (event != DEVFREQ_ADJUST)
+		return NOTIFY_DONE;
+
+	if (thr->level == 0) {
+		if (dftd->clamp_freq != 0) {
+			thr_dbg(thr, "unthrottling '%s'\n",
+				dev_name(&dftd->devfreq->dev));
+			dftd->clamp_freq = 0;
+		}
+
+		return NOTIFY_DONE;
+	}
+
+	clamp_freq = thr_get_throttling_freq(&dftd->freq_table, thr->level);
+	if (clamp_freq != dftd->clamp_freq) {
+		thr_dbg(thr, "throttling '%s' to %lu MHz\n",
+			dev_name(&dftd->devfreq->dev), clamp_freq / 1000000);
+		dftd->clamp_freq = clamp_freq;
+	}
+
+	if (clamp_freq < policy->max)
+		devfreq_verify_within_limits(policy, 0, clamp_freq);
+
+	return NOTIFY_DONE;
+}
+
+static void thr_cpufreq_update_policy(struct throttler *thr)
+{
+	struct cpufreq_thrdev *cftd;
+
+	WARN_ON(!mutex_is_locked(&thr->lock));
+
+	list_for_each_entry(cftd, &thr->cpufreq.list, node) {
+		struct cpufreq_policy *policy = cpufreq_cpu_get(cftd->cpu);
+
+		if (!policy) {
+			thr_warn(thr, "CPU%d does have no cpufreq policy!\n",
+				 cftd->cpu);
+			continue;
+		}
+
+		/*
+		 * The lock isn't really needed in this function, the list
+		 * of cpufreq devices can be extended, but no items are
+		 * deleted during the lifetime of the throttler. Releasing
+		 * the lock is necessary since cpufreq_update_policy() ends
+		 * up calling thr_handle_cpufreq_event(), which needs to
+		 * acquire the lock.
+		 */
+		mutex_unlock(&thr->lock);
+		cpufreq_update_policy(cftd->cpu);
+		mutex_lock(&thr->lock);
+
+		cpufreq_cpu_put(policy);
+	}
+}
+
+static int thr_handle_devfreq_added(struct device *dev,
+				    struct class_interface *ci)
+{
+	struct throttler *thr = ci_to_throttler(ci);
+
+	mutex_lock(&thr->lock);
+	thr_devfreq_init(dev, thr);
+	mutex_unlock(&thr->lock);
+
+	return 0;
+}
+
+static void thr_handle_devfreq_removed(struct device *dev,
+				       struct class_interface *ci)
+{
+	struct devfreq_thrdev *dftd;
+	struct throttler *thr = ci_to_throttler(ci);
+
+	mutex_lock(&thr->lock);
+
+	list_for_each_entry(dftd, &thr->devfreq.list, node) {
+		if (dev == &dftd->devfreq->dev) {
+			list_del(&dftd->node);
+			devm_kfree(thr->dev, dftd->freq_table.freqs);
+			devm_kfree(thr->dev, dftd);
+			break;
+		}
+	}
+
+	mutex_unlock(&thr->lock);
+}
+
+void throttler_set_level(struct throttler *thr, int level)
+{
+	struct devfreq_thrdev *dftd;
+
+	if (level == thr->level)
+		return;
+
+	mutex_lock(&thr->lock);
+
+	thr_dbg(thr, "throttling level: %d\n", level);
+	thr->level = level;
+
+	if (!list_empty(&thr->cpufreq.list))
+		thr_cpufreq_update_policy(thr);
+
+	list_for_each_entry(dftd, &thr->devfreq.list, node) {
+		mutex_lock(&dftd->devfreq->lock);
+		update_devfreq(dftd->devfreq);
+		mutex_unlock(&dftd->devfreq->lock);
+	}
+
+	mutex_unlock(&thr->lock);
+}
+EXPORT_SYMBOL_GPL(throttler_set_level);
+
+#ifdef CONFIG_THROTTLER_DEBUG
+
+static ssize_t thr_level_read(struct file *file, char __user *user_buf,
+			      size_t count, loff_t *ppos)
+{
+	struct throttler *thr = file->f_inode->i_private;
+	char buf[5];
+	int len;
+
+	len = scnprintf(buf, sizeof(buf), "%d\n", thr->level);
+
+	return simple_read_from_buffer(user_buf, count, ppos, buf, len);
+}
+
+static ssize_t thr_level_write(struct file *file,
+				 const char __user *user_buf,
+				 size_t count, loff_t *ppos)
+{
+	int rc;
+	int level;
+	struct throttler *thr = file->f_inode->i_private;
+
+	rc = kstrtoint_from_user(user_buf, count, 10, &level);
+	if (rc)
+		return rc;
+
+	throttler_set_level(thr, level);
+
+	return count;
+}
+
+static const struct file_operations level_debugfs_ops = {
+	.owner = THIS_MODULE,
+	.read = thr_level_read,
+	.write = thr_level_write,
+};
+#endif
+
+struct throttler *throttler_setup(struct device *dev)
+{
+	struct throttler *thr;
+	struct device_node *np = dev->of_node;
+	struct class_interface *ci;
+	int cpu;
+	int err;
+
+	if (!np)
+		/* should never happen */
+		return ERR_PTR(-EINVAL);
+
+	thr = devm_kzalloc(dev, sizeof(*thr), GFP_KERNEL);
+	if (!thr)
+		return ERR_PTR(-ENOMEM);
+
+	mutex_init(&thr->lock);
+	thr->dev = dev;
+
+	cpumask_clear(&thr->cpufreq.cm_ignore);
+	cpumask_clear(&thr->cpufreq.cm_initialized);
+
+	INIT_LIST_HEAD(&thr->cpufreq.list);
+	INIT_LIST_HEAD(&thr->devfreq.list);
+
+	thr->cpufreq.nb.notifier_call = thr_handle_cpufreq_event;
+	err = cpufreq_register_notifier(&thr->cpufreq.nb,
+					CPUFREQ_POLICY_NOTIFIER);
+	if (err < 0) {
+		thr_err(thr, "failed to register cpufreq notifier\n");
+		return ERR_PTR(err);
+	}
+
+	/*
+	 * The CPU throttling configuration is parsed at runtime, when the
+	 * cpufreq policy notifier is called for a CPU that hasn't been
+	 * initialized yet.
+	 *
+	 * This is done for two reasons:
+	 * -  when the throttler is probed the CPU might not yet have a policy
+	 * -  CPUs that were offline at probe time might be hotplugged
+	 *
+	 * The notifier is called then the policy is added/set
+	 */
+	for_each_online_cpu(cpu) {
+		struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
+
+		if (!policy)
+			continue;
+
+		cpufreq_update_policy(cpu);
+		cpufreq_cpu_put(policy);
+	}
+
+	/*
+	 * devfreq devices can be added and removed at runtime, hence they
+	 * must also be handled dynamically. The class_interface notifies us
+	 * whenever a device is added or removed. When the interface is
+	 * registered ci->add_dev() is called for all existing devfreq
+	 * devices.
+	 */
+	ci = &thr->devfreq.class_iface;
+	ci->class = devfreq_class;
+	ci->add_dev = thr_handle_devfreq_added;
+	ci->remove_dev = thr_handle_devfreq_removed;
+
+	err = class_interface_register(ci);
+	if (err) {
+		thr_err(thr, "failed to register devfreq class interface: %d\n",
+			err);
+		cpufreq_unregister_notifier(&thr->cpufreq.nb,
+					    CPUFREQ_POLICY_NOTIFIER);
+		return ERR_PTR(err);
+	}
+
+#ifdef CONFIG_THROTTLER_DEBUG
+	thr->debugfs.dir = debugfs_create_dir(dev_name(thr->dev), NULL);
+	if (IS_ERR(thr->debugfs.dir)) {
+		thr_warn(thr, "failed to create debugfs directory: %ld\n",
+			 PTR_ERR(thr->debugfs.dir));
+		thr->debugfs.dir = NULL;
+		goto skip_debugfs;
+	}
+
+	thr->debugfs.attr_level = debugfs_create_file("level", 0644,
+						      thr->debugfs.dir, thr,
+						      &level_debugfs_ops);
+	if (IS_ERR(thr->debugfs.dir)) {
+		thr_warn(thr, "failed to create debugfs attribute: %ld\n",
+			 PTR_ERR(thr->debugfs.attr_level));
+		debugfs_remove(thr->debugfs.dir);
+		thr->debugfs.dir = NULL;
+	}
+
+skip_debugfs:
+#endif
+
+	return thr;
+}
+EXPORT_SYMBOL_GPL(throttler_setup);
+
+void throttler_teardown(struct throttler *thr)
+{
+	struct devfreq_thrdev *dftd;
+	int level;
+
+	mutex_lock(&thr->lock);
+
+	level = thr->level;
+	thr->level = 0;
+
+	class_interface_unregister(&thr->devfreq.class_iface);
+
+	if (level) {
+		/* unthrottle CPUs */
+		if (!list_empty(&thr->cpufreq.list))
+			thr_cpufreq_update_policy(thr);
+
+		/* unthrottle devfreq devices */
+		list_for_each_entry(dftd, &thr->devfreq.list, node) {
+			mutex_lock(&dftd->devfreq->lock);
+			update_devfreq(dftd->devfreq);
+			mutex_unlock(&dftd->devfreq->lock);
+		}
+	}
+
+	cpufreq_unregister_notifier(&thr->cpufreq.nb,
+				    CPUFREQ_POLICY_NOTIFIER);
+
+	mutex_unlock(&thr->lock);
+}
+EXPORT_SYMBOL_GPL(throttler_teardown);
diff --git a/include/linux/throttler.h b/include/linux/throttler.h
new file mode 100644
index 000000000000..a29d99f581da
--- /dev/null
+++ b/include/linux/throttler.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_THROTTLER_H__
+#define __LINUX_THROTTLER_H__
+
+struct throttler;
+
+extern struct throttler *throttler_setup(struct device *dev);
+extern void throttler_teardown(struct throttler *thr);
+extern void throttler_set_level(struct throttler *thr, int level);
+
+#ifdef CONFIG_THROTTLER_DEBUG
+#define thr_dbg(thr, fmt, ...) dev_info(thr->dev, fmt, ##__VA_ARGS__)
+#else
+#define thr_dbg(thr, fmt, ...) dev_dbg(thr->dev, fmt, ##__VA_ARGS__)
+#endif
+
+#define thr_info(thr, fmt, ...) dev_info(thr->dev, fmt, ##__VA_ARGS__)
+#define thr_warn(thr, fmt, ...) dev_warn(thr->dev, fmt, ##__VA_ARGS__)
+#define thr_err(thr, fmt, ...) dev_warn(thr->dev, fmt, ##__VA_ARGS__)
+
+#endif /* __LINUX_THROTTLER_H__ */
-- 
2.18.0.rc1.242.g61856ae69a-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 11/12] misc: throttler: Add Chrome OS EC throttler
  2018-06-14 19:47 [PATCH v3 00/12] Add throttler driver for non-thermal throttling Matthias Kaehlcke
                   ` (9 preceding siblings ...)
  2018-06-14 19:47 ` [PATCH v3 10/12] misc: throttler: Add core support for non-thermal throttling Matthias Kaehlcke
@ 2018-06-14 19:47 ` Matthias Kaehlcke
  2018-06-18 23:34   ` Brian Norris
  2018-06-14 19:47 ` [PATCH v3 12/12] mfd: cros_ec: Add throttler sub-device Matthias Kaehlcke
  11 siblings, 1 reply; 23+ messages in thread
From: Matthias Kaehlcke @ 2018-06-14 19:47 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Kyungmin Park, Chanwoo Choi, Arnd Bergmann, Greg Kroah-Hartman,
	Rob Herring, Mark Rutland, linux-pm, devicetree, linux-kernel,
	Brian Norris, Douglas Anderson, Enric Balletbo i Serra,
	Rafael J . Wysocki, Viresh Kumar, Lee Jones, Matthias Kaehlcke

The driver subscribes to throttling events from the Chrome OS
embedded controller and enables/disables system throttling based
on these events.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
---
Changes in v3:
- change module license to GPL v2 as in the SPDX identifier
- don't instantiate the throttler through the DT (instantiation
  by CrOS EC MFD in a separate patch)

Changes in v2:
- added SPDX line instead of license boiler-plate
- use macro to avoid splitting line
- changed variable name for throttler from 'cte' to 'ce_thr'
- formatting fixes
- Kconfig: removed odd dashes around 'help'
- added 'Reviewed-by' tag

 drivers/misc/throttler/Kconfig             |  10 ++
 drivers/misc/throttler/Makefile            |   1 +
 drivers/misc/throttler/cros_ec_throttler.c | 109 +++++++++++++++++++++
 3 files changed, 120 insertions(+)
 create mode 100644 drivers/misc/throttler/cros_ec_throttler.c

diff --git a/drivers/misc/throttler/Kconfig b/drivers/misc/throttler/Kconfig
index 8b2e63b2ef48..ef984c96f67b 100644
--- a/drivers/misc/throttler/Kconfig
+++ b/drivers/misc/throttler/Kconfig
@@ -20,4 +20,14 @@ menuconfig THROTTLER_DEBUG
 
 	  Choose N unless you want to debug throttler drivers.
 
+config CROS_EC_THROTTLER
+	tristate "Throttler event monitor for the Chrome OS Embedded Controller"
+	depends on MFD_CROS_EC
+	help
+	  This driver adds support to throttle the system in reaction to
+	  Chrome OS EC events.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called cros_ec_throttler.
+
 endif # THROTTLER
diff --git a/drivers/misc/throttler/Makefile b/drivers/misc/throttler/Makefile
index c8d920cee315..d9b2a77dabc9 100644
--- a/drivers/misc/throttler/Makefile
+++ b/drivers/misc/throttler/Makefile
@@ -1 +1,2 @@
 obj-$(CONFIG_THROTTLER)		+= core.o
+obj-$(CONFIG_CROS_EC_THROTTLER)	+= cros_ec_throttler.o
diff --git a/drivers/misc/throttler/cros_ec_throttler.c b/drivers/misc/throttler/cros_ec_throttler.c
new file mode 100644
index 000000000000..f1e9fc0a6284
--- /dev/null
+++ b/drivers/misc/throttler/cros_ec_throttler.c
@@ -0,0 +1,109 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Driver for throttling triggered by events from the Chrome OS Embedded
+ * Controller.
+ *
+ * Copyright (C) 2018 Google, Inc.
+ */
+
+#include <linux/kernel.h>
+#include <linux/mfd/cros_ec.h>
+#include <linux/module.h>
+#include <linux/notifier.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+#include <linux/platform_device.h>
+#include <linux/throttler.h>
+
+#define nb_to_ce_thr(nb) container_of(nb, struct cros_ec_throttler, nb)
+
+struct cros_ec_throttler {
+	struct cros_ec_device *ec;
+	struct throttler *throttler;
+	struct notifier_block nb;
+};
+
+static int cros_ec_throttler_event(struct notifier_block *nb,
+	unsigned long queued_during_suspend, void *_notify)
+{
+	struct cros_ec_throttler *ce_thr = nb_to_ce_thr(nb);
+	u32 host_event;
+
+	host_event = cros_ec_get_host_event(ce_thr->ec);
+	if (host_event & EC_HOST_EVENT_MASK(EC_HOST_EVENT_THROTTLE_START)) {
+		throttler_set_level(ce_thr->throttler, 1);
+
+		return NOTIFY_OK;
+	} else if (host_event &
+		   EC_HOST_EVENT_MASK(EC_HOST_EVENT_THROTTLE_STOP)) {
+		throttler_set_level(ce_thr->throttler, 0);
+
+		return NOTIFY_OK;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static int cros_ec_throttler_probe(struct platform_device *pdev)
+{
+	struct cros_ec_throttler *ce_thr;
+	struct device *dev = &pdev->dev;
+	int ret;
+
+	ce_thr = devm_kzalloc(dev, sizeof(*ce_thr), GFP_KERNEL);
+	if (!ce_thr)
+		return -ENOMEM;
+
+	ce_thr->ec = dev_get_drvdata(pdev->dev.parent);
+
+	/*
+	 * The core code uses the DT node of the throttler to identify its
+	 * throttling devices and rates. The CrOS EC throttler is a sub-device
+	 * of the CrOS EC MFD device and doesn't have its own device node. Use
+	 * the node of the MFD device instead.
+	 */
+	dev->of_node = dev->parent->of_node;
+
+	ce_thr->throttler = throttler_setup(dev);
+	if (IS_ERR(ce_thr->throttler))
+		return PTR_ERR(ce_thr->throttler);
+
+	dev_set_drvdata(dev, ce_thr);
+
+	ce_thr->nb.notifier_call = cros_ec_throttler_event;
+	ret = blocking_notifier_chain_register(&ce_thr->ec->event_notifier,
+					       &ce_thr->nb);
+	if (ret < 0) {
+		dev_err(dev, "failed to register notifier\n");
+		throttler_teardown(ce_thr->throttler);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int cros_ec_throttler_remove(struct platform_device *pdev)
+{
+	struct cros_ec_throttler *ce_thr = platform_get_drvdata(pdev);
+
+	blocking_notifier_chain_unregister(&ce_thr->ec->event_notifier,
+					   &ce_thr->nb);
+
+	throttler_teardown(ce_thr->throttler);
+
+	return 0;
+}
+
+static struct platform_driver cros_ec_throttler_driver = {
+	.driver = {
+		.name = "cros-ec-throttler",
+	},
+	.probe		= cros_ec_throttler_probe,
+	.remove		= cros_ec_throttler_remove,
+};
+
+module_platform_driver(cros_ec_throttler_driver);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Matthias Kaehlcke <mka@chromium.org>");
+MODULE_DESCRIPTION("Chrome OS EC Throttler");
-- 
2.18.0.rc1.242.g61856ae69a-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 12/12] mfd: cros_ec: Add throttler sub-device
  2018-06-14 19:47 [PATCH v3 00/12] Add throttler driver for non-thermal throttling Matthias Kaehlcke
                   ` (10 preceding siblings ...)
  2018-06-14 19:47 ` [PATCH v3 11/12] misc: throttler: Add Chrome OS EC throttler Matthias Kaehlcke
@ 2018-06-14 19:47 ` Matthias Kaehlcke
  2018-06-18 23:21   ` Brian Norris
  11 siblings, 1 reply; 23+ messages in thread
From: Matthias Kaehlcke @ 2018-06-14 19:47 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Kyungmin Park, Chanwoo Choi, Arnd Bergmann, Greg Kroah-Hartman,
	Rob Herring, Mark Rutland, linux-pm, devicetree, linux-kernel,
	Brian Norris, Douglas Anderson, Enric Balletbo i Serra,
	Rafael J . Wysocki, Viresh Kumar, Lee Jones, Matthias Kaehlcke

Instantiate the CrOS EC throttler if it is enabled in the kernel
configuration.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
---
Changes in v3:
- patch added to series

 drivers/mfd/cros_ec.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/mfd/cros_ec.c b/drivers/mfd/cros_ec.c
index 36156a41499c..5f52b6e2c21a 100644
--- a/drivers/mfd/cros_ec.c
+++ b/drivers/mfd/cros_ec.c
@@ -91,6 +91,10 @@ static int cros_ec_sleep_event(struct cros_ec_device *ec_dev, u8 sleep_event)
 	return cros_ec_cmd_xfer(ec_dev, &buf.msg);
 }
 
+static const struct mfd_cell ec_throttler_cell = {
+	.name = "cros-ec-throttler",
+};
+
 int cros_ec_register(struct cros_ec_device *ec_dev)
 {
 	struct device *dev = ec_dev->dev;
@@ -153,6 +157,18 @@ int cros_ec_register(struct cros_ec_device *ec_dev)
 		}
 	}
 
+	if (IS_ENABLED(CONFIG_CROS_EC_THROTTLER)) {
+		err = mfd_add_devices(ec_dev->dev, PLATFORM_DEVID_AUTO,
+				      &ec_throttler_cell, 1, NULL, ec_dev->irq,
+				      NULL);
+		if (err) {
+			dev_err(dev,
+				"Failed to register throttler subdevice %d\n",
+				err);
+			return err;
+		}
+	}
+
 	if (IS_ENABLED(CONFIG_OF) && dev->of_node) {
 		err = devm_of_platform_populate(dev);
 		if (err) {
-- 
2.18.0.rc1.242.g61856ae69a-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 01/12] PM / devfreq: Init user limits from OPP limits, not viceversa
  2018-06-14 19:47 ` [PATCH v3 01/12] PM / devfreq: Init user limits from OPP limits, not viceversa Matthias Kaehlcke
@ 2018-06-15 21:06   ` Brian Norris
  2018-06-15 21:25     ` Matthias Kaehlcke
  0 siblings, 1 reply; 23+ messages in thread
From: Brian Norris @ 2018-06-15 21:06 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Arnd Bergmann,
	Greg Kroah-Hartman, Rob Herring, Mark Rutland, linux-pm,
	devicetree, linux-kernel, Douglas Anderson,
	Enric Balletbo i Serra, Rafael J . Wysocki, Viresh Kumar,
	Lee Jones

Hi,

On Thu, Jun 14, 2018 at 12:47:01PM -0700, Matthias Kaehlcke wrote:
> Commit ab8f58ad72c4 ("PM / devfreq: Set min/max_freq when adding
> the devfreq device") introduced the initialization of the user
> limits min/max_freq from the lowest/highest available OPPs. Later
> commit f1d981eaecf8 ("PM / devfreq: Use the available min/max
> frequency") added scaling_min/max_freq, which actually represent
> the frequencies of the lowest/highest available OPP. scaling_min/
> max_freq are initialized with the values from min/max_freq, which
> is totally correct in the context, but a bit awkward to read.
> 
> Swap the initialization and assign scaling_min/max_freq with the
> OPP freqs and then the user limts min/max_freq with scaling_min/
> max_freq.
> 
> Needless to say that this change is a NOP, intended to improve
> readability.
> ---

BTW, putting the '---' here means that stuff below it usually gets
dropped when applied (e.g., with git-am). So it'll drop your
Signed-off-by and Reviewed-by. Not a huge problem if the maintainers
look out for that.

> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>

Reviewed-by: Brian Norris <briannorris@chromium.org>

> 
> Changes in v3:
> - none
> 
> Changes in v2:
> - added 'Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>' tag
> 
>  drivers/devfreq/devfreq.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
> index fe2af6aa88fc..0057ef5b0a98 100644
> --- a/drivers/devfreq/devfreq.c
> +++ b/drivers/devfreq/devfreq.c
> @@ -604,21 +604,21 @@ struct devfreq *devfreq_add_device(struct device *dev,
>  		mutex_lock(&devfreq->lock);
>  	}
>  
> -	devfreq->min_freq = find_available_min_freq(devfreq);
> -	if (!devfreq->min_freq) {
> +	devfreq->scaling_min_freq = find_available_min_freq(devfreq);
> +	if (!devfreq->scaling_min_freq) {
>  		mutex_unlock(&devfreq->lock);
>  		err = -EINVAL;
>  		goto err_dev;
>  	}
> -	devfreq->scaling_min_freq = devfreq->min_freq;
> +	devfreq->min_freq = devfreq->scaling_min_freq;
>  
> -	devfreq->max_freq = find_available_max_freq(devfreq);
> -	if (!devfreq->max_freq) {
> +	devfreq->scaling_max_freq = find_available_max_freq(devfreq);
> +	if (!devfreq->scaling_max_freq) {
>  		mutex_unlock(&devfreq->lock);
>  		err = -EINVAL;
>  		goto err_dev;
>  	}
> -	devfreq->scaling_max_freq = devfreq->max_freq;
> +	devfreq->max_freq = devfreq->scaling_max_freq;
>  
>  	dev_set_name(&devfreq->dev, "devfreq%d",
>  				atomic_inc_return(&devfreq_no));
> -- 
> 2.18.0.rc1.242.g61856ae69a-goog
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 01/12] PM / devfreq: Init user limits from OPP limits, not viceversa
  2018-06-15 21:06   ` Brian Norris
@ 2018-06-15 21:25     ` Matthias Kaehlcke
  0 siblings, 0 replies; 23+ messages in thread
From: Matthias Kaehlcke @ 2018-06-15 21:25 UTC (permalink / raw)
  To: Brian Norris
  Cc: MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Arnd Bergmann,
	Greg Kroah-Hartman, Rob Herring, Mark Rutland, linux-pm,
	devicetree, linux-kernel, Douglas Anderson,
	Enric Balletbo i Serra, Rafael J . Wysocki, Viresh Kumar,
	Lee Jones

On Fri, Jun 15, 2018 at 02:06:21PM -0700, Brian Norris wrote:
> Hi,
> 
> On Thu, Jun 14, 2018 at 12:47:01PM -0700, Matthias Kaehlcke wrote:
> > Commit ab8f58ad72c4 ("PM / devfreq: Set min/max_freq when adding
> > the devfreq device") introduced the initialization of the user
> > limits min/max_freq from the lowest/highest available OPPs. Later
> > commit f1d981eaecf8 ("PM / devfreq: Use the available min/max
> > frequency") added scaling_min/max_freq, which actually represent
> > the frequencies of the lowest/highest available OPP. scaling_min/
> > max_freq are initialized with the values from min/max_freq, which
> > is totally correct in the context, but a bit awkward to read.
> > 
> > Swap the initialization and assign scaling_min/max_freq with the
> > OPP freqs and then the user limts min/max_freq with scaling_min/
> > max_freq.
> > 
> > Needless to say that this change is a NOP, intended to improve
> > readability.
> > ---
> 
> BTW, putting the '---' here means that stuff below it usually gets
> dropped when applied (e.g., with git-am). So it'll drop your
> Signed-off-by and Reviewed-by. Not a huge problem if the maintainers
> look out for that.

It wasn't intended and is related with my current workflow: To keep
easily track of deltas in my tree and have the changelog ready when
sending the patches I currently keep the changelog in the commit
message and manually move it below '---' when doing 'git send-email'.

Seems I got it wrong in this case :(

> > Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> > Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
> 
> Reviewed-by: Brian Norris <briannorris@chromium.org>

Thanks!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 02/12] PM / devfreq: Fix handling of min/max_freq == 0
  2018-06-14 19:47 ` [PATCH v3 02/12] PM / devfreq: Fix handling of min/max_freq == 0 Matthias Kaehlcke
@ 2018-06-18 20:12   ` Brian Norris
  0 siblings, 0 replies; 23+ messages in thread
From: Brian Norris @ 2018-06-18 20:12 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Arnd Bergmann,
	Greg Kroah-Hartman, Rob Herring, Mark Rutland, linux-pm,
	devicetree, linux-kernel, Douglas Anderson,
	Enric Balletbo i Serra, Rafael J . Wysocki, Viresh Kumar,
	Lee Jones

Hi,

On Thu, Jun 14, 2018 at 12:47:02PM -0700, Matthias Kaehlcke wrote:
> Commit ab8f58ad72c4 ("PM / devfreq: Set min/max_freq when adding the
> devfreq device") initializes df->min/max_freq with the min/max OPP when
> the device is added. Later commit f1d981eaecf8 ("PM / devfreq: Use the
> available min/max frequency") adds df->scaling_min/max_freq and the
> following to the frequency adjustment code:
> 
>   max_freq = MIN(devfreq->scaling_max_freq, devfreq->max_freq);
> 
> With the current handling of min/max_freq this is incorrect:
> 
> Even though df->max_freq is now initialized to a value != 0 user space
> can still set it to 0, in this case max_freq would be 0 instead of
> df->scaling_max_freq as intended. In consequence the frequency adjustment
> is not performed:
> 
>   if (max_freq && freq > max_freq) {
> 	freq = max_freq;
> 
> To fix this set df->min/max freq to the min/max OPP in max/max_freq_store,
> when the user passes a value of 0. This also prevents df->max_freq from
> being set below the min OPP when df->min_freq is 0, and similar for
> min_freq. Since it is now guaranteed that df->min/max_freq can't be 0 the
> checks for this case can be removed.
> 
> Fixes: f1d981eaecf8 ("PM / devfreq: Use the available min/max frequency")
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>

FWIW, I've reviewed patches 1-9 (the core devfreq bugfixes and features,
and a small cpufreq helper fixup), and they seem pretty clear and good
to me, aside from a minor error in the current subject
(s/devfreg/devfreq/). So:

Reviewed-by: Brian Norris <briannorris@chromium.org>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 10/12] misc: throttler: Add core support for non-thermal throttling
  2018-06-14 19:47 ` [PATCH v3 10/12] misc: throttler: Add core support for non-thermal throttling Matthias Kaehlcke
@ 2018-06-18 23:03   ` Brian Norris
  2018-06-18 23:59     ` Matthias Kaehlcke
  0 siblings, 1 reply; 23+ messages in thread
From: Brian Norris @ 2018-06-18 23:03 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Arnd Bergmann,
	Greg Kroah-Hartman, Rob Herring, Mark Rutland, linux-pm,
	devicetree, linux-kernel, Douglas Anderson,
	Enric Balletbo i Serra, Rafael J . Wysocki, Viresh Kumar,
	Lee Jones

Hi Matthias,

On Thu, Jun 14, 2018 at 12:47:10PM -0700, Matthias Kaehlcke wrote:
> The purpose of the throttler is to provide support for non-thermal
> throttling. Throttling is triggered by external event, e.g. the
> detection of a high battery discharge current, close to the OCP limit
> of the battery. The throttler is only in charge of the throttling, not
> the monitoring, which is done by another (possibly platform specific)
> driver.
> 
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>

I have a few more comments.

> ---
> Changes in v3:
> - Kconfig: don't select CPU_FREQ and PM_DEVFREQ
> - added CONFIG_THROTTLER_DEBUG option
> - changed 'level' sysfs attribute to debugfs
> - introduced thr_<level> macros for logging
> - added debug logs
> - added field clamp_freq to struct cpufreq_thrdev and devfreq_thrdev
>   to keep track of the current frequency limits and avoid spammy logs
> 
> Changes in v2:
> - completely reworked the driver to support configuration through OPPs, which
>   requires a more dynamic handling
> - added sysfs attribute to set the level for debugging and testing
> - Makefile: depend on Kconfig option to traverse throttler directory
> - Kconfig: removed 'default n'
> - added SPDX line instead of license boiler-plate
> - added entry to MAINTAINERS file
> 
>  MAINTAINERS                     |   7 +
>  drivers/misc/Kconfig            |   1 +
>  drivers/misc/Makefile           |   1 +
>  drivers/misc/throttler/Kconfig  |  23 ++
>  drivers/misc/throttler/Makefile |   1 +
>  drivers/misc/throttler/core.c   | 687 ++++++++++++++++++++++++++++++++
>  include/linux/throttler.h       |  21 +
>  7 files changed, 741 insertions(+)
>  create mode 100644 drivers/misc/throttler/Kconfig
>  create mode 100644 drivers/misc/throttler/Makefile
>  create mode 100644 drivers/misc/throttler/core.c
>  create mode 100644 include/linux/throttler.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index dc241b04d1bd..db359af7cb1c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14090,6 +14090,13 @@ T:	git git://linuxtv.org/mhadli/v4l-dvb-davinci_devices.git
>  S:	Maintained
>  F:	drivers/media/platform/am437x/
>  
> +THROTTLER DRIVERS
> +M:	Matthias Kaehlcke <mka@chromium.org>
> +L:	linux-pm@vger.kernel.org
> +S:	Maintained
> +F:	drivers/misc/throttler/
> +F:	include/linux/throttler.h
> +
>  TI BANDGAP AND THERMAL DRIVER
>  M:	Eduardo Valentin <edubezval@gmail.com>
>  M:	Keerthy <j-keerthy@ti.com>
> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> index 3726eacdf65d..717fa3bd0e09 100644
> --- a/drivers/misc/Kconfig
> +++ b/drivers/misc/Kconfig
> @@ -527,4 +527,5 @@ source "drivers/misc/echo/Kconfig"
>  source "drivers/misc/cxl/Kconfig"
>  source "drivers/misc/ocxl/Kconfig"
>  source "drivers/misc/cardreader/Kconfig"
> +source "drivers/misc/throttler/Kconfig"
>  endmenu
> diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
> index af22bbc3d00c..0f4ecc6a7532 100644
> --- a/drivers/misc/Makefile
> +++ b/drivers/misc/Makefile
> @@ -58,3 +58,4 @@ obj-$(CONFIG_ASPEED_LPC_SNOOP)	+= aspeed-lpc-snoop.o
>  obj-$(CONFIG_PCI_ENDPOINT_TEST)	+= pci_endpoint_test.o
>  obj-$(CONFIG_OCXL)		+= ocxl/
>  obj-$(CONFIG_MISC_RTSX)		+= cardreader/
> +obj-$(CONFIG_THROTTLER)		+= throttler/
> diff --git a/drivers/misc/throttler/Kconfig b/drivers/misc/throttler/Kconfig
> new file mode 100644
> index 000000000000..8b2e63b2ef48
> --- /dev/null
> +++ b/drivers/misc/throttler/Kconfig
> @@ -0,0 +1,23 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +menuconfig THROTTLER
> +	bool "Throttler support"
> +	depends on OF
> +	help
> +	  This option enables core support for non-thermal throttling of CPUs
> +	  and devfreq devices.
> +
> +	  Note that you also need a event monitor module usually called
> +	  *_throttler.
> +
> +if THROTTLER
> +
> +menuconfig THROTTLER_DEBUG
> +	bool "Enable throttler debugging"
> +	help
> +	  This option enables throttler debugging features like additional
> +	  logging and a debugfs attribute for setting the logging level.
> +
> +	  Choose N unless you want to debug throttler drivers.
> +
> +endif # THROTTLER
> diff --git a/drivers/misc/throttler/Makefile b/drivers/misc/throttler/Makefile
> new file mode 100644
> index 000000000000..c8d920cee315
> --- /dev/null
> +++ b/drivers/misc/throttler/Makefile
> @@ -0,0 +1 @@
> +obj-$(CONFIG_THROTTLER)		+= core.o
> diff --git a/drivers/misc/throttler/core.c b/drivers/misc/throttler/core.c
> new file mode 100644
> index 000000000000..52350d846654
> --- /dev/null
> +++ b/drivers/misc/throttler/core.c
> @@ -0,0 +1,687 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Core code for non-thermal throttling
> + *
> + * Copyright (C) 2018 Google, Inc.
> + */
> +
> +#include <linux/cpu.h>
> +#include <linux/cpufreq.h>
> +#include <linux/cpumask.h>
> +#include <linux/debugfs.h>
> +#include <linux/devfreq.h>
> +#include <linux/kernel.h>
> +#include <linux/list.h>
> +#include <linux/mutex.h>
> +#include <linux/notifier.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +#include <linux/pm_opp.h>
> +#include <linux/slab.h>
> +#include <linux/sort.h>
> +#include <linux/throttler.h>
> +
> +/*
> + * Non-thermal throttling: throttling of system components in response to
> + * external events (e.g. high battery discharge current).
> + *
> + * The throttler supports throttling through cpufreq and devfreq. Multiple
> + * levels of throttling can be configured. At level 0 no throttling is
> + * active on behalf of the throttler, for values > 0 throttling is typically
> + * configured to be increasingly aggressive with each level.
> + * The number of throttling levels is not limited by the throttler (though
> + * it is likely limited by the throttling devices). It is not necessary to
> + * configure the same number of levels for all throttling devices. If the
> + * requested throttling level for a device is higher than the maximum level
> + * of the device the throttler will select the maximum throttling level of
> + * the device.
> + *
> + * Non-thermal throttling is split in two parts:
> + *
> + * - throttler core
> + *   - parses the thermal policy
> + *   - applies throttling settings for a requested level of throttling
> + *
> + * - event monitor driver
> + *   - monitors events that trigger throttling
> + *   - determines the throttling level (often limited to on/off)
> + *   - asks throttler core to apply throttling settings
> + *
> + * It is possible for a system to have more than one throttler and the
> + * throttlers may make use of the same throttling devices, in case of
> + * conflicting settings for a device the more aggressive values will be
> + * applied.
> + *
> + */
> +
> +#define ci_to_throttler(ci) \
> +	container_of(ci, struct throttler, devfreq.class_iface)
> +
> +struct thr_freq_table {
> +	uint32_t *freqs;
> +	int n_entries;
> +};
> +
> +struct cpufreq_thrdev {
> +	uint32_t cpu;
> +	struct thr_freq_table freq_table;
> +	uint32_t clamp_freq;
> +	struct list_head node;
> +};
> +
> +struct devfreq_thrdev {
> +	struct devfreq *devfreq;
> +	struct thr_freq_table freq_table;
> +	uint32_t clamp_freq;
> +	struct throttler *thr;
> +	struct notifier_block nb;
> +	struct list_head node;
> +};
> +
> +struct __thr_cpufreq {
> +	struct list_head list;
> +	cpumask_t cm_initialized;
> +	cpumask_t cm_ignore;
> +	struct notifier_block nb;
> +};
> +
> +struct __thr_devfreq {
> +	struct list_head list;
> +	struct class_interface class_iface;
> +};
> +
> +struct __thr_debugfs {
> +	struct dentry *dir;
> +	struct dentry *attr_level;
> +};
> +
> +struct throttler {
> +	struct device *dev;
> +	int level;
> +	struct __thr_cpufreq cpufreq;
> +	struct __thr_devfreq devfreq;
> +	struct mutex lock;
> +#ifdef CONFIG_THROTTLER_DEBUG
> +	struct __thr_debugfs debugfs;
> +#endif
> +};
> +
> +static inline int cmp_freqs(const void *a, const void *b)
> +{
> +	const uint32_t *pa = a, *pb = b;
> +
> +	if (*pa < *pb)
> +		return 1;
> +	else if (*pa > *pb)
> +		return -1;
> +
> +	return 0;
> +}
> +
> +static int thr_handle_devfreq_event(struct notifier_block *nb,
> +				    unsigned long event, void *data);
> +
> +static unsigned long thr_get_throttling_freq(struct thr_freq_table *ft,
> +					     int level)
> +{
> +	if (level == 0) {
> +		WARN(true, "level == 0");
> +		return ULONG_MAX;
> +	}
> +
> +	if (level <= ft->n_entries)
> +		return ft->freqs[level - 1];
> +	else
> +		return ft->freqs[ft->n_entries - 1];
> +}
> +
> +static int thr_init_freq_table(struct throttler *thr, struct device *opp_dev,
> +			       struct thr_freq_table *ft)
> +{
> +	struct device_node *np_opp_desc, *np_opp;
> +	int nchilds;
> +	uint32_t *freqs;
> +	int nfreqs = 0;
> +	int err = 0;
> +
> +	np_opp_desc = dev_pm_opp_of_get_opp_desc_node(opp_dev);
> +	if (!np_opp_desc)
> +		return -EINVAL;
> +
> +	nchilds = of_get_child_count(np_opp_desc);
> +	if (!nchilds) {
> +		err = -EINVAL;
> +		goto out_node_put;
> +	}
> +
> +	freqs = kzalloc(nchilds * sizeof(uint32_t), GFP_KERNEL);
> +	if (!freqs) {
> +		err = -ENOMEM;
> +		goto out_node_put;
> +	}
> +
> +	/* determine which OPPs are used by this throttler (if any) */
> +	for_each_child_of_node(np_opp_desc, np_opp) {
> +		int num_thr;
> +		int i;
> +
> +		num_thr = of_property_count_u32_elems(np_opp, "opp-throttlers");
> +		if (num_thr < 0)
> +			continue;
> +
> +		for (i = 0; i < num_thr; i++) {
> +			struct device_node *np_thr;
> +
> +			np_thr = of_parse_phandle(np_opp, "opp-throttlers", i);
> +			if (!np_thr) {
> +				thr_err(thr,
> +					"failed to parse phandle %d: %s\n", i,
> +					np_opp->full_name);
> +				continue;
> +			}
> +
> +			if (thr->dev->of_node == np_thr) {
> +				u64 rate;
> +
> +				err = of_property_read_u64(np_opp, "opp-hz",
> +							   &rate);
> +				if (!err) {
> +					freqs[nfreqs] = rate;
> +					nfreqs++;
> +
> +					thr_dbg(thr,
> +						"OPP %s (%llu MHz) is used for throttling\n",
> +						np_opp->full_name,
> +						rate / 1000000);
> +
> +				} else {
> +					thr_err(thr, "opp-hz not found: %s\n",
> +						np_opp->full_name);
> +				}
> +			}
> +
> +			of_node_put(np_thr);
> +		}
> +	}
> +
> +	if (nfreqs > 0) {
> +		/* sort frequencies in descending order */
> +		sort(freqs, nfreqs, sizeof(*freqs), cmp_freqs, NULL);
> +
> +		ft->n_entries = nfreqs;
> +		ft->freqs = devm_kzalloc(thr->dev,
> +				  nfreqs * sizeof(*freqs), GFP_KERNEL);
> +		if (!ft->freqs) {
> +			err = -ENOMEM;
> +			goto out_free;
> +		}
> +
> +		memcpy(ft->freqs, freqs, nfreqs * sizeof(*freqs));
> +	} else {
> +		err = -ENODEV;
> +	}
> +
> +out_free:
> +	kfree(freqs);
> +
> +out_node_put:
> +	of_node_put(np_opp_desc);
> +
> +	return err;
> +}
> +
> +static void thr_cpufreq_init(struct throttler *thr, int cpu)
> +{
> +	struct device *cpu_dev;
> +	struct thr_freq_table ft;
> +	struct cpufreq_thrdev *cpufreq_dev;
> +	int err;
> +
> +	WARN_ON(!mutex_is_locked(&thr->lock));
> +
> +	cpu_dev = get_cpu_device(cpu);
> +	if (!cpu_dev) {
> +		dev_err_ratelimited(thr->dev, "failed to get CPU %d\n", cpu);
> +		return;
> +	}
> +
> +	err = thr_init_freq_table(thr, cpu_dev, &ft);
> +	if (err) {
> +		/* CPU is not throttled or initialization failed */
> +		if (err != -ENODEV)
> +			thr_err(thr, "failed to initialize CPU %d: %d", cpu,
> +				err);
> +
> +		cpumask_set_cpu(cpu, &thr->cpufreq.cm_ignore);
> +		return;
> +	}
> +
> +	cpufreq_dev = devm_kzalloc(thr->dev, sizeof(*cpufreq_dev), GFP_KERNEL);
> +	if (!cpufreq_dev) {
> +		thr_err(thr, "%s: out of memory\n", __func__);
> +		return;
> +	}
> +
> +	cpufreq_dev->cpu = cpu;
> +	memcpy(&cpufreq_dev->freq_table, &ft, sizeof(ft));
> +	list_add_tail(&cpufreq_dev->node, &thr->cpufreq.list);
> +
> +	cpumask_set_cpu(cpu, &thr->cpufreq.cm_initialized);
> +}
> +
> +static void thr_devfreq_init(struct device *dev, void *data)
> +{
> +	struct throttler *thr = data;
> +	struct thr_freq_table ft;
> +	struct devfreq_thrdev *dftd;
> +	int err;
> +
> +	WARN_ON(!mutex_is_locked(&thr->lock));
> +
> +	err = thr_init_freq_table(thr, dev->parent, &ft);
> +	if (err) {
> +		if (err == -ENODEV)
> +			return;
> +
> +		thr_err(thr, "failed to init frequency table of device %s: %d",
> +			dev_name(dev), err);
> +		return;
> +	}
> +
> +	dftd = devm_kzalloc(thr->dev, sizeof(*dftd), GFP_KERNEL);
> +	if (!dftd) {
> +		thr_err(thr, "%s: out of memory\n", __func__);

I think it's considered bad form to roll your own OOM messages. It's
assumed the memory manager will complain loudly enough for you already.

> +		return;
> +	}
> +
> +	dftd->thr = thr;
> +	dftd->devfreq = container_of(dev, struct devfreq, dev);
> +	memcpy(&dftd->freq_table, &ft, sizeof(ft));
> +
> +	dftd->nb.notifier_call = thr_handle_devfreq_event;
> +	err = devm_devfreq_register_notifier(thr->dev, dftd->devfreq,
> +				     &dftd->nb, DEVFREQ_POLICY_NOTIFIER);
> +	if (err < 0) {
> +		thr_err(thr, "failed to register devfreq notifier\n");
> +		devm_kfree(thr->dev, dftd);
> +		return;
> +	}
> +
> +	list_add_tail(&dftd->node, &thr->devfreq.list);
> +
> +	thr_dbg(thr, "device '%s' is used for throttling\n",
> +		dev_name(dev));
> +}
> +
> +static int thr_handle_cpufreq_event(struct notifier_block *nb,
> +				unsigned long event, void *data)
> +{
> +	struct throttler *thr =
> +		container_of(nb, struct throttler, cpufreq.nb);
> +	struct cpufreq_policy *policy = data;
> +	struct cpufreq_thrdev *cftd;
> +
> +	if (event != CPUFREQ_ADJUST)
> +		return 0;
> +
> +	mutex_lock(&thr->lock);
> +
> +	if (cpumask_test_cpu(policy->cpu, &thr->cpufreq.cm_ignore))
> +		goto out;
> +
> +	if (!cpumask_test_cpu(policy->cpu, &thr->cpufreq.cm_initialized)) {
> +		thr_cpufreq_init(thr, policy->cpu);
> +
> +		if (cpumask_test_cpu(policy->cpu, &thr->cpufreq.cm_ignore))
> +			goto out;
> +
> +		thr_dbg(thr, "CPU%d is used for throttling\n", policy->cpu);
> +	}
> +
> +	/*
> +	 * Can't do this check earlier, otherwise we might miss CPU policies
> +	 * that are added after setup().
> +	 */
> +	if (thr->level == 0) {
> +		list_for_each_entry(cftd, &thr->cpufreq.list, node) {
> +			if (cftd->cpu != policy->cpu)
> +				continue;
> +
> +			if (cftd->clamp_freq != 0) {
> +				thr_dbg(thr, "unthrottling CPU%d\n", cftd->cpu);
> +				cftd->clamp_freq = 0;
> +			}
> +		}
> +
> +		goto out;
> +	}
> +
> +	list_for_each_entry(cftd, &thr->cpufreq.list, node) {
> +		unsigned long clamp_freq;
> +
> +		if (cftd->cpu != policy->cpu)
> +			continue;
> +
> +		clamp_freq = thr_get_throttling_freq(&cftd->freq_table,
> +						     thr->level) / 1000;
> +		if (cftd->clamp_freq != clamp_freq) {
> +			thr_dbg(thr, "throttling CPU%d to %lu MHz\n", cftd->cpu,
> +				clamp_freq / 1000);
> +			cftd->clamp_freq = clamp_freq;
> +		}
> +
> +		if (clamp_freq < policy->max)
> +			cpufreq_verify_within_limits(policy, 0, clamp_freq);
> +	}
> +
> +out:
> +	mutex_unlock(&thr->lock);
> +
> +	return NOTIFY_DONE;
> +}
> +
> +/*
> + * Notifier called by devfreq. Can't acquire thr->lock since it might
> + * already be held by throttler_set_level(). It isn't necessary to
> + * acquire the lock for the following reasons:
> + *
> + * Only the devfreq_thrdev and thr->level are accessed in this function.
> + * The devfreq device won't go away (or change) during the execution of
> + * this function, since we are called from the devfreq core. Theoretically
> + * thr->level could change and we'd apply an outdated setting, however in
> + * this case the function would run again shortly after and apply the
> + * correct value.
> + */
> +static int thr_handle_devfreq_event(struct notifier_block *nb,
> +				    unsigned long event, void *data)
> +{
> +	struct devfreq_thrdev *dftd =
> +		container_of(nb, struct devfreq_thrdev, nb);
> +	struct throttler *thr = dftd->thr;
> +	struct devfreq_policy *policy = data;
> +	unsigned long clamp_freq;
> +
> +	if (event != DEVFREQ_ADJUST)
> +		return NOTIFY_DONE;
> +
> +	if (thr->level == 0) {
> +		if (dftd->clamp_freq != 0) {
> +			thr_dbg(thr, "unthrottling '%s'\n",
> +				dev_name(&dftd->devfreq->dev));
> +			dftd->clamp_freq = 0;
> +		}
> +
> +		return NOTIFY_DONE;
> +	}
> +
> +	clamp_freq = thr_get_throttling_freq(&dftd->freq_table, thr->level);
> +	if (clamp_freq != dftd->clamp_freq) {
> +		thr_dbg(thr, "throttling '%s' to %lu MHz\n",
> +			dev_name(&dftd->devfreq->dev), clamp_freq / 1000000);
> +		dftd->clamp_freq = clamp_freq;
> +	}
> +
> +	if (clamp_freq < policy->max)
> +		devfreq_verify_within_limits(policy, 0, clamp_freq);
> +
> +	return NOTIFY_DONE;
> +}
> +
> +static void thr_cpufreq_update_policy(struct throttler *thr)
> +{
> +	struct cpufreq_thrdev *cftd;
> +
> +	WARN_ON(!mutex_is_locked(&thr->lock));
> +
> +	list_for_each_entry(cftd, &thr->cpufreq.list, node) {
> +		struct cpufreq_policy *policy = cpufreq_cpu_get(cftd->cpu);
> +
> +		if (!policy) {
> +			thr_warn(thr, "CPU%d does have no cpufreq policy!\n",

s/does have/has/

> +				 cftd->cpu);
> +			continue;
> +		}
> +
> +		/*
> +		 * The lock isn't really needed in this function, the list
> +		 * of cpufreq devices can be extended, but no items are
> +		 * deleted during the lifetime of the throttler. Releasing
> +		 * the lock is necessary since cpufreq_update_policy() ends
> +		 * up calling thr_handle_cpufreq_event(), which needs to
> +		 * acquire the lock.
> +		 */
> +		mutex_unlock(&thr->lock);
> +		cpufreq_update_policy(cftd->cpu);
> +		mutex_lock(&thr->lock);
> +
> +		cpufreq_cpu_put(policy);
> +	}
> +}
> +
> +static int thr_handle_devfreq_added(struct device *dev,
> +				    struct class_interface *ci)
> +{
> +	struct throttler *thr = ci_to_throttler(ci);
> +
> +	mutex_lock(&thr->lock);
> +	thr_devfreq_init(dev, thr);
> +	mutex_unlock(&thr->lock);
> +
> +	return 0;
> +}
> +
> +static void thr_handle_devfreq_removed(struct device *dev,
> +				       struct class_interface *ci)
> +{
> +	struct devfreq_thrdev *dftd;
> +	struct throttler *thr = ci_to_throttler(ci);
> +
> +	mutex_lock(&thr->lock);
> +
> +	list_for_each_entry(dftd, &thr->devfreq.list, node) {
> +		if (dev == &dftd->devfreq->dev) {
> +			list_del(&dftd->node);
> +			devm_kfree(thr->dev, dftd->freq_table.freqs);
> +			devm_kfree(thr->dev, dftd);
> +			break;
> +		}
> +	}
> +
> +	mutex_unlock(&thr->lock);
> +}
> +
> +void throttler_set_level(struct throttler *thr, int level)
> +{
> +	struct devfreq_thrdev *dftd;

This driver doesn't really handle negative levels very well (it might
even read garbage memory?). You might either make the whole driver use
unsigned values (and parse with an unsigned kstrtoX helper below), or
else just reject negative values in this function.

> +
> +	if (level == thr->level)

It seems like this should be inside the lock.

> +		return;
> +
> +	mutex_lock(&thr->lock);
> +
> +	thr_dbg(thr, "throttling level: %d\n", level);
> +	thr->level = level;
> +
> +	if (!list_empty(&thr->cpufreq.list))
> +		thr_cpufreq_update_policy(thr);
> +
> +	list_for_each_entry(dftd, &thr->devfreq.list, node) {
> +		mutex_lock(&dftd->devfreq->lock);
> +		update_devfreq(dftd->devfreq);
> +		mutex_unlock(&dftd->devfreq->lock);
> +	}
> +
> +	mutex_unlock(&thr->lock);
> +}
> +EXPORT_SYMBOL_GPL(throttler_set_level);
> +
> +#ifdef CONFIG_THROTTLER_DEBUG
> +
> +static ssize_t thr_level_read(struct file *file, char __user *user_buf,
> +			      size_t count, loff_t *ppos)
> +{
> +	struct throttler *thr = file->f_inode->i_private;
> +	char buf[5];
> +	int len;
> +
> +	len = scnprintf(buf, sizeof(buf), "%d\n", thr->level);

Hold the throttler mutex around this read?

> +
> +	return simple_read_from_buffer(user_buf, count, ppos, buf, len);
> +}
> +
> +static ssize_t thr_level_write(struct file *file,
> +				 const char __user *user_buf,
> +				 size_t count, loff_t *ppos)
> +{
> +	int rc;
> +	int level;
> +	struct throttler *thr = file->f_inode->i_private;
> +
> +	rc = kstrtoint_from_user(user_buf, count, 10, &level);
> +	if (rc)
> +		return rc;
> +
> +	throttler_set_level(thr, level);
> +
> +	return count;
> +}
> +
> +static const struct file_operations level_debugfs_ops = {
> +	.owner = THIS_MODULE,
> +	.read = thr_level_read,
> +	.write = thr_level_write,
> +};
> +#endif
> +
> +struct throttler *throttler_setup(struct device *dev)
> +{
> +	struct throttler *thr;
> +	struct device_node *np = dev->of_node;
> +	struct class_interface *ci;
> +	int cpu;
> +	int err;
> +
> +	if (!np)
> +		/* should never happen */
> +		return ERR_PTR(-EINVAL);
> +
> +	thr = devm_kzalloc(dev, sizeof(*thr), GFP_KERNEL);
> +	if (!thr)
> +		return ERR_PTR(-ENOMEM);
> +
> +	mutex_init(&thr->lock);
> +	thr->dev = dev;
> +
> +	cpumask_clear(&thr->cpufreq.cm_ignore);
> +	cpumask_clear(&thr->cpufreq.cm_initialized);
> +
> +	INIT_LIST_HEAD(&thr->cpufreq.list);
> +	INIT_LIST_HEAD(&thr->devfreq.list);
> +
> +	thr->cpufreq.nb.notifier_call = thr_handle_cpufreq_event;
> +	err = cpufreq_register_notifier(&thr->cpufreq.nb,
> +					CPUFREQ_POLICY_NOTIFIER);
> +	if (err < 0) {
> +		thr_err(thr, "failed to register cpufreq notifier\n");
> +		return ERR_PTR(err);
> +	}
> +
> +	/*
> +	 * The CPU throttling configuration is parsed at runtime, when the
> +	 * cpufreq policy notifier is called for a CPU that hasn't been
> +	 * initialized yet.
> +	 *
> +	 * This is done for two reasons:
> +	 * -  when the throttler is probed the CPU might not yet have a policy
> +	 * -  CPUs that were offline at probe time might be hotplugged
> +	 *
> +	 * The notifier is called then the policy is added/set
> +	 */
> +	for_each_online_cpu(cpu) {
> +		struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> +
> +		if (!policy)
> +			continue;
> +
> +		cpufreq_update_policy(cpu);
> +		cpufreq_cpu_put(policy);
> +	}
> +
> +	/*
> +	 * devfreq devices can be added and removed at runtime, hence they
> +	 * must also be handled dynamically. The class_interface notifies us
> +	 * whenever a device is added or removed. When the interface is
> +	 * registered ci->add_dev() is called for all existing devfreq
> +	 * devices.
> +	 */
> +	ci = &thr->devfreq.class_iface;
> +	ci->class = devfreq_class;
> +	ci->add_dev = thr_handle_devfreq_added;
> +	ci->remove_dev = thr_handle_devfreq_removed;
> +
> +	err = class_interface_register(ci);
> +	if (err) {
> +		thr_err(thr, "failed to register devfreq class interface: %d\n",
> +			err);
> +		cpufreq_unregister_notifier(&thr->cpufreq.nb,
> +					    CPUFREQ_POLICY_NOTIFIER);
> +		return ERR_PTR(err);
> +	}
> +
> +#ifdef CONFIG_THROTTLER_DEBUG
> +	thr->debugfs.dir = debugfs_create_dir(dev_name(thr->dev), NULL);

Remove this dir in throttler_teardown()?

> +	if (IS_ERR(thr->debugfs.dir)) {
> +		thr_warn(thr, "failed to create debugfs directory: %ld\n",
> +			 PTR_ERR(thr->debugfs.dir));
> +		thr->debugfs.dir = NULL;
> +		goto skip_debugfs;
> +	}
> +
> +	thr->debugfs.attr_level = debugfs_create_file("level", 0644,
> +						      thr->debugfs.dir, thr,
> +						      &level_debugfs_ops);
> +	if (IS_ERR(thr->debugfs.dir)) {
> +		thr_warn(thr, "failed to create debugfs attribute: %ld\n",
> +			 PTR_ERR(thr->debugfs.attr_level));
> +		debugfs_remove(thr->debugfs.dir);
> +		thr->debugfs.dir = NULL;
> +	}
> +
> +skip_debugfs:
> +#endif
> +
> +	return thr;
> +}
> +EXPORT_SYMBOL_GPL(throttler_setup);
> +
> +void throttler_teardown(struct throttler *thr)
> +{
> +	struct devfreq_thrdev *dftd;
> +	int level;
> +
> +	mutex_lock(&thr->lock);
> +
> +	level = thr->level;
> +	thr->level = 0;
> +
> +	class_interface_unregister(&thr->devfreq.class_iface);

This can deadlock. You're holding the throttler mutex and then this
grabs the class mutex; but add/remove notifications will be holding the
class mutex while making calls that grab the throttler mutex. IOW, you
have ABBA (not the band).

Also, if there are any active devfreq devices attached...you definitely
deadlock (simple AA), since we directly call ->remove_dev() on them
here. See this locked-up task:

[ 4440.118203] [<ffffffc0002158a0>] __switch_to+0x90/0x9c
[ 4440.118221] [<ffffffc000954f40>] __schedule+0x3cc/0x860
[ 4440.118232] [<ffffffc000954b14>] schedule+0x4c/0xac
[ 4440.118243] [<ffffffc0009553f8>] schedule_preempt_disabled+0x24/0x40
[ 4440.118255] [<ffffffc000956ac8>] __mutex_lock_common+0x194/0x3b0
[ 4440.118267] [<ffffffc000956348>] __mutex_lock_slowpath+0x38/0x44
[ 4440.118278] [<ffffffc00095630c>] mutex_lock+0x6c/0x70
[ 4440.118293] [<ffffffc00062c444>] thr_handle_devfreq_removed+0x2c/0xa0
[ 4440.118307] [<ffffffc000606dbc>] class_interface_unregister+0x74/0xc4
[ 4440.118318] [<ffffffc00062c4ec>] throttler_teardown+0x34/0xac
[ 4440.118328] [<ffffffc00062cb60>] cros_ec_throttler_remove+0x30/0x40
[ 4440.118341] [<ffffffc000607a60>] platform_drv_remove+0x28/0x50
[ 4440.118355] [<ffffffc000605974>] device_release_driver_internal+0x120/0x1b0
[ 4440.118367] [<ffffffc000605a28>] device_release_driver+0x24/0x30
[ 4440.118380] [<ffffffc000604b50>] unbind_store+0x6c/0xa4
[ 4440.118392] [<ffffffc000604a4c>] drv_attr_store+0x3c/0x54
[ 4440.118409] [<ffffffc0003bd4e0>] sysfs_kf_write+0x50/0x68
[ 4440.118424] [<ffffffc00044233c>] kernfs_fop_write+0xdc/0x188
[ 4440.118436] [<ffffffc00042762c>] __vfs_write+0xfc/0x10c
[ 4440.118446] [<ffffffc000427950>] SyS_write+0xf0/0x278
[ 4440.118460] [<ffffffc000203e44>] el0_svc_naked+0x34/0x38

I got there with this:

  echo cros-ec-throttler.1.auto > /sys/bus/platform/drivers/cros-ec-throttler/unbind

> +
> +	if (level) {
> +		/* unthrottle CPUs */
> +		if (!list_empty(&thr->cpufreq.list))

You don't technically need the list_empty() check, since you do
list_for_each_entry() within thr_cpufreq_update_policy().

> +			thr_cpufreq_update_policy(thr);
> +
> +		/* unthrottle devfreq devices */
> +		list_for_each_entry(dftd, &thr->devfreq.list, node) {
> +			mutex_lock(&dftd->devfreq->lock);
> +			update_devfreq(dftd->devfreq);
> +			mutex_unlock(&dftd->devfreq->lock);
> +		}

I wonder if the 'update' step deserves its own function, since the
cpufreq/devfreq updates are repeated in throttler_set_level().

> +	}
> +
> +	cpufreq_unregister_notifier(&thr->cpufreq.nb,
> +				    CPUFREQ_POLICY_NOTIFIER);

Is there a chance of deadlock here? This is a blocking unregistration
here, and we're holding the lock which a notifier call might be holding.
I think it's actually be OK to just do the unregistration outside the
lock?

Altogether, I think your unregistration needs to be something like:

 1) set the 'level' to a "none" value that can't be overwritten (e.g.,
 set_level() rejects it), under the threshold lock
 2) update cpufreq and devfreq, so the non-limits take hold
 3) release the threshold lock
 4) unregister the devfreq class and all notifiers, outside the
 threshold lock

Or maybe you come up with some other way that avoids all the above.

Brian

> +
> +	mutex_unlock(&thr->lock);
> +}
> +EXPORT_SYMBOL_GPL(throttler_teardown);
> diff --git a/include/linux/throttler.h b/include/linux/throttler.h
> new file mode 100644
> index 000000000000..a29d99f581da
> --- /dev/null
> +++ b/include/linux/throttler.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __LINUX_THROTTLER_H__
> +#define __LINUX_THROTTLER_H__
> +
> +struct throttler;
> +
> +extern struct throttler *throttler_setup(struct device *dev);
> +extern void throttler_teardown(struct throttler *thr);
> +extern void throttler_set_level(struct throttler *thr, int level);
> +
> +#ifdef CONFIG_THROTTLER_DEBUG
> +#define thr_dbg(thr, fmt, ...) dev_info(thr->dev, fmt, ##__VA_ARGS__)
> +#else
> +#define thr_dbg(thr, fmt, ...) dev_dbg(thr->dev, fmt, ##__VA_ARGS__)
> +#endif
> +
> +#define thr_info(thr, fmt, ...) dev_info(thr->dev, fmt, ##__VA_ARGS__)
> +#define thr_warn(thr, fmt, ...) dev_warn(thr->dev, fmt, ##__VA_ARGS__)
> +#define thr_err(thr, fmt, ...) dev_warn(thr->dev, fmt, ##__VA_ARGS__)
> +
> +#endif /* __LINUX_THROTTLER_H__ */
> -- 
> 2.18.0.rc1.242.g61856ae69a-goog
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 12/12] mfd: cros_ec: Add throttler sub-device
  2018-06-14 19:47 ` [PATCH v3 12/12] mfd: cros_ec: Add throttler sub-device Matthias Kaehlcke
@ 2018-06-18 23:21   ` Brian Norris
  2018-06-19  8:41     ` Enric Balletbo Serra
  2018-06-19 17:43     ` Matthias Kaehlcke
  0 siblings, 2 replies; 23+ messages in thread
From: Brian Norris @ 2018-06-18 23:21 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Arnd Bergmann,
	Greg Kroah-Hartman, Rob Herring, Mark Rutland, linux-pm,
	devicetree, linux-kernel, Douglas Anderson,
	Enric Balletbo i Serra, Rafael J . Wysocki, Viresh Kumar,
	Lee Jones

Hi,

On Thu, Jun 14, 2018 at 12:47:12PM -0700, Matthias Kaehlcke wrote:
> Instantiate the CrOS EC throttler if it is enabled in the kernel
> configuration.
> 
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> ---
> Changes in v3:
> - patch added to series
> 
>  drivers/mfd/cros_ec.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/drivers/mfd/cros_ec.c b/drivers/mfd/cros_ec.c
> index 36156a41499c..5f52b6e2c21a 100644
> --- a/drivers/mfd/cros_ec.c
> +++ b/drivers/mfd/cros_ec.c
> @@ -91,6 +91,10 @@ static int cros_ec_sleep_event(struct cros_ec_device *ec_dev, u8 sleep_event)
>  	return cros_ec_cmd_xfer(ec_dev, &buf.msg);
>  }
>  
> +static const struct mfd_cell ec_throttler_cell = {
> +	.name = "cros-ec-throttler",
> +};
> +
>  int cros_ec_register(struct cros_ec_device *ec_dev)
>  {
>  	struct device *dev = ec_dev->dev;
> @@ -153,6 +157,18 @@ int cros_ec_register(struct cros_ec_device *ec_dev)
>  		}
>  	}
>  
> +	if (IS_ENABLED(CONFIG_CROS_EC_THROTTLER)) {
> +		err = mfd_add_devices(ec_dev->dev, PLATFORM_DEVID_AUTO,
> +				      &ec_throttler_cell, 1, NULL, ec_dev->irq,
> +				      NULL);

Most of this similar sub-device registration seems to have moved to
cros_ec_dev.c, in ec_device_probe(). Should this do the same?

And on a similar note: is there a way to determine EC support for this
feature (e.g., EC_FEATURE_*)? Or do we just have to listen for
appropriate throttling events that may never come?

Also, is this very useful on non-DT systems? Does this fail gracefully?

Brian

> +		if (err) {
> +			dev_err(dev,
> +				"Failed to register throttler subdevice %d\n",
> +				err);
> +			return err;
> +		}
> +	}
> +
>  	if (IS_ENABLED(CONFIG_OF) && dev->of_node) {
>  		err = devm_of_platform_populate(dev);
>  		if (err) {
> -- 
> 2.18.0.rc1.242.g61856ae69a-goog
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 11/12] misc: throttler: Add Chrome OS EC throttler
  2018-06-14 19:47 ` [PATCH v3 11/12] misc: throttler: Add Chrome OS EC throttler Matthias Kaehlcke
@ 2018-06-18 23:34   ` Brian Norris
  0 siblings, 0 replies; 23+ messages in thread
From: Brian Norris @ 2018-06-18 23:34 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Arnd Bergmann,
	Greg Kroah-Hartman, Rob Herring, Mark Rutland, linux-pm,
	devicetree, linux-kernel, Douglas Anderson,
	Enric Balletbo i Serra, Rafael J . Wysocki, Viresh Kumar,
	Lee Jones

Hi,

On Thu, Jun 14, 2018 at 12:47:11PM -0700, Matthias Kaehlcke wrote:
> The driver subscribes to throttling events from the Chrome OS
> embedded controller and enables/disables system throttling based
> on these events.
> 
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> Reviewed-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>

One suggestion, and otherwise:

Reviewed-by: Brian Norris <briannorris@chromium.org>

> ---
> Changes in v3:
> - change module license to GPL v2 as in the SPDX identifier
> - don't instantiate the throttler through the DT (instantiation
>   by CrOS EC MFD in a separate patch)
> 
> Changes in v2:
> - added SPDX line instead of license boiler-plate
> - use macro to avoid splitting line
> - changed variable name for throttler from 'cte' to 'ce_thr'
> - formatting fixes
> - Kconfig: removed odd dashes around 'help'
> - added 'Reviewed-by' tag
> 
>  drivers/misc/throttler/Kconfig             |  10 ++
>  drivers/misc/throttler/Makefile            |   1 +
>  drivers/misc/throttler/cros_ec_throttler.c | 109 +++++++++++++++++++++
>  3 files changed, 120 insertions(+)
>  create mode 100644 drivers/misc/throttler/cros_ec_throttler.c
> 
> diff --git a/drivers/misc/throttler/Kconfig b/drivers/misc/throttler/Kconfig
> index 8b2e63b2ef48..ef984c96f67b 100644
> --- a/drivers/misc/throttler/Kconfig
> +++ b/drivers/misc/throttler/Kconfig
> @@ -20,4 +20,14 @@ menuconfig THROTTLER_DEBUG
>  
>  	  Choose N unless you want to debug throttler drivers.
>  
> +config CROS_EC_THROTTLER
> +	tristate "Throttler event monitor for the Chrome OS Embedded Controller"
> +	depends on MFD_CROS_EC
> +	help
> +	  This driver adds support to throttle the system in reaction to
> +	  Chrome OS EC events.
> +
> +	  To compile this driver as a module, choose M here: the
> +	  module will be called cros_ec_throttler.
> +
>  endif # THROTTLER
> diff --git a/drivers/misc/throttler/Makefile b/drivers/misc/throttler/Makefile
> index c8d920cee315..d9b2a77dabc9 100644
> --- a/drivers/misc/throttler/Makefile
> +++ b/drivers/misc/throttler/Makefile
> @@ -1 +1,2 @@
>  obj-$(CONFIG_THROTTLER)		+= core.o
> +obj-$(CONFIG_CROS_EC_THROTTLER)	+= cros_ec_throttler.o
> diff --git a/drivers/misc/throttler/cros_ec_throttler.c b/drivers/misc/throttler/cros_ec_throttler.c
> new file mode 100644
> index 000000000000..f1e9fc0a6284
> --- /dev/null
> +++ b/drivers/misc/throttler/cros_ec_throttler.c
> @@ -0,0 +1,109 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Driver for throttling triggered by events from the Chrome OS Embedded
> + * Controller.
> + *
> + * Copyright (C) 2018 Google, Inc.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/mfd/cros_ec.h>
> +#include <linux/module.h>
> +#include <linux/notifier.h>
> +#include <linux/of.h>
> +#include <linux/of_platform.h>
> +#include <linux/platform_device.h>
> +#include <linux/throttler.h>
> +
> +#define nb_to_ce_thr(nb) container_of(nb, struct cros_ec_throttler, nb)
> +
> +struct cros_ec_throttler {
> +	struct cros_ec_device *ec;
> +	struct throttler *throttler;
> +	struct notifier_block nb;
> +};
> +
> +static int cros_ec_throttler_event(struct notifier_block *nb,
> +	unsigned long queued_during_suspend, void *_notify)
> +{
> +	struct cros_ec_throttler *ce_thr = nb_to_ce_thr(nb);
> +	u32 host_event;
> +
> +	host_event = cros_ec_get_host_event(ce_thr->ec);
> +	if (host_event & EC_HOST_EVENT_MASK(EC_HOST_EVENT_THROTTLE_START)) {
> +		throttler_set_level(ce_thr->throttler, 1);
> +
> +		return NOTIFY_OK;
> +	} else if (host_event &
> +		   EC_HOST_EVENT_MASK(EC_HOST_EVENT_THROTTLE_STOP)) {
> +		throttler_set_level(ce_thr->throttler, 0);
> +
> +		return NOTIFY_OK;
> +	}
> +
> +	return NOTIFY_DONE;
> +}
> +
> +static int cros_ec_throttler_probe(struct platform_device *pdev)
> +{
> +	struct cros_ec_throttler *ce_thr;
> +	struct device *dev = &pdev->dev;
> +	int ret;
> +
> +	ce_thr = devm_kzalloc(dev, sizeof(*ce_thr), GFP_KERNEL);
> +	if (!ce_thr)
> +		return -ENOMEM;
> +
> +	ce_thr->ec = dev_get_drvdata(pdev->dev.parent);
> +
> +	/*
> +	 * The core code uses the DT node of the throttler to identify its
> +	 * throttling devices and rates. The CrOS EC throttler is a sub-device
> +	 * of the CrOS EC MFD device and doesn't have its own device node. Use
> +	 * the node of the MFD device instead.
> +	 */
> +	dev->of_node = dev->parent->of_node;

It might be better to get the appropriate 'parent' (and of_node) pointer
via the 'cros_ec_device' you retrieve above, since the parent might be
different depending on my suggestions in patch 12...I think.

Brian

> +
> +	ce_thr->throttler = throttler_setup(dev);
> +	if (IS_ERR(ce_thr->throttler))
> +		return PTR_ERR(ce_thr->throttler);
> +
> +	dev_set_drvdata(dev, ce_thr);
> +
> +	ce_thr->nb.notifier_call = cros_ec_throttler_event;
> +	ret = blocking_notifier_chain_register(&ce_thr->ec->event_notifier,
> +					       &ce_thr->nb);
> +	if (ret < 0) {
> +		dev_err(dev, "failed to register notifier\n");
> +		throttler_teardown(ce_thr->throttler);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static int cros_ec_throttler_remove(struct platform_device *pdev)
> +{
> +	struct cros_ec_throttler *ce_thr = platform_get_drvdata(pdev);
> +
> +	blocking_notifier_chain_unregister(&ce_thr->ec->event_notifier,
> +					   &ce_thr->nb);
> +
> +	throttler_teardown(ce_thr->throttler);
> +
> +	return 0;
> +}
> +
> +static struct platform_driver cros_ec_throttler_driver = {
> +	.driver = {
> +		.name = "cros-ec-throttler",
> +	},
> +	.probe		= cros_ec_throttler_probe,
> +	.remove		= cros_ec_throttler_remove,
> +};
> +
> +module_platform_driver(cros_ec_throttler_driver);
> +
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR("Matthias Kaehlcke <mka@chromium.org>");
> +MODULE_DESCRIPTION("Chrome OS EC Throttler");
> -- 
> 2.18.0.rc1.242.g61856ae69a-goog
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 10/12] misc: throttler: Add core support for non-thermal throttling
  2018-06-18 23:03   ` Brian Norris
@ 2018-06-18 23:59     ` Matthias Kaehlcke
  0 siblings, 0 replies; 23+ messages in thread
From: Matthias Kaehlcke @ 2018-06-18 23:59 UTC (permalink / raw)
  To: Brian Norris
  Cc: MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Arnd Bergmann,
	Greg Kroah-Hartman, Rob Herring, Mark Rutland, linux-pm,
	devicetree, linux-kernel, Douglas Anderson,
	Enric Balletbo i Serra, Rafael J . Wysocki, Viresh Kumar,
	Lee Jones

On Mon, Jun 18, 2018 at 04:03:25PM -0700, Brian Norris wrote:
> Hi Matthias,
> 
> On Thu, Jun 14, 2018 at 12:47:10PM -0700, Matthias Kaehlcke wrote:
> > The purpose of the throttler is to provide support for non-thermal
> > throttling. Throttling is triggered by external event, e.g. the
> > detection of a high battery discharge current, close to the OCP limit
> > of the battery. The throttler is only in charge of the throttling, not
> > the monitoring, which is done by another (possibly platform specific)
> > driver.
> > 
> > Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> 
> I have a few more comments.

Thanks for the review and testing!

> > diff --git a/drivers/misc/throttler/core.c b/drivers/misc/throttler/core.c
> > new file mode 100644
> > index 000000000000..52350d846654
> > --- /dev/null
> > +++ b/drivers/misc/throttler/core.c
> > +static void thr_devfreq_init(struct device *dev, void *data)
> > +{
> > +	struct throttler *thr = data;
> > +	struct thr_freq_table ft;
> > +	struct devfreq_thrdev *dftd;
> > +	int err;
> > +
> > +	WARN_ON(!mutex_is_locked(&thr->lock));
> > +
> > +	err = thr_init_freq_table(thr, dev->parent, &ft);
> > +	if (err) {
> > +		if (err == -ENODEV)
> > +			return;
> > +
> > +		thr_err(thr, "failed to init frequency table of device %s: %d",
> > +			dev_name(dev), err);
> > +		return;
> > +	}
> > +
> > +	dftd = devm_kzalloc(thr->dev, sizeof(*dftd), GFP_KERNEL);
> > +	if (!dftd) {
> > +		thr_err(thr, "%s: out of memory\n", __func__);
> 
> I think it's considered bad form to roll your own OOM messages. It's
> assumed the memory manager will complain loudly enough for you already.

Ok, I'll remove the OOM logs.

> > +static void thr_cpufreq_update_policy(struct throttler *thr)
> > +{
> > +	struct cpufreq_thrdev *cftd;
> > +
> > +	WARN_ON(!mutex_is_locked(&thr->lock));
> > +
> > +	list_for_each_entry(cftd, &thr->cpufreq.list, node) {
> > +		struct cpufreq_policy *policy = cpufreq_cpu_get(cftd->cpu);
> > +
> > +		if (!policy) {
> > +			thr_warn(thr, "CPU%d does have no cpufreq policy!\n",
> 
> s/does have/has/

Ack

> > +void throttler_set_level(struct throttler *thr, int level)
> > +{
> > +	struct devfreq_thrdev *dftd;
> 
> This driver doesn't really handle negative levels very well (it might
> even read garbage memory?). You might either make the whole driver use
> unsigned values (and parse with an unsigned kstrtoX helper below), or
> else just reject negative values in this function.

Negative values for level make no sense in this driver, so using
unsigned values seems a reasonable solutions.

> > +
> > +	if (level == thr->level)
> 
> It seems like this should be inside the lock.

Can do, but I'm not sure it is strictly necessary. ->level is only
changed in calls originated by the throttler itself (this function and
throttler_teardown()), it would require a really bad 'monitor' driver
to have an actual race.

Mainly I wanted to avoid the seemingly unnecessary mutex_unlock() in
the return path ;-)

> > +static ssize_t thr_level_read(struct file *file, char __user *user_buf,
> > +			      size_t count, loff_t *ppos)
> > +{
> > +	struct throttler *thr = file->f_inode->i_private;
> > +	char buf[5];
> > +	int len;
> > +
> > +	len = scnprintf(buf, sizeof(buf), "%d\n", thr->level);
> 
> Hold the throttler mutex around this read?

Not necessary IMO. Reading the integer value is an atomic operation,
holding the mutex doesn't really change the fact that the value could
change right after userspace read it.

> > +struct throttler *throttler_setup(struct device *dev)
> > +{
> > +	struct throttler *thr;
> > +	struct device_node *np = dev->of_node;
> > +	struct class_interface *ci;
> > +	int cpu;
> > +	int err;
> > +
> > +	if (!np)
> > +		/* should never happen */
> > +		return ERR_PTR(-EINVAL);
> > +
> > +	thr = devm_kzalloc(dev, sizeof(*thr), GFP_KERNEL);
> > +	if (!thr)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	mutex_init(&thr->lock);
> > +	thr->dev = dev;
> > +
> > +	cpumask_clear(&thr->cpufreq.cm_ignore);
> > +	cpumask_clear(&thr->cpufreq.cm_initialized);
> > +
> > +	INIT_LIST_HEAD(&thr->cpufreq.list);
> > +	INIT_LIST_HEAD(&thr->devfreq.list);
> > +
> > +	thr->cpufreq.nb.notifier_call = thr_handle_cpufreq_event;
> > +	err = cpufreq_register_notifier(&thr->cpufreq.nb,
> > +					CPUFREQ_POLICY_NOTIFIER);
> > +	if (err < 0) {
> > +		thr_err(thr, "failed to register cpufreq notifier\n");
> > +		return ERR_PTR(err);
> > +	}
> > +
> > +	/*
> > +	 * The CPU throttling configuration is parsed at runtime, when the
> > +	 * cpufreq policy notifier is called for a CPU that hasn't been
> > +	 * initialized yet.
> > +	 *
> > +	 * This is done for two reasons:
> > +	 * -  when the throttler is probed the CPU might not yet have a policy
> > +	 * -  CPUs that were offline at probe time might be hotplugged
> > +	 *
> > +	 * The notifier is called then the policy is added/set
> > +	 */
> > +	for_each_online_cpu(cpu) {
> > +		struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> > +
> > +		if (!policy)
> > +			continue;
> > +
> > +		cpufreq_update_policy(cpu);
> > +		cpufreq_cpu_put(policy);
> > +	}
> > +
> > +	/*
> > +	 * devfreq devices can be added and removed at runtime, hence they
> > +	 * must also be handled dynamically. The class_interface notifies us
> > +	 * whenever a device is added or removed. When the interface is
> > +	 * registered ci->add_dev() is called for all existing devfreq
> > +	 * devices.
> > +	 */
> > +	ci = &thr->devfreq.class_iface;
> > +	ci->class = devfreq_class;
> > +	ci->add_dev = thr_handle_devfreq_added;
> > +	ci->remove_dev = thr_handle_devfreq_removed;
> > +
> > +	err = class_interface_register(ci);
> > +	if (err) {
> > +		thr_err(thr, "failed to register devfreq class interface: %d\n",
> > +			err);
> > +		cpufreq_unregister_notifier(&thr->cpufreq.nb,
> > +					    CPUFREQ_POLICY_NOTIFIER);
> > +		return ERR_PTR(err);
> > +	}
> > +
> > +#ifdef CONFIG_THROTTLER_DEBUG
> > +	thr->debugfs.dir = debugfs_create_dir(dev_name(thr->dev), NULL);
> 
> Remove this dir in throttler_teardown()?

Oops, I thought I did that already ...

> > +void throttler_teardown(struct throttler *thr)
> > +{
> > +	struct devfreq_thrdev *dftd;
> > +	int level;
> > +
> > +	mutex_lock(&thr->lock);
> > +
> > +	level = thr->level;
> > +	thr->level = 0;
> > +
> > +	class_interface_unregister(&thr->devfreq.class_iface);
> 
> This can deadlock. You're holding the throttler mutex and then this
> grabs the class mutex; but add/remove notifications will be holding the
> class mutex while making calls that grab the throttler mutex. IOW, you
> have ABBA (not the band).
> 
> Also, if there are any active devfreq devices attached...you definitely
> deadlock (simple AA), since we directly call ->remove_dev() on them
> here. See this locked-up task:
> 
> [ 4440.118203] [<ffffffc0002158a0>] __switch_to+0x90/0x9c
> [ 4440.118221] [<ffffffc000954f40>] __schedule+0x3cc/0x860
> [ 4440.118232] [<ffffffc000954b14>] schedule+0x4c/0xac
> [ 4440.118243] [<ffffffc0009553f8>] schedule_preempt_disabled+0x24/0x40
> [ 4440.118255] [<ffffffc000956ac8>] __mutex_lock_common+0x194/0x3b0
> [ 4440.118267] [<ffffffc000956348>] __mutex_lock_slowpath+0x38/0x44
> [ 4440.118278] [<ffffffc00095630c>] mutex_lock+0x6c/0x70
> [ 4440.118293] [<ffffffc00062c444>] thr_handle_devfreq_removed+0x2c/0xa0
> [ 4440.118307] [<ffffffc000606dbc>] class_interface_unregister+0x74/0xc4
> [ 4440.118318] [<ffffffc00062c4ec>] throttler_teardown+0x34/0xac
> [ 4440.118328] [<ffffffc00062cb60>] cros_ec_throttler_remove+0x30/0x40
> [ 4440.118341] [<ffffffc000607a60>] platform_drv_remove+0x28/0x50
> [ 4440.118355] [<ffffffc000605974>] device_release_driver_internal+0x120/0x1b0
> [ 4440.118367] [<ffffffc000605a28>] device_release_driver+0x24/0x30
> [ 4440.118380] [<ffffffc000604b50>] unbind_store+0x6c/0xa4
> [ 4440.118392] [<ffffffc000604a4c>] drv_attr_store+0x3c/0x54
> [ 4440.118409] [<ffffffc0003bd4e0>] sysfs_kf_write+0x50/0x68
> [ 4440.118424] [<ffffffc00044233c>] kernfs_fop_write+0xdc/0x188
> [ 4440.118436] [<ffffffc00042762c>] __vfs_write+0xfc/0x10c
> [ 4440.118446] [<ffffffc000427950>] SyS_write+0xf0/0x278
> [ 4440.118460] [<ffffffc000203e44>] el0_svc_naked+0x34/0x38
> 
> I got there with this:
> 
>   echo cros-ec-throttler.1.auto > /sys/bus/platform/drivers/cros-ec-throttler/unbind

Thanks for testing and providing detailed information, I'll revisit
the locking.

> > +	if (level) {
> > +		/* unthrottle CPUs */
> > +		if (!list_empty(&thr->cpufreq.list))
> 
> You don't technically need the list_empty() check, since you do
> list_for_each_entry() within thr_cpufreq_update_policy().

True, but it also does no/very little harm and the
list_for_each_entry() is hidden in thr_cpufreq_update_policy(). I
think the small overhead of the extra check in a function that is
executed at most once per throttler is justified by the improved
readability (no need to confirm that thr_cpufreq_update_policy() does
nothing if cpufreq is not involved in throttling_

> > +		/* unthrottle devfreq devices */
> > +		list_for_each_entry(dftd, &thr->devfreq.list, node) {
> > +			mutex_lock(&dftd->devfreq->lock);
> > +			update_devfreq(dftd->devfreq);
> > +			mutex_unlock(&dftd->devfreq->lock);
> > +		}
> 
> I wonder if the 'update' step deserves its own function, since the
> cpufreq/devfreq updates are repeated in throttler_set_level().

Sounds good.

> > +	}
> > +
> > +	cpufreq_unregister_notifier(&thr->cpufreq.nb,
> > +				    CPUFREQ_POLICY_NOTIFIER);
> 
> Is there a chance of deadlock here? This is a blocking unregistration
> here, and we're holding the lock which a notifier call might be holding.
> I think it's actually be OK to just do the unregistration outside the
> lock?
> 
> Altogether, I think your unregistration needs to be something like:
> 
>  1) set the 'level' to a "none" value that can't be overwritten (e.g.,
>  set_level() rejects it), under the threshold lock
>  2) update cpufreq and devfreq, so the non-limits take hold
>  3) release the threshold lock
>  4) unregister the devfreq class and all notifiers, outside the
>  threshold lock
> 
> Or maybe you come up with some other way that avoids all the above.

Thanks for your analysis and suggestions. I'll revisit the various
locking scenarios.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 12/12] mfd: cros_ec: Add throttler sub-device
  2018-06-18 23:21   ` Brian Norris
@ 2018-06-19  8:41     ` Enric Balletbo Serra
  2018-06-19 17:55       ` Matthias Kaehlcke
  2018-06-19 17:43     ` Matthias Kaehlcke
  1 sibling, 1 reply; 23+ messages in thread
From: Enric Balletbo Serra @ 2018-06-19  8:41 UTC (permalink / raw)
  To: Brian Norris
  Cc: Matthias Kaehlcke, MyungJoo Ham, Kyungmin Park, Chanwoo Choi,
	Arnd Bergmann, Greg Kroah-Hartman, Rob Herring, Mark Rutland,
	Linux PM list, devicetree, linux-kernel, Doug Anderson,
	Enric Balletbo i Serra, Rafael J. Wysocki, viresh.kumar,
	Lee Jones

Hi Matthias,

I am also interested on the answer of Brian comments :). One small comment.

Missatge de Brian Norris <briannorris@chromium.org> del dia dt., 19 de
juny 2018 a les 1:22:
>
> Hi,
>
> On Thu, Jun 14, 2018 at 12:47:12PM -0700, Matthias Kaehlcke wrote:
> > Instantiate the CrOS EC throttler if it is enabled in the kernel
> > configuration.
> >
> > Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> > ---
> > Changes in v3:
> > - patch added to series
> >
> >  drivers/mfd/cros_ec.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/drivers/mfd/cros_ec.c b/drivers/mfd/cros_ec.c
> > index 36156a41499c..5f52b6e2c21a 100644
> > --- a/drivers/mfd/cros_ec.c
> > +++ b/drivers/mfd/cros_ec.c
> > @@ -91,6 +91,10 @@ static int cros_ec_sleep_event(struct cros_ec_device *ec_dev, u8 sleep_event)
> >       return cros_ec_cmd_xfer(ec_dev, &buf.msg);
> >  }
> >
> > +static const struct mfd_cell ec_throttler_cell = {
> > +     { .name = "cros-ec-throttler" }
> > +};
> > +

As Brian said I think that this should go in cros_ec_dev?

Even when only there is one cell we tend to use the array format (see
i.e the cros-ec-rtc and the others in cros_ec_dev).

+static const struct mfd_cell ec_throttler_cells[] = {
+     { .name = "cros-ec-throttler" }
+};

> >  int cros_ec_register(struct cros_ec_device *ec_dev)
> >  {
> >       struct device *dev = ec_dev->dev;
> > @@ -153,6 +157,18 @@ int cros_ec_register(struct cros_ec_device *ec_dev)
> >               }
> >       }
> >
> > +     if (IS_ENABLED(CONFIG_CROS_EC_THROTTLER)) {
> > +             err = mfd_add_devices(ec_dev->dev, PLATFORM_DEVID_AUTO,
> > +                                   &ec_throttler_cell, 1, NULL, ec_dev->irq,

ARRAY_SIZE(ec_throttler_cells)

> > +                                   NULL);
>
> Most of this similar sub-device registration seems to have moved to
> cros_ec_dev.c, in ec_device_probe(). Should this do the same?
>
> And on a similar note: is there a way to determine EC support for this
> feature (e.g., EC_FEATURE_*)? Or do we just have to listen for
> appropriate throttling events that may never come?
>
> Also, is this very useful on non-DT systems? Does this fail gracefully?
>
> Brian
>
> > +             if (err) {
> > +                     dev_err(dev,
> > +                             "Failed to register throttler subdevice %d\n",
> > +                             err);
> > +                     return err;
> > +             }
> > +     }
> > +
> >       if (IS_ENABLED(CONFIG_OF) && dev->of_node) {
> >               err = devm_of_platform_populate(dev);
> >               if (err) {
> > --
> > 2.18.0.rc1.242.g61856ae69a-goog
> >

Cheers,
 Enric

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 12/12] mfd: cros_ec: Add throttler sub-device
  2018-06-18 23:21   ` Brian Norris
  2018-06-19  8:41     ` Enric Balletbo Serra
@ 2018-06-19 17:43     ` Matthias Kaehlcke
  1 sibling, 0 replies; 23+ messages in thread
From: Matthias Kaehlcke @ 2018-06-19 17:43 UTC (permalink / raw)
  To: Brian Norris
  Cc: MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Arnd Bergmann,
	Greg Kroah-Hartman, Rob Herring, Mark Rutland, linux-pm,
	devicetree, linux-kernel, Douglas Anderson,
	Enric Balletbo i Serra, Rafael J . Wysocki, Viresh Kumar,
	Lee Jones, Benson Leung, Olof Johansson

On Mon, Jun 18, 2018 at 04:21:10PM -0700, Brian Norris wrote:
> Hi,
> 
> On Thu, Jun 14, 2018 at 12:47:12PM -0700, Matthias Kaehlcke wrote:
> > Instantiate the CrOS EC throttler if it is enabled in the kernel
> > configuration.
> > 
> > Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> > ---
> > Changes in v3:
> > - patch added to series
> > 
> >  drivers/mfd/cros_ec.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/drivers/mfd/cros_ec.c b/drivers/mfd/cros_ec.c
> > index 36156a41499c..5f52b6e2c21a 100644
> > --- a/drivers/mfd/cros_ec.c
> > +++ b/drivers/mfd/cros_ec.c
> > @@ -91,6 +91,10 @@ static int cros_ec_sleep_event(struct cros_ec_device *ec_dev, u8 sleep_event)
> >  	return cros_ec_cmd_xfer(ec_dev, &buf.msg);
> >  }
> >  
> > +static const struct mfd_cell ec_throttler_cell = {
> > +	.name = "cros-ec-throttler",
> > +};
> > +
> >  int cros_ec_register(struct cros_ec_device *ec_dev)
> >  {
> >  	struct device *dev = ec_dev->dev;
> > @@ -153,6 +157,18 @@ int cros_ec_register(struct cros_ec_device *ec_dev)
> >  		}
> >  	}
> >  
> > +	if (IS_ENABLED(CONFIG_CROS_EC_THROTTLER)) {
> > +		err = mfd_add_devices(ec_dev->dev, PLATFORM_DEVID_AUTO,
> > +				      &ec_throttler_cell, 1, NULL, ec_dev->irq,
> > +				      NULL);
> 
> Most of this similar sub-device registration seems to have moved to
> cros_ec_dev.c, in ec_device_probe(). Should this do the same?

Not sure about this, the only device that is registered in
ec_device_probe() are the sensors.

I cc-ed Benson and Olof, the maintainers of CrOS platform stuff,
maybe they can help to clarify what the preference is and why.

> And on a similar note: is there a way to determine EC support for this
> feature (e.g., EC_FEATURE_*)? Or do we just have to listen for
> appropriate throttling events that may never come?

To my knowledge there is no such way as of now. There is no
EC_FEATURE_* for throttling support
(https://chromium.googlesource.com/chromiumos/platform/ec/+/master/include/ec_commands.h#1174).
though we could probably add one.

> Also, is this very useful on non-DT systems? Does this fail gracefully?

It is not useful on non-DT systems, however the throttler core has a
dependency on CONFIG_OF, so the device wouldn't be added.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 12/12] mfd: cros_ec: Add throttler sub-device
  2018-06-19  8:41     ` Enric Balletbo Serra
@ 2018-06-19 17:55       ` Matthias Kaehlcke
  0 siblings, 0 replies; 23+ messages in thread
From: Matthias Kaehlcke @ 2018-06-19 17:55 UTC (permalink / raw)
  To: Enric Balletbo Serra
  Cc: Brian Norris, MyungJoo Ham, Kyungmin Park, Chanwoo Choi,
	Arnd Bergmann, Greg Kroah-Hartman, Rob Herring, Mark Rutland,
	Linux PM list, devicetree, linux-kernel, Doug Anderson,
	Enric Balletbo i Serra, Rafael J. Wysocki, viresh.kumar,
	Lee Jones

Hi Enric,

On Tue, Jun 19, 2018 at 10:41:01AM +0200, Enric Balletbo Serra wrote:
> Hi Matthias,
> 
> I am also interested on the answer of Brian comments :). One small comment.
> 
> Missatge de Brian Norris <briannorris@chromium.org> del dia dt., 19 de
> juny 2018 a les 1:22:
> >
> > Hi,
> >
> > On Thu, Jun 14, 2018 at 12:47:12PM -0700, Matthias Kaehlcke wrote:
> > > Instantiate the CrOS EC throttler if it is enabled in the kernel
> > > configuration.
> > >
> > > Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> > > ---
> > > Changes in v3:
> > > - patch added to series
> > >
> > >  drivers/mfd/cros_ec.c | 16 ++++++++++++++++
> > >  1 file changed, 16 insertions(+)
> > >
> > > diff --git a/drivers/mfd/cros_ec.c b/drivers/mfd/cros_ec.c
> > > index 36156a41499c..5f52b6e2c21a 100644
> > > --- a/drivers/mfd/cros_ec.c
> > > +++ b/drivers/mfd/cros_ec.c
> > > @@ -91,6 +91,10 @@ static int cros_ec_sleep_event(struct cros_ec_device *ec_dev, u8 sleep_event)
> > >       return cros_ec_cmd_xfer(ec_dev, &buf.msg);
> > >  }
> > >
> > > +static const struct mfd_cell ec_throttler_cell = {
> > > +     { .name = "cros-ec-throttler" }
> > > +};
> > > +
> 
> As Brian said I think that this should go in cros_ec_dev?

Not sure about this, see my reply to Brian.

> Even when only there is one cell we tend to use the array format (see
> i.e the cros-ec-rtc and the others in cros_ec_dev).
> 
> +static const struct mfd_cell ec_throttler_cells[] = {
> +     { .name = "cros-ec-throttler" }
> +};

Ok, will change.

> > >  int cros_ec_register(struct cros_ec_device *ec_dev)
> > >  {
> > >       struct device *dev = ec_dev->dev;
> > > @@ -153,6 +157,18 @@ int cros_ec_register(struct cros_ec_device *ec_dev)
> > >               }
> > >       }
> > >
> > > +     if (IS_ENABLED(CONFIG_CROS_EC_THROTTLER)) {
> > > +             err = mfd_add_devices(ec_dev->dev, PLATFORM_DEVID_AUTO,
> > > +                                   &ec_throttler_cell, 1, NULL, ec_dev->irq,
> 
> ARRAY_SIZE(ec_throttler_cells)

Ack

Thanks

Matthias

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2018-06-19 17:55 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-14 19:47 [PATCH v3 00/12] Add throttler driver for non-thermal throttling Matthias Kaehlcke
2018-06-14 19:47 ` [PATCH v3 01/12] PM / devfreq: Init user limits from OPP limits, not viceversa Matthias Kaehlcke
2018-06-15 21:06   ` Brian Norris
2018-06-15 21:25     ` Matthias Kaehlcke
2018-06-14 19:47 ` [PATCH v3 02/12] PM / devfreq: Fix handling of min/max_freq == 0 Matthias Kaehlcke
2018-06-18 20:12   ` Brian Norris
2018-06-14 19:47 ` [PATCH v3 03/12] PM / devfreq: Don't adjust to user limits in governors Matthias Kaehlcke
2018-06-14 19:47 ` [PATCH v3 04/12] PM / devfreq: Add struct devfreq_policy Matthias Kaehlcke
2018-06-14 19:47 ` [PATCH v3 05/12] PM / devfreg: Add support for policy notifiers Matthias Kaehlcke
2018-06-14 19:47 ` [PATCH v3 06/12] PM / devfreq: Make update_devfreq() public Matthias Kaehlcke
2018-06-14 19:47 ` [PATCH v3 07/12] PM / devfreq: export devfreq_class Matthias Kaehlcke
2018-06-14 19:47 ` [PATCH v3 08/12] cpufreq: Add stub for cpufreq_update_policy() Matthias Kaehlcke
2018-06-14 19:47 ` [PATCH v3 09/12] dt-bindings: PM / OPP: add opp-throttlers property Matthias Kaehlcke
2018-06-14 19:47 ` [PATCH v3 10/12] misc: throttler: Add core support for non-thermal throttling Matthias Kaehlcke
2018-06-18 23:03   ` Brian Norris
2018-06-18 23:59     ` Matthias Kaehlcke
2018-06-14 19:47 ` [PATCH v3 11/12] misc: throttler: Add Chrome OS EC throttler Matthias Kaehlcke
2018-06-18 23:34   ` Brian Norris
2018-06-14 19:47 ` [PATCH v3 12/12] mfd: cros_ec: Add throttler sub-device Matthias Kaehlcke
2018-06-18 23:21   ` Brian Norris
2018-06-19  8:41     ` Enric Balletbo Serra
2018-06-19 17:55       ` Matthias Kaehlcke
2018-06-19 17:43     ` Matthias Kaehlcke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).