* [PATCH V6 0/7] cpufreq: Use QoS layer to manage freq-constraints
@ 2019-07-04  7:36 Viresh Kumar
  2019-07-04  7:36 ` [PATCH V6 1/7] PM / QOS: Pass request type to dev_pm_qos_{add|remove}_notifier() Viresh Kumar
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Viresh Kumar @ 2019-07-04  7:36 UTC (permalink / raw)
  To: Rafael Wysocki, Daniel Lezcano, Kevin Hilman, Len Brown,
	Len Brown, Pavel Machek, Srinivas Pandruvada, Ulf Hansson
  Cc: Viresh Kumar, linux-pm, Vincent Guittot, linux-kernel,
	Matthias Kaehlcke, Rafael J. Wysocki

Hello,

Sorry about yet another version after another round of embarrassment over
compilation issues. The same patches went through pm/linux-next and
linux-next/master earlier, and I am not sure why the current issues never
showed up before. Anyway, they were stupid coding mistakes :(

I have pushed these patches again for the build bot to test, which should
finish before they get merged into the PM tree.

This patchset manages CPU frequency constraints using the PM QoS
framework. It only covers the basics for now and moves the userspace
constraints over to the QoS infrastructure.

Todo:
- Migrate all users to the QoS framework and get rid of the
  cpufreq-specific notifiers.
- Make PM QoS learn about the relation of CPUs in a policy, so a single
  list of constraints is managed for all of them instead of per-cpu
  constraints.

V5->V6:
- Fixed compilation issues with !CONFIG_PM case on riscv architecture.

V4->V5:
- A new patch added, 6/7.

V3->V4:
- A few commit logs updated as suggested during reviews.
- Separate commit (2/6) to create the resume-latency specific routines.
- Reused the earlier work ("update") for the notifiers as well.
- Kept the Reviewed-by tags as-is, since the patches have only improved.
  Please revoke them if you find any issues.

V2->V3:
- Added a comment in cpufreq.c as suggested by Qais.
- Rebased on latest pm/linux-next.

V1->V2:
- The previous version introduced a completely new framework; this one
  moves to PM QoS instead.
- Lots of changes because of this.

--
viresh

Viresh Kumar (7):
  PM / QOS: Pass request type to dev_pm_qos_{add|remove}_notifier()
  PM / QOS: Rename __dev_pm_qos_read_value() and
    dev_pm_qos_raw_read_value()
  PM / QOS: Pass request type to dev_pm_qos_read_value()
  PM / QoS: Add support for MIN/MAX frequency constraints
  cpufreq: Register notifiers with the PM QoS framework
  cpufreq: intel_pstate: Reuse refresh_frequency_limits()
  cpufreq: Add QoS requests for userspace constraints

 Documentation/power/pm_qos_interface.txt |  12 +-
 drivers/base/power/domain.c              |   8 +-
 drivers/base/power/domain_governor.c     |   4 +-
 drivers/base/power/qos.c                 | 135 +++++++++++++--
 drivers/base/power/runtime.c             |   2 +-
 drivers/cpufreq/cpufreq.c                | 201 ++++++++++++++++-------
 drivers/cpufreq/intel_pstate.c           |   7 +-
 drivers/cpuidle/governor.c               |   2 +-
 include/linux/cpufreq.h                  |  12 +-
 include/linux/pm_qos.h                   |  48 ++++--
 10 files changed, 320 insertions(+), 111 deletions(-)

-- 
2.21.0.rc0.269.g1a574e7a288b


* [PATCH V6 1/7] PM / QOS: Pass request type to dev_pm_qos_{add|remove}_notifier()
  2019-07-04  7:36 [PATCH V6 0/7] cpufreq: Use QoS layer to manage freq-constraints Viresh Kumar
@ 2019-07-04  7:36 ` Viresh Kumar
  2019-07-04  7:36 ` [PATCH V6 2/7] PM / QOS: Rename __dev_pm_qos_read_value() and dev_pm_qos_raw_read_value() Viresh Kumar
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Viresh Kumar @ 2019-07-04  7:36 UTC (permalink / raw)
  To: Rafael Wysocki, Len Brown, Pavel Machek, Kevin Hilman, Ulf Hansson
  Cc: Viresh Kumar, linux-pm, Vincent Guittot, Matthias Kaehlcke,
	Rafael J . Wysocki, linux-kernel

In order to use the same set of routines to register notifiers for
different request types, update the existing
dev_pm_qos_{add|remove}_notifier() routines to take an additional
parameter: the request type.

For now, only the resume-latency request type is supported, but this
will be extended to the frequency limit (min/max) constraints later on.
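
For illustration only (the example_* names and the struct device pointer
are placeholders), a caller interested in resume-latency changes now does
something along these lines:

  #include <linux/device.h>
  #include <linux/notifier.h>
  #include <linux/pm_qos.h>

  static int example_latency_notify(struct notifier_block *nb,
                                    unsigned long value, void *data)
  {
          /* 'value' is the new aggregated resume-latency constraint. */
          return NOTIFY_OK;
  }

  static struct notifier_block example_nb = {
          .notifier_call = example_latency_notify,
  };

  static int example_register(struct device *dev)
  {
          /* The request type is now passed explicitly. */
          return dev_pm_qos_add_notifier(dev, &example_nb,
                                         DEV_PM_QOS_RESUME_LATENCY);
  }

  static void example_unregister(struct device *dev)
  {
          dev_pm_qos_remove_notifier(dev, &example_nb,
                                     DEV_PM_QOS_RESUME_LATENCY);
  }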

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 Documentation/power/pm_qos_interface.txt | 10 ++++++----
 drivers/base/power/domain.c              |  8 +++++---
 drivers/base/power/qos.c                 | 14 ++++++++++++--
 include/linux/pm_qos.h                   | 12 ++++++++----
 4 files changed, 31 insertions(+), 13 deletions(-)

diff --git a/Documentation/power/pm_qos_interface.txt b/Documentation/power/pm_qos_interface.txt
index 19c5f7b1a7ba..ec7d662d1707 100644
--- a/Documentation/power/pm_qos_interface.txt
+++ b/Documentation/power/pm_qos_interface.txt
@@ -164,12 +164,14 @@ directory.
 Notification mechanisms:
 The per-device PM QoS framework has a per-device notification tree.
 
-int dev_pm_qos_add_notifier(device, notifier):
-Adds a notification callback function for the device.
+int dev_pm_qos_add_notifier(device, notifier, type):
+Adds a notification callback function for the device for a particular request
+type.
+
 The callback is called when the aggregated value of the device constraints list
-is changed (for resume latency device PM QoS only).
+is changed.
 
-int dev_pm_qos_remove_notifier(device, notifier):
+int dev_pm_qos_remove_notifier(device, notifier, type):
 Removes the notification callback function for the device.
 
 
diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 33c30c1e6a30..b063bc41b0a9 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -1536,7 +1536,8 @@ static int genpd_add_device(struct generic_pm_domain *genpd, struct device *dev,
 	if (ret)
 		genpd_free_dev_data(dev, gpd_data);
 	else
-		dev_pm_qos_add_notifier(dev, &gpd_data->nb);
+		dev_pm_qos_add_notifier(dev, &gpd_data->nb,
+					DEV_PM_QOS_RESUME_LATENCY);
 
 	return ret;
 }
@@ -1569,7 +1570,8 @@ static int genpd_remove_device(struct generic_pm_domain *genpd,
 
 	pdd = dev->power.subsys_data->domain_data;
 	gpd_data = to_gpd_data(pdd);
-	dev_pm_qos_remove_notifier(dev, &gpd_data->nb);
+	dev_pm_qos_remove_notifier(dev, &gpd_data->nb,
+				   DEV_PM_QOS_RESUME_LATENCY);
 
 	genpd_lock(genpd);
 
@@ -1597,7 +1599,7 @@ static int genpd_remove_device(struct generic_pm_domain *genpd,
 
  out:
 	genpd_unlock(genpd);
-	dev_pm_qos_add_notifier(dev, &gpd_data->nb);
+	dev_pm_qos_add_notifier(dev, &gpd_data->nb, DEV_PM_QOS_RESUME_LATENCY);
 
 	return ret;
 }
diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
index 6c91f8df1d59..cfd463212513 100644
--- a/drivers/base/power/qos.c
+++ b/drivers/base/power/qos.c
@@ -467,6 +467,7 @@ EXPORT_SYMBOL_GPL(dev_pm_qos_remove_request);
  *
  * @dev: target device for the constraint
  * @notifier: notifier block managed by caller.
+ * @type: request type.
  *
  * Will register the notifier into a notification chain that gets called
  * upon changes to the target value for the device.
@@ -474,10 +475,14 @@ EXPORT_SYMBOL_GPL(dev_pm_qos_remove_request);
  * If the device's constraints object doesn't exist when this routine is called,
  * it will be created (or error code will be returned if that fails).
  */
-int dev_pm_qos_add_notifier(struct device *dev, struct notifier_block *notifier)
+int dev_pm_qos_add_notifier(struct device *dev, struct notifier_block *notifier,
+			    enum dev_pm_qos_req_type type)
 {
 	int ret = 0;
 
+	if (WARN_ON(type != DEV_PM_QOS_RESUME_LATENCY))
+		return -EINVAL;
+
 	mutex_lock(&dev_pm_qos_mtx);
 
 	if (IS_ERR(dev->power.qos))
@@ -500,15 +505,20 @@ EXPORT_SYMBOL_GPL(dev_pm_qos_add_notifier);
  *
  * @dev: target device for the constraint
  * @notifier: notifier block to be removed.
+ * @type: request type.
  *
  * Will remove the notifier from the notification chain that gets called
  * upon changes to the target value.
  */
 int dev_pm_qos_remove_notifier(struct device *dev,
-			       struct notifier_block *notifier)
+			       struct notifier_block *notifier,
+			       enum dev_pm_qos_req_type type)
 {
 	int retval = 0;
 
+	if (WARN_ON(type != DEV_PM_QOS_RESUME_LATENCY))
+		return -EINVAL;
+
 	mutex_lock(&dev_pm_qos_mtx);
 
 	/* Silently return if the constraints object is not present. */
diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
index 6ea1ae373d77..58e8749ceac5 100644
--- a/include/linux/pm_qos.h
+++ b/include/linux/pm_qos.h
@@ -146,9 +146,11 @@ int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
 int dev_pm_qos_update_request(struct dev_pm_qos_request *req, s32 new_value);
 int dev_pm_qos_remove_request(struct dev_pm_qos_request *req);
 int dev_pm_qos_add_notifier(struct device *dev,
-			    struct notifier_block *notifier);
+			    struct notifier_block *notifier,
+			    enum dev_pm_qos_req_type type);
 int dev_pm_qos_remove_notifier(struct device *dev,
-			       struct notifier_block *notifier);
+			       struct notifier_block *notifier,
+			       enum dev_pm_qos_req_type type);
 void dev_pm_qos_constraints_init(struct device *dev);
 void dev_pm_qos_constraints_destroy(struct device *dev);
 int dev_pm_qos_add_ancestor_request(struct device *dev,
@@ -202,10 +204,12 @@ static inline int dev_pm_qos_update_request(struct dev_pm_qos_request *req,
 static inline int dev_pm_qos_remove_request(struct dev_pm_qos_request *req)
 			{ return 0; }
 static inline int dev_pm_qos_add_notifier(struct device *dev,
-					  struct notifier_block *notifier)
+					  struct notifier_block *notifier,
+					  enum dev_pm_qos_req_type type)
 			{ return 0; }
 static inline int dev_pm_qos_remove_notifier(struct device *dev,
-					     struct notifier_block *notifier)
+					     struct notifier_block *notifier,
+					     enum dev_pm_qos_req_type type)
 			{ return 0; }
 static inline void dev_pm_qos_constraints_init(struct device *dev)
 {
-- 
2.21.0.rc0.269.g1a574e7a288b


* [PATCH V6 2/7] PM / QOS: Rename __dev_pm_qos_read_value() and dev_pm_qos_raw_read_value()
  2019-07-04  7:36 [PATCH V6 0/7] cpufreq: Use QoS layer to manage freq-constraints Viresh Kumar
  2019-07-04  7:36 ` [PATCH V6 1/7] PM / QOS: Pass request type to dev_pm_qos_{add|remove}_notifier() Viresh Kumar
@ 2019-07-04  7:36 ` Viresh Kumar
  2019-07-04  7:36 ` [PATCH V6 3/7] PM / QOS: Pass request type to dev_pm_qos_read_value() Viresh Kumar
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Viresh Kumar @ 2019-07-04  7:36 UTC (permalink / raw)
  To: Rafael Wysocki, Kevin Hilman, Ulf Hansson, Pavel Machek,
	Len Brown, Daniel Lezcano
  Cc: Viresh Kumar, linux-pm, Vincent Guittot, Rafael J . Wysocki,
	linux-kernel

dev_pm_qos_read_value() will soon need to support more constraint types
(min/max frequency) and will gain another argument, i.e. the type of the
constraint. While that is fine for the existing users of
dev_pm_qos_read_value(), it is not optimal for the callers of
__dev_pm_qos_read_value() and dev_pm_qos_raw_read_value(), as all of
them are only interested in the resume latency constraint.

Let's make these two routines care only about the resume latency
constraint and rename them to __dev_pm_qos_resume_latency() and
dev_pm_qos_raw_resume_latency().

Suggested-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/base/power/domain_governor.c |  2 +-
 drivers/base/power/qos.c             | 13 +++++++++----
 drivers/base/power/runtime.c         |  2 +-
 drivers/cpuidle/governor.c           |  2 +-
 include/linux/pm_qos.h               |  8 ++++----
 5 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/drivers/base/power/domain_governor.c b/drivers/base/power/domain_governor.c
index 3838045c9277..20e56a5be01f 100644
--- a/drivers/base/power/domain_governor.c
+++ b/drivers/base/power/domain_governor.c
@@ -66,7 +66,7 @@ static bool default_suspend_ok(struct device *dev)
 	td->constraint_changed = false;
 	td->cached_suspend_ok = false;
 	td->effective_constraint_ns = 0;
-	constraint_ns = __dev_pm_qos_read_value(dev);
+	constraint_ns = __dev_pm_qos_resume_latency(dev);
 
 	spin_unlock_irqrestore(&dev->power.lock, flags);
 
diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
index cfd463212513..7a0d197f0809 100644
--- a/drivers/base/power/qos.c
+++ b/drivers/base/power/qos.c
@@ -90,16 +90,16 @@ enum pm_qos_flags_status dev_pm_qos_flags(struct device *dev, s32 mask)
 EXPORT_SYMBOL_GPL(dev_pm_qos_flags);
 
 /**
- * __dev_pm_qos_read_value - Get PM QoS constraint for a given device.
+ * __dev_pm_qos_resume_latency - Get resume latency constraint for a given device.
  * @dev: Device to get the PM QoS constraint value for.
  *
  * This routine must be called with dev->power.lock held.
  */
-s32 __dev_pm_qos_read_value(struct device *dev)
+s32 __dev_pm_qos_resume_latency(struct device *dev)
 {
 	lockdep_assert_held(&dev->power.lock);
 
-	return dev_pm_qos_raw_read_value(dev);
+	return dev_pm_qos_raw_resume_latency(dev);
 }
 
 /**
@@ -112,7 +112,12 @@ s32 dev_pm_qos_read_value(struct device *dev)
 	s32 ret;
 
 	spin_lock_irqsave(&dev->power.lock, flags);
-	ret = __dev_pm_qos_read_value(dev);
+
+	if (IS_ERR_OR_NULL(dev->power.qos))
+		ret = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
+	else
+		ret = pm_qos_read_value(&dev->power.qos->resume_latency);
+
 	spin_unlock_irqrestore(&dev->power.lock, flags);
 
 	return ret;
diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 952a1e7057c7..b75335508d2c 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -275,7 +275,7 @@ static int rpm_check_suspend_allowed(struct device *dev)
 	    || (dev->power.request_pending
 			&& dev->power.request == RPM_REQ_RESUME))
 		retval = -EAGAIN;
-	else if (__dev_pm_qos_read_value(dev) == 0)
+	else if (__dev_pm_qos_resume_latency(dev) == 0)
 		retval = -EPERM;
 	else if (dev->power.runtime_status == RPM_SUSPENDED)
 		retval = 1;
diff --git a/drivers/cpuidle/governor.c b/drivers/cpuidle/governor.c
index 9fddf828a76f..2e3e14192bee 100644
--- a/drivers/cpuidle/governor.c
+++ b/drivers/cpuidle/governor.c
@@ -110,7 +110,7 @@ int cpuidle_governor_latency_req(unsigned int cpu)
 {
 	int global_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
 	struct device *device = get_cpu_device(cpu);
-	int device_req = dev_pm_qos_raw_read_value(device);
+	int device_req = dev_pm_qos_raw_resume_latency(device);
 
 	return device_req < global_req ? device_req : global_req;
 }
diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
index 58e8749ceac5..5e09d4980786 100644
--- a/include/linux/pm_qos.h
+++ b/include/linux/pm_qos.h
@@ -139,7 +139,7 @@ s32 pm_qos_read_value(struct pm_qos_constraints *c);
 #ifdef CONFIG_PM
 enum pm_qos_flags_status __dev_pm_qos_flags(struct device *dev, s32 mask);
 enum pm_qos_flags_status dev_pm_qos_flags(struct device *dev, s32 mask);
-s32 __dev_pm_qos_read_value(struct device *dev);
+s32 __dev_pm_qos_resume_latency(struct device *dev);
 s32 dev_pm_qos_read_value(struct device *dev);
 int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
 			   enum dev_pm_qos_req_type type, s32 value);
@@ -176,7 +176,7 @@ static inline s32 dev_pm_qos_requested_flags(struct device *dev)
 	return dev->power.qos->flags_req->data.flr.flags;
 }
 
-static inline s32 dev_pm_qos_raw_read_value(struct device *dev)
+static inline s32 dev_pm_qos_raw_resume_latency(struct device *dev)
 {
 	return IS_ERR_OR_NULL(dev->power.qos) ?
 		PM_QOS_RESUME_LATENCY_NO_CONSTRAINT :
@@ -189,7 +189,7 @@ static inline enum pm_qos_flags_status __dev_pm_qos_flags(struct device *dev,
 static inline enum pm_qos_flags_status dev_pm_qos_flags(struct device *dev,
 							s32 mask)
 			{ return PM_QOS_FLAGS_UNDEFINED; }
-static inline s32 __dev_pm_qos_read_value(struct device *dev)
+static inline s32 __dev_pm_qos_resume_latency(struct device *dev)
 			{ return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT; }
 static inline s32 dev_pm_qos_read_value(struct device *dev)
 			{ return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT; }
@@ -245,7 +245,7 @@ static inline s32 dev_pm_qos_requested_resume_latency(struct device *dev)
 	return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
 }
 static inline s32 dev_pm_qos_requested_flags(struct device *dev) { return 0; }
-static inline s32 dev_pm_qos_raw_read_value(struct device *dev)
+static inline s32 dev_pm_qos_raw_resume_latency(struct device *dev)
 {
 	return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
 }
-- 
2.21.0.rc0.269.g1a574e7a288b


* [PATCH V6 3/7] PM / QOS: Pass request type to dev_pm_qos_read_value()
  2019-07-04  7:36 [PATCH V6 0/7] cpufreq: Use QoS layer to manage freq-constraints Viresh Kumar
  2019-07-04  7:36 ` [PATCH V6 1/7] PM / QOS: Pass request type to dev_pm_qos_{add|remove}_notifier() Viresh Kumar
  2019-07-04  7:36 ` [PATCH V6 2/7] PM / QOS: Rename __dev_pm_qos_read_value() and dev_pm_qos_raw_read_value() Viresh Kumar
@ 2019-07-04  7:36 ` Viresh Kumar
  2019-07-04  7:36 ` [PATCH V6 4/7] PM / QoS: Add support for MIN/MAX frequency constraints Viresh Kumar
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Viresh Kumar @ 2019-07-04  7:36 UTC (permalink / raw)
  To: Rafael Wysocki, Len Brown, Pavel Machek, Kevin Hilman, Ulf Hansson
  Cc: Viresh Kumar, linux-pm, Vincent Guittot, Matthias Kaehlcke,
	Rafael J . Wysocki, linux-kernel

In order to allow dev_pm_qos_read_value() to read values for different
QoS request types, pass the request type as a parameter to it.

For now, only the resume-latency request type is supported, but this
will be extended to the frequency limit (min/max) constraints later on.
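
A minimal sketch of a read after this change (the example_ wrapper and the
'dev' pointer are placeholders):

  #include <linux/device.h>
  #include <linux/pm_qos.h>

  static s32 example_read_resume_latency(struct device *dev)
  {
          /* Aggregated resume-latency constraint for 'dev'. */
          return dev_pm_qos_read_value(dev, DEV_PM_QOS_RESUME_LATENCY);
  }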

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 Documentation/power/pm_qos_interface.txt |  2 +-
 drivers/base/power/domain_governor.c     |  2 +-
 drivers/base/power/qos.c                 | 17 ++++++++++++-----
 include/linux/pm_qos.h                   | 16 +++++++++++++---
 4 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/Documentation/power/pm_qos_interface.txt b/Documentation/power/pm_qos_interface.txt
index ec7d662d1707..cfcb1df39799 100644
--- a/Documentation/power/pm_qos_interface.txt
+++ b/Documentation/power/pm_qos_interface.txt
@@ -123,7 +123,7 @@ Will remove the element.  After removal it will update the aggregate target and
 call the notification trees if the target was changed as a result of removing
 the request.
 
-s32 dev_pm_qos_read_value(device):
+s32 dev_pm_qos_read_value(device, type):
 Returns the aggregated value for a given device's constraints list.
 
 enum pm_qos_flags_status dev_pm_qos_flags(device, mask)
diff --git a/drivers/base/power/domain_governor.c b/drivers/base/power/domain_governor.c
index 20e56a5be01f..daa8c7689f7e 100644
--- a/drivers/base/power/domain_governor.c
+++ b/drivers/base/power/domain_governor.c
@@ -33,7 +33,7 @@ static int dev_update_qos_constraint(struct device *dev, void *data)
 		 * take its current PM QoS constraint (that's the only thing
 		 * known at this point anyway).
 		 */
-		constraint_ns = dev_pm_qos_read_value(dev);
+		constraint_ns = dev_pm_qos_read_value(dev, DEV_PM_QOS_RESUME_LATENCY);
 		constraint_ns *= NSEC_PER_USEC;
 	}
 
diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
index 7a0d197f0809..2461fed0efa0 100644
--- a/drivers/base/power/qos.c
+++ b/drivers/base/power/qos.c
@@ -105,18 +105,25 @@ s32 __dev_pm_qos_resume_latency(struct device *dev)
 /**
  * dev_pm_qos_read_value - Get PM QoS constraint for a given device (locked).
  * @dev: Device to get the PM QoS constraint value for.
+ * @type: QoS request type.
  */
-s32 dev_pm_qos_read_value(struct device *dev)
+s32 dev_pm_qos_read_value(struct device *dev, enum dev_pm_qos_req_type type)
 {
+	struct dev_pm_qos *qos = dev->power.qos;
 	unsigned long flags;
 	s32 ret;
 
 	spin_lock_irqsave(&dev->power.lock, flags);
 
-	if (IS_ERR_OR_NULL(dev->power.qos))
-		ret = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
-	else
-		ret = pm_qos_read_value(&dev->power.qos->resume_latency);
+	switch (type) {
+	case DEV_PM_QOS_RESUME_LATENCY:
+		ret = IS_ERR_OR_NULL(qos) ? PM_QOS_RESUME_LATENCY_NO_CONSTRAINT
+			: pm_qos_read_value(&qos->resume_latency);
+		break;
+	default:
+		WARN_ON(1);
+		ret = 0;
+	}
 
 	spin_unlock_irqrestore(&dev->power.lock, flags);
 
diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
index 5e09d4980786..9a21b7ba72ae 100644
--- a/include/linux/pm_qos.h
+++ b/include/linux/pm_qos.h
@@ -140,7 +140,7 @@ s32 pm_qos_read_value(struct pm_qos_constraints *c);
 enum pm_qos_flags_status __dev_pm_qos_flags(struct device *dev, s32 mask);
 enum pm_qos_flags_status dev_pm_qos_flags(struct device *dev, s32 mask);
 s32 __dev_pm_qos_resume_latency(struct device *dev);
-s32 dev_pm_qos_read_value(struct device *dev);
+s32 dev_pm_qos_read_value(struct device *dev, enum dev_pm_qos_req_type type);
 int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
 			   enum dev_pm_qos_req_type type, s32 value);
 int dev_pm_qos_update_request(struct dev_pm_qos_request *req, s32 new_value);
@@ -191,8 +191,18 @@ static inline enum pm_qos_flags_status dev_pm_qos_flags(struct device *dev,
 			{ return PM_QOS_FLAGS_UNDEFINED; }
 static inline s32 __dev_pm_qos_resume_latency(struct device *dev)
 			{ return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT; }
-static inline s32 dev_pm_qos_read_value(struct device *dev)
-			{ return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT; }
+static inline s32 dev_pm_qos_read_value(struct device *dev,
+					enum dev_pm_qos_req_type type)
+{
+	switch (type) {
+	case DEV_PM_QOS_RESUME_LATENCY:
+		return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
+	default:
+		WARN_ON(1);
+		return 0;
+	}
+}
+
 static inline int dev_pm_qos_add_request(struct device *dev,
 					 struct dev_pm_qos_request *req,
 					 enum dev_pm_qos_req_type type,
-- 
2.21.0.rc0.269.g1a574e7a288b


* [PATCH V6 4/7] PM / QoS: Add support for MIN/MAX frequency constraints
  2019-07-04  7:36 [PATCH V6 0/7] cpufreq: Use QoS layer to manage freq-constraints Viresh Kumar
                   ` (2 preceding siblings ...)
  2019-07-04  7:36 ` [PATCH V6 3/7] PM / QOS: Pass request type to dev_pm_qos_read_value() Viresh Kumar
@ 2019-07-04  7:36 ` Viresh Kumar
  2019-07-04  7:36 ` [PATCH V6 5/7] cpufreq: Register notifiers with the PM QoS framework Viresh Kumar
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Viresh Kumar @ 2019-07-04  7:36 UTC (permalink / raw)
  To: Rafael Wysocki, Len Brown, Pavel Machek
  Cc: Viresh Kumar, linux-pm, Vincent Guittot, Matthias Kaehlcke,
	Ulf Hansson, Rafael J . Wysocki, linux-kernel

This patch introduces the min-frequency and max-frequency device
constraints, which will be used by the cpufreq core to begin with.
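
A hedged sketch of the request life cycle with the new types follows; the
example_ names, the device pointer and the kHz values are made up, and the
units are assumed to be those used by cpufreq (kHz):

  #include <linux/device.h>
  #include <linux/pm_qos.h>

  static struct dev_pm_qos_request example_min_req;

  static int example_min_freq_constraint(struct device *dev)
  {
          s32 min, max;
          int ret;

          /* Request a frequency floor of 1 GHz. */
          ret = dev_pm_qos_add_request(dev, &example_min_req,
                                       DEV_PM_QOS_MIN_FREQUENCY, 1000000);
          if (ret < 0)
                  return ret;

          /* Raise the floor later; it is aggregated with other requests. */
          dev_pm_qos_update_request(&example_min_req, 1400000);

          /* What consumers (e.g. cpufreq) will see. */
          min = dev_pm_qos_read_value(dev, DEV_PM_QOS_MIN_FREQUENCY);
          max = dev_pm_qos_read_value(dev, DEV_PM_QOS_MAX_FREQUENCY);
          dev_info(dev, "aggregated min/max: %d/%d kHz\n", min, max);

          /* Drop the constraint when it is no longer needed. */
          return dev_pm_qos_remove_request(&example_min_req);
  }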

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/base/power/qos.c | 111 ++++++++++++++++++++++++++++++++++-----
 include/linux/pm_qos.h   |  12 +++++
 2 files changed, 109 insertions(+), 14 deletions(-)

diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
index 2461fed0efa0..6c90fd7e2ff8 100644
--- a/drivers/base/power/qos.c
+++ b/drivers/base/power/qos.c
@@ -120,6 +120,14 @@ s32 dev_pm_qos_read_value(struct device *dev, enum dev_pm_qos_req_type type)
 		ret = IS_ERR_OR_NULL(qos) ? PM_QOS_RESUME_LATENCY_NO_CONSTRAINT
 			: pm_qos_read_value(&qos->resume_latency);
 		break;
+	case DEV_PM_QOS_MIN_FREQUENCY:
+		ret = IS_ERR_OR_NULL(qos) ? PM_QOS_MIN_FREQUENCY_DEFAULT_VALUE
+			: pm_qos_read_value(&qos->min_frequency);
+		break;
+	case DEV_PM_QOS_MAX_FREQUENCY:
+		ret = IS_ERR_OR_NULL(qos) ? PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE
+			: pm_qos_read_value(&qos->max_frequency);
+		break;
 	default:
 		WARN_ON(1);
 		ret = 0;
@@ -161,6 +169,14 @@ static int apply_constraint(struct dev_pm_qos_request *req,
 			req->dev->power.set_latency_tolerance(req->dev, value);
 		}
 		break;
+	case DEV_PM_QOS_MIN_FREQUENCY:
+		ret = pm_qos_update_target(&qos->min_frequency,
+					   &req->data.pnode, action, value);
+		break;
+	case DEV_PM_QOS_MAX_FREQUENCY:
+		ret = pm_qos_update_target(&qos->max_frequency,
+					   &req->data.pnode, action, value);
+		break;
 	case DEV_PM_QOS_FLAGS:
 		ret = pm_qos_update_flags(&qos->flags, &req->data.flr,
 					  action, value);
@@ -189,12 +205,11 @@ static int dev_pm_qos_constraints_allocate(struct device *dev)
 	if (!qos)
 		return -ENOMEM;
 
-	n = kzalloc(sizeof(*n), GFP_KERNEL);
+	n = kzalloc(3 * sizeof(*n), GFP_KERNEL);
 	if (!n) {
 		kfree(qos);
 		return -ENOMEM;
 	}
-	BLOCKING_INIT_NOTIFIER_HEAD(n);
 
 	c = &qos->resume_latency;
 	plist_head_init(&c->list);
@@ -203,6 +218,7 @@ static int dev_pm_qos_constraints_allocate(struct device *dev)
 	c->no_constraint_value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
 	c->type = PM_QOS_MIN;
 	c->notifiers = n;
+	BLOCKING_INIT_NOTIFIER_HEAD(n);
 
 	c = &qos->latency_tolerance;
 	plist_head_init(&c->list);
@@ -211,6 +227,24 @@ static int dev_pm_qos_constraints_allocate(struct device *dev)
 	c->no_constraint_value = PM_QOS_LATENCY_TOLERANCE_NO_CONSTRAINT;
 	c->type = PM_QOS_MIN;
 
+	c = &qos->min_frequency;
+	plist_head_init(&c->list);
+	c->target_value = PM_QOS_MIN_FREQUENCY_DEFAULT_VALUE;
+	c->default_value = PM_QOS_MIN_FREQUENCY_DEFAULT_VALUE;
+	c->no_constraint_value = PM_QOS_MIN_FREQUENCY_DEFAULT_VALUE;
+	c->type = PM_QOS_MAX;
+	c->notifiers = ++n;
+	BLOCKING_INIT_NOTIFIER_HEAD(n);
+
+	c = &qos->max_frequency;
+	plist_head_init(&c->list);
+	c->target_value = PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE;
+	c->default_value = PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE;
+	c->no_constraint_value = PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE;
+	c->type = PM_QOS_MIN;
+	c->notifiers = ++n;
+	BLOCKING_INIT_NOTIFIER_HEAD(n);
+
 	INIT_LIST_HEAD(&qos->flags.list);
 
 	spin_lock_irq(&dev->power.lock);
@@ -264,11 +298,25 @@ void dev_pm_qos_constraints_destroy(struct device *dev)
 		apply_constraint(req, PM_QOS_REMOVE_REQ, PM_QOS_DEFAULT_VALUE);
 		memset(req, 0, sizeof(*req));
 	}
+
 	c = &qos->latency_tolerance;
 	plist_for_each_entry_safe(req, tmp, &c->list, data.pnode) {
 		apply_constraint(req, PM_QOS_REMOVE_REQ, PM_QOS_DEFAULT_VALUE);
 		memset(req, 0, sizeof(*req));
 	}
+
+	c = &qos->min_frequency;
+	plist_for_each_entry_safe(req, tmp, &c->list, data.pnode) {
+		apply_constraint(req, PM_QOS_REMOVE_REQ, PM_QOS_MIN_FREQUENCY_DEFAULT_VALUE);
+		memset(req, 0, sizeof(*req));
+	}
+
+	c = &qos->max_frequency;
+	plist_for_each_entry_safe(req, tmp, &c->list, data.pnode) {
+		apply_constraint(req, PM_QOS_REMOVE_REQ, PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE);
+		memset(req, 0, sizeof(*req));
+	}
+
 	f = &qos->flags;
 	list_for_each_entry_safe(req, tmp, &f->list, data.flr.node) {
 		apply_constraint(req, PM_QOS_REMOVE_REQ, PM_QOS_DEFAULT_VALUE);
@@ -380,6 +428,8 @@ static int __dev_pm_qos_update_request(struct dev_pm_qos_request *req,
 	switch(req->type) {
 	case DEV_PM_QOS_RESUME_LATENCY:
 	case DEV_PM_QOS_LATENCY_TOLERANCE:
+	case DEV_PM_QOS_MIN_FREQUENCY:
+	case DEV_PM_QOS_MAX_FREQUENCY:
 		curr_value = req->data.pnode.prio;
 		break;
 	case DEV_PM_QOS_FLAGS:
@@ -492,9 +542,6 @@ int dev_pm_qos_add_notifier(struct device *dev, struct notifier_block *notifier,
 {
 	int ret = 0;
 
-	if (WARN_ON(type != DEV_PM_QOS_RESUME_LATENCY))
-		return -EINVAL;
-
 	mutex_lock(&dev_pm_qos_mtx);
 
 	if (IS_ERR(dev->power.qos))
@@ -502,10 +549,28 @@ int dev_pm_qos_add_notifier(struct device *dev, struct notifier_block *notifier,
 	else if (!dev->power.qos)
 		ret = dev_pm_qos_constraints_allocate(dev);
 
-	if (!ret)
+	if (ret)
+		goto unlock;
+
+	switch (type) {
+	case DEV_PM_QOS_RESUME_LATENCY:
 		ret = blocking_notifier_chain_register(dev->power.qos->resume_latency.notifiers,
 						       notifier);
+		break;
+	case DEV_PM_QOS_MIN_FREQUENCY:
+		ret = blocking_notifier_chain_register(dev->power.qos->min_frequency.notifiers,
+						       notifier);
+		break;
+	case DEV_PM_QOS_MAX_FREQUENCY:
+		ret = blocking_notifier_chain_register(dev->power.qos->max_frequency.notifiers,
+						       notifier);
+		break;
+	default:
+		WARN_ON(1);
+		ret = -EINVAL;
+	}
 
+unlock:
 	mutex_unlock(&dev_pm_qos_mtx);
 	return ret;
 }
@@ -526,20 +591,35 @@ int dev_pm_qos_remove_notifier(struct device *dev,
 			       struct notifier_block *notifier,
 			       enum dev_pm_qos_req_type type)
 {
-	int retval = 0;
-
-	if (WARN_ON(type != DEV_PM_QOS_RESUME_LATENCY))
-		return -EINVAL;
+	int ret = 0;
 
 	mutex_lock(&dev_pm_qos_mtx);
 
 	/* Silently return if the constraints object is not present. */
-	if (!IS_ERR_OR_NULL(dev->power.qos))
-		retval = blocking_notifier_chain_unregister(dev->power.qos->resume_latency.notifiers,
-							    notifier);
+	if (IS_ERR_OR_NULL(dev->power.qos))
+		goto unlock;
+
+	switch (type) {
+	case DEV_PM_QOS_RESUME_LATENCY:
+		ret = blocking_notifier_chain_unregister(dev->power.qos->resume_latency.notifiers,
+							 notifier);
+		break;
+	case DEV_PM_QOS_MIN_FREQUENCY:
+		ret = blocking_notifier_chain_unregister(dev->power.qos->min_frequency.notifiers,
+							 notifier);
+		break;
+	case DEV_PM_QOS_MAX_FREQUENCY:
+		ret = blocking_notifier_chain_unregister(dev->power.qos->max_frequency.notifiers,
+							 notifier);
+		break;
+	default:
+		WARN_ON(1);
+		ret = -EINVAL;
+	}
 
+unlock:
 	mutex_unlock(&dev_pm_qos_mtx);
-	return retval;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(dev_pm_qos_remove_notifier);
 
@@ -599,6 +679,9 @@ static void __dev_pm_qos_drop_user_request(struct device *dev,
 		req = dev->power.qos->flags_req;
 		dev->power.qos->flags_req = NULL;
 		break;
+	default:
+		WARN_ON(1);
+		return;
 	}
 	__dev_pm_qos_remove_request(req);
 	kfree(req);
diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
index 9a21b7ba72ae..2aebbc5b9950 100644
--- a/include/linux/pm_qos.h
+++ b/include/linux/pm_qos.h
@@ -40,6 +40,8 @@ enum pm_qos_flags_status {
 #define PM_QOS_RESUME_LATENCY_NO_CONSTRAINT	PM_QOS_LATENCY_ANY
 #define PM_QOS_RESUME_LATENCY_NO_CONSTRAINT_NS	PM_QOS_LATENCY_ANY_NS
 #define PM_QOS_LATENCY_TOLERANCE_DEFAULT_VALUE	0
+#define PM_QOS_MIN_FREQUENCY_DEFAULT_VALUE	0
+#define PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE	(-1)
 #define PM_QOS_LATENCY_TOLERANCE_NO_CONSTRAINT	(-1)
 
 #define PM_QOS_FLAG_NO_POWER_OFF	(1 << 0)
@@ -58,6 +60,8 @@ struct pm_qos_flags_request {
 enum dev_pm_qos_req_type {
 	DEV_PM_QOS_RESUME_LATENCY = 1,
 	DEV_PM_QOS_LATENCY_TOLERANCE,
+	DEV_PM_QOS_MIN_FREQUENCY,
+	DEV_PM_QOS_MAX_FREQUENCY,
 	DEV_PM_QOS_FLAGS,
 };
 
@@ -99,10 +103,14 @@ struct pm_qos_flags {
 struct dev_pm_qos {
 	struct pm_qos_constraints resume_latency;
 	struct pm_qos_constraints latency_tolerance;
+	struct pm_qos_constraints min_frequency;
+	struct pm_qos_constraints max_frequency;
 	struct pm_qos_flags flags;
 	struct dev_pm_qos_request *resume_latency_req;
 	struct dev_pm_qos_request *latency_tolerance_req;
 	struct dev_pm_qos_request *flags_req;
+	struct dev_pm_qos_request *min_frequency_req;
+	struct dev_pm_qos_request *max_frequency_req;
 };
 
 /* Action requested to pm_qos_update_target */
@@ -197,6 +205,10 @@ static inline s32 dev_pm_qos_read_value(struct device *dev,
 	switch (type) {
 	case DEV_PM_QOS_RESUME_LATENCY:
 		return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
+	case DEV_PM_QOS_MIN_FREQUENCY:
+		return PM_QOS_MIN_FREQUENCY_DEFAULT_VALUE;
+	case DEV_PM_QOS_MAX_FREQUENCY:
+		return PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE;
 	default:
 		WARN_ON(1);
 		return 0;
-- 
2.21.0.rc0.269.g1a574e7a288b


* [PATCH V6 5/7] cpufreq: Register notifiers with the PM QoS framework
  2019-07-04  7:36 [PATCH V6 0/7] cpufreq: Use QoS layer to manage freq-constraints Viresh Kumar
                   ` (3 preceding siblings ...)
  2019-07-04  7:36 ` [PATCH V6 4/7] PM / QoS: Add support for MIN/MAX frequency constraints Viresh Kumar
@ 2019-07-04  7:36 ` Viresh Kumar
  2019-07-08 10:57   ` [PATCH V7 " Viresh Kumar
  2019-07-04  7:36 ` [PATCH V6 6/7] cpufreq: intel_pstate: Reuse refresh_frequency_limits() Viresh Kumar
  2019-07-04  7:36 ` [PATCH V6 7/7] cpufreq: Add QoS requests for userspace constraints Viresh Kumar
  6 siblings, 1 reply; 22+ messages in thread
From: Viresh Kumar @ 2019-07-04  7:36 UTC (permalink / raw)
  To: Rafael Wysocki
  Cc: Viresh Kumar, linux-pm, Vincent Guittot, Matthias Kaehlcke,
	Ulf Hansson, Rafael J . Wysocki, linux-kernel

This registers the notifiers for min/max frequency constraints with the
PM QoS framework. The constraints are also taken into consideration in
cpufreq_set_policy().

This also relocates cpufreq_policy_put_kobj(), as it now needs to be
called from cpufreq_policy_alloc().

refresh_frequency_limits() is updated to have proper locking in place and
avoid calling cpufreq_set_policy() for inactive policies.

No constraints are actually added by this patch yet, though.
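
As a sketch of the resulting flow (the example_ function, the request
pointer and the kHz value are hypothetical), an update to any frequency
request on a policy CPU's device now propagates back into cpufreq without
any cpufreq-specific plumbing:

  static void example_lower_max_freq(struct dev_pm_qos_request *max_req)
  {
          /*
           * Re-aggregates the DEV_PM_QOS_MAX_FREQUENCY constraint; PM QoS
           * then calls the policy's nb_max notifier, which schedules
           * policy->update, and handle_update() ends up in
           * refresh_frequency_limits() -> cpufreq_set_policy(), where the
           * aggregated limits are read via dev_pm_qos_read_value().
           */
          dev_pm_qos_update_request(max_req, 1200000);
  }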

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/cpufreq.c | 135 ++++++++++++++++++++++++++++++--------
 include/linux/cpufreq.h   |   3 +
 2 files changed, 109 insertions(+), 29 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index ceb57af15ca0..81117e4d43cc 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -26,6 +26,7 @@
 #include <linux/kernel_stat.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
+#include <linux/pm_qos.h>
 #include <linux/slab.h>
 #include <linux/suspend.h>
 #include <linux/syscore_ops.h>
@@ -999,7 +1000,7 @@ static void add_cpu_dev_symlink(struct cpufreq_policy *policy, unsigned int cpu)
 {
 	struct device *dev = get_cpu_device(cpu);
 
-	if (!dev)
+	if (unlikely(!dev))
 		return;
 
 	if (cpumask_test_and_set_cpu(cpu, policy->real_cpus))
@@ -1117,14 +1118,20 @@ static int cpufreq_add_policy_cpu(struct cpufreq_policy *policy, unsigned int cp
 
 static void refresh_frequency_limits(struct cpufreq_policy *policy)
 {
-	struct cpufreq_policy new_policy = *policy;
+	struct cpufreq_policy new_policy;
 
-	pr_debug("updating policy for CPU %u\n", policy->cpu);
+	down_write(&policy->rwsem);
+
+	if (!policy_is_inactive(policy)) {
+		new_policy = *policy;
+		pr_debug("updating policy for CPU %u\n", policy->cpu);
 
-	new_policy.min = policy->user_policy.min;
-	new_policy.max = policy->user_policy.max;
+		new_policy.min = policy->user_policy.min;
+		new_policy.max = policy->user_policy.max;
+		cpufreq_set_policy(policy, &new_policy);
+	}
 
-	cpufreq_set_policy(policy, &new_policy);
+	up_write(&policy->rwsem);
 }
 
 static void handle_update(struct work_struct *work)
@@ -1136,11 +1143,55 @@ static void handle_update(struct work_struct *work)
 	refresh_frequency_limits(policy);
 }
 
+static int cpufreq_notifier_min(struct notifier_block *nb, unsigned long freq,
+				void *data)
+{
+	struct cpufreq_policy *policy = container_of(nb, struct cpufreq_policy, nb_min);
+
+	schedule_work(&policy->update);
+	return 0;
+}
+
+static int cpufreq_notifier_max(struct notifier_block *nb, unsigned long freq,
+				void *data)
+{
+	struct cpufreq_policy *policy = container_of(nb, struct cpufreq_policy, nb_max);
+
+	schedule_work(&policy->update);
+	return 0;
+}
+
+static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
+{
+	struct kobject *kobj;
+	struct completion *cmp;
+
+	down_write(&policy->rwsem);
+	cpufreq_stats_free_table(policy);
+	kobj = &policy->kobj;
+	cmp = &policy->kobj_unregister;
+	up_write(&policy->rwsem);
+	kobject_put(kobj);
+
+	/*
+	 * We need to make sure that the underlying kobj is
+	 * actually not referenced anymore by anybody before we
+	 * proceed with unloading.
+	 */
+	pr_debug("waiting for dropping of refcount\n");
+	wait_for_completion(cmp);
+	pr_debug("wait complete\n");
+}
+
 static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
 {
 	struct cpufreq_policy *policy;
+	struct device *dev = get_cpu_device(cpu);
 	int ret;
 
+	if (!dev)
+		return NULL;
+
 	policy = kzalloc(sizeof(*policy), GFP_KERNEL);
 	if (!policy)
 		return NULL;
@@ -1157,7 +1208,7 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
 	ret = kobject_init_and_add(&policy->kobj, &ktype_cpufreq,
 				   cpufreq_global_kobject, "policy%u", cpu);
 	if (ret) {
-		pr_err("%s: failed to init policy->kobj: %d\n", __func__, ret);
+		dev_err(dev, "%s: failed to init policy->kobj: %d\n", __func__, ret);
 		/*
 		 * The entire policy object will be freed below, but the extra
 		 * memory allocated for the kobject name needs to be freed by
@@ -1167,6 +1218,25 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
 		goto err_free_real_cpus;
 	}
 
+	policy->nb_min.notifier_call = cpufreq_notifier_min;
+	policy->nb_max.notifier_call = cpufreq_notifier_max;
+
+	ret = dev_pm_qos_add_notifier(dev, &policy->nb_min,
+				      DEV_PM_QOS_MIN_FREQUENCY);
+	if (ret) {
+		dev_err(dev, "Failed to register MIN QoS notifier: %d (%*pbl)\n",
+			ret, cpumask_pr_args(policy->cpus));
+		goto err_kobj_remove;
+	}
+
+	ret = dev_pm_qos_add_notifier(dev, &policy->nb_max,
+				      DEV_PM_QOS_MAX_FREQUENCY);
+	if (ret) {
+		dev_err(dev, "Failed to register MAX QoS notifier: %d (%*pbl)\n",
+			ret, cpumask_pr_args(policy->cpus));
+		goto err_min_qos_notifier;
+	}
+
 	INIT_LIST_HEAD(&policy->policy_list);
 	init_rwsem(&policy->rwsem);
 	spin_lock_init(&policy->transition_lock);
@@ -1177,6 +1247,11 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
 	policy->cpu = cpu;
 	return policy;
 
+err_min_qos_notifier:
+	dev_pm_qos_remove_notifier(dev, &policy->nb_min,
+				   DEV_PM_QOS_MIN_FREQUENCY);
+err_kobj_remove:
+	cpufreq_policy_put_kobj(policy);
 err_free_real_cpus:
 	free_cpumask_var(policy->real_cpus);
 err_free_rcpumask:
@@ -1189,30 +1264,9 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
 	return NULL;
 }
 
-static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
-{
-	struct kobject *kobj;
-	struct completion *cmp;
-
-	down_write(&policy->rwsem);
-	cpufreq_stats_free_table(policy);
-	kobj = &policy->kobj;
-	cmp = &policy->kobj_unregister;
-	up_write(&policy->rwsem);
-	kobject_put(kobj);
-
-	/*
-	 * We need to make sure that the underlying kobj is
-	 * actually not referenced anymore by anybody before we
-	 * proceed with unloading.
-	 */
-	pr_debug("waiting for dropping of refcount\n");
-	wait_for_completion(cmp);
-	pr_debug("wait complete\n");
-}
-
 static void cpufreq_policy_free(struct cpufreq_policy *policy)
 {
+	struct device *dev = get_cpu_device(policy->cpu);
 	unsigned long flags;
 	int cpu;
 
@@ -1224,6 +1278,11 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy)
 		per_cpu(cpufreq_cpu_data, cpu) = NULL;
 	write_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
+	dev_pm_qos_remove_notifier(dev, &policy->nb_max,
+				   DEV_PM_QOS_MAX_FREQUENCY);
+	dev_pm_qos_remove_notifier(dev, &policy->nb_min,
+				   DEV_PM_QOS_MIN_FREQUENCY);
+
 	cpufreq_policy_put_kobj(policy);
 	free_cpumask_var(policy->real_cpus);
 	free_cpumask_var(policy->related_cpus);
@@ -2283,6 +2342,8 @@ int cpufreq_set_policy(struct cpufreq_policy *policy,
 		       struct cpufreq_policy *new_policy)
 {
 	struct cpufreq_governor *old_gov;
+	struct device *cpu_dev = get_cpu_device(policy->cpu);
+	unsigned long min, max;
 	int ret;
 
 	pr_debug("setting new policy for CPU %u: %u - %u kHz\n",
@@ -2297,11 +2358,27 @@ int cpufreq_set_policy(struct cpufreq_policy *policy,
 	if (new_policy->min > new_policy->max)
 		return -EINVAL;
 
+	/*
+	 * PM QoS framework collects all the requests from users and provide us
+	 * the final aggregated value here.
+	 */
+	min = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MIN_FREQUENCY);
+	max = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MAX_FREQUENCY);
+
+	if (min > new_policy->min)
+		new_policy->min = min;
+	if (max < new_policy->max)
+		new_policy->max = max;
+
 	/* verify the cpu speed can be set within this limit */
 	ret = cpufreq_driver->verify(new_policy);
 	if (ret)
 		return ret;
 
+	/*
+	 * The notifier-chain shall be removed once all the users of
+	 * CPUFREQ_ADJUST are moved to use the QoS framework.
+	 */
 	/* adjust if necessary - all reasons */
 	blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
 			CPUFREQ_ADJUST, new_policy);
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index a1467aa7f58b..95425941f46d 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -147,6 +147,9 @@ struct cpufreq_policy {
 
 	/* Pointer to the cooling device if used for thermal mitigation */
 	struct thermal_cooling_device *cdev;
+
+	struct notifier_block nb_min;
+	struct notifier_block nb_max;
 };
 
 struct cpufreq_freqs {
-- 
2.21.0.rc0.269.g1a574e7a288b


* [PATCH V6 6/7] cpufreq: intel_pstate: Reuse refresh_frequency_limits()
  2019-07-04  7:36 [PATCH V6 0/7] cpufreq: Use QoS layer to manage freq-constraints Viresh Kumar
                   ` (4 preceding siblings ...)
  2019-07-04  7:36 ` [PATCH V6 5/7] cpufreq: Register notifiers with the PM QoS framework Viresh Kumar
@ 2019-07-04  7:36 ` Viresh Kumar
  2019-07-04  7:36 ` [PATCH V6 7/7] cpufreq: Add QoS requests for userspace constraints Viresh Kumar
  6 siblings, 0 replies; 22+ messages in thread
From: Viresh Kumar @ 2019-07-04  7:36 UTC (permalink / raw)
  To: Rafael Wysocki, Srinivas Pandruvada, Len Brown
  Cc: Viresh Kumar, linux-pm, Vincent Guittot, Rafael J . Wysocki,
	linux-kernel

The implementation of intel_pstate_update_max_freq() is quite similar to
refresh_frequency_limits(), so let's reuse the latter.

Finding the minimum of policy->user_policy.max and policy->cpuinfo.max_freq
in intel_pstate_update_max_freq() is redundant, as cpufreq_set_policy()
calls the ->verify() callback of the intel_pstate driver, which does this
comparison anyway, so dropping it from intel_pstate_update_max_freq()
does no harm.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/cpufreq.c      | 3 ++-
 drivers/cpufreq/intel_pstate.c | 7 +------
 include/linux/cpufreq.h        | 1 +
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 81117e4d43cc..091789e868ee 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1116,7 +1116,7 @@ static int cpufreq_add_policy_cpu(struct cpufreq_policy *policy, unsigned int cp
 	return ret;
 }
 
-static void refresh_frequency_limits(struct cpufreq_policy *policy)
+void refresh_frequency_limits(struct cpufreq_policy *policy)
 {
 	struct cpufreq_policy new_policy;
 
@@ -1133,6 +1133,7 @@ static void refresh_frequency_limits(struct cpufreq_policy *policy)
 
 	up_write(&policy->rwsem);
 }
+EXPORT_SYMBOL(refresh_frequency_limits);
 
 static void handle_update(struct work_struct *work)
 {
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index f2ff5de988c1..cc27d4c59dca 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -898,7 +898,6 @@ static void intel_pstate_update_policies(void)
 static void intel_pstate_update_max_freq(unsigned int cpu)
 {
 	struct cpufreq_policy *policy = cpufreq_cpu_acquire(cpu);
-	struct cpufreq_policy new_policy;
 	struct cpudata *cpudata;
 
 	if (!policy)
@@ -908,11 +907,7 @@ static void intel_pstate_update_max_freq(unsigned int cpu)
 	policy->cpuinfo.max_freq = global.turbo_disabled_mf ?
 			cpudata->pstate.max_freq : cpudata->pstate.turbo_freq;
 
-	memcpy(&new_policy, policy, sizeof(*policy));
-	new_policy.max = min(policy->user_policy.max, policy->cpuinfo.max_freq);
-	new_policy.min = min(policy->user_policy.min, new_policy.max);
-
-	cpufreq_set_policy(policy, &new_policy);
+	refresh_frequency_limits(policy);
 
 	cpufreq_cpu_release(policy);
 }
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 95425941f46d..1fa37b675a80 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -207,6 +207,7 @@ void cpufreq_cpu_release(struct cpufreq_policy *policy);
 int cpufreq_get_policy(struct cpufreq_policy *policy, unsigned int cpu);
 int cpufreq_set_policy(struct cpufreq_policy *policy,
 		       struct cpufreq_policy *new_policy);
+void refresh_frequency_limits(struct cpufreq_policy *policy);
 void cpufreq_update_policy(unsigned int cpu);
 void cpufreq_update_limits(unsigned int cpu);
 bool have_governor_per_policy(void);
-- 
2.21.0.rc0.269.g1a574e7a288b


* [PATCH V6 7/7] cpufreq: Add QoS requests for userspace constraints
  2019-07-04  7:36 [PATCH V6 0/7] cpufreq: Use QoS layer to manage freq-constraints Viresh Kumar
                   ` (5 preceding siblings ...)
  2019-07-04  7:36 ` [PATCH V6 6/7] cpufreq: intel_pstate: Reuse refresh_frequency_limits() Viresh Kumar
@ 2019-07-04  7:36 ` Viresh Kumar
  2019-07-05 10:51   ` [PATCH V7 " Viresh Kumar
  6 siblings, 1 reply; 22+ messages in thread
From: Viresh Kumar @ 2019-07-04  7:36 UTC (permalink / raw)
  To: Rafael Wysocki
  Cc: Viresh Kumar, linux-pm, Vincent Guittot, Matthias Kaehlcke,
	Ulf Hansson, Rafael J . Wysocki, linux-kernel

This implements QoS requests to manage userspace configuration of min
and max frequency.
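
For reference, this is roughly what the store_one() macro expands to for
scaling_min_freq after this change (reconstructed by hand as a sketch, not
copied from the generated code):

  static ssize_t store_scaling_min_freq(struct cpufreq_policy *policy,
                                        const char *buf, size_t count)
  {
          unsigned long val;
          int ret;

          ret = sscanf(buf, "%lu", &val);
          if (ret != 1)
                  return -EINVAL;

          /*
           * The sysfs write becomes a PM QoS request update; the policy is
           * then refreshed through the QoS notifiers registered earlier in
           * the series, instead of calling cpufreq_set_policy() directly.
           */
          ret = dev_pm_qos_update_request(policy->min_freq_req, val);
          return ret >= 0 ? count : ret;
  }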

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/cpufreq.c | 87 ++++++++++++++++++++-------------------
 include/linux/cpufreq.h   |  8 +---
 2 files changed, 46 insertions(+), 49 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 091789e868ee..13c2f119cc0c 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -718,23 +718,15 @@ static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
 static ssize_t store_##file_name					\
 (struct cpufreq_policy *policy, const char *buf, size_t count)		\
 {									\
-	int ret, temp;							\
-	struct cpufreq_policy new_policy;				\
+	unsigned long val;						\
+	int ret;							\
 									\
-	memcpy(&new_policy, policy, sizeof(*policy));			\
-	new_policy.min = policy->user_policy.min;			\
-	new_policy.max = policy->user_policy.max;			\
-									\
-	ret = sscanf(buf, "%u", &new_policy.object);			\
+	ret = sscanf(buf, "%lu", &val);					\
 	if (ret != 1)							\
 		return -EINVAL;						\
 									\
-	temp = new_policy.object;					\
-	ret = cpufreq_set_policy(policy, &new_policy);		\
-	if (!ret)							\
-		policy->user_policy.object = temp;			\
-									\
-	return ret ? ret : count;					\
+	ret = dev_pm_qos_update_request(policy->object##_freq_req, val);\
+	return ret >= 0 ? count : ret;					\
 }
 
 store_one(scaling_min_freq, min);
@@ -1126,8 +1118,6 @@ void refresh_frequency_limits(struct cpufreq_policy *policy)
 		new_policy = *policy;
 		pr_debug("updating policy for CPU %u\n", policy->cpu);
 
-		new_policy.min = policy->user_policy.min;
-		new_policy.max = policy->user_policy.max;
 		cpufreq_set_policy(policy, &new_policy);
 	}
 
@@ -1238,6 +1228,12 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
 		goto err_min_qos_notifier;
 	}
 
+	policy->min_freq_req = kzalloc(2 * sizeof(*policy->min_freq_req),
+				       GFP_KERNEL);
+	if (!policy->min_freq_req)
+		goto err_max_qos_notifier;
+
+	policy->max_freq_req = policy->min_freq_req + 1;
 	INIT_LIST_HEAD(&policy->policy_list);
 	init_rwsem(&policy->rwsem);
 	spin_lock_init(&policy->transition_lock);
@@ -1248,6 +1244,9 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
 	policy->cpu = cpu;
 	return policy;
 
+err_max_qos_notifier:
+	dev_pm_qos_remove_notifier(dev, &policy->nb_max,
+				   DEV_PM_QOS_MAX_FREQUENCY);
 err_min_qos_notifier:
 	dev_pm_qos_remove_notifier(dev, &policy->nb_min,
 				   DEV_PM_QOS_MIN_FREQUENCY);
@@ -1283,6 +1282,9 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy)
 				   DEV_PM_QOS_MAX_FREQUENCY);
 	dev_pm_qos_remove_notifier(dev, &policy->nb_min,
 				   DEV_PM_QOS_MIN_FREQUENCY);
+	dev_pm_qos_remove_request(policy->max_freq_req);
+	dev_pm_qos_remove_request(policy->min_freq_req);
+	kfree(policy->min_freq_req);
 
 	cpufreq_policy_put_kobj(policy);
 	free_cpumask_var(policy->real_cpus);
@@ -1361,16 +1363,30 @@ static int cpufreq_online(unsigned int cpu)
 	cpumask_and(policy->cpus, policy->cpus, cpu_online_mask);
 
 	if (new_policy) {
-		policy->user_policy.min = policy->min;
-		policy->user_policy.max = policy->max;
+		struct device *dev = get_cpu_device(cpu);
 
 		for_each_cpu(j, policy->related_cpus) {
 			per_cpu(cpufreq_cpu_data, j) = policy;
 			add_cpu_dev_symlink(policy, j);
 		}
-	} else {
-		policy->min = policy->user_policy.min;
-		policy->max = policy->user_policy.max;
+
+		ret = dev_pm_qos_add_request(dev, policy->min_freq_req,
+					     DEV_PM_QOS_MIN_FREQUENCY,
+					     policy->min);
+		if (ret < 0) {
+			dev_err(dev, "Failed to add min-freq constraint (%d)\n",
+				ret);
+			goto out_destroy_policy;
+		}
+
+		ret = dev_pm_qos_add_request(dev, policy->max_freq_req,
+					     DEV_PM_QOS_MAX_FREQUENCY,
+					     policy->max);
+		if (ret < 0) {
+			dev_err(dev, "Failed to add max-freq constraint (%d)\n",
+				ret);
+			goto out_destroy_policy;
+		}
 	}
 
 	if (cpufreq_driver->get && has_target()) {
@@ -2344,7 +2360,6 @@ int cpufreq_set_policy(struct cpufreq_policy *policy,
 {
 	struct cpufreq_governor *old_gov;
 	struct device *cpu_dev = get_cpu_device(policy->cpu);
-	unsigned long min, max;
 	int ret;
 
 	pr_debug("setting new policy for CPU %u: %u - %u kHz\n",
@@ -2352,24 +2367,12 @@ int cpufreq_set_policy(struct cpufreq_policy *policy,
 
 	memcpy(&new_policy->cpuinfo, &policy->cpuinfo, sizeof(policy->cpuinfo));
 
-	/*
-	* This check works well when we store new min/max freq attributes,
-	* because new_policy is a copy of policy with one field updated.
-	*/
-	if (new_policy->min > new_policy->max)
-		return -EINVAL;
-
 	/*
 	 * PM QoS framework collects all the requests from users and provide us
 	 * the final aggregated value here.
 	 */
-	min = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MIN_FREQUENCY);
-	max = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MAX_FREQUENCY);
-
-	if (min > new_policy->min)
-		new_policy->min = min;
-	if (max < new_policy->max)
-		new_policy->max = max;
+	new_policy->min = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MIN_FREQUENCY);
+	new_policy->max = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MAX_FREQUENCY);
 
 	/* verify the cpu speed can be set within this limit */
 	ret = cpufreq_driver->verify(new_policy);
@@ -2458,10 +2461,9 @@ int cpufreq_set_policy(struct cpufreq_policy *policy,
  * @cpu: CPU to re-evaluate the policy for.
  *
  * Update the current frequency for the cpufreq policy of @cpu and use
- * cpufreq_set_policy() to re-apply the min and max limits saved in the
- * user_policy sub-structure of that policy, which triggers the evaluation
- * of policy notifiers and the cpufreq driver's ->verify() callback for the
- * policy in question, among other things.
+ * cpufreq_set_policy() to re-apply the min and max limits, which triggers the
+ * evaluation of policy notifiers and the cpufreq driver's ->verify() callback
+ * for the policy in question, among other things.
  */
 void cpufreq_update_policy(unsigned int cpu)
 {
@@ -2521,10 +2523,9 @@ static int cpufreq_boost_set_sw(int state)
 			break;
 		}
 
-		down_write(&policy->rwsem);
-		policy->user_policy.max = policy->max;
-		cpufreq_governor_limits(policy);
-		up_write(&policy->rwsem);
+		ret = dev_pm_qos_update_request(policy->max_freq_req, policy->max);
+		if (ret)
+			break;
 	}
 
 	return ret;
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 1fa37b675a80..afc683021ac5 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -50,11 +50,6 @@ struct cpufreq_cpuinfo {
 	unsigned int		transition_latency;
 };
 
-struct cpufreq_user_policy {
-	unsigned int		min;    /* in kHz */
-	unsigned int		max;    /* in kHz */
-};
-
 struct cpufreq_policy {
 	/* CPUs sharing clock, require sw coordination */
 	cpumask_var_t		cpus;	/* Online CPUs only */
@@ -84,7 +79,8 @@ struct cpufreq_policy {
 	struct work_struct	update; /* if update_policy() needs to be
 					 * called, but you're in IRQ context */
 
-	struct cpufreq_user_policy user_policy;
+	struct dev_pm_qos_request *min_freq_req;
+	struct dev_pm_qos_request *max_freq_req;
 	struct cpufreq_frequency_table	*freq_table;
 	enum cpufreq_table_sorting freq_table_sorted;
 
-- 
2.21.0.rc0.269.g1a574e7a288b


* [PATCH V7 7/7] cpufreq: Add QoS requests for userspace constraints
  2019-07-04  7:36 ` [PATCH V6 7/7] cpufreq: Add QoS requests for userspace constraints Viresh Kumar
@ 2019-07-05 10:51   ` Viresh Kumar
  2019-07-05 11:17     ` Rafael J. Wysocki
  0 siblings, 1 reply; 22+ messages in thread
From: Viresh Kumar @ 2019-07-05 10:51 UTC (permalink / raw)
  To: Rafael Wysocki
  Cc: Viresh Kumar, linux-pm, Vincent Guittot, Matthias Kaehlcke,
	Ulf Hansson, Rafael J . Wysocki, linux-kernel

This implements QoS requests to manage userspace configuration of min
and max frequency.

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
V6->V7:
- We can't call dev_pm_qos_remove_request() for a request which was
  never added, which happened in one of the error paths. Fixed that by
  allocating the requests only right before we try to add them and by
  handling the error paths properly.

@Rafael: Please apply this version instead of the diff I supplied in the
WARN email earlier. This one has proper protection in place in several
places.

 drivers/cpufreq/cpufreq.c | 98 ++++++++++++++++++++++-----------------
 include/linux/cpufreq.h   |  8 +---
 2 files changed, 57 insertions(+), 49 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 091789e868ee..a46abd738cfd 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -718,23 +718,15 @@ static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
 static ssize_t store_##file_name					\
 (struct cpufreq_policy *policy, const char *buf, size_t count)		\
 {									\
-	int ret, temp;							\
-	struct cpufreq_policy new_policy;				\
+	unsigned long val;						\
+	int ret;							\
 									\
-	memcpy(&new_policy, policy, sizeof(*policy));			\
-	new_policy.min = policy->user_policy.min;			\
-	new_policy.max = policy->user_policy.max;			\
-									\
-	ret = sscanf(buf, "%u", &new_policy.object);			\
+	ret = sscanf(buf, "%lu", &val);					\
 	if (ret != 1)							\
 		return -EINVAL;						\
 									\
-	temp = new_policy.object;					\
-	ret = cpufreq_set_policy(policy, &new_policy);		\
-	if (!ret)							\
-		policy->user_policy.object = temp;			\
-									\
-	return ret ? ret : count;					\
+	ret = dev_pm_qos_update_request(policy->object##_freq_req, val);\
+	return ret >= 0 ? count : ret;					\
 }
 
 store_one(scaling_min_freq, min);
@@ -1126,8 +1118,6 @@ void refresh_frequency_limits(struct cpufreq_policy *policy)
 		new_policy = *policy;
 		pr_debug("updating policy for CPU %u\n", policy->cpu);
 
-		new_policy.min = policy->user_policy.min;
-		new_policy.max = policy->user_policy.max;
 		cpufreq_set_policy(policy, &new_policy);
 	}
 
@@ -1283,6 +1273,9 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy)
 				   DEV_PM_QOS_MAX_FREQUENCY);
 	dev_pm_qos_remove_notifier(dev, &policy->nb_min,
 				   DEV_PM_QOS_MIN_FREQUENCY);
+	dev_pm_qos_remove_request(policy->max_freq_req);
+	dev_pm_qos_remove_request(policy->min_freq_req);
+	kfree(policy->min_freq_req);
 
 	cpufreq_policy_put_kobj(policy);
 	free_cpumask_var(policy->real_cpus);
@@ -1361,16 +1354,50 @@ static int cpufreq_online(unsigned int cpu)
 	cpumask_and(policy->cpus, policy->cpus, cpu_online_mask);
 
 	if (new_policy) {
-		policy->user_policy.min = policy->min;
-		policy->user_policy.max = policy->max;
+		struct device *dev = get_cpu_device(cpu);
 
 		for_each_cpu(j, policy->related_cpus) {
 			per_cpu(cpufreq_cpu_data, j) = policy;
 			add_cpu_dev_symlink(policy, j);
 		}
-	} else {
-		policy->min = policy->user_policy.min;
-		policy->max = policy->user_policy.max;
+
+		policy->min_freq_req = kzalloc(2 * sizeof(*policy->min_freq_req),
+					       GFP_KERNEL);
+		if (!policy->min_freq_req)
+			goto out_destroy_policy;
+
+		ret = dev_pm_qos_add_request(dev, policy->min_freq_req,
+					     DEV_PM_QOS_MIN_FREQUENCY,
+					     policy->min);
+		if (ret < 0) {
+			/*
+			 * So we don't call dev_pm_qos_remove_request() for an
+			 * uninitialized request.
+			 */
+			kfree(policy->min_freq_req);
+			policy->min_freq_req = NULL;
+
+			dev_err(dev, "Failed to add min-freq constraint (%d)\n",
+				ret);
+			goto out_destroy_policy;
+		}
+
+		/*
+		 * This must be initialized right here to avoid calling
+		 * dev_pm_qos_remove_request() on uninitialized request in case
+		 * of errors.
+		 */
+		policy->max_freq_req = policy->min_freq_req + 1;
+
+		ret = dev_pm_qos_add_request(dev, policy->max_freq_req,
+					     DEV_PM_QOS_MAX_FREQUENCY,
+					     policy->max);
+		if (ret < 0) {
+			policy->max_freq_req = NULL;
+			dev_err(dev, "Failed to add max-freq constraint (%d)\n",
+				ret);
+			goto out_destroy_policy;
+		}
 	}
 
 	if (cpufreq_driver->get && has_target()) {
@@ -2344,7 +2371,6 @@ int cpufreq_set_policy(struct cpufreq_policy *policy,
 {
 	struct cpufreq_governor *old_gov;
 	struct device *cpu_dev = get_cpu_device(policy->cpu);
-	unsigned long min, max;
 	int ret;
 
 	pr_debug("setting new policy for CPU %u: %u - %u kHz\n",
@@ -2352,24 +2378,12 @@ int cpufreq_set_policy(struct cpufreq_policy *policy,
 
 	memcpy(&new_policy->cpuinfo, &policy->cpuinfo, sizeof(policy->cpuinfo));
 
-	/*
-	* This check works well when we store new min/max freq attributes,
-	* because new_policy is a copy of policy with one field updated.
-	*/
-	if (new_policy->min > new_policy->max)
-		return -EINVAL;
-
 	/*
 	 * PM QoS framework collects all the requests from users and provide us
 	 * the final aggregated value here.
 	 */
-	min = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MIN_FREQUENCY);
-	max = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MAX_FREQUENCY);
-
-	if (min > new_policy->min)
-		new_policy->min = min;
-	if (max < new_policy->max)
-		new_policy->max = max;
+	new_policy->min = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MIN_FREQUENCY);
+	new_policy->max = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MAX_FREQUENCY);
 
 	/* verify the cpu speed can be set within this limit */
 	ret = cpufreq_driver->verify(new_policy);
@@ -2458,10 +2472,9 @@ int cpufreq_set_policy(struct cpufreq_policy *policy,
  * @cpu: CPU to re-evaluate the policy for.
  *
  * Update the current frequency for the cpufreq policy of @cpu and use
- * cpufreq_set_policy() to re-apply the min and max limits saved in the
- * user_policy sub-structure of that policy, which triggers the evaluation
- * of policy notifiers and the cpufreq driver's ->verify() callback for the
- * policy in question, among other things.
+ * cpufreq_set_policy() to re-apply the min and max limits, which triggers the
+ * evaluation of policy notifiers and the cpufreq driver's ->verify() callback
+ * for the policy in question, among other things.
  */
 void cpufreq_update_policy(unsigned int cpu)
 {
@@ -2521,10 +2534,9 @@ static int cpufreq_boost_set_sw(int state)
 			break;
 		}
 
-		down_write(&policy->rwsem);
-		policy->user_policy.max = policy->max;
-		cpufreq_governor_limits(policy);
-		up_write(&policy->rwsem);
+		ret = dev_pm_qos_update_request(policy->max_freq_req, policy->max);
+		if (ret)
+			break;
 	}
 
 	return ret;
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 1fa37b675a80..afc683021ac5 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -50,11 +50,6 @@ struct cpufreq_cpuinfo {
 	unsigned int		transition_latency;
 };
 
-struct cpufreq_user_policy {
-	unsigned int		min;    /* in kHz */
-	unsigned int		max;    /* in kHz */
-};
-
 struct cpufreq_policy {
 	/* CPUs sharing clock, require sw coordination */
 	cpumask_var_t		cpus;	/* Online CPUs only */
@@ -84,7 +79,8 @@ struct cpufreq_policy {
 	struct work_struct	update; /* if update_policy() needs to be
 					 * called, but you're in IRQ context */
 
-	struct cpufreq_user_policy user_policy;
+	struct dev_pm_qos_request *min_freq_req;
+	struct dev_pm_qos_request *max_freq_req;
 	struct cpufreq_frequency_table	*freq_table;
 	enum cpufreq_table_sorting freq_table_sorted;
 
-- 
2.21.0.rc0.269.g1a574e7a288b


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH V7 7/7] cpufreq: Add QoS requests for userspace constraints
  2019-07-05 10:51   ` [PATCH V7 " Viresh Kumar
@ 2019-07-05 11:17     ` Rafael J. Wysocki
  0 siblings, 0 replies; 22+ messages in thread
From: Rafael J. Wysocki @ 2019-07-05 11:17 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Linux PM, Vincent Guittot, Matthias Kaehlcke,
	Ulf Hansson, Rafael J . Wysocki, Linux Kernel Mailing List

On Fri, Jul 5, 2019 at 12:52 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> This implements QoS requests to manage userspace configuration of min
> and max frequency.
>
> Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> V6->V7:
> - We can't call dev_pm_qos_remove_request() for a request which was
>   never added. This happened in one of the error paths. Fixed that by
>   allocating the requests only right before we try to add them, and by
>   cleaning up properly on errors.
>
> @Rafael: Please apply this version instead of the diff I supplied in the
> WARN email earlier. This one has proper protection in place throughout.

Applied, thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework
  2019-07-04  7:36 ` [PATCH V6 5/7] cpufreq: Register notifiers with the PM QoS framework Viresh Kumar
@ 2019-07-08 10:57   ` Viresh Kumar
  2019-09-22 20:12     ` Dmitry Osipenko
  0 siblings, 1 reply; 22+ messages in thread
From: Viresh Kumar @ 2019-07-08 10:57 UTC (permalink / raw)
  To: Rafael Wysocki
  Cc: Viresh Kumar, linux-pm, Vincent Guittot, mka, ulf.hansson, sfr,
	pavel, Rafael J . Wysocki, linux-kernel

This registers the notifiers for min/max frequency constraints with the
PM QoS framework. The constraints are also taken into consideration in
cpufreq_set_policy().

This also relocates cpufreq_policy_put_kobj() as it is required to be
called from cpufreq_policy_alloc() now.

refresh_frequency_limits() is updated to avoid calling
cpufreq_set_policy() for inactive policies and handle_update() is
updated to have proper locking in place.

No constraints are added until now though.

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
V6->V7:
- All callers of refresh_frequency_limits(), except handle_update(),
  already hold policy->rwsem, so taking the same lock again inside
  refresh_frequency_limits() would deadlock. Fix that by taking the
  rwsem in handle_update() instead.

@Rafael: Sending it before Pavel has verified it as I would be offline
later, in case you want to apply this today itself.

 drivers/cpufreq/cpufreq.c | 135 +++++++++++++++++++++++++++++---------
 include/linux/cpufreq.h   |   3 +
 2 files changed, 108 insertions(+), 30 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index ceb57af15ca0..b96ef6db1bfe 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -26,6 +26,7 @@
 #include <linux/kernel_stat.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
+#include <linux/pm_qos.h>
 #include <linux/slab.h>
 #include <linux/suspend.h>
 #include <linux/syscore_ops.h>
@@ -999,7 +1000,7 @@ static void add_cpu_dev_symlink(struct cpufreq_policy *policy, unsigned int cpu)
 {
 	struct device *dev = get_cpu_device(cpu);
 
-	if (!dev)
+	if (unlikely(!dev))
 		return;
 
 	if (cpumask_test_and_set_cpu(cpu, policy->real_cpus))
@@ -1117,14 +1118,16 @@ static int cpufreq_add_policy_cpu(struct cpufreq_policy *policy, unsigned int cp
 
 static void refresh_frequency_limits(struct cpufreq_policy *policy)
 {
-	struct cpufreq_policy new_policy = *policy;
-
-	pr_debug("updating policy for CPU %u\n", policy->cpu);
+	struct cpufreq_policy new_policy;
 
-	new_policy.min = policy->user_policy.min;
-	new_policy.max = policy->user_policy.max;
+	if (!policy_is_inactive(policy)) {
+		new_policy = *policy;
+		pr_debug("updating policy for CPU %u\n", policy->cpu);
 
-	cpufreq_set_policy(policy, &new_policy);
+		new_policy.min = policy->user_policy.min;
+		new_policy.max = policy->user_policy.max;
+		cpufreq_set_policy(policy, &new_policy);
+	}
 }
 
 static void handle_update(struct work_struct *work)
@@ -1133,14 +1136,60 @@ static void handle_update(struct work_struct *work)
 		container_of(work, struct cpufreq_policy, update);
 
 	pr_debug("handle_update for cpu %u called\n", policy->cpu);
+	down_write(&policy->rwsem);
 	refresh_frequency_limits(policy);
+	up_write(&policy->rwsem);
+}
+
+static int cpufreq_notifier_min(struct notifier_block *nb, unsigned long freq,
+				void *data)
+{
+	struct cpufreq_policy *policy = container_of(nb, struct cpufreq_policy, nb_min);
+
+	schedule_work(&policy->update);
+	return 0;
+}
+
+static int cpufreq_notifier_max(struct notifier_block *nb, unsigned long freq,
+				void *data)
+{
+	struct cpufreq_policy *policy = container_of(nb, struct cpufreq_policy, nb_max);
+
+	schedule_work(&policy->update);
+	return 0;
+}
+
+static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
+{
+	struct kobject *kobj;
+	struct completion *cmp;
+
+	down_write(&policy->rwsem);
+	cpufreq_stats_free_table(policy);
+	kobj = &policy->kobj;
+	cmp = &policy->kobj_unregister;
+	up_write(&policy->rwsem);
+	kobject_put(kobj);
+
+	/*
+	 * We need to make sure that the underlying kobj is
+	 * actually not referenced anymore by anybody before we
+	 * proceed with unloading.
+	 */
+	pr_debug("waiting for dropping of refcount\n");
+	wait_for_completion(cmp);
+	pr_debug("wait complete\n");
 }
 
 static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
 {
 	struct cpufreq_policy *policy;
+	struct device *dev = get_cpu_device(cpu);
 	int ret;
 
+	if (!dev)
+		return NULL;
+
 	policy = kzalloc(sizeof(*policy), GFP_KERNEL);
 	if (!policy)
 		return NULL;
@@ -1157,7 +1206,7 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
 	ret = kobject_init_and_add(&policy->kobj, &ktype_cpufreq,
 				   cpufreq_global_kobject, "policy%u", cpu);
 	if (ret) {
-		pr_err("%s: failed to init policy->kobj: %d\n", __func__, ret);
+		dev_err(dev, "%s: failed to init policy->kobj: %d\n", __func__, ret);
 		/*
 		 * The entire policy object will be freed below, but the extra
 		 * memory allocated for the kobject name needs to be freed by
@@ -1167,6 +1216,25 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
 		goto err_free_real_cpus;
 	}
 
+	policy->nb_min.notifier_call = cpufreq_notifier_min;
+	policy->nb_max.notifier_call = cpufreq_notifier_max;
+
+	ret = dev_pm_qos_add_notifier(dev, &policy->nb_min,
+				      DEV_PM_QOS_MIN_FREQUENCY);
+	if (ret) {
+		dev_err(dev, "Failed to register MIN QoS notifier: %d (%*pbl)\n",
+			ret, cpumask_pr_args(policy->cpus));
+		goto err_kobj_remove;
+	}
+
+	ret = dev_pm_qos_add_notifier(dev, &policy->nb_max,
+				      DEV_PM_QOS_MAX_FREQUENCY);
+	if (ret) {
+		dev_err(dev, "Failed to register MAX QoS notifier: %d (%*pbl)\n",
+			ret, cpumask_pr_args(policy->cpus));
+		goto err_min_qos_notifier;
+	}
+
 	INIT_LIST_HEAD(&policy->policy_list);
 	init_rwsem(&policy->rwsem);
 	spin_lock_init(&policy->transition_lock);
@@ -1177,6 +1245,11 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
 	policy->cpu = cpu;
 	return policy;
 
+err_min_qos_notifier:
+	dev_pm_qos_remove_notifier(dev, &policy->nb_min,
+				   DEV_PM_QOS_MIN_FREQUENCY);
+err_kobj_remove:
+	cpufreq_policy_put_kobj(policy);
 err_free_real_cpus:
 	free_cpumask_var(policy->real_cpus);
 err_free_rcpumask:
@@ -1189,30 +1262,9 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
 	return NULL;
 }
 
-static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
-{
-	struct kobject *kobj;
-	struct completion *cmp;
-
-	down_write(&policy->rwsem);
-	cpufreq_stats_free_table(policy);
-	kobj = &policy->kobj;
-	cmp = &policy->kobj_unregister;
-	up_write(&policy->rwsem);
-	kobject_put(kobj);
-
-	/*
-	 * We need to make sure that the underlying kobj is
-	 * actually not referenced anymore by anybody before we
-	 * proceed with unloading.
-	 */
-	pr_debug("waiting for dropping of refcount\n");
-	wait_for_completion(cmp);
-	pr_debug("wait complete\n");
-}
-
 static void cpufreq_policy_free(struct cpufreq_policy *policy)
 {
+	struct device *dev = get_cpu_device(policy->cpu);
 	unsigned long flags;
 	int cpu;
 
@@ -1224,6 +1276,11 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy)
 		per_cpu(cpufreq_cpu_data, cpu) = NULL;
 	write_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
+	dev_pm_qos_remove_notifier(dev, &policy->nb_max,
+				   DEV_PM_QOS_MAX_FREQUENCY);
+	dev_pm_qos_remove_notifier(dev, &policy->nb_min,
+				   DEV_PM_QOS_MIN_FREQUENCY);
+
 	cpufreq_policy_put_kobj(policy);
 	free_cpumask_var(policy->real_cpus);
 	free_cpumask_var(policy->related_cpus);
@@ -2283,6 +2340,8 @@ int cpufreq_set_policy(struct cpufreq_policy *policy,
 		       struct cpufreq_policy *new_policy)
 {
 	struct cpufreq_governor *old_gov;
+	struct device *cpu_dev = get_cpu_device(policy->cpu);
+	unsigned long min, max;
 	int ret;
 
 	pr_debug("setting new policy for CPU %u: %u - %u kHz\n",
@@ -2297,11 +2356,27 @@ int cpufreq_set_policy(struct cpufreq_policy *policy,
 	if (new_policy->min > new_policy->max)
 		return -EINVAL;
 
+	/*
+	 * PM QoS framework collects all the requests from users and provide us
+	 * the final aggregated value here.
+	 */
+	min = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MIN_FREQUENCY);
+	max = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MAX_FREQUENCY);
+
+	if (min > new_policy->min)
+		new_policy->min = min;
+	if (max < new_policy->max)
+		new_policy->max = max;
+
 	/* verify the cpu speed can be set within this limit */
 	ret = cpufreq_driver->verify(new_policy);
 	if (ret)
 		return ret;
 
+	/*
+	 * The notifier-chain shall be removed once all the users of
+	 * CPUFREQ_ADJUST are moved to use the QoS framework.
+	 */
 	/* adjust if necessary - all reasons */
 	blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
 			CPUFREQ_ADJUST, new_policy);
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index a1467aa7f58b..95425941f46d 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -147,6 +147,9 @@ struct cpufreq_policy {
 
 	/* Pointer to the cooling device if used for thermal mitigation */
 	struct thermal_cooling_device *cdev;
+
+	struct notifier_block nb_min;
+	struct notifier_block nb_max;
 };
 
 struct cpufreq_freqs {
-- 
2.21.0.rc0.269.g1a574e7a288b


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework
  2019-07-08 10:57   ` [PATCH V7 " Viresh Kumar
@ 2019-09-22 20:12     ` Dmitry Osipenko
  2019-09-23 13:56       ` Viresh Kumar
                         ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2019-09-22 20:12 UTC (permalink / raw)
  To: Viresh Kumar, Rafael Wysocki
  Cc: linux-pm, Vincent Guittot, mka, ulf.hansson, sfr, pavel,
	Rafael J . Wysocki, linux-kernel, linux-tegra

08.07.2019 13:57, Viresh Kumar writes:
> This registers the notifiers for min/max frequency constraints with the
> PM QoS framework. The constraints are also taken into consideration in
> cpufreq_set_policy().
> 
> This also relocates cpufreq_policy_put_kobj() as it is required to be
> called from cpufreq_policy_alloc() now.
> 
> refresh_frequency_limits() is updated to avoid calling
> cpufreq_set_policy() for inactive policies and handle_update() is
> updated to have proper locking in place.
> 
> No constraints are added until now though.
> 
> Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> V6->V7:
> - All callers of refresh_frequency_limits(), except handle_update(),
>   already hold policy->rwsem, so taking the same lock again inside
>   refresh_frequency_limits() would deadlock. Fix that by taking the
>   rwsem in handle_update() instead.
> 
> @Rafael: Sending it before Pavel has verified it as I would be offline
> later, in case you want to apply this today itself.
> 
>  drivers/cpufreq/cpufreq.c | 135 +++++++++++++++++++++++++++++---------
>  include/linux/cpufreq.h   |   3 +
>  2 files changed, 108 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index ceb57af15ca0..b96ef6db1bfe 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -26,6 +26,7 @@
>  #include <linux/kernel_stat.h>
>  #include <linux/module.h>
>  #include <linux/mutex.h>
> +#include <linux/pm_qos.h>
>  #include <linux/slab.h>
>  #include <linux/suspend.h>
>  #include <linux/syscore_ops.h>
> @@ -999,7 +1000,7 @@ static void add_cpu_dev_symlink(struct cpufreq_policy *policy, unsigned int cpu)
>  {
>  	struct device *dev = get_cpu_device(cpu);
>  
> -	if (!dev)
> +	if (unlikely(!dev))
>  		return;
>  
>  	if (cpumask_test_and_set_cpu(cpu, policy->real_cpus))
> @@ -1117,14 +1118,16 @@ static int cpufreq_add_policy_cpu(struct cpufreq_policy *policy, unsigned int cp
>  
>  static void refresh_frequency_limits(struct cpufreq_policy *policy)
>  {
> -	struct cpufreq_policy new_policy = *policy;
> -
> -	pr_debug("updating policy for CPU %u\n", policy->cpu);
> +	struct cpufreq_policy new_policy;
>  
> -	new_policy.min = policy->user_policy.min;
> -	new_policy.max = policy->user_policy.max;
> +	if (!policy_is_inactive(policy)) {
> +		new_policy = *policy;
> +		pr_debug("updating policy for CPU %u\n", policy->cpu);
>  
> -	cpufreq_set_policy(policy, &new_policy);
> +		new_policy.min = policy->user_policy.min;
> +		new_policy.max = policy->user_policy.max;
> +		cpufreq_set_policy(policy, &new_policy);
> +	}
>  }
>  
>  static void handle_update(struct work_struct *work)
> @@ -1133,14 +1136,60 @@ static void handle_update(struct work_struct *work)
>  		container_of(work, struct cpufreq_policy, update);
>  
>  	pr_debug("handle_update for cpu %u called\n", policy->cpu);
> +	down_write(&policy->rwsem);
>  	refresh_frequency_limits(policy);
> +	up_write(&policy->rwsem);
> +}
> +
> +static int cpufreq_notifier_min(struct notifier_block *nb, unsigned long freq,
> +				void *data)
> +{
> +	struct cpufreq_policy *policy = container_of(nb, struct cpufreq_policy, nb_min);
> +
> +	schedule_work(&policy->update);
> +	return 0;
> +}
> +
> +static int cpufreq_notifier_max(struct notifier_block *nb, unsigned long freq,
> +				void *data)
> +{
> +	struct cpufreq_policy *policy = container_of(nb, struct cpufreq_policy, nb_max);
> +
> +	schedule_work(&policy->update);
> +	return 0;
> +}
> +
> +static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
> +{
> +	struct kobject *kobj;
> +	struct completion *cmp;
> +
> +	down_write(&policy->rwsem);
> +	cpufreq_stats_free_table(policy);
> +	kobj = &policy->kobj;
> +	cmp = &policy->kobj_unregister;
> +	up_write(&policy->rwsem);
> +	kobject_put(kobj);
> +
> +	/*
> +	 * We need to make sure that the underlying kobj is
> +	 * actually not referenced anymore by anybody before we
> +	 * proceed with unloading.
> +	 */
> +	pr_debug("waiting for dropping of refcount\n");
> +	wait_for_completion(cmp);
> +	pr_debug("wait complete\n");
>  }
>  
>  static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
>  {
>  	struct cpufreq_policy *policy;
> +	struct device *dev = get_cpu_device(cpu);
>  	int ret;
>  
> +	if (!dev)
> +		return NULL;
> +
>  	policy = kzalloc(sizeof(*policy), GFP_KERNEL);
>  	if (!policy)
>  		return NULL;
> @@ -1157,7 +1206,7 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
>  	ret = kobject_init_and_add(&policy->kobj, &ktype_cpufreq,
>  				   cpufreq_global_kobject, "policy%u", cpu);
>  	if (ret) {
> -		pr_err("%s: failed to init policy->kobj: %d\n", __func__, ret);
> +		dev_err(dev, "%s: failed to init policy->kobj: %d\n", __func__, ret);
>  		/*
>  		 * The entire policy object will be freed below, but the extra
>  		 * memory allocated for the kobject name needs to be freed by
> @@ -1167,6 +1216,25 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
>  		goto err_free_real_cpus;
>  	}
>  
> +	policy->nb_min.notifier_call = cpufreq_notifier_min;
> +	policy->nb_max.notifier_call = cpufreq_notifier_max;
> +
> +	ret = dev_pm_qos_add_notifier(dev, &policy->nb_min,
> +				      DEV_PM_QOS_MIN_FREQUENCY);
> +	if (ret) {
> +		dev_err(dev, "Failed to register MIN QoS notifier: %d (%*pbl)\n",
> +			ret, cpumask_pr_args(policy->cpus));
> +		goto err_kobj_remove;
> +	}
> +
> +	ret = dev_pm_qos_add_notifier(dev, &policy->nb_max,
> +				      DEV_PM_QOS_MAX_FREQUENCY);
> +	if (ret) {
> +		dev_err(dev, "Failed to register MAX QoS notifier: %d (%*pbl)\n",
> +			ret, cpumask_pr_args(policy->cpus));
> +		goto err_min_qos_notifier;
> +	}
> +
>  	INIT_LIST_HEAD(&policy->policy_list);
>  	init_rwsem(&policy->rwsem);
>  	spin_lock_init(&policy->transition_lock);
> @@ -1177,6 +1245,11 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
>  	policy->cpu = cpu;
>  	return policy;
>  
> +err_min_qos_notifier:
> +	dev_pm_qos_remove_notifier(dev, &policy->nb_min,
> +				   DEV_PM_QOS_MIN_FREQUENCY);
> +err_kobj_remove:
> +	cpufreq_policy_put_kobj(policy);
>  err_free_real_cpus:
>  	free_cpumask_var(policy->real_cpus);
>  err_free_rcpumask:
> @@ -1189,30 +1262,9 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
>  	return NULL;
>  }
>  
> -static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
> -{
> -	struct kobject *kobj;
> -	struct completion *cmp;
> -
> -	down_write(&policy->rwsem);
> -	cpufreq_stats_free_table(policy);
> -	kobj = &policy->kobj;
> -	cmp = &policy->kobj_unregister;
> -	up_write(&policy->rwsem);
> -	kobject_put(kobj);
> -
> -	/*
> -	 * We need to make sure that the underlying kobj is
> -	 * actually not referenced anymore by anybody before we
> -	 * proceed with unloading.
> -	 */
> -	pr_debug("waiting for dropping of refcount\n");
> -	wait_for_completion(cmp);
> -	pr_debug("wait complete\n");
> -}
> -
>  static void cpufreq_policy_free(struct cpufreq_policy *policy)
>  {
> +	struct device *dev = get_cpu_device(policy->cpu);
>  	unsigned long flags;
>  	int cpu;
>  
> @@ -1224,6 +1276,11 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy)
>  		per_cpu(cpufreq_cpu_data, cpu) = NULL;
>  	write_unlock_irqrestore(&cpufreq_driver_lock, flags);
>  
> +	dev_pm_qos_remove_notifier(dev, &policy->nb_max,
> +				   DEV_PM_QOS_MAX_FREQUENCY);
> +	dev_pm_qos_remove_notifier(dev, &policy->nb_min,
> +				   DEV_PM_QOS_MIN_FREQUENCY);
> +
>  	cpufreq_policy_put_kobj(policy);
>  	free_cpumask_var(policy->real_cpus);
>  	free_cpumask_var(policy->related_cpus);
> @@ -2283,6 +2340,8 @@ int cpufreq_set_policy(struct cpufreq_policy *policy,
>  		       struct cpufreq_policy *new_policy)
>  {
>  	struct cpufreq_governor *old_gov;
> +	struct device *cpu_dev = get_cpu_device(policy->cpu);
> +	unsigned long min, max;
>  	int ret;
>  
>  	pr_debug("setting new policy for CPU %u: %u - %u kHz\n",
> @@ -2297,11 +2356,27 @@ int cpufreq_set_policy(struct cpufreq_policy *policy,
>  	if (new_policy->min > new_policy->max)
>  		return -EINVAL;
>  
> +	/*
> +	 * PM QoS framework collects all the requests from users and provide us
> +	 * the final aggregated value here.
> +	 */
> +	min = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MIN_FREQUENCY);
> +	max = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MAX_FREQUENCY);
> +
> +	if (min > new_policy->min)
> +		new_policy->min = min;
> +	if (max < new_policy->max)
> +		new_policy->max = max;
> +
>  	/* verify the cpu speed can be set within this limit */
>  	ret = cpufreq_driver->verify(new_policy);
>  	if (ret)
>  		return ret;
>  
> +	/*
> +	 * The notifier-chain shall be removed once all the users of
> +	 * CPUFREQ_ADJUST are moved to use the QoS framework.
> +	 */
>  	/* adjust if necessary - all reasons */
>  	blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
>  			CPUFREQ_ADJUST, new_policy);
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index a1467aa7f58b..95425941f46d 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -147,6 +147,9 @@ struct cpufreq_policy {
>  
>  	/* Pointer to the cooling device if used for thermal mitigation */
>  	struct thermal_cooling_device *cdev;
> +
> +	struct notifier_block nb_min;
> +	struct notifier_block nb_max;
>  };
>  
>  struct cpufreq_freqs {
> 

Hello Viresh,

This patch causes use-after-free on a cpufreq driver module reload. Please take a look, thanks in advance.


[   87.952369] ==================================================================
[   87.953259] BUG: KASAN: use-after-free in notifier_chain_register+0x4f/0x9c
[   87.954031] Read of size 4 at addr e6abbd0c by task modprobe/243

[   87.954901] CPU: 1 PID: 243 Comm: modprobe Tainted: G        W
5.3.0-next-20190920-00185-gf61698eab956-dirty #2408
[   87.956077] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[   87.956807] [<c0110aad>] (unwind_backtrace) from [<c010bb71>] (show_stack+0x11/0x14)
[   87.957709] [<c010bb71>] (show_stack) from [<c0d37b25>] (dump_stack+0x89/0x98)
[   87.958616] [<c0d37b25>] (dump_stack) from [<c02937e1>]
(print_address_description.constprop.0+0x3d/0x340)
[   87.959785] [<c02937e1>] (print_address_description.constprop.0) from [<c0293c6b>]
(__kasan_report+0xe3/0x12c)
[   87.960907] [<c0293c6b>] (__kasan_report) from [<c014988f>] (notifier_chain_register+0x4f/0x9c)
[   87.962001] [<c014988f>] (notifier_chain_register) from [<c01499b5>]
(blocking_notifier_chain_register+0x29/0x3c)
[   87.963180] [<c01499b5>] (blocking_notifier_chain_register) from [<c06f7ee9>]
(dev_pm_qos_add_notifier+0x79/0xf8)
[   87.964339] [<c06f7ee9>] (dev_pm_qos_add_notifier) from [<c092927d>] (cpufreq_online+0x5e1/0x8a4)
[   87.965351] [<c092927d>] (cpufreq_online) from [<c09295c9>] (cpufreq_add_dev+0x79/0x80)
[   87.966247] [<c09295c9>] (cpufreq_add_dev) from [<c06eb9d3>] (subsys_interface_register+0xc3/0x100)
[   87.967297] [<c06eb9d3>] (subsys_interface_register) from [<c0926e53>]
(cpufreq_register_driver+0x13b/0x1ec)
[   87.968476] [<c0926e53>] (cpufreq_register_driver) from [<bf800435>]
(tegra20_cpufreq_probe+0x165/0x1a8 [tegra20_cpufreq])
[   87.969711] [<bf800435>] (tegra20_cpufreq_probe [tegra20_cpufreq]) from [<c06ef7a5>]
(platform_drv_probe+0x49/0x88)
[   87.970854] [<c06ef7a5>] (platform_drv_probe) from [<c06ed299>] (really_probe+0x109/0x374)
[   87.971771] [<c06ed299>] (really_probe) from [<c06ed64f>] (driver_probe_device+0x57/0x158)
[   87.972685] [<c06ed64f>] (driver_probe_device) from [<c06ed8fd>] (device_driver_attach+0x61/0x64)
[   87.973658] [<c06ed8fd>] (device_driver_attach) from [<c06ed949>] (__driver_attach+0x49/0xa0)
[   87.974599] [<c06ed949>] (__driver_attach) from [<c06eb62d>] (bus_for_each_dev+0x69/0x94)
[   87.975510] [<c06eb62d>] (bus_for_each_dev) from [<c06ec731>] (bus_add_driver+0x179/0x1e8)
[   87.976427] [<c06ec731>] (bus_add_driver) from [<c06ee4ab>] (driver_register+0x8f/0x130)
[   87.977345] [<c06ee4ab>] (driver_register) from [<bf805017>]
(tegra20_cpufreq_driver_init+0x17/0x1000 [tegra20_cpufreq])
[   87.978553] [<bf805017>] (tegra20_cpufreq_driver_init [tegra20_cpufreq]) from [<c0102f35>]
(do_one_initcall+0x4d/0x280)
[   87.979729] [<c0102f35>] (do_one_initcall) from [<c01c8675>] (do_init_module+0xb9/0x288)
[   87.980618] [<c01c8675>] (do_init_module) from [<c01cb105>] (load_module+0x2829/0x2b90)
[   87.981496] [<c01cb105>] (load_module) from [<c01cb62b>] (sys_finit_module+0x7b/0x8c)
[   87.982358] [<c01cb62b>] (sys_finit_module) from [<c0101001>] (ret_fast_syscall+0x1/0x26)
[   87.983240] Exception stack(0xe4babfa8 to 0xe4babff0)
[   87.983809] bfa0:                   0003f210 00000001 00000003 0002b744 00000000 b6611e64
[   87.984703] bfc0: 0003f210 00000001 29a92d00 0000017b 0003f2a8 00000000 0003f210 00040008
[   87.985588] bfe0: b6611de8 b6611dd8 00022534 aec402e0

[   87.986340] Allocated by task 182:
[   87.986756]  cpufreq_online+0x55f/0x8a4
[   87.987204]  cpufreq_add_dev+0x79/0x80
[   87.987641]  subsys_interface_register+0xc3/0x100
[   87.988178]  cpufreq_register_driver+0x13b/0x1ec
[   87.988719]  tegra20_cpufreq_probe+0x165/0x1a8 [tegra20_cpufreq]
[   87.989389]  platform_drv_probe+0x49/0x88
[   87.989849]  really_probe+0x109/0x374
[   88.021245]  driver_probe_device+0x57/0x158
[   88.052890]  device_driver_attach+0x61/0x64
[   88.084012]  __driver_attach+0x49/0xa0
[   88.114533]  bus_for_each_dev+0x69/0x94
[   88.144898]  bus_add_driver+0x179/0x1e8
[   88.174840]  driver_register+0x8f/0x130
[   88.204398]  tegra20_cpufreq_driver_init+0x17/0x1000 [tegra20_cpufreq]
[   88.234165]  do_one_initcall+0x4d/0x280
[   88.263754]  do_init_module+0xb9/0x288
[   88.292773]  load_module+0x2829/0x2b90
[   88.321106]  sys_finit_module+0x7b/0x8c
[   88.349098]  ret_fast_syscall+0x1/0x26
[   88.376732]  0xb6802c60

[   88.430292] Freed by task 239:
[   88.456279]  __kasan_slab_free+0x9f/0xc4
[   88.481802]  kfree+0x71/0x1f4
[   88.506805]  subsys_interface_unregister+0xad/0xf0
[   88.531524]  cpufreq_unregister_driver+0x2f/0x7c
[   88.555909]  tegra20_cpufreq_remove+0x19/0x48 [tegra20_cpufreq]
[   88.580273]  platform_drv_remove+0x27/0x34
[   88.604156]  device_release_driver_internal+0xdf/0x1a4
[   88.627659]  driver_detach+0x85/0xf8
[   88.650458]  bus_remove_driver+0x53/0xb0
[   88.672706]  tegra20_cpufreq_driver_exit+0x9/0xb88 [tegra20_cpufreq]
[   88.695100]  sys_delete_module+0x14d/0x1a0
[   88.717147]  ret_fast_syscall+0x1/0x26
[   88.738799]  0xb6d35ff4

[   88.780543] The buggy address belongs to the object at e6abbc00
                which belongs to the cache kmalloc-512 of size 512
[   88.821845] The buggy address is located 268 bytes inside of
                512-byte region [e6abbc00, e6abbe00)
[   88.861188] The buggy address belongs to the page:
[   88.880380] page:edcbc700 refcount:1 mapcount:0 mapping:e7001a00 index:0x0 compound_mapcount: 0
[   88.900167] flags: 0x10200(slab|head)
[   88.919599] raw: 00010200 edcafb00 00030003 e7001a00 00000000 00100010 ffffffff 00000001
[   88.939423] page dumped because: kasan: bad access detected

[   88.978080] Memory state around the buggy address:
[   88.997237]  e6abbc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   89.016710]  e6abbc80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   89.036051] >e6abbd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   89.055192]               ^
[   89.074260]  e6abbd80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   89.093832]  e6abbe00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   89.113300] ==================================================================

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework
  2019-09-22 20:12     ` Dmitry Osipenko
@ 2019-09-23 13:56       ` Viresh Kumar
  2019-09-23 17:27         ` Dmitry Osipenko
  2019-10-14  9:42       ` Viresh Kumar
  2019-10-15 11:46       ` Viresh Kumar
  2 siblings, 1 reply; 22+ messages in thread
From: Viresh Kumar @ 2019-09-23 13:56 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Rafael Wysocki, linux-pm, Vincent Guittot, mka, ulf.hansson, sfr,
	pavel, Rafael J . Wysocki, linux-kernel, linux-tegra

On 22-09-19, 23:12, Dmitry Osipenko wrote:
> This patch causes use-after-free on a cpufreq driver module reload. Please take a look, thanks in advance.
> 
> 
> [   87.952369] ==================================================================
> [   87.953259] BUG: KASAN: use-after-free in notifier_chain_register+0x4f/0x9c
> [   87.954031] Read of size 4 at addr e6abbd0c by task modprobe/243
> 
> [   87.954901] CPU: 1 PID: 243 Comm: modprobe Tainted: G        W
> 5.3.0-next-20190920-00185-gf61698eab956-dirty #2408
> [   87.956077] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
> [   87.956807] [<c0110aad>] (unwind_backtrace) from [<c010bb71>] (show_stack+0x11/0x14)
> [   87.957709] [<c010bb71>] (show_stack) from [<c0d37b25>] (dump_stack+0x89/0x98)
> [   87.958616] [<c0d37b25>] (dump_stack) from [<c02937e1>]
> (print_address_description.constprop.0+0x3d/0x340)
> [   87.959785] [<c02937e1>] (print_address_description.constprop.0) from [<c0293c6b>]
> (__kasan_report+0xe3/0x12c)
> [   87.960907] [<c0293c6b>] (__kasan_report) from [<c014988f>] (notifier_chain_register+0x4f/0x9c)
> [   87.962001] [<c014988f>] (notifier_chain_register) from [<c01499b5>]
> (blocking_notifier_chain_register+0x29/0x3c)
> [   87.963180] [<c01499b5>] (blocking_notifier_chain_register) from [<c06f7ee9>]
> (dev_pm_qos_add_notifier+0x79/0xf8)
> [   87.964339] [<c06f7ee9>] (dev_pm_qos_add_notifier) from [<c092927d>] (cpufreq_online+0x5e1/0x8a4)

Hi Dmitry,

Unfortunately I am traveling right now and can't test this stuff, though I may
have found the root cause here. Can you please test the below diff for me ?

diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
index 6c90fd7e2ff8..9ac244ee05fe 100644
--- a/drivers/base/power/qos.c
+++ b/drivers/base/power/qos.c
@@ -328,6 +328,8 @@ void dev_pm_qos_constraints_destroy(struct device *dev)
        spin_unlock_irq(&dev->power.lock);
 
        kfree(qos->resume_latency.notifiers);
+       kfree(qos->min_frequency.notifiers);
+       kfree(qos->max_frequency.notifiers);
        kfree(qos);
 
  out:

-- 
viresh

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework
  2019-09-23 13:56       ` Viresh Kumar
@ 2019-09-23 17:27         ` Dmitry Osipenko
  0 siblings, 0 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2019-09-23 17:27 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, linux-pm, Vincent Guittot, mka, ulf.hansson, sfr,
	pavel, Rafael J . Wysocki, linux-kernel, linux-tegra

23.09.2019 16:56, Viresh Kumar writes:
> On 22-09-19, 23:12, Dmitry Osipenko wrote:
>> This patch causes use-after-free on a cpufreq driver module reload. Please take a look, thanks in advance.
>>
>>
>> [   87.952369] ==================================================================
>> [   87.953259] BUG: KASAN: use-after-free in notifier_chain_register+0x4f/0x9c
>> [   87.954031] Read of size 4 at addr e6abbd0c by task modprobe/243
>>
>> [   87.954901] CPU: 1 PID: 243 Comm: modprobe Tainted: G        W
>> 5.3.0-next-20190920-00185-gf61698eab956-dirty #2408
>> [   87.956077] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
>> [   87.956807] [<c0110aad>] (unwind_backtrace) from [<c010bb71>] (show_stack+0x11/0x14)
>> [   87.957709] [<c010bb71>] (show_stack) from [<c0d37b25>] (dump_stack+0x89/0x98)
>> [   87.958616] [<c0d37b25>] (dump_stack) from [<c02937e1>]
>> (print_address_description.constprop.0+0x3d/0x340)
>> [   87.959785] [<c02937e1>] (print_address_description.constprop.0) from [<c0293c6b>]
>> (__kasan_report+0xe3/0x12c)
>> [   87.960907] [<c0293c6b>] (__kasan_report) from [<c014988f>] (notifier_chain_register+0x4f/0x9c)
>> [   87.962001] [<c014988f>] (notifier_chain_register) from [<c01499b5>]
>> (blocking_notifier_chain_register+0x29/0x3c)
>> [   87.963180] [<c01499b5>] (blocking_notifier_chain_register) from [<c06f7ee9>]
>> (dev_pm_qos_add_notifier+0x79/0xf8)
>> [   87.964339] [<c06f7ee9>] (dev_pm_qos_add_notifier) from [<c092927d>] (cpufreq_online+0x5e1/0x8a4)
> 
> Hi Dmitry,
> 
> Unfortunately I am traveling right now and can't test this stuff, though I may
> have found the root cause here. Can you please test the below diff for me ?
> 
> diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
> index 6c90fd7e2ff8..9ac244ee05fe 100644
> --- a/drivers/base/power/qos.c
> +++ b/drivers/base/power/qos.c
> @@ -328,6 +328,8 @@ void dev_pm_qos_constraints_destroy(struct device *dev)
>         spin_unlock_irq(&dev->power.lock);
>  
>         kfree(qos->resume_latency.notifiers);
> +       kfree(qos->min_frequency.notifiers);
> +       kfree(qos->max_frequency.notifiers);
>         kfree(qos);
>  
>   out:
> 

Doesn't help. Use-after-free bugs are usually caused by a missing
NULL assignment after kfree(), like in this snippet:

	..
	if (!a)
		a = kmalloc(sizeof(*a), GFP_KERNEL);
	..
	kfree(a);
	// a = NULL    <-- missing!
	..
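
For concreteness, here is a stand-alone userspace analogue of that pattern
(invented names; calloc()/free() standing in for kmalloc()/kfree(); only a
sketch, not the cpufreq code). Dropping the NULL assignment in put_ctx() makes
the second get_ctx() hand back freed memory:

#include <stdlib.h>

struct ctx { int value; };

static struct ctx *cached;

static struct ctx *get_ctx(void)
{
	/* reuse the cached object, or allocate a fresh one */
	if (!cached)
		cached = calloc(1, sizeof(*cached));
	return cached;
}

static void put_ctx(void)
{
	free(cached);
	cached = NULL;	/* without this line, get_ctx() returns a dangling pointer */
}

int main(void)
{
	get_ctx()->value = 1;
	put_ctx();
	get_ctx()->value = 2;	/* safe only because put_ctx() cleared the cache */
	put_ctx();
	return 0;
}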

I briefly looked through the code and don't see anything obviously
wrong. The bug isn't critical, since it's unlikely that somebody reloads
the cpufreq module for non-development purposes, so it's not a big deal
and it can wait. Please take your time.

I also want to point out that the kernel crashes after a second module
reload, hence the KASAN report should be valid.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework
  2019-09-22 20:12     ` Dmitry Osipenko
  2019-09-23 13:56       ` Viresh Kumar
@ 2019-10-14  9:42       ` Viresh Kumar
  2019-10-14 13:01         ` Dmitry Osipenko
  2019-10-15 11:46       ` Viresh Kumar
  2 siblings, 1 reply; 22+ messages in thread
From: Viresh Kumar @ 2019-10-14  9:42 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Rafael Wysocki, linux-pm, Vincent Guittot, mka, ulf.hansson, sfr,
	pavel, Rafael J . Wysocki, linux-kernel, linux-tegra

On 22-09-19, 23:12, Dmitry Osipenko wrote:
> This patch causes use-after-free on a cpufreq driver module reload. Please take a look, thanks in advance.
> 
> 
> [   87.952369] ==================================================================
> [   87.953259] BUG: KASAN: use-after-free in notifier_chain_register+0x4f/0x9c
> [   87.954031] Read of size 4 at addr e6abbd0c by task modprobe/243

Hi Dmitry,

I tried to reproduce it on my ubuntu on ARM64 setup and I couldn't hit
these issues on v5.4-rc1 with Kasan built in.

I then enabled Kasan (tried both inline and outline instrumentation)
but I couldn't get past the issues with module insertion. It fails
like this for me:

root@linaro-developer:~/work# insmod cpufreq-dt.ko 
[   72.985974] cpufreq_dt: Unknown symbol __asan_report_load1_noabort (err -2)
[   72.993164] cpufreq_dt: Unknown symbol __asan_report_load4_noabort (err -2)
[   73.000307] cpufreq_dt: Unknown symbol __asan_report_load8_noabort (err -2)
[   73.007451] cpufreq_dt: Unknown symbol __asan_report_store1_noabort (err -2)
[   73.014643] cpufreq_dt: Unknown symbol __asan_register_globals (err -2)
[   73.021409] cpufreq_dt: Unknown symbol __asan_unregister_globals (err -2)
[   73.028349] cpufreq_dt: Unknown symbol __asan_report_store8_noabort (err -2)
[   73.035543] cpufreq_dt: Unknown symbol __asan_report_store4_noabort (err -2)
insmod: ERROR: could not insert module cpufreq-dt.ko: Unknown symbol in module

I tried to search for these errors but couldn't find why I am getting
these and why the symbols are missing here. Can you suggest something
here ?

-- 
viresh

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework
  2019-10-14  9:42       ` Viresh Kumar
@ 2019-10-14 13:01         ` Dmitry Osipenko
  0 siblings, 0 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2019-10-14 13:01 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, linux-pm, Vincent Guittot, mka, ulf.hansson, sfr,
	pavel, Rafael J . Wysocki, linux-kernel, linux-tegra

14.10.2019 12:42, Viresh Kumar writes:
> On 22-09-19, 23:12, Dmitry Osipenko wrote:
>> This patch causes use-after-free on a cpufreq driver module reload. Please take a look, thanks in advance.
>>
>>
>> [   87.952369] ==================================================================
>> [   87.953259] BUG: KASAN: use-after-free in notifier_chain_register+0x4f/0x9c
>> [   87.954031] Read of size 4 at addr e6abbd0c by task modprobe/243
> 
> Hi Dmitry,
> 
> I tried to reproduce it on my ubuntu on ARM64 setup and I couldn't hit
> these issues on v5.4-rc1 with Kasan built in.
> 
> I then enabled Kasan (tried both inline and outline instrumentation)
> but I couldn't get past the issues with module insertion. It fails
> like this for me:
> 
> root@linaro-developer:~/work# insmod cpufreq-dt.ko 
> [   72.985974] cpufreq_dt: Unknown symbol __asan_report_load1_noabort (err -2)
> [   72.993164] cpufreq_dt: Unknown symbol __asan_report_load4_noabort (err -2)
> [   73.000307] cpufreq_dt: Unknown symbol __asan_report_load8_noabort (err -2)
> [   73.007451] cpufreq_dt: Unknown symbol __asan_report_store1_noabort (err -2)
> [   73.014643] cpufreq_dt: Unknown symbol __asan_register_globals (err -2)
> [   73.021409] cpufreq_dt: Unknown symbol __asan_unregister_globals (err -2)
> [   73.028349] cpufreq_dt: Unknown symbol __asan_report_store8_noabort (err -2)
> [   73.035543] cpufreq_dt: Unknown symbol __asan_report_store4_noabort (err -2)
> insmod: ERROR: could not insert module cpufreq-dt.ko: Unknown symbol in module
> 
> I tried to search for these errors but couldn't find why I am getting
> these and why the symbols are missing here. Can you suggest something
> here ?
> 

Sorry, I don't know what's wrong with ARM64. There is no KASAN support for ARM32 upstream yet; I'm using
the WIP patches [1].

[1] https://lkml.org/lkml/2019/6/17/1562

BTW, I recently moved tegra20-cpufreq to use cpufreq-dt, and the problem reproduces with cpufreq-dt as well:

# rmmod cpufreq_dt
# modprobe cpufreq_dt

[   31.259483] ==================================================================
[   31.260321] BUG: KASAN: use-after-free in notifier_chain_register+0x2b/0x7c
[   31.261026] Read of size 4 at addr cc30250c by task modprobe/218

[   31.262067] CPU: 1 PID: 218 Comm: modprobe Tainted: G        W
5.4.0-rc2-next-20191011-00194-g02f44e30b215-dirty #2645
[   31.263347] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[   31.264154] [<c011116d>] (unwind_backtrace) from [<c010bb05>] (show_stack+0x11/0x14)
[   31.264960] [<c010bb05>] (show_stack) from [<c0d749ad>] (dump_stack+0x89/0x98)
[   31.265804] [<c0d749ad>] (dump_stack) from [<c02c72dd>]
(print_address_description.constprop.0+0x3d/0x340)
[   31.266830] [<c02c72dd>] (print_address_description.constprop.0) from [<c02c7767>]
(__kasan_report+0xe3/0x12c)
[   31.267865] [<c02c7767>] (__kasan_report) from [<c014eabb>] (notifier_chain_register+0x2b/0x7c)
[   31.268755] [<c014eabb>] (notifier_chain_register) from [<c014eb89>]
(blocking_notifier_chain_register+0x29/0x3c)
[   31.269842] [<c014eb89>] (blocking_notifier_chain_register) from [<c072cc11>]
(dev_pm_qos_add_notifier+0x79/0xf8)
[   31.270948] [<c072cc11>] (dev_pm_qos_add_notifier) from [<c095ec71>] (cpufreq_online+0x5e1/0x8a4)
[   31.271922] [<c095ec71>] (cpufreq_online) from [<c095efbd>] (cpufreq_add_dev+0x79/0x80)
[   31.272889] [<c095efbd>] (cpufreq_add_dev) from [<c0720213>] (subsys_interface_register+0xc3/0x100)
[   31.273894] [<c0720213>] (subsys_interface_register) from [<c095d83f>]
(cpufreq_register_driver+0x13b/0x1ec)
[   31.274912] [<c095d83f>] (cpufreq_register_driver) from [<bf800475>] (dt_cpufreq_probe+0x89/0xe0
[cpufreq_dt])
[   31.275924] [<bf800475>] (dt_cpufreq_probe [cpufreq_dt]) from [<c0723e31>]
(platform_drv_probe+0x49/0x88)
[   31.276889] [<c0723e31>] (platform_drv_probe) from [<c0721ad9>] (really_probe+0x109/0x378)
[   31.277715] [<c0721ad9>] (really_probe) from [<c0721e93>] (driver_probe_device+0x57/0x15c)
[   31.278537] [<c0721e93>] (driver_probe_device) from [<c0722145>] (device_driver_attach+0x61/0x64)
[   31.279425] [<c0722145>] (device_driver_attach) from [<c0722191>] (__driver_attach+0x49/0xa0)
[   31.280273] [<c0722191>] (__driver_attach) from [<c071fe6d>] (bus_for_each_dev+0x69/0x94)
[   31.281087] [<c071fe6d>] (bus_for_each_dev) from [<c0720f71>] (bus_add_driver+0x179/0x1e8)
[   31.281909] [<c0720f71>] (bus_add_driver) from [<c0722cf7>] (driver_register+0x8f/0x130)
[   31.282734] [<c0722cf7>] (driver_register) from [<bf805017>] (dt_cpufreq_platdrv_init+0x17/0x1000
[cpufreq_dt])
[   31.283761] [<bf805017>] (dt_cpufreq_platdrv_init [cpufreq_dt]) from [<c0102f69>]
(do_one_initcall+0x4d/0x280)
[   31.284759] [<c0102f69>] (do_one_initcall) from [<c01c70a9>] (do_init_module+0xb9/0x28c)
[   31.285561] [<c01c70a9>] (do_init_module) from [<c01c9ba9>] (load_module+0x2895/0x2c04)
[   31.286347] [<c01c9ba9>] (load_module) from [<c01ca0d7>] (sys_finit_module+0x7b/0x8c)
[   31.287117] [<c01ca0d7>] (sys_finit_module) from [<c0101001>] (ret_fast_syscall+0x1/0x26)
[   31.287901] Exception stack(0xcabb3fa8 to 0xcabb3ff0)
[   31.288406] 3fa0:                   0003f348 00000001 00000003 0002b744 00000000 b6b31e74
[   31.289200] 3fc0: 0003f348 00000001 94ccfd00 0000017b 0003f3f0 00000000 0003f348 00040010
[   31.290029] 3fe0: b6b31df8 b6b31de8 00022534 aec752f0

[   31.290698] Allocated by task 181:
[   31.291065]  __kasan_kmalloc.constprop.0+0x7b/0x84
[   31.291565]  cpufreq_online+0x55f/0x8a4
[   31.291959]  cpufreq_add_dev+0x79/0x80
[   31.292351]  subsys_interface_register+0xc3/0x100
[   31.292830]  cpufreq_register_driver+0x13b/0x1ec
[   31.293335]  dt_cpufreq_probe+0x89/0xe0 [cpufreq_dt]
[   31.293832]  platform_drv_probe+0x49/0x88
[   31.294245]  really_probe+0x109/0x378
[   31.294623]  driver_probe_device+0x57/0x15c
[   31.295048]  device_driver_attach+0x61/0x64
[   31.295472]  __driver_attach+0x49/0xa0
[   31.295854]  bus_for_each_dev+0x69/0x94
[   31.296244]  bus_add_driver+0x179/0x1e8
[   31.296636]  driver_register+0x8f/0x130
[   31.297047]  dt_cpufreq_platdrv_init+0x17/0x1000 [cpufreq_dt]
[   31.297616]  do_one_initcall+0x4d/0x280
[   31.298013]  do_init_module+0xb9/0x28c
[   31.298397]  load_module+0x2895/0x2c04
[   31.298780]  sys_finit_module+0x7b/0x8c
[   31.299167]  ret_fast_syscall+0x1/0x26
[   31.299548]  0xb6c2ac60

[   31.299967] Freed by task 214:
[   31.300288]  __kasan_slab_free+0xb7/0xe0
[   31.300686]  kfree+0x71/0x1f4
[   31.301001]  subsys_interface_unregister+0xad/0xf0
[   31.338959]  cpufreq_unregister_driver+0x2f/0x7c
[   31.377102]  dt_cpufreq_remove+0x15/0x18 [cpufreq_dt]
[   31.414885]  platform_drv_remove+0x27/0x34
[   31.452644]  device_release_driver_internal+0xdf/0x1a8
[   31.490404]  driver_detach+0x85/0xf8
[   31.527682]  bus_remove_driver+0x53/0xb0
[   31.564827]  dt_cpufreq_platdrv_exit+0x9/0xb28 [cpufreq_dt]
[   31.601736]  sys_delete_module+0x117/0x1a4
[   31.638575]  ret_fast_syscall+0x1/0x26
[   31.675041]  0xb6cafff4

[   31.746517] The buggy address belongs to the object at cc302400
                which belongs to the cache kmalloc-512 of size 512
[   31.817855] The buggy address is located 268 bytes inside of
                512-byte region [cc302400, cc302600)
[   31.888496] The buggy address belongs to the page:
[   31.923247] page:d291a000 refcount:1 mapcount:0 mapping:ce001a00 index:0x0 compound_mapcount: 0
[   31.958247] flags: 0x10200(slab|head)
[   31.992944] raw: 00010200 00000100 00000122 ce001a00 00000000 00100010 ffffffff 00000001
[   32.027763] page dumped because: kasan: bad access detected

[   32.095965] Memory state around the buggy address:
[   32.129904]  cc302400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   32.163593]  cc302480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   32.196538] >cc302500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   32.229052]               ^
[   32.260939]  cc302580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   32.292881]  cc302600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   32.324296] ==================================================================
[   32.355594] Disabling lock debugging due to kernel taint
[   32.462151] ------------[ cut here ]------------
[   32.492881] WARNING: CPU: 1 PID: 218 at lib/refcount.c:156 dev_pm_opp_of_add_table+0x59/0x128
[   32.523741] refcount_t: increment on 0; use-after-free.
[   32.554329] Modules linked in: cpufreq_dt(+) tegra30_devfreq [last unloaded: cpufreq_dt]
[   32.585233] CPU: 1 PID: 218 Comm: modprobe Tainted: G    B   W
5.4.0-rc2-next-20191011-00194-g02f44e30b215-dirty #2645
[   32.646692] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[   32.677493] [<c011116d>] (unwind_backtrace) from [<c010bb05>] (show_stack+0x11/0x14)
[   32.708460] [<c010bb05>] (show_stack) from [<c0d749ad>] (dump_stack+0x89/0x98)
[   32.739392] [<c0d749ad>] (dump_stack) from [<c0127713>] (__warn+0x10f/0x110)
[   32.770049] [<c0127713>] (__warn) from [<c0127a09>] (warn_slowpath_fmt+0x61/0x78)
[   32.800656] [<c0127a09>] (warn_slowpath_fmt) from [<c095afc5>] (dev_pm_opp_of_add_table+0x59/0x128)
[   32.860732] [<c095afc5>] (dev_pm_opp_of_add_table) from [<c095b0c5>]
(dev_pm_opp_of_cpumask_add_table+0x31/0x88)
[   32.921247] [<c095b0c5>] (dev_pm_opp_of_cpumask_add_table) from [<bf800245>]
(cpufreq_init+0xd9/0x280 [cpufreq_dt])
[   32.982732] [<bf800245>] (cpufreq_init [cpufreq_dt]) from [<c095ea0f>] (cpufreq_online+0x37f/0x8a4)
[   33.045107] [<c095ea0f>] (cpufreq_online) from [<c095efbd>] (cpufreq_add_dev+0x79/0x80)
[   33.077037] [<c095efbd>] (cpufreq_add_dev) from [<c0720213>] (subsys_interface_register+0xc3/0x100)
[   33.140128] [<c0720213>] (subsys_interface_register) from [<c095d83f>]
(cpufreq_register_driver+0x13b/0x1ec)
[   33.204911] [<c095d83f>] (cpufreq_register_driver) from [<bf800475>] (dt_cpufreq_probe+0x89/0xe0
[cpufreq_dt])
[   33.271766] [<bf800475>] (dt_cpufreq_probe [cpufreq_dt]) from [<c0723e31>]
(platform_drv_probe+0x49/0x88)
[   33.340156] [<c0723e31>] (platform_drv_probe) from [<c0721ad9>] (really_probe+0x109/0x378)
[   33.375275] [<c0721ad9>] (really_probe) from [<c0721e93>] (driver_probe_device+0x57/0x15c)
[   33.410559] [<c0721e93>] (driver_probe_device) from [<c0722145>] (device_driver_attach+0x61/0x64)
[   33.446244] [<c0722145>] (device_driver_attach) from [<c0722191>] (__driver_attach+0x49/0xa0)
[   33.482238] [<c0722191>] (__driver_attach) from [<c071fe6d>] (bus_for_each_dev+0x69/0x94)
[   33.518513] [<c071fe6d>] (bus_for_each_dev) from [<c0720f71>] (bus_add_driver+0x179/0x1e8)
[   33.555099] [<c0720f71>] (bus_add_driver) from [<c0722cf7>] (driver_register+0x8f/0x130)
[   33.592015] [<c0722cf7>] (driver_register) from [<bf805017>] (dt_cpufreq_platdrv_init+0x17/0x1000
[cpufreq_dt])
[   33.666547] [<bf805017>] (dt_cpufreq_platdrv_init [cpufreq_dt]) from [<c0102f69>]
(do_one_initcall+0x4d/0x280)
[   33.742553] [<c0102f69>] (do_one_initcall) from [<c01c70a9>] (do_init_module+0xb9/0x28c)
[   33.781507] [<c01c70a9>] (do_init_module) from [<c01c9ba9>] (load_module+0x2895/0x2c04)
[   33.820735] [<c01c9ba9>] (load_module) from [<c01ca0d7>] (sys_finit_module+0x7b/0x8c)
[   33.860308] [<c01ca0d7>] (sys_finit_module) from [<c0101001>] (ret_fast_syscall+0x1/0x26)
[   33.900121] Exception stack(0xcabb3fa8 to 0xcabb3ff0)
[   33.940062] 3fa0:                   0003f348 00000001 00000003 0002b744 00000000 b6b31e74
[   33.980876] 3fc0: 0003f348 00000001 94ccfd00 0000017b 0003f3f0 00000000 0003f348 00040010
[   34.021838] 3fe0: b6b31df8 b6b31de8 00022534 aec752f0
[   34.062931] ---[ end trace f68728a0d3053b54 ]---

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework
  2019-09-22 20:12     ` Dmitry Osipenko
  2019-09-23 13:56       ` Viresh Kumar
  2019-10-14  9:42       ` Viresh Kumar
@ 2019-10-15 11:46       ` Viresh Kumar
  2019-10-15 13:45         ` Dmitry Osipenko
  2019-10-15 15:53         ` Rafael J. Wysocki
  2 siblings, 2 replies; 22+ messages in thread
From: Viresh Kumar @ 2019-10-15 11:46 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Rafael Wysocki, linux-pm, Vincent Guittot, mka, ulf.hansson, sfr,
	pavel, Rafael J . Wysocki, linux-kernel, linux-tegra

On 22-09-19, 23:12, Dmitry Osipenko wrote:
> Hello Viresh,
> 
> This patch causes use-after-free on a cpufreq driver module reload. Please take a look, thanks in advance.
> 
> 
> [   87.952369] ==================================================================
> [   87.953259] BUG: KASAN: use-after-free in notifier_chain_register+0x4f/0x9c
> [   87.954031] Read of size 4 at addr e6abbd0c by task modprobe/243
> 
> [   87.954901] CPU: 1 PID: 243 Comm: modprobe Tainted: G        W
> 5.3.0-next-20190920-00185-gf61698eab956-dirty #2408
> [   87.956077] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
> [   87.956807] [<c0110aad>] (unwind_backtrace) from [<c010bb71>] (show_stack+0x11/0x14)
> [   87.957709] [<c010bb71>] (show_stack) from [<c0d37b25>] (dump_stack+0x89/0x98)
> [   87.958616] [<c0d37b25>] (dump_stack) from [<c02937e1>]
> (print_address_description.constprop.0+0x3d/0x340)
> [   87.959785] [<c02937e1>] (print_address_description.constprop.0) from [<c0293c6b>]
> (__kasan_report+0xe3/0x12c)
> [   87.960907] [<c0293c6b>] (__kasan_report) from [<c014988f>] (notifier_chain_register+0x4f/0x9c)
> [   87.962001] [<c014988f>] (notifier_chain_register) from [<c01499b5>]
> (blocking_notifier_chain_register+0x29/0x3c)
> [   87.963180] [<c01499b5>] (blocking_notifier_chain_register) from [<c06f7ee9>]
> (dev_pm_qos_add_notifier+0x79/0xf8)
> [   87.964339] [<c06f7ee9>] (dev_pm_qos_add_notifier) from [<c092927d>] (cpufreq_online+0x5e1/0x8a4)
> [   87.965351] [<c092927d>] (cpufreq_online) from [<c09295c9>] (cpufreq_add_dev+0x79/0x80)
> [   87.966247] [<c09295c9>] (cpufreq_add_dev) from [<c06eb9d3>] (subsys_interface_register+0xc3/0x100)
> [   87.967297] [<c06eb9d3>] (subsys_interface_register) from [<c0926e53>]
> (cpufreq_register_driver+0x13b/0x1ec)
> [   87.968476] [<c0926e53>] (cpufreq_register_driver) from [<bf800435>]
> (tegra20_cpufreq_probe+0x165/0x1a8 [tegra20_cpufreq])

Hi Dmitry,

Thanks for the bug report. I was finally able to reproduce it at my end, and
this was quite an interesting debugging exercise :)

When a cpufreq driver gets registered, we register with the subsys interface and
it calls cpufreq_add_dev() for each CPU, starting from CPU0. And so the QoS
notifiers get added to the first CPU of the policy, i.e. CPU0 in common cases.

When the cpufreq driver gets unregistered, we unregister with the subsys
interface and it calls cpufreq_remove_dev() for each CPU, starting from CPU0
(should have been in reverse order, I feel). We remove the QoS notifiers only
when cpufreq_remove_dev() gets called for the last CPU of the policy, let's
call it CPUx. That CPU's device has a different notifier list than CPU0's.

In short, we are adding the cpufreq notifiers to CPU0 and removing them from
CPUx. When we try to add them again by inserting the module for a second time,
we find a node in CPU0's notifier list which has already been freed but is
still linked there, because the removal ran against CPUx's list (where it did
nothing, since the node was never there in the first place).
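
To make that concrete, here is a minimal stand-alone sketch of the failure
mode in plain userspace C (the chain helpers and names are invented for
illustration; this is not the kernel code). Registration links the node into
CPU0's chain, the teardown removal walks CPUx's chain and silently finds
nothing, and the node is then freed together with the policy, so the next
registration walks a chain that still points at freed memory. Building it
with "gcc -fsanitize=address" reports the use-after-free on the final walk:

#include <stdio.h>
#include <stdlib.h>

struct nb {
	struct nb *next;
};

/* link a node at the head of a chain */
static void chain_add(struct nb **chain, struct nb *n)
{
	n->next = *chain;
	*chain = n;
}

/* unlink a node, but only if it is actually on this chain */
static void chain_del(struct nb **chain, struct nb *n)
{
	for (; *chain; chain = &(*chain)->next) {
		if (*chain == n) {
			*chain = n->next;
			return;
		}
	}
	/* node not on this chain: silently a no-op, which is the problem */
}

int main(void)
{
	struct nb *cpu0_chain = NULL, *cpux_chain = NULL;
	struct nb *policy_nb = malloc(sizeof(*policy_nb));

	if (!policy_nb)
		return 1;

	chain_add(&cpu0_chain, policy_nb);	/* register: notifier added on CPU0 */
	chain_del(&cpux_chain, policy_nb);	/* unregister: removal runs on CPUx */
	free(policy_nb);			/* policy (and its notifier) freed */

	/* "second modprobe": walking CPU0's chain reads freed memory */
	for (struct nb *n = cpu0_chain; n; n = n->next)
		printf("stale node at %p\n", (void *)n);

	return 0;
}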

@Rafael: How do you think we should solve this problem? Here are the options I could
think of:

- Update subsys layer to reverse the order of devices while unregistering (this
  will fix the current problem, but we will still have corner cases hanging
  around, like if the CPU0 is hotplugged out, etc).

- Update the QoS framework with the knowledge of related CPUs; this has been
  pending on my side until now, and it is the thing we really need to do. Eventually
  we shall have only a single notifier list for all CPUs of a policy, at least
  for MIN/MAX frequencies.

- ??

-- 
viresh

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework
  2019-10-15 11:46       ` Viresh Kumar
@ 2019-10-15 13:45         ` Dmitry Osipenko
  2019-10-15 15:53         ` Rafael J. Wysocki
  1 sibling, 0 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2019-10-15 13:45 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, linux-pm, Vincent Guittot, mka, ulf.hansson, sfr,
	pavel, Rafael J . Wysocki, linux-kernel, linux-tegra

15.10.2019 14:46, Viresh Kumar writes:
> On 22-09-19, 23:12, Dmitry Osipenko wrote:
>> Hello Viresh,
>>
>> This patch causes use-after-free on a cpufreq driver module reload. Please take a look, thanks in advance.
>>
>>
>> [   87.952369] ==================================================================
>> [   87.953259] BUG: KASAN: use-after-free in notifier_chain_register+0x4f/0x9c
>> [   87.954031] Read of size 4 at addr e6abbd0c by task modprobe/243
>>
>> [   87.954901] CPU: 1 PID: 243 Comm: modprobe Tainted: G        W
>> 5.3.0-next-20190920-00185-gf61698eab956-dirty #2408
>> [   87.956077] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
>> [   87.956807] [<c0110aad>] (unwind_backtrace) from [<c010bb71>] (show_stack+0x11/0x14)
>> [   87.957709] [<c010bb71>] (show_stack) from [<c0d37b25>] (dump_stack+0x89/0x98)
>> [   87.958616] [<c0d37b25>] (dump_stack) from [<c02937e1>]
>> (print_address_description.constprop.0+0x3d/0x340)
>> [   87.959785] [<c02937e1>] (print_address_description.constprop.0) from [<c0293c6b>]
>> (__kasan_report+0xe3/0x12c)
>> [   87.960907] [<c0293c6b>] (__kasan_report) from [<c014988f>] (notifier_chain_register+0x4f/0x9c)
>> [   87.962001] [<c014988f>] (notifier_chain_register) from [<c01499b5>]
>> (blocking_notifier_chain_register+0x29/0x3c)
>> [   87.963180] [<c01499b5>] (blocking_notifier_chain_register) from [<c06f7ee9>]
>> (dev_pm_qos_add_notifier+0x79/0xf8)
>> [   87.964339] [<c06f7ee9>] (dev_pm_qos_add_notifier) from [<c092927d>] (cpufreq_online+0x5e1/0x8a4)
>> [   87.965351] [<c092927d>] (cpufreq_online) from [<c09295c9>] (cpufreq_add_dev+0x79/0x80)
>> [   87.966247] [<c09295c9>] (cpufreq_add_dev) from [<c06eb9d3>] (subsys_interface_register+0xc3/0x100)
>> [   87.967297] [<c06eb9d3>] (subsys_interface_register) from [<c0926e53>]
>> (cpufreq_register_driver+0x13b/0x1ec)
>> [   87.968476] [<c0926e53>] (cpufreq_register_driver) from [<bf800435>]
>> (tegra20_cpufreq_probe+0x165/0x1a8 [tegra20_cpufreq])
> 
> Hi Dmitry,
> 
> Thanks for the bug report and I was finally able to reproduce it at my end and
> this was quite an interesting debugging exercise :)
> 
> When a cpufreq driver gets registered, we register with the subsys interface and
> it calls cpufreq_add_dev() for each CPU, starting from CPU0. And so the QoS
> notifiers get added to the first CPU of the policy, i.e. CPU0 in common cases.
> 
> When the cpufreq driver gets unregistered, we unregister with the subsys
> interface and it calls cpufreq_remove_dev() for each CPU, starting from CPU0
> (should have been in reverse order I feel). We remove the QoS notifier only when
> cpufreq_remove_dev() gets called for the last CPU of the policy, lets call it
> CPUx. Now this has a different notifier list as compared to CPU0.
> 
> In short, we are adding the cpufreq notifiers to CPU0 and removing them from
> CPUx. When we try to add it again by inserting the module for second time, we
> find a node in the notifier list which is already freed but still in the list as
> we removed it from CPUx's list (which doesn't do anything as the node wasn't
> there in the first place).
> 
> @Rafael: How do you see we solve this problem ? Here are the options I could
> think of:
> 
> - Update subsys layer to reverse the order of devices while unregistering (this
>   will fix the current problem, but we will still have corner cases hanging
>   around, like if the CPU0 is hotplugged out, etc).
> 
> - Update QoS framework with the knowledge of related CPUs, this has been pending
>   until now from my side. And this is the thing we really need to do. Eventually
>   we shall have only a single notifier list for all CPUs of a policy, at least
>   for MIN/MAX frequencies.
> 
> - ??
> 

Viresh, thank you very much! Looking forward to a fix :)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework
  2019-10-15 11:46       ` Viresh Kumar
  2019-10-15 13:45         ` Dmitry Osipenko
@ 2019-10-15 15:53         ` Rafael J. Wysocki
  2019-10-15 21:50           ` Rafael J. Wysocki
  1 sibling, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2019-10-15 15:53 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Dmitry Osipenko, Rafael Wysocki, Linux PM, Vincent Guittot,
	Matthias Kaehlcke, Ulf Hansson, Stephen Rothwell, Pavel Machek,
	Rafael J . Wysocki, Linux Kernel Mailing List, linux-tegra

On Tue, Oct 15, 2019 at 1:46 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 22-09-19, 23:12, Dmitry Osipenko wrote:
> > Hello Viresh,
> >
> > This patch causes use-after-free on a cpufreq driver module reload. Please take a look, thanks in advance.
> >
> >
> > [   87.952369] ==================================================================
> > [   87.953259] BUG: KASAN: use-after-free in notifier_chain_register+0x4f/0x9c
> > [   87.954031] Read of size 4 at addr e6abbd0c by task modprobe/243
> >
> > [   87.954901] CPU: 1 PID: 243 Comm: modprobe Tainted: G        W
> > 5.3.0-next-20190920-00185-gf61698eab956-dirty #2408
> > [   87.956077] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
> > [   87.956807] [<c0110aad>] (unwind_backtrace) from [<c010bb71>] (show_stack+0x11/0x14)
> > [   87.957709] [<c010bb71>] (show_stack) from [<c0d37b25>] (dump_stack+0x89/0x98)
> > [   87.958616] [<c0d37b25>] (dump_stack) from [<c02937e1>]
> > (print_address_description.constprop.0+0x3d/0x340)
> > [   87.959785] [<c02937e1>] (print_address_description.constprop.0) from [<c0293c6b>]
> > (__kasan_report+0xe3/0x12c)
> > [   87.960907] [<c0293c6b>] (__kasan_report) from [<c014988f>] (notifier_chain_register+0x4f/0x9c)
> > [   87.962001] [<c014988f>] (notifier_chain_register) from [<c01499b5>]
> > (blocking_notifier_chain_register+0x29/0x3c)
> > [   87.963180] [<c01499b5>] (blocking_notifier_chain_register) from [<c06f7ee9>]
> > (dev_pm_qos_add_notifier+0x79/0xf8)
> > [   87.964339] [<c06f7ee9>] (dev_pm_qos_add_notifier) from [<c092927d>] (cpufreq_online+0x5e1/0x8a4)
> > [   87.965351] [<c092927d>] (cpufreq_online) from [<c09295c9>] (cpufreq_add_dev+0x79/0x80)
> > [   87.966247] [<c09295c9>] (cpufreq_add_dev) from [<c06eb9d3>] (subsys_interface_register+0xc3/0x100)
> > [   87.967297] [<c06eb9d3>] (subsys_interface_register) from [<c0926e53>]
> > (cpufreq_register_driver+0x13b/0x1ec)
> > [   87.968476] [<c0926e53>] (cpufreq_register_driver) from [<bf800435>]
> > (tegra20_cpufreq_probe+0x165/0x1a8 [tegra20_cpufreq])
>
> Hi Dmitry,
>
> Thanks for the bug report and I was finally able to reproduce it at my end and
> this was quite an interesting debugging exercise :)
>
> When a cpufreq driver gets registered, we register with the subsys interface and
> it calls cpufreq_add_dev() for each CPU, starting from CPU0. And so the QoS
> notifiers get added to the first CPU of the policy, i.e. CPU0 in common cases.
>
> When the cpufreq driver gets unregistered, we unregister with the subsys
> interface and it calls cpufreq_remove_dev() for each CPU, starting from CPU0
> (should have been in reverse order I feel). We remove the QoS notifier only when
> cpufreq_remove_dev() gets called for the last CPU of the policy, lets call it
> CPUx. Now this has a different notifier list as compared to CPU0.

The same problem will appear if the original policy CPU goes offline, won't it?

> In short, we are adding the cpufreq notifiers to CPU0 and removing them from
> CPUx. When we try to add it again by inserting the module for second time, we
> find a node in the notifier list which is already freed but still in the list as
> we removed it from CPUx's list (which doesn't do anything as the node wasn't
> there in the first place).
>
> @Rafael: How do you see we solve this problem ? Here are the options I could
> think of:
>
> - Update subsys layer to reverse the order of devices while unregistering (this
>   will fix the current problem, but we will still have corner cases hanging
>   around, like if the CPU0 is hotplugged out, etc).

This isn't sufficient for the offline case.

> - Update QoS framework with the knowledge of related CPUs, this has been pending
>   until now from my side. And this is the thing we really need to do. Eventually
>   we shall have only a single notifier list for all CPUs of a policy, at least
>   for MIN/MAX frequencies.

- Move the PM QoS requests and notifiers to the new policy CPU whenever that
changes.  That is, when cpufreq_offline() nominates the new "leader", all of
the QoS stuff for the policy needs to move to that CPU.

Of course, the reverse order of unregistration in the subsys layer
would help to avoid some useless churn related to that.
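
Roughly something like this sketch (just an illustration; the helper name and
the policy->nb_min field are assumptions, not actual code):

#include <linux/cpu.h>
#include <linux/cpufreq.h>
#include <linux/pm_qos.h>

/* Re-home the min-frequency notifier when cpufreq_offline() nominates a new
 * policy->cpu; the QoS requests would need the same treatment. */
static int example_move_qos_notifier(struct cpufreq_policy *policy,
				     unsigned int old_cpu,
				     unsigned int new_cpu)
{
	dev_pm_qos_remove_notifier(get_cpu_device(old_cpu), &policy->nb_min,
				   DEV_PM_QOS_MIN_FREQUENCY);

	return dev_pm_qos_add_notifier(get_cpu_device(new_cpu), &policy->nb_min,
				       DEV_PM_QOS_MIN_FREQUENCY);
}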

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework
  2019-10-15 15:53         ` Rafael J. Wysocki
@ 2019-10-15 21:50           ` Rafael J. Wysocki
  2019-10-16  8:27             ` Viresh Kumar
  0 siblings, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2019-10-15 21:50 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Dmitry Osipenko, Rafael Wysocki, Linux PM, Vincent Guittot,
	Matthias Kaehlcke, Ulf Hansson, Stephen Rothwell, Pavel Machek,
	Rafael J . Wysocki, Linux Kernel Mailing List, linux-tegra

On Tue, Oct 15, 2019 at 5:53 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Oct 15, 2019 at 1:46 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 22-09-19, 23:12, Dmitry Osipenko wrote:
> > > Hello Viresh,
> > >
> > > This patch causes use-after-free on a cpufreq driver module reload. Please take a look, thanks in advance.
> > >
> > >
> > > [   87.952369] ==================================================================
> > > [   87.953259] BUG: KASAN: use-after-free in notifier_chain_register+0x4f/0x9c
> > > [   87.954031] Read of size 4 at addr e6abbd0c by task modprobe/243
> > >
> > > [   87.954901] CPU: 1 PID: 243 Comm: modprobe Tainted: G        W
> > > 5.3.0-next-20190920-00185-gf61698eab956-dirty #2408
> > > [   87.956077] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
> > > [   87.956807] [<c0110aad>] (unwind_backtrace) from [<c010bb71>] (show_stack+0x11/0x14)
> > > [   87.957709] [<c010bb71>] (show_stack) from [<c0d37b25>] (dump_stack+0x89/0x98)
> > > [   87.958616] [<c0d37b25>] (dump_stack) from [<c02937e1>]
> > > (print_address_description.constprop.0+0x3d/0x340)
> > > [   87.959785] [<c02937e1>] (print_address_description.constprop.0) from [<c0293c6b>]
> > > (__kasan_report+0xe3/0x12c)
> > > [   87.960907] [<c0293c6b>] (__kasan_report) from [<c014988f>] (notifier_chain_register+0x4f/0x9c)
> > > [   87.962001] [<c014988f>] (notifier_chain_register) from [<c01499b5>]
> > > (blocking_notifier_chain_register+0x29/0x3c)
> > > [   87.963180] [<c01499b5>] (blocking_notifier_chain_register) from [<c06f7ee9>]
> > > (dev_pm_qos_add_notifier+0x79/0xf8)
> > > [   87.964339] [<c06f7ee9>] (dev_pm_qos_add_notifier) from [<c092927d>] (cpufreq_online+0x5e1/0x8a4)
> > > [   87.965351] [<c092927d>] (cpufreq_online) from [<c09295c9>] (cpufreq_add_dev+0x79/0x80)
> > > [   87.966247] [<c09295c9>] (cpufreq_add_dev) from [<c06eb9d3>] (subsys_interface_register+0xc3/0x100)
> > > [   87.967297] [<c06eb9d3>] (subsys_interface_register) from [<c0926e53>]
> > > (cpufreq_register_driver+0x13b/0x1ec)
> > > [   87.968476] [<c0926e53>] (cpufreq_register_driver) from [<bf800435>]
> > > (tegra20_cpufreq_probe+0x165/0x1a8 [tegra20_cpufreq])
> >
> > Hi Dmitry,
> >
> > Thanks for the bug report and I was finally able to reproduce it at my end and
> > this was quite an interesting debugging exercise :)
> >
> > When a cpufreq driver gets registered, we register with the subsys interface and
> > it calls cpufreq_add_dev() for each CPU, starting from CPU0. And so the QoS
> > notifiers get added to the first CPU of the policy, i.e. CPU0 in common cases.
> >
> > When the cpufreq driver gets unregistered, we unregister with the subsys
> > interface and it calls cpufreq_remove_dev() for each CPU, starting from CPU0
> > (should have been in reverse order I feel). We remove the QoS notifier only when
> > cpufreq_remove_dev() gets called for the last CPU of the policy, lets call it
> > CPUx. Now this has a different notifier list as compared to CPU0.
>
> The same problem will appear if the original policy CPU goes offline, won't it?
>
> > In short, we are adding the cpufreq notifiers to CPU0 and removing them from
> > CPUx. When we try to add it again by inserting the module for second time, we
> > find a node in the notifier list which is already freed but still in the list as
> > we removed it from CPUx's list (which doesn't do anything as the node wasn't
> > there in the first place).
> >
> > @Rafael: How do you see we solve this problem ? Here are the options I could
> > think of:
> >
> > - Update subsys layer to reverse the order of devices while unregistering (this
> >   will fix the current problem, but we will still have corner cases hanging
> >   around, like if the CPU0 is hotplugged out, etc).
>
> This isn't sufficient for the offline case.
>
> > - Update QoS framework with the knowledge of related CPUs, this has been pending
> >   until now from my side. And this is the thing we really need to do. Eventually
> >   we shall have only a single notifier list for all CPUs of a policy, at least
> >   for MIN/MAX frequencies.
>
> - Move the PM QoS requests and notifiers to the new policy CPU on all
> changes of that.  That is, when cpufreq_offline() nominates the new
> "leader", all of the QoS stuff for the policy needs to go to this one.

Alas, that still will not work, because things like
acpi_processor_ppc_init() only work accidentally for one-CPU policies.
Generally, adding such a PM QoS request to a non-policy CPU simply has
no effect until it becomes a policy CPU, which may never happen.

It looks like using device PM QoS for cpufreq is a mistake in general
and what is needed is a struct pm_qos_constraints member in struct
cpufreq_policy and something like

struct freq_pm_qos_request {
        enum freq_pm_qos_req_type type; /* min or max */
        struct plist_node pnode;
        struct cpufreq_policy *policy;
};

Then, pm_qos_update_target() can be used for adding, updating and
removing requests.
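
For illustration only, a rough sketch of how such a request could then be
driven through pm_qos_update_target() (the freq_constraints container, the
enum values and the helper below are all hypothetical; only
pm_qos_update_target() and enum pm_qos_req_action exist today):

#include <linux/cpufreq.h>
#include <linux/plist.h>
#include <linux/pm_qos.h>

enum freq_pm_qos_req_type { FREQ_PM_QOS_MIN, FREQ_PM_QOS_MAX };

/* Hypothetical per-policy container; assumes struct cpufreq_policy gains
 * a "constraints" member of this type. */
struct freq_constraints {
	struct pm_qos_constraints min_freq;
	struct pm_qos_constraints max_freq;
};

/* Uses struct freq_pm_qos_request as defined above. */
static int freq_pm_qos_apply(struct freq_pm_qos_request *req,
			     enum pm_qos_req_action action, int value)
{
	struct pm_qos_constraints *c = req->type == FREQ_PM_QOS_MIN ?
			&req->policy->constraints.min_freq :
			&req->policy->constraints.max_freq;

	/* pm_qos_update_target() handles add, update and remove alike */
	return pm_qos_update_target(c, &req->pnode, action, value);
}

/* e.g. freq_pm_qos_apply(req, PM_QOS_ADD_REQ, 1100000); */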

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework
  2019-10-15 21:50           ` Rafael J. Wysocki
@ 2019-10-16  8:27             ` Viresh Kumar
  2019-10-16  8:43               ` Rafael J. Wysocki
  0 siblings, 1 reply; 22+ messages in thread
From: Viresh Kumar @ 2019-10-16  8:27 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Dmitry Osipenko, Rafael Wysocki, Linux PM, Vincent Guittot,
	Matthias Kaehlcke, Ulf Hansson, Stephen Rothwell, Pavel Machek,
	Rafael J . Wysocki, Linux Kernel Mailing List, linux-tegra

On 15-10-19, 23:50, Rafael J. Wysocki wrote:
> On Tue, Oct 15, 2019 at 5:53 PM Rafael J. Wysocki <rafael@kernel.org> wrote:

> > > - Update QoS framework with the knowledge of related CPUs, this has been pending
> > >   until now from my side. And this is the thing we really need to do. Eventually
> > >   we shall have only a single notifier list for all CPUs of a policy, at least
> > >   for MIN/MAX frequencies.
> >
> > - Move the PM QoS requests and notifiers to the new policy CPU on all
> > changes of that.  That is, when cpufreq_offline() nominates the new
> > "leader", all of the QoS stuff for the policy needs to go to this one.
> 
> Alas, that still will not work, because things like
> acpi_processor_ppc_init() only work accidentally for one-CPU policies.

I am not sure what problem you see here.  Can you please explain a bit more?

> Generally, adding such a PM QoS request to a non-policy CPU simply has
> no effect until it becomes a policy CPU which may be never.

I was thinking maybe we can read the constraints for all CPUs in the
policy->cpus mask in cpufreq_set_policy(), so that this part of the problem
just goes away. The only remaining part would be to remove the QoS constraints
properly.
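
Something like this rough sketch (helper name made up, just to show the
aggregation I have in mind):

#include <linux/cpu.h>
#include <linux/cpufreq.h>
#include <linux/kernel.h>
#include <linux/pm_qos.h>

/* Take the tightest limits across every CPU of the policy while
 * (re)evaluating it in cpufreq_set_policy(). */
static void example_aggregate_policy_qos(struct cpufreq_policy *policy,
					 s32 *min_freq, s32 *max_freq)
{
	unsigned int cpu;

	*min_freq = 0;
	*max_freq = S32_MAX;

	for_each_cpu(cpu, policy->cpus) {
		struct device *dev = get_cpu_device(cpu);

		if (!dev)
			continue;

		*min_freq = max(*min_freq, dev_pm_qos_read_value(dev,
						DEV_PM_QOS_MIN_FREQUENCY));
		*max_freq = min(*max_freq, dev_pm_qos_read_value(dev,
						DEV_PM_QOS_MAX_FREQUENCY));
	}
}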

> It looks like using device PM QoS for cpufreq is a mistake in general
> and what is needed is a struct pm_qos_constraints member in struct
> cpufreq_policy and something like
> 
> struct freq_pm_qos_request {
>         enum freq_pm_qos_req_type type; /* min or max */
>         struct plist_node pnode;
>         struct cpufreq_policy *policy;
> };
> 
> Then, pm_qos_update_target() can be used for adding, updating and
> removing requests.

-- 
viresh

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework
  2019-10-16  8:27             ` Viresh Kumar
@ 2019-10-16  8:43               ` Rafael J. Wysocki
  0 siblings, 0 replies; 22+ messages in thread
From: Rafael J. Wysocki @ 2019-10-16  8:43 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Dmitry Osipenko, Rafael Wysocki, Linux PM,
	Vincent Guittot, Matthias Kaehlcke, Ulf Hansson,
	Stephen Rothwell, Pavel Machek, Rafael J . Wysocki,
	Linux Kernel Mailing List, linux-tegra

On Wed, Oct 16, 2019 at 10:27 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 15-10-19, 23:50, Rafael J. Wysocki wrote:
> > On Tue, Oct 15, 2019 at 5:53 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> > > > - Update QoS framework with the knowledge of related CPUs, this has been pending
> > > >   until now from my side. And this is the thing we really need to do. Eventually
> > > >   we shall have only a single notifier list for all CPUs of a policy, at least
> > > >   for MIN/MAX frequencies.
> > >
> > > - Move the PM QoS requests and notifiers to the new policy CPU on all
> > > changes of that.  That is, when cpufreq_offline() nominates the new
> > > "leader", all of the QoS stuff for the policy needs to go to this one.
> >
> > Alas, that still will not work, because things like
> > acpi_processor_ppc_init() only work accidentally for one-CPU policies.
>
> I am not sure what problem you see here ? Can you please explain a bit more.

Never mind, sorry.  This is called for policy->cpu too.

> > Generally, adding such a PM QoS request to a non-policy CPU simply has
> > no effect until it becomes a policy CPU which may be never.
>
> I was thinking maybe we can read the constraints for all CPUs in the
> policy->cpus mask in cpufreq_set_policy() and so this part of the problem will
> just go away. The only part that would be left is to remove the QoS constraints
> properly.

That would be on the complicated side IMO.

> > It looks like using device PM QoS for cpufreq is a mistake in general
> > and what is needed is a struct pm_qos_constraints member in struct
> > cpufreq_policy and something like
> >
> > struct freq_pm_qos_request {
> >         enum freq_pm_qos_req_type type; /* min or max */
> >         struct plist_node pnode;
> >         struct cpufreq_policy *policy;
> > };
> >
> > Then, pm_qos_update_target() can be used for adding, updating and
> > removing requests.

I have patches implementing this idea, more or less; they are almost ready,
so stay tuned. :-)

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-10-16  8:43 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-04  7:36 [PATCH V6 0/7] cpufreq: Use QoS layer to manage freq-constraints Viresh Kumar
2019-07-04  7:36 ` [PATCH V6 1/7] PM / QOS: Pass request type to dev_pm_qos_{add|remove}_notifier() Viresh Kumar
2019-07-04  7:36 ` [PATCH V6 2/7] PM / QOS: Rename __dev_pm_qos_read_value() and dev_pm_qos_raw_read_value() Viresh Kumar
2019-07-04  7:36 ` [PATCH V6 3/7] PM / QOS: Pass request type to dev_pm_qos_read_value() Viresh Kumar
2019-07-04  7:36 ` [PATCH V6 4/7] PM / QoS: Add support for MIN/MAX frequency constraints Viresh Kumar
2019-07-04  7:36 ` [PATCH V6 5/7] cpufreq: Register notifiers with the PM QoS framework Viresh Kumar
2019-07-08 10:57   ` [PATCH V7 " Viresh Kumar
2019-09-22 20:12     ` Dmitry Osipenko
2019-09-23 13:56       ` Viresh Kumar
2019-09-23 17:27         ` Dmitry Osipenko
2019-10-14  9:42       ` Viresh Kumar
2019-10-14 13:01         ` Dmitry Osipenko
2019-10-15 11:46       ` Viresh Kumar
2019-10-15 13:45         ` Dmitry Osipenko
2019-10-15 15:53         ` Rafael J. Wysocki
2019-10-15 21:50           ` Rafael J. Wysocki
2019-10-16  8:27             ` Viresh Kumar
2019-10-16  8:43               ` Rafael J. Wysocki
2019-07-04  7:36 ` [PATCH V6 6/7] cpufreq: intel_pstate: Reuse refresh_frequency_limits() Viresh Kumar
2019-07-04  7:36 ` [PATCH V6 7/7] cpufreq: Add QoS requests for userspace constraints Viresh Kumar
2019-07-05 10:51   ` [PATCH V7 " Viresh Kumar
2019-07-05 11:17     ` Rafael J. Wysocki
