linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 1/5] thermal/drivers/core: Use a char pointer for the cooling device name
@ 2021-03-12 17:03 Daniel Lezcano
  2021-03-12 17:03 ` [PATCH v2 2/5] thermal/drivers/cpufreq_cooling: Use device name instead of auto-numbering Daniel Lezcano
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Daniel Lezcano @ 2021-03-12 17:03 UTC
  To: daniel.lezcano
  Cc: linux-kernel, linux-pm, lukasz.luba, Jiri Pirko, Ido Schimmel,
	David S. Miller, Jakub Kicinski, Zhang Rui, Amit Kucheria,
	open list:MELLANOX ETHERNET SWITCH DRIVERS

We want to allow any kind of name for the cooling devices, as we no
longer want to rely on auto-numbering. Let's replace the cooling
device's fixed-size array with a char pointer, allocated dynamically
when the cooling device is registered, so the length of the name is
no longer limited.
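A minimal userspace sketch of the ownership model described above, with malloc()/free() standing in for kmalloc()/kfree(). The names struct cdev_sketch, dup_string(), cdev_sketch_register() and cdev_sketch_unregister() are hypothetical, not the thermal core API. What it illustrates: the core keeps its own heap copy of the name, so a caller may free its buffer right after registering, and the copy is released on unregister.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct thermal_cooling_device. */
struct cdev_sketch {
	int id;
	char *type;	/* owned heap copy, no fixed length limit */
};

/* kstrdup() analogue: duplicate s into a new heap buffer, or NULL. */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Copy the caller's name; a NULL type becomes "". NULL on failure. */
static struct cdev_sketch *cdev_sketch_register(const char *type, int id)
{
	struct cdev_sketch *cdev = calloc(1, sizeof(*cdev));

	if (!cdev)
		return NULL;

	cdev->type = dup_string(type ? type : "");
	if (!cdev->type) {
		free(cdev);	/* roll back the first allocation */
		return NULL;
	}
	cdev->id = id;
	return cdev;
}

/* Release the name copy first, while cdev is still valid. */
static void cdev_sketch_unregister(struct cdev_sketch *cdev)
{
	free(cdev->type);
	free(cdev);
}
```

For example, registering with the village name quoted below and then freeing the caller's buffer still leaves the full name in cdev->type, whereas the old fixed-size array check would have rejected such a name with -EINVAL.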

Rework the error path at the same time, as we now have to roll back
the allocations in case of error.

Tested with a dummy device having the name:
 "Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch"

This is a village on the island of Anglesey (Wales), known for having
the longest place name in Europe.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 .../ethernet/mellanox/mlxsw/core_thermal.c    |  2 +-
 drivers/thermal/thermal_core.c                | 38 +++++++++++--------
 include/linux/thermal.h                       |  2 +-
 3 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
index bf85ce9835d7..7447c2a73cbd 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
@@ -141,7 +141,7 @@ static int mlxsw_get_cooling_device_idx(struct mlxsw_thermal *thermal,
 	/* Allow mlxsw thermal zone binding to an external cooling device */
 	for (i = 0; i < ARRAY_SIZE(mlxsw_thermal_external_allowed_cdev); i++) {
 		if (strnstr(cdev->type, mlxsw_thermal_external_allowed_cdev[i],
-			    sizeof(cdev->type)))
+			    strlen(cdev->type)))
 			return 0;
 	}
 
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 996c038f83a4..9ef8090eb645 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -960,10 +960,7 @@ __thermal_cooling_device_register(struct device_node *np,
 {
 	struct thermal_cooling_device *cdev;
 	struct thermal_zone_device *pos = NULL;
-	int result;
-
-	if (type && strlen(type) >= THERMAL_NAME_LENGTH)
-		return ERR_PTR(-EINVAL);
+	int ret;
 
 	if (!ops || !ops->get_max_state || !ops->get_cur_state ||
 	    !ops->set_cur_state)
@@ -973,14 +970,17 @@ __thermal_cooling_device_register(struct device_node *np,
 	if (!cdev)
 		return ERR_PTR(-ENOMEM);
 
-	result = ida_simple_get(&thermal_cdev_ida, 0, 0, GFP_KERNEL);
-	if (result < 0) {
-		kfree(cdev);
-		return ERR_PTR(result);
+	ret = ida_simple_get(&thermal_cdev_ida, 0, 0, GFP_KERNEL);
+	if (ret < 0)
+		goto out_kfree_cdev;
+	cdev->id = ret;
+
+	cdev->type = kstrdup(type ? type : "", GFP_KERNEL);
+	if (!cdev->type) {
+		ret = -ENOMEM;
+		goto out_ida_remove;
 	}
 
-	cdev->id = result;
-	strlcpy(cdev->type, type ? : "", sizeof(cdev->type));
 	mutex_init(&cdev->lock);
 	INIT_LIST_HEAD(&cdev->thermal_instances);
 	cdev->np = np;
@@ -990,12 +990,9 @@ __thermal_cooling_device_register(struct device_node *np,
 	cdev->devdata = devdata;
 	thermal_cooling_device_setup_sysfs(cdev);
 	dev_set_name(&cdev->device, "cooling_device%d", cdev->id);
-	result = device_register(&cdev->device);
-	if (result) {
-		ida_simple_remove(&thermal_cdev_ida, cdev->id);
-		put_device(&cdev->device);
-		return ERR_PTR(result);
-	}
+	ret = device_register(&cdev->device);
+	if (ret)
+		goto out_kfree_type;
 
 	/* Add 'this' new cdev to the global cdev list */
 	mutex_lock(&thermal_list_lock);
@@ -1013,6 +1010,14 @@ __thermal_cooling_device_register(struct device_node *np,
 	mutex_unlock(&thermal_list_lock);
 
 	return cdev;
+
+out_kfree_type:
+	kfree(cdev->type);
+	put_device(&cdev->device);
+out_ida_remove:
+	ida_simple_remove(&thermal_cdev_ida, cdev->id);
+out_kfree_cdev:
+	return ERR_PTR(ret);
 }
 
 /**
@@ -1172,6 +1177,7 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
 	device_del(&cdev->device);
 	thermal_cooling_device_destroy_sysfs(cdev);
 	put_device(&cdev->device);
+	kfree(cdev->type);
 }
 EXPORT_SYMBOL_GPL(thermal_cooling_device_unregister);
 
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 6ac7bb1d2b1f..169502164364 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -91,7 +91,7 @@ struct thermal_cooling_device_ops {
 
 struct thermal_cooling_device {
 	int id;
-	char type[THERMAL_NAME_LENGTH];
+	char *type;
 	struct device device;
 	struct device_node *np;
 	void *devdata;
-- 
2.17.1
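Why the mlxsw hunk changes sizeof() to strlen(): once type is a char *, sizeof(cdev->type) is the size of a pointer (4 or 8 bytes), not the length of the name, so strnstr() would only search the first few bytes. A userspace sketch, assuming bounded_strstr() (hypothetical) mirrors the kernel strnstr() semantics of searching only the first len bytes of the haystack, and using "mlxreg_fan" as an illustrative allowed name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace re-implementation, assumed to match the kernel's
 * strnstr(): look for needle within the first len bytes of haystack.
 * len must not exceed strlen(haystack).
 */
static const char *bounded_strstr(const char *haystack, const char *needle,
				  size_t len)
{
	size_t nlen = strlen(needle);

	if (nlen == 0)
		return haystack;
	while (len >= nlen) {
		if (memcmp(haystack, needle, nlen) == 0)
			return haystack;
		haystack++;
		len--;
	}
	return NULL;
}

/* Mimics the allow-list check in mlxsw_get_cooling_device_idx(). */
static int type_is_allowed(const char *type, size_t bound)
{
	return bounded_strstr(type, "mlxreg_fan", bound) != NULL;
}
```

With a pointer-size bound, a 10-character allowed name can never match, which is why the bound has to be the actual string length.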



* [PATCH v2 2/5] thermal/drivers/cpufreq_cooling: Use device name instead of auto-numbering
  2021-03-12 17:03 [PATCH v2 1/5] thermal/drivers/core: Use a char pointer for the cooling device name Daniel Lezcano
@ 2021-03-12 17:03 ` Daniel Lezcano
  2021-03-12 17:03 ` [PATCH v2 3/5] thermal/drivers/devfreq_cooling: " Daniel Lezcano
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Daniel Lezcano @ 2021-03-12 17:03 UTC
  To: daniel.lezcano
  Cc: linux-kernel, linux-pm, lukasz.luba, Viresh Kumar,
	Amit Daniel Kachhap, Javi Merino, Zhang Rui, Amit Kucheria

Currently, a cooling device name is just the cooling technique
followed by a number. When multiple cooling devices use the same
technique, it is impossible to clearly identify the related device,
as the name only carries a number.

For instance:

 thermal-cpufreq-0
 thermal-cpufreq-1
 etc ...

The 'thermal' prefix is redundant with the subsystem namespace. This
patch removes the 'thermal' prefix and replaces the number with the
device name, so the names above become:

 cpufreq-cpu0
 cpufreq-cpu4
 etc ...
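A userspace sketch of how such a name can be built, with format_name() as a hypothetical stand-in for kasprintf(): measure the result with vsnprintf(NULL, 0, ...), allocate exactly that much, then format. The device names used with it ("cpu4", "5000000.gpu") are taken from the examples in this series, not queried from a real system.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical kasprintf() analogue: returns a heap string or NULL. */
static char *format_name(const char *fmt, ...)
{
	va_list ap;
	int len;
	char *buf;

	va_start(ap, fmt);
	len = vsnprintf(NULL, 0, fmt, ap);	/* measure only */
	va_end(ap);
	if (len < 0)
		return NULL;

	buf = malloc((size_t)len + 1);
	if (!buf)
		return NULL;

	va_start(ap, fmt);
	vsnprintf(buf, (size_t)len + 1, fmt, ap);	/* format for real */
	va_end(ap);
	return buf;
}
```

In the patch, cdev is preset to ERR_PTR(-ENOMEM) before kasprintf() so that the shared remove_qos_req exit returns a meaningful error, and the temporary name is freed right after thermal_of_cooling_device_register() returns, which relies on the core keeping its own copy after patch 1/5.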

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
---
V2:
  - Use kasprintf() instead of a fixed-length array on the stack
  - Fixed typo in the log
  - Removed idr.h inclusion
---
 drivers/thermal/cpufreq_cooling.c | 34 +++++++++++--------------------
 1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index 10af3341e5ea..3f5f1dce1320 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -13,10 +13,10 @@
 #include <linux/cpu.h>
 #include <linux/cpufreq.h>
 #include <linux/cpu_cooling.h>
+#include <linux/device.h>
 #include <linux/energy_model.h>
 #include <linux/err.h>
 #include <linux/export.h>
-#include <linux/idr.h>
 #include <linux/pm_opp.h>
 #include <linux/pm_qos.h>
 #include <linux/slab.h>
@@ -50,8 +50,6 @@ struct time_in_idle {
 
 /**
  * struct cpufreq_cooling_device - data for cooling device with cpufreq
- * @id: unique integer value corresponding to each cpufreq_cooling_device
- *	registered.
  * @last_load: load measured by the latest call to cpufreq_get_requested_power()
  * @cpufreq_state: integer value representing the current state of cpufreq
  *	cooling	devices.
@@ -69,7 +67,6 @@ struct time_in_idle {
  * cpufreq_cooling_device.
  */
 struct cpufreq_cooling_device {
-	int id;
 	u32 last_load;
 	unsigned int cpufreq_state;
 	unsigned int max_level;
@@ -82,7 +79,6 @@ struct cpufreq_cooling_device {
 	struct freq_qos_request qos_req;
 };
 
-static DEFINE_IDA(cpufreq_ida);
 static DEFINE_MUTEX(cooling_list_lock);
 static LIST_HEAD(cpufreq_cdev_list);
 
@@ -528,11 +524,11 @@ __cpufreq_cooling_register(struct device_node *np,
 {
 	struct thermal_cooling_device *cdev;
 	struct cpufreq_cooling_device *cpufreq_cdev;
-	char dev_name[THERMAL_NAME_LENGTH];
 	unsigned int i;
 	struct device *dev;
 	int ret;
 	struct thermal_cooling_device_ops *cooling_ops;
+	char *name;
 
 	dev = get_cpu_device(policy->cpu);
 	if (unlikely(!dev)) {
@@ -567,16 +563,6 @@ __cpufreq_cooling_register(struct device_node *np,
 	/* max_level is an index, not a counter */
 	cpufreq_cdev->max_level = i - 1;
 
-	ret = ida_simple_get(&cpufreq_ida, 0, 0, GFP_KERNEL);
-	if (ret < 0) {
-		cdev = ERR_PTR(ret);
-		goto free_idle_time;
-	}
-	cpufreq_cdev->id = ret;
-
-	snprintf(dev_name, sizeof(dev_name), "thermal-cpufreq-%d",
-		 cpufreq_cdev->id);
-
 	cooling_ops = &cpufreq_cooling_ops;
 
 #ifdef CONFIG_THERMAL_GOV_POWER_ALLOCATOR
@@ -591,7 +577,7 @@ __cpufreq_cooling_register(struct device_node *np,
 		pr_err("%s: unsorted frequency tables are not supported\n",
 		       __func__);
 		cdev = ERR_PTR(-EINVAL);
-		goto remove_ida;
+		goto free_idle_time;
 	}
 
 	ret = freq_qos_add_request(&policy->constraints,
@@ -601,11 +587,18 @@ __cpufreq_cooling_register(struct device_node *np,
 		pr_err("%s: Failed to add freq constraint (%d)\n", __func__,
 		       ret);
 		cdev = ERR_PTR(ret);
-		goto remove_ida;
+		goto free_idle_time;
 	}
 
-	cdev = thermal_of_cooling_device_register(np, dev_name, cpufreq_cdev,
+	cdev = ERR_PTR(-ENOMEM);
+	name = kasprintf(GFP_KERNEL, "cpufreq-%s", dev_name(dev));
+	if (!name)
+		goto remove_qos_req;
+
+	cdev = thermal_of_cooling_device_register(np, name, cpufreq_cdev,
 						  cooling_ops);
+	kfree(name);
+
 	if (IS_ERR(cdev))
 		goto remove_qos_req;
 
@@ -617,8 +610,6 @@ __cpufreq_cooling_register(struct device_node *np,
 
 remove_qos_req:
 	freq_qos_remove_request(&cpufreq_cdev->qos_req);
-remove_ida:
-	ida_simple_remove(&cpufreq_ida, cpufreq_cdev->id);
 free_idle_time:
 	free_idle_time(cpufreq_cdev);
 free_cdev:
@@ -712,7 +703,6 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
 
 	thermal_cooling_device_unregister(cdev);
 	freq_qos_remove_request(&cpufreq_cdev->qos_req);
-	ida_simple_remove(&cpufreq_ida, cpufreq_cdev->id);
 	free_idle_time(cpufreq_cdev);
 	kfree(cpufreq_cdev);
 }
-- 
2.17.1



* [PATCH v2 3/5] thermal/drivers/devfreq_cooling: Use device name instead of auto-numbering
  2021-03-12 17:03 [PATCH v2 1/5] thermal/drivers/core: Use a char pointer for the cooling device name Daniel Lezcano
  2021-03-12 17:03 ` [PATCH v2 2/5] thermal/drivers/cpufreq_cooling: Use device name instead of auto-numbering Daniel Lezcano
@ 2021-03-12 17:03 ` Daniel Lezcano
  2021-03-12 18:05   ` Lukasz Luba
  2021-03-12 17:03 ` [PATCH v2 4/5] thermal/drivers/cpuidle_cooling: " Daniel Lezcano
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Daniel Lezcano @ 2021-03-12 17:03 UTC
  To: daniel.lezcano
  Cc: linux-kernel, linux-pm, lukasz.luba, Zhang Rui, Amit Kucheria

Currently, a cooling device name is just the cooling technique
followed by a number. When multiple cooling devices use the same
technique, it is impossible to clearly identify the related device,
as the name only carries a number.

For instance:

 thermal-devfreq-0
 thermal-devfreq-1
 etc ...

The 'thermal' prefix is redundant with the subsystem namespace. This
patch removes the 'thermal' prefix and replaces the number with the
device name, so the names above become:

 devfreq-5000000.gpu
 devfreq-1d84000.ufshc
 etc ...

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
V2:
 - Removed idr.h header
 - Used kasprintf() instead of a fixed-length buffer on the stack
 - Fixed typo in the log
---
 drivers/thermal/devfreq_cooling.c | 25 ++++++++-----------------
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
index fed3121ff2a1..fb250ac16f50 100644
--- a/drivers/thermal/devfreq_cooling.c
+++ b/drivers/thermal/devfreq_cooling.c
@@ -14,7 +14,6 @@
 #include <linux/devfreq_cooling.h>
 #include <linux/energy_model.h>
 #include <linux/export.h>
-#include <linux/idr.h>
 #include <linux/slab.h>
 #include <linux/pm_opp.h>
 #include <linux/pm_qos.h>
@@ -25,11 +24,8 @@
 #define HZ_PER_KHZ		1000
 #define SCALE_ERROR_MITIGATION	100
 
-static DEFINE_IDA(devfreq_ida);
-
 /**
  * struct devfreq_cooling_device - Devfreq cooling device
- * @id:		unique integer value corresponding to each
  *		devfreq_cooling_device registered.
  * @cdev:	Pointer to associated thermal cooling device.
  * @devfreq:	Pointer to associated devfreq device.
@@ -51,7 +47,6 @@ static DEFINE_IDA(devfreq_ida);
  * @em_pd:		Energy Model for the associated Devfreq device
  */
 struct devfreq_cooling_device {
-	int id;
 	struct thermal_cooling_device *cdev;
 	struct devfreq *devfreq;
 	unsigned long cooling_state;
@@ -363,7 +358,7 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
 	struct thermal_cooling_device *cdev;
 	struct device *dev = df->dev.parent;
 	struct devfreq_cooling_device *dfc;
-	char dev_name[THERMAL_NAME_LENGTH];
+	char *name;
 	int err, num_opps;
 
 	dfc = kzalloc(sizeof(*dfc), GFP_KERNEL);
@@ -407,30 +402,27 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
 	if (err < 0)
 		goto free_table;
 
-	err = ida_simple_get(&devfreq_ida, 0, 0, GFP_KERNEL);
-	if (err < 0)
+	cdev = ERR_PTR(-ENOMEM);
+	name = kasprintf(GFP_KERNEL, "devfreq-%s", dev_name(dev));
+	if (!name)
 		goto remove_qos_req;
 
-	dfc->id = err;
-
-	snprintf(dev_name, sizeof(dev_name), "thermal-devfreq-%d", dfc->id);
-
-	cdev = thermal_of_cooling_device_register(np, dev_name, dfc,
+	cdev = thermal_of_cooling_device_register(np, name, dfc,
 						  &devfreq_cooling_ops);
+	kfree(name);
+
 	if (IS_ERR(cdev)) {
 		err = PTR_ERR(cdev);
 		dev_err(dev,
 			"Failed to register devfreq cooling device (%d)\n",
 			err);
-		goto release_ida;
+		goto remove_qos_req;
 	}
 
 	dfc->cdev = cdev;
 
 	return cdev;
 
-release_ida:
-	ida_simple_remove(&devfreq_ida, dfc->id);
 remove_qos_req:
 	dev_pm_qos_remove_request(&dfc->req_max_freq);
 free_table:
@@ -527,7 +519,6 @@ void devfreq_cooling_unregister(struct thermal_cooling_device *cdev)
 	dev = dfc->devfreq->dev.parent;
 
 	thermal_cooling_device_unregister(dfc->cdev);
-	ida_simple_remove(&devfreq_ida, dfc->id);
 	dev_pm_qos_remove_request(&dfc->req_max_freq);
 
 	em_dev_unregister_perf_domain(dev);
-- 
2.17.1



* [PATCH v2 4/5] thermal/drivers/cpuidle_cooling: Use device name instead of auto-numbering
  2021-03-12 17:03 [PATCH v2 1/5] thermal/drivers/core: Use a char pointer for the cooling device name Daniel Lezcano
  2021-03-12 17:03 ` [PATCH v2 2/5] thermal/drivers/cpufreq_cooling: Use device name instead of auto-numbering Daniel Lezcano
  2021-03-12 17:03 ` [PATCH v2 3/5] thermal/drivers/devfreq_cooling: " Daniel Lezcano
@ 2021-03-12 17:03 ` Daniel Lezcano
  2021-03-15  3:07   ` Viresh Kumar
  2021-03-12 17:03 ` [PATCH v2 5/5] thermal/drivers/cpufreq_cooling: Remove unused list Daniel Lezcano
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Daniel Lezcano @ 2021-03-12 17:03 UTC
  To: daniel.lezcano
  Cc: linux-kernel, linux-pm, lukasz.luba, Amit Daniel Kachhap,
	Viresh Kumar, Javi Merino, Zhang Rui, Amit Kucheria

Currently, a cooling device name is just the cooling technique
followed by a number. When multiple cooling devices use the same
technique, it is impossible to clearly identify the related device,
as the name only carries a number.

For instance:

 thermal-idle-0
 thermal-idle-1
 thermal-idle-2
 thermal-idle-3
 etc ...

The 'thermal' prefix is redundant with the subsystem namespace. This
patch removes the 'thermal' prefix and replaces the number with the
device name, so the names above become:

 idle-cpu0
 idle-cpu1
 idle-cpu2
 idle-cpu3
 etc ...

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
---
V2:
  - Removed idr.h header
  - Used kasprintf() instead of a fixed-length buffer on the stack
  - Fixed typo in the log
---
 drivers/thermal/cpuidle_cooling.c | 33 +++++++++++++++----------------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/drivers/thermal/cpuidle_cooling.c b/drivers/thermal/cpuidle_cooling.c
index 7ecab4b16b29..f32976163bad 100644
--- a/drivers/thermal/cpuidle_cooling.c
+++ b/drivers/thermal/cpuidle_cooling.c
@@ -9,9 +9,9 @@
 
 #include <linux/cpu_cooling.h>
 #include <linux/cpuidle.h>
+#include <linux/device.h>
 #include <linux/err.h>
 #include <linux/idle_inject.h>
-#include <linux/idr.h>
 #include <linux/of_device.h>
 #include <linux/slab.h>
 #include <linux/thermal.h>
@@ -26,8 +26,6 @@ struct cpuidle_cooling_device {
 	unsigned long state;
 };
 
-static DEFINE_IDA(cpuidle_ida);
-
 /**
  * cpuidle_cooling_runtime - Running time computation
  * @idle_duration_us: CPU idle time to inject in microseconds
@@ -174,10 +172,11 @@ static int __cpuidle_cooling_register(struct device_node *np,
 	struct idle_inject_device *ii_dev;
 	struct cpuidle_cooling_device *idle_cdev;
 	struct thermal_cooling_device *cdev;
+	struct device *dev;
 	unsigned int idle_duration_us = TICK_USEC;
 	unsigned int latency_us = UINT_MAX;
-	char dev_name[THERMAL_NAME_LENGTH];
-	int id, ret;
+	char *name;
+	int ret;
 
 	idle_cdev = kzalloc(sizeof(*idle_cdev), GFP_KERNEL);
 	if (!idle_cdev) {
@@ -185,16 +184,10 @@ static int __cpuidle_cooling_register(struct device_node *np,
 		goto out;
 	}
 
-	id = ida_simple_get(&cpuidle_ida, 0, 0, GFP_KERNEL);
-	if (id < 0) {
-		ret = id;
-		goto out_kfree;
-	}
-
 	ii_dev = idle_inject_register(drv->cpumask);
 	if (!ii_dev) {
 		ret = -EINVAL;
-		goto out_id;
+		goto out_kfree;
 	}
 
 	of_property_read_u32(np, "duration-us", &idle_duration_us);
@@ -205,24 +198,30 @@ static int __cpuidle_cooling_register(struct device_node *np,
 
 	idle_cdev->ii_dev = ii_dev;
 
-	snprintf(dev_name, sizeof(dev_name), "thermal-idle-%d", id);
+	dev = get_cpu_device(cpumask_first(drv->cpumask));
 
-	cdev = thermal_of_cooling_device_register(np, dev_name, idle_cdev,
+	name = kasprintf(GFP_KERNEL, "idle-%s", dev_name(dev));
+	if (!name) {
+		ret = -ENOMEM;
+		goto out_unregister;
+	}
+
+	cdev = thermal_of_cooling_device_register(np, name, idle_cdev,
 						  &cpuidle_cooling_ops);
+	kfree(name);
+
 	if (IS_ERR(cdev)) {
 		ret = PTR_ERR(cdev);
 		goto out_unregister;
 	}
 
 	pr_debug("%s: Idle injection set with idle duration=%u, latency=%u\n",
-		 dev_name, idle_duration_us, latency_us);
+		 name, idle_duration_us, latency_us);
 
 	return 0;
 
 out_unregister:
 	idle_inject_unregister(ii_dev);
-out_id:
-	ida_simple_remove(&cpuidle_ida, id);
 out_kfree:
 	kfree(idle_cdev);
 out:
-- 
2.17.1
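A sketch of the cpuidle naming step under simplifying assumptions: the CPU mask is modeled as an unsigned long bitmap, first_cpu() and idle_name_for_mask() are hypothetical (first_cpu() stands in for cpumask_first()), and the CPU device name is assumed to be "cpu<N>" as in the idle-cpu0 examples above. The sketch also uses name before freeing it; in the hunk as posted, pr_debug() prints name after kfree(name) has already run, so the log statement would need to move above the kfree() (or print cdev->type instead).

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cpumask_first() stand-in; returns nbits if mask is empty. */
static unsigned int first_cpu(unsigned long mask)
{
	unsigned int nbits = sizeof(mask) * CHAR_BIT;
	unsigned int cpu;

	for (cpu = 0; cpu < nbits; cpu++)
		if (mask & (1UL << cpu))
			return cpu;
	return nbits;
}

/* Build "idle-cpu<N>" for the first CPU in mask, copy it out, then free. */
static int idle_name_for_mask(unsigned long mask, char *out, size_t outlen)
{
	unsigned int cpu = first_cpu(mask);
	char *name;
	int len;

	if (cpu >= sizeof(mask) * CHAR_BIT)
		return -1;	/* empty mask: no CPU device to name after */

	len = snprintf(NULL, 0, "idle-cpu%u", cpu);
	if (len < 0)
		return -1;
	name = malloc((size_t)len + 1);
	if (!name)
		return -1;
	snprintf(name, (size_t)len + 1, "idle-cpu%u", cpu);

	/* ... registration would happen here ... */
	snprintf(out, outlen, "%s", name);	/* use name while it is valid */
	free(name);				/* and only then release it */
	return 0;
}
```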



* [PATCH v2 5/5] thermal/drivers/cpufreq_cooling: Remove unused list
  2021-03-12 17:03 [PATCH v2 1/5] thermal/drivers/core: Use a char pointer for the cooling device name Daniel Lezcano
                   ` (2 preceding siblings ...)
  2021-03-12 17:03 ` [PATCH v2 4/5] thermal/drivers/cpuidle_cooling: " Daniel Lezcano
@ 2021-03-12 17:03 ` Daniel Lezcano
  2021-03-12 17:19   ` Lukasz Luba
  2021-03-12 18:49 ` [PATCH v2 1/5] thermal/drivers/core: Use a char pointer for the cooling device name Lukasz Luba
  2021-03-14  9:53 ` Ido Schimmel
  5 siblings, 1 reply; 15+ messages in thread
From: Daniel Lezcano @ 2021-03-12 17:03 UTC
  To: daniel.lezcano
  Cc: linux-kernel, linux-pm, lukasz.luba, Amit Daniel Kachhap,
	Viresh Kumar, Javi Merino, Zhang Rui, Amit Kucheria

There is a list meant to group the cpufreq cooling devices together,
as described in the comments, but it is actually unused: the code has
evolved since 2012 and the list is no longer needed.

Delete the remaining unused list-related code.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 drivers/thermal/cpufreq_cooling.c | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index 3f5f1dce1320..f3d308427665 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -59,7 +59,6 @@ struct time_in_idle {
  * @cdev: thermal_cooling_device pointer to keep track of the
  *	registered cooling device.
  * @policy: cpufreq policy.
- * @node: list_head to link all cpufreq_cooling_device together.
  * @idle_time: idle time stats
  * @qos_req: PM QoS contraint to apply
  *
@@ -72,16 +71,12 @@ struct cpufreq_cooling_device {
 	unsigned int max_level;
 	struct em_perf_domain *em;
 	struct cpufreq_policy *policy;
-	struct list_head node;
 #ifndef CONFIG_SMP
 	struct time_in_idle *idle_time;
 #endif
 	struct freq_qos_request qos_req;
 };
 
-static DEFINE_MUTEX(cooling_list_lock);
-static LIST_HEAD(cpufreq_cdev_list);
-
 #ifdef CONFIG_THERMAL_GOV_POWER_ALLOCATOR
 /**
  * get_level: Find the level for a particular frequency
@@ -602,10 +597,6 @@ __cpufreq_cooling_register(struct device_node *np,
 	if (IS_ERR(cdev))
 		goto remove_qos_req;
 
-	mutex_lock(&cooling_list_lock);
-	list_add(&cpufreq_cdev->node, &cpufreq_cdev_list);
-	mutex_unlock(&cooling_list_lock);
-
 	return cdev;
 
 remove_qos_req:
@@ -697,10 +688,6 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
 
 	cpufreq_cdev = cdev->devdata;
 
-	mutex_lock(&cooling_list_lock);
-	list_del(&cpufreq_cdev->node);
-	mutex_unlock(&cooling_list_lock);
-
 	thermal_cooling_device_unregister(cdev);
 	freq_qos_remove_request(&cpufreq_cdev->qos_req);
 	free_idle_time(cpufreq_cdev);
-- 
2.17.1



* Re: [PATCH v2 5/5] thermal/drivers/cpufreq_cooling: Remove unused list
  2021-03-12 17:03 ` [PATCH v2 5/5] thermal/drivers/cpufreq_cooling: Remove unused list Daniel Lezcano
@ 2021-03-12 17:19   ` Lukasz Luba
  0 siblings, 0 replies; 15+ messages in thread
From: Lukasz Luba @ 2021-03-12 17:19 UTC
  To: Daniel Lezcano
  Cc: linux-kernel, linux-pm, Amit Daniel Kachhap, Viresh Kumar,
	Javi Merino, Zhang Rui, Amit Kucheria



On 3/12/21 5:03 PM, Daniel Lezcano wrote:
> There is a list with the purpose of grouping the cpufreq cooling
> device together as described in the comments but actually it is
> unused, the code evolved since 2012 and the list was no longer needed.
> 
> Delete the remaining unused list related code.
> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> ---
>   drivers/thermal/cpufreq_cooling.c | 13 -------------
>   1 file changed, 13 deletions(-)
> 

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>


* Re: [PATCH v2 3/5] thermal/drivers/devfreq_cooling: Use device name instead of auto-numbering
  2021-03-12 17:03 ` [PATCH v2 3/5] thermal/drivers/devfreq_cooling: " Daniel Lezcano
@ 2021-03-12 18:05   ` Lukasz Luba
  0 siblings, 0 replies; 15+ messages in thread
From: Lukasz Luba @ 2021-03-12 18:05 UTC
  To: Daniel Lezcano; +Cc: linux-kernel, linux-pm, Zhang Rui, Amit Kucheria



On 3/12/21 5:03 PM, Daniel Lezcano wrote:
> Currently the naming of a cooling device is just a cooling technique
> followed by a number. When there are multiple cooling devices using
> the same technique, it is impossible to clearly identify the related
> device as this one is just a number.
> 
> For instance:
> 
>   thermal-devfreq-0
>   thermal-devfreq-1
>   etc ...
> 
> The 'thermal' prefix is redundant with the subsystem namespace. This
> patch removes the 'thermal' prefix and changes the number by the device
> name. So the naming above becomes:
> 
>   devfreq-5000000.gpu
>   devfreq-1d84000.ufshc
>   etc ...
> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> ---
> V2:
>   - Removed idr.h header
>   - Used kasprintf instead of fixed buffer length on the stack
>   - Fixed typo in the log
> ---
>   drivers/thermal/devfreq_cooling.c | 25 ++++++++-----------------
>   1 file changed, 8 insertions(+), 17 deletions(-)
> 

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>


* Re: [PATCH v2 1/5] thermal/drivers/core: Use a char pointer for the cooling device name
  2021-03-12 17:03 [PATCH v2 1/5] thermal/drivers/core: Use a char pointer for the cooling device name Daniel Lezcano
                   ` (3 preceding siblings ...)
  2021-03-12 17:03 ` [PATCH v2 5/5] thermal/drivers/cpufreq_cooling: Remove unused list Daniel Lezcano
@ 2021-03-12 18:49 ` Lukasz Luba
  2021-03-12 21:01   ` Daniel Lezcano
  2021-03-14  9:53 ` Ido Schimmel
  5 siblings, 1 reply; 15+ messages in thread
From: Lukasz Luba @ 2021-03-12 18:49 UTC
  To: Daniel Lezcano
  Cc: linux-kernel, linux-pm, Jiri Pirko, Ido Schimmel,
	David S. Miller, Jakub Kicinski, Zhang Rui, Amit Kucheria,
	open list:MELLANOX ETHERNET SWITCH DRIVERS



On 3/12/21 5:03 PM, Daniel Lezcano wrote:
> We want to have any kind of name for the cooling devices as we do no
> longer want to rely on auto-numbering. Let's replace the cooling
> device's fixed array by a char pointer to be allocated dynamically
> when registering the cooling device, so we don't limit the length of
> the name.
> 
> Rework the error path at the same time as we have to rollback the
> allocations in case of error.
> 
> Tested with a dummy device having the name:
>   "Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch"
> 
> A village on the island of Anglesey (Wales), known to have the longest
> name in Europe.
> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> ---
>   .../ethernet/mellanox/mlxsw/core_thermal.c    |  2 +-
>   drivers/thermal/thermal_core.c                | 38 +++++++++++--------
>   include/linux/thermal.h                       |  2 +-
>   3 files changed, 24 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
> index bf85ce9835d7..7447c2a73cbd 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
> @@ -141,7 +141,7 @@ static int mlxsw_get_cooling_device_idx(struct mlxsw_thermal *thermal,
>   	/* Allow mlxsw thermal zone binding to an external cooling device */
>   	for (i = 0; i < ARRAY_SIZE(mlxsw_thermal_external_allowed_cdev); i++) {
>   		if (strnstr(cdev->type, mlxsw_thermal_external_allowed_cdev[i],
> -			    sizeof(cdev->type)))
> +			    strlen(cdev->type)))
>   			return 0;
>   	}
>   
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 996c038f83a4..9ef8090eb645 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -960,10 +960,7 @@ __thermal_cooling_device_register(struct device_node *np,
>   {
>   	struct thermal_cooling_device *cdev;
>   	struct thermal_zone_device *pos = NULL;
> -	int result;
> -
> -	if (type && strlen(type) >= THERMAL_NAME_LENGTH)
> -		return ERR_PTR(-EINVAL);
> +	int ret;
>   
>   	if (!ops || !ops->get_max_state || !ops->get_cur_state ||
>   	    !ops->set_cur_state)
> @@ -973,14 +970,17 @@ __thermal_cooling_device_register(struct device_node *np,
>   	if (!cdev)
>   		return ERR_PTR(-ENOMEM);
>   
> -	result = ida_simple_get(&thermal_cdev_ida, 0, 0, GFP_KERNEL);
> -	if (result < 0) {
> -		kfree(cdev);
> -		return ERR_PTR(result);
> +	ret = ida_simple_get(&thermal_cdev_ida, 0, 0, GFP_KERNEL);
> +	if (ret < 0)
> +		goto out_kfree_cdev;
> +	cdev->id = ret;
> +
> +	cdev->type = kstrdup(type ? type : "", GFP_KERNEL);
> +	if (!cdev->type) {
> +		ret = -ENOMEM;

Since we haven't called device_register() yet, I would call
kfree(cdev);
here and then jump

> +		goto out_ida_remove;
>   	}
>   

Other than that, LGTM

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

Regards,
Lukasz


* Re: [PATCH v2 1/5] thermal/drivers/core: Use a char pointer for the cooling device name
  2021-03-12 18:49 ` [PATCH v2 1/5] thermal/drivers/core: Use a char pointer for the cooling device name Lukasz Luba
@ 2021-03-12 21:01   ` Daniel Lezcano
  2021-03-15  9:40     ` Lukasz Luba
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Lezcano @ 2021-03-12 21:01 UTC
  To: Lukasz Luba
  Cc: linux-kernel, linux-pm, Jiri Pirko, Ido Schimmel,
	David S. Miller, Jakub Kicinski, Zhang Rui, Amit Kucheria,
	open list:MELLANOX ETHERNET SWITCH DRIVERS

On 12/03/2021 19:49, Lukasz Luba wrote:
> 
> 
> On 3/12/21 5:03 PM, Daniel Lezcano wrote:
>> We want to have any kind of name for the cooling devices as we do no
>> longer want to rely on auto-numbering. Let's replace the cooling
>> device's fixed array by a char pointer to be allocated dynamically
>> when registering the cooling device, so we don't limit the length of
>> the name.
>>
>> Rework the error path at the same time as we have to rollback the
>> allocations in case of error.
>>
>> Tested with a dummy device having the name:
>>   "Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch"
>>
>> A village on the island of Anglesey (Wales), known to have the longest
>> name in Europe.
>>
>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>> ---
>>   .../ethernet/mellanox/mlxsw/core_thermal.c    |  2 +-
>>   drivers/thermal/thermal_core.c                | 38 +++++++++++--------
>>   include/linux/thermal.h                       |  2 +-
>>   3 files changed, 24 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
>> index bf85ce9835d7..7447c2a73cbd 100644
>> --- a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
>> +++ b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
>> @@ -141,7 +141,7 @@ static int mlxsw_get_cooling_device_idx(struct mlxsw_thermal *thermal,
>>       /* Allow mlxsw thermal zone binding to an external cooling device */
>>       for (i = 0; i < ARRAY_SIZE(mlxsw_thermal_external_allowed_cdev); i++) {
>>           if (strnstr(cdev->type, mlxsw_thermal_external_allowed_cdev[i],
>> -                sizeof(cdev->type)))
>> +                strlen(cdev->type)))
>>               return 0;
>>       }
>>
>> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
>> index 996c038f83a4..9ef8090eb645 100644
>> --- a/drivers/thermal/thermal_core.c
>> +++ b/drivers/thermal/thermal_core.c
>> @@ -960,10 +960,7 @@ __thermal_cooling_device_register(struct device_node *np,
>>   {
>>       struct thermal_cooling_device *cdev;
>>       struct thermal_zone_device *pos = NULL;
>> -    int result;
>> -
>> -    if (type && strlen(type) >= THERMAL_NAME_LENGTH)
>> -        return ERR_PTR(-EINVAL);
>> +    int ret;
>>
>>       if (!ops || !ops->get_max_state || !ops->get_cur_state ||
>>           !ops->set_cur_state)
>> @@ -973,14 +970,17 @@ __thermal_cooling_device_register(struct device_node *np,
>>       if (!cdev)
>>           return ERR_PTR(-ENOMEM);
>>
>> -    result = ida_simple_get(&thermal_cdev_ida, 0, 0, GFP_KERNEL);
>> -    if (result < 0) {
>> -        kfree(cdev);
>> -        return ERR_PTR(result);
>> +    ret = ida_simple_get(&thermal_cdev_ida, 0, 0, GFP_KERNEL);
>> +    if (ret < 0)
>> +        goto out_kfree_cdev;
>> +    cdev->id = ret;
>> +
>> +    cdev->type = kstrdup(type ? type : "", GFP_KERNEL);
>> +    if (!cdev->type) {
>> +        ret = -ENOMEM;
> 
> Since we haven't called the device_register() yet, I would call here:
> kfree(cdev);
> and then jump

I'm not sure I understand: we have to remove the ida, no?

> Other than that, LGTM
> 
> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
> 
> Regards,
> Lukasz
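A userspace model of the unwind being discussed, in which every label undoes exactly one earlier step, in reverse order. In the hunk as posted, the out_kfree_cdev label only returns ERR_PTR(ret), so cdev itself is not freed on the ida_simple_get() or kstrdup() failure paths, while the ID does still need releasing after a kstrdup() failure; both remarks therefore hold. All names here (fake_ida_get(), counted_alloc(), the fail_at fault-injection knob, the counters) are hypothetical. The model deliberately ignores driver-model refcounting: once device_register() has been called, the kernel code must drop the device with put_device() rather than kfree(), and cdev should not be touched after a put_device() that may have dropped the last reference (as the posted unregister hunk does when it calls kfree(cdev->type) after put_device()).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

static int live_ids;	/* IDs currently handed out (ida analogue) */
static int live_allocs;	/* heap objects currently allocated */
static int fail_at;	/* hypothetical fault injection: step 1..3, 0 = none */

static int fake_ida_get(void)
{
	if (fail_at == 1)
		return -ENOMEM;
	live_ids++;
	return 0;
}

static void fake_ida_remove(int id)
{
	(void)id;
	live_ids--;
}

static void *counted_alloc(size_t size)
{
	void *p = calloc(1, size);

	if (p)
		live_allocs++;
	return p;
}

static void counted_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

struct cdev_model {
	int id;
	char *type;
};

static struct cdev_model *model_register(const char *type, int *err)
{
	struct cdev_model *cdev;
	int ret;

	cdev = counted_alloc(sizeof(*cdev));
	if (!cdev) {
		*err = -ENOMEM;
		return NULL;
	}

	ret = fake_ida_get();
	if (ret < 0)
		goto out_kfree_cdev;
	cdev->id = ret;

	cdev->type = (fail_at == 2) ? NULL : counted_alloc(strlen(type) + 1);
	if (!cdev->type) {
		ret = -ENOMEM;
		goto out_ida_remove;
	}
	strcpy(cdev->type, type);

	if (fail_at == 3) {	/* stands in for a later step failing */
		ret = -EINVAL;
		goto out_kfree_type;
	}
	*err = 0;
	return cdev;

out_kfree_type:
	counted_free(cdev->type);
out_ida_remove:
	fake_ida_remove(cdev->id);
out_kfree_cdev:
	counted_free(cdev);	/* the step the posted patch leaves out */
	*err = ret;
	return NULL;
}

static void model_unregister(struct cdev_model *cdev)
{
	counted_free(cdev->type);
	fake_ida_remove(cdev->id);
	counted_free(cdev);
}
```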


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs


* Re: [PATCH v2 1/5] thermal/drivers/core: Use a char pointer for the cooling device name
  2021-03-12 17:03 [PATCH v2 1/5] thermal/drivers/core: Use a char pointer for the cooling device name Daniel Lezcano
                   ` (4 preceding siblings ...)
  2021-03-12 18:49 ` [PATCH v2 1/5] thermal/drivers/core: Use a char pointer for the cooling device name Lukasz Luba
@ 2021-03-14  9:53 ` Ido Schimmel
  2021-03-14 10:48   ` Daniel Lezcano
  5 siblings, 1 reply; 15+ messages in thread
From: Ido Schimmel @ 2021-03-14  9:53 UTC
  To: Daniel Lezcano
  Cc: linux-kernel, linux-pm, lukasz.luba, Jiri Pirko, Ido Schimmel,
	David S. Miller, Jakub Kicinski, Zhang Rui, Amit Kucheria,
	open list:MELLANOX ETHERNET SWITCH DRIVERS

On Fri, Mar 12, 2021 at 06:03:12PM +0100, Daniel Lezcano wrote:
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 996c038f83a4..9ef8090eb645 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -960,10 +960,7 @@ __thermal_cooling_device_register(struct device_node *np,
>  {
>  	struct thermal_cooling_device *cdev;
>  	struct thermal_zone_device *pos = NULL;
> -	int result;
> -
> -	if (type && strlen(type) >= THERMAL_NAME_LENGTH)
> -		return ERR_PTR(-EINVAL);
> +	int ret;
>  
>  	if (!ops || !ops->get_max_state || !ops->get_cur_state ||
>  	    !ops->set_cur_state)
> @@ -973,14 +970,17 @@ __thermal_cooling_device_register(struct device_node *np,
>  	if (!cdev)
>  		return ERR_PTR(-ENOMEM);
>  
> -	result = ida_simple_get(&thermal_cdev_ida, 0, 0, GFP_KERNEL);
> -	if (result < 0) {
> -		kfree(cdev);
> -		return ERR_PTR(result);
> +	ret = ida_simple_get(&thermal_cdev_ida, 0, 0, GFP_KERNEL);
> +	if (ret < 0)
> +		goto out_kfree_cdev;
> +	cdev->id = ret;
> +
> +	cdev->type = kstrdup(type ? type : "", GFP_KERNEL);
> +	if (!cdev->type) {
> +		ret = -ENOMEM;
> +		goto out_ida_remove;
>  	}
>  
> -	cdev->id = result;
> -	strlcpy(cdev->type, type ? : "", sizeof(cdev->type));
>  	mutex_init(&cdev->lock);
>  	INIT_LIST_HEAD(&cdev->thermal_instances);
>  	cdev->np = np;
> @@ -990,12 +990,9 @@ __thermal_cooling_device_register(struct device_node *np,
>  	cdev->devdata = devdata;
>  	thermal_cooling_device_setup_sysfs(cdev);
>  	dev_set_name(&cdev->device, "cooling_device%d", cdev->id);
> -	result = device_register(&cdev->device);
> -	if (result) {
> -		ida_simple_remove(&thermal_cdev_ida, cdev->id);
> -		put_device(&cdev->device);
> -		return ERR_PTR(result);
> -	}
> +	ret = device_register(&cdev->device);
> +	if (ret)
> +		goto out_kfree_type;
>  
>  	/* Add 'this' new cdev to the global cdev list */
>  	mutex_lock(&thermal_list_lock);
> @@ -1013,6 +1010,14 @@ __thermal_cooling_device_register(struct device_node *np,
>  	mutex_unlock(&thermal_list_lock);
>  
>  	return cdev;
> +
> +out_kfree_type:
> +	kfree(cdev->type);
> +	put_device(&cdev->device);
> +out_ida_remove:
> +	ida_simple_remove(&thermal_cdev_ida, cdev->id);
> +out_kfree_cdev:
> +	return ERR_PTR(ret);
>  }
>  
>  /**
> @@ -1172,6 +1177,7 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
>  	device_del(&cdev->device);
>  	thermal_cooling_device_destroy_sysfs(cdev);
>  	put_device(&cdev->device);
> +	kfree(cdev->type);
>  }
>  EXPORT_SYMBOL_GPL(thermal_cooling_device_unregister);

I'm getting the following use-after-free with this patch [1]. Fixed by:

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 9ef8090eb645..c8d4010940ef 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1176,8 +1176,8 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
        ida_simple_remove(&thermal_cdev_ida, cdev->id);
        device_del(&cdev->device);
        thermal_cooling_device_destroy_sysfs(cdev);
-       put_device(&cdev->device);
        kfree(cdev->type);
+       put_device(&cdev->device);
 }
 EXPORT_SYMBOL_GPL(thermal_cooling_device_unregister);

[1]
[  148.601815] ==================================================================
[  148.610260] BUG: KASAN: use-after-free in thermal_cooling_device_unregister+0x6ca/0x6e0
[  148.619304] Read of size 8 at addr ffff8881510a0808 by task devlink/574
[  148.626768]
[  148.628477] CPU: 2 PID: 574 Comm: devlink Not tainted 5.12.0-rc2-custom-12525-g7ba8a2feee15 #3301
[  148.638463] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[  148.648625] Call Trace:
[  148.651408]  dump_stack+0xfa/0x151
[  148.661701]  print_address_description.constprop.0+0x18/0x130
[  148.681014]  kasan_report.cold+0x7f/0x111
[  148.692003]  thermal_cooling_device_unregister+0x6ca/0x6e0
[  148.703984]  mlxsw_thermal_fini+0xd2/0x1f0
[  148.708664]  mlxsw_core_bus_device_unregister+0x158/0x8d0
[  148.714794]  mlxsw_devlink_core_bus_device_reload_down+0x93/0xc0
[  148.721594]  devlink_reload+0x15f/0x5e0
[  148.749669]  devlink_nl_cmd_reload+0x7fc/0x1210
[  148.775992]  genl_family_rcv_msg_doit+0x22a/0x320
[  148.799789]  genl_rcv_msg+0x341/0x5a0
[  148.818789]  netlink_rcv_skb+0x14d/0x430
[  148.836450]  genl_rcv+0x29/0x40
[  148.840034]  netlink_unicast+0x539/0x7e0
[  148.859219]  netlink_sendmsg+0x8d7/0xe10
[  148.879271]  __sys_sendto+0x23f/0x350
[  148.904178]  __x64_sys_sendto+0xe2/0x1b0
[  148.919297]  do_syscall_64+0x2d/0x40
[  148.923365]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  148.929081] RIP: 0033:0x7f17c0dbaefa
[  148.933139] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c
[  148.954190] RSP: 002b:00007ffd879c5e18 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[  148.962723] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f17c0dbaefa
[  148.970751] RDX: 0000000000000030 RSI: 0000000000ad0ad0 RDI: 0000000000000003
[  148.978776] RBP: 0000000000ad0aa0 R08: 00007f17c0e8d000 R09: 000000000000000c
[  148.986803] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000408d70
[  148.994834] R13: 0000000000ad0910 R14: 0000000000000000 R15: 0000000000000001
[  149.002978]
[  149.004687] Allocated by task 1:
[  149.008345]  kasan_save_stack+0x1b/0x40
[  149.012711]  __kasan_kmalloc+0x7a/0x90
[  149.016974]  __thermal_cooling_device_register.part.0+0x59/0x9e0
[  149.023753]  thermal_cooling_device_register+0xb3/0x100
[  149.029671]  mlxsw_thermal_init+0x78b/0xa10
[  149.034427]  __mlxsw_core_bus_device_register+0xd05/0x1a30
[  149.040634]  mlxsw_core_bus_device_register+0x56/0xb0
[  149.046349]  mlxsw_pci_probe+0x53b/0x750
[  149.050800]  local_pci_probe+0xc6/0x170
[  149.055144]  pci_device_probe+0x2a3/0x4a0
[  149.059683]  really_probe+0x2b6/0xec0
[  149.063840]  driver_probe_device+0x1e2/0x330
[  149.068673]  device_driver_attach+0x282/0x2f0
[  149.073605]  __driver_attach+0x160/0x2f0
[  149.078050]  bus_for_each_dev+0x14c/0x1d0
[  149.082589]  bus_add_driver+0x3ac/0x650
[  149.086935]  driver_register+0x225/0x3a0
[  149.091381]  mlxsw_sp_module_init+0xa2/0x174
[  149.096216]  do_one_initcall+0x108/0x690
[  149.100660]  kernel_init_freeable+0x3ec/0x46b
[  149.105595]  kernel_init+0x13/0x1eb
[  149.109559]  ret_from_fork+0x1f/0x30
[  149.113613]
[  149.115311] Freed by task 574:
[  149.118765]  kasan_save_stack+0x1b/0x40
[  149.123116]  kasan_set_track+0x1c/0x30
[  149.127373]  kasan_set_free_info+0x20/0x30
[  149.132021]  __kasan_slab_free+0xe5/0x110
[  149.136556]  slab_free_freelist_hook+0x59/0x150
[  149.141681]  kfree+0xd5/0x3b0
[  149.145055]  thermal_release+0xa0/0x110
[  149.149414]  device_release+0xa4/0x240
[  149.153680]  kobject_put+0x1c8/0x540
[  149.157747]  put_device+0x20/0x30
[  149.161530]  thermal_cooling_device_unregister+0x578/0x6e0
[  149.167751]  mlxsw_thermal_fini+0xd2/0x1f0
[  149.172414]  mlxsw_core_bus_device_unregister+0x158/0x8d0
[  149.178529]  mlxsw_devlink_core_bus_device_reload_down+0x93/0xc0
[  149.185327]  devlink_reload+0x15f/0x5e0
[  149.189695]  devlink_nl_cmd_reload+0x7fc/0x1210
[  149.194838]  genl_family_rcv_msg_doit+0x22a/0x320
[  149.200182]  genl_rcv_msg+0x341/0x5a0
[  149.204337]  netlink_rcv_skb+0x14d/0x430
[  149.208799]  genl_rcv+0x29/0x40
[  149.212381]  netlink_unicast+0x539/0x7e0
[  149.216825]  netlink_sendmsg+0x8d7/0xe10
[  149.221274]  __sys_sendto+0x23f/0x350
[  149.225423]  __x64_sys_sendto+0xe2/0x1b0
[  149.229864]  do_syscall_64+0x2d/0x40
[  149.233912]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  149.239624]
[  149.241322] The buggy address belongs to the object at ffff8881510a0800
[  149.241322]  which belongs to the cache kmalloc-2k of size 2048
[  149.255372] The buggy address is located 8 bytes inside of
[  149.255372]  2048-byte region [ffff8881510a0800, ffff8881510a1000)
[  149.268456] The buggy address belongs to the page:
[  149.273850] page:000000006ec87a73 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1510a0
[  149.284416] head:000000006ec87a73 order:3 compound_mapcount:0 compound_pincount:0
[  149.292838] flags: 0x200000000010200(slab|head)
[  149.297981] raw: 0200000000010200 ffffea000426e208 ffffea000544b808 ffff88810004de40
[  149.306707] raw: 0000000000000000 0000000000050005 00000001ffffffff 0000000000000000
[  149.315417] page dumped because: kasan: bad access detected
[  149.321697]
[  149.323403] Memory state around the buggy address:
[  149.328811]  ffff8881510a0700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  149.336939]  ffff8881510a0780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  149.345075] >ffff8881510a0800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  149.353202]                       ^
[  149.357159]  ffff8881510a0880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  149.365298]  ffff8881510a0900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  149.373424] ==================================================================

* Re: [PATCH v2 1/5] thermal/drivers/core: Use a char pointer for the cooling device name
  2021-03-14  9:53 ` Ido Schimmel
@ 2021-03-14 10:48   ` Daniel Lezcano
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Lezcano @ 2021-03-14 10:48 UTC
  To: Ido Schimmel
  Cc: linux-kernel, linux-pm, lukasz.luba, Jiri Pirko, Ido Schimmel,
	David S. Miller, Jakub Kicinski, Zhang Rui, Amit Kucheria,
	open list:MELLANOX ETHERNET SWITCH DRIVERS


Hi Ido,

On 14/03/2021 10:53, Ido Schimmel wrote:
> On Fri, Mar 12, 2021 at 06:03:12PM +0100, Daniel Lezcano wrote:
>> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
>> index 996c038f83a4..9ef8090eb645 100644
>> --- a/drivers/thermal/thermal_core.c
>> +++ b/drivers/thermal/thermal_core.c
>> @@ -960,10 +960,7 @@ __thermal_cooling_device_register(struct device_node *np,

[ ... ]

>>  /**
>> @@ -1172,6 +1177,7 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
>>  	device_del(&cdev->device);
>>  	thermal_cooling_device_destroy_sysfs(cdev);
>>  	put_device(&cdev->device);
>> +	kfree(cdev->type);
>>  }
>>  EXPORT_SYMBOL_GPL(thermal_cooling_device_unregister);
> 
> I'm getting the following use-after-free with this patch [1]. Fixed by:
> 
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 9ef8090eb645..c8d4010940ef 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1176,8 +1176,8 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
>         ida_simple_remove(&thermal_cdev_ida, cdev->id);
>         device_del(&cdev->device);
>         thermal_cooling_device_destroy_sysfs(cdev);
> -       put_device(&cdev->device);
>         kfree(cdev->type);
> +       put_device(&cdev->device);

Indeed, 'thermal_release' is called by put_device() and frees cdev, so
kfree(cdev->type) right after it dereferences an already freed pointer.

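The ordering rule behind Ido's fix can be shown with a minimal userspace model: dropping the last reference invokes a release callback that frees the containing object, so a separately allocated member such as type must be freed before that call, never after. The names struct obj, obj_put() and obj_release() are hypothetical stand-ins for struct device, put_device() and thermal_release(); real kernel refcounting goes through kobject/kref, not a plain int.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted object, loosely modelled on struct device. */
struct obj {
	int refcount;
	char *type;                     /* separately allocated member */
	void (*release)(struct obj *o); /* frees the container */
};

static int released; /* counts release() calls, for the usage check */

static void obj_release(struct obj *o)
{
	released++;
	free(o); /* plays the role of thermal_release() freeing cdev */
}

static void obj_put(struct obj *o)
{
	assert(o->refcount > 0);
	if (--o->refcount == 0)
		o->release(o); /* o is dangling after this line */
}

static struct obj *obj_new(const char *type)
{
	struct obj *o = calloc(1, sizeof(*o));
	size_t n;

	if (!o)
		return NULL;
	n = strlen(type) + 1;
	o->type = malloc(n);
	if (!o->type) {
		free(o);
		return NULL;
	}
	memcpy(o->type, type, n);
	o->refcount = 1;
	o->release = obj_release;
	return o;
}

/* Free members first, drop the (possibly last) reference last. */
static void obj_unregister(struct obj *o)
{
	free(o->type);
	obj_put(o);
	/* Swapping the two calls above would read o->type after free(o). */
}
```

A design alternative is to free such a member inside the release callback itself, so its lifetime matches the container's exactly; that also covers other reference holders that might still read it after unregistration.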
Thanks for the fix

  -- Daniel

* Re: [PATCH v2 4/5] thermal/drivers/cpuidle_cooling: Use device name instead of auto-numbering
  2021-03-12 17:03 ` [PATCH v2 4/5] thermal/drivers/cpuidle_cooling: " Daniel Lezcano
@ 2021-03-15  3:07   ` Viresh Kumar
  2021-03-15  3:10     ` Daniel Lezcano
  0 siblings, 1 reply; 15+ messages in thread
From: Viresh Kumar @ 2021-03-15  3:07 UTC
  To: Daniel Lezcano
  Cc: linux-kernel, linux-pm, lukasz.luba, Amit Daniel Kachhap,
	Javi Merino, Zhang Rui, Amit Kucheria

On 12-03-21, 18:03, Daniel Lezcano wrote:
> Currently the naming of a cooling device is just a cooling technique
> followed by a number. When there are multiple cooling devices using
> the same technique, it is impossible to clearly identify the related
> device as this one is just a number.
> 
> For instance:
> 
>  thermal-idle-0
>  thermal-idle-1
>  thermal-idle-2
>  thermal-idle-3
>  etc ...
> 
> The 'thermal' prefix is redundant with the subsystem namespace. This
> patch removes the 'thermal prefix and changes the number by the device
> name. So the naming above becomes:
> 
>  idle-cpu0
>  idle-cpu1
>  idle-cpu2
>  idle-cpu3
>  etc ...
> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

I acked both of the patches :(

-- 
viresh

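A minimal sketch of the naming scheme described in the 4/5 commit message, i.e. "<technique>-<device name>" (such as "idle-cpu0") instead of "thermal-<technique>-<number>". The helper cdev_name() is hypothetical userspace code, not the patch itself; in kernel code, kasprintf() performs the same allocate-and-format step in a single call.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "<technique>-<devname>" into a freshly allocated buffer of
 * exactly the right size. Hypothetical helper; the caller frees it.
 */
static char *cdev_name(const char *technique, const char *devname)
{
	size_t len;
	char *name;

	assert(technique && devname);
	len = strlen(technique) + 1 + strlen(devname); /* "-" separator */
	name = malloc(len + 1);                        /* + NUL terminator */
	if (!name)
		return NULL;
	snprintf(name, len + 1, "%s-%s", technique, devname);
	return name;
}
```

Because the name is allocated at its exact length, it fits the char-pointer cooling device name from patch 1/5 without any fixed THERMAL_NAME_LENGTH limit.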
* Re: [PATCH v2 4/5] thermal/drivers/cpuidle_cooling: Use device name instead of auto-numbering
  2021-03-15  3:07   ` Viresh Kumar
@ 2021-03-15  3:10     ` Daniel Lezcano
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Lezcano @ 2021-03-15  3:10 UTC
  To: Viresh Kumar
  Cc: linux-kernel, linux-pm, lukasz.luba, Amit Daniel Kachhap,
	Javi Merino, Zhang Rui, Amit Kucheria

On 15/03/2021 04:07, Viresh Kumar wrote:
> On 12-03-21, 18:03, Daniel Lezcano wrote:
>> Currently the naming of a cooling device is just a cooling technique
>> followed by a number. When there are multiple cooling devices using
>> the same technique, it is impossible to clearly identify the related
>> device as this one is just a number.
>>
>> For instance:
>>
>>  thermal-idle-0
>>  thermal-idle-1
>>  thermal-idle-2
>>  thermal-idle-3
>>  etc ...
>>
>> The 'thermal' prefix is redundant with the subsystem namespace. This
>> patch removes the 'thermal prefix and changes the number by the device
>> name. So the naming above becomes:
>>
>>  idle-cpu0
>>  idle-cpu1
>>  idle-cpu2
>>  idle-cpu3
>>  etc ...
>>
>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
> 
> I acked both of the patches :(

Right, I'll add you when merging the patches.

Thanks

* Re: [PATCH v2 1/5] thermal/drivers/core: Use a char pointer for the cooling device name
  2021-03-12 21:01   ` Daniel Lezcano
@ 2021-03-15  9:40     ` Lukasz Luba
  2021-03-15  9:58       ` Lukasz Luba
  0 siblings, 1 reply; 15+ messages in thread
From: Lukasz Luba @ 2021-03-15  9:40 UTC
  To: Daniel Lezcano
  Cc: linux-kernel, linux-pm, Jiri Pirko, Ido Schimmel,
	David S. Miller, Jakub Kicinski, Zhang Rui, Amit Kucheria,
	open list:MELLANOX ETHERNET SWITCH DRIVERS



On 3/12/21 9:01 PM, Daniel Lezcano wrote:
> On 12/03/2021 19:49, Lukasz Luba wrote:
>>
>>
>> On 3/12/21 5:03 PM, Daniel Lezcano wrote:
>>> We want to have any kind of name for the cooling devices as we do no
>>> longer want to rely on auto-numbering. Let's replace the cooling
>>> device's fixed array by a char pointer to be allocated dynamically
>>> when registering the cooling device, so we don't limit the length of
>>> the name.
>>>
>>> Rework the error path at the same time as we have to rollback the
>>> allocations in case of error.
>>>
>>> Tested with a dummy device having the name:
>>>    "Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch"
>>>
>>> A village on the island of Anglesey (Wales), known to have the longest
>>> name in Europe.
>>>
>>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>>> ---
>>>    .../ethernet/mellanox/mlxsw/core_thermal.c    |  2 +-
>>>    drivers/thermal/thermal_core.c                | 38 +++++++++++--------
>>>    include/linux/thermal.h                       |  2 +-
>>>    3 files changed, 24 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
>>> index bf85ce9835d7..7447c2a73cbd 100644
>>> --- a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
>>> +++ b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
>>> @@ -141,7 +141,7 @@ static int mlxsw_get_cooling_device_idx(struct mlxsw_thermal *thermal,
>>>        /* Allow mlxsw thermal zone binding to an external cooling device */
>>>        for (i = 0; i < ARRAY_SIZE(mlxsw_thermal_external_allowed_cdev); i++) {
>>>            if (strnstr(cdev->type, mlxsw_thermal_external_allowed_cdev[i],
>>> -                sizeof(cdev->type)))
>>> +                strlen(cdev->type)))
>>>                return 0;
>>>        }
>>> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
>>> index 996c038f83a4..9ef8090eb645 100644
>>> --- a/drivers/thermal/thermal_core.c
>>> +++ b/drivers/thermal/thermal_core.c
>>> @@ -960,10 +960,7 @@ __thermal_cooling_device_register(struct device_node *np,
>>>    {
>>>        struct thermal_cooling_device *cdev;
>>>        struct thermal_zone_device *pos = NULL;
>>> -    int result;
>>> -
>>> -    if (type && strlen(type) >= THERMAL_NAME_LENGTH)
>>> -        return ERR_PTR(-EINVAL);
>>> +    int ret;
>>>
>>>        if (!ops || !ops->get_max_state || !ops->get_cur_state ||
>>>            !ops->set_cur_state)
>>> @@ -973,14 +970,17 @@ __thermal_cooling_device_register(struct device_node *np,
>>>        if (!cdev)
>>>            return ERR_PTR(-ENOMEM);
>>>
>>> -    result = ida_simple_get(&thermal_cdev_ida, 0, 0, GFP_KERNEL);
>>> -    if (result < 0) {
>>> -        kfree(cdev);
>>> -        return ERR_PTR(result);
>>> +    ret = ida_simple_get(&thermal_cdev_ida, 0, 0, GFP_KERNEL);
>>> +    if (ret < 0)
>>> +        goto out_kfree_cdev;
>>> +    cdev->id = ret;
>>> +
>>> +    cdev->type = kstrdup(type ? type : "", GFP_KERNEL);
>>> +    if (!cdev->type) {
>>> +        ret = -ENOMEM;
>>
>> Since we haven't called the device_register() yet, I would call here:
>> kfree(cdev);
>> and then jump
> 
> I'm not sure I understand: don't we have to remove the ida?

Yes, we have to remove 'ida' and you jump to that label:
goto out_ida_remove;
but under that label, there is no 'put_device()'.
We could have here, before the 'goto', a simple kfree, which
should be safe, since we haven't called the device_register() yet.
Something like:

--------8<------------------------------
cdev->type = kstrdup(type ? type : "", GFP_KERNEL);
if (!cdev->type) {
	ret = -ENOMEM;
	kfree(cdev);
	goto out_ida_remove;
}

-------->8------------------------------


> 
>> Other than that, LGTM
>>
>> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
>>
>> Regards,
>> Lukasz
> 
> 

* Re: [PATCH v2 1/5] thermal/drivers/core: Use a char pointer for the cooling device name
  2021-03-15  9:40     ` Lukasz Luba
@ 2021-03-15  9:58       ` Lukasz Luba
  0 siblings, 0 replies; 15+ messages in thread
From: Lukasz Luba @ 2021-03-15  9:58 UTC
  To: Daniel Lezcano
  Cc: linux-kernel, linux-pm, Jiri Pirko, Ido Schimmel,
	David S. Miller, Jakub Kicinski, Zhang Rui, Amit Kucheria,
	open list:MELLANOX ETHERNET SWITCH DRIVERS



On 3/15/21 9:40 AM, Lukasz Luba wrote:
> 
> 
> On 3/12/21 9:01 PM, Daniel Lezcano wrote:
>> On 12/03/2021 19:49, Lukasz Luba wrote:
>>>
>>>
>>> On 3/12/21 5:03 PM, Daniel Lezcano wrote:
>>>> We want to have any kind of name for the cooling devices as we do no
>>>> longer want to rely on auto-numbering. Let's replace the cooling
>>>> device's fixed array by a char pointer to be allocated dynamically
>>>> when registering the cooling device, so we don't limit the length of
>>>> the name.
>>>>
>>>> Rework the error path at the same time as we have to rollback the
>>>> allocations in case of error.
>>>>
>>>> Tested with a dummy device having the name:
>>>>    "Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch"
>>>>
>>>> A village on the island of Anglesey (Wales), known to have the longest
>>>> name in Europe.
>>>>
>>>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>>>> ---
>>>>    .../ethernet/mellanox/mlxsw/core_thermal.c    |  2 +-
>>>>    drivers/thermal/thermal_core.c                | 38 +++++++++++--------
>>>>    include/linux/thermal.h                       |  2 +-
>>>>    3 files changed, 24 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
>>>> index bf85ce9835d7..7447c2a73cbd 100644
>>>> --- a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
>>>> +++ b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
>>>> @@ -141,7 +141,7 @@ static int mlxsw_get_cooling_device_idx(struct mlxsw_thermal *thermal,
>>>>        /* Allow mlxsw thermal zone binding to an external cooling device */
>>>>        for (i = 0; i < ARRAY_SIZE(mlxsw_thermal_external_allowed_cdev); i++) {
>>>>            if (strnstr(cdev->type, mlxsw_thermal_external_allowed_cdev[i],
>>>> -                sizeof(cdev->type)))
>>>> +                strlen(cdev->type)))
>>>>                return 0;
>>>>        }
>>>> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
>>>> index 996c038f83a4..9ef8090eb645 100644
>>>> --- a/drivers/thermal/thermal_core.c
>>>> +++ b/drivers/thermal/thermal_core.c
>>>> @@ -960,10 +960,7 @@ __thermal_cooling_device_register(struct device_node *np,
>>>>    {
>>>>        struct thermal_cooling_device *cdev;
>>>>        struct thermal_zone_device *pos = NULL;
>>>> -    int result;
>>>> -
>>>> -    if (type && strlen(type) >= THERMAL_NAME_LENGTH)
>>>> -        return ERR_PTR(-EINVAL);
>>>> +    int ret;
>>>>
>>>>        if (!ops || !ops->get_max_state || !ops->get_cur_state ||
>>>>            !ops->set_cur_state)
>>>> @@ -973,14 +970,17 @@ __thermal_cooling_device_register(struct device_node *np,
>>>>        if (!cdev)
>>>>            return ERR_PTR(-ENOMEM);
>>>>
>>>> -    result = ida_simple_get(&thermal_cdev_ida, 0, 0, GFP_KERNEL);
>>>> -    if (result < 0) {
>>>> -        kfree(cdev);
>>>> -        return ERR_PTR(result);
>>>> +    ret = ida_simple_get(&thermal_cdev_ida, 0, 0, GFP_KERNEL);
>>>> +    if (ret < 0)
>>>> +        goto out_kfree_cdev;
>>>> +    cdev->id = ret;
>>>> +
>>>> +    cdev->type = kstrdup(type ? type : "", GFP_KERNEL);
>>>> +    if (!cdev->type) {
>>>> +        ret = -ENOMEM;
>>>
>>> Since we haven't called the device_register() yet, I would call here:
>>> kfree(cdev);
>>> and then jump
>>
>> I'm not sure I understand: don't we have to remove the ida?
> 
> Yes, we have to remove 'ida' and you jump to that label:
> goto out_ida_remove;
> but under that label, there is no 'put_device()'.
> We could have here, before the 'goto', a simple kfree, which
> should be safe, since we haven't called the device_register() yet.
> Something like:
> 
> --------8<------------------------------
> cdev->type = kstrdup(type ? type : "", GFP_KERNEL);
> if (!cdev->type) {
>      ret = -ENOMEM;
>      kfree(cdev);
>      goto out_ida_remove;
> }
> 
> -------->8------------------------------
> 

I've checked that label, and it's probably not easy to modify it
and add a conditional there. So you would probably have to do
everything here (without jumping to a label):

ida_simple_remove(&thermal_cdev_ida, cdev->id);
kfree(cdev);
return ERR_PTR(-ENOMEM);

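The reverse-order goto unwinding discussed here can be sketched in userspace: each label undoes exactly one earlier step and falls through to the labels for the steps before it, so the last label also frees cdev and no step touches cdev after it is freed. This is a hypothetical model: id_get()/id_put() stand in for ida_simple_get()/ida_simple_remove(), dup_str() for kstrdup(), and fail_step is fault injection for testing. It deliberately stops before device_register(): once that has been called, the object has to be released with put_device() and its release callback rather than a direct free.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 64-slot ID allocator standing in for the kernel IDA. */
static unsigned long long id_map;

static int id_get(void)
{
	for (int i = 0; i < 64; i++) {
		if (!(id_map & (1ULL << i))) {
			id_map |= 1ULL << i;
			return i;
		}
	}
	return -ENOSPC;
}

static void id_put(int id)
{
	assert(id >= 0 && id < 64);
	id_map &= ~(1ULL << id);
}

/* Userspace stand-in for kstrdup(). */
static char *dup_str(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

struct cdev {
	int id;
	char *type;
};

/* Fault injection for the sketch: 1, 2 or 3 selects the failing step. */
static int fail_step;

static struct cdev *cdev_register(const char *type, int *err)
{
	struct cdev *cdev;
	int ret;

	cdev = calloc(1, sizeof(*cdev));
	if (!cdev) {
		*err = -ENOMEM;
		return NULL;
	}

	ret = fail_step == 1 ? -ENOSPC : id_get();
	if (ret < 0)
		goto out_free_cdev;
	cdev->id = ret;

	cdev->type = fail_step == 2 ? NULL : dup_str(type ? type : "");
	if (!cdev->type) {
		ret = -ENOMEM;
		goto out_id_remove;
	}

	/* A later setup step, still before anything is published. */
	if (fail_step == 3) {
		ret = -EIO;
		goto out_free_type;
	}

	*err = 0;
	return cdev;

	/* Undo in reverse order; each label falls through to the next. */
out_free_type:
	free(cdev->type);
out_id_remove:
	id_put(cdev->id);
out_free_cdev:
	free(cdev);
	*err = ret;
	return NULL;
}
```

Whichever step fails, the ID map ends up empty and nothing leaks; the cdev->id read under out_id_remove is safe because free(cdev) only happens at the final label.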

> 
>>
>>> Other than that, LGTM
>>>
>>> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
>>>
>>> Regards,
>>> Lukasz
>>
>>

Thread overview: 15+ messages
2021-03-12 17:03 [PATCH v2 1/5] thermal/drivers/core: Use a char pointer for the cooling device name Daniel Lezcano
2021-03-12 17:03 ` [PATCH v2 2/5] thermal/drivers/cpufreq_cooling: Use device name instead of auto-numbering Daniel Lezcano
2021-03-12 17:03 ` [PATCH v2 3/5] thermal/drivers/devfreq_cooling: " Daniel Lezcano
2021-03-12 18:05   ` Lukasz Luba
2021-03-12 17:03 ` [PATCH v2 4/5] thermal/drivers/cpuidle_cooling: " Daniel Lezcano
2021-03-15  3:07   ` Viresh Kumar
2021-03-15  3:10     ` Daniel Lezcano
2021-03-12 17:03 ` [PATCH v2 5/5] thermal/drivers/cpufreq_cooling: Remove unused list Daniel Lezcano
2021-03-12 17:19   ` Lukasz Luba
2021-03-12 18:49 ` [PATCH v2 1/5] thermal/drivers/core: Use a char pointer for the cooling device name Lukasz Luba
2021-03-12 21:01   ` Daniel Lezcano
2021-03-15  9:40     ` Lukasz Luba
2021-03-15  9:58       ` Lukasz Luba
2021-03-14  9:53 ` Ido Schimmel
2021-03-14 10:48   ` Daniel Lezcano