All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck
@ 2016-11-18  0:03 Thomas Gleixner
  2016-11-18  0:03 ` [patch 01/12] thermal/x86_pkg_temp: Cleanup thermal interrupt handling Thomas Gleixner
                   ` (12 more replies)
  0 siblings, 13 replies; 17+ messages in thread
From: Thomas Gleixner @ 2016-11-18  0:03 UTC (permalink / raw)
  To: LKML
  Cc: Zhang Rui, Eduardo Valentin, linux-pm, Peter Zijlstra, x86, rt,
	Borislav Petkov

We solely intended to convert that driver to the hotplug state machine and
stumbled over a large pile of insanities, which are all interwoven with the
package management:

 - The work cancelation code, the thermal zone unregistering, the work code
   and the interrupt notification function are racy against each other and
   against cpu hotplug and module exit. The random locking sprinkeled all
   over the place does not help anything and probably exists to make people
   feel good. The resulting issues (mainly use after free) are probably
   hard to trigger, but they clearly exist

 - Work cancelation in the cpu hotplug code can leave the work marked
   scheduled and the package interrupts disabled forever.

 - Storage for a boolean information whether work is scheduled for a
   package is kept in separate allocated storage, which is resized when the
   number of detected packages grows.

 - Delayed work structs are held in a static percpu storage, which makes no
   sense at all because work is strictly per package.

 - Packages are kept in a list, which must be searched over and over.

Fixing the whole pile of races with a few duct tape fixes was pretty much
impossible, so I decided to do a major rewrite to fix all of the
above. Here are the major changes:

 - Rewrite the offline code with proper locking against interrupts and work
   function and make sure that canceled work is rescheduled if there is
   another online cpu in the package.

 - Use the cpu offline callback on module exit to fix the work cancelation
   race.

 - Move the bool which denotes scheduled work into the package struct
   where it belongs.

 - Move the delayed work struct into the package struct, which is the only
   sensible place to have it and schedule the work on the cpu which is the
   target for the sysfs files as this makes the cancellation and
   rescheduling in the cpu offline path simple.

 - Add a large pile of comments documenting the cpu teardown mechanism

 - Code sanitizing, revamp the horrible name choices plus a general coding
   style cleanup.

   Note, that I did the namespace and code cleanup in the middle of the
   series, because staring at that mess just made my eyes bleeding.

 - Store the package pointers in an array which is allocated at init
   time. Sizing of the array is determined from the topology
   information. That makes the package lookup a simple array lookup.

As a last step the driver is converted to the hotplug state machine.

The total damage is:

 x86_pkg_temp_thermal.c |  590 ++++++++++++++++++++-----------------------------
 1 file changed, 245 insertions(+), 345 deletions(-)

So a net reduction of 100 lines and the compiled size of the driver changes
in the following way:

   text	   data	    bss	    dec
   4370	    549	    128	   5047		Before
   2845	    424	     40	   3309	    	After fixes and cleanup
   2690	    404	     40	   3134		After hotplug conversion

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [patch 01/12] thermal/x86_pkg_temp: Cleanup thermal interrupt handling
  2016-11-18  0:03 [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck Thomas Gleixner
@ 2016-11-18  0:03 ` Thomas Gleixner
  2016-11-18  0:03 ` [patch 02/12] thermal/x86_pkg_temp: Remove redundant package search Thomas Gleixner
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2016-11-18  0:03 UTC (permalink / raw)
  To: LKML
  Cc: Zhang Rui, Eduardo Valentin, linux-pm, Peter Zijlstra, x86, rt,
	Borislav Petkov

[-- Attachment #1: thermal-x86_pkg_temp--Cleanup-thermal-interrupt-handling.patch --]
[-- Type: text/plain, Size: 1616 bytes --]

Wenn a package is removed nothing restores the thermal interrupt MSR so
the content will be stale when a CPU of that package becomes online again.

Aside of that the work function reenables interrupts before acknowledging
the current one, which is the wrong order to begin with.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/thermal/x86_pkg_temp_thermal.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

--- a/drivers/thermal/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/x86_pkg_temp_thermal.c
@@ -334,7 +334,6 @@ static void pkg_temp_thermal_threshold_w
 	pkg_work_scheduled[phy_id] = 0;
 	spin_unlock_irqrestore(&pkg_work_lock, flags);
 
-	enable_pkg_thres_interrupt();
 	rdmsrl(MSR_IA32_PACKAGE_THERM_STATUS, msr_val);
 	if (msr_val & THERM_LOG_THRESHOLD0) {
 		wrmsrl(MSR_IA32_PACKAGE_THERM_STATUS,
@@ -346,6 +345,9 @@ static void pkg_temp_thermal_threshold_w
 				msr_val & ~THERM_LOG_THRESHOLD1);
 		notify = true;
 	}
+
+	enable_pkg_thres_interrupt();
+
 	if (notify) {
 		pr_debug("thermal_zone_device_update\n");
 		thermal_zone_device_update(phdev->tzone,
@@ -505,6 +507,13 @@ static int pkg_temp_thermal_device_remov
 		list_for_each_entry_safe(phdev, n, &phy_dev_list, list) {
 			if (phdev->phys_proc_id == phys_proc_id) {
 				thermal_zone_device_unregister(phdev->tzone);
+				/*
+				 * Restore original MSR value for package
+				 * thermal interrupt.
+				 */
+				wrmsr_on_cpu(cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
+					     phdev->start_pkg_therm_low,
+					     phdev->start_pkg_therm_high);
 				list_del(&phdev->list);
 				kfree(phdev);
 				break;

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [patch 02/12] thermal/x86_pkg_temp: Remove redundant package search
  2016-11-18  0:03 [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck Thomas Gleixner
  2016-11-18  0:03 ` [patch 01/12] thermal/x86_pkg_temp: Cleanup thermal interrupt handling Thomas Gleixner
@ 2016-11-18  0:03 ` Thomas Gleixner
  2016-11-18  0:03 ` [patch 03/12] thermal/x86_pkg_temp: Replace open coded cpu search Thomas Gleixner
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2016-11-18  0:03 UTC (permalink / raw)
  To: LKML
  Cc: Zhang Rui, Eduardo Valentin, linux-pm, Peter Zijlstra, x86, rt,
	Borislav Petkov

[-- Attachment #1: thermal-x86_pkg_temp--Remove-redundant-package-search.patch --]
[-- Type: text/plain, Size: 2023 bytes --]

In pkg_temp_thermal_device_remove() the package device is searched at the
beginning of the function. When the device refcount becomes zero another
search for the same device is conducted. Remove the pointless loop and use
the device pointer which was retrieved at the beginning of the function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/thermal/x86_pkg_temp_thermal.c |   33 ++++++++++++++-------------------
 1 file changed, 14 insertions(+), 19 deletions(-)

--- a/drivers/thermal/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/x86_pkg_temp_thermal.c
@@ -479,10 +479,8 @@ static int pkg_temp_thermal_device_add(u
 
 static int pkg_temp_thermal_device_remove(unsigned int cpu)
 {
-	struct phy_dev_entry *n;
+	struct phy_dev_entry *phdev = pkg_temp_thermal_get_phy_entry(cpu);
 	u16 phys_proc_id = topology_physical_package_id(cpu);
-	struct phy_dev_entry *phdev =
-			pkg_temp_thermal_get_phy_entry(cpu);
 
 	if (!phdev)
 		return -ENODEV;
@@ -503,22 +501,19 @@ static int pkg_temp_thermal_device_remov
 	--phdev->ref_cnt;
 	pr_debug("thermal_device_remove: pkg: %d cpu %d ref_cnt %d\n",
 					phys_proc_id, cpu, phdev->ref_cnt);
-	if (!phdev->ref_cnt)
-		list_for_each_entry_safe(phdev, n, &phy_dev_list, list) {
-			if (phdev->phys_proc_id == phys_proc_id) {
-				thermal_zone_device_unregister(phdev->tzone);
-				/*
-				 * Restore original MSR value for package
-				 * thermal interrupt.
-				 */
-				wrmsr_on_cpu(cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
-					     phdev->start_pkg_therm_low,
-					     phdev->start_pkg_therm_high);
-				list_del(&phdev->list);
-				kfree(phdev);
-				break;
-			}
-		}
+
+	if (!phdev->ref_cnt) {
+		thermal_zone_device_unregister(phdev->tzone);
+		/*
+		 * Restore original MSR value for package thermal
+		 * interrupt.
+		 */
+		wrmsr_on_cpu(cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
+			     phdev->start_pkg_therm_low,
+			     phdev->start_pkg_therm_high);
+		list_del(&phdev->list);
+		kfree(phdev);
+	}
 	mutex_unlock(&phy_dev_list_mutex);
 
 	return 0;

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [patch 03/12] thermal/x86_pkg_temp: Replace open coded cpu search
  2016-11-18  0:03 [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck Thomas Gleixner
  2016-11-18  0:03 ` [patch 01/12] thermal/x86_pkg_temp: Cleanup thermal interrupt handling Thomas Gleixner
  2016-11-18  0:03 ` [patch 02/12] thermal/x86_pkg_temp: Remove redundant package search Thomas Gleixner
@ 2016-11-18  0:03 ` Thomas Gleixner
  2016-11-18  0:03 ` [patch 04/12] thermal/x86_pkg_temp: Sanitize callback (de)initialization Thomas Gleixner
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2016-11-18  0:03 UTC (permalink / raw)
  To: LKML
  Cc: Zhang Rui, Eduardo Valentin, linux-pm, Peter Zijlstra, x86, rt,
	Borislav Petkov

[-- Attachment #1: thermal-x86_pkg_temp--Replace-open-coded-cpu-search.patch --]
[-- Type: text/plain, Size: 1444 bytes --]

find_next_sibling() iterates over the online cpus and searches for a cpu
with the same package id as the current cpu. This is a pointless exercise
as topology_core_cpumask() allows a simple cpumask search for an online cpu
on the same package.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/thermal/x86_pkg_temp_thermal.c |   19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)

--- a/drivers/thermal/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/x86_pkg_temp_thermal.c
@@ -384,18 +384,6 @@ static int pkg_temp_thermal_platform_the
 	return 0;
 }
 
-static int find_siblings_cpu(int cpu)
-{
-	int i;
-	int id = topology_physical_package_id(cpu);
-
-	for_each_online_cpu(i)
-		if (i != cpu && topology_physical_package_id(i) == id)
-			return i;
-
-	return 0;
-}
-
 static int pkg_temp_thermal_device_add(unsigned int cpu)
 {
 	int err;
@@ -488,9 +476,10 @@ static int pkg_temp_thermal_device_remov
 	mutex_lock(&phy_dev_list_mutex);
 	/* If we are loosing the first cpu for this package, we need change */
 	if (phdev->first_cpu == cpu) {
-		phdev->first_cpu = find_siblings_cpu(cpu);
-		pr_debug("thermal_device_remove: first cpu switched %d\n",
-					phdev->first_cpu);
+		int target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
+
+		phdev->first_cpu = target;
+		pr_debug("CPU %d down. New first_cpu%d\n", cpu, target);
 	}
 	/*
 	* It is possible that no siblings left as this was the last cpu

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [patch 04/12] thermal/x86_pkg_temp: Sanitize callback (de)initialization
  2016-11-18  0:03 [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck Thomas Gleixner
                   ` (2 preceding siblings ...)
  2016-11-18  0:03 ` [patch 03/12] thermal/x86_pkg_temp: Replace open coded cpu search Thomas Gleixner
@ 2016-11-18  0:03 ` Thomas Gleixner
  2016-11-18  0:03 ` [patch 05/12] thermal/x86_pkg_temp: Get rid of ref counting Thomas Gleixner
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2016-11-18  0:03 UTC (permalink / raw)
  To: LKML
  Cc: Zhang Rui, Eduardo Valentin, linux-pm, Peter Zijlstra, x86, rt,
	Borislav Petkov

[-- Attachment #1: thermal-x86_pkg_temp--Sanitize-callback--de-initialization.patch --]
[-- Type: text/plain, Size: 2720 bytes --]

The threshold callbacks are installed before the initialization of the
online cpus has succeeded and removed after the teardown has been
done. That's both wrong as callbacks might be invoked into a half
initialized or torn down state.

Move them to the proper places: Last in init() and first in exit().

While at it shorten the insane long and horrible named function names.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/thermal/x86_pkg_temp_thermal.c |   19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

--- a/drivers/thermal/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/x86_pkg_temp_thermal.c
@@ -281,7 +281,7 @@ static struct thermal_zone_device_ops tz
 	.set_trip_temp = sys_set_trip_temp,
 };
 
-static bool pkg_temp_thermal_platform_thermal_rate_control(void)
+static bool pkg_thermal_rate_control(void)
 {
 	return true;
 }
@@ -355,7 +355,7 @@ static void pkg_temp_thermal_threshold_w
 	}
 }
 
-static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
+static int pkg_thermal_notify(__u64 msr_val)
 {
 	unsigned long flags;
 	int cpu = smp_processor_id();
@@ -579,10 +579,6 @@ static int __init pkg_temp_thermal_init(
 		return -ENODEV;
 
 	spin_lock_init(&pkg_work_lock);
-	platform_thermal_package_notify =
-			pkg_temp_thermal_platform_thermal_notify;
-	platform_thermal_package_rate_control =
-			pkg_temp_thermal_platform_thermal_rate_control;
 
 	cpu_notifier_register_begin();
 	for_each_online_cpu(i)
@@ -591,6 +587,9 @@ static int __init pkg_temp_thermal_init(
 	__register_hotcpu_notifier(&pkg_temp_thermal_notifier);
 	cpu_notifier_register_done();
 
+	platform_thermal_package_notify = pkg_thermal_notify;
+	platform_thermal_package_rate_control = pkg_thermal_rate_control;
+
 	pkg_temp_debugfs_init(); /* Don't care if fails */
 
 	return 0;
@@ -600,9 +599,6 @@ static int __init pkg_temp_thermal_init(
 		put_core_offline(i);
 	cpu_notifier_register_done();
 	kfree(pkg_work_scheduled);
-	platform_thermal_package_notify = NULL;
-	platform_thermal_package_rate_control = NULL;
-
 	return -ENODEV;
 }
 
@@ -611,6 +607,9 @@ static void __exit pkg_temp_thermal_exit
 	struct phy_dev_entry *phdev, *n;
 	int i;
 
+	platform_thermal_package_notify = NULL;
+	platform_thermal_package_rate_control = NULL;
+
 	cpu_notifier_register_begin();
 	__unregister_hotcpu_notifier(&pkg_temp_thermal_notifier);
 	mutex_lock(&phy_dev_list_mutex);
@@ -625,8 +624,6 @@ static void __exit pkg_temp_thermal_exit
 		kfree(phdev);
 	}
 	mutex_unlock(&phy_dev_list_mutex);
-	platform_thermal_package_notify = NULL;
-	platform_thermal_package_rate_control = NULL;
 	for_each_online_cpu(i)
 		cancel_delayed_work_sync(
 			&per_cpu(pkg_temp_thermal_threshold_work, i));

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [patch 05/12] thermal/x86_pkg_temp: Get rid of ref counting
  2016-11-18  0:03 [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck Thomas Gleixner
                   ` (3 preceding siblings ...)
  2016-11-18  0:03 ` [patch 04/12] thermal/x86_pkg_temp: Sanitize callback (de)initialization Thomas Gleixner
@ 2016-11-18  0:03 ` Thomas Gleixner
  2016-11-18  0:03 ` [patch 06/12] thermal/x86_pkg_temp: Cleanup namespace Thomas Gleixner
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2016-11-18  0:03 UTC (permalink / raw)
  To: LKML
  Cc: Zhang Rui, Eduardo Valentin, linux-pm, Peter Zijlstra, x86, rt,
	Borislav Petkov

[-- Attachment #1: thermal-x86_pkg_temp--Get-rid-of-ref-counting.patch --]
[-- Type: text/plain, Size: 5323 bytes --]

There is no point in the whole package data refcounting dance because
topology_core_cpumask tells us whether this is the last cpu in the
package. If yes, then the package can go, if not it stays. It's already
serialized via the hotplug code.

While at it rename the first_cpu member of the package structure to
cpu. The first has absolutely no meaning.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/thermal/x86_pkg_temp_thermal.c |   62 ++++++++++-----------------------
 1 file changed, 20 insertions(+), 42 deletions(-)

--- a/drivers/thermal/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/x86_pkg_temp_thermal.c
@@ -60,9 +60,8 @@ MODULE_PARM_DESC(notify_delay_ms,
 struct phy_dev_entry {
 	struct list_head list;
 	u16 phys_proc_id;
-	u16 first_cpu;
+	u16 cpu;
 	u32 tj_max;
-	int ref_cnt;
 	u32 start_pkg_therm_low;
 	u32 start_pkg_therm_high;
 	struct thermal_zone_device *tzone;
@@ -170,8 +169,8 @@ static int sys_get_curr_temp(struct ther
 	struct phy_dev_entry *phy_dev_entry;
 
 	phy_dev_entry = tzd->devdata;
-	rdmsr_on_cpu(phy_dev_entry->first_cpu, MSR_IA32_PACKAGE_THERM_STATUS,
-			&eax, &edx);
+	rdmsr_on_cpu(phy_dev_entry->cpu, MSR_IA32_PACKAGE_THERM_STATUS,
+		     &eax, &edx);
 	if (eax & 0x80000000) {
 		*temp = phy_dev_entry->tj_max -
 				((eax >> 16) & 0x7f) * 1000;
@@ -204,8 +203,8 @@ static int sys_get_trip_temp(struct ther
 		shift = THERM_SHIFT_THRESHOLD0;
 	}
 
-	ret = rdmsr_on_cpu(phy_dev_entry->first_cpu,
-				MSR_IA32_PACKAGE_THERM_INTERRUPT, &eax, &edx);
+	ret = rdmsr_on_cpu(phy_dev_entry->cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
+			   &eax, &edx);
 	if (ret < 0)
 		return -EINVAL;
 
@@ -232,9 +231,8 @@ static int sys_set_trip_temp(struct ther
 	if (trip >= MAX_NUMBER_OF_TRIPS || temp >= phy_dev_entry->tj_max)
 		return -EINVAL;
 
-	ret = rdmsr_on_cpu(phy_dev_entry->first_cpu,
-					MSR_IA32_PACKAGE_THERM_INTERRUPT,
-					&l, &h);
+	ret = rdmsr_on_cpu(phy_dev_entry->cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
+			   &l, &h);
 	if (ret < 0)
 		return -EINVAL;
 
@@ -259,9 +257,8 @@ static int sys_set_trip_temp(struct ther
 		l |= intr;
 	}
 
-	return wrmsr_on_cpu(phy_dev_entry->first_cpu,
-					MSR_IA32_PACKAGE_THERM_INTERRUPT,
-					l, h);
+	return wrmsr_on_cpu(phy_dev_entry->cpu,	MSR_IA32_PACKAGE_THERM_INTERRUPT,
+			    l, h);
 }
 
 static int sys_get_trip_type(struct thermal_zone_device *thermal,
@@ -431,9 +428,8 @@ static int pkg_temp_thermal_device_add(u
 	spin_unlock_irqrestore(&pkg_work_lock, flags);
 
 	phy_dev_entry->phys_proc_id = topology_physical_package_id(cpu);
-	phy_dev_entry->first_cpu = cpu;
+	phy_dev_entry->cpu = cpu;
 	phy_dev_entry->tj_max = tj_max;
-	phy_dev_entry->ref_cnt = 1;
 	phy_dev_entry->tzone = thermal_zone_device_register("x86_pkg_temp",
 			thres_count,
 			(thres_count == MAX_NUMBER_OF_TRIPS) ?
@@ -468,30 +464,16 @@ static int pkg_temp_thermal_device_add(u
 static int pkg_temp_thermal_device_remove(unsigned int cpu)
 {
 	struct phy_dev_entry *phdev = pkg_temp_thermal_get_phy_entry(cpu);
-	u16 phys_proc_id = topology_physical_package_id(cpu);
+	int target;
 
 	if (!phdev)
 		return -ENODEV;
 
 	mutex_lock(&phy_dev_list_mutex);
-	/* If we are loosing the first cpu for this package, we need change */
-	if (phdev->first_cpu == cpu) {
-		int target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
-
-		phdev->first_cpu = target;
-		pr_debug("CPU %d down. New first_cpu%d\n", cpu, target);
-	}
-	/*
-	* It is possible that no siblings left as this was the last cpu
-	* going offline. We don't need to worry about this assignment
-	* as the phydev entry will be removed in this case and
-	* thermal zone is removed.
-	*/
-	--phdev->ref_cnt;
-	pr_debug("thermal_device_remove: pkg: %d cpu %d ref_cnt %d\n",
-					phys_proc_id, cpu, phdev->ref_cnt);
 
-	if (!phdev->ref_cnt) {
+	target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
+	/* This might be the last cpu in this package */
+	if (target >= nr_cpu_ids) {
 		thermal_zone_device_unregister(phdev->tzone);
 		/*
 		 * Restore original MSR value for package thermal
@@ -502,7 +484,10 @@ static int pkg_temp_thermal_device_remov
 			     phdev->start_pkg_therm_high);
 		list_del(&phdev->list);
 		kfree(phdev);
+	} else if (phdev->cpu == cpu) {
+		phdev->cpu = target;
 	}
+
 	mutex_unlock(&phy_dev_list_mutex);
 
 	return 0;
@@ -520,12 +505,6 @@ static int get_core_online(unsigned int
 			return -ENODEV;
 		if (pkg_temp_thermal_device_add(cpu))
 			return -ENODEV;
-	} else {
-		mutex_lock(&phy_dev_list_mutex);
-		++phdev->ref_cnt;
-		pr_debug("get_core_online: cpu %d ref_cnt %d\n",
-						cpu, phdev->ref_cnt);
-		mutex_unlock(&phy_dev_list_mutex);
 	}
 	INIT_DELAYED_WORK(&per_cpu(pkg_temp_thermal_threshold_work, cpu),
 			pkg_temp_thermal_threshold_work_fn);
@@ -615,10 +594,9 @@ static void __exit pkg_temp_thermal_exit
 	mutex_lock(&phy_dev_list_mutex);
 	list_for_each_entry_safe(phdev, n, &phy_dev_list, list) {
 		/* Retore old MSR value for package thermal interrupt */
-		wrmsr_on_cpu(phdev->first_cpu,
-			MSR_IA32_PACKAGE_THERM_INTERRUPT,
-			phdev->start_pkg_therm_low,
-			phdev->start_pkg_therm_high);
+		wrmsr_on_cpu(phdev->cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
+			     phdev->start_pkg_therm_low,
+			     phdev->start_pkg_therm_high);
 		thermal_zone_device_unregister(phdev->tzone);
 		list_del(&phdev->list);
 		kfree(phdev);

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [patch 06/12] thermal/x86_pkg_temp: Cleanup namespace
  2016-11-18  0:03 [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck Thomas Gleixner
                   ` (4 preceding siblings ...)
  2016-11-18  0:03 ` [patch 05/12] thermal/x86_pkg_temp: Get rid of ref counting Thomas Gleixner
@ 2016-11-18  0:03 ` Thomas Gleixner
  2016-11-18  0:03 ` [patch 07/12] thermal/x86_pkg_temp: Cleanup code some more Thomas Gleixner
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2016-11-18  0:03 UTC (permalink / raw)
  To: LKML
  Cc: Zhang Rui, Eduardo Valentin, linux-pm, Peter Zijlstra, x86, rt,
	Borislav Petkov

[-- Attachment #1: thermal-x86_pkg_temp--Cleanup-namespace.patch --]
[-- Type: text/plain, Size: 11367 bytes --]

Any randomly chosen struct name is more descriptive than phy_dev_entry. 

Rename the whole thing to struct pkg_device, which describes the content
reasonably well and use the same variable name throughout the code so it
gets readable. Rename the msr struct members as well.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/thermal/x86_pkg_temp_thermal.c |  166 +++++++++++++++------------------
 1 file changed, 76 insertions(+), 90 deletions(-)

--- a/drivers/thermal/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/x86_pkg_temp_thermal.c
@@ -57,14 +57,14 @@ MODULE_PARM_DESC(notify_delay_ms,
 /* Limit number of package temp zones */
 #define MAX_PKG_TEMP_ZONE_IDS	256
 
-struct phy_dev_entry {
-	struct list_head list;
-	u16 phys_proc_id;
-	u16 cpu;
-	u32 tj_max;
-	u32 start_pkg_therm_low;
-	u32 start_pkg_therm_high;
-	struct thermal_zone_device *tzone;
+struct pkg_device {
+	struct list_head		list;
+	u16				phys_proc_id;
+	u16				cpu;
+	u32				tj_max;
+	u32				msr_pkg_therm_low;
+	u32				msr_pkg_therm_high;
+	struct thermal_zone_device	*tzone;
 };
 
 static struct thermal_zone_params pkg_temp_tz_params = {
@@ -115,18 +115,17 @@ static int pkg_temp_debugfs_init(void)
 	return -ENOENT;
 }
 
-static struct phy_dev_entry
-			*pkg_temp_thermal_get_phy_entry(unsigned int cpu)
+static struct pkg_device *pkg_temp_thermal_get_dev(unsigned int cpu)
 {
 	u16 phys_proc_id = topology_physical_package_id(cpu);
-	struct phy_dev_entry *phy_ptr;
+	struct pkg_device *pkgdev;
 
 	mutex_lock(&phy_dev_list_mutex);
 
-	list_for_each_entry(phy_ptr, &phy_dev_list, list)
-		if (phy_ptr->phys_proc_id == phys_proc_id) {
+	list_for_each_entry(pkgdev, &phy_dev_list, list)
+		if (pkgdev->phys_proc_id == phys_proc_id) {
 			mutex_unlock(&phy_dev_list_mutex);
-			return phy_ptr;
+			return pkgdev;
 		}
 
 	mutex_unlock(&phy_dev_list_mutex);
@@ -165,36 +164,29 @@ static int get_tj_max(int cpu, u32 *tj_m
 
 static int sys_get_curr_temp(struct thermal_zone_device *tzd, int *temp)
 {
+	struct pkg_device *pkgdev = tzd->devdata;
 	u32 eax, edx;
-	struct phy_dev_entry *phy_dev_entry;
 
-	phy_dev_entry = tzd->devdata;
-	rdmsr_on_cpu(phy_dev_entry->cpu, MSR_IA32_PACKAGE_THERM_STATUS,
-		     &eax, &edx);
+	rdmsr_on_cpu(pkgdev->cpu, MSR_IA32_PACKAGE_THERM_STATUS, &eax, &edx);
 	if (eax & 0x80000000) {
-		*temp = phy_dev_entry->tj_max -
-				((eax >> 16) & 0x7f) * 1000;
+		*temp = pkgdev->tj_max - ((eax >> 16) & 0x7f) * 1000;
 		pr_debug("sys_get_curr_temp %d\n", *temp);
 		return 0;
 	}
-
 	return -EINVAL;
 }
 
 static int sys_get_trip_temp(struct thermal_zone_device *tzd,
-		int trip, int *temp)
+			     int trip, int *temp)
 {
-	u32 eax, edx;
-	struct phy_dev_entry *phy_dev_entry;
-	u32 mask, shift;
+	struct pkg_device *pkgdev = tzd->devdata;
 	unsigned long thres_reg_value;
+	u32 mask, shift, eax, edx;
 	int ret;
 
 	if (trip >= MAX_NUMBER_OF_TRIPS)
 		return -EINVAL;
 
-	phy_dev_entry = tzd->devdata;
-
 	if (trip) {
 		mask = THERM_MASK_THRESHOLD1;
 		shift = THERM_SHIFT_THRESHOLD1;
@@ -203,14 +195,14 @@ static int sys_get_trip_temp(struct ther
 		shift = THERM_SHIFT_THRESHOLD0;
 	}
 
-	ret = rdmsr_on_cpu(phy_dev_entry->cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
+	ret = rdmsr_on_cpu(pkgdev->cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
 			   &eax, &edx);
 	if (ret < 0)
 		return -EINVAL;
 
 	thres_reg_value = (eax & mask) >> shift;
 	if (thres_reg_value)
-		*temp = phy_dev_entry->tj_max - thres_reg_value * 1000;
+		*temp = pkgdev->tj_max - thres_reg_value * 1000;
 	else
 		*temp = 0;
 	pr_debug("sys_get_trip_temp %d\n", *temp);
@@ -218,20 +210,17 @@ static int sys_get_trip_temp(struct ther
 	return 0;
 }
 
-static int sys_set_trip_temp(struct thermal_zone_device *tzd, int trip,
-							int temp)
+static int
+sys_set_trip_temp(struct thermal_zone_device *tzd, int trip, int temp)
 {
-	u32 l, h;
-	struct phy_dev_entry *phy_dev_entry;
-	u32 mask, shift, intr;
+	struct pkg_device *pkgdev = tzd->devdata;
+	u32 l, h, mask, shift, intr;
 	int ret;
 
-	phy_dev_entry = tzd->devdata;
-
-	if (trip >= MAX_NUMBER_OF_TRIPS || temp >= phy_dev_entry->tj_max)
+	if (trip >= MAX_NUMBER_OF_TRIPS || temp >= pkgdev->tj_max)
 		return -EINVAL;
 
-	ret = rdmsr_on_cpu(phy_dev_entry->cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
+	ret = rdmsr_on_cpu(pkgdev->cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
 			   &l, &h);
 	if (ret < 0)
 		return -EINVAL;
@@ -250,19 +239,18 @@ static int sys_set_trip_temp(struct ther
 	* When users space sets a trip temperature == 0, which is indication
 	* that, it is no longer interested in receiving notifications.
 	*/
-	if (!temp)
+	if (!temp) {
 		l &= ~intr;
-	else {
-		l |= (phy_dev_entry->tj_max - temp)/1000 << shift;
+	} else {
+		l |= (pkgdev->tj_max - temp)/1000 << shift;
 		l |= intr;
 	}
 
-	return wrmsr_on_cpu(phy_dev_entry->cpu,	MSR_IA32_PACKAGE_THERM_INTERRUPT,
-			    l, h);
+	return wrmsr_on_cpu(pkgdev->cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT, l, h);
 }
 
-static int sys_get_trip_type(struct thermal_zone_device *thermal,
-		int trip, enum thermal_trip_type *type)
+static int sys_get_trip_type(struct thermal_zone_device *thermal, int trip,
+			     enum thermal_trip_type *type)
 {
 
 	*type = THERMAL_TRIP_PASSIVE;
@@ -315,11 +303,11 @@ static void pkg_temp_thermal_threshold_w
 	__u64 msr_val;
 	int cpu = smp_processor_id();
 	int phy_id = topology_physical_package_id(cpu);
-	struct phy_dev_entry *phdev = pkg_temp_thermal_get_phy_entry(cpu);
+	struct pkg_device *pkgdev = pkg_temp_thermal_get_dev(cpu);
 	bool notify = false;
 	unsigned long flags;
 
-	if (!phdev)
+	if (!pkgdev)
 		return;
 
 	spin_lock_irqsave(&pkg_work_lock, flags);
@@ -347,7 +335,7 @@ static void pkg_temp_thermal_threshold_w
 
 	if (notify) {
 		pr_debug("thermal_zone_device_update\n");
-		thermal_zone_device_update(phdev->tzone,
+		thermal_zone_device_update(pkgdev->tzone,
 					   THERMAL_EVENT_UNSPECIFIED);
 	}
 }
@@ -383,13 +371,11 @@ static int pkg_thermal_notify(__u64 msr_
 
 static int pkg_temp_thermal_device_add(unsigned int cpu)
 {
-	int err;
-	u32 tj_max;
-	struct phy_dev_entry *phy_dev_entry;
-	int thres_count;
-	u32 eax, ebx, ecx, edx;
-	u8 *temp;
+	u32 tj_max, eax, ebx, ecx, edx;
+	struct pkg_device *pkgdev;
+	int thres_count, err;
 	unsigned long flags;
+	u8 *temp;
 
 	cpuid(6, &eax, &ebx, &ecx, &edx);
 	thres_count = ebx & 0x07;
@@ -407,8 +393,8 @@ static int pkg_temp_thermal_device_add(u
 
 	mutex_lock(&phy_dev_list_mutex);
 
-	phy_dev_entry = kzalloc(sizeof(*phy_dev_entry), GFP_KERNEL);
-	if (!phy_dev_entry) {
+	pkgdev = kzalloc(sizeof(*pkgdev), GFP_KERNEL);
+	if (!pkgdev) {
 		err = -ENOMEM;
 		goto err_ret_unlock;
 	}
@@ -427,33 +413,32 @@ static int pkg_temp_thermal_device_add(u
 	pkg_work_scheduled[topology_physical_package_id(cpu)] = 0;
 	spin_unlock_irqrestore(&pkg_work_lock, flags);
 
-	phy_dev_entry->phys_proc_id = topology_physical_package_id(cpu);
-	phy_dev_entry->cpu = cpu;
-	phy_dev_entry->tj_max = tj_max;
-	phy_dev_entry->tzone = thermal_zone_device_register("x86_pkg_temp",
+	pkgdev->phys_proc_id = topology_physical_package_id(cpu);
+	pkgdev->cpu = cpu;
+	pkgdev->tj_max = tj_max;
+	pkgdev->tzone = thermal_zone_device_register("x86_pkg_temp",
 			thres_count,
-			(thres_count == MAX_NUMBER_OF_TRIPS) ?
-				0x03 : 0x01,
-			phy_dev_entry, &tzone_ops, &pkg_temp_tz_params, 0, 0);
-	if (IS_ERR(phy_dev_entry->tzone)) {
-		err = PTR_ERR(phy_dev_entry->tzone);
+			(thres_count == MAX_NUMBER_OF_TRIPS) ? 0x03 : 0x01,
+			pkgdev, &tzone_ops, &pkg_temp_tz_params, 0, 0);
+	if (IS_ERR(pkgdev->tzone)) {
+		err = PTR_ERR(pkgdev->tzone);
 		goto err_ret_free;
 	}
 	/* Store MSR value for package thermal interrupt, to restore at exit */
 	rdmsr_on_cpu(cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
-				&phy_dev_entry->start_pkg_therm_low,
-				&phy_dev_entry->start_pkg_therm_high);
+		     &pkgdev->msr_pkg_therm_low,
+		     &pkgdev->msr_pkg_therm_high);
 
-	list_add_tail(&phy_dev_entry->list, &phy_dev_list);
+	list_add_tail(&pkgdev->list, &phy_dev_list);
 	pr_debug("pkg_temp_thermal_device_add :phy_id %d cpu %d\n",
-			phy_dev_entry->phys_proc_id, cpu);
+		 pkgdev->phys_proc_id, cpu);
 
 	mutex_unlock(&phy_dev_list_mutex);
 
 	return 0;
 
 err_ret_free:
-	kfree(phy_dev_entry);
+	kfree(pkgdev);
 err_ret_unlock:
 	mutex_unlock(&phy_dev_list_mutex);
 
@@ -463,10 +448,10 @@ static int pkg_temp_thermal_device_add(u
 
 static int pkg_temp_thermal_device_remove(unsigned int cpu)
 {
-	struct phy_dev_entry *phdev = pkg_temp_thermal_get_phy_entry(cpu);
+	struct pkg_device *pkgdev = pkg_temp_thermal_get_dev(cpu);
 	int target;
 
-	if (!phdev)
+	if (!pkgdev)
 		return -ENODEV;
 
 	mutex_lock(&phy_dev_list_mutex);
@@ -474,18 +459,18 @@ static int pkg_temp_thermal_device_remov
 	target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
 	/* This might be the last cpu in this package */
 	if (target >= nr_cpu_ids) {
-		thermal_zone_device_unregister(phdev->tzone);
+		thermal_zone_device_unregister(pkgdev->tzone);
 		/*
 		 * Restore original MSR value for package thermal
 		 * interrupt.
 		 */
 		wrmsr_on_cpu(cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
-			     phdev->start_pkg_therm_low,
-			     phdev->start_pkg_therm_high);
-		list_del(&phdev->list);
-		kfree(phdev);
-	} else if (phdev->cpu == cpu) {
-		phdev->cpu = target;
+			     pkgdev->msr_pkg_therm_low,
+			     pkgdev->msr_pkg_therm_high);
+		list_del(&pkgdev->list);
+		kfree(pkgdev);
+	} else if (pkgdev->cpu == cpu) {
+		pkgdev->cpu = target;
 	}
 
 	mutex_unlock(&phy_dev_list_mutex);
@@ -495,19 +480,20 @@ static int pkg_temp_thermal_device_remov
 
 static int get_core_online(unsigned int cpu)
 {
+	struct pkg_device *pkgdev = pkg_temp_thermal_get_dev(cpu);
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
-	struct phy_dev_entry *phdev = pkg_temp_thermal_get_phy_entry(cpu);
 
 	/* Check if there is already an instance for this package */
-	if (!phdev) {
+	if (!pkgdev) {
 		if (!cpu_has(c, X86_FEATURE_DTHERM) ||
 					!cpu_has(c, X86_FEATURE_PTS))
 			return -ENODEV;
 		if (pkg_temp_thermal_device_add(cpu))
 			return -ENODEV;
 	}
+
 	INIT_DELAYED_WORK(&per_cpu(pkg_temp_thermal_threshold_work, cpu),
-			pkg_temp_thermal_threshold_work_fn);
+			  pkg_temp_thermal_threshold_work_fn);
 
 	pr_debug("get_core_online: cpu %d successful\n", cpu);
 
@@ -583,7 +569,7 @@ static int __init pkg_temp_thermal_init(
 
 static void __exit pkg_temp_thermal_exit(void)
 {
-	struct phy_dev_entry *phdev, *n;
+	struct pkg_device *pkgdev, *n;
 	int i;
 
 	platform_thermal_package_notify = NULL;
@@ -592,14 +578,14 @@ static void __exit pkg_temp_thermal_exit
 	cpu_notifier_register_begin();
 	__unregister_hotcpu_notifier(&pkg_temp_thermal_notifier);
 	mutex_lock(&phy_dev_list_mutex);
-	list_for_each_entry_safe(phdev, n, &phy_dev_list, list) {
+	list_for_each_entry_safe(pkgdev, n, &phy_dev_list, list) {
 		/* Retore old MSR value for package thermal interrupt */
-		wrmsr_on_cpu(phdev->cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
-			     phdev->start_pkg_therm_low,
-			     phdev->start_pkg_therm_high);
-		thermal_zone_device_unregister(phdev->tzone);
-		list_del(&phdev->list);
-		kfree(phdev);
+		wrmsr_on_cpu(pkgdev->cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
+			     pkgdev->msr_pkg_therm_low,
+			     pkgdev->msr_pkg_therm_high);
+		thermal_zone_device_unregister(pkgdev->tzone);
+		list_del(&pkgdev->list);
+		kfree(pkgdev);
 	}
 	mutex_unlock(&phy_dev_list_mutex);
 	for_each_online_cpu(i)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [patch 07/12] thermal/x86_pkg_temp: Cleanup code some more
  2016-11-18  0:03 [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck Thomas Gleixner
                   ` (5 preceding siblings ...)
  2016-11-18  0:03 ` [patch 06/12] thermal/x86_pkg_temp: Cleanup namespace Thomas Gleixner
@ 2016-11-18  0:03 ` Thomas Gleixner
  2016-11-18  0:03 ` [patch 08/12] thermal/x86_pkg_temp: Sanitize locking Thomas Gleixner
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2016-11-18  0:03 UTC (permalink / raw)
  To: LKML
  Cc: Zhang Rui, Eduardo Valentin, linux-pm, Peter Zijlstra, x86, rt,
	Borislav Petkov

[-- Attachment #1: thermal-x86_pkg_temp--Cleanup-code-some-more.patch --]
[-- Type: text/plain, Size: 5563 bytes --]

Coding style fixups and replacement of overly complex constructs and random
error codes instead of returning the real ones. This mess makes the eyes bleeding.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/thermal/x86_pkg_temp_thermal.c |   81 ++++++++++++---------------------
 1 file changed, 30 insertions(+), 51 deletions(-)

--- a/drivers/thermal/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/x86_pkg_temp_thermal.c
@@ -139,27 +139,17 @@ static struct pkg_device *pkg_temp_therm
 */
 static int get_tj_max(int cpu, u32 *tj_max)
 {
-	u32 eax, edx;
-	u32 val;
+	u32 eax, edx, val;
 	int err;
 
 	err = rdmsr_safe_on_cpu(cpu, MSR_IA32_TEMPERATURE_TARGET, &eax, &edx);
 	if (err)
-		goto err_ret;
-	else {
-		val = (eax >> 16) & 0xff;
-		if (val)
-			*tj_max = val * 1000;
-		else {
-			err = -EINVAL;
-			goto err_ret;
-		}
-	}
+		return err;
 
-	return 0;
-err_ret:
-	*tj_max = 0;
-	return err;
+	val = (eax >> 16) & 0xff;
+	*tj_max = val * 1000;
+
+	return val ? 0 : -EINVAL;
 }
 
 static int sys_get_curr_temp(struct thermal_zone_device *tzd, int *temp)
@@ -198,7 +188,7 @@ static int sys_get_trip_temp(struct ther
 	ret = rdmsr_on_cpu(pkgdev->cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
 			   &eax, &edx);
 	if (ret < 0)
-		return -EINVAL;
+		return ret;
 
 	thres_reg_value = (eax & mask) >> shift;
 	if (thres_reg_value)
@@ -223,7 +213,7 @@ sys_set_trip_temp(struct thermal_zone_de
 	ret = rdmsr_on_cpu(pkgdev->cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
 			   &l, &h);
 	if (ret < 0)
-		return -EINVAL;
+		return ret;
 
 	if (trip) {
 		mask = THERM_MASK_THRESHOLD1;
@@ -252,9 +242,7 @@ sys_set_trip_temp(struct thermal_zone_de
 static int sys_get_trip_type(struct thermal_zone_device *thermal, int trip,
 			     enum thermal_trip_type *type)
 {
-
 	*type = THERMAL_TRIP_PASSIVE;
-
 	return 0;
 }
 
@@ -274,8 +262,8 @@ static bool pkg_thermal_rate_control(voi
 /* Enable threshold interrupt on local package/cpu */
 static inline void enable_pkg_thres_interrupt(void)
 {
-	u32 l, h;
 	u8 thres_0, thres_1;
+	u32 l, h;
 
 	rdmsr(MSR_IA32_PACKAGE_THERM_INTERRUPT, l, h);
 	/* only enable/disable if it had valid threshold value */
@@ -292,20 +280,21 @@ static inline void enable_pkg_thres_inte
 static inline void disable_pkg_thres_interrupt(void)
 {
 	u32 l, h;
+
 	rdmsr(MSR_IA32_PACKAGE_THERM_INTERRUPT, l, h);
-	wrmsr(MSR_IA32_PACKAGE_THERM_INTERRUPT,
-			l & (~THERM_INT_THRESHOLD0_ENABLE) &
-				(~THERM_INT_THRESHOLD1_ENABLE), h);
+
+	l &= ~(THERM_INT_THRESHOLD0_ENABLE | THERM_INT_THRESHOLD1_ENABLE);
+	wrmsr(MSR_IA32_PACKAGE_THERM_INTERRUPT, l, h);
 }
 
 static void pkg_temp_thermal_threshold_work_fn(struct work_struct *work)
 {
-	__u64 msr_val;
 	int cpu = smp_processor_id();
-	int phy_id = topology_physical_package_id(cpu);
 	struct pkg_device *pkgdev = pkg_temp_thermal_get_dev(cpu);
+	int phy_id = topology_physical_package_id(cpu);
 	bool notify = false;
 	unsigned long flags;
+	u64 msr_val, wr_val;
 
 	if (!pkgdev)
 		return;
@@ -320,14 +309,9 @@ static void pkg_temp_thermal_threshold_w
 	spin_unlock_irqrestore(&pkg_work_lock, flags);
 
 	rdmsrl(MSR_IA32_PACKAGE_THERM_STATUS, msr_val);
-	if (msr_val & THERM_LOG_THRESHOLD0) {
-		wrmsrl(MSR_IA32_PACKAGE_THERM_STATUS,
-				msr_val & ~THERM_LOG_THRESHOLD0);
-		notify = true;
-	}
-	if (msr_val & THERM_LOG_THRESHOLD1) {
-		wrmsrl(MSR_IA32_PACKAGE_THERM_STATUS,
-				msr_val & ~THERM_LOG_THRESHOLD1);
+	wr_val = msr_val & ~(THERM_LOG_THRESHOLD0 | THERM_LOG_THRESHOLD1);
+	if (wr_val != msr_val) {
+		wrmsrl(MSR_IA32_PACKAGE_THERM_STATUS, wr_val);
 		notify = true;
 	}
 
@@ -340,11 +324,11 @@ static void pkg_temp_thermal_threshold_w
 	}
 }
 
-static int pkg_thermal_notify(__u64 msr_val)
+static int pkg_thermal_notify(u64 msr_val)
 {
-	unsigned long flags;
 	int cpu = smp_processor_id();
 	int phy_id = topology_physical_package_id(cpu);
+	unsigned long flags;
 
 	/*
 	* When a package is in interrupted state, all CPU's in that package
@@ -483,21 +467,17 @@ static int get_core_online(unsigned int
 	struct pkg_device *pkgdev = pkg_temp_thermal_get_dev(cpu);
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
 
-	/* Check if there is already an instance for this package */
-	if (!pkgdev) {
-		if (!cpu_has(c, X86_FEATURE_DTHERM) ||
-					!cpu_has(c, X86_FEATURE_PTS))
-			return -ENODEV;
-		if (pkg_temp_thermal_device_add(cpu))
-			return -ENODEV;
-	}
+	/* Paranoia check */
+	if (!cpu_has(c, X86_FEATURE_DTHERM) || !cpu_has(c, X86_FEATURE_PTS))
+		return -ENODEV;
 
 	INIT_DELAYED_WORK(&per_cpu(pkg_temp_thermal_threshold_work, cpu),
 			  pkg_temp_thermal_threshold_work_fn);
 
-	pr_debug("get_core_online: cpu %d successful\n", cpu);
-
-	return 0;
+	/* If the package exists, nothing to do */
+	if (pkgdev)
+		return 0;
+	return pkg_temp_thermal_device_add(cpu);
 }
 
 static void put_core_offline(unsigned int cpu)
@@ -555,8 +535,8 @@ static int __init pkg_temp_thermal_init(
 	platform_thermal_package_notify = pkg_thermal_notify;
 	platform_thermal_package_rate_control = pkg_thermal_rate_control;
 
-	pkg_temp_debugfs_init(); /* Don't care if fails */
-
+	 /* Don't care if it fails */
+	pkg_temp_debugfs_init();
 	return 0;
 
 err_ret:
@@ -566,6 +546,7 @@ static int __init pkg_temp_thermal_init(
 	kfree(pkg_work_scheduled);
 	return -ENODEV;
 }
+module_init(pkg_temp_thermal_init)
 
 static void __exit pkg_temp_thermal_exit(void)
 {
@@ -597,8 +578,6 @@ static void __exit pkg_temp_thermal_exit
 
 	debugfs_remove_recursive(debugfs);
 }
-
-module_init(pkg_temp_thermal_init)
 module_exit(pkg_temp_thermal_exit)
 
 MODULE_DESCRIPTION("X86 PKG TEMP Thermal Driver");

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [patch 08/12] thermal/x86_pkg_temp: Sanitize locking
  2016-11-18  0:03 [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck Thomas Gleixner
                   ` (6 preceding siblings ...)
  2016-11-18  0:03 ` [patch 07/12] thermal/x86_pkg_temp: Cleanup code some more Thomas Gleixner
@ 2016-11-18  0:03 ` Thomas Gleixner
  2016-11-18  0:03 ` [patch 09/12] thermal/x86_pkg_temp: Move work scheduled flag into package struct Thomas Gleixner
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2016-11-18  0:03 UTC (permalink / raw)
  To: LKML
  Cc: Zhang Rui, Eduardo Valentin, linux-pm, Peter Zijlstra, x86, rt,
	Borislav Petkov

[-- Attachment #1: thermal-x86_pkg_temp--Sanitize-locking.patch --]
[-- Type: text/plain, Size: 12275 bytes --]

The work cancellation code, the thermal zone unregistering, the work code
and the interrupt notification function are racy against each other and
against cpu hotplug and module exit. The random locking sprinkeled all
over the place does not help anything and probably exists to make people
feel good. The resulting issues (mainly use after free) are probably
hard to trigger, but they clearly exist

Protect the package list with a spinlock so it can be accessed from the
interrupt notifier and also from the work function. The add/removal code in
the hotplug callbacks take the lock for list manipulation. That makes sure
that on removal neither the interrupt notifier nor the work function can
access the about to be freed package structure anymore.

The thermal zone unregistering is another trainwreck. It's not serialized
against the work function. So unregistering the zone device can race with
the work function and cause havoc.

Protect the thermal zone with a mutex, which is held in the work
function to make sure that the zone device is not being unregistered
concurrently.

To solve the module exit issues, we simply invoke the cpu offline callback
and let it work its magic.

Use proper names for the locks so it's clear what they are for and add a
pile of comments to explain the protection rules.

It's amazing that fixing the locking and adding 30 lines of comments
explaining it still removes more lines than it adds.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/thermal/x86_pkg_temp_thermal.c |  216 +++++++++++++++------------------
 1 file changed, 104 insertions(+), 112 deletions(-)

--- a/drivers/thermal/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/x86_pkg_temp_thermal.c
@@ -73,7 +73,10 @@ static struct thermal_zone_params pkg_te
 
 /* List maintaining number of package instances */
 static LIST_HEAD(phy_dev_list);
-static DEFINE_MUTEX(phy_dev_list_mutex);
+/* Serializes interrupt notification, work and hotplug */
+static DEFINE_SPINLOCK(pkg_temp_lock);
+/* Protects zone operation in the work function against hotplug removal */
+static DEFINE_MUTEX(thermal_zone_mutex);
 
 /* Interrupt to work function schedule queue */
 static DEFINE_PER_CPU(struct delayed_work, pkg_temp_thermal_threshold_work);
@@ -81,8 +84,6 @@ static DEFINE_PER_CPU(struct delayed_wor
 /* To track if the work is already scheduled on a package */
 static u8 *pkg_work_scheduled;
 
-/* Spin lock to prevent races with pkg_work_scheduled */
-static spinlock_t pkg_work_lock;
 static u16 max_phy_id;
 
 /* Debug counters to show using debugfs */
@@ -115,21 +116,23 @@ static int pkg_temp_debugfs_init(void)
 	return -ENOENT;
 }
 
+/*
+ * Protection:
+ *
+ * - cpu hotplug: Read serialized by cpu hotplug lock
+ *		  Write must hold pkg_temp_lock
+ *
+ * - Other callsites: Must hold pkg_temp_lock
+ */
 static struct pkg_device *pkg_temp_thermal_get_dev(unsigned int cpu)
 {
 	u16 phys_proc_id = topology_physical_package_id(cpu);
 	struct pkg_device *pkgdev;
 
-	mutex_lock(&phy_dev_list_mutex);
-
-	list_for_each_entry(pkgdev, &phy_dev_list, list)
-		if (pkgdev->phys_proc_id == phys_proc_id) {
-			mutex_unlock(&phy_dev_list_mutex);
+	list_for_each_entry(pkgdev, &phy_dev_list, list) {
+		if (pkgdev->phys_proc_id == phys_proc_id)
 			return pkgdev;
-		}
-
-	mutex_unlock(&phy_dev_list_mutex);
-
+	}
 	return NULL;
 }
 
@@ -289,67 +292,66 @@ static inline void disable_pkg_thres_int
 
 static void pkg_temp_thermal_threshold_work_fn(struct work_struct *work)
 {
-	int cpu = smp_processor_id();
-	struct pkg_device *pkgdev = pkg_temp_thermal_get_dev(cpu);
-	int phy_id = topology_physical_package_id(cpu);
-	bool notify = false;
-	unsigned long flags;
+	struct thermal_zone_device *tzone = NULL;
+	int phy_id, cpu = smp_processor_id();
+	struct pkg_device *pkgdev;
 	u64 msr_val, wr_val;
 
-	if (!pkgdev)
-		return;
-
-	spin_lock_irqsave(&pkg_work_lock, flags);
+	mutex_lock(&thermal_zone_mutex);
+	spin_lock_irq(&pkg_temp_lock);
 	++pkg_work_cnt;
-	if (unlikely(phy_id > max_phy_id)) {
-		spin_unlock_irqrestore(&pkg_work_lock, flags);
+
+	pkgdev = pkg_temp_thermal_get_dev(cpu);
+	if (!pkgdev) {
+		spin_unlock_irq(&pkg_temp_lock);
+		mutex_unlock(&thermal_zone_mutex);
 		return;
 	}
+
 	pkg_work_scheduled[phy_id] = 0;
-	spin_unlock_irqrestore(&pkg_work_lock, flags);
 
 	rdmsrl(MSR_IA32_PACKAGE_THERM_STATUS, msr_val);
 	wr_val = msr_val & ~(THERM_LOG_THRESHOLD0 | THERM_LOG_THRESHOLD1);
 	if (wr_val != msr_val) {
 		wrmsrl(MSR_IA32_PACKAGE_THERM_STATUS, wr_val);
-		notify = true;
+		tzone = pkgdev->tzone;
 	}
 
 	enable_pkg_thres_interrupt();
+	spin_unlock_irq(&pkg_temp_lock);
 
-	if (notify) {
-		pr_debug("thermal_zone_device_update\n");
-		thermal_zone_device_update(pkgdev->tzone,
-					   THERMAL_EVENT_UNSPECIFIED);
-	}
+	/*
+	 * If tzone is not NULL, then thermal_zone_mutex will prevent the
+	 * concurrent removal in the cpu offline callback.
+	 */
+	if (tzone)
+		thermal_zone_device_update(tzone, THERMAL_EVENT_UNSPECIFIED);
+
+	mutex_unlock(&thermal_zone_mutex);
 }
 
 static int pkg_thermal_notify(u64 msr_val)
 {
 	int cpu = smp_processor_id();
 	int phy_id = topology_physical_package_id(cpu);
+	struct pkg_device *pkgdev;
 	unsigned long flags;
 
-	/*
-	* When a package is in interrupted state, all CPU's in that package
-	* are in the same interrupt state. So scheduling on any one CPU in
-	* the package is enough and simply return for others.
-	*/
-	spin_lock_irqsave(&pkg_work_lock, flags);
+	spin_lock_irqsave(&pkg_temp_lock, flags);
 	++pkg_interrupt_cnt;
-	if (unlikely(phy_id > max_phy_id) || unlikely(!pkg_work_scheduled) ||
-			pkg_work_scheduled[phy_id]) {
-		disable_pkg_thres_interrupt();
-		spin_unlock_irqrestore(&pkg_work_lock, flags);
-		return -EINVAL;
-	}
-	pkg_work_scheduled[phy_id] = 1;
-	spin_unlock_irqrestore(&pkg_work_lock, flags);
 
 	disable_pkg_thres_interrupt();
-	schedule_delayed_work_on(cpu,
+
+	/* Work is per package, so scheduling it once is enough. */
+	pkgdev = pkg_temp_thermal_get_dev(cpu);
+	if (pkgdev && pkg_work_scheduled && !pkg_work_scheduled[phy_id]) {
+		pkg_work_scheduled[phy_id] = 1;
+		schedule_delayed_work_on(cpu,
 				&per_cpu(pkg_temp_thermal_threshold_work, cpu),
 				msecs_to_jiffies(notify_delay_ms));
+	}
+
+	spin_unlock_irqrestore(&pkg_temp_lock, flags);
 	return 0;
 }
 
@@ -373,29 +375,25 @@ static int pkg_temp_thermal_device_add(u
 
 	err = get_tj_max(cpu, &tj_max);
 	if (err)
-		goto err_ret;
-
-	mutex_lock(&phy_dev_list_mutex);
+		return err;
 
 	pkgdev = kzalloc(sizeof(*pkgdev), GFP_KERNEL);
-	if (!pkgdev) {
-		err = -ENOMEM;
-		goto err_ret_unlock;
-	}
+	if (!pkgdev)
+		return -ENOMEM;
 
-	spin_lock_irqsave(&pkg_work_lock, flags);
+	spin_lock_irqsave(&pkg_temp_lock, flags);
 	if (topology_physical_package_id(cpu) > max_phy_id)
 		max_phy_id = topology_physical_package_id(cpu);
 	temp = krealloc(pkg_work_scheduled,
 			(max_phy_id+1) * sizeof(u8), GFP_ATOMIC);
 	if (!temp) {
-		spin_unlock_irqrestore(&pkg_work_lock, flags);
-		err = -ENOMEM;
-		goto err_ret_free;
+		spin_unlock_irqrestore(&pkg_temp_lock, flags);
+		kfree(pkgdev);
+		return -ENOMEM;
 	}
 	pkg_work_scheduled = temp;
 	pkg_work_scheduled[topology_physical_package_id(cpu)] = 0;
-	spin_unlock_irqrestore(&pkg_work_lock, flags);
+	spin_unlock_irqrestore(&pkg_temp_lock, flags);
 
 	pkgdev->phys_proc_id = topology_physical_package_id(cpu);
 	pkgdev->cpu = cpu;
@@ -406,60 +404,78 @@ static int pkg_temp_thermal_device_add(u
 			pkgdev, &tzone_ops, &pkg_temp_tz_params, 0, 0);
 	if (IS_ERR(pkgdev->tzone)) {
 		err = PTR_ERR(pkgdev->tzone);
-		goto err_ret_free;
+		kfree(pkgdev);
+		return err;
 	}
 	/* Store MSR value for package thermal interrupt, to restore at exit */
 	rdmsr_on_cpu(cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
 		     &pkgdev->msr_pkg_therm_low,
 		     &pkgdev->msr_pkg_therm_high);
 
+	spin_lock_irq(&pkg_temp_lock);
 	list_add_tail(&pkgdev->list, &phy_dev_list);
-	pr_debug("pkg_temp_thermal_device_add :phy_id %d cpu %d\n",
-		 pkgdev->phys_proc_id, cpu);
-
-	mutex_unlock(&phy_dev_list_mutex);
-
+	spin_unlock_irq(&pkg_temp_lock);
 	return 0;
-
-err_ret_free:
-	kfree(pkgdev);
-err_ret_unlock:
-	mutex_unlock(&phy_dev_list_mutex);
-
-err_ret:
-	return err;
 }
 
-static int pkg_temp_thermal_device_remove(unsigned int cpu)
+static void put_core_offline(unsigned int cpu)
 {
+	int target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
 	struct pkg_device *pkgdev = pkg_temp_thermal_get_dev(cpu);
-	int target;
+	bool lastcpu;
 
 	if (!pkgdev)
-		return -ENODEV;
+		return;
 
-	mutex_lock(&phy_dev_list_mutex);
+	lastcpu = target >= nr_cpu_ids;
+	/*
+	 * Remove the sysfs files, if this is the last cpu in the package
+	 * before doing further cleanups.
+	 */
+	if (lastcpu) {
+		struct thermal_zone_device *tzone = pkgdev->tzone;
+
+		/*
+		 * We must protect against a work function calling
+		 * thermal_zone_update, after/while unregister. We null out
+		 * the pointer under the zone mutex, so the worker function
+		 * won't try to call.
+		 */
+		mutex_lock(&thermal_zone_mutex);
+		pkgdev->tzone = NULL;
+		mutex_unlock(&thermal_zone_mutex);
 
-	target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
-	/* This might be the last cpu in this package */
-	if (target >= nr_cpu_ids) {
-		thermal_zone_device_unregister(pkgdev->tzone);
+		thermal_zone_device_unregister(tzone);
+	}
+
+	/*
+	 * If this is the last CPU in the package, restore the interrupt
+	 * MSR and remove the package reference from the array.
+	 */
+	if (lastcpu) {
+		/* Protect against work and interrupts */
+		spin_lock_irq(&pkg_temp_lock);
+		list_del(&pkgdev->list);
 		/*
-		 * Restore original MSR value for package thermal
-		 * interrupt.
+		 * After this point nothing touches the MSR anymore. We
+		 * must drop the lock to make the cross cpu call. This goes
+		 * away once we move that code to the hotplug state machine.
 		 */
+		spin_unlock_irq(&pkg_temp_lock);
 		wrmsr_on_cpu(cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
 			     pkgdev->msr_pkg_therm_low,
 			     pkgdev->msr_pkg_therm_high);
-		list_del(&pkgdev->list);
 		kfree(pkgdev);
-	} else if (pkgdev->cpu == cpu) {
-		pkgdev->cpu = target;
 	}
 
-	mutex_unlock(&phy_dev_list_mutex);
-
-	return 0;
+	/*
+	 * Note, this is broken when work was really scheduled on the
+	 * outgoing cpu because this will leave the work_scheduled flag set
+	 * and the thermal interrupts disabled. Will be fixed in the next
+	 * step as there is no way to fix it in a sane way with the per cpu
+	 * work nonsense.
+	 */
+	cancel_delayed_work_sync(&per_cpu(pkg_temp_thermal_threshold_work, cpu));
 }
 
 static int get_core_online(unsigned int cpu)
@@ -480,15 +496,6 @@ static int get_core_online(unsigned int
 	return pkg_temp_thermal_device_add(cpu);
 }
 
-static void put_core_offline(unsigned int cpu)
-{
-	if (!pkg_temp_thermal_device_remove(cpu))
-		cancel_delayed_work_sync(
-			&per_cpu(pkg_temp_thermal_threshold_work, cpu));
-
-	pr_debug("put_core_offline: cpu %d\n", cpu);
-}
-
 static int pkg_temp_thermal_cpu_callback(struct notifier_block *nfb,
 				 unsigned long action, void *hcpu)
 {
@@ -523,8 +530,6 @@ static int __init pkg_temp_thermal_init(
 	if (!x86_match_cpu(pkg_temp_thermal_ids))
 		return -ENODEV;
 
-	spin_lock_init(&pkg_work_lock);
-
 	cpu_notifier_register_begin();
 	for_each_online_cpu(i)
 		if (get_core_online(i))
@@ -550,7 +555,6 @@ module_init(pkg_temp_thermal_init)
 
 static void __exit pkg_temp_thermal_exit(void)
 {
-	struct pkg_device *pkgdev, *n;
 	int i;
 
 	platform_thermal_package_notify = NULL;
@@ -558,20 +562,8 @@ static void __exit pkg_temp_thermal_exit
 
 	cpu_notifier_register_begin();
 	__unregister_hotcpu_notifier(&pkg_temp_thermal_notifier);
-	mutex_lock(&phy_dev_list_mutex);
-	list_for_each_entry_safe(pkgdev, n, &phy_dev_list, list) {
-		/* Retore old MSR value for package thermal interrupt */
-		wrmsr_on_cpu(pkgdev->cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
-			     pkgdev->msr_pkg_therm_low,
-			     pkgdev->msr_pkg_therm_high);
-		thermal_zone_device_unregister(pkgdev->tzone);
-		list_del(&pkgdev->list);
-		kfree(pkgdev);
-	}
-	mutex_unlock(&phy_dev_list_mutex);
 	for_each_online_cpu(i)
-		cancel_delayed_work_sync(
-			&per_cpu(pkg_temp_thermal_threshold_work, i));
+		put_core_offline(i);
 	cpu_notifier_register_done();
 
 	kfree(pkg_work_scheduled);

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [patch 09/12] thermal/x86_pkg_temp: Move work scheduled flag into package struct
  2016-11-18  0:03 [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck Thomas Gleixner
                   ` (7 preceding siblings ...)
  2016-11-18  0:03 ` [patch 08/12] thermal/x86_pkg_temp: Sanitize locking Thomas Gleixner
@ 2016-11-18  0:03 ` Thomas Gleixner
  2016-11-18  0:03 ` [patch 10/12] thermal/x86_pkg_temp: Move work " Thomas Gleixner
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2016-11-18  0:03 UTC (permalink / raw)
  To: LKML
  Cc: Zhang Rui, Eduardo Valentin, linux-pm, Peter Zijlstra, x86, rt,
	Borislav Petkov

[-- Attachment #1: thermal-x86_pkg_temp--Move-work-scheduled-flag-into-package-struct.patch --]
[-- Type: text/plain, Size: 3936 bytes --]

Storage for a boolean information whether work is scheduled for a package
is kept in separate allocated storage, which is resized when the number of
detected packages grows.

With the proper locking in place this is a completely pointless exercise
because we can simply stick it into the per package struct.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/thermal/x86_pkg_temp_thermal.c |   35 ++++-----------------------------
 1 file changed, 5 insertions(+), 30 deletions(-)

--- a/drivers/thermal/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/x86_pkg_temp_thermal.c
@@ -61,6 +61,7 @@ struct pkg_device {
 	struct list_head		list;
 	u16				phys_proc_id;
 	u16				cpu;
+	bool				work_scheduled;
 	u32				tj_max;
 	u32				msr_pkg_therm_low;
 	u32				msr_pkg_therm_high;
@@ -81,11 +82,6 @@ static DEFINE_MUTEX(thermal_zone_mutex);
 /* Interrupt to work function schedule queue */
 static DEFINE_PER_CPU(struct delayed_work, pkg_temp_thermal_threshold_work);
 
-/* To track if the work is already scheduled on a package */
-static u8 *pkg_work_scheduled;
-
-static u16 max_phy_id;
-
 /* Debug counters to show using debugfs */
 static struct dentry *debugfs;
 static unsigned int pkg_interrupt_cnt;
@@ -293,7 +289,7 @@ static inline void disable_pkg_thres_int
 static void pkg_temp_thermal_threshold_work_fn(struct work_struct *work)
 {
 	struct thermal_zone_device *tzone = NULL;
-	int phy_id, cpu = smp_processor_id();
+	int cpu = smp_processor_id();
 	struct pkg_device *pkgdev;
 	u64 msr_val, wr_val;
 
@@ -307,8 +303,7 @@ static void pkg_temp_thermal_threshold_w
 		mutex_unlock(&thermal_zone_mutex);
 		return;
 	}
-
-	pkg_work_scheduled[phy_id] = 0;
+	pkgdev->work_scheduled = false;
 
 	rdmsrl(MSR_IA32_PACKAGE_THERM_STATUS, msr_val);
 	wr_val = msr_val & ~(THERM_LOG_THRESHOLD0 | THERM_LOG_THRESHOLD1);
@@ -333,7 +328,6 @@ static void pkg_temp_thermal_threshold_w
 static int pkg_thermal_notify(u64 msr_val)
 {
 	int cpu = smp_processor_id();
-	int phy_id = topology_physical_package_id(cpu);
 	struct pkg_device *pkgdev;
 	unsigned long flags;
 
@@ -344,8 +338,8 @@ static int pkg_thermal_notify(u64 msr_va
 
 	/* Work is per package, so scheduling it once is enough. */
 	pkgdev = pkg_temp_thermal_get_dev(cpu);
-	if (pkgdev && pkg_work_scheduled && !pkg_work_scheduled[phy_id]) {
-		pkg_work_scheduled[phy_id] = 1;
+	if (pkgdev && !pkgdev->work_scheduled) {
+		pkgdev->work_scheduled = true;
 		schedule_delayed_work_on(cpu,
 				&per_cpu(pkg_temp_thermal_threshold_work, cpu),
 				msecs_to_jiffies(notify_delay_ms));
@@ -360,8 +354,6 @@ static int pkg_temp_thermal_device_add(u
 	u32 tj_max, eax, ebx, ecx, edx;
 	struct pkg_device *pkgdev;
 	int thres_count, err;
-	unsigned long flags;
-	u8 *temp;
 
 	cpuid(6, &eax, &ebx, &ecx, &edx);
 	thres_count = ebx & 0x07;
@@ -381,20 +373,6 @@ static int pkg_temp_thermal_device_add(u
 	if (!pkgdev)
 		return -ENOMEM;
 
-	spin_lock_irqsave(&pkg_temp_lock, flags);
-	if (topology_physical_package_id(cpu) > max_phy_id)
-		max_phy_id = topology_physical_package_id(cpu);
-	temp = krealloc(pkg_work_scheduled,
-			(max_phy_id+1) * sizeof(u8), GFP_ATOMIC);
-	if (!temp) {
-		spin_unlock_irqrestore(&pkg_temp_lock, flags);
-		kfree(pkgdev);
-		return -ENOMEM;
-	}
-	pkg_work_scheduled = temp;
-	pkg_work_scheduled[topology_physical_package_id(cpu)] = 0;
-	spin_unlock_irqrestore(&pkg_temp_lock, flags);
-
 	pkgdev->phys_proc_id = topology_physical_package_id(cpu);
 	pkgdev->cpu = cpu;
 	pkgdev->tj_max = tj_max;
@@ -548,7 +526,6 @@ static int __init pkg_temp_thermal_init(
 	for_each_online_cpu(i)
 		put_core_offline(i);
 	cpu_notifier_register_done();
-	kfree(pkg_work_scheduled);
 	return -ENODEV;
 }
 module_init(pkg_temp_thermal_init)
@@ -566,8 +543,6 @@ static void __exit pkg_temp_thermal_exit
 		put_core_offline(i);
 	cpu_notifier_register_done();
 
-	kfree(pkg_work_scheduled);
-
 	debugfs_remove_recursive(debugfs);
 }
 module_exit(pkg_temp_thermal_exit)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [patch 10/12] thermal/x86_pkg_temp: Move work into package struct
  2016-11-18  0:03 [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck Thomas Gleixner
                   ` (8 preceding siblings ...)
  2016-11-18  0:03 ` [patch 09/12] thermal/x86_pkg_temp: Move work scheduled flag into package struct Thomas Gleixner
@ 2016-11-18  0:03 ` Thomas Gleixner
  2016-11-18  0:03 ` [patch 11/12] thermal/x86_pkg_temp: Sanitize package management Thomas Gleixner
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2016-11-18  0:03 UTC (permalink / raw)
  To: LKML
  Cc: Zhang Rui, Eduardo Valentin, linux-pm, Peter Zijlstra, x86, rt,
	Borislav Petkov

[-- Attachment #1: thermal-x86_pkg_temp--Move-work-into-package-struct.patch --]
[-- Type: text/plain, Size: 5901 bytes --]

Delayed work structs are held in a static percpu storage, which makes no
sense at all because work is strictly per package and we never schedule
more than one work per package.

Aside of that the work cancelation in the hotplug is broken when the work
is queued on the outgoing cpu and canceled. Nothing reschedules the work on
another online cpu in the package, so the interrupts stay disabled and the
work_scheduled flag stays active.

Move the delayed work struct into the package struct, which is the only
sensible place to have it.

To simplify the cancelation logic schedule the work always on the cpu which
is the target for the sysfs files. This is required so the cancelation
logic in the cpu offline path cancels only when the outgoing cpu is the
current target and reschedule the work when there is still a online
CPU in the package.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/thermal/x86_pkg_temp_thermal.c |   73 +++++++++++++++++++++++----------
 1 file changed, 52 insertions(+), 21 deletions(-)

--- a/drivers/thermal/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/x86_pkg_temp_thermal.c
@@ -65,6 +65,7 @@ struct pkg_device {
 	u32				tj_max;
 	u32				msr_pkg_therm_low;
 	u32				msr_pkg_therm_high;
+	struct delayed_work		work;
 	struct thermal_zone_device	*tzone;
 };
 
@@ -79,9 +80,6 @@ static DEFINE_SPINLOCK(pkg_temp_lock);
 /* Protects zone operation in the work function against hotplug removal */
 static DEFINE_MUTEX(thermal_zone_mutex);
 
-/* Interrupt to work function schedule queue */
-static DEFINE_PER_CPU(struct delayed_work, pkg_temp_thermal_threshold_work);
-
 /* Debug counters to show using debugfs */
 static struct dentry *debugfs;
 static unsigned int pkg_interrupt_cnt;
@@ -325,6 +323,13 @@ static void pkg_temp_thermal_threshold_w
 	mutex_unlock(&thermal_zone_mutex);
 }
 
+static void pkg_thermal_schedule_work(int cpu, struct delayed_work *work)
+{
+	unsigned long ms = msecs_to_jiffies(notify_delay_ms);
+
+	schedule_delayed_work_on(cpu, work, ms);
+}
+
 static int pkg_thermal_notify(u64 msr_val)
 {
 	int cpu = smp_processor_id();
@@ -340,9 +345,7 @@ static int pkg_thermal_notify(u64 msr_va
 	pkgdev = pkg_temp_thermal_get_dev(cpu);
 	if (pkgdev && !pkgdev->work_scheduled) {
 		pkgdev->work_scheduled = true;
-		schedule_delayed_work_on(cpu,
-				&per_cpu(pkg_temp_thermal_threshold_work, cpu),
-				msecs_to_jiffies(notify_delay_ms));
+		pkg_thermal_schedule_work(pkgdev->cpu, &pkgdev->work);
 	}
 
 	spin_unlock_irqrestore(&pkg_temp_lock, flags);
@@ -373,6 +376,7 @@ static int pkg_temp_thermal_device_add(u
 	if (!pkgdev)
 		return -ENOMEM;
 
+	INIT_DELAYED_WORK(&pkgdev->work, pkg_temp_thermal_threshold_work_fn);
 	pkgdev->phys_proc_id = topology_physical_package_id(cpu);
 	pkgdev->cpu = cpu;
 	pkgdev->tj_max = tj_max;
@@ -400,7 +404,7 @@ static void put_core_offline(unsigned in
 {
 	int target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
 	struct pkg_device *pkgdev = pkg_temp_thermal_get_dev(cpu);
-	bool lastcpu;
+	bool lastcpu, was_target;
 
 	if (!pkgdev)
 		return;
@@ -426,13 +430,24 @@ static void put_core_offline(unsigned in
 		thermal_zone_device_unregister(tzone);
 	}
 
+	/* Protect against work and interrupts */
+	spin_lock_irq(&pkg_temp_lock);
+
 	/*
-	 * If this is the last CPU in the package, restore the interrupt
-	 * MSR and remove the package reference from the array.
+	 * Check whether this cpu was the current target and store the new
+	 * one. When we drop the lock, then the interrupt notify function
+	 * will see the new target.
+	 */
+	was_target = pkgdev->cpu == cpu;
+	pkgdev->cpu = target;
+
+	/*
+	 * If this is the last CPU in the package remove the package
+	 * reference from the list and restore the interrupt MSR. When we
+	 * drop the lock neither the interrupt notify function nor the
+	 * worker will see the package anymore.
 	 */
 	if (lastcpu) {
-		/* Protect against work and interrupts */
-		spin_lock_irq(&pkg_temp_lock);
 		list_del(&pkgdev->list);
 		/*
 		 * After this point nothing touches the MSR anymore. We
@@ -443,17 +458,36 @@ static void put_core_offline(unsigned in
 		wrmsr_on_cpu(cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
 			     pkgdev->msr_pkg_therm_low,
 			     pkgdev->msr_pkg_therm_high);
-		kfree(pkgdev);
+		spin_lock_irq(&pkg_temp_lock);
 	}
 
 	/*
-	 * Note, this is broken when work was really scheduled on the
-	 * outgoing cpu because this will leave the work_scheduled flag set
-	 * and the thermal interrupts disabled. Will be fixed in the next
-	 * step as there is no way to fix it in a sane way with the per cpu
-	 * work nonsense.
+	 * Check whether there is work scheduled and whether the work is
+	 * targeted at the outgoing CPU.
 	 */
-	cancel_delayed_work_sync(&per_cpu(pkg_temp_thermal_threshold_work, cpu));
+	if (pkgdev->work_scheduled && was_target) {
+		/*
+		 * To cancel the work we need to drop the lock, otherwise
+		 * we might deadlock if the work needs to be flushed.
+		 */
+		spin_unlock_irq(&pkg_temp_lock);
+		cancel_delayed_work_sync(&pkgdev->work);
+		spin_lock_irq(&pkg_temp_lock);
+		/*
+		 * If this is not the last cpu in the package and the work
+		 * did not run after we dropped the lock above, then we
+		 * need to reschedule the work, otherwise the interrupt
+		 * stays disabled forever.
+		 */
+		if (!lastcpu && pkgdev->work_scheduled)
+			pkg_thermal_schedule_work(target, &pkgdev->work);
+	}
+
+	spin_unlock_irq(&pkg_temp_lock);
+
+	/* Final cleanup if this is the last cpu */
+	if (lastcpu)
+		kfree(pkgdev);
 }
 
 static int get_core_online(unsigned int cpu)
@@ -465,9 +499,6 @@ static int get_core_online(unsigned int
 	if (!cpu_has(c, X86_FEATURE_DTHERM) || !cpu_has(c, X86_FEATURE_PTS))
 		return -ENODEV;
 
-	INIT_DELAYED_WORK(&per_cpu(pkg_temp_thermal_threshold_work, cpu),
-			  pkg_temp_thermal_threshold_work_fn);
-
 	/* If the package exists, nothing to do */
 	if (pkgdev)
 		return 0;

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [patch 11/12] thermal/x86_pkg_temp: Sanitize package management
  2016-11-18  0:03 [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck Thomas Gleixner
                   ` (9 preceding siblings ...)
  2016-11-18  0:03 ` [patch 10/12] thermal/x86_pkg_temp: Move work " Thomas Gleixner
@ 2016-11-18  0:03 ` Thomas Gleixner
  2016-11-18  0:03 ` [patch 12/12] thermal/x86 pkg temp: Convert to hotplug state machine Thomas Gleixner
  2016-11-21 20:02 ` [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck Pandruvada, Srinivas
  12 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2016-11-18  0:03 UTC (permalink / raw)
  To: LKML
  Cc: Zhang Rui, Eduardo Valentin, linux-pm, Peter Zijlstra, x86, rt,
	Borislav Petkov

[-- Attachment #1: thermal-x86_pkg_temp--Sanitize-package-management.patch --]
[-- Type: text/plain, Size: 4568 bytes --]

Packages are kept in a list, which must be searched over and over.

We can be smarter than that and just store the package pointers in an array
which is allocated at init time. Sizing of the array is determined from the
topology information. That makes the package search a simple array lookup.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/thermal/x86_pkg_temp_thermal.c |   42 +++++++++++++++++----------------
 1 file changed, 22 insertions(+), 20 deletions(-)

--- a/drivers/thermal/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/x86_pkg_temp_thermal.c
@@ -54,13 +54,9 @@ MODULE_PARM_DESC(notify_delay_ms,
 * is some wrong values returned by cpuid for number of thresholds.
 */
 #define MAX_NUMBER_OF_TRIPS	2
-/* Limit number of package temp zones */
-#define MAX_PKG_TEMP_ZONE_IDS	256
 
 struct pkg_device {
-	struct list_head		list;
-	u16				phys_proc_id;
-	u16				cpu;
+	int				cpu;
 	bool				work_scheduled;
 	u32				tj_max;
 	u32				msr_pkg_therm_low;
@@ -73,8 +69,10 @@ static struct thermal_zone_params pkg_te
 	.no_hwmon	= true,
 };
 
-/* List maintaining number of package instances */
-static LIST_HEAD(phy_dev_list);
+/* Keep track of how many package pointers we allocated in init() */
+static int max_packages __read_mostly;
+/* Array of package pointers */
+static struct pkg_device **packages;
 /* Serializes interrupt notification, work and hotplug */
 static DEFINE_SPINLOCK(pkg_temp_lock);
 /* Protects zone operation in the work function against hotplug removal */
@@ -120,13 +118,10 @@ static int pkg_temp_debugfs_init(void)
  */
 static struct pkg_device *pkg_temp_thermal_get_dev(unsigned int cpu)
 {
-	u16 phys_proc_id = topology_physical_package_id(cpu);
-	struct pkg_device *pkgdev;
+	int pkgid = topology_logical_package_id(cpu);
 
-	list_for_each_entry(pkgdev, &phy_dev_list, list) {
-		if (pkgdev->phys_proc_id == phys_proc_id)
-			return pkgdev;
-	}
+	if (pkgid >= 0 && pkgid < max_packages)
+		return packages[pkgid];
 	return NULL;
 }
 
@@ -354,18 +349,19 @@ static int pkg_thermal_notify(u64 msr_va
 
 static int pkg_temp_thermal_device_add(unsigned int cpu)
 {
+	int pkgid = topology_logical_package_id(cpu);
 	u32 tj_max, eax, ebx, ecx, edx;
 	struct pkg_device *pkgdev;
 	int thres_count, err;
 
+	if (pkgid >= max_packages)
+		return -ENOMEM;
+
 	cpuid(6, &eax, &ebx, &ecx, &edx);
 	thres_count = ebx & 0x07;
 	if (!thres_count)
 		return -ENODEV;
 
-	if (topology_physical_package_id(cpu) > MAX_PKG_TEMP_ZONE_IDS)
-		return -ENODEV;
-
 	thres_count = clamp_val(thres_count, 0, MAX_NUMBER_OF_TRIPS);
 
 	err = get_tj_max(cpu, &tj_max);
@@ -377,7 +373,6 @@ static int pkg_temp_thermal_device_add(u
 		return -ENOMEM;
 
 	INIT_DELAYED_WORK(&pkgdev->work, pkg_temp_thermal_threshold_work_fn);
-	pkgdev->phys_proc_id = topology_physical_package_id(cpu);
 	pkgdev->cpu = cpu;
 	pkgdev->tj_max = tj_max;
 	pkgdev->tzone = thermal_zone_device_register("x86_pkg_temp",
@@ -395,7 +390,7 @@ static int pkg_temp_thermal_device_add(u
 		     &pkgdev->msr_pkg_therm_high);
 
 	spin_lock_irq(&pkg_temp_lock);
-	list_add_tail(&pkgdev->list, &phy_dev_list);
+	packages[pkgid] = pkgdev;
 	spin_unlock_irq(&pkg_temp_lock);
 	return 0;
 }
@@ -443,12 +438,12 @@ static void put_core_offline(unsigned in
 
 	/*
 	 * If this is the last CPU in the package remove the package
-	 * reference from the list and restore the interrupt MSR. When we
+	 * reference from the array and restore the interrupt MSR. When we
 	 * drop the lock neither the interrupt notify function nor the
 	 * worker will see the package anymore.
 	 */
 	if (lastcpu) {
-		list_del(&pkgdev->list);
+		packages[topology_logical_package_id(cpu)] = NULL;
 		/*
 		 * After this point nothing touches the MSR anymore. We
 		 * must drop the lock to make the cross cpu call. This goes
@@ -539,6 +534,11 @@ static int __init pkg_temp_thermal_init(
 	if (!x86_match_cpu(pkg_temp_thermal_ids))
 		return -ENODEV;
 
+	max_packages = topology_max_packages();
+	packages = kzalloc(max_packages * sizeof(struct pkg_device *), GFP_KERNEL);
+	if (!packages)
+		return -ENOMEM;
+
 	cpu_notifier_register_begin();
 	for_each_online_cpu(i)
 		if (get_core_online(i))
@@ -557,6 +557,7 @@ static int __init pkg_temp_thermal_init(
 	for_each_online_cpu(i)
 		put_core_offline(i);
 	cpu_notifier_register_done();
+	kfree(packages);
 	return -ENODEV;
 }
 module_init(pkg_temp_thermal_init)
@@ -575,6 +576,7 @@ static void __exit pkg_temp_thermal_exit
 	cpu_notifier_register_done();
 
 	debugfs_remove_recursive(debugfs);
+	kfree(packages);
 }
 module_exit(pkg_temp_thermal_exit)
 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [patch 12/12] thermal/x86 pkg temp: Convert to hotplug state machine
  2016-11-18  0:03 [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck Thomas Gleixner
                   ` (10 preceding siblings ...)
  2016-11-18  0:03 ` [patch 11/12] thermal/x86_pkg_temp: Sanitize package management Thomas Gleixner
@ 2016-11-18  0:03 ` Thomas Gleixner
  2016-11-21 20:02 ` [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck Pandruvada, Srinivas
  12 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2016-11-18  0:03 UTC (permalink / raw)
  To: LKML
  Cc: Zhang Rui, Eduardo Valentin, linux-pm, Peter Zijlstra, x86, rt,
	Borislav Petkov, Sebastian Andrzej Siewior

[-- Attachment #1: thermal-x86_pkg_temp_Convert_to_hotplug_state_machine.patch --]
[-- Type: text/plain, Size: 5388 bytes --]

Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Replace the wrmsr/rdmrs_on_cpu() calls in the hotplug callbacks as they are
guaranteed to be invoked on the incoming/outgoing cpu.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/thermal/x86_pkg_temp_thermal.c |   81 +++++++++------------------------
 1 file changed, 24 insertions(+), 57 deletions(-)

--- a/drivers/thermal/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/x86_pkg_temp_thermal.c
@@ -78,6 +78,9 @@ static DEFINE_SPINLOCK(pkg_temp_lock);
 /* Protects zone operation in the work function against hotplug removal */
 static DEFINE_MUTEX(thermal_zone_mutex);
 
+/* The dynamically assigned cpu hotplug state for module_exit() */
+static enum cpuhp_state pkg_thermal_hp_state __read_mostly;
+
 /* Debug counters to show using debugfs */
 static struct dentry *debugfs;
 static unsigned int pkg_interrupt_cnt;
@@ -385,9 +388,8 @@ static int pkg_temp_thermal_device_add(u
 		return err;
 	}
 	/* Store MSR value for package thermal interrupt, to restore at exit */
-	rdmsr_on_cpu(cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
-		     &pkgdev->msr_pkg_therm_low,
-		     &pkgdev->msr_pkg_therm_high);
+	rdmsr(MSR_IA32_PACKAGE_THERM_INTERRUPT, pkgdev->msr_pkg_therm_low,
+	      pkgdev->msr_pkg_therm_high);
 
 	spin_lock_irq(&pkg_temp_lock);
 	packages[pkgid] = pkgdev;
@@ -395,16 +397,17 @@ static int pkg_temp_thermal_device_add(u
 	return 0;
 }
 
-static void put_core_offline(unsigned int cpu)
+static int pkg_thermal_cpu_offline(unsigned int cpu)
 {
 	int target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
 	struct pkg_device *pkgdev = pkg_temp_thermal_get_dev(cpu);
 	bool lastcpu, was_target;
 
 	if (!pkgdev)
-		return;
+		return 0;
 
 	lastcpu = target >= nr_cpu_ids;
+
 	/*
 	 * Remove the sysfs files, if this is the last cpu in the package
 	 * before doing further cleanups.
@@ -444,16 +447,9 @@ static void put_core_offline(unsigned in
 	 */
 	if (lastcpu) {
 		packages[topology_logical_package_id(cpu)] = NULL;
-		/*
-		 * After this point nothing touches the MSR anymore. We
-		 * must drop the lock to make the cross cpu call. This goes
-		 * away once we move that code to the hotplug state machine.
-		 */
-		spin_unlock_irq(&pkg_temp_lock);
-		wrmsr_on_cpu(cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT,
-			     pkgdev->msr_pkg_therm_low,
-			     pkgdev->msr_pkg_therm_high);
-		spin_lock_irq(&pkg_temp_lock);
+		/* After this point nothing touches the MSR anymore. */
+		wrmsr(MSR_IA32_PACKAGE_THERM_INTERRUPT,
+		      pkgdev->msr_pkg_therm_low, pkgdev->msr_pkg_therm_high);
 	}
 
 	/*
@@ -483,9 +479,10 @@ static void put_core_offline(unsigned in
 	/* Final cleanup if this is the last cpu */
 	if (lastcpu)
 		kfree(pkgdev);
+	return 0;
 }
 
-static int get_core_online(unsigned int cpu)
+static int pkg_thermal_cpu_online(unsigned int cpu)
 {
 	struct pkg_device *pkgdev = pkg_temp_thermal_get_dev(cpu);
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
@@ -500,27 +497,6 @@ static int get_core_online(unsigned int
 	return pkg_temp_thermal_device_add(cpu);
 }
 
-static int pkg_temp_thermal_cpu_callback(struct notifier_block *nfb,
-				 unsigned long action, void *hcpu)
-{
-	unsigned int cpu = (unsigned long) hcpu;
-
-	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_ONLINE:
-	case CPU_DOWN_FAILED:
-		get_core_online(cpu);
-		break;
-	case CPU_DOWN_PREPARE:
-		put_core_offline(cpu);
-		break;
-	}
-	return NOTIFY_OK;
-}
-
-static struct notifier_block pkg_temp_thermal_notifier __refdata = {
-	.notifier_call = pkg_temp_thermal_cpu_callback,
-};
-
 static const struct x86_cpu_id __initconst pkg_temp_thermal_ids[] = {
 	{ X86_VENDOR_INTEL, X86_FAMILY_ANY, X86_MODEL_ANY, X86_FEATURE_PTS },
 	{}
@@ -529,7 +505,7 @@ MODULE_DEVICE_TABLE(x86cpu, pkg_temp_the
 
 static int __init pkg_temp_thermal_init(void)
 {
-	int i;
+	int ret;
 
 	if (!x86_match_cpu(pkg_temp_thermal_ids))
 		return -ENODEV;
@@ -539,12 +515,13 @@ static int __init pkg_temp_thermal_init(
 	if (!packages)
 		return -ENOMEM;
 
-	cpu_notifier_register_begin();
-	for_each_online_cpu(i)
-		if (get_core_online(i))
-			goto err_ret;
-	__register_hotcpu_notifier(&pkg_temp_thermal_notifier);
-	cpu_notifier_register_done();
+	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "thermal/x86_pkg:online",
+				pkg_thermal_cpu_online,	pkg_thermal_cpu_offline);
+	if (ret < 0)
+		goto err;
+
+	/* Store the state for module exit */
+	pkg_thermal_hp_state = ret;
 
 	platform_thermal_package_notify = pkg_thermal_notify;
 	platform_thermal_package_rate_control = pkg_thermal_rate_control;
@@ -553,28 +530,18 @@ static int __init pkg_temp_thermal_init(
 	pkg_temp_debugfs_init();
 	return 0;
 
-err_ret:
-	for_each_online_cpu(i)
-		put_core_offline(i);
-	cpu_notifier_register_done();
+err:
 	kfree(packages);
-	return -ENODEV;
+	return ret;
 }
 module_init(pkg_temp_thermal_init)
 
 static void __exit pkg_temp_thermal_exit(void)
 {
-	int i;
-
 	platform_thermal_package_notify = NULL;
 	platform_thermal_package_rate_control = NULL;
 
-	cpu_notifier_register_begin();
-	__unregister_hotcpu_notifier(&pkg_temp_thermal_notifier);
-	for_each_online_cpu(i)
-		put_core_offline(i);
-	cpu_notifier_register_done();
-
+	cpuhp_remove_state(pkg_thermal_hp_state);
 	debugfs_remove_recursive(debugfs);
 	kfree(packages);
 }

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck
  2016-11-18  0:03 [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck Thomas Gleixner
                   ` (11 preceding siblings ...)
  2016-11-18  0:03 ` [patch 12/12] thermal/x86 pkg temp: Convert to hotplug state machine Thomas Gleixner
@ 2016-11-21 20:02 ` Pandruvada, Srinivas
  2016-11-21 21:34   ` Thomas Gleixner
  12 siblings, 1 reply; 17+ messages in thread
From: Pandruvada, Srinivas @ 2016-11-21 20:02 UTC (permalink / raw)
  To: tglx, linux-kernel; +Cc: Zhang, Rui, edubezval, peterz, bp, linux-pm, x86, rt

On Fri, 2016-11-18 at 00:03 +0000, Thomas Gleixner wrote:
> We solely intended to convert that driver to the hotplug state
> machine and
> stumbled over a large pile of insanities, which are all interwoven
> with the
> package management:

Thanks Thomas for fixes.
But I did a very simple test on 4.9.0-rc5 on a client machine, just
rmmod and read zone, it will cause crash. I have not tested on a multi-
socket system, which is also required where the real test of  last cpu
on a package goes offline can be tested.

$ uname -a
Linux xxx 4.9.0-rc5+ #47 SMP Mon Nov 21 11:30:56 PST 2016 x86_64 x86_64
x86_64 GNU/Linux

$ pwd
/sys/devices/virtual/thermal/thermal_zone8
$ cat type 
x86_pkg_temp
$ cat temp 
43000
$ lsmod | grep x86
x86_pkg_temp_thermal    16384  0
aes_x86_64             20480  1 aesni_intel
$ sudo rmmod x86_pkg_temp_thermal
$ cd ../thermal_zone8

Zone is not removed on rmmod!!!, so any read on zone will cause crash

$ ls
available_policies  k_d   k_pu	  power      sustainable_power	
trip_point_0_type  type
emul_temp	    k_i   offset  slope      temp		trip_
point_1_temp  uevent
integral_cutoff     k_po  policy  subsystem  trip_point_0_temp	t
rip_point_1_type
spandruvada@spandruv-mobl2:/sys/devices/virtual/thermal/thermal_zone8$
cat temp 
Killed
spandruvada@spandruv-mobl2:/sys/devices/virtual/thermal/thermal_zone8$ 




[  116.726989] BUG: unable to handle kernel paging request at
ffffffffa07b8010
[  116.727103] IP: [<ffffffff81677cdd>]
thermal_zone_get_temp+0x3d/0x100
[  116.727192] PGD 2e0d067 
[  116.727224] PUD 2e0e063 
[  116.727259] PMD 14266c067 
[  116.727278] PTE 0

[  116.727328] Oops: 0000 [#1] SMP
[  116.727368] Modules linked in: rfcomm vmw_vsock_vmci_transport vsock
vmw_vmci asix usbnet mii binfmt_misc bnep hid_sensor_rotation
hid_sensor_als hid_sensor_incl_3d hid_sensor_accel_3d
hid_sensor_magn_3d hid_sensor_gyro_3d hid_sensor_trigger
hid_sensor_custom joydev hid_sensor_iio_common pn544_mei hid_rmi
mei_phy hid_multitouch pn544 hci hid_sensor_hub nfc intel_rapl
nls_iso8859_1 uvcvideo coretemp crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel videobuf2_vmalloc videobuf2_memops aesni_intel
videobuf2_v4l2 option aes_x86_64 usb_wwan videobuf2_core lrw usbserial
gf128mul glue_helper ablk_helper videodev btusb cryptd btrtl btbcm
iwlwifi btintel serio_raw bluetooth intel_pch_thermal cfg80211
snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic
snd_hda_intel battery snd_hda_codec snd_hda_core
[  116.728433]  winbond_cir int3403_thermal rc_core snd_hwdep
snd_seq_midi snd_pcm snd_seq_midi_event mac_hid snd_rawmidi dw_dmac
int3402_thermal snd_seq 8250_dw i2c_designware_platform snd_seq_device
spi_pxa2xx_platform i2c_designware_core snd_timer int3400_thermal
acpi_thermal_rel snd mei_me processor_thermal_device mei
int340x_thermal_zone lpc_ich intel_soc_dts_iosf soundcore nfsd
auth_rpcgss kvm_intel nfs_acl lockd kvm grace irqbypass sunrpc cuse
parport_pc ppdev lp parport autofs4 hid_generic usbhid i915
i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops drm ahci libahci video i2c_hid hid sdhci_acpi sdhci [last
unloaded: x86_pkg_temp_thermal]
[  116.729342] CPU: 1 PID: 2431 Comm: cat Not tainted 4.9.0-rc5+ #47
[  116.729413] Hardware name: Intel Corporation Shark Bay Client
platform/Harris Beach SDS, BIOS HSWLPTU1.86C.0133.R00.1309172123
09/17/2013
[  116.729547] task: ffff880124288000 task.stack: ffffc90000cf4000
[  116.729614] RIP: 0010:[<ffffffff81677cdd>]  [<ffffffff81677cdd>]
thermal_zone_get_temp+0x3d/0x100
[  116.729721] RSP: 0018:ffffc90000cf7cd8  EFLAGS: 00010287
[  116.729783] RAX: 00000000ffffffea RBX: ffff880114de2000 RCX:
ffff88010f89d4c0
[  116.729862] RDX: ffffffffa07b8000 RSI: ffffc90000cf7d1c RDI:
ffff880114c5a000
[  116.729942] RBP: ffffc90000cf7d08 R08: 0000000000000000 R09:
0000000000000000
[  116.730021] R10: 000000001cffdcff R11: 0000000000000001 R12:
ffff880114c5a028
[  116.730101] R13: ffffffff81a7aab0 R14: ffff880113a53800 R15:
ffff88012fdcc200
[  116.730182] FS:  00007f40bf57e700(0000) GS:ffff88014f280000(0000)
knlGS:0000000000000000
[  116.730272] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  116.730337] CR2: ffffffffa07b8010 CR3: 0000000010130000 CR4:
00000000001406e0
[  116.730418] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  116.730497] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[  116.730576] Stack:
[  116.730603]  0000000000000282 ffffffff7fffffff ffff880114de2000
ffff880114c5a028
[  116.736697]  ffffffff81a7aab0 ffff880113a53800 ffffc90000cf7d28
ffffffff8167826e
[  116.742680]  0000000000000246 ffffffff81f19e20 ffffc90000cf7d48
ffffffff8154c910
[  116.748607] Call Trace:
[  116.754349]  [<ffffffff8167826e>] temp_show+0x1e/0x40
[  116.759705]  [<ffffffff8154c910>] dev_attr_show+0x20/0x50
[  116.764841]  [<ffffffff812a50c1>] ? sysfs_file_ops+0x41/0x60
[  116.769056]  [<ffffffff812a5411>] sysfs_kf_seq_show+0xc1/0x120
[  116.773282]  [<ffffffff812a39f6>] kernfs_seq_show+0x26/0x30
[  116.777457]  [<ffffffff8124995f>] seq_read+0xcf/0x380
[  116.781649]  [<ffffffff812a4a4b>] kernfs_fop_read+0x11b/0x1a0
[  116.785727]  [<ffffffff81221968>] __vfs_read+0x28/0x110
[  116.789796]  [<ffffffff813526ab>] ?
security_file_permission+0x9b/0xc0
[  116.793605]  [<ffffffff81221dee>] ? rw_verify_area+0x4e/0xb0
[  116.795640]  [<ffffffff81221ee3>] vfs_read+0x93/0x130
[  116.797212]  [<ffffffff812232e9>] SyS_read+0x49/0xa0
[  116.798715]  [<ffffffff81003bfe>] do_syscall_64+0x6e/0x180
[  116.800224]  [<ffffffff8181c234>]
entry_SYSCALL64_slow_path+0x25/0x25
[  116.801709] Code: 41 54 53 48 83 ec 10 48 85 ff c7 45 d8 ff ff ff 7f
0f 84 a3 00 00 00 48 81 ff 00 f0 ff ff 0f 87 96 00 00 00 48 8b 97 a0 04
00 00 <48> 83 7a 10 00 0f 84 84 00 00 00 4c 8d b7 30 05 00 00 48 89 fb 
[  116.803334] RIP  [<ffffffff81677cdd>]
thermal_zone_get_temp+0x3d/0x100
[  116.804877]  RSP <ffffc90000cf7cd8>
[  116.806378] CR2: ffffffffa07b8010
[  116.815516] ---[ end trace 593a281d67d382bb ]---


Thanks,
Srinivas

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck
  2016-11-21 20:02 ` [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck Pandruvada, Srinivas
@ 2016-11-21 21:34   ` Thomas Gleixner
  2016-11-21 23:16     ` Pandruvada, Srinivas
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2016-11-21 21:34 UTC (permalink / raw)
  To: Pandruvada, Srinivas
  Cc: linux-kernel, Zhang, Rui, edubezval, peterz, bp, linux-pm, x86, rt

[-- Attachment #1: Type: text/plain, Size: 2201 bytes --]

On Mon, 21 Nov 2016, Pandruvada, Srinivas wrote:

> On Fri, 2016-11-18 at 00:03 +0000, Thomas Gleixner wrote:
> > We solely intended to convert that driver to the hotplug state
> > machine and
> > stumbled over a large pile of insanities, which are all interwoven
> > with the
> > package management:
> 
> Thanks Thomas for fixes.
> But I did a very simple test on 4.9.0-rc5 on a client machine, just
> rmmod and read zone, it will cause crash. I have not tested on a multi-
> socket system, which is also required where the real test of  last cpu
> on a package goes offline can be tested.

Stupid me. I tested putting a socket offline, which works, but did not
check what happens on module removal. Delta fix below. That needs to be
folded into the series as the wreckage already happens before the last
patch.

Thanks,

	tglx

8<--------------------
--- a/drivers/thermal/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/x86_pkg_temp_thermal.c
@@ -63,6 +63,7 @@ struct pkg_device {
 	u32				msr_pkg_therm_high;
 	struct delayed_work		work;
 	struct thermal_zone_device	*tzone;
+	struct cpumask			cpumask;
 };
 
 static struct thermal_zone_params pkg_temp_tz_params = {
@@ -391,6 +392,7 @@ static int pkg_temp_thermal_device_add(u
 	rdmsr(MSR_IA32_PACKAGE_THERM_INTERRUPT, pkgdev->msr_pkg_therm_low,
 	      pkgdev->msr_pkg_therm_high);
 
+	cpumask_set_cpu(cpu, &pkgdev->cpumask);
 	spin_lock_irq(&pkg_temp_lock);
 	packages[pkgid] = pkgdev;
 	spin_unlock_irq(&pkg_temp_lock);
@@ -399,13 +401,15 @@ static int pkg_temp_thermal_device_add(u
 
 static int pkg_thermal_cpu_offline(unsigned int cpu)
 {
-	int target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
 	struct pkg_device *pkgdev = pkg_temp_thermal_get_dev(cpu);
 	bool lastcpu, was_target;
+	int target;
 
 	if (!pkgdev)
 		return 0;
 
+	target = cpumask_any_but(&pkgdev->cpumask, cpu);
+	cpumask_clear_cpu(cpu, &pkgdev->cpumask);
 	lastcpu = target >= nr_cpu_ids;
 
 	/*
@@ -492,8 +496,10 @@ static int pkg_thermal_cpu_online(unsign
 		return -ENODEV;
 
 	/* If the package exists, nothing to do */
-	if (pkgdev)
+	if (pkgdev) {
+		cpumask_set_cpu(cpu, &pkgdev->cpumask);
 		return 0;
+	}
 	return pkg_temp_thermal_device_add(cpu);
 }
 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck
  2016-11-21 21:34   ` Thomas Gleixner
@ 2016-11-21 23:16     ` Pandruvada, Srinivas
  2016-11-22  9:05       ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Pandruvada, Srinivas @ 2016-11-21 23:16 UTC (permalink / raw)
  To: tglx; +Cc: linux-kernel, peterz, Zhang, Rui, rt, edubezval, linux-pm, x86, bp

On Mon, 2016-11-21 at 22:34 +0100, Thomas Gleixner wrote:
> On Mon, 21 Nov 2016, Pandruvada, Srinivas wrote:

[...]

> Stupid me. I tested putting a socket offline, which works, but did
> not
> check what happens on module removal. Delta fix below. That needs to
> be
> folded into the series as the wreckage already happens before the
> last
> patch.

Your change below fixes the crash issue. Now I tested a case where the
last cpu offlined from a package, it removed thermal zone and added
zone back once any cpu from the package onlined. So this is working.

I want to try to run some workload on those cpu to bump up the
temperature and check interrupts. I am hitting some issue unrelated to
this change may be. I onlined three cpus from the package 1.

[189443.567728] smpboot: Booting Node 1 Processor 15 APIC 0x2e
[189656.625947] smpboot: Booting Node 1 Processor 8 APIC 0x20
[189829.545851] smpboot: Booting Node 1 Processor 24 APIC 0x21

But I can't schedule anything on those CPUs. For example now can't run
turbostat, it complains
"
turbostat: re-initialized with num_cpus 19
Could not migrate to CPU 8
"

Same with

#taskset 0x100 stress -c 1
taskset: failed to set pid 0's affinity: Invalid argument

I am on the latest linux-pm/linux-next tree on this server. I will
switch to latest main line and try.

Thanks,
Srinivas



8<--------------------
--- a/drivers/thermal/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/x86_pkg_temp_thermal.c
@@ -63,6 +63,7 @@ struct pkg_device {
        u32                             msr_pkg_therm_high;
        struct delayed_work             work;
        struct thermal_zone_device      *tzone;
+       struct cpumask                  cpumask;
 };
 
 static struct thermal_zone_params pkg_temp_tz_params = {
@@ -391,6 +392,7 @@ static int pkg_temp_thermal_device_add(u
        rdmsr(MSR_IA32_PACKAGE_THERM_INTERRUPT, pkgdev-
>msr_pkg_therm_low,
              pkgdev->msr_pkg_therm_high);
 
+       cpumask_set_cpu(cpu, &pkgdev->cpumask);
        spin_lock_irq(&pkg_temp_lock);
        packages[pkgid] = pkgdev;
        spin_unlock_irq(&pkg_temp_lock);
@@ -399,13 +401,15 @@ static int pkg_temp_thermal_device_add(u
 
 static int pkg_thermal_cpu_offline(unsigned int cpu)
 {
-       int target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
        struct pkg_device *pkgdev = pkg_temp_thermal_get_dev(cpu);
        bool lastcpu, was_target;
+       int target;
 
        if (!pkgdev)
                return 0;
 
+       target = cpumask_any_but(&pkgdev->cpumask, cpu);
+       cpumask_clear_cpu(cpu, &pkgdev->cpumask);
        lastcpu = target >= nr_cpu_ids;
 
        /*
@@ -492,8 +496,10 @@ static int pkg_thermal_cpu_online(unsign
                return -ENODEV;
 
        /* If the package exists, nothing to do */
-       if (pkgdev)
+       if (pkgdev) {
+               cpumask_set_cpu(cpu, &pkgdev->cpumask);
                return 0;
+       }
        return pkg_temp_thermal_device_add(cpu);
 }
 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck
  2016-11-21 23:16     ` Pandruvada, Srinivas
@ 2016-11-22  9:05       ` Thomas Gleixner
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2016-11-22  9:05 UTC (permalink / raw)
  To: Pandruvada, Srinivas
  Cc: linux-kernel, peterz, Zhang, Rui, rt, edubezval, linux-pm, x86, bp

On Mon, 21 Nov 2016, Pandruvada, Srinivas wrote:
> Your change below fixes the crash issue. Now I tested a case where the
> last cpu offlined from a package, it removed thermal zone and added
> zone back once any cpu from the package onlined. So this is working.
> 
> I want to try to run some workload on those cpu to bump up the
> temperature and check interrupts. I am hitting some issue unrelated to
> this change may be. I onlined three cpus from the package 1.
> 
> [189443.567728] smpboot: Booting Node 1 Processor 15 APIC 0x2e
> [189656.625947] smpboot: Booting Node 1 Processor 8 APIC 0x20
> [189829.545851] smpboot: Booting Node 1 Processor 24 APIC 0x21
> 
> But I can't schedule anything on those CPUs. For example now can't run
> turbostat, it complains
> "
> turbostat: re-initialized with num_cpus 19
> Could not migrate to CPU 8
> "
> 
> Same with
> 
> #taskset 0x100 stress -c 1
> taskset: failed to set pid 0's affinity: Invalid argument
> 
> I am on the latest linux-pm/linux-next tree on this server. I will
> switch to latest main line and try.

That must be something unrelated. I can use turbostat and taskset after
doing the above.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-11-22  9:08 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-18  0:03 [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck Thomas Gleixner
2016-11-18  0:03 ` [patch 01/12] thermal/x86_pkg_temp: Cleanup thermal interrupt handling Thomas Gleixner
2016-11-18  0:03 ` [patch 02/12] thermal/x86_pkg_temp: Remove redundant package search Thomas Gleixner
2016-11-18  0:03 ` [patch 03/12] thermal/x86_pkg_temp: Replace open coded cpu search Thomas Gleixner
2016-11-18  0:03 ` [patch 04/12] thermal/x86_pkg_temp: Sanitize callback (de)initialization Thomas Gleixner
2016-11-18  0:03 ` [patch 05/12] thermal/x86_pkg_temp: Get rid of ref counting Thomas Gleixner
2016-11-18  0:03 ` [patch 06/12] thermal/x86_pkg_temp: Cleanup namespace Thomas Gleixner
2016-11-18  0:03 ` [patch 07/12] thermal/x86_pkg_temp: Cleanup code some more Thomas Gleixner
2016-11-18  0:03 ` [patch 08/12] thermal/x86_pkg_temp: Sanitize locking Thomas Gleixner
2016-11-18  0:03 ` [patch 09/12] thermal/x86_pkg_temp: Move work scheduled flag into package struct Thomas Gleixner
2016-11-18  0:03 ` [patch 10/12] thermal/x86_pkg_temp: Move work " Thomas Gleixner
2016-11-18  0:03 ` [patch 11/12] thermal/x86_pkg_temp: Sanitize package management Thomas Gleixner
2016-11-18  0:03 ` [patch 12/12] thermal/x86 pkg temp: Convert to hotplug state machine Thomas Gleixner
2016-11-21 20:02 ` [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug and locking trainwreck Pandruvada, Srinivas
2016-11-21 21:34   ` Thomas Gleixner
2016-11-21 23:16     ` Pandruvada, Srinivas
2016-11-22  9:05       ` Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.