linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion
@ 2016-11-22 17:42 Thomas Gleixner
  2016-11-22 17:42 ` [patch 2/6] hwmon/coretemp: Simplify sibling management Thomas Gleixner
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Thomas Gleixner @ 2016-11-22 17:42 UTC (permalink / raw)
  To: LKML
  Cc: Fenghua Yu, Jean Delvare, Guenter Roeck, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

After the first attempt to convert the coretemp driver to the hotplug state
machine failed, we had a deeper look and went a bit farther.

The driver has quite some interesting concepts vs. the package, core and
sysfs file management and a bug in the package temperature sysfs interface
vs. cpu hotplug.

The following series fixes that bug and simplifies the package/core
management and at the end converts it to the hotplug state machine.

Along with the source size the binary size shrinks as well:
   text	   data	    bss	    dec	    hex	
   4068	   360       20    4448    1160	  Before
   3801	   180       36    4017     fb1	  After

Thanks,

	tglx
-----
 coretemp.c |  321 +++++++++++++++++++++----------------------------------------
 1 file changed, 113 insertions(+), 208 deletions(-)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 1/6] hwmon/coretemp: Fixup target cpu for package when cpu is offlined
  2016-11-22 17:42 [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion Thomas Gleixner
  2016-11-22 17:42 ` [patch 2/6] hwmon/coretemp: Simplify sibling management Thomas Gleixner
@ 2016-11-22 17:42 ` Thomas Gleixner
  2016-11-22 17:42 ` [patch 3/6] hwmon/coretemp: Avoid redundant lookups Thomas Gleixner
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Thomas Gleixner @ 2016-11-22 17:42 UTC (permalink / raw)
  To: LKML
  Cc: Fenghua Yu, Jean Delvare, Guenter Roeck, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

[-- Attachment #1: hwmon-coretemp--Fixup-target-cpu-for-package-when-cpu-is-offlined.patch --]
[-- Type: text/plain, Size: 3441 bytes --]

When a CPU is offlined nothing checks whether it is the target CPU for the
package temperature sysfs interface.

As a consequence all future readouts of the package temperature return
crap:

# cat temp1_input
90000

which is Tjmax of that package.

Check whether the outgoing CPU is the target for the package and assign it
to some other still online CPU in the package. Protect the change against
the rdmsr_on_cpu() in show_crit_alarm().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/hwmon/coretemp.c |   31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -51,6 +51,7 @@ static int force_tjmax;
 module_param_named(tjmax, force_tjmax, int, 0444);
 MODULE_PARM_DESC(tjmax, "TjMax value in degrees Celsius");
 
+#define PKG_SYSFS_ATTR_NO	1	/* Sysfs attribute for package temp */
 #define BASE_SYSFS_ATTR_NO	2	/* Sysfs Base attr no for coretemp */
 #define NUM_REAL_CORES		128	/* Number of Real cores per cpu */
 #define CORETEMP_NAME_LENGTH	19	/* String Length of attrs */
@@ -138,7 +139,9 @@ static ssize_t show_crit_alarm(struct de
 	struct platform_data *pdata = dev_get_drvdata(dev);
 	struct temp_data *tdata = pdata->core_data[attr->index];
 
+	mutex_lock(&tdata->update_lock);
 	rdmsr_on_cpu(tdata->cpu, tdata->status_reg, &eax, &edx);
+	mutex_unlock(&tdata->update_lock);
 
 	return sprintf(buf, "%d\n", (eax >> 5) & 1);
 }
@@ -483,7 +486,7 @@ static int create_core_data(struct platf
 	 * The attr number is always core id + 2
 	 * The Pkgtemp will always show up as temp1_*, if available
 	 */
-	attr_no = pkg_flag ? 1 : TO_ATTR_NO(cpu);
+	attr_no = pkg_flag ? PKG_SYSFS_ATTR_NO : TO_ATTR_NO(cpu);
 
 	if (attr_no > MAX_CORE_DATA - 1)
 		return -ERANGE;
@@ -662,7 +665,7 @@ static void coretemp_device_remove(unsig
 	mutex_unlock(&pdev_list_mutex);
 }
 
-static bool is_any_core_online(struct platform_data *pdata)
+static int get_online_core_in_package(struct platform_data *pdata)
 {
 	int i;
 
@@ -670,10 +673,10 @@ static bool is_any_core_online(struct pl
 	for (i = MAX_CORE_DATA - 1; i >= 0; --i) {
 		if (pdata->core_data[i] &&
 			!pdata->core_data[i]->is_pkg_data) {
-			return true;
+			return pdata->core_data[i]->cpu;
 		}
 	}
-	return false;
+	return nr_cpu_ids;
 }
 
 static void get_core_online(unsigned int cpu)
@@ -720,9 +723,10 @@ static void get_core_online(unsigned int
 
 static void put_core_offline(unsigned int cpu)
 {
-	int i, indx;
-	struct platform_data *pdata;
 	struct platform_device *pdev = coretemp_get_pdev(cpu);
+	struct platform_data *pdata;
+	struct temp_data *tdata;
+	int i, indx, target;
 
 	/* If the physical CPU device does not exist, just return */
 	if (!pdev)
@@ -762,8 +766,21 @@ static void put_core_offline(unsigned in
 	 * which in turn calls coretemp_remove. This removes the
 	 * pkgtemp entry and does other clean ups.
 	 */
-	if (!is_any_core_online(pdata))
+	target = get_online_core_in_package(pdata);
+	if (target >= nr_cpu_ids) {
 		coretemp_device_remove(cpu);
+		return;
+	}
+	/*
+	 * Check whether this core is the target for the package
+	 * interface. We need to assign it to some other cpu.
+	 */
+	tdata = pdata->core_data[PKG_SYSFS_ATTR_NO];
+	if (tdata && tdata->cpu == cpu) {
+		mutex_lock(&tdata->update_lock);
+		tdata->cpu = target;
+		mutex_unlock(&tdata->update_lock);
+	}
 }
 
 static int coretemp_cpu_callback(struct notifier_block *nfb,

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 2/6] hwmon/coretemp: Simplify sibling management
  2016-11-22 17:42 [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion Thomas Gleixner
@ 2016-11-22 17:42 ` Thomas Gleixner
  2016-11-22 17:42 ` [patch 1/6] hwmon/coretemp: Fixup target cpu for package when cpu is offlined Thomas Gleixner
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Thomas Gleixner @ 2016-11-22 17:42 UTC (permalink / raw)
  To: LKML
  Cc: Fenghua Yu, Jean Delvare, Guenter Roeck, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

[-- Attachment #1: hwmon-coretemp--Simplify-sibling-management.patch --]
[-- Type: text/plain, Size: 5725 bytes --]

The coretemp driver provides a sysfs interface per physical core. If
hyperthreading is enabled and one of the siblings goes offline the sysfs
interface is removed and then immeditately created again for the
sibling. The only difference of them is the target cpu for the
rdmsr_on_cpu() in the sysfs show functions.

It's way simpler to keep a cpumask of cpus which are active in a package
and only remove the interface when the last sibling goes offline. Otherwise
just move the target cpu for the sysfs show functions to the still online
sibling.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/hwmon/coretemp.c |   95 ++++++++++++++++++-----------------------------
 1 file changed, 38 insertions(+), 57 deletions(-)

--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -103,9 +103,10 @@ struct temp_data {
 
 /* Platform Data per Physical CPU */
 struct platform_data {
-	struct device *hwmon_dev;
-	u16 phys_proc_id;
-	struct temp_data *core_data[MAX_CORE_DATA];
+	struct device		*hwmon_dev;
+	u16			phys_proc_id;
+	struct cpumask		cpumask;
+	struct temp_data	*core_data[MAX_CORE_DATA];
 	struct device_attribute name_attr;
 };
 
@@ -491,16 +492,6 @@ static int create_core_data(struct platf
 	if (attr_no > MAX_CORE_DATA - 1)
 		return -ERANGE;
 
-	/*
-	 * Provide a single set of attributes for all HT siblings of a core
-	 * to avoid duplicate sensors (the processor ID and core ID of all
-	 * HT siblings of a core are the same).
-	 * Skip if a HT sibling of this core is already registered.
-	 * This is not an error.
-	 */
-	if (pdata->core_data[attr_no] != NULL)
-		return 0;
-
 	tdata = init_temp_data(cpu, pkg_flag);
 	if (!tdata)
 		return -ENOMEM;
@@ -665,24 +656,11 @@ static void coretemp_device_remove(unsig
 	mutex_unlock(&pdev_list_mutex);
 }
 
-static int get_online_core_in_package(struct platform_data *pdata)
-{
-	int i;
-
-	/* Find online cores, except pkgtemp data */
-	for (i = MAX_CORE_DATA - 1; i >= 0; --i) {
-		if (pdata->core_data[i] &&
-			!pdata->core_data[i]->is_pkg_data) {
-			return pdata->core_data[i]->cpu;
-		}
-	}
-	return nr_cpu_ids;
-}
-
 static void get_core_online(unsigned int cpu)
 {
-	struct cpuinfo_x86 *c = &cpu_data(cpu);
 	struct platform_device *pdev = coretemp_get_pdev(cpu);
+	struct cpuinfo_x86 *c = &cpu_data(cpu);
+	struct platform_data *pdata;
 	int err;
 
 	/*
@@ -707,6 +685,8 @@ static void get_core_online(unsigned int
 		err = coretemp_device_add(cpu);
 		if (err)
 			return;
+
+		pdev = coretemp_get_pdev(cpu);
 		/*
 		 * Check whether pkgtemp support is available.
 		 * If so, add interfaces for pkgtemp.
@@ -714,60 +694,60 @@ static void get_core_online(unsigned int
 		if (cpu_has(c, X86_FEATURE_PTS))
 			coretemp_add_core(cpu, 1);
 	}
+
+	pdata = platform_get_drvdata(pdev);
 	/*
-	 * Physical CPU device already exists.
-	 * So, just add interfaces for this core.
+	 * Check whether a thread sibling is already online. If not add the
+	 * interface for this CPU core.
 	 */
-	coretemp_add_core(cpu, 0);
+	if (!cpumask_intersects(&pdata->cpumask, topology_sibling_cpumask(cpu)))
+		coretemp_add_core(cpu, 0);
+
+	cpumask_set_cpu(cpu, &pdata->cpumask);
 }
 
 static void put_core_offline(unsigned int cpu)
 {
 	struct platform_device *pdev = coretemp_get_pdev(cpu);
-	struct platform_data *pdata;
+	struct platform_data *pd;
 	struct temp_data *tdata;
-	int i, indx, target;
+	int indx, target;
 
 	/* If the physical CPU device does not exist, just return */
 	if (!pdev)
 		return;
 
-	pdata = platform_get_drvdata(pdev);
-
-	indx = TO_ATTR_NO(cpu);
-
 	/* The core id is too big, just return */
+	indx = TO_ATTR_NO(cpu);
 	if (indx > MAX_CORE_DATA - 1)
 		return;
 
-	if (pdata->core_data[indx] && pdata->core_data[indx]->cpu == cpu)
-		coretemp_remove_core(pdata, indx);
+	pd = platform_get_drvdata(pdev);
+	tdata = pd->core_data[indx];
+
+	cpumask_clear_cpu(cpu, &pd->cpumask);
 
 	/*
-	 * If a HT sibling of a core is taken offline, but another HT sibling
-	 * of the same core is still online, register the alternate sibling.
-	 * This ensures that exactly one set of attributes is provided as long
-	 * as at least one HT sibling of a core is online.
-	 */
-	for_each_sibling(i, cpu) {
-		if (i != cpu) {
-			get_core_online(i);
-			/*
-			 * Display temperature sensor data for one HT sibling
-			 * per core only, so abort the loop after one such
-			 * sibling has been found.
-			 */
-			break;
-		}
+	 * If this is the last thread sibling, remove the CPU core
+	 * interface, If there is still a sibling online, transfer the
+	 * target cpu of that core interface to it.
+	 */
+	target = cpumask_any_and(&pd->cpumask, topology_sibling_cpumask(cpu));
+	if (target >= nr_cpu_ids) {
+		coretemp_remove_core(pd, indx);
+	} else if (tdata && tdata->cpu == cpu) {
+		mutex_lock(&tdata->update_lock);
+		tdata->cpu = target;
+		mutex_unlock(&tdata->update_lock);
 	}
+
 	/*
 	 * If all cores in this pkg are offline, remove the device.
 	 * coretemp_device_remove calls unregister_platform_device,
 	 * which in turn calls coretemp_remove. This removes the
 	 * pkgtemp entry and does other clean ups.
 	 */
-	target = get_online_core_in_package(pdata);
-	if (target >= nr_cpu_ids) {
+	if (cpumask_empty(&pd->cpumask)) {
 		coretemp_device_remove(cpu);
 		return;
 	}
@@ -775,8 +755,9 @@ static void put_core_offline(unsigned in
 	 * Check whether this core is the target for the package
 	 * interface. We need to assign it to some other cpu.
 	 */
-	tdata = pdata->core_data[PKG_SYSFS_ATTR_NO];
+	tdata = pd->core_data[PKG_SYSFS_ATTR_NO];
 	if (tdata && tdata->cpu == cpu) {
+		target = cpumask_first(&pd->cpumask);
 		mutex_lock(&tdata->update_lock);
 		tdata->cpu = target;
 		mutex_unlock(&tdata->update_lock);

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 3/6] hwmon/coretemp: Avoid redundant lookups
  2016-11-22 17:42 [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion Thomas Gleixner
  2016-11-22 17:42 ` [patch 2/6] hwmon/coretemp: Simplify sibling management Thomas Gleixner
  2016-11-22 17:42 ` [patch 1/6] hwmon/coretemp: Fixup target cpu for package when cpu is offlined Thomas Gleixner
@ 2016-11-22 17:42 ` Thomas Gleixner
  2016-11-22 17:42 ` [patch 4/6] [PREEMPT-RT] hwmon/coretemp: Convert to hotplug state machine Thomas Gleixner
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Thomas Gleixner @ 2016-11-22 17:42 UTC (permalink / raw)
  To: LKML
  Cc: Fenghua Yu, Jean Delvare, Guenter Roeck, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

[-- Attachment #1: hwmon-coretemp--Avoid-redundant-lookups.patch --]
[-- Type: text/plain, Size: 1532 bytes --]

No point in looking up the same thing over and over.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/hwmon/coretemp.c |   19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -533,21 +533,14 @@ static int create_core_data(struct platf
 	return err;
 }
 
-static void coretemp_add_core(unsigned int cpu, int pkg_flag)
+static void
+coretemp_add_core(struct platform_device *pdev, unsigned int cpu, int pkg_flag)
 {
-	struct platform_device *pdev = coretemp_get_pdev(cpu);
-	int err;
-
-	if (!pdev)
-		return;
-
-	err = create_core_data(pdev, cpu, pkg_flag);
-	if (err)
+	if (create_core_data(pdev, cpu, pkg_flag))
 		dev_err(&pdev->dev, "Adding Core %u failed\n", cpu);
 }
 
-static void coretemp_remove_core(struct platform_data *pdata,
-				 int indx)
+static void coretemp_remove_core(struct platform_data *pdata, int indx)
 {
 	struct temp_data *tdata = pdata->core_data[indx];
 
@@ -692,7 +685,7 @@ static void get_core_online(unsigned int
 		 * If so, add interfaces for pkgtemp.
 		 */
 		if (cpu_has(c, X86_FEATURE_PTS))
-			coretemp_add_core(cpu, 1);
+			coretemp_add_core(pdev, cpu, 1);
 	}
 
 	pdata = platform_get_drvdata(pdev);
@@ -701,7 +694,7 @@ static void get_core_online(unsigned int
 	 * interface for this CPU core.
 	 */
 	if (!cpumask_intersects(&pdata->cpumask, topology_sibling_cpumask(cpu)))
-		coretemp_add_core(cpu, 0);
+		coretemp_add_core(pdev, cpu, 0);
 
 	cpumask_set_cpu(cpu, &pdata->cpumask);
 }

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 4/6] [PREEMPT-RT] hwmon/coretemp: Convert to hotplug state machine
  2016-11-22 17:42 [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion Thomas Gleixner
                   ` (2 preceding siblings ...)
  2016-11-22 17:42 ` [patch 3/6] hwmon/coretemp: Avoid redundant lookups Thomas Gleixner
@ 2016-11-22 17:42 ` Thomas Gleixner
  2016-11-22 17:42 ` [patch 5/6] hwmon/coretemp: Use proper error codes in cpu online callback Thomas Gleixner
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Thomas Gleixner @ 2016-11-22 17:42 UTC (permalink / raw)
  To: LKML
  Cc: Fenghua Yu, Jean Delvare, Guenter Roeck, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86, rt

[-- Attachment #1: hwmon-coretemp--Convert_to_hotplug_state_machine.patch --]
[-- Type: text/plain, Size: 5043 bytes --]

Install the callbacks via the state machine. Setup and teardown are handled
by the hotplug core.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-hwmon@vger.kernel.org
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: rt@linuxtronix.de
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/20161117183541.8588-5-bigeasy@linutronix.de

---
 drivers/hwmon/coretemp.c |   86 +++++++++++++++--------------------------------
 1 file changed, 29 insertions(+), 57 deletions(-)

--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -649,7 +649,7 @@ static void coretemp_device_remove(unsig
 	mutex_unlock(&pdev_list_mutex);
 }
 
-static void get_core_online(unsigned int cpu)
+static int coretemp_cpu_online(unsigned int cpu)
 {
 	struct platform_device *pdev = coretemp_get_pdev(cpu);
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
@@ -662,12 +662,12 @@ static void get_core_online(unsigned int
 	 * without thermal sensors will be filtered out.
 	 */
 	if (!cpu_has(c, X86_FEATURE_DTHERM))
-		return;
+		return 0;
 
 	if (!pdev) {
 		/* Check the microcode version of the CPU */
 		if (chk_ucode_version(cpu))
-			return;
+			return 0;
 
 		/*
 		 * Alright, we have DTS support.
@@ -677,7 +677,7 @@ static void get_core_online(unsigned int
 		 */
 		err = coretemp_device_add(cpu);
 		if (err)
-			return;
+			return 0;
 
 		pdev = coretemp_get_pdev(cpu);
 		/*
@@ -697,9 +697,10 @@ static void get_core_online(unsigned int
 		coretemp_add_core(pdev, cpu, 0);
 
 	cpumask_set_cpu(cpu, &pdata->cpumask);
+	return 0;
 }
 
-static void put_core_offline(unsigned int cpu)
+static int coretemp_cpu_offline(unsigned int cpu)
 {
 	struct platform_device *pdev = coretemp_get_pdev(cpu);
 	struct platform_data *pd;
@@ -708,12 +709,12 @@ static void put_core_offline(unsigned in
 
 	/* If the physical CPU device does not exist, just return */
 	if (!pdev)
-		return;
+		return 0;
 
 	/* The core id is too big, just return */
 	indx = TO_ATTR_NO(cpu);
 	if (indx > MAX_CORE_DATA - 1)
-		return;
+		return 0;
 
 	pd = platform_get_drvdata(pdev);
 	tdata = pd->core_data[indx];
@@ -742,7 +743,7 @@ static void put_core_offline(unsigned in
 	 */
 	if (cpumask_empty(&pd->cpumask)) {
 		coretemp_device_remove(cpu);
-		return;
+		return 0;
 	}
 	/*
 	 * Check whether this core is the target for the package
@@ -755,38 +756,19 @@ static void put_core_offline(unsigned in
 		tdata->cpu = target;
 		mutex_unlock(&tdata->update_lock);
 	}
+	return 0;
 }
-
-static int coretemp_cpu_callback(struct notifier_block *nfb,
-				 unsigned long action, void *hcpu)
-{
-	unsigned int cpu = (unsigned long) hcpu;
-
-	switch (action) {
-	case CPU_ONLINE:
-	case CPU_DOWN_FAILED:
-		get_core_online(cpu);
-		break;
-	case CPU_DOWN_PREPARE:
-		put_core_offline(cpu);
-		break;
-	}
-	return NOTIFY_OK;
-}
-
-static struct notifier_block coretemp_cpu_notifier __refdata = {
-	.notifier_call = coretemp_cpu_callback,
-};
-
 static const struct x86_cpu_id __initconst coretemp_ids[] = {
 	{ X86_VENDOR_INTEL, X86_FAMILY_ANY, X86_MODEL_ANY, X86_FEATURE_DTHERM },
 	{}
 };
 MODULE_DEVICE_TABLE(x86cpu, coretemp_ids);
 
+static enum cpuhp_state coretemp_hp_online;
+
 static int __init coretemp_init(void)
 {
-	int i, err;
+	int err;
 
 	/*
 	 * CPUID.06H.EAX[0] indicates whether the CPU has thermal
@@ -798,52 +780,42 @@ static int __init coretemp_init(void)
 
 	err = platform_driver_register(&coretemp_driver);
 	if (err)
-		goto exit;
+		return err;
 
-	cpu_notifier_register_begin();
-	for_each_online_cpu(i)
-		get_core_online(i);
+	get_online_cpus();
+	err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "hwmon/coretemp:online",
+				coretemp_cpu_online, coretemp_cpu_offline);
+	if (err < 0)
+		goto exit_driver_unreg;
+	coretemp_hp_online = err;
 
 #ifndef CONFIG_HOTPLUG_CPU
 	if (list_empty(&pdev_list)) {
-		cpu_notifier_register_done();
 		err = -ENODEV;
-		goto exit_driver_unreg;
+		goto exit_hp_unreg;
 	}
 #endif
-
-	__register_hotcpu_notifier(&coretemp_cpu_notifier);
-	cpu_notifier_register_done();
+	put_online_cpus();
 	return 0;
 
 #ifndef CONFIG_HOTPLUG_CPU
+exit_hp_unreg:
+	cpuhp_remove_state(coretemp_hp_online);
+	put_online_cpus();
+#endif
 exit_driver_unreg:
 	platform_driver_unregister(&coretemp_driver);
-#endif
-exit:
 	return err;
 }
+module_init(coretemp_init)
 
 static void __exit coretemp_exit(void)
 {
-	struct pdev_entry *p, *n;
-
-	cpu_notifier_register_begin();
-	__unregister_hotcpu_notifier(&coretemp_cpu_notifier);
-	mutex_lock(&pdev_list_mutex);
-	list_for_each_entry_safe(p, n, &pdev_list, list) {
-		platform_device_unregister(p->pdev);
-		list_del(&p->list);
-		kfree(p);
-	}
-	mutex_unlock(&pdev_list_mutex);
-	cpu_notifier_register_done();
+	cpuhp_remove_state(coretemp_hp_online);
 	platform_driver_unregister(&coretemp_driver);
 }
+module_exit(coretemp_exit)
 
 MODULE_AUTHOR("Rudolf Marek <r.marek@assembler.cz>");
 MODULE_DESCRIPTION("Intel Core temperature monitor");
 MODULE_LICENSE("GPL");
-
-module_init(coretemp_init)
-module_exit(coretemp_exit)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 5/6] hwmon/coretemp: Use proper error codes in cpu online callback
  2016-11-22 17:42 [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion Thomas Gleixner
                   ` (3 preceding siblings ...)
  2016-11-22 17:42 ` [patch 4/6] [PREEMPT-RT] hwmon/coretemp: Convert to hotplug state machine Thomas Gleixner
@ 2016-11-22 17:42 ` Thomas Gleixner
  2016-11-22 17:42 ` [patch 6/6] hwmon/coretemp: Simplify package management Thomas Gleixner
  2016-11-23 15:28 ` [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion Guenter Roeck
  6 siblings, 0 replies; 22+ messages in thread
From: Thomas Gleixner @ 2016-11-22 17:42 UTC (permalink / raw)
  To: LKML
  Cc: Fenghua Yu, Jean Delvare, Guenter Roeck, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

[-- Attachment #1: hwmon-coretemp--Use-proper-error-codes-in-cpu-online-callback.patch --]
[-- Type: text/plain, Size: 2118 bytes --]

The cpu online callback returns success unconditionally even when the
device has no support, micro code mismatches or device allocation fails.
Only if CPU_HOTPLUG is disabled, the init function checks whether the
device list is empty and removes the driver.

This does not make sense. If CPU HOTPLUG is enabled then there is no point
to keep the driver around when it failed to initialize on the already
online cpus. The chance that not yet online CPUs will provide a functional
interface later is very close to zero.

Add proper error return codes, so the setup of the cpu hotplug states fails
when the device cannot be initialized and remove all the magic cruft.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/hwmon/coretemp.c |   24 +++++-------------------
 1 file changed, 5 insertions(+), 19 deletions(-)

--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -662,12 +662,12 @@ static int coretemp_cpu_online(unsigned
 	 * without thermal sensors will be filtered out.
 	 */
 	if (!cpu_has(c, X86_FEATURE_DTHERM))
-		return 0;
+		return -ENODEV;
 
 	if (!pdev) {
 		/* Check the microcode version of the CPU */
 		if (chk_ucode_version(cpu))
-			return 0;
+			return -EINVAL;
 
 		/*
 		 * Alright, we have DTS support.
@@ -677,7 +677,7 @@ static int coretemp_cpu_online(unsigned
 		 */
 		err = coretemp_device_add(cpu);
 		if (err)
-			return 0;
+			return err;
 
 		pdev = coretemp_get_pdev(cpu);
 		/*
@@ -782,28 +782,14 @@ static int __init coretemp_init(void)
 	if (err)
 		return err;
 
-	get_online_cpus();
 	err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "hwmon/coretemp:online",
 				coretemp_cpu_online, coretemp_cpu_offline);
 	if (err < 0)
-		goto exit_driver_unreg;
+		goto outdrv;
 	coretemp_hp_online = err;
-
-#ifndef CONFIG_HOTPLUG_CPU
-	if (list_empty(&pdev_list)) {
-		err = -ENODEV;
-		goto exit_hp_unreg;
-	}
-#endif
-	put_online_cpus();
 	return 0;
 
-#ifndef CONFIG_HOTPLUG_CPU
-exit_hp_unreg:
-	cpuhp_remove_state(coretemp_hp_online);
-	put_online_cpus();
-#endif
-exit_driver_unreg:
+outdrv:
 	platform_driver_unregister(&coretemp_driver);
 	return err;
 }

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 6/6] hwmon/coretemp: Simplify package management
  2016-11-22 17:42 [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion Thomas Gleixner
                   ` (4 preceding siblings ...)
  2016-11-22 17:42 ` [patch 5/6] hwmon/coretemp: Use proper error codes in cpu online callback Thomas Gleixner
@ 2016-11-22 17:42 ` Thomas Gleixner
  2016-11-23 15:28 ` [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion Guenter Roeck
  6 siblings, 0 replies; 22+ messages in thread
From: Thomas Gleixner @ 2016-11-22 17:42 UTC (permalink / raw)
  To: LKML
  Cc: Fenghua Yu, Jean Delvare, Guenter Roeck, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

[-- Attachment #1: hwmon-coretemp--Simplify-package-management.patch --]
[-- Type: text/plain, Size: 7074 bytes --]

Keeping track of the per package platform devices requires an extra object,
which is held in a linked list.

The maximum number of packages is known at init() time. So the extra object
and linked list management can be replaced by an array of platform device
pointers in which the per package devices pointers can be stored. Lookup
becomes a simple array lookup instead of a list walk.

The mutex protecting the list can be removed as well because the array is
only accessed from cpu hotplug callbacks which are already serialized.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/hwmon/coretemp.c |  120 ++++++++++++++---------------------------------
 1 file changed, 38 insertions(+), 82 deletions(-)

--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -59,7 +59,6 @@ MODULE_PARM_DESC(tjmax, "TjMax value in
 #define TOTAL_ATTRS		(MAX_CORE_ATTRS + 1)
 #define MAX_CORE_DATA		(NUM_REAL_CORES + BASE_SYSFS_ATTR_NO)
 
-#define TO_PHYS_ID(cpu)		(cpu_data(cpu).phys_proc_id)
 #define TO_CORE_ID(cpu)		(cpu_data(cpu).cpu_core_id)
 #define TO_ATTR_NO(cpu)		(TO_CORE_ID(cpu) + BASE_SYSFS_ATTR_NO)
 
@@ -104,20 +103,16 @@ struct temp_data {
 /* Platform Data per Physical CPU */
 struct platform_data {
 	struct device		*hwmon_dev;
-	u16			phys_proc_id;
+	u16			pkg_id;
 	struct cpumask		cpumask;
 	struct temp_data	*core_data[MAX_CORE_DATA];
 	struct device_attribute name_attr;
 };
 
-struct pdev_entry {
-	struct list_head list;
-	struct platform_device *pdev;
-	u16 phys_proc_id;
-};
-
-static LIST_HEAD(pdev_list);
-static DEFINE_MUTEX(pdev_list_mutex);
+/* Keep track of how many package pointers we allocated in init() */
+static int max_packages __read_mostly;
+/* Array of package pointers. Serialized by cpu hotplug lock */
+static struct platform_device **pkg_devices;
 
 static ssize_t show_label(struct device *dev,
 				struct device_attribute *devattr, char *buf)
@@ -127,7 +122,7 @@ static ssize_t show_label(struct device
 	struct temp_data *tdata = pdata->core_data[attr->index];
 
 	if (tdata->is_pkg_data)
-		return sprintf(buf, "Physical id %u\n", pdata->phys_proc_id);
+		return sprintf(buf, "Package id %u\n", pdata->pkg_id);
 
 	return sprintf(buf, "Core %u\n", tdata->cpu_core_id);
 }
@@ -439,18 +434,10 @@ static int chk_ucode_version(unsigned in
 
 static struct platform_device *coretemp_get_pdev(unsigned int cpu)
 {
-	u16 phys_proc_id = TO_PHYS_ID(cpu);
-	struct pdev_entry *p;
-
-	mutex_lock(&pdev_list_mutex);
+	int pkgid = topology_logical_package_id(cpu);
 
-	list_for_each_entry(p, &pdev_list, list)
-		if (p->phys_proc_id == phys_proc_id) {
-			mutex_unlock(&pdev_list_mutex);
-			return p->pdev;
-		}
-
-	mutex_unlock(&pdev_list_mutex);
+	if (pkgid >= 0 && pkgid < max_packages)
+		return pkg_devices[pkgid];
 	return NULL;
 }
 
@@ -561,7 +548,7 @@ static int coretemp_probe(struct platfor
 	if (!pdata)
 		return -ENOMEM;
 
-	pdata->phys_proc_id = pdev->id;
+	pdata->pkg_id = pdev->id;
 	platform_set_drvdata(pdev, pdata);
 
 	pdata->hwmon_dev = devm_hwmon_device_register_with_groups(dev, DRVNAME,
@@ -589,64 +576,26 @@ static struct platform_driver coretemp_d
 	.remove = coretemp_remove,
 };
 
-static int coretemp_device_add(unsigned int cpu)
+static struct platform_device *coretemp_device_add(unsigned int cpu)
 {
-	int err;
+	int err, pkgid = topology_logical_package_id(cpu);
 	struct platform_device *pdev;
-	struct pdev_entry *pdev_entry;
 
-	mutex_lock(&pdev_list_mutex);
+	if (pkgid < 0)
+		return ERR_PTR(-ENOMEM);
 
-	pdev = platform_device_alloc(DRVNAME, TO_PHYS_ID(cpu));
-	if (!pdev) {
-		err = -ENOMEM;
-		pr_err("Device allocation failed\n");
-		goto exit;
-	}
-
-	pdev_entry = kzalloc(sizeof(struct pdev_entry), GFP_KERNEL);
-	if (!pdev_entry) {
-		err = -ENOMEM;
-		goto exit_device_put;
-	}
+	pdev = platform_device_alloc(DRVNAME, pkgid);
+	if (!pdev)
+		return ERR_PTR(-ENOMEM);
 
 	err = platform_device_add(pdev);
 	if (err) {
-		pr_err("Device addition failed (%d)\n", err);
-		goto exit_device_free;
+		platform_device_put(pdev);
+		return ERR_PTR(err);
 	}
 
-	pdev_entry->pdev = pdev;
-	pdev_entry->phys_proc_id = pdev->id;
-
-	list_add_tail(&pdev_entry->list, &pdev_list);
-	mutex_unlock(&pdev_list_mutex);
-
-	return 0;
-
-exit_device_free:
-	kfree(pdev_entry);
-exit_device_put:
-	platform_device_put(pdev);
-exit:
-	mutex_unlock(&pdev_list_mutex);
-	return err;
-}
-
-static void coretemp_device_remove(unsigned int cpu)
-{
-	struct pdev_entry *p, *n;
-	u16 phys_proc_id = TO_PHYS_ID(cpu);
-
-	mutex_lock(&pdev_list_mutex);
-	list_for_each_entry_safe(p, n, &pdev_list, list) {
-		if (p->phys_proc_id != phys_proc_id)
-			continue;
-		platform_device_unregister(p->pdev);
-		list_del(&p->list);
-		kfree(p);
-	}
-	mutex_unlock(&pdev_list_mutex);
+	pkg_devices[pkgid] = pdev;
+	return pdev;
 }
 
 static int coretemp_cpu_online(unsigned int cpu)
@@ -654,7 +603,6 @@ static int coretemp_cpu_online(unsigned
 	struct platform_device *pdev = coretemp_get_pdev(cpu);
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
 	struct platform_data *pdata;
-	int err;
 
 	/*
 	 * CPUID.06H.EAX[0] indicates whether the CPU has thermal
@@ -675,11 +623,10 @@ static int coretemp_cpu_online(unsigned
 		 * online. So, initialize per-pkg data structures and
 		 * then bring this core online.
 		 */
-		err = coretemp_device_add(cpu);
-		if (err)
-			return err;
+		pdev = coretemp_device_add(cpu);
+		if (IS_ERR(pdev))
+			return PTR_ERR(pdev);
 
-		pdev = coretemp_get_pdev(cpu);
 		/*
 		 * Check whether pkgtemp support is available.
 		 * If so, add interfaces for pkgtemp.
@@ -736,15 +683,16 @@ static int coretemp_cpu_offline(unsigned
 	}
 
 	/*
-	 * If all cores in this pkg are offline, remove the device.
-	 * coretemp_device_remove calls unregister_platform_device,
-	 * which in turn calls coretemp_remove. This removes the
-	 * pkgtemp entry and does other clean ups.
+	 * If all cores in this pkg are offline, remove the device. This
+	 * will invoke the platform driver remove function, which cleans up
+	 * the rest.
 	 */
 	if (cpumask_empty(&pd->cpumask)) {
-		coretemp_device_remove(cpu);
+		pkg_devices[topology_logical_package_id(cpu)] = NULL;
+		platform_device_unregister(pdev);
 		return 0;
 	}
+
 	/*
 	 * Check whether this core is the target for the package
 	 * interface. We need to assign it to some other cpu.
@@ -778,6 +726,12 @@ static int __init coretemp_init(void)
 	if (!x86_match_cpu(coretemp_ids))
 		return -ENODEV;
 
+	max_packages = topology_max_packages();
+	pkg_devices = kzalloc(max_packages * sizeof(struct platform_device *),
+			      GFP_KERNEL);
+	if (!pkg_devices)
+		return -ENOMEM;
+
 	err = platform_driver_register(&coretemp_driver);
 	if (err)
 		return err;
@@ -791,6 +745,7 @@ static int __init coretemp_init(void)
 
 outdrv:
 	platform_driver_unregister(&coretemp_driver);
+	kfree(pkg_devices);
 	return err;
 }
 module_init(coretemp_init)
@@ -799,6 +754,7 @@ static void __exit coretemp_exit(void)
 {
 	cpuhp_remove_state(coretemp_hp_online);
 	platform_driver_unregister(&coretemp_driver);
+	kfree(pkg_devices);
 }
 module_exit(coretemp_exit)
 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion
  2016-11-22 17:42 [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion Thomas Gleixner
                   ` (5 preceding siblings ...)
  2016-11-22 17:42 ` [patch 6/6] hwmon/coretemp: Simplify package management Thomas Gleixner
@ 2016-11-23 15:28 ` Guenter Roeck
  2017-04-12  8:31   ` Tommi Rantala
  6 siblings, 1 reply; 22+ messages in thread
From: Guenter Roeck @ 2016-11-23 15:28 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Fenghua Yu, Jean Delvare, linux-hwmon, Sebastian Siewior,
	Peter Zijlstra, x86

On 11/22/2016 09:42 AM, Thomas Gleixner wrote:
> After the first attempt to convert the coretemp driver to the hotplug state
> machine failed, we had a deeper look and went a bit farther.
>
> The driver has quite some interesting concepts vs. the package, core and
> sysfs file management and a bug in the package temperature sysfs interface
> vs. cpu hotplug.
>
> The following series fixes that bug and simplifies the package/core
> management and at the end converts it to the hotplug state machine.
>
> Along with the source size the binary size shrinks as well:
>    text	   data	    bss	    dec	    hex	
>    4068	   360       20    4448    1160	  Before
>    3801	   180       36    4017     fb1	  After
>
> Thanks,
>
> 	tglx
> -----
>  coretemp.c |  321 +++++++++++++++++++++----------------------------------------
>  1 file changed, 113 insertions(+), 208 deletions(-)
>
>
>
>
Looks good. Series applied to -next.

Guenter

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion
  2016-11-23 15:28 ` [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion Guenter Roeck
@ 2017-04-12  8:31   ` Tommi Rantala
  2017-04-12  9:28     ` Thomas Gleixner
  0 siblings, 1 reply; 22+ messages in thread
From: Tommi Rantala @ 2017-04-12  8:31 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Thomas Gleixner, LKML, Fenghua Yu, Jean Delvare, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

2016-11-23 17:28 GMT+02:00 Guenter Roeck <linux@roeck-us.net>:
>
> On 11/22/2016 09:42 AM, Thomas Gleixner wrote:
>>
>> After the first attempt to convert the coretemp driver to the hotplug state
>> machine failed, we had a deeper look and went a bit farther.
>>
>> The driver has quite some interesting concepts vs. the package, core and
>> sysfs file management and a bug in the package temperature sysfs interface
>> vs. cpu hotplug.
>>
>> The following series fixes that bug and simplifies the package/core
>> management and at the end converts it to the hotplug state machine.
>>
>> Along with the source size the binary size shrinks as well:
>>    text    data     bss     dec     hex
>>    4068    360       20 4448 1160     Before
>>    3801    180       36    4017     fb1   After
>>
>> Thanks,
>>
>>         tglx
>> -----
>>  coretemp.c |  321 +++++++++++++++++++++----------------------------------------
>>  1 file changed, 113 insertions(+), 208 deletions(-)

Hi,

Resume-from-suspend stopped working in HP xw6600 in fedora kernel
4.10.8-200.fc25.x86_64, while it worked just fine in
4.9.9-200.fc25.x86_64.

When powering on the suspended PC, there is no video output, and to
recover, I need to reset the machine.
Nothing is recorded in the journal logs for the resume, last lines are
from the suspend:

  Apr 08 15:41:49 xw6600 systemd[1]: Reached target Sleep.
  Apr 08 15:41:49 xw6600 systemd[1]: Starting Suspend...
  Apr 08 15:41:49 xw6600 systemd-sleep[6675]: Suspending system...

Also tested 4.11-rc5, but it fails the same way.

Bisection leads to commit:

commit e00ca5df37adc68052ea699cbd010ee4e19e39e4
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Nov 22 17:42:04 2016 +0000

    hwmon: (coretemp) Convert to hotplug state machine

    Install the callbacks via the state machine. Setup and teardown are handled
    by the hotplug core.

    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: linux-hwmon@vger.kernel.org
    Cc: Fenghua Yu <fenghua.yu@intel.com>
    Cc: Jean Delvare <jdelvare@suse.com>
    Cc: rt@linuxtronix.de
    Cc: Guenter Roeck <linux@roeck-us.net>
    Link: http://lkml.kernel.org/r/20161117183541.8588-5-bigeasy@linutronix.de
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>

If I do "modprobe -r coretemp", then the resume works OK with
4.10.8-200.fc25.x86_64.

Any ideas?

4.9.9-200.fc25.x86_64 dmesg:
http://termbin.com/3kcl

4.10.8-200.fc25.x86_64 dmesg:
http://termbin.com/62d9

-Tommi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion
  2017-04-12  8:31   ` Tommi Rantala
@ 2017-04-12  9:28     ` Thomas Gleixner
  2017-04-12 10:43       ` Tommi Rantala
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Gleixner @ 2017-04-12  9:28 UTC (permalink / raw)
  To: Tommi Rantala
  Cc: Guenter Roeck, LKML, Fenghua Yu, Jean Delvare, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

On Wed, 12 Apr 2017, Tommi Rantala wrote:
> Resume-from-suspend stopped working in HP xw6600 in fedora kernel
> 4.10.8-200.fc25.x86_64, while it worked just fine in
> 4.9.9-200.fc25.x86_64.
> 
> When powering on the suspended PC, there is no video output, and to
> recover, I need to reset the machine.

Is there just no video output or is the machine completely frozen? If it's
not completely dead, then you might be able to ssh into it.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion
  2017-04-12  9:28     ` Thomas Gleixner
@ 2017-04-12 10:43       ` Tommi Rantala
  2017-04-12 10:52         ` Thomas Gleixner
  0 siblings, 1 reply; 22+ messages in thread
From: Tommi Rantala @ 2017-04-12 10:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Guenter Roeck, LKML, Fenghua Yu, Jean Delvare, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

2017-04-12 12:28 GMT+03:00 Thomas Gleixner <tglx@linutronix.de>:
> On Wed, 12 Apr 2017, Tommi Rantala wrote:
>> Resume-from-suspend stopped working in HP xw6600 in fedora kernel
>> 4.10.8-200.fc25.x86_64, while it worked just fine in
>> 4.9.9-200.fc25.x86_64.
>>
>> When powering on the suspended PC, there is no video output, and to
>> recover, I need to reset the machine.
>
> Is there just no video output or is the machine completely frozen? If it's
> not completely dead, then you might be able to ssh into it.

It's completely hosed: not possible to ssh, does not respond to ping either.

I made a quick test with netconsole. After booting with
no_console_suspend=1, and setting the netconsole parameters, I can get
kernel messages (to my android phone) when suspending the machine. But
no messages after the failed resume.

Hmm, might I be able to capture messages over USB serial port...?

-Tommi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion
  2017-04-12 10:43       ` Tommi Rantala
@ 2017-04-12 10:52         ` Thomas Gleixner
  2017-04-12 11:00           ` Tommi Rantala
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Gleixner @ 2017-04-12 10:52 UTC (permalink / raw)
  To: Tommi Rantala
  Cc: Guenter Roeck, LKML, Fenghua Yu, Jean Delvare, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

On Wed, 12 Apr 2017, Tommi Rantala wrote:
> 2017-04-12 12:28 GMT+03:00 Thomas Gleixner <tglx@linutronix.de>:
> > On Wed, 12 Apr 2017, Tommi Rantala wrote:
> >> Resume-from-suspend stopped working in HP xw6600 in fedora kernel
> >> 4.10.8-200.fc25.x86_64, while it worked just fine in
> >> 4.9.9-200.fc25.x86_64.
> >>
> >> When powering on the suspended PC, there is no video output, and to
> >> recover, I need to reset the machine.
> >
> > Is there just no video output or is the machine completely frozen? If it's
> > not completely dead, then you might be able to ssh into it.
> 
> It's completely hosed: not possible to ssh, does not respond to ping either.
> 
> I made a quick test with netconsole. After booting with
> no_console_suspend=1, and setting the netconsole parameters, I can get
> kernel messages (to my android phone) when suspending the machine. But
> no messages after the failed resume.

Let's do something else first.

Can you please try to offline/online CPUs from the console?

# echo 0 >/sys/devices/system/cpu1/online
# echo 1 >/sys/devices/system/cpu1/online

If that works, then try to offline all CPUs (except 0) in the same order as
suspend (1 ... 7) and then online them again in the same order?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion
  2017-04-12 10:52         ` Thomas Gleixner
@ 2017-04-12 11:00           ` Tommi Rantala
  2017-04-12 14:53             ` Thomas Gleixner
  0 siblings, 1 reply; 22+ messages in thread
From: Tommi Rantala @ 2017-04-12 11:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Guenter Roeck, LKML, Fenghua Yu, Jean Delvare, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

2017-04-12 13:52 GMT+03:00 Thomas Gleixner <tglx@linutronix.de>:
> On Wed, 12 Apr 2017, Tommi Rantala wrote:
>> 2017-04-12 12:28 GMT+03:00 Thomas Gleixner <tglx@linutronix.de>:
>> > On Wed, 12 Apr 2017, Tommi Rantala wrote:
>> >> Resume-from-suspend stopped working in HP xw6600 in fedora kernel
>> >> 4.10.8-200.fc25.x86_64, while it worked just fine in
>> >> 4.9.9-200.fc25.x86_64.
>> >>
>> >> When powering on the suspended PC, there is no video output, and to
>> >> recover, I need to reset the machine.
>> >
>> > Is there just no video output or is the machine completely frozen? If it's
>> > not completely dead, then you might be able to ssh into it.
>>
>> It's completely hosed: not possible to ssh, does not respond to ping either.
>>
>> I made a quick test with netconsole. After booting with
>> no_console_suspend=1, and setting the netconsole parameters, I can get
>> kernel messages (to my android phone) when suspending the machine. But
>> no messages after the failed resume.
>
> Let's do something else first.
>
> Can you please try to offline/online CPUs from the console?
>
> # echo 0 >/sys/devices/system/cpu1/online
> # echo 1 >/sys/devices/system/cpu1/online

ok, that works.

> If that works, then try to offline all CPUs (except 0) in the same order as
> suspend (1 ... 7) and then online them again in the same order?

Seems to work without problems:

# for i in $(seq 1 7) ; do echo 0 > /sys/devices/system/cpu/cpu$i/online ; done

[ 1237.317537] intel_powerclamp: No package C-state available
[ 1308.997620] smpboot: CPU 1 is now offline
[ 1309.007167] intel_powerclamp: No package C-state available
[ 1309.032563] smpboot: CPU 2 is now offline
[ 1309.038118] intel_powerclamp: No package C-state available
[ 1309.072495] smpboot: CPU 3 is now offline
[ 1309.077807] intel_powerclamp: No package C-state available
[ 1309.099545] Broke affinity for irq 29
[ 1309.100587] smpboot: CPU 4 is now offline
[ 1309.105346] intel_powerclamp: No package C-state available
[ 1309.135530] Broke affinity for irq 22
[ 1309.135540] Broke affinity for irq 29
[ 1309.136579] smpboot: CPU 5 is now offline
[ 1309.141653] intel_powerclamp: No package C-state available
[ 1309.171517] Broke affinity for irq 22
[ 1309.171526] Broke affinity for irq 29
[ 1309.171535] Broke affinity for irq 31
[ 1309.172586] smpboot: CPU 6 is now offline
[ 1309.176967] intel_powerclamp: No package C-state available
[ 1309.209122] Broke affinity for irq 19
[ 1309.209126] Broke affinity for irq 22
[ 1309.209135] Broke affinity for irq 29
[ 1309.209145] Broke affinity for irq 31
[ 1309.212071] smpboot: CPU 7 is now offline


# for i in $(seq 1 7) ; do echo 1 > /sys/devices/system/cpu/cpu$i/online ; done

[ 1309.217476] intel_powerclamp: No package C-state available
[ 1380.624184] x86: Booting SMP configuration:
[ 1380.624186] smpboot: Booting Node 0 Processor 1 APIC 0x4
[ 1380.659810] intel_powerclamp: No package C-state available
[ 1380.659957] smpboot: Booting Node 0 Processor 2 APIC 0x2
[ 1380.671198] microcode: sig=0x10676, pf=0x40, revision=0x60f
[ 1380.672088] smpboot: Booting Node 0 Processor 3 APIC 0x6
[ 1380.677952] intel_powerclamp: No package C-state available
[ 1380.686260] microcode: sig=0x1067a, pf=0x40, revision=0xa0b
[ 1380.687098] smpboot: Booting Node 0 Processor 4 APIC 0x1
[ 1380.699214] microcode: sig=0x10676, pf=0x40, revision=0x60f
[ 1380.699742] intel_powerclamp: No package C-state available
[ 1380.700267] smpboot: Booting Node 0 Processor 5 APIC 0x5
[ 1380.715207] microcode: sig=0x1067a, pf=0x40, revision=0xa0b
[ 1380.716202] smpboot: Booting Node 0 Processor 6 APIC 0x3
[ 1380.730264] microcode: sig=0x10676, pf=0x40, revision=0x60f
[ 1380.730567] intel_powerclamp: No package C-state available
[ 1380.731267] smpboot: Booting Node 0 Processor 7 APIC 0x7
[ 1380.748276] microcode: sig=0x1067a, pf=0x40, revision=0xa0b

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion
  2017-04-12 11:00           ` Tommi Rantala
@ 2017-04-12 14:53             ` Thomas Gleixner
  2017-04-14 17:35               ` Thomas Gleixner
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Gleixner @ 2017-04-12 14:53 UTC (permalink / raw)
  To: Tommi Rantala
  Cc: Guenter Roeck, LKML, Fenghua Yu, Jean Delvare, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

On Wed, 12 Apr 2017, Tommi Rantala wrote:
> 2017-04-12 13:52 GMT+03:00 Thomas Gleixner <tglx@linutronix.de>:
> > Can you please try to offline/online CPUs from the console?
> >
> > # echo 0 >/sys/devices/system/cpu1/online
> > # echo 1 >/sys/devices/system/cpu1/online
> 
> ok, that works.
> 
> > If that works, then try to offline all CPUs (except 0) in the same order as
> > suspend (1 ... 7) and then online them again in the same order?
> 
> Seems to work without problems:

Good.

Can you please try the following:

# for STATE in freezer devices platform processors core; do \
  echo $STATE; \
  echo $STATE >/sys/power/pm_test; \
  echo mem >/sys/power/state

That should give us at least a hint in which area to dig.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion
  2017-04-12 14:53             ` Thomas Gleixner
@ 2017-04-14 17:35               ` Thomas Gleixner
  2017-04-15 17:22                 ` Tommi Rantala
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Gleixner @ 2017-04-14 17:35 UTC (permalink / raw)
  To: Tommi Rantala
  Cc: Guenter Roeck, LKML, Fenghua Yu, Jean Delvare, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

On Wed, 12 Apr 2017, Thomas Gleixner wrote:
> On Wed, 12 Apr 2017, Tommi Rantala wrote:
> > 2017-04-12 13:52 GMT+03:00 Thomas Gleixner <tglx@linutronix.de>:
> > > Can you please try to offline/online CPUs from the console?
> > >
> > > # echo 0 >/sys/devices/system/cpu1/online
> > > # echo 1 >/sys/devices/system/cpu1/online
> > 
> > ok, that works.
> > 
> > > If that works, then try to offline all CPUs (except 0) in the same order as
> > > suspend (1 ... 7) and then online them again in the same order?
> > 
> > Seems to work without problems:
> 
> Good.
> 
> Can you please try the following:
> 
> # for STATE in freezer devices platform processors core; do \
>   echo $STATE; \
>   echo $STATE >/sys/power/pm_test; \
>   echo mem >/sys/power/state
> 
> That should give us at least a hint in which area to dig.

Any news on that?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion
  2017-04-14 17:35               ` Thomas Gleixner
@ 2017-04-15 17:22                 ` Tommi Rantala
  2017-04-23 15:01                   ` Thomas Gleixner
  0 siblings, 1 reply; 22+ messages in thread
From: Tommi Rantala @ 2017-04-15 17:22 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Guenter Roeck, LKML, Fenghua Yu, Jean Delvare, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

2017-04-14 20:35 GMT+03:00 Thomas Gleixner <tglx@linutronix.de>:
> On Wed, 12 Apr 2017, Thomas Gleixner wrote:
>>
>> Can you please try the following:
>>
>> # for STATE in freezer devices platform processors core; do \
>>   echo $STATE; \
>>   echo $STATE >/sys/power/pm_test; \
>>   echo mem >/sys/power/state
>>
>> That should give us at least a hint in which area to dig.
>
> Any news on that?

Sorry, was traveling.

Testing with 4.10.8-200.fc25.x86_64: freezer, devices and platform are
OK, it breaks at "processors".
The screen stays off, and the machine no longer answers to ping.

(Without coretemp loaded, the machine survives all the states. There
are some graphics glitches and radeon error messages)

-Tommi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion
  2017-04-15 17:22                 ` Tommi Rantala
@ 2017-04-23 15:01                   ` Thomas Gleixner
  2017-05-04 15:39                     ` Tommi Rantala
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Gleixner @ 2017-04-23 15:01 UTC (permalink / raw)
  To: Tommi Rantala
  Cc: Guenter Roeck, LKML, Fenghua Yu, Jean Delvare, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

On Sat, 15 Apr 2017, Tommi Rantala wrote:

> 2017-04-14 20:35 GMT+03:00 Thomas Gleixner <tglx@linutronix.de>:
> > On Wed, 12 Apr 2017, Thomas Gleixner wrote:
> >>
> >> Can you please try the following:
> >>
> >> # for STATE in freezer devices platform processors core; do \
> >>   echo $STATE; \
> >>   echo $STATE >/sys/power/pm_test; \
> >>   echo mem >/sys/power/state
> >>
> >> That should give us at least a hint in which area to dig.
> >
> > Any news on that?
> 
> Sorry, was traveling.
> 
> Testing with 4.10.8-200.fc25.x86_64: freezer, devices and platform are
> OK, it breaks at "processors".
> The screen stays off, and the machine no longer answers to ping.
> 
> (Without coretemp loaded, the machine survives all the states. There
> are some graphics glitches and radeon error messages)

That's odd. I tried on a similar machine (w/o a radeon card) and it just
works with the coretemp module loaded.

Can you please do a CPU hotplug cycle (just one CPU) with the cpuhp events
in the tracer enabled. Send me the trace output so I might be able to spot
whats different and what interdependencies between other callbacks might be
there.

I'm traveling for a week now. I come back to this after my travel; if I
forget, please send me a reminder.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion
  2017-04-23 15:01                   ` Thomas Gleixner
@ 2017-05-04 15:39                     ` Tommi Rantala
  2017-05-09  7:16                       ` Thomas Gleixner
  0 siblings, 1 reply; 22+ messages in thread
From: Tommi Rantala @ 2017-05-04 15:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Guenter Roeck, LKML, Fenghua Yu, Jean Delvare, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

2017-04-23 18:01 GMT+03:00 Thomas Gleixner <tglx@linutronix.de>:
> On Sat, 15 Apr 2017, Tommi Rantala wrote:
>
>> Testing with 4.10.8-200.fc25.x86_64: freezer, devices and platform are
>> OK, it breaks at "processors".
>> The screen stays off, and the machine no longer answers to ping.
>>
>> (Without coretemp loaded, the machine survives all the states. There
>> are some graphics glitches and radeon error messages)
>
> That's odd. I tried on a similar machine (w/o a radeon card) and it just
> works with the coretemp module loaded.
>
> Can you please do a CPU hotplug cycle (just one CPU) with the cpuhp events
> in the tracer enabled. Send me the trace output so I might be able to spot
> whats different and what interdependencies between other callbacks might be
> there.

Hi,

Here's the trace output, does it help?

http://termbin.com/qugr

-Tommi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion
  2017-05-04 15:39                     ` Tommi Rantala
@ 2017-05-09  7:16                       ` Thomas Gleixner
  2017-05-10 13:52                         ` Tommi Rantala
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Gleixner @ 2017-05-09  7:16 UTC (permalink / raw)
  To: Tommi Rantala
  Cc: Guenter Roeck, LKML, Fenghua Yu, Jean Delvare, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

On Thu, 4 May 2017, Tommi Rantala wrote:
> 2017-04-23 18:01 GMT+03:00 Thomas Gleixner <tglx@linutronix.de>:
> > On Sat, 15 Apr 2017, Tommi Rantala wrote:
> >
> >> Testing with 4.10.8-200.fc25.x86_64: freezer, devices and platform are
> >> OK, it breaks at "processors".
> >> The screen stays off, and the machine no longer answers to ping.
> >>
> >> (Without coretemp loaded, the machine survives all the states. There
> >> are some graphics glitches and radeon error messages)
> >
> > That's odd. I tried on a similar machine (w/o a radeon card) and it just
> > works with the coretemp module loaded.
> >
> > Can you please do a CPU hotplug cycle (just one CPU) with the cpuhp events
> > in the tracer enabled. Send me the trace output so I might be able to spot
> > whats different and what interdependencies between other callbacks might be
> > there.
> 
> Hi,
> 
> Here's the trace output, does it help?

Not much. Can you please try the following:

1) Offline all CPUs except CPU0 before suspend/resume

2) Offline all CPUs except CPU0 and CPU1 before suspend/resume

3) Offline all CPUs except CPU0 and CPU2 before suspend/resume

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion
  2017-05-09  7:16                       ` Thomas Gleixner
@ 2017-05-10 13:52                         ` Tommi Rantala
  2017-05-10 14:01                           ` Thomas Gleixner
  0 siblings, 1 reply; 22+ messages in thread
From: Tommi Rantala @ 2017-05-10 13:52 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Guenter Roeck, LKML, Fenghua Yu, Jean Delvare, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

2017-05-09 10:16 GMT+03:00 Thomas Gleixner <tglx@linutronix.de>:
> On Thu, 4 May 2017, Tommi Rantala wrote:
>> Here's the trace output, does it help?
>
> Not much. Can you please try the following:
>
> 1) Offline all CPUs except CPU0 before suspend/resume

it works!

> 2) Offline all CPUs except CPU0 and CPU1 before suspend/resume

now it breaks.

> 3) Offline all CPUs except CPU0 and CPU2 before suspend/resume

works again!

(Also works with CPUs 0,2,4,6 onlined.)

-Tommi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion
  2017-05-10 13:52                         ` Tommi Rantala
@ 2017-05-10 14:01                           ` Thomas Gleixner
  2017-05-10 14:02                             ` Tommi Rantala
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Gleixner @ 2017-05-10 14:01 UTC (permalink / raw)
  To: Tommi Rantala
  Cc: Guenter Roeck, LKML, Fenghua Yu, Jean Delvare, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

On Wed, 10 May 2017, Tommi Rantala wrote:
> 2017-05-09 10:16 GMT+03:00 Thomas Gleixner <tglx@linutronix.de>:
> > On Thu, 4 May 2017, Tommi Rantala wrote:
> >> Here's the trace output, does it help?
> >
> > Not much. Can you please try the following:
> >
> > 1) Offline all CPUs except CPU0 before suspend/resume
> 
> it works!
> 
> > 2) Offline all CPUs except CPU0 and CPU1 before suspend/resume
> 
> now it breaks.
> 
> > 3) Offline all CPUs except CPU0 and CPU2 before suspend/resume
> 
> works again!
> 
> (Also works with CPUs 0,2,4,6 onlined.)

Output from /proc/cpuinfo please.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion
  2017-05-10 14:01                           ` Thomas Gleixner
@ 2017-05-10 14:02                             ` Tommi Rantala
  0 siblings, 0 replies; 22+ messages in thread
From: Tommi Rantala @ 2017-05-10 14:02 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Guenter Roeck, LKML, Fenghua Yu, Jean Delvare, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

2017-05-10 17:01 GMT+03:00 Thomas Gleixner <tglx@linutronix.de>:
> On Wed, 10 May 2017, Tommi Rantala wrote:
>> 2017-05-09 10:16 GMT+03:00 Thomas Gleixner <tglx@linutronix.de>:
>> > On Thu, 4 May 2017, Tommi Rantala wrote:
>> >> Here's the trace output, does it help?
>> >
>> > Not much. Can you please try the following:
>> >
>> > 1) Offline all CPUs except CPU0 before suspend/resume
>>
>> it works!
>>
>> > 2) Offline all CPUs except CPU0 and CPU1 before suspend/resume
>>
>> now it breaks.
>>
>> > 3) Offline all CPUs except CPU0 and CPU2 before suspend/resume
>>
>> works again!
>>
>> (Also works with CPUs 0,2,4,6 onlined.)
>
> Output from /proc/cpuinfo please.

http://termbin.com/vec2

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-05-10 14:02 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-22 17:42 [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion Thomas Gleixner
2016-11-22 17:42 ` [patch 2/6] hwmon/coretemp: Simplify sibling management Thomas Gleixner
2016-11-22 17:42 ` [patch 1/6] hwmon/coretemp: Fixup target cpu for package when cpu is offlined Thomas Gleixner
2016-11-22 17:42 ` [patch 3/6] hwmon/coretemp: Avoid redundant lookups Thomas Gleixner
2016-11-22 17:42 ` [patch 4/6] [PREEMPT-RT] hwmon/coretemp: Convert to hotplug state machine Thomas Gleixner
2016-11-22 17:42 ` [patch 5/6] hwmon/coretemp: Use proper error codes in cpu online callback Thomas Gleixner
2016-11-22 17:42 ` [patch 6/6] hwmon/coretemp: Simplify package management Thomas Gleixner
2016-11-23 15:28 ` [patch 0/6] hwmon/coretemp: Hotplug fixes, cleanups and state machine conversion Guenter Roeck
2017-04-12  8:31   ` Tommi Rantala
2017-04-12  9:28     ` Thomas Gleixner
2017-04-12 10:43       ` Tommi Rantala
2017-04-12 10:52         ` Thomas Gleixner
2017-04-12 11:00           ` Tommi Rantala
2017-04-12 14:53             ` Thomas Gleixner
2017-04-14 17:35               ` Thomas Gleixner
2017-04-15 17:22                 ` Tommi Rantala
2017-04-23 15:01                   ` Thomas Gleixner
2017-05-04 15:39                     ` Tommi Rantala
2017-05-09  7:16                       ` Thomas Gleixner
2017-05-10 13:52                         ` Tommi Rantala
2017-05-10 14:01                           ` Thomas Gleixner
2017-05-10 14:02                             ` Tommi Rantala

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).