linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86, mce, therm_throt: Optimize notifications of thermal throttle
@ 2019-11-11 21:43 Srinivas Pandruvada
  2019-11-12 15:00 ` [tip: ras/core] x86/mce/therm_throt: " tip-bot2 for Srinivas Pandruvada
  2020-02-08 22:24 ` [PATCH] x86, mce, therm_throt: " Chris Wilson
  0 siblings, 2 replies; 7+ messages in thread
From: Srinivas Pandruvada @ 2019-11-11 21:43 UTC (permalink / raw)
  To: tony.luck, bp, tglx, mingo, hpa, bberg
  Cc: x86, linux-edac, linux-kernel, hdegoede, ckellner, Srinivas Pandruvada

Some modern systems have very tight thermal tolerances. Because of this
they may cross thermal thresholds when running normal workloads (even
during boot). The CPU hardware will react by limiting power/frequency
and using duty cycles to bring the temperature back into normal range.

Thus users may see a "critical" message about the "temperature above
threshold" which is soon followed by "temperature/speed normal". These
messages are rate-limited, but still may repeat every few minutes.

This issue became worse starting with the Ivy Bridge generation of
CPUs because they include a TCC activation offset in the MSR
IA32_TEMPERATURE_TARGET. OEMs use this to provide alerts long before
critical temperatures are reached.
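For illustration, the relevant fields of IA32_TEMPERATURE_TARGET (MSR 0x1A2) can be decoded in plain C. The layout assumed here (temperature target, i.e. TjMax, in bits 23:16; TCC activation offset in bits 29:24, treated as a 6-bit field, though the field width varies by model) follows publicly documented recent Intel cores; the sample raw value is made up:

```c
#include <assert.h>

/* Illustrative decode of IA32_TEMPERATURE_TARGET. Bit positions are
 * assumptions based on public documentation for recent Intel cores:
 * TjMax (degrees C) in bits 23:16, TCC activation offset in bits 29:24. */
static unsigned int temp_target(unsigned long long msr_val)
{
	return (msr_val >> 16) & 0xff;
}

static unsigned int tcc_offset(unsigned long long msr_val)
{
	return (msr_val >> 24) & 0x3f;
}

/* Effective alert/throttle trip point: TjMax minus the OEM-programmed offset. */
static unsigned int trip_point(unsigned long long msr_val)
{
	return temp_target(msr_val) - tcc_offset(msr_val);
}
```

With a nonzero offset the thermal interrupt fires at the lower trip point, which is why these alerts can appear long before the CPU is anywhere near its critical temperature.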

A two-hour test run on a laptop with an Intel 8th Gen Core i5, under a
workload, resulted in 20K+ thermal interrupts per CPU at the core level
and another 20K+ interrupts at the package level. The kernel logs were
full of throttling messages.

The real value of these threshold interrupts is in debugging problems
with external cooling solutions and performance issues due to excessive
throttling.

So the solution here is the following:
- In the current thermal_throttle folder, show the maximum time for one
throttling event and the total amount of time the system was in the
throttling state.
- Do not log short excursions.
- Log only when, in spite of thermal throttling, the temperature is rising.
On the high threshold interrupt, trigger a delayed workqueue that
monitors the threshold violation log bit (THERM_STATUS_PROCHOT_LOG). When
the log bit is set, this workqueue callback calculates a three-point moving
average and logs a warning message when the temperature trend is rising.
When this log bit is clear and the temperature is below the threshold
temperature, the workqueue callback logs a "Normal" message. Once a high
threshold event is logged, the logging is rate-limited.
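The trend check described above can be sketched as standalone C: a simplified model of the workqueue callback, not the kernel code itself (names are illustrative). The sample value is an offset below the thermal limit, so a smaller value means hotter:

```c
#include <assert.h>

#define NUM_SAMPLES 3

/* Simplified model of the trend check: temp is an offset below the
 * thermal limit, so a SMALLER sample means a HOTTER CPU. */
struct trend_state {
	unsigned char samples[NUM_SAMPLES];	/* circular sample buffer */
	unsigned char index;			/* next write position */
	unsigned char count;			/* samples collected so far */
	unsigned char average;			/* previous moving average (0 = none) */
};

/* Feed one sample; returns 1 when the rising-temperature warning
 * should be logged, 0 otherwise. */
static int trend_add_sample(struct trend_state *s, unsigned char temp)
{
	unsigned int avg = 0, i;
	int warn = 0;

	s->samples[s->index] = temp;
	s->index = (s->index + 1) % NUM_SAMPLES;
	if (s->count < NUM_SAMPLES)
		s->count++;
	if (s->count < NUM_SAMPLES)
		return 0;	/* need a full window first */

	for (i = 0; i < NUM_SAMPLES; i++)
		avg += s->samples[i];
	avg /= NUM_SAMPLES;

	/* A falling average offset means the CPU keeps getting hotter
	 * in spite of throttling: the only case worth logging. */
	if (s->average && s->average > avg)
		warn = 1;

	s->average = (unsigned char)avg;
	return warn;
}
```

Short excursions never fill the window, and a recovering (rising-offset) trend stays silent, which is how the patch keeps the logs quiet on systems that self-correct quickly.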

With this patch, on the same test laptop, no warnings are printed in the
logs, as the maximum time the processor needed to bring the temperature
under control was only 280 ms.

This implementation is based on inputs from Alan Cox and Tony Luck.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---

Version History:
v1:
Changes compared to RFC PATCH:
	Addressed comments from Boris
	Rebased to tip/master
	Added kernel doc for struct
	Edits to the commit description
	Optimize storage for _thermal_state
	Ignore invalid sample for threshold high

 arch/x86/kernel/cpu/mce/therm_throt.c | 251 +++++++++++++++++++++++---
 1 file changed, 227 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
index bc441d68d060..c102fcff23f3 100644
--- a/arch/x86/kernel/cpu/mce/therm_throt.c
+++ b/arch/x86/kernel/cpu/mce/therm_throt.c
@@ -40,15 +40,58 @@
 #define THERMAL_THROTTLING_EVENT	0
 #define POWER_LIMIT_EVENT		1
 
-/*
- * Current thermal event state:
+/**
+ * struct _thermal_state - Represent the current thermal event state
+ * @next_check:			Stores the next timestamp, when it is allowed
+ *				to log the next warning message.
+ * @last_interrupt_time:	Stores the timestamp for the last threshold
+ *				high event.
+ * @therm_work:			Delayed workqueue structure
+ * @count:			Stores the current running count for thermal
+ *				or power threshold interrupts.
+ * @last_count:			Stores the previous running count for thermal
+ *				or power threshold interrupts.
+ * @max_time_ms:		This shows the maximum amount of time CPU was
+ *				in throttled state for a single thermal
+ *				threshold high to low state.
+ * @total_time_ms:		This is a cumulative time during which CPU was
+ *				in the throttled state.
+ * @rate_control_active:	Flag to set, when throttling message is logged.
+ *				This is used for the purpose of rate-control.
+ * @new_event:			Stores the last high/low status of the
+ *				THERM_STATUS_PROCHOT or
+ *				THERM_STATUS_POWER_LIMIT.
+ * @level:			Stores whether this _thermal_state instance is
+ *				for a CORE level or for PACKAGE level.
+ * @sample_index:		Index for storage for the next sample in the
+ *				buffer temp_samples[].
+ * @sample_count:		Total number of samples collected in the buffer
+ *				temp_samples[].
+ * @average:			The last moving average of temperature samples
+ * @baseline_temp:		Temperature at which thermal threshold high
+ *				interrupt was generated.
+ * @temp_samples:		Storage for temperature samples to calculate
+ *				moving average.
+ *
+ * This structure is used to represent data related to thermal state for a CPU.
+ * There is a separate storage for core and package level for each CPU.
  */
 struct _thermal_state {
-	bool			new_event;
-	int			event;
 	u64			next_check;
+	u64			last_interrupt_time;
+	struct delayed_work	therm_work;
 	unsigned long		count;
 	unsigned long		last_count;
+	unsigned long		max_time_ms;
+	unsigned long		total_time_ms;
+	bool			rate_control_active;
+	bool			new_event;
+	u8			level;
+	u8			sample_index;
+	u8			sample_count;
+	u8			average;
+	u8			baseline_temp;
+	u8			temp_samples[3];
 };
 
 struct thermal_state {
@@ -121,8 +164,22 @@ define_therm_throt_device_one_ro(package_throttle_count);
 define_therm_throt_device_show_func(package_power_limit, count);
 define_therm_throt_device_one_ro(package_power_limit_count);
 
+define_therm_throt_device_show_func(core_throttle, max_time_ms);
+define_therm_throt_device_one_ro(core_throttle_max_time_ms);
+
+define_therm_throt_device_show_func(package_throttle, max_time_ms);
+define_therm_throt_device_one_ro(package_throttle_max_time_ms);
+
+define_therm_throt_device_show_func(core_throttle, total_time_ms);
+define_therm_throt_device_one_ro(core_throttle_total_time_ms);
+
+define_therm_throt_device_show_func(package_throttle, total_time_ms);
+define_therm_throt_device_one_ro(package_throttle_total_time_ms);
+
 static struct attribute *thermal_throttle_attrs[] = {
 	&dev_attr_core_throttle_count.attr,
+	&dev_attr_core_throttle_max_time_ms.attr,
+	&dev_attr_core_throttle_total_time_ms.attr,
 	NULL
 };
 
@@ -135,6 +192,105 @@ static const struct attribute_group thermal_attr_group = {
 #define CORE_LEVEL	0
 #define PACKAGE_LEVEL	1
 
+#define THERM_THROT_POLL_INTERVAL	HZ
+#define THERM_STATUS_PROCHOT_LOG	BIT(1)
+
+static void clear_therm_status_log(int level)
+{
+	int msr;
+	u64 msr_val;
+
+	if (level == CORE_LEVEL)
+		msr = MSR_IA32_THERM_STATUS;
+	else
+		msr = MSR_IA32_PACKAGE_THERM_STATUS;
+
+	rdmsrl(msr, msr_val);
+	wrmsrl(msr, msr_val & ~THERM_STATUS_PROCHOT_LOG);
+}
+
+static void get_therm_status(int level, bool *proc_hot, u8 *temp)
+{
+	int msr;
+	u64 msr_val;
+
+	if (level == CORE_LEVEL)
+		msr = MSR_IA32_THERM_STATUS;
+	else
+		msr = MSR_IA32_PACKAGE_THERM_STATUS;
+
+	rdmsrl(msr, msr_val);
+	if (msr_val & THERM_STATUS_PROCHOT_LOG)
+		*proc_hot = true;
+	else
+		*proc_hot = false;
+
+	*temp = (msr_val >> 16) & 0x7F;
+}
+
+static void throttle_active_work(struct work_struct *work)
+{
+	struct _thermal_state *state = container_of(to_delayed_work(work),
+						struct _thermal_state, therm_work);
+	unsigned int i, avg, this_cpu = smp_processor_id();
+	u64 now = get_jiffies_64();
+	bool hot;
+	u8 temp;
+
+	get_therm_status(state->level, &hot, &temp);
+	/* temperature value is offset from the max so lesser means hotter */
+	if (!hot && temp > state->baseline_temp) {
+		if (state->rate_control_active)
+			pr_info("CPU%d: %s temperature/speed normal (total events = %lu)\n",
+				this_cpu,
+				state->level == CORE_LEVEL ? "Core" : "Package",
+				state->count);
+
+		state->rate_control_active = false;
+		return;
+	}
+
+	if (time_before64(now, state->next_check) &&
+			  state->rate_control_active)
+		goto re_arm;
+
+	state->next_check = now + CHECK_INTERVAL;
+
+	if (state->count != state->last_count) {
+		/* There was one new thermal interrupt */
+		state->last_count = state->count;
+		state->average = 0;
+		state->sample_count = 0;
+		state->sample_index = 0;
+	}
+
+	state->temp_samples[state->sample_index] = temp;
+	state->sample_count++;
+	state->sample_index = (state->sample_index + 1) % ARRAY_SIZE(state->temp_samples);
+	if (state->sample_count < ARRAY_SIZE(state->temp_samples))
+		goto re_arm;
+
+	avg = 0;
+	for (i = 0; i < ARRAY_SIZE(state->temp_samples); ++i)
+		avg += state->temp_samples[i];
+
+	avg /= ARRAY_SIZE(state->temp_samples);
+
+	if (state->average > avg) {
+		pr_warn("CPU%d: %s temperature is above threshold, cpu clock is throttled (total events = %lu)\n",
+			this_cpu,
+			state->level == CORE_LEVEL ? "Core" : "Package",
+			state->count);
+		state->rate_control_active = true;
+	}
+
+	state->average = avg;
+
+re_arm:
+	clear_therm_status_log(state->level);
+	schedule_delayed_work_on(this_cpu, &state->therm_work, THERM_THROT_POLL_INTERVAL);
+}
+
 /***
  * therm_throt_process - Process thermal throttling event from interrupt
  * @curr: Whether the condition is current or not (boolean), since the
@@ -178,27 +334,33 @@ static void therm_throt_process(bool new_event, int event, int level)
 	if (new_event)
 		state->count++;
 
-	if (time_before64(now, state->next_check) &&
-			state->count != state->last_count)
+	if (event != THERMAL_THROTTLING_EVENT)
 		return;
 
-	state->next_check = now + CHECK_INTERVAL;
-	state->last_count = state->count;
+	if (new_event && !state->last_interrupt_time) {
+		bool hot;
+		u8 temp;
+
+		get_therm_status(state->level, &hot, &temp);
+		/*
+		 * Ignore short temperature spike as the system is not close
+		 * to PROCHOT. 10C offset is large enough to ignore. It is
+		 * already dropped from the high threshold temperature.
+		 */
+		if (temp > 10)
+			return;
 
-	/* if we just entered the thermal event */
-	if (new_event) {
-		if (event == THERMAL_THROTTLING_EVENT)
-			pr_warn("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
-				this_cpu,
-				level == CORE_LEVEL ? "Core" : "Package",
-				state->count);
-		return;
-	}
-	if (old_event) {
-		if (event == THERMAL_THROTTLING_EVENT)
-			pr_info("CPU%d: %s temperature/speed normal\n", this_cpu,
-				level == CORE_LEVEL ? "Core" : "Package");
-		return;
+		state->baseline_temp = temp;
+		state->last_interrupt_time = now;
+		schedule_delayed_work_on(this_cpu, &state->therm_work, THERM_THROT_POLL_INTERVAL);
+	} else if (old_event && state->last_interrupt_time) {
+		unsigned long throttle_time;
+
+		throttle_time = jiffies_delta_to_msecs(now - state->last_interrupt_time);
+		if (throttle_time > state->max_time_ms)
+			state->max_time_ms = throttle_time;
+		state->total_time_ms += throttle_time;
+		state->last_interrupt_time = 0;
 	}
 }
 
@@ -244,20 +406,47 @@ static int thermal_throttle_add_dev(struct device *dev, unsigned int cpu)
 	if (err)
 		return err;
 
-	if (cpu_has(c, X86_FEATURE_PLN) && int_pln_enable)
+	if (cpu_has(c, X86_FEATURE_PLN) && int_pln_enable) {
 		err = sysfs_add_file_to_group(&dev->kobj,
 					      &dev_attr_core_power_limit_count.attr,
 					      thermal_attr_group.name);
+		if (err)
+			goto del_group;
+	}
+
 	if (cpu_has(c, X86_FEATURE_PTS)) {
 		err = sysfs_add_file_to_group(&dev->kobj,
 					      &dev_attr_package_throttle_count.attr,
 					      thermal_attr_group.name);
-		if (cpu_has(c, X86_FEATURE_PLN) && int_pln_enable)
+		if (err)
+			goto del_group;
+
+		err = sysfs_add_file_to_group(&dev->kobj,
+					      &dev_attr_package_throttle_max_time_ms.attr,
+					      thermal_attr_group.name);
+		if (err)
+			goto del_group;
+
+		err = sysfs_add_file_to_group(&dev->kobj,
+					      &dev_attr_package_throttle_total_time_ms.attr,
+					      thermal_attr_group.name);
+		if (err)
+			goto del_group;
+
+		if (cpu_has(c, X86_FEATURE_PLN) && int_pln_enable) {
 			err = sysfs_add_file_to_group(&dev->kobj,
 					&dev_attr_package_power_limit_count.attr,
 					thermal_attr_group.name);
+			if (err)
+				goto del_group;
+		}
 	}
 
+	return 0;
+
+del_group:
+	sysfs_remove_group(&dev->kobj, &thermal_attr_group);
+
 	return err;
 }
 
@@ -269,15 +458,29 @@ static void thermal_throttle_remove_dev(struct device *dev)
 /* Get notified when a cpu comes on/off. Be hotplug friendly. */
 static int thermal_throttle_online(unsigned int cpu)
 {
+	struct thermal_state *state = &per_cpu(thermal_state, cpu);
 	struct device *dev = get_cpu_device(cpu);
 
+	state->package_throttle.level = PACKAGE_LEVEL;
+	state->core_throttle.level = CORE_LEVEL;
+
+	INIT_DELAYED_WORK(&state->package_throttle.therm_work, throttle_active_work);
+	INIT_DELAYED_WORK(&state->core_throttle.therm_work, throttle_active_work);
+
 	return thermal_throttle_add_dev(dev, cpu);
 }
 
 static int thermal_throttle_offline(unsigned int cpu)
 {
+	struct thermal_state *state = &per_cpu(thermal_state, cpu);
 	struct device *dev = get_cpu_device(cpu);
 
+	cancel_delayed_work(&state->package_throttle.therm_work);
+	cancel_delayed_work(&state->core_throttle.therm_work);
+
+	state->package_throttle.rate_control_active = false;
+	state->core_throttle.rate_control_active = false;
+
 	thermal_throttle_remove_dev(dev);
 	return 0;
 }
-- 
2.17.2



* [tip: ras/core] x86/mce/therm_throt: Optimize notifications of thermal throttle
  2019-11-11 21:43 [PATCH] x86, mce, therm_throt: Optimize notifications of thermal throttle Srinivas Pandruvada
@ 2019-11-12 15:00 ` tip-bot2 for Srinivas Pandruvada
  2020-02-08 22:24 ` [PATCH] x86, mce, therm_throt: " Chris Wilson
  1 sibling, 0 replies; 7+ messages in thread
From: tip-bot2 for Srinivas Pandruvada @ 2019-11-12 15:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Srinivas Pandruvada, Borislav Petkov, H. Peter Anvin, bberg,
	ckellner, hdegoede, Ingo Molnar, linux-edac, Thomas Gleixner,
	Tony Luck, x86-ml, Ingo Molnar, Borislav Petkov, linux-kernel

The following commit has been merged into the ras/core branch of tip:

Commit-ID:     f6656208f04e5b3804054008eba4bf7170f4c841
Gitweb:        https://git.kernel.org/tip/f6656208f04e5b3804054008eba4bf7170f4c841
Author:        Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
AuthorDate:    Mon, 11 Nov 2019 13:43:12 -08:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 12 Nov 2019 15:56:04 +01:00

x86/mce/therm_throt: Optimize notifications of thermal throttle

Some modern systems have very tight thermal tolerances. Because of this
they may cross thermal thresholds when running normal workloads (even
during boot). The CPU hardware will react by limiting power/frequency
and using duty cycles to bring the temperature back into normal range.

Thus users may see a "critical" message about the "temperature above
threshold" which is soon followed by "temperature/speed normal". These
messages are rate-limited, but still may repeat every few minutes.

This issue became worse starting with the Ivy Bridge generation of
CPUs because they include a TCC activation offset in the MSR
IA32_TEMPERATURE_TARGET. OEMs use this to provide alerts long before
critical temperatures are reached.

A two-hour test run on a laptop with an Intel 8th Gen Core i5, under a
workload, resulted in 20K+ thermal interrupts per CPU at the core level
and another 20K+ interrupts at the package level. The kernel logs were
full of throttling messages.

The real value of these threshold interrupts is in debugging problems
with external cooling solutions and performance issues due to excessive
throttling.

So the solution here is the following:

  - In the current thermal_throttle folder, show:
    - the maximum time for one throttling event and,
    - the total amount of time the system was in throttling state.

  - Do not log short excursions.

  - Log only when, in spite of thermal throttling, the temperature is rising.
  On the high threshold interrupt, trigger a delayed workqueue that
  monitors the threshold violation log bit (THERM_STATUS_PROCHOT_LOG). When
  the log bit is set, this workqueue callback calculates a three-point moving
  average and logs a warning message when the temperature trend is rising.

  When this log bit is clear and temperature is below threshold
  temperature, then the workqueue callback logs a "Normal" message. Once a
  high threshold event is logged, the logging is rate-limited.

With this patch, on the same test laptop, no warnings are printed in the
logs, as the maximum time the processor needed to bring the temperature
under control was only 280 ms.

This implementation is based on inputs from Alan Cox and Tony Luck.

 [ bp: Touchups. ]

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: bberg@redhat.com
Cc: ckellner@redhat.com
Cc: hdegoede@redhat.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191111214312.81365-1-srinivas.pandruvada@linux.intel.com
---
 arch/x86/kernel/cpu/mce/therm_throt.c | 251 ++++++++++++++++++++++---
 1 file changed, 227 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
index bc441d6..d01e0da 100644
--- a/arch/x86/kernel/cpu/mce/therm_throt.c
+++ b/arch/x86/kernel/cpu/mce/therm_throt.c
@@ -40,15 +40,58 @@
 #define THERMAL_THROTTLING_EVENT	0
 #define POWER_LIMIT_EVENT		1
 
-/*
- * Current thermal event state:
+/**
+ * struct _thermal_state - Represent the current thermal event state
+ * @next_check:			Stores the next timestamp, when it is allowed
+ *				to log the next warning message.
+ * @last_interrupt_time:	Stores the timestamp for the last threshold
+ *				high event.
+ * @therm_work:			Delayed workqueue structure
+ * @count:			Stores the current running count for thermal
+ *				or power threshold interrupts.
+ * @last_count:			Stores the previous running count for thermal
+ *				or power threshold interrupts.
+ * @max_time_ms:		This shows the maximum amount of time CPU was
+ *				in throttled state for a single thermal
+ *				threshold high to low state.
+ * @total_time_ms:		This is a cumulative time during which CPU was
+ *				in the throttled state.
+ * @rate_control_active:	Set when a throttling message is logged.
+ *				This is used for the purpose of rate-control.
+ * @new_event:			Stores the last high/low status of the
+ *				THERM_STATUS_PROCHOT or
+ *				THERM_STATUS_POWER_LIMIT.
+ * @level:			Stores whether this _thermal_state instance is
+ *				for a CORE level or for PACKAGE level.
+ * @sample_index:		Index for storing the next sample in the buffer
+ *				temp_samples[].
+ * @sample_count:		Total number of samples collected in the buffer
+ *				temp_samples[].
+ * @average:			The last moving average of temperature samples
+ * @baseline_temp:		Temperature at which thermal threshold high
+ *				interrupt was generated.
+ * @temp_samples:		Storage for temperature samples to calculate
+ *				moving average.
+ *
+ * This structure is used to represent data related to thermal state for a CPU.
+ * There is a separate storage for core and package level for each CPU.
  */
 struct _thermal_state {
-	bool			new_event;
-	int			event;
 	u64			next_check;
+	u64			last_interrupt_time;
+	struct delayed_work	therm_work;
 	unsigned long		count;
 	unsigned long		last_count;
+	unsigned long		max_time_ms;
+	unsigned long		total_time_ms;
+	bool			rate_control_active;
+	bool			new_event;
+	u8			level;
+	u8			sample_index;
+	u8			sample_count;
+	u8			average;
+	u8			baseline_temp;
+	u8			temp_samples[3];
 };
 
 struct thermal_state {
@@ -121,8 +164,22 @@ define_therm_throt_device_one_ro(package_throttle_count);
 define_therm_throt_device_show_func(package_power_limit, count);
 define_therm_throt_device_one_ro(package_power_limit_count);
 
+define_therm_throt_device_show_func(core_throttle, max_time_ms);
+define_therm_throt_device_one_ro(core_throttle_max_time_ms);
+
+define_therm_throt_device_show_func(package_throttle, max_time_ms);
+define_therm_throt_device_one_ro(package_throttle_max_time_ms);
+
+define_therm_throt_device_show_func(core_throttle, total_time_ms);
+define_therm_throt_device_one_ro(core_throttle_total_time_ms);
+
+define_therm_throt_device_show_func(package_throttle, total_time_ms);
+define_therm_throt_device_one_ro(package_throttle_total_time_ms);
+
 static struct attribute *thermal_throttle_attrs[] = {
 	&dev_attr_core_throttle_count.attr,
+	&dev_attr_core_throttle_max_time_ms.attr,
+	&dev_attr_core_throttle_total_time_ms.attr,
 	NULL
 };
 
@@ -135,6 +192,105 @@ static const struct attribute_group thermal_attr_group = {
 #define CORE_LEVEL	0
 #define PACKAGE_LEVEL	1
 
+#define THERM_THROT_POLL_INTERVAL	HZ
+#define THERM_STATUS_PROCHOT_LOG	BIT(1)
+
+static void clear_therm_status_log(int level)
+{
+	int msr;
+	u64 msr_val;
+
+	if (level == CORE_LEVEL)
+		msr = MSR_IA32_THERM_STATUS;
+	else
+		msr = MSR_IA32_PACKAGE_THERM_STATUS;
+
+	rdmsrl(msr, msr_val);
+	wrmsrl(msr, msr_val & ~THERM_STATUS_PROCHOT_LOG);
+}
+
+static void get_therm_status(int level, bool *proc_hot, u8 *temp)
+{
+	int msr;
+	u64 msr_val;
+
+	if (level == CORE_LEVEL)
+		msr = MSR_IA32_THERM_STATUS;
+	else
+		msr = MSR_IA32_PACKAGE_THERM_STATUS;
+
+	rdmsrl(msr, msr_val);
+	if (msr_val & THERM_STATUS_PROCHOT_LOG)
+		*proc_hot = true;
+	else
+		*proc_hot = false;
+
+	*temp = (msr_val >> 16) & 0x7F;
+}
+
+static void throttle_active_work(struct work_struct *work)
+{
+	struct _thermal_state *state = container_of(to_delayed_work(work),
+						struct _thermal_state, therm_work);
+	unsigned int i, avg, this_cpu = smp_processor_id();
+	u64 now = get_jiffies_64();
+	bool hot;
+	u8 temp;
+
+	get_therm_status(state->level, &hot, &temp);
+	/* temperature value is offset from the max so lesser means hotter */
+	if (!hot && temp > state->baseline_temp) {
+		if (state->rate_control_active)
+			pr_info("CPU%d: %s temperature/speed normal (total events = %lu)\n",
+				this_cpu,
+				state->level == CORE_LEVEL ? "Core" : "Package",
+				state->count);
+
+		state->rate_control_active = false;
+		return;
+	}
+
+	if (time_before64(now, state->next_check) &&
+			  state->rate_control_active)
+		goto re_arm;
+
+	state->next_check = now + CHECK_INTERVAL;
+
+	if (state->count != state->last_count) {
+		/* There was one new thermal interrupt */
+		state->last_count = state->count;
+		state->average = 0;
+		state->sample_count = 0;
+		state->sample_index = 0;
+	}
+
+	state->temp_samples[state->sample_index] = temp;
+	state->sample_count++;
+	state->sample_index = (state->sample_index + 1) % ARRAY_SIZE(state->temp_samples);
+	if (state->sample_count < ARRAY_SIZE(state->temp_samples))
+		goto re_arm;
+
+	avg = 0;
+	for (i = 0; i < ARRAY_SIZE(state->temp_samples); ++i)
+		avg += state->temp_samples[i];
+
+	avg /= ARRAY_SIZE(state->temp_samples);
+
+	if (state->average > avg) {
+		pr_warn("CPU%d: %s temperature is above threshold, cpu clock is throttled (total events = %lu)\n",
+			this_cpu,
+			state->level == CORE_LEVEL ? "Core" : "Package",
+			state->count);
+		state->rate_control_active = true;
+	}
+
+	state->average = avg;
+
+re_arm:
+	clear_therm_status_log(state->level);
+	schedule_delayed_work_on(this_cpu, &state->therm_work, THERM_THROT_POLL_INTERVAL);
+}
+
 /***
  * therm_throt_process - Process thermal throttling event from interrupt
  * @curr: Whether the condition is current or not (boolean), since the
@@ -178,27 +334,33 @@ static void therm_throt_process(bool new_event, int event, int level)
 	if (new_event)
 		state->count++;
 
-	if (time_before64(now, state->next_check) &&
-			state->count != state->last_count)
+	if (event != THERMAL_THROTTLING_EVENT)
 		return;
 
-	state->next_check = now + CHECK_INTERVAL;
-	state->last_count = state->count;
+	if (new_event && !state->last_interrupt_time) {
+		bool hot;
+		u8 temp;
+
+		get_therm_status(state->level, &hot, &temp);
+		/*
+		 * Ignore short temperature spike as the system is not close
+		 * to PROCHOT. 10C offset is large enough to ignore. It is
+		 * already dropped from the high threshold temperature.
+		 */
+		if (temp > 10)
+			return;
 
-	/* if we just entered the thermal event */
-	if (new_event) {
-		if (event == THERMAL_THROTTLING_EVENT)
-			pr_warn("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
-				this_cpu,
-				level == CORE_LEVEL ? "Core" : "Package",
-				state->count);
-		return;
-	}
-	if (old_event) {
-		if (event == THERMAL_THROTTLING_EVENT)
-			pr_info("CPU%d: %s temperature/speed normal\n", this_cpu,
-				level == CORE_LEVEL ? "Core" : "Package");
-		return;
+		state->baseline_temp = temp;
+		state->last_interrupt_time = now;
+		schedule_delayed_work_on(this_cpu, &state->therm_work, THERM_THROT_POLL_INTERVAL);
+	} else if (old_event && state->last_interrupt_time) {
+		unsigned long throttle_time;
+
+		throttle_time = jiffies_delta_to_msecs(now - state->last_interrupt_time);
+		if (throttle_time > state->max_time_ms)
+			state->max_time_ms = throttle_time;
+		state->total_time_ms += throttle_time;
+		state->last_interrupt_time = 0;
 	}
 }
 
@@ -244,20 +406,47 @@ static int thermal_throttle_add_dev(struct device *dev, unsigned int cpu)
 	if (err)
 		return err;
 
-	if (cpu_has(c, X86_FEATURE_PLN) && int_pln_enable)
+	if (cpu_has(c, X86_FEATURE_PLN) && int_pln_enable) {
 		err = sysfs_add_file_to_group(&dev->kobj,
 					      &dev_attr_core_power_limit_count.attr,
 					      thermal_attr_group.name);
+		if (err)
+			goto del_group;
+	}
+
 	if (cpu_has(c, X86_FEATURE_PTS)) {
 		err = sysfs_add_file_to_group(&dev->kobj,
 					      &dev_attr_package_throttle_count.attr,
 					      thermal_attr_group.name);
-		if (cpu_has(c, X86_FEATURE_PLN) && int_pln_enable)
+		if (err)
+			goto del_group;
+
+		err = sysfs_add_file_to_group(&dev->kobj,
+					      &dev_attr_package_throttle_max_time_ms.attr,
+					      thermal_attr_group.name);
+		if (err)
+			goto del_group;
+
+		err = sysfs_add_file_to_group(&dev->kobj,
+					      &dev_attr_package_throttle_total_time_ms.attr,
+					      thermal_attr_group.name);
+		if (err)
+			goto del_group;
+
+		if (cpu_has(c, X86_FEATURE_PLN) && int_pln_enable) {
 			err = sysfs_add_file_to_group(&dev->kobj,
 					&dev_attr_package_power_limit_count.attr,
 					thermal_attr_group.name);
+			if (err)
+				goto del_group;
+		}
 	}
 
+	return 0;
+
+del_group:
+	sysfs_remove_group(&dev->kobj, &thermal_attr_group);
+
 	return err;
 }
 
@@ -269,15 +458,29 @@ static void thermal_throttle_remove_dev(struct device *dev)
 /* Get notified when a cpu comes on/off. Be hotplug friendly. */
 static int thermal_throttle_online(unsigned int cpu)
 {
+	struct thermal_state *state = &per_cpu(thermal_state, cpu);
 	struct device *dev = get_cpu_device(cpu);
 
+	state->package_throttle.level = PACKAGE_LEVEL;
+	state->core_throttle.level = CORE_LEVEL;
+
+	INIT_DELAYED_WORK(&state->package_throttle.therm_work, throttle_active_work);
+	INIT_DELAYED_WORK(&state->core_throttle.therm_work, throttle_active_work);
+
 	return thermal_throttle_add_dev(dev, cpu);
 }
 
 static int thermal_throttle_offline(unsigned int cpu)
 {
+	struct thermal_state *state = &per_cpu(thermal_state, cpu);
 	struct device *dev = get_cpu_device(cpu);
 
+	cancel_delayed_work(&state->package_throttle.therm_work);
+	cancel_delayed_work(&state->core_throttle.therm_work);
+
+	state->package_throttle.rate_control_active = false;
+	state->core_throttle.rate_control_active = false;
+
 	thermal_throttle_remove_dev(dev);
 	return 0;
 }


* Re: [PATCH] x86, mce, therm_throt: Optimize notifications of thermal throttle
  2019-11-11 21:43 [PATCH] x86, mce, therm_throt: Optimize notifications of thermal throttle Srinivas Pandruvada
  2019-11-12 15:00 ` [tip: ras/core] x86/mce/therm_throt: " tip-bot2 for Srinivas Pandruvada
@ 2020-02-08 22:24 ` Chris Wilson
  2020-02-09  6:09   ` Srinivas Pandruvada
                     ` (2 more replies)
  1 sibling, 3 replies; 7+ messages in thread
From: Chris Wilson @ 2020-02-08 22:24 UTC (permalink / raw)
  To: Srinivas Pandruvada, bberg, bp, hpa, mingo, tglx, tony.luck
  Cc: x86, linux-edac, linux-kernel, hdegoede, ckellner, Srinivas Pandruvada

Quoting Srinivas Pandruvada (2019-11-11 21:43:12)
> +static void throttle_active_work(struct work_struct *work)
> +{
> +       struct _thermal_state *state = container_of(to_delayed_work(work),
> +                                               struct _thermal_state, therm_work);
> +       unsigned int i, avg, this_cpu = smp_processor_id();
> +       u64 now = get_jiffies_64();
> +       bool hot;
> +       u8 temp;

<6> [198.901895] [IGT] perf_pmu: starting subtest cpu-hotplug
<4> [199.088851] IRQ 24: no longer affine to CPU0
<4> [199.088871] IRQ 25: no longer affine to CPU0
<6> [199.091679] smpboot: CPU 0 is now offline
<6> [200.122204] smpboot: Booting Node 0 Processor 0 APIC 0x0
<6> [200.297267] smpboot: CPU 1 is now offline
<3> [201.218812] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/1:0/17
<4> [201.218974] caller is throttle_active_work+0x12/0x280
<4> [201.218985] CPU: 0 PID: 17 Comm: kworker/1:0 Tainted: G     U            5.5.0-CI-CI_DRM_7867+ #1
<4> [201.218991] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
<4> [201.219001] Workqueue: events throttle_active_work
<4> [201.219009] Call Trace:
<4> [201.219021]  dump_stack+0x71/0x9b
<4> [201.219035]  debug_smp_processor_id+0xad/0xb0
<4> [201.219047]  throttle_active_work+0x12/0x280
<4> [201.219063]  process_one_work+0x26a/0x620
<4> [201.219087]  worker_thread+0x37/0x380
<4> [201.219103]  ? process_one_work+0x620/0x620
<4> [201.219110]  kthread+0x119/0x130
<4> [201.219119]  ? kthread_park+0x80/0x80
<4> [201.219134]  ret_from_fork+0x3a/0x50
<6> [201.315866] x86: Booting SMP configuration:
<6> [201.315880] smpboot: Booting Node 0 Processor 1 APIC 0x2
<4> [201.319814] ------------[ cut here ]------------
<3> [201.319832] ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x10
<4> [201.319971] WARNING: CPU: 1 PID: 14 at lib/debugobjects.c:484 debug_print_object+0x67/0x90
<4> [201.319977] Modules linked in: vgem snd_hda_codec_hdmi i915 mei_hdcp x86_pkg_temp_thermal coretemp snd_hda_codec_realtek crct10dif_pclmul snd_hda_codec_generic crc32_pclmul snd_hda_intel snd_intel_dspcfg snd_hda_codec ghash_clmulni_intel snd_hwdep snd_hda_core snd_pcm mei_me r8169 mei realtek lpc_ich prime_numbers
<4> [201.320023] CPU: 1 PID: 14 Comm: cpuhp/1 Tainted: G     U            5.5.0-CI-CI_DRM_7867+ #1
<4> [201.320029] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
<4> [201.320038] RIP: 0010:debug_print_object+0x67/0x90
<4> [201.320046] Code: 83 c2 01 8b 4b 14 4c 8b 45 00 89 15 17 f7 8b 02 8b 53 10 4c 89 e6 48 c7 c7 b0 ce 31 82 48 8b 14 d5 00 37 07 82 e8 89 7b b8 ff <0f> 0b 5b 83 05 33 fb 21 01 01 5d 41 5c c3 83 05 28 fb 21 01 01 c3
<4> [201.320053] RSP: 0000:ffffc900000dbd40 EFLAGS: 00010286
<4> [201.320060] RAX: 0000000000000000 RBX: ffff888408665d68 RCX: 0000000000000001
<4> [201.320066] RDX: 0000000080000001 RSI: ffff88840d6e30f8 RDI: 00000000ffffffff
<4> [201.320072] RBP: ffffffff826489e0 R08: ffff88840d6e30f8 R09: 0000000000000000
<4> [201.320078] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff822d7bd1
<4> [201.320084] R13: ffffffff826489e0 R14: ffff88840f898300 R15: 0000000000000202
<4> [201.320091] FS:  0000000000000000(0000) GS:ffff88840f880000(0000) knlGS:0000000000000000
<4> [201.320098] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [201.320104] CR2: 0000000000000000 CR3: 0000000005610001 CR4: 00000000001606e0
<4> [201.320109] Call Trace:
<4> [201.320125]  __debug_object_init+0x359/0x510
<4> [201.320140]  ? _raw_spin_unlock_irqrestore+0x34/0x60
<4> [201.320156]  ? queue_work_node+0x70/0x70
<4> [201.320165]  init_timer_key+0x25/0x140
<4> [201.320180]  ? intel_thermal_supported+0x30/0x30
<4> [201.320191]  thermal_throttle_online+0xb4/0x260
<4> [201.320204]  ? unexpected_thermal_interrupt+0x20/0x20
<4> [201.320213]  cpuhp_invoke_callback+0x9b/0x9d0
<4> [201.320235]  cpuhp_thread_fun+0x1c8/0x220
<4> [201.320249]  ? smpboot_thread_fn+0x23/0x280
<4> [201.320259]  ? smpboot_thread_fn+0x6b/0x280
<4> [201.320271]  smpboot_thread_fn+0x1d3/0x280
<4> [201.320288]  ? sort_range+0x20/0x20
<4> [201.320295]  kthread+0x119/0x130
<4> [201.320303]  ? kthread_park+0x80/0x80
<4> [201.320317]  ret_from_fork+0x3a/0x50
<4> [201.320348] irq event stamp: 4846
<4> [201.320358] hardirqs last  enabled at (4845): [<ffffffff8112dcca>] console_unlock+0x4ba/0x5a0
<4> [201.320368] hardirqs last disabled at (4846): [<ffffffff81001ca0>] trace_hardirqs_off_thunk+0x1a/0x1c
<4> [201.320379] softirqs last  enabled at (4746): [<ffffffff81e00385>] __do_softirq+0x385/0x47f
<4> [201.320388] softirqs last disabled at (4739): [<ffffffff810ba15a>] irq_exit+0xba/0xc0
<4> [201.320394] ---[ end trace 06576bf31ad2ac2b ]---

Are we otherwise relying on current->nr_cpus_allowed == 1 here?
(As this section is not within a preempt_disable or local_irq_disable
region.)
-Chris

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86, mce, therm_throt: Optimize notifications of thermal throttle
  2020-02-08 22:24 ` [PATCH] x86, mce, therm_throt: " Chris Wilson
@ 2020-02-09  6:09   ` Srinivas Pandruvada
  2020-02-10 15:16     ` Srinivas Pandruvada
  2020-02-25 15:16   ` [tip: ras/urgent] x86/mce/therm_throt: Undo thermal polling properly on CPU offline tip-bot2 for Thomas Gleixner
  2020-02-25 20:36   ` tip-bot2 for Thomas Gleixner
  2 siblings, 1 reply; 7+ messages in thread
From: Srinivas Pandruvada @ 2020-02-09  6:09 UTC (permalink / raw)
  To: Chris Wilson, bberg, bp, hpa, mingo, tglx, tony.luck
  Cc: x86, linux-edac, linux-kernel, hdegoede, ckellner

On Sat, 2020-02-08 at 22:24 +0000, Chris Wilson wrote:
> Quoting Srinivas Pandruvada (2019-11-11 21:43:12)
> > +static void throttle_active_work(struct work_struct *work)
> > +{
> > +       struct _thermal_state *state =
> > container_of(to_delayed_work(work),
> > +                                               struct
> > _thermal_state, therm_work);
> > +       unsigned int i, avg, this_cpu = smp_processor_id();
> > +       u64 now = get_jiffies_64();
> > +       bool hot;
> > +       u8 temp;
> 
> <6> [198.901895] [IGT] perf_pmu: starting subtest cpu-hotplug
> <4> [199.088851] IRQ 24: no longer affine to CPU0
> <4> [199.088871] IRQ 25: no longer affine to CPU0
> <6> [199.091679] smpboot: CPU 0 is now offline
> <6> [200.122204] smpboot: Booting Node 0 Processor 0 APIC 0x0
> <6> [200.297267] smpboot: CPU 1 is now offline
> <3> [201.218812] BUG: using smp_processor_id() in preemptible
> [00000000] code: kworker/1:0/17
> <4> [201.218974] caller is throttle_active_work+0x12/0x280
> <4> [201.218985] CPU: 0 PID: 17 Comm: kworker/1:0 Tainted:
> G     U            5.5.0-CI-CI_DRM_7867+ #1
> <4> [201.218991] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS
> V1.12 02/15/2016
> <4> [201.219001] Workqueue: events throttle_active_work
> <4> [201.219009] Call Trace:
> <4> [201.219021]  dump_stack+0x71/0x9b
> <4> [201.219035]  debug_smp_processor_id+0xad/0xb0
> <4> [201.219047]  throttle_active_work+0x12/0x280
> <4> [201.219063]  process_one_work+0x26a/0x620
> <4> [201.219087]  worker_thread+0x37/0x380
> <4> [201.219103]  ? process_one_work+0x620/0x620
> <4> [201.219110]  kthread+0x119/0x130
> <4> [201.219119]  ? kthread_park+0x80/0x80
> <4> [201.219134]  ret_from_fork+0x3a/0x50
> <6> [201.315866] x86: Booting SMP configuration:
> <6> [201.315880] smpboot: Booting Node 0 Processor 1 APIC 0x2
> <4> [201.319814] ------------[ cut here ]------------
> <3> [201.319832] ODEBUG: init active (active state 0) object type:
> timer_list hint: delayed_work_timer_fn+0x0/0x10
> <4> [201.319971] WARNING: CPU: 1 PID: 14 at lib/debugobjects.c:484
> debug_print_object+0x67/0x90
> <4> [201.319977] Modules linked in: vgem snd_hda_codec_hdmi i915
> mei_hdcp x86_pkg_temp_thermal coretemp snd_hda_codec_realtek
> crct10dif_pclmul snd_hda_codec_generic crc32_pclmul snd_hda_intel
> snd_intel_dspcfg snd_hda_codec ghash_clmulni_intel snd_hwdep
> snd_hda_core snd_pcm mei_me r8169 mei realtek lpc_ich prime_numbers
> <4> [201.320023] CPU: 1 PID: 14 Comm: cpuhp/1 Tainted:
> G     U            5.5.0-CI-CI_DRM_7867+ #1
> <4> [201.320029] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS
> V1.12 02/15/2016
> <4> [201.320038] RIP: 0010:debug_print_object+0x67/0x90
> <4> [201.320046] Code: 83 c2 01 8b 4b 14 4c 8b 45 00 89 15 17 f7 8b
> 02 8b 53 10 4c 89 e6 48 c7 c7 b0 ce 31 82 48 8b 14 d5 00 37 07 82 e8
> 89 7b b8 ff <0f> 0b 5b 83 05 33 fb 21 01 01 5d 41 5c c3 83 05 28 fb
> 21 01 01 c3
> <4> [201.320053] RSP: 0000:ffffc900000dbd40 EFLAGS: 00010286
> <4> [201.320060] RAX: 0000000000000000 RBX: ffff888408665d68 RCX:
> 0000000000000001
> <4> [201.320066] RDX: 0000000080000001 RSI: ffff88840d6e30f8 RDI:
> 00000000ffffffff
> <4> [201.320072] RBP: ffffffff826489e0 R08: ffff88840d6e30f8 R09:
> 0000000000000000
> <4> [201.320078] R10: 0000000000000000 R11: 0000000000000000 R12:
> ffffffff822d7bd1
> <4> [201.320084] R13: ffffffff826489e0 R14: ffff88840f898300 R15:
> 0000000000000202
> <4> [201.320091] FS:  0000000000000000(0000)
> GS:ffff88840f880000(0000) knlGS:0000000000000000
> <4> [201.320098] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [201.320104] CR2: 0000000000000000 CR3: 0000000005610001 CR4:
> 00000000001606e0
> <4> [201.320109] Call Trace:
> <4> [201.320125]  __debug_object_init+0x359/0x510
> <4> [201.320140]  ? _raw_spin_unlock_irqrestore+0x34/0x60
> <4> [201.320156]  ? queue_work_node+0x70/0x70
> <4> [201.320165]  init_timer_key+0x25/0x140
> <4> [201.320180]  ? intel_thermal_supported+0x30/0x30
> <4> [201.320191]  thermal_throttle_online+0xb4/0x260
> <4> [201.320204]  ? unexpected_thermal_interrupt+0x20/0x20
> <4> [201.320213]  cpuhp_invoke_callback+0x9b/0x9d0
> <4> [201.320235]  cpuhp_thread_fun+0x1c8/0x220
> <4> [201.320249]  ? smpboot_thread_fn+0x23/0x280
> <4> [201.320259]  ? smpboot_thread_fn+0x6b/0x280
> <4> [201.320271]  smpboot_thread_fn+0x1d3/0x280
> <4> [201.320288]  ? sort_range+0x20/0x20
> <4> [201.320295]  kthread+0x119/0x130
> <4> [201.320303]  ? kthread_park+0x80/0x80
> <4> [201.320317]  ret_from_fork+0x3a/0x50
> <4> [201.320348] irq event stamp: 4846
> <4> [201.320358] hardirqs last  enabled at (4845):
> [<ffffffff8112dcca>] console_unlock+0x4ba/0x5a0
> <4> [201.320368] hardirqs last disabled at (4846):
> [<ffffffff81001ca0>] trace_hardirqs_off_thunk+0x1a/0x1c
> <4> [201.320379] softirqs last  enabled at (4746):
> [<ffffffff81e00385>] __do_softirq+0x385/0x47f
> <4> [201.320388] softirqs last disabled at (4739):
> [<ffffffff810ba15a>] irq_exit+0xba/0xc0
> <4> [201.320394] ---[ end trace 06576bf31ad2ac2b ]---
> 
> Are we otherwise relying on current->nr_cpus_allowed == 1 here?
No.
I am checking internally whether I can use raw_smp_processor_id() instead.

Thanks,
Srinivas

> (As this section is not within a preempt_disable or local_irq_disable
> region.)
> -Chris


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86, mce, therm_throt: Optimize notifications of thermal throttle
  2020-02-09  6:09   ` Srinivas Pandruvada
@ 2020-02-10 15:16     ` Srinivas Pandruvada
  0 siblings, 0 replies; 7+ messages in thread
From: Srinivas Pandruvada @ 2020-02-10 15:16 UTC (permalink / raw)
  To: Chris Wilson, bberg, bp, hpa, mingo, tglx, tony.luck
  Cc: x86, linux-edac, linux-kernel, hdegoede, ckellner

On Sat, 2020-02-08 at 22:09 -0800, Srinivas Pandruvada wrote:
> On Sat, 2020-02-08 at 22:24 +0000, Chris Wilson wrote:
> > Quoting Srinivas Pandruvada (2019-11-11 21:43:12)
> > > +static void throttle_active_work(struct work_struct *work)
> > > +{
> > > +       struct _thermal_state *state =
> > > container_of(to_delayed_work(work),
> > > +                                               struct
> > > _thermal_state, therm_work);
> > > +       unsigned int i, avg, this_cpu = smp_processor_id();
> > > +       u64 now = get_jiffies_64();
> > > +       bool hot;
> > > +       u8 temp;
> > 
> > <6> [198.901895] [IGT] perf_pmu: starting subtest cpu-hotplug
> > <4> [199.088851] IRQ 24: no longer affine to CPU0
> > <4> [199.088871] IRQ 25: no longer affine to CPU0
> > <6> [199.091679] smpboot: CPU 0 is now offline
> > <6> [200.122204] smpboot: Booting Node 0 Processor 0 APIC 0x0
> > <6> [200.297267] smpboot: CPU 1 is now offline
> > <3> [201.218812] BUG: using smp_processor_id() in preemptible
> > [00000000] code: kworker/1:0/17
> > <4> [201.218974] caller is throttle_active_work+0x12/0x280
> > <4> [201.218985] CPU: 0 PID: 17 Comm: kworker/1:0 Tainted:
> > G     U            5.5.0-CI-CI_DRM_7867+ #1
> > <4> [201.218991] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS
> > V1.12 02/15/2016
> > <4> [201.219001] Workqueue: events throttle_active_work
> > <4> [201.219009] Call Trace:
> > <4> [201.219021]  dump_stack+0x71/0x9b
> > <4> [201.219035]  debug_smp_processor_id+0xad/0xb0
> > <4> [201.219047]  throttle_active_work+0x12/0x280
> > <4> [201.219063]  process_one_work+0x26a/0x620
> > <4> [201.219087]  worker_thread+0x37/0x380
> > <4> [201.219103]  ? process_one_work+0x620/0x620
> > <4> [201.219110]  kthread+0x119/0x130
> > <4> [201.219119]  ? kthread_park+0x80/0x80
> > <4> [201.219134]  ret_from_fork+0x3a/0x50
> > <6> [201.315866] x86: Booting SMP configuration:
> > <6> [201.315880] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > <4> [201.319814] ------------[ cut here ]------------
> > <3> [201.319832] ODEBUG: init active (active state 0) object type:
> > timer_list hint: delayed_work_timer_fn+0x0/0x10
> > <4> [201.319971] WARNING: CPU: 1 PID: 14 at lib/debugobjects.c:484
> > debug_print_object+0x67/0x90
> > <4> [201.319977] Modules linked in: vgem snd_hda_codec_hdmi i915
> > mei_hdcp x86_pkg_temp_thermal coretemp snd_hda_codec_realtek
> > crct10dif_pclmul snd_hda_codec_generic crc32_pclmul snd_hda_intel
> > snd_intel_dspcfg snd_hda_codec ghash_clmulni_intel snd_hwdep
> > snd_hda_core snd_pcm mei_me r8169 mei realtek lpc_ich prime_numbers
> > <4> [201.320023] CPU: 1 PID: 14 Comm: cpuhp/1 Tainted:
> > G     U            5.5.0-CI-CI_DRM_7867+ #1
> > <4> [201.320029] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS
> > V1.12 02/15/2016
> > <4> [201.320038] RIP: 0010:debug_print_object+0x67/0x90
> > <4> [201.320046] Code: 83 c2 01 8b 4b 14 4c 8b 45 00 89 15 17 f7 8b
> > 02 8b 53 10 4c 89 e6 48 c7 c7 b0 ce 31 82 48 8b 14 d5 00 37 07 82
> > e8
> > 89 7b b8 ff <0f> 0b 5b 83 05 33 fb 21 01 01 5d 41 5c c3 83 05 28 fb
> > 21 01 01 c3
> > <4> [201.320053] RSP: 0000:ffffc900000dbd40 EFLAGS: 00010286
> > <4> [201.320060] RAX: 0000000000000000 RBX: ffff888408665d68 RCX:
> > 0000000000000001
> > <4> [201.320066] RDX: 0000000080000001 RSI: ffff88840d6e30f8 RDI:
> > 00000000ffffffff
> > <4> [201.320072] RBP: ffffffff826489e0 R08: ffff88840d6e30f8 R09:
> > 0000000000000000
> > <4> [201.320078] R10: 0000000000000000 R11: 0000000000000000 R12:
> > ffffffff822d7bd1
> > <4> [201.320084] R13: ffffffff826489e0 R14: ffff88840f898300 R15:
> > 0000000000000202
> > <4> [201.320091] FS:  0000000000000000(0000)
> > GS:ffff88840f880000(0000) knlGS:0000000000000000
> > <4> [201.320098] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > <4> [201.320104] CR2: 0000000000000000 CR3: 0000000005610001 CR4:
> > 00000000001606e0
> > <4> [201.320109] Call Trace:
> > <4> [201.320125]  __debug_object_init+0x359/0x510
> > <4> [201.320140]  ? _raw_spin_unlock_irqrestore+0x34/0x60
> > <4> [201.320156]  ? queue_work_node+0x70/0x70
> > <4> [201.320165]  init_timer_key+0x25/0x140
> > <4> [201.320180]  ? intel_thermal_supported+0x30/0x30
> > <4> [201.320191]  thermal_throttle_online+0xb4/0x260
> > <4> [201.320204]  ? unexpected_thermal_interrupt+0x20/0x20
> > <4> [201.320213]  cpuhp_invoke_callback+0x9b/0x9d0
> > <4> [201.320235]  cpuhp_thread_fun+0x1c8/0x220
> > <4> [201.320249]  ? smpboot_thread_fn+0x23/0x280
> > <4> [201.320259]  ? smpboot_thread_fn+0x6b/0x280
> > <4> [201.320271]  smpboot_thread_fn+0x1d3/0x280
> > <4> [201.320288]  ? sort_range+0x20/0x20
> > <4> [201.320295]  kthread+0x119/0x130
> > <4> [201.320303]  ? kthread_park+0x80/0x80
> > <4> [201.320317]  ret_from_fork+0x3a/0x50
> > <4> [201.320348] irq event stamp: 4846
> > <4> [201.320358] hardirqs last  enabled at (4845):
> > [<ffffffff8112dcca>] console_unlock+0x4ba/0x5a0
> > <4> [201.320368] hardirqs last disabled at (4846):
> > [<ffffffff81001ca0>] trace_hardirqs_off_thunk+0x1a/0x1c
> > <4> [201.320379] softirqs last  enabled at (4746):
> > [<ffffffff81e00385>] __do_softirq+0x385/0x47f
> > <4> [201.320388] softirqs last disabled at (4739):
> > [<ffffffff810ba15a>] irq_exit+0xba/0xc0
> > <4> [201.320394] ---[ end trace 06576bf31ad2ac2b ]---
> > 
> > Are we otherwise relying on current->nr_cpus_allowed == 1 here?
> No.
> I am checking internally, if I can use raw_smp_processor_id()
> instead.
Let me correct my answer.
The call here is from a workqueue callback that was scheduled to execute
on a specific CPU using schedule_delayed_work_on(). However, if that CPU
goes offline or dies in the meantime, I am not sure whether the worker
thread can end up executing on another CPU.
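
To illustrate the concern, here is a rough kernel-style sketch of the
pattern under discussion (not buildable on its own; the workqueue
behaviour on CPU offline described in the comments is my reading of the
splat above, not a verified statement about the workqueue core):

```c
/* Interrupt path: queue the poll on the CPU that took the interrupt. */
static void note_interrupt(struct _thermal_state *state)
{
	/* Pins the work to this CPU's worker pool -- while it stays online. */
	schedule_delayed_work_on(smp_processor_id(), &state->therm_work,
				 msecs_to_jiffies(1000));
}

/* Work callback: assumes it still runs on the CPU it was queued on. */
static void throttle_active_work(struct work_struct *work)
{
	/*
	 * If the target CPU went offline before the delayed work fired,
	 * pending work can end up on an unbound pool, so this callback
	 * runs preemptible on some other CPU and debug_smp_processor_id()
	 * fires the splat quoted above.
	 */
	unsigned int this_cpu = smp_processor_id();
	...
}
```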

Thanks,
Srinivas

> 
> Thanks,
> Srinivas
> 
> > (As this section is not within a preempt_disable or
> > local_irq_disable
> > region.)
> > -Chris


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [tip: ras/urgent] x86/mce/therm_throt: Undo thermal polling properly on CPU offline
  2020-02-08 22:24 ` [PATCH] x86, mce, therm_throt: " Chris Wilson
  2020-02-09  6:09   ` Srinivas Pandruvada
@ 2020-02-25 15:16   ` tip-bot2 for Thomas Gleixner
  2020-02-25 20:36   ` tip-bot2 for Thomas Gleixner
  2 siblings, 0 replies; 7+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-02-25 15:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chris Wilson, Pandruvada, Srinivas, Thomas Gleixner,
	Borislav Petkov, x86, LKML

The following commit has been merged into the ras/urgent branch of tip:

Commit-ID:     bf69bac4d3ef188ee8cf536a31ecbb230a8ba91a
Gitweb:        https://git.kernel.org/tip/bf69bac4d3ef188ee8cf536a31ecbb230a8ba91a
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Tue, 25 Feb 2020 14:55:15 +01:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 25 Feb 2020 15:05:56 +01:00

x86/mce/therm_throt: Undo thermal polling properly on CPU offline

Chris Wilson reported splats from running the thermal throttling
workqueue callback on offlined CPUs. The problem is that that callback
should not even run on offlined CPUs but it happens nevertheless because
the offlining callback thermal_throttle_offline() does not symmetrically
undo the setup work done in its onlining counterpart. IOW,

 1. The thermal interrupt vector should be masked out before ...

 2. ... cancelling any pending work synchronously so that no new work is
 enqueued anymore.

Do those things and fix the issue properly.

 [ bp: Write commit message. ]

Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Pandruvada, Srinivas <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/158120068234.18291.7938335950259651295@skylake-alporthouse-com
---
 arch/x86/kernel/cpu/mce/therm_throt.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
index 58b4ee3..5d8971c 100644
--- a/arch/x86/kernel/cpu/mce/therm_throt.c
+++ b/arch/x86/kernel/cpu/mce/therm_throt.c
@@ -487,8 +487,12 @@ static int thermal_throttle_offline(unsigned int cpu)
 	struct thermal_state *state = &per_cpu(thermal_state, cpu);
 	struct device *dev = get_cpu_device(cpu);
 
-	cancel_delayed_work(&state->package_throttle.therm_work);
-	cancel_delayed_work(&state->core_throttle.therm_work);
+	/* Mask the thermal vector before draining evtl. pending work */
+	l = apic_read(APIC_LVTTHMR);
+	apic_write(APIC_LVTTHMR, l | APIC_LVT_MASKED);
+
+	cancel_delayed_work_sync(&state->package_throttle.therm_work);
+	cancel_delayed_work_sync(&state->core_throttle.therm_work);
 
 	state->package_throttle.rate_control_active = false;
 	state->core_throttle.rate_control_active = false;

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [tip: ras/urgent] x86/mce/therm_throt: Undo thermal polling properly on CPU offline
  2020-02-08 22:24 ` [PATCH] x86, mce, therm_throt: " Chris Wilson
  2020-02-09  6:09   ` Srinivas Pandruvada
  2020-02-25 15:16   ` [tip: ras/urgent] x86/mce/therm_throt: Undo thermal polling properly on CPU offline tip-bot2 for Thomas Gleixner
@ 2020-02-25 20:36   ` tip-bot2 for Thomas Gleixner
  2 siblings, 0 replies; 7+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-02-25 20:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chris Wilson, Pandruvada, Srinivas, Thomas Gleixner,
	Borislav Petkov, x86, LKML

The following commit has been merged into the ras/urgent branch of tip:

Commit-ID:     d364847eed890211444ad74496bb549f838c6018
Gitweb:        https://git.kernel.org/tip/d364847eed890211444ad74496bb549f838c6018
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Tue, 25 Feb 2020 14:55:15 +01:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 25 Feb 2020 21:21:44 +01:00

x86/mce/therm_throt: Undo thermal polling properly on CPU offline

Chris Wilson reported splats from running the thermal throttling
workqueue callback on offlined CPUs. The problem is that that callback
should not even run on offlined CPUs but it happens nevertheless because
the offlining callback thermal_throttle_offline() does not symmetrically
undo the setup work done in its onlining counterpart. IOW,

 1. The thermal interrupt vector should be masked out before ...

 2. ... cancelling any pending work synchronously so that no new work is
 enqueued anymore.

Do those things and fix the issue properly.

 [ bp: Write commit message. ]

Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Pandruvada, Srinivas <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/158120068234.18291.7938335950259651295@skylake-alporthouse-com
---
 arch/x86/kernel/cpu/mce/therm_throt.c |  9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
index 58b4ee3..f36dc07 100644
--- a/arch/x86/kernel/cpu/mce/therm_throt.c
+++ b/arch/x86/kernel/cpu/mce/therm_throt.c
@@ -486,9 +486,14 @@ static int thermal_throttle_offline(unsigned int cpu)
 {
 	struct thermal_state *state = &per_cpu(thermal_state, cpu);
 	struct device *dev = get_cpu_device(cpu);
+	u32 l;
+
+	/* Mask the thermal vector before draining evtl. pending work */
+	l = apic_read(APIC_LVTTHMR);
+	apic_write(APIC_LVTTHMR, l | APIC_LVT_MASKED);
 
-	cancel_delayed_work(&state->package_throttle.therm_work);
-	cancel_delayed_work(&state->core_throttle.therm_work);
+	cancel_delayed_work_sync(&state->package_throttle.therm_work);
+	cancel_delayed_work_sync(&state->core_throttle.therm_work);
 
 	state->package_throttle.rate_control_active = false;
 	state->core_throttle.rate_control_active = false;

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-02-25 20:36 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-11 21:43 [PATCH] x86, mce, therm_throt: Optimize notifications of thermal throttle Srinivas Pandruvada
2019-11-12 15:00 ` [tip: ras/core] x86/mce/therm_throt: " tip-bot2 for Srinivas Pandruvada
2020-02-08 22:24 ` [PATCH] x86, mce, therm_throt: " Chris Wilson
2020-02-09  6:09   ` Srinivas Pandruvada
2020-02-10 15:16     ` Srinivas Pandruvada
2020-02-25 15:16   ` [tip: ras/urgent] x86/mce/therm_throt: Undo thermal polling properly on CPU offline tip-bot2 for Thomas Gleixner
2020-02-25 20:36   ` tip-bot2 for Thomas Gleixner
