LKML Archive on lore.kernel.org
* [PATCH v2] x86/mce/therm_throt: Fix the access of uninitialized therm_work
@ 2020-01-07  0:41 Chuansheng Liu
  2020-01-10 18:29 ` Luck, Tony
  2020-01-15 11:37 ` [tip: ras/urgent] x86/mce/therm_throt: Do not access " tip-bot2 for Chuansheng Liu
  0 siblings, 2 replies; 5+ messages in thread
From: Chuansheng Liu @ 2020-01-07  0:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: tony.luck, bp, tglx, mingo, hpa, chuansheng.liu

On an ICL (Ice Lake) platform, it is easy to hit a boot failure with a
panic in the thermal interrupt handler during the early boot stage.

This issue makes my platform almost unable to boot with the latest
kernel code.

The call stack looks like this:
kernel BUG at kernel/timer/timer.c:1152!

Call Trace:
__queue_delayed_work
queue_delayed_work_on
therm_throt_process
intel_thermal_interrupt
...

When a CPU comes up, its thermal interrupt is enabled before the
CPU-up notification that initializes therm_work. Because of this
race, the interrupt handler therm_throt_process() can access an
uninitialized therm_work, and the system then panics at a very
early boot stage.
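
The ordering problem can be sketched in a small userspace model. This
is only an illustration under stated assumptions: every model_* name
below is hypothetical and none of it is kernel code. An interrupt is
assumed to be pending both before and after the online callback runs,
and the model counts how often the handler sees an uninitialized work
item instead of hitting a BUG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a delayed work item; not the kernel's struct. */
struct model_work {
	void (*fn)(struct model_work *w);	/* NULL until initialized */
};

/* Hypothetical per-CPU state: two work items plus the LVT mask state. */
struct model_cpu {
	struct model_work core_work;
	struct model_work package_work;
	bool lvt_masked;	/* true: thermal interrupt is not delivered */
	int bad_accesses;	/* times the handler saw uninitialized work */
};

void model_work_fn(struct model_work *w)
{
	(void)w;
}

/* Models the handler: it runs only when unmasked, then touches the work. */
void model_thermal_interrupt(struct model_cpu *cpu)
{
	if (cpu->lvt_masked)
		return;
	if (cpu->core_work.fn == NULL || cpu->package_work.fn == NULL)
		cpu->bad_accesses++;	/* the real kernel would BUG here */
}

/* Models the work initialization done by the online callback. */
void model_init_work(struct model_cpu *cpu)
{
	cpu->core_work.fn = model_work_fn;
	cpu->package_work.fn = model_work_fn;
}

/*
 * Bring up one modeled CPU with an interrupt pending at each step.
 * unmask_early == true models the old ordering, false the patched one.
 * Returns the number of bad accesses observed.
 */
int model_bringup(bool unmask_early)
{
	struct model_cpu cpu = { .lvt_masked = true };

	if (unmask_early)
		cpu.lvt_masked = false;	/* old: unmask during early init */
	model_thermal_interrupt(&cpu);	/* fires before the online callback */

	model_init_work(&cpu);
	if (!unmask_early)
		cpu.lvt_masked = false;	/* patched: unmask after init */
	model_thermal_interrupt(&cpu);	/* fires after the online callback */

	return cpu.bad_accesses;
}

/* Self-check: the old ordering misbehaves once, the patched one never. */
void model_selfcheck(void)
{
	assert(model_bringup(true) == 1);
	assert(model_bringup(false) == 0);
}
```

Calling model_selfcheck() from a main() exercises both orderings; in
the model, only the early-unmask ordering lets the handler run before
the work items are set up.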

On my ICL platform, it can be reproduced within a few reboots
under reboot stress. With this fix, the system survives more
than 200 reboot stress cycles.

V2: Boris made a good suggestion: move the interrupt unmasking
to the end of the therm_work initialization.

Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
---
 arch/x86/kernel/cpu/mce/therm_throt.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
index b38010b541d6..528b85664b46 100644
--- a/arch/x86/kernel/cpu/mce/therm_throt.c
+++ b/arch/x86/kernel/cpu/mce/therm_throt.c
@@ -467,6 +467,7 @@ static int thermal_throttle_online(unsigned int cpu)
 {
 	struct thermal_state *state = &per_cpu(thermal_state, cpu);
 	struct device *dev = get_cpu_device(cpu);
+	u32 l;
 
 	state->package_throttle.level = PACKAGE_LEVEL;
 	state->core_throttle.level = CORE_LEVEL;
@@ -474,6 +475,12 @@ static int thermal_throttle_online(unsigned int cpu)
 	INIT_DELAYED_WORK(&state->package_throttle.therm_work, throttle_active_work);
 	INIT_DELAYED_WORK(&state->core_throttle.therm_work, throttle_active_work);
 
+	/* Unmask the thermal vector after
+	 * therm_works are initialized.
+	 */
+	l = apic_read(APIC_LVTTHMR);
+	apic_write(APIC_LVTTHMR, l & ~APIC_LVT_MASKED);
+
 	return thermal_throttle_add_dev(dev, cpu);
 }
 
@@ -722,10 +729,6 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
 	rdmsr(MSR_IA32_MISC_ENABLE, l, h);
 	wrmsr(MSR_IA32_MISC_ENABLE, l | MSR_IA32_MISC_ENABLE_TM1, h);
 
-	/* Unmask the thermal vector: */
-	l = apic_read(APIC_LVTTHMR);
-	apic_write(APIC_LVTTHMR, l & ~APIC_LVT_MASKED);
-
 	pr_info_once("CPU0: Thermal monitoring enabled (%s)\n",
 		      tm2 ? "TM2" : "TM1");
 
-- 
2.17.1



* Re: [PATCH v2] x86/mce/therm_throt: Fix the access of uninitialized therm_work
  2020-01-07  0:41 [PATCH v2] x86/mce/therm_throt: Fix the access of uninitialized therm_work Chuansheng Liu
@ 2020-01-10 18:29 ` Luck, Tony
  2020-01-13  9:05   ` Borislav Petkov
  2020-01-15 11:37 ` [tip: ras/urgent] x86/mce/therm_throt: Do not access " tip-bot2 for Chuansheng Liu
  1 sibling, 1 reply; 5+ messages in thread
From: Luck, Tony @ 2020-01-10 18:29 UTC (permalink / raw)
  To: Chuansheng Liu; +Cc: linux-kernel, bp, tglx, mingo, hpa

On Tue, Jan 07, 2020 at 12:41:16AM +0000, Chuansheng Liu wrote:
> On my ICL platform, it can be reproduced within a few reboots
> under reboot stress. With this fix, the system survives more
> than 200 reboot stress cycles.
> 
> V2: Boris made a good suggestion: move the interrupt unmasking
> to the end of the therm_work initialization.
> 
> Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>

Looks good to me:

Acked-by: Tony Luck <tony.luck@intel.com>


* Re: [PATCH v2] x86/mce/therm_throt: Fix the access of uninitialized therm_work
  2020-01-10 18:29 ` Luck, Tony
@ 2020-01-13  9:05   ` Borislav Petkov
  2020-01-14  2:19     ` Liu, Chuansheng
  0 siblings, 1 reply; 5+ messages in thread
From: Borislav Petkov @ 2020-01-13  9:05 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Chuansheng Liu, linux-kernel, tglx, mingo, hpa

On Fri, Jan 10, 2020 at 10:29:29AM -0800, Luck, Tony wrote:
> On Tue, Jan 07, 2020 at 12:41:16AM +0000, Chuansheng Liu wrote:
> > On my ICL platform, it can be reproduced within a few reboots
> > under reboot stress. With this fix, the system survives more
> > than 200 reboot stress cycles.
> > 
> > V2: Boris made a good suggestion: move the interrupt unmasking
> > to the end of the therm_work initialization.
> > 
> > Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
> 
> Looks good to me:
> 
> Acked-by: Tony Luck <tony.luck@intel.com>

Thx.

This "ICL platform" - whatever that is - is this shipping already so
that this qualifies for stable@ or can it go the normal path?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* RE: [PATCH v2] x86/mce/therm_throt: Fix the access of uninitialized therm_work
  2020-01-13  9:05   ` Borislav Petkov
@ 2020-01-14  2:19     ` Liu, Chuansheng
  0 siblings, 0 replies; 5+ messages in thread
From: Liu, Chuansheng @ 2020-01-14  2:19 UTC (permalink / raw)
  To: Borislav Petkov, Luck, Tony; +Cc: linux-kernel, tglx, mingo, hpa



> -----Original Message-----
> From: Borislav Petkov <bp@alien8.de>
> Sent: Monday, January 13, 2020 5:05 PM
> To: Luck, Tony <tony.luck@intel.com>
> Cc: Liu, Chuansheng <chuansheng.liu@intel.com>;
> linux-kernel@vger.kernel.org; tglx@linutronix.de; mingo@redhat.com;
> hpa@zytor.com
> Subject: Re: [PATCH v2] x86/mce/therm_throt: Fix the access of uninitialized
> therm_work
> 
> On Fri, Jan 10, 2020 at 10:29:29AM -0800, Luck, Tony wrote:
> > On Tue, Jan 07, 2020 at 12:41:16AM +0000, Chuansheng Liu wrote:
> > > On my ICL platform, it can be reproduced within a few reboots
> > > under reboot stress. With this fix, the system survives more
> > > than 200 reboot stress cycles.
> > >
> > > V2: Boris made a good suggestion: move the interrupt unmasking
> > > to the end of the therm_work initialization.
> > >
> > > Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
> >
> > Looks good to me:
> >
> > Acked-by: Tony Luck <tony.luck@intel.com>
> 
> Thx.
> 
> This "ICL platform" - whatever that is - is this shipping already so
I can only say that ICL (Ice Lake) is a shipping platform; I reproduced
this issue on a laptop.




* [tip: ras/urgent] x86/mce/therm_throt: Do not access uninitialized therm_work
  2020-01-07  0:41 [PATCH v2] x86/mce/therm_throt: Fix the access of uninitialized therm_work Chuansheng Liu
  2020-01-10 18:29 ` Luck, Tony
@ 2020-01-15 11:37 ` " tip-bot2 for Chuansheng Liu
  1 sibling, 0 replies; 5+ messages in thread
From: tip-bot2 for Chuansheng Liu @ 2020-01-15 11:37 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Chuansheng Liu, Borislav Petkov, Tony Luck, x86, LKML

The following commit has been merged into the ras/urgent branch of tip:

Commit-ID:     978370956d2046b19313659ce65ed12d5b996626
Gitweb:        https://git.kernel.org/tip/978370956d2046b19313659ce65ed12d5b996626
Author:        Chuansheng Liu <chuansheng.liu@intel.com>
AuthorDate:    Tue, 07 Jan 2020 00:41:16 
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Wed, 15 Jan 2020 11:31:33 +01:00

x86/mce/therm_throt: Do not access uninitialized therm_work

It is relatively easy to trigger the following boot splat on an Ice Lake
client platform. The call stack is like:

  kernel BUG at kernel/timer/timer.c:1152!

  Call Trace:
  __queue_delayed_work
  queue_delayed_work_on
  therm_throt_process
  intel_thermal_interrupt
  ...

The reason is that a CPU's thermal interrupt is enabled prior to
executing its hotplug onlining callback which will initialize the
throttling workqueues.

Such a race can lead to therm_throt_process() accessing an uninitialized
therm_work, leading to the above BUG at a very early bootup stage.

Therefore, unmask the thermal interrupt vector only after having set up
the workqueues completely.
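
The unmasking step is a read-modify-write that clears one bit in the
thermal LVT entry. A minimal userspace sketch follows, assuming the
mask bit is bit 16 (the value the kernel's APIC_LVT_MASKED uses). The
MODEL_* and model_* names are hypothetical, and a plain integer stands
in for the register that the patch accesses with apic_read() and
apic_write().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed mask-bit position: bit 16 of an LVT entry. */
#define MODEL_LVT_MASKED (1u << 16)

/* Returns true when the modeled LVT entry has its mask bit set. */
bool model_lvt_is_masked(uint32_t lvt)
{
	return (lvt & MODEL_LVT_MASKED) != 0;
}

/* Clears only the mask bit and leaves all other bits untouched. */
uint32_t model_lvt_unmask(uint32_t lvt)
{
	return lvt & ~MODEL_LVT_MASKED;
}
```

Clearing the bit this way keeps the vector number and delivery mode
that were programmed earlier, which is why the patch can move the
unmasking without touching the rest of the LVT setup.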

 [ bp: Heavily massage commit message and correct comment formatting. ]

Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200107004116.59353-1-chuansheng.liu@intel.com
---
 arch/x86/kernel/cpu/mce/therm_throt.c |  9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
index b38010b..6c3e1c9 100644
--- a/arch/x86/kernel/cpu/mce/therm_throt.c
+++ b/arch/x86/kernel/cpu/mce/therm_throt.c
@@ -467,6 +467,7 @@ static int thermal_throttle_online(unsigned int cpu)
 {
 	struct thermal_state *state = &per_cpu(thermal_state, cpu);
 	struct device *dev = get_cpu_device(cpu);
+	u32 l;
 
 	state->package_throttle.level = PACKAGE_LEVEL;
 	state->core_throttle.level = CORE_LEVEL;
@@ -474,6 +475,10 @@ static int thermal_throttle_online(unsigned int cpu)
 	INIT_DELAYED_WORK(&state->package_throttle.therm_work, throttle_active_work);
 	INIT_DELAYED_WORK(&state->core_throttle.therm_work, throttle_active_work);
 
+	/* Unmask the thermal vector after the above workqueues are initialized. */
+	l = apic_read(APIC_LVTTHMR);
+	apic_write(APIC_LVTTHMR, l & ~APIC_LVT_MASKED);
+
 	return thermal_throttle_add_dev(dev, cpu);
 }
 
@@ -722,10 +727,6 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
 	rdmsr(MSR_IA32_MISC_ENABLE, l, h);
 	wrmsr(MSR_IA32_MISC_ENABLE, l | MSR_IA32_MISC_ENABLE_TM1, h);
 
-	/* Unmask the thermal vector: */
-	l = apic_read(APIC_LVTTHMR);
-	apic_write(APIC_LVTTHMR, l & ~APIC_LVT_MASKED);
-
 	pr_info_once("CPU0: Thermal monitoring enabled (%s)\n",
 		      tm2 ? "TM2" : "TM1");
 
