linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] powercap/intel_rapl: refine RAPL error handling to respect initial CPU matching
@ 2019-12-30 14:36 Harry Pan
  2020-01-07 11:26 ` Rafael J. Wysocki
  0 siblings, 1 reply; 2+ messages in thread
From: Harry Pan @ 2019-12-30 14:36 UTC (permalink / raw)
  To: LKML; +Cc: gs0622, Harry Pan, stable, Rafael J. Wysocki, linux-pm

RAPL MMIO support depends on RAPL common driver, in case a new generation
of CPU is booting but not in the RAPL support list, the processor_thermal
driver invokes CPU hotplug API to enforce RAPL common driver adding new
RAPL domain which would cause kernel crash by null pointer dereference
because the internal RAPL domain resource mapping is not initialized after
the common init.

Add error handling to detect non initialized RAPL domain resource mapping
and return error code to the caller; such that, it avoids early crash for
new CPU and leave error messages through processor_thermal driver.

Before:
[    4.188566] BUG: kernel NULL pointer dereference, address: 0000000000000020
...snip...
[    4.189555] RIP: 0010:rapl_add_package+0x223/0x574
[    4.189555] Code: b5 a0 31 c0 49 8b 4d 78 48 01 d9 48 8b 0c c1 49 89 4c c6 10 48 ff c0 48 83 f8 05 75 e7 49 83 ff 03 75 15 48 8b 05 09 bc 18 01 <8b> 70 20 41 89 b6 0c 05 00 00 85 f6 75 1a 49 81 c6 18 9
[    4.189555] RSP: 0000:ffffb3adc00b3d90 EFLAGS: 00010246
[    4.189555] RAX: 0000000000000000 RBX: 0000000000000098 RCX: 0000000000000000
[    4.267161] usb 1-1: New USB device found, idVendor=2109, idProduct=2812, bcdDevice= b.e0
[    4.189555] RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff9340caafd000
[    4.189555] RBP: ffffb3adc00b3df8 R08: ffffffffa0246e28 R09: ffff9340caafc000
[    4.189555] R10: 000000000000024a R11: ffffffff9ff1f6f2 R12: 00000000ffffffed
[    4.189555] R13: ffff9340caa94800 R14: ffff9340caafc518 R15: 0000000000000003
[    4.189555] FS:  0000000000000000(0000) GS:ffff9340ce200000(0000) knlGS:0000000000000000
[    4.189555] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.189555] CR2: 0000000000000020 CR3: 0000000302c14001 CR4: 00000000003606f0
[    4.189555] Call Trace:
[    4.189555]  ? __switch_to_asm+0x40/0x70
[    4.189555]  rapl_mmio_cpu_online+0x47/0x64
[    4.189555]  ? rapl_mmio_write_raw+0x33/0x33
[    4.281059] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[    4.189555]  cpuhp_invoke_callback+0x29f/0x66f
[    4.189555]  ? __schedule+0x46d/0x6a0
[    4.189555]  cpuhp_thread_fun+0xb9/0x11c
[    4.189555]  smpboot_thread_fn+0x17d/0x22f
[    4.297006] usb 1-1: Product: USB2.0 Hub
[    4.189555]  ? cpu_report_death+0x43/0x43
[    4.189555]  kthread+0x137/0x13f
[    4.189555]  ? cpu_report_death+0x43/0x43
[    4.189555]  ? kthread_blkcg+0x2e/0x2e
[    4.312951] usb 1-1: Manufacturer: VIA Labs, Inc.
[    4.189555]  ret_from_fork+0x1f/0x40
[    4.189555] Modules linked in:
[    4.189555] CR2: 0000000000000020
[    4.189555] ---[ end trace 01bb812aabc791f4 ]---

After:
[    0.787125] intel_rapl_common: driver does not support CPU family 6 model 166
...snip...
[    4.245273] proc_thermal 0000:00:04.0: failed to add RAPL MMIO interface

Note:
This example above is on a v5.4 branch without below two CML commits yet:
commit f84fdcbc8ec0 ("powercap/intel_rapl: add support for Cometlake desktop")
commit cae478114fbe ("powercap/intel_rapl: add support for CometLake Mobile")

Fixes: 555c45fe0d04 ("int340X/processor_thermal_device: add support for MMIO RAPL")

Cc: <stable@vger.kernel.org> # v5.3+
Signed-off-by: Harry Pan <harry.pan@intel.com>

---

 drivers/powercap/intel_rapl_common.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c
index 318d023a6a11..aa0a8de413b1 100644
--- a/drivers/powercap/intel_rapl_common.c
+++ b/drivers/powercap/intel_rapl_common.c
@@ -1294,6 +1294,9 @@ struct rapl_package *rapl_add_package(int cpu, struct rapl_if_priv *priv)
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
 	int ret;
 
+	if (!rapl_defaults)
+		return ERR_PTR(-ENODEV);
+
 	rp = kzalloc(sizeof(struct rapl_package), GFP_KERNEL);
 	if (!rp)
 		return ERR_PTR(-ENOMEM);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] powercap/intel_rapl: refine RAPL error handling to respect initial CPU matching
  2019-12-30 14:36 [PATCH] powercap/intel_rapl: refine RAPL error handling to respect initial CPU matching Harry Pan
@ 2020-01-07 11:26 ` Rafael J. Wysocki
  0 siblings, 0 replies; 2+ messages in thread
From: Rafael J. Wysocki @ 2020-01-07 11:26 UTC (permalink / raw)
  To: Harry Pan
  Cc: LKML, Harry Pan, Stable, Rafael J. Wysocki, Linux PM, Zhang, Rui

On Mon, Dec 30, 2019 at 3:37 PM Harry Pan <harry.pan@intel.com> wrote:
>
> RAPL MMIO support depends on RAPL common driver, in case a new generation
> of CPU is booting but not in the RAPL support list, the processor_thermal
> driver invokes CPU hotplug API to enforce RAPL common driver adding new
> RAPL domain which would cause kernel crash by null pointer dereference
> because the internal RAPL domain resource mapping is not initialized after
> the common init.
>
> Add error handling to detect non initialized RAPL domain resource mapping
> and return error code to the caller; such that, it avoids early crash for
> new CPU and leave error messages through processor_thermal driver.
>
> Before:
> [    4.188566] BUG: kernel NULL pointer dereference, address: 0000000000000020
> ...snip...
> [    4.189555] RIP: 0010:rapl_add_package+0x223/0x574
> [    4.189555] Code: b5 a0 31 c0 49 8b 4d 78 48 01 d9 48 8b 0c c1 49 89 4c c6 10 48 ff c0 48 83 f8 05 75 e7 49 83 ff 03 75 15 48 8b 05 09 bc 18 01 <8b> 70 20 41 89 b6 0c 05 00 00 85 f6 75 1a 49 81 c6 18 9
> [    4.189555] RSP: 0000:ffffb3adc00b3d90 EFLAGS: 00010246
> [    4.189555] RAX: 0000000000000000 RBX: 0000000000000098 RCX: 0000000000000000
> [    4.267161] usb 1-1: New USB device found, idVendor=2109, idProduct=2812, bcdDevice= b.e0
> [    4.189555] RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff9340caafd000
> [    4.189555] RBP: ffffb3adc00b3df8 R08: ffffffffa0246e28 R09: ffff9340caafc000
> [    4.189555] R10: 000000000000024a R11: ffffffff9ff1f6f2 R12: 00000000ffffffed
> [    4.189555] R13: ffff9340caa94800 R14: ffff9340caafc518 R15: 0000000000000003
> [    4.189555] FS:  0000000000000000(0000) GS:ffff9340ce200000(0000) knlGS:0000000000000000
> [    4.189555] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    4.189555] CR2: 0000000000000020 CR3: 0000000302c14001 CR4: 00000000003606f0
> [    4.189555] Call Trace:
> [    4.189555]  ? __switch_to_asm+0x40/0x70
> [    4.189555]  rapl_mmio_cpu_online+0x47/0x64
> [    4.189555]  ? rapl_mmio_write_raw+0x33/0x33
> [    4.281059] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> [    4.189555]  cpuhp_invoke_callback+0x29f/0x66f
> [    4.189555]  ? __schedule+0x46d/0x6a0
> [    4.189555]  cpuhp_thread_fun+0xb9/0x11c
> [    4.189555]  smpboot_thread_fn+0x17d/0x22f
> [    4.297006] usb 1-1: Product: USB2.0 Hub
> [    4.189555]  ? cpu_report_death+0x43/0x43
> [    4.189555]  kthread+0x137/0x13f
> [    4.189555]  ? cpu_report_death+0x43/0x43
> [    4.189555]  ? kthread_blkcg+0x2e/0x2e
> [    4.312951] usb 1-1: Manufacturer: VIA Labs, Inc.
> [    4.189555]  ret_from_fork+0x1f/0x40
> [    4.189555] Modules linked in:
> [    4.189555] CR2: 0000000000000020
> [    4.189555] ---[ end trace 01bb812aabc791f4 ]---
>
> After:
> [    0.787125] intel_rapl_common: driver does not support CPU family 6 model 166
> ...snip...
> [    4.245273] proc_thermal 0000:00:04.0: failed to add RAPL MMIO interface
>
> Note:
> This example above is on a v5.4 branch without below two CML commits yet:
> commit f84fdcbc8ec0 ("powercap/intel_rapl: add support for Cometlake desktop")
> commit cae478114fbe ("powercap/intel_rapl: add support for CometLake Mobile")
>
> Fixes: 555c45fe0d04 ("int340X/processor_thermal_device: add support for MMIO RAPL")
>
> Cc: <stable@vger.kernel.org> # v5.3+
> Signed-off-by: Harry Pan <harry.pan@intel.com>

Applied as a fix for 5.5-rc with rewritten subject and changelog (new
subject: "powercap: intel_rapl: add NULL pointer check to
rapl_mmio_cpu_online()").

> ---
>
>  drivers/powercap/intel_rapl_common.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c
> index 318d023a6a11..aa0a8de413b1 100644
> --- a/drivers/powercap/intel_rapl_common.c
> +++ b/drivers/powercap/intel_rapl_common.c
> @@ -1294,6 +1294,9 @@ struct rapl_package *rapl_add_package(int cpu, struct rapl_if_priv *priv)
>         struct cpuinfo_x86 *c = &cpu_data(cpu);
>         int ret;
>
> +       if (!rapl_defaults)
> +               return ERR_PTR(-ENODEV);
> +
>         rp = kzalloc(sizeof(struct rapl_package), GFP_KERNEL);
>         if (!rp)
>                 return ERR_PTR(-ENOMEM);
> --
> 2.24.1
>

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2020-01-07 11:27 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-30 14:36 [PATCH] powercap/intel_rapl: refine RAPL error handling to respect initial CPU matching Harry Pan
2020-01-07 11:26 ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).