All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4] acpi: Issue _OSC call for native thermal interrupt handling
@ 2016-03-17 18:24 Srinivas Pandruvada
  2016-03-17 20:03 ` Linda Knippers
  0 siblings, 1 reply; 8+ messages in thread
From: Srinivas Pandruvada @ 2016-03-17 18:24 UTC (permalink / raw)
  To: rjw
  Cc: tony.luck, bp, tglx, mingo, hpa, lenb, linux-edac, linux-kernel,
	Srinivas Pandruvada

There are several reports of freeze on enabling HWP (Hardware PStates)
feature on Skylake based systems by Intel P states driver. The root
cause is identified as the HWP interrupts causing BIOS code to freeze.
HWP interrupts uses thermal LVT.
Linux natively handles thermal interrupts, but in Skylake based systems
SMM will take control of thermal interrupts. This is a problem for several
reasons:
- It is freezing in BIOS when tries to handle thermal interrupt, which
will require BIOS upgrade
- With SMM handling thermal we loose all the reporting features of
Linux arch/x86/kernel/cpu/mcheck/therm_throt driver
- Some thermal drivers like x86-package-temp driver depends on the thermal
threshold interrupts
- The HWP interrupts are useful for debugging and tuning performance

So we need native handling of thermal interrupts. To inform SMM that
OS will handle thermal interrupts, we need to use _OSC under processor
scope very early in ACPI initialization flow. This needs to be done
before SMM code path looks for _OSC capabilities. The bit 12 of
_OSC in processor scope defines whether OS will handle thermal interrupts.
When bit 12 is set to 1, OS will handle thermal interrupts.
Refer to this document for details on _OSC
http://www.intel.com/content/www/us/en/standards/processor-vendor-
specific-acpi-specification.html

This change introduces a new function acpi_early_processor_set_osc(),
which walks acpi name space and finds acpi processor object and
set capability via _OSC method to take over thermal LVT.

Also this change writes HWP status bits to 0 to clear any HWP status
bits in intel_thermal_interrupt().

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
v4:
Suggestion by Boris for removing use of intermediate variable for
clearing HWP status and using boot_cpu_has instead of static_cpu_has

v3:
- Added CONFIG_X86 around static_cpu_has() to fix compile error on
ARCH=ia64, reported by kbuild test robot
- return AE_CTRL_TERMINATE to terminate acpi name walk space, when _OSC
is set already once.
v2:
Unnecessary newline was introduced, removed that in acpi_processor.c

 arch/x86/kernel/cpu/mcheck/therm_throt.c |  3 ++
 drivers/acpi/acpi_processor.c            | 47 ++++++++++++++++++++++++++++++++
 drivers/acpi/bus.c                       |  3 ++
 drivers/acpi/internal.h                  |  2 ++
 4 files changed, 55 insertions(+)

diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index 2c5aaf8..0553858 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -385,6 +385,9 @@ static void intel_thermal_interrupt(void)
 {
 	__u64 msr_val;
 
+	if (static_cpu_has(X86_FEATURE_HWP))
+		wrmsrl_safe(MSR_HWP_STATUS, 0);
+
 	rdmsrl(MSR_IA32_THERM_STATUS, msr_val);
 
 	/* Check for violation of core thermal thresholds*/
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 6979186..18da84f 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -491,6 +491,53 @@ static void acpi_processor_remove(struct acpi_device *device)
 }
 #endif /* CONFIG_ACPI_HOTPLUG_CPU */
 
+#ifdef CONFIG_X86
+static bool acpi_hwp_native_thermal_lvt_set;
+static acpi_status acpi_set_hwp_native_thermal_lvt_osc(acpi_handle handle,
+						       u32 lvl, void *context,
+						       void **rv)
+{
+	u8 sb_uuid_str[] = "4077A616-290C-47BE-9EBD-D87058713953";
+	u32 capbuf[2];
+	struct acpi_osc_context osc_context = {
+		.uuid_str = sb_uuid_str,
+		.rev = 1,
+		.cap.length = 8,
+		.cap.pointer = capbuf,
+	};
+
+	if (acpi_hwp_native_thermal_lvt_set)
+		return AE_CTRL_TERMINATE;
+
+	capbuf[0] = 0x0000;
+	capbuf[1] = 0x1000; /* set bit 12 */
+
+	if (ACPI_SUCCESS(acpi_run_osc(handle, &osc_context))) {
+		acpi_hwp_native_thermal_lvt_set = true;
+		kfree(osc_context.ret.pointer);
+	}
+
+	return AE_OK;
+}
+
+void acpi_early_processor_set_osc(void)
+{
+	if (boot_cpu_has(X86_FEATURE_HWP)) {
+		acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
+				    ACPI_UINT32_MAX,
+				    acpi_set_hwp_native_thermal_lvt_osc,
+				    NULL, NULL, NULL);
+		acpi_get_devices(ACPI_PROCESSOR_DEVICE_HID,
+				 acpi_set_hwp_native_thermal_lvt_osc,
+				 NULL, NULL);
+	}
+}
+#else
+
+void acpi_early_processor_set_osc(void) {}
+
+#endif
+
 /*
  * The following ACPI IDs are known to be suitable for representing as
  * processor devices.
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 891c42d..7e73aea 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -1005,6 +1005,9 @@ static int __init acpi_bus_init(void)
 		goto error1;
 	}
 
+	/* Set capability bits for _OSC under processor scope */
+	acpi_early_processor_set_osc();
+
 	/*
 	 * _OSC method may exist in module level code,
 	 * so it must be run after ACPI_FULL_INITIALIZATION
diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
index 1e6833a..5c787ac 100644
--- a/drivers/acpi/internal.h
+++ b/drivers/acpi/internal.h
@@ -138,6 +138,8 @@ void acpi_early_processor_set_pdc(void);
 static inline void acpi_early_processor_set_pdc(void) {}
 #endif
 
+void acpi_early_processor_set_osc(void);
+
 /* --------------------------------------------------------------------------
                                   Embedded Controller
    -------------------------------------------------------------------------- */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v4] acpi: Issue _OSC call for native thermal interrupt handling
  2016-03-17 18:24 [PATCH v4] acpi: Issue _OSC call for native thermal interrupt handling Srinivas Pandruvada
@ 2016-03-17 20:03 ` Linda Knippers
  2016-03-17 20:36   ` Srinivas Pandruvada
  0 siblings, 1 reply; 8+ messages in thread
From: Linda Knippers @ 2016-03-17 20:03 UTC (permalink / raw)
  To: Srinivas Pandruvada, rjw
  Cc: tony.luck, bp, tglx, mingo, hpa, lenb, linux-edac, linux-kernel


On 3/17/2016 2:24 PM, Srinivas Pandruvada wrote:
> There are several reports of freeze on enabling HWP (Hardware PStates)
> feature on Skylake based systems by Intel P states driver. The root
> cause is identified as the HWP interrupts causing BIOS code to freeze.
> HWP interrupts uses thermal LVT.
> Linux natively handles thermal interrupts, but in Skylake based systems
> SMM will take control of thermal interrupts. This is a problem for several
> reasons:
> - It is freezing in BIOS when tries to handle thermal interrupt, which
> will require BIOS upgrade
> - With SMM handling thermal we loose all the reporting features of
> Linux arch/x86/kernel/cpu/mcheck/therm_throt driver
> - Some thermal drivers like x86-package-temp driver depends on the thermal
> threshold interrupts
> - The HWP interrupts are useful for debugging and tuning performance
> 
> So we need native handling of thermal interrupts. 

Is this working around a firmware bug?

> To inform SMM that
> OS will handle thermal interrupts, we need to use _OSC under processor
> scope very early in ACPI initialization flow. 

Does _OSC say that OS "will" handle thermal interrupts or that it "can"
handle thermal interrupts?  Couldn't a platform decide it wants to handle
them, regardless of what the OS may be capable of?

> This needs to be done
> before SMM code path looks for _OSC capabilities. The bit 12 of
> _OSC in processor scope defines whether OS will handle thermal interrupts.
> When bit 12 is set to 1, OS will handle thermal interrupts.
> Refer to this document for details on _OSC
> http://www.intel.com/content/www/us/en/standards/processor-vendor-
> specific-acpi-specification.html

Where is bit 12 documented?

> This change introduces a new function acpi_early_processor_set_osc(),
> which walks acpi name space and finds acpi processor object and
> set capability via _OSC method to take over thermal LVT.

Does this change just affect Skylake platforms or all platforms?

-- ljk
> 
> Also this change writes HWP status bits to 0 to clear any HWP status
> bits in intel_thermal_interrupt().
> 
> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> ---
> v4:
> Suggestion by Boris for removing use of intermediate variable for
> clearing HWP status and using boot_cpu_has instead of static_cpu_has
> 
> v3:
> - Added CONFIG_X86 around static_cpu_has() to fix compile error on
> ARCH=ia64, reported by kbuild test robot
> - return AE_CTRL_TERMINATE to terminate acpi name walk space, when _OSC
> is set already once.
> v2:
> Unnecessary newline was introduced, removed that in acpi_processor.c
> 
>  arch/x86/kernel/cpu/mcheck/therm_throt.c |  3 ++
>  drivers/acpi/acpi_processor.c            | 47 ++++++++++++++++++++++++++++++++
>  drivers/acpi/bus.c                       |  3 ++
>  drivers/acpi/internal.h                  |  2 ++
>  4 files changed, 55 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> index 2c5aaf8..0553858 100644
> --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
> +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> @@ -385,6 +385,9 @@ static void intel_thermal_interrupt(void)
>  {
>  	__u64 msr_val;
>  
> +	if (static_cpu_has(X86_FEATURE_HWP))
> +		wrmsrl_safe(MSR_HWP_STATUS, 0);
> +
>  	rdmsrl(MSR_IA32_THERM_STATUS, msr_val);
>  
>  	/* Check for violation of core thermal thresholds*/
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 6979186..18da84f 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -491,6 +491,53 @@ static void acpi_processor_remove(struct acpi_device *device)
>  }
>  #endif /* CONFIG_ACPI_HOTPLUG_CPU */
>  
> +#ifdef CONFIG_X86
> +static bool acpi_hwp_native_thermal_lvt_set;
> +static acpi_status acpi_set_hwp_native_thermal_lvt_osc(acpi_handle handle,
> +						       u32 lvl, void *context,
> +						       void **rv)
> +{
> +	u8 sb_uuid_str[] = "4077A616-290C-47BE-9EBD-D87058713953";
> +	u32 capbuf[2];
> +	struct acpi_osc_context osc_context = {
> +		.uuid_str = sb_uuid_str,
> +		.rev = 1,
> +		.cap.length = 8,
> +		.cap.pointer = capbuf,
> +	};
> +
> +	if (acpi_hwp_native_thermal_lvt_set)
> +		return AE_CTRL_TERMINATE;
> +
> +	capbuf[0] = 0x0000;
> +	capbuf[1] = 0x1000; /* set bit 12 */
> +
> +	if (ACPI_SUCCESS(acpi_run_osc(handle, &osc_context))) {
> +		acpi_hwp_native_thermal_lvt_set = true;
> +		kfree(osc_context.ret.pointer);
> +	}
> +
> +	return AE_OK;
> +}
> +
> +void acpi_early_processor_set_osc(void)
> +{
> +	if (boot_cpu_has(X86_FEATURE_HWP)) {
> +		acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
> +				    ACPI_UINT32_MAX,
> +				    acpi_set_hwp_native_thermal_lvt_osc,
> +				    NULL, NULL, NULL);
> +		acpi_get_devices(ACPI_PROCESSOR_DEVICE_HID,
> +				 acpi_set_hwp_native_thermal_lvt_osc,
> +				 NULL, NULL);
> +	}
> +}
> +#else
> +
> +void acpi_early_processor_set_osc(void) {}
> +
> +#endif
> +
>  /*
>   * The following ACPI IDs are known to be suitable for representing as
>   * processor devices.
> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> index 891c42d..7e73aea 100644
> --- a/drivers/acpi/bus.c
> +++ b/drivers/acpi/bus.c
> @@ -1005,6 +1005,9 @@ static int __init acpi_bus_init(void)
>  		goto error1;
>  	}
>  
> +	/* Set capability bits for _OSC under processor scope */
> +	acpi_early_processor_set_osc();
> +
>  	/*
>  	 * _OSC method may exist in module level code,
>  	 * so it must be run after ACPI_FULL_INITIALIZATION
> diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
> index 1e6833a..5c787ac 100644
> --- a/drivers/acpi/internal.h
> +++ b/drivers/acpi/internal.h
> @@ -138,6 +138,8 @@ void acpi_early_processor_set_pdc(void);
>  static inline void acpi_early_processor_set_pdc(void) {}
>  #endif
>  
> +void acpi_early_processor_set_osc(void);
> +
>  /* --------------------------------------------------------------------------
>                                    Embedded Controller
>     -------------------------------------------------------------------------- */
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4] acpi: Issue _OSC call for native thermal interrupt handling
  2016-03-17 20:03 ` Linda Knippers
@ 2016-03-17 20:36   ` Srinivas Pandruvada
  2016-03-17 20:51     ` Linda Knippers
  0 siblings, 1 reply; 8+ messages in thread
From: Srinivas Pandruvada @ 2016-03-17 20:36 UTC (permalink / raw)
  To: Linda Knippers, rjw
  Cc: tony.luck, bp, tglx, mingo, hpa, lenb, linux-edac, linux-kernel

On Thu, 2016-03-17 at 16:03 -0400, Linda Knippers wrote:
> On 3/17/2016 2:24 PM, Srinivas Pandruvada wrote:
> > 
> > There are several reports of freeze on enabling HWP (Hardware
> > PStates)
> > feature on Skylake based systems by Intel P states driver. The root
> > cause is identified as the HWP interrupts causing BIOS code to
> > freeze.
> > HWP interrupts uses thermal LVT.
> > Linux natively handles thermal interrupts, but in Skylake based
> > systems
> > SMM will take control of thermal interrupts. This is a problem for
> > several
> > reasons:
> > - It is freezing in BIOS when tries to handle thermal interrupt,
> > which
> > will require BIOS upgrade
> > - With SMM handling thermal we loose all the reporting features of
> > Linux arch/x86/kernel/cpu/mcheck/therm_throt driver
> > - Some thermal drivers like x86-package-temp driver depends on the
> > thermal
> > threshold interrupts
> > - The HWP interrupts are useful for debugging and tuning
> > performance
> > 
> > So we need native handling of thermal interrupts. 
> Is this working around a firmware bug?
Yes. But Linux always had capability to handle thermal interrupts
natively, so we should continue to have this in OS control, when
possible.

> 
> > 
> > To inform SMM that
> > OS will handle thermal interrupts, we need to use _OSC under
> > processor
> > scope very early in ACPI initialization flow. 
> Does _OSC say that OS "will" handle thermal interrupts or that it
> "can"
> handle thermal interrupts?

Can handle.

>   Couldn't a platform decide it wants to handle
> them, regardless of what the OS may be capable of?
> 
Yes. In that case platform can change the delivery mode bits to SMM in
LVT.

> > 
> > This needs to be done
> > before SMM code path looks for _OSC capabilities. The bit 12 of
> > _OSC in processor scope defines whether OS will handle thermal
> > interrupts.
> > When bit 12 is set to 1, OS will handle thermal interrupts.
> > Refer to this document for details on _OSC
> > http://www.intel.com/content/www/us/en/standards/processor-vendor-
> > specific-acpi-specification.html
> Where is bit 12 documented?
> 
In the above document.

> > 
> > This change introduces a new function
> > acpi_early_processor_set_osc(),
> > which walks acpi name space and finds acpi processor object and
> > set capability via _OSC method to take over thermal LVT.
> Does this change just affect Skylake platforms or all platforms?
Any platform which has Intel ® Speed Shift Technology (aka HWP) feature
present and enabled.

Thanks,
Srinivas 

> 
> -- ljk
> > 
> > 
> > Also this change writes HWP status bits to 0 to clear any HWP
> > status
> > bits in intel_thermal_interrupt().
> > 
> > Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel
> > .com>
> > ---
> > v4:
> > Suggestion by Boris for removing use of intermediate variable for
> > clearing HWP status and using boot_cpu_has instead of
> > static_cpu_has
> > 
> > v3:
> > - Added CONFIG_X86 around static_cpu_has() to fix compile error on
> > ARCH=ia64, reported by kbuild test robot
> > - return AE_CTRL_TERMINATE to terminate acpi name walk space, when
> > _OSC
> > is set already once.
> > v2:
> > Unnecessary newline was introduced, removed that in
> > acpi_processor.c
> > 
> >  arch/x86/kernel/cpu/mcheck/therm_throt.c |  3 ++
> >  drivers/acpi/acpi_processor.c            | 47
> > ++++++++++++++++++++++++++++++++
> >  drivers/acpi/bus.c                       |  3 ++
> >  drivers/acpi/internal.h                  |  2 ++
> >  4 files changed, 55 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > index 2c5aaf8..0553858 100644
> > --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > @@ -385,6 +385,9 @@ static void intel_thermal_interrupt(void)
> >  {
> >  	__u64 msr_val;
> >  
> > +	if (static_cpu_has(X86_FEATURE_HWP))
> > +		wrmsrl_safe(MSR_HWP_STATUS, 0);
> > +
> >  	rdmsrl(MSR_IA32_THERM_STATUS, msr_val);
> >  
> >  	/* Check for violation of core thermal thresholds*/
> > diff --git a/drivers/acpi/acpi_processor.c
> > b/drivers/acpi/acpi_processor.c
> > index 6979186..18da84f 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c
> > @@ -491,6 +491,53 @@ static void acpi_processor_remove(struct
> > acpi_device *device)
> >  }
> >  #endif /* CONFIG_ACPI_HOTPLUG_CPU */
> >  
> > +#ifdef CONFIG_X86
> > +static bool acpi_hwp_native_thermal_lvt_set;
> > +static acpi_status acpi_set_hwp_native_thermal_lvt_osc(acpi_handle
> > handle,
> > +						       u32 lvl,
> > void *context,
> > +						       void **rv)
> > +{
> > +	u8 sb_uuid_str[] = "4077A616-290C-47BE-9EBD-D87058713953";
> > +	u32 capbuf[2];
> > +	struct acpi_osc_context osc_context = {
> > +		.uuid_str = sb_uuid_str,
> > +		.rev = 1,
> > +		.cap.length = 8,
> > +		.cap.pointer = capbuf,
> > +	};
> > +
> > +	if (acpi_hwp_native_thermal_lvt_set)
> > +		return AE_CTRL_TERMINATE;
> > +
> > +	capbuf[0] = 0x0000;
> > +	capbuf[1] = 0x1000; /* set bit 12 */
> > +
> > +	if (ACPI_SUCCESS(acpi_run_osc(handle, &osc_context))) {
> > +		acpi_hwp_native_thermal_lvt_set = true;
> > +		kfree(osc_context.ret.pointer);
> > +	}
> > +
> > +	return AE_OK;
> > +}
> > +
> > +void acpi_early_processor_set_osc(void)
> > +{
> > +	if (boot_cpu_has(X86_FEATURE_HWP)) {
> > +		acpi_walk_namespace(ACPI_TYPE_PROCESSOR,
> > ACPI_ROOT_OBJECT,
> > +				    ACPI_UINT32_MAX,
> > +				    acpi_set_hwp_native_thermal_lv
> > t_osc,
> > +				    NULL, NULL, NULL);
> > +		acpi_get_devices(ACPI_PROCESSOR_DEVICE_HID,
> > +				 acpi_set_hwp_native_thermal_lvt_o
> > sc,
> > +				 NULL, NULL);
> > +	}
> > +}
> > +#else
> > +
> > +void acpi_early_processor_set_osc(void) {}
> > +
> > +#endif
> > +
> >  /*
> >   * The following ACPI IDs are known to be suitable for
> > representing as
> >   * processor devices.
> > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> > index 891c42d..7e73aea 100644
> > --- a/drivers/acpi/bus.c
> > +++ b/drivers/acpi/bus.c
> > @@ -1005,6 +1005,9 @@ static int __init acpi_bus_init(void)
> >  		goto error1;
> >  	}
> >  
> > +	/* Set capability bits for _OSC under processor scope */
> > +	acpi_early_processor_set_osc();
> > +
> >  	/*
> >  	 * _OSC method may exist in module level code,
> >  	 * so it must be run after ACPI_FULL_INITIALIZATION
> > diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
> > index 1e6833a..5c787ac 100644
> > --- a/drivers/acpi/internal.h
> > +++ b/drivers/acpi/internal.h
> > @@ -138,6 +138,8 @@ void acpi_early_processor_set_pdc(void);
> >  static inline void acpi_early_processor_set_pdc(void) {}
> >  #endif
> >  
> > +void acpi_early_processor_set_osc(void);
> > +
> >  /* -------------------------------------------------------------
> > -------------
> >                                    Embedded Controller
> >     ---------------------------------------------------------------
> > ----------- */
> > 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4] acpi: Issue _OSC call for native thermal interrupt handling
  2016-03-17 20:36   ` Srinivas Pandruvada
@ 2016-03-17 20:51     ` Linda Knippers
  2016-03-17 21:12       ` Srinivas Pandruvada
  0 siblings, 1 reply; 8+ messages in thread
From: Linda Knippers @ 2016-03-17 20:51 UTC (permalink / raw)
  To: Srinivas Pandruvada, rjw
  Cc: tony.luck, bp, tglx, mingo, hpa, lenb, linux-edac, linux-kernel



On 3/17/2016 4:36 PM, Srinivas Pandruvada wrote:
> On Thu, 2016-03-17 at 16:03 -0400, Linda Knippers wrote:
>> On 3/17/2016 2:24 PM, Srinivas Pandruvada wrote:
>>>
>>> There are several reports of freeze on enabling HWP (Hardware
>>> PStates)
>>> feature on Skylake based systems by Intel P states driver. The root
>>> cause is identified as the HWP interrupts causing BIOS code to
>>> freeze.
>>> HWP interrupts uses thermal LVT.
>>> Linux natively handles thermal interrupts, but in Skylake based
>>> systems
>>> SMM will take control of thermal interrupts. This is a problem for
>>> several
>>> reasons:
>>> - It is freezing in BIOS when tries to handle thermal interrupt,
>>> which
>>> will require BIOS upgrade
>>> - With SMM handling thermal we loose all the reporting features of
>>> Linux arch/x86/kernel/cpu/mcheck/therm_throt driver
>>> - Some thermal drivers like x86-package-temp driver depends on the
>>> thermal
>>> threshold interrupts
>>> - The HWP interrupts are useful for debugging and tuning
>>> performance
>>>
>>> So we need native handling of thermal interrupts. 
>> Is this working around a firmware bug?
> Yes. But Linux always had capability to handle thermal interrupts
> natively, so we should continue to have this in OS control, when
> possible.
> 
>>
>>>
>>> To inform SMM that
>>> OS will handle thermal interrupts, we need to use _OSC under
>>> processor
>>> scope very early in ACPI initialization flow. 
>> Does _OSC say that OS "will" handle thermal interrupts or that it
>> "can"
>> handle thermal interrupts?
> 
> Can handle.
> 
>>   Couldn't a platform decide it wants to handle
>> them, regardless of what the OS may be capable of?
>>
> Yes. In that case platform can change the delivery mode bits to SMM in
> LVT.

And would still exhibit the BIOS bug?

>>>
>>> This needs to be done
>>> before SMM code path looks for _OSC capabilities. The bit 12 of
>>> _OSC in processor scope defines whether OS will handle thermal
>>> interrupts.
>>> When bit 12 is set to 1, OS will handle thermal interrupts.
>>> Refer to this document for details on _OSC
>>> http://www.intel.com/content/www/us/en/standards/processor-vendor-
>>> specific-acpi-specification.html
>> Where is bit 12 documented?
>>
> In the above document.

When I look at that document, I see bit 12 described as
"If set, OSPM supports native interrupt handling for Collaborative Processor
Performance Control notifications."  Is that the same thing or am
I looking at the wrong table?

-- ljk

> 
>>>
>>> This change introduces a new function
>>> acpi_early_processor_set_osc(),
>>> which walks acpi name space and finds acpi processor object and
>>> set capability via _OSC method to take over thermal LVT.
>> Does this change just affect Skylake platforms or all platforms?
> Any platform which has Intel ® Speed Shift Technology (aka HWP) feature
> present and enabled.
> 
> Thanks,
> Srinivas 
> 
>>
>> -- ljk
>>>
>>>
>>> Also this change writes HWP status bits to 0 to clear any HWP
>>> status
>>> bits in intel_thermal_interrupt().
>>>
>>> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel
>>> .com>
>>> ---
>>> v4:
>>> Suggestion by Boris for removing use of intermediate variable for
>>> clearing HWP status and using boot_cpu_has instead of
>>> static_cpu_has
>>>
>>> v3:
>>> - Added CONFIG_X86 around static_cpu_has() to fix compile error on
>>> ARCH=ia64, reported by kbuild test robot
>>> - return AE_CTRL_TERMINATE to terminate acpi name walk space, when
>>> _OSC
>>> is set already once.
>>> v2:
>>> Unnecessary newline was introduced, removed that in
>>> acpi_processor.c
>>>
>>>  arch/x86/kernel/cpu/mcheck/therm_throt.c |  3 ++
>>>  drivers/acpi/acpi_processor.c            | 47
>>> ++++++++++++++++++++++++++++++++
>>>  drivers/acpi/bus.c                       |  3 ++
>>>  drivers/acpi/internal.h                  |  2 ++
>>>  4 files changed, 55 insertions(+)
>>>
>>> diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c
>>> b/arch/x86/kernel/cpu/mcheck/therm_throt.c
>>> index 2c5aaf8..0553858 100644
>>> --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
>>> +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
>>> @@ -385,6 +385,9 @@ static void intel_thermal_interrupt(void)
>>>  {
>>>  	__u64 msr_val;
>>>  
>>> +	if (static_cpu_has(X86_FEATURE_HWP))
>>> +		wrmsrl_safe(MSR_HWP_STATUS, 0);
>>> +
>>>  	rdmsrl(MSR_IA32_THERM_STATUS, msr_val);
>>>  
>>>  	/* Check for violation of core thermal thresholds*/
>>> diff --git a/drivers/acpi/acpi_processor.c
>>> b/drivers/acpi/acpi_processor.c
>>> index 6979186..18da84f 100644
>>> --- a/drivers/acpi/acpi_processor.c
>>> +++ b/drivers/acpi/acpi_processor.c
>>> @@ -491,6 +491,53 @@ static void acpi_processor_remove(struct
>>> acpi_device *device)
>>>  }
>>>  #endif /* CONFIG_ACPI_HOTPLUG_CPU */
>>>  
>>> +#ifdef CONFIG_X86
>>> +static bool acpi_hwp_native_thermal_lvt_set;
>>> +static acpi_status acpi_set_hwp_native_thermal_lvt_osc(acpi_handle
>>> handle,
>>> +						       u32 lvl,
>>> void *context,
>>> +						       void **rv)
>>> +{
>>> +	u8 sb_uuid_str[] = "4077A616-290C-47BE-9EBD-D87058713953";
>>> +	u32 capbuf[2];
>>> +	struct acpi_osc_context osc_context = {
>>> +		.uuid_str = sb_uuid_str,
>>> +		.rev = 1,
>>> +		.cap.length = 8,
>>> +		.cap.pointer = capbuf,
>>> +	};
>>> +
>>> +	if (acpi_hwp_native_thermal_lvt_set)
>>> +		return AE_CTRL_TERMINATE;
>>> +
>>> +	capbuf[0] = 0x0000;
>>> +	capbuf[1] = 0x1000; /* set bit 12 */
>>> +
>>> +	if (ACPI_SUCCESS(acpi_run_osc(handle, &osc_context))) {
>>> +		acpi_hwp_native_thermal_lvt_set = true;
>>> +		kfree(osc_context.ret.pointer);
>>> +	}
>>> +
>>> +	return AE_OK;
>>> +}
>>> +
>>> +void acpi_early_processor_set_osc(void)
>>> +{
>>> +	if (boot_cpu_has(X86_FEATURE_HWP)) {
>>> +		acpi_walk_namespace(ACPI_TYPE_PROCESSOR,
>>> ACPI_ROOT_OBJECT,
>>> +				    ACPI_UINT32_MAX,
>>> +				    acpi_set_hwp_native_thermal_lv
>>> t_osc,
>>> +				    NULL, NULL, NULL);
>>> +		acpi_get_devices(ACPI_PROCESSOR_DEVICE_HID,
>>> +				 acpi_set_hwp_native_thermal_lvt_o
>>> sc,
>>> +				 NULL, NULL);
>>> +	}
>>> +}
>>> +#else
>>> +
>>> +void acpi_early_processor_set_osc(void) {}
>>> +
>>> +#endif
>>> +
>>>  /*
>>>   * The following ACPI IDs are known to be suitable for
>>> representing as
>>>   * processor devices.
>>> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
>>> index 891c42d..7e73aea 100644
>>> --- a/drivers/acpi/bus.c
>>> +++ b/drivers/acpi/bus.c
>>> @@ -1005,6 +1005,9 @@ static int __init acpi_bus_init(void)
>>>  		goto error1;
>>>  	}
>>>  
>>> +	/* Set capability bits for _OSC under processor scope */
>>> +	acpi_early_processor_set_osc();
>>> +
>>>  	/*
>>>  	 * _OSC method may exist in module level code,
>>>  	 * so it must be run after ACPI_FULL_INITIALIZATION
>>> diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
>>> index 1e6833a..5c787ac 100644
>>> --- a/drivers/acpi/internal.h
>>> +++ b/drivers/acpi/internal.h
>>> @@ -138,6 +138,8 @@ void acpi_early_processor_set_pdc(void);
>>>  static inline void acpi_early_processor_set_pdc(void) {}
>>>  #endif
>>>  
>>> +void acpi_early_processor_set_osc(void);
>>> +
>>>  /* -------------------------------------------------------------
>>> -------------
>>>                                    Embedded Controller
>>>     ---------------------------------------------------------------
>>> ----------- */
>>>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4] acpi: Issue _OSC call for native thermal interrupt handling
  2016-03-17 20:51     ` Linda Knippers
@ 2016-03-17 21:12       ` Srinivas Pandruvada
  2016-03-17 23:44         ` Linda Knippers
  0 siblings, 1 reply; 8+ messages in thread
From: Srinivas Pandruvada @ 2016-03-17 21:12 UTC (permalink / raw)
  To: Linda Knippers, rjw
  Cc: tony.luck, bp, tglx, mingo, hpa, lenb, linux-edac, linux-kernel

On Thu, 2016-03-17 at 16:51 -0400, Linda Knippers wrote:
> 
> On 3/17/2016 4:36 PM, Srinivas Pandruvada wrote:
> > 
> > On Thu, 2016-03-17 at 16:03 -0400, Linda Knippers wrote:
> > > 
> > > On 3/17/2016 2:24 PM, Srinivas Pandruvada wrote:
> > > > 
> > > > 
> > > > There are several reports of freeze on enabling HWP (Hardware
> > > > PStates)
> > > > feature on Skylake based systems by Intel P states driver. The
> > > > root
> > > > cause is identified as the HWP interrupts causing BIOS code to
> > > > freeze.
> > > > HWP interrupts uses thermal LVT.
> > > > Linux natively handles thermal interrupts, but in Skylake based
> > > > systems
> > > > SMM will take control of thermal interrupts. This is a problem
> > > > for
> > > > several
> > > > reasons:
> > > > - It is freezing in BIOS when tries to handle thermal
> > > > interrupt,
> > > > which
> > > > will require BIOS upgrade
> > > > - With SMM handling thermal we loose all the reporting features
> > > > of
> > > > Linux arch/x86/kernel/cpu/mcheck/therm_throt driver
> > > > - Some thermal drivers like x86-package-temp driver depends on
> > > > the
> > > > thermal
> > > > threshold interrupts
> > > > - The HWP interrupts are useful for debugging and tuning
> > > > performance
> > > > 
> > > > So we need native handling of thermal interrupts. 
> > > Is this working around a firmware bug?
> > Yes. But Linux always had capability to handle thermal interrupts
> > natively, so we should continue to have this in OS control, when
> > possible.
> > 
> > > 
> > > 
> > > > 
> > > > 
> > > > To inform SMM that
> > > > OS will handle thermal interrupts, we need to use _OSC under
> > > > processor
> > > > scope very early in ACPI initialization flow. 
> > > Does _OSC say that OS "will" handle thermal interrupts or that it
> > > "can"
> > > handle thermal interrupts?
> > Can handle.
> > 
> > > 
> > >   Couldn't a platform decide it wants to handle
> > > them, regardless of what the OS may be capable of?
> > > 
> > Yes. In that case platform can change the delivery mode bits to SMM
> > in
> > LVT.
> And would still exhibit the BIOS bug?
> 
Yes. But eventually BIOS bug will be fixed.

> > 
> > > 
> > > > 
> > > > 
> > > > This needs to be done
> > > > before SMM code path looks for _OSC capabilities. The bit 12 of
> > > > _OSC in processor scope defines whether OS will handle thermal
> > > > interrupts.
> > > > When bit 12 is set to 1, OS will handle thermal interrupts.
> > > > Refer to this document for details on _OSC
> > > > http://www.intel.com/content/www/us/en/standards/processor-vend
> > > > or-
> > > > specific-acpi-specification.html
> > > Where is bit 12 documented?
> > > 
> > In the above document.
> When I look at that document, I see bit 12 described as
> "If set, OSPM supports native interrupt handling for Collaborative
> Processor
> Performance Control notifications."  Is that the same thing or am
> I looking at the wrong table?
Yes. If you look at section 14.4 in Intel SDM, you will see that 
"HWP is an implementation of the ACPI-defined Collaborative Processor
Performance Control (CPPC)". Section 14.4.5 also specifies that HWP
uses IA32_THERM_STATUS to communicate if there are notifications, which
is notified via thermal interrupt.

You asked above if platform can handle these notification in SMM only.
If you do then the notification will arrive as ACPI notifications. We
don't have support for such notifications in Linux yet.

Thanks,
Srinivas

> 
> -- ljk
> 
> > 
> > 
> > > 
> > > > 
> > > > 
> > > > This change introduces a new function
> > > > acpi_early_processor_set_osc(),
> > > > which walks acpi name space and finds acpi processor object and
> > > > set capability via _OSC method to take over thermal LVT.
> > > Does this change just affect Skylake platforms or all platforms?
> > Any platform which has Intel ® Speed Shift Technology (aka HWP)
> > feature
> > present and enabled.
> > 
> > Thanks,
> > Srinivas 
> > 
> > > 
> > > 
> > > -- ljk
> > > > 
> > > > 
> > > > 
> > > > Also this change writes HWP status bits to 0 to clear any HWP
> > > > status
> > > > bits in intel_thermal_interrupt().
> > > > 
> > > > Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.i
> > > > ntel
> > > > .com>
> > > > ---
> > > > v4:
> > > > Suggestion by Boris for removing use of intermediate variable
> > > > for
> > > > clearing HWP status and using boot_cpu_has instead of
> > > > static_cpu_has
> > > > 
> > > > v3:
> > > > - Added CONFIG_X86 around static_cpu_has() to fix compile error
> > > > on
> > > > ARCH=ia64, reported by kbuild test robot
> > > > - return AE_CTRL_TERMINATE to terminate acpi name walk space,
> > > > when
> > > > _OSC
> > > > is set already once.
> > > > v2:
> > > > Unnecessary newline was introduced, removed that in
> > > > acpi_processor.c
> > > > 
> > > >  arch/x86/kernel/cpu/mcheck/therm_throt.c |  3 ++
> > > >  drivers/acpi/acpi_processor.c            | 47
> > > > ++++++++++++++++++++++++++++++++
> > > >  drivers/acpi/bus.c                       |  3 ++
> > > >  drivers/acpi/internal.h                  |  2 ++
> > > >  4 files changed, 55 insertions(+)
> > > > 
> > > > diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > > > b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > > > index 2c5aaf8..0553858 100644
> > > > --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > > > +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > > > @@ -385,6 +385,9 @@ static void intel_thermal_interrupt(void)
> > > >  {
> > > >  	__u64 msr_val;
> > > >  
> > > > +	if (static_cpu_has(X86_FEATURE_HWP))
> > > > +		wrmsrl_safe(MSR_HWP_STATUS, 0);
> > > > +
> > > >  	rdmsrl(MSR_IA32_THERM_STATUS, msr_val);
> > > >  
> > > >  	/* Check for violation of core thermal thresholds*/
> > > > diff --git a/drivers/acpi/acpi_processor.c
> > > > b/drivers/acpi/acpi_processor.c
> > > > index 6979186..18da84f 100644
> > > > --- a/drivers/acpi/acpi_processor.c
> > > > +++ b/drivers/acpi/acpi_processor.c
> > > > @@ -491,6 +491,53 @@ static void acpi_processor_remove(struct
> > > > acpi_device *device)
> > > >  }
> > > >  #endif /* CONFIG_ACPI_HOTPLUG_CPU */
> > > >  
> > > > +#ifdef CONFIG_X86
> > > > +static bool acpi_hwp_native_thermal_lvt_set;
> > > > +static acpi_status
> > > > acpi_set_hwp_native_thermal_lvt_osc(acpi_handle
> > > > handle,
> > > > +						       u32
> > > > lvl,
> > > > void *context,
> > > > +						       void
> > > > **rv)
> > > > +{
> > > > +	u8 sb_uuid_str[] = "4077A616-290C-47BE-9EBD-
> > > > D87058713953";
> > > > +	u32 capbuf[2];
> > > > +	struct acpi_osc_context osc_context = {
> > > > +		.uuid_str = sb_uuid_str,
> > > > +		.rev = 1,
> > > > +		.cap.length = 8,
> > > > +		.cap.pointer = capbuf,
> > > > +	};
> > > > +
> > > > +	if (acpi_hwp_native_thermal_lvt_set)
> > > > +		return AE_CTRL_TERMINATE;
> > > > +
> > > > +	capbuf[0] = 0x0000;
> > > > +	capbuf[1] = 0x1000; /* set bit 12 */
> > > > +
> > > > +	if (ACPI_SUCCESS(acpi_run_osc(handle, &osc_context)))
> > > > {
> > > > +		acpi_hwp_native_thermal_lvt_set = true;
> > > > +		kfree(osc_context.ret.pointer);
> > > > +	}
> > > > +
> > > > +	return AE_OK;
> > > > +}
> > > > +
> > > > +void acpi_early_processor_set_osc(void)
> > > > +{
> > > > +	if (boot_cpu_has(X86_FEATURE_HWP)) {
> > > > +		acpi_walk_namespace(ACPI_TYPE_PROCESSOR,
> > > > ACPI_ROOT_OBJECT,
> > > > +				    ACPI_UINT32_MAX,
> > > > +				    acpi_set_hwp_native_therma
> > > > l_lv
> > > > t_osc,
> > > > +				    NULL, NULL, NULL);
> > > > +		acpi_get_devices(ACPI_PROCESSOR_DEVICE_HID,
> > > > +				 acpi_set_hwp_native_thermal_l
> > > > vt_o
> > > > sc,
> > > > +				 NULL, NULL);
> > > > +	}
> > > > +}
> > > > +#else
> > > > +
> > > > +void acpi_early_processor_set_osc(void) {}
> > > > +
> > > > +#endif
> > > > +
> > > >  /*
> > > >   * The following ACPI IDs are known to be suitable for
> > > > representing as
> > > >   * processor devices.
> > > > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> > > > index 891c42d..7e73aea 100644
> > > > --- a/drivers/acpi/bus.c
> > > > +++ b/drivers/acpi/bus.c
> > > > @@ -1005,6 +1005,9 @@ static int __init acpi_bus_init(void)
> > > >  		goto error1;
> > > >  	}
> > > >  
> > > > +	/* Set capability bits for _OSC under processor scope
> > > > */
> > > > +	acpi_early_processor_set_osc();
> > > > +
> > > >  	/*
> > > >  	 * _OSC method may exist in module level code,
> > > >  	 * so it must be run after ACPI_FULL_INITIALIZATION
> > > > diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
> > > > index 1e6833a..5c787ac 100644
> > > > --- a/drivers/acpi/internal.h
> > > > +++ b/drivers/acpi/internal.h
> > > > @@ -138,6 +138,8 @@ void acpi_early_processor_set_pdc(void);
> > > >  static inline void acpi_early_processor_set_pdc(void) {}
> > > >  #endif
> > > >  
> > > > +void acpi_early_processor_set_osc(void);
> > > > +
> > > >  /* ---------------------------------------------------------
> > > > ----
> > > > -------------
> > > >                                    Embedded Controller
> > > >     -----------------------------------------------------------
> > > > ----
> > > > ----------- */
> > > > 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4] acpi: Issue _OSC call for native thermal interrupt handling
  2016-03-17 21:12       ` Srinivas Pandruvada
@ 2016-03-17 23:44         ` Linda Knippers
  2016-03-18  0:17           ` Rafael J. Wysocki
  0 siblings, 1 reply; 8+ messages in thread
From: Linda Knippers @ 2016-03-17 23:44 UTC (permalink / raw)
  To: Srinivas Pandruvada, rjw
  Cc: tony.luck, bp, tglx, mingo, hpa, lenb, linux-edac, linux-kernel



On 3/17/2016 5:12 PM, Srinivas Pandruvada wrote:
<snip>
>>>>> This needs to be done
>>>>> before SMM code path looks for _OSC capabilities. The bit 12 of
>>>>> _OSC in processor scope defines whether OS will handle thermal
>>>>> interrupts.
>>>>> When bit 12 is set to 1, OS will handle thermal interrupts.
>>>>> Refer to this document for details on _OSC
>>>>> http://www.intel.com/content/www/us/en/standards/processor-vend
>>>>> or-
>>>>> specific-acpi-specification.html
>>>> Where is bit 12 documented?
>>>>
>>> In the above document.
>> When I look at that document, I see bit 12 described as
>> "If set, OSPM supports native interrupt handling for Collaborative
>> Processor
>> Performance Control notifications."  Is that the same thing or am
>> I looking at the wrong table?
> Yes. If you look at section 14.4 in Intel SDM, you will see that 
> "HWP is an implementation of the ACPI-defined Collaborative Processor
> Performance Control (CPPC)". Section 14.4.5 also specifies that HWP
> uses IA32_THERM_STATUS to communicate if there are notifications, which
> is notified via thermal interrupt.

Ok, thanks. That wasn't clear from the commit message.  It
sounded like bit 12 directly indicated that the OS will handle
thermal interrupts but it's a bit more indirect than that.

> You asked above if platform can handle these notification in SMM only.
> If you do then the notification will arrive as ACPI notifications. We
> don't have support for such notifications in Linux yet.

What I meant to ask was if the platform can disregard the _OSC information
and handle thermal events on it's own, without OS involvement.
For example, servers typically don't want to rely on the OS to manage
thermal issues.

<snip>

>>>>> This change introduces a new function
>>>>> acpi_early_processor_set_osc(),
>>>>> which walks acpi name space and finds acpi processor object and
>>>>> set capability via _OSC method to take over thermal LVT.
>>>> Does this change just affect Skylake platforms or all platforms?
>>> Any platform which has Intel ® Speed Shift Technology (aka HWP)
>>> feature present and enabled.

Could this be an unexpected change in behavior for platforms
with HWP that don't have this bug, assuming they would look at
the _OSC CPPP bit?  That's actually my main concern here.

-- ljk

>>>
>>> Thanks,
>>> Srinivas 
>>>
>>>>
>>>>
>>>> -- ljk
>>>>>
>>>>>
>>>>>
>>>>> Also this change writes HWP status bits to 0 to clear any HWP
>>>>> status
>>>>> bits in intel_thermal_interrupt().
>>>>>
>>>>> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.i
>>>>> ntel
>>>>> .com>
>>>>> ---
>>>>> v4:
>>>>> Suggestion by Boris for removing use of intermediate variable
>>>>> for
>>>>> clearing HWP status and using boot_cpu_has instead of
>>>>> static_cpu_has
>>>>>
>>>>> v3:
>>>>> - Added CONFIG_X86 around static_cpu_has() to fix compile error
>>>>> on
>>>>> ARCH=ia64, reported by kbuild test robot
>>>>> - return AE_CTRL_TERMINATE to terminate acpi name walk space,
>>>>> when
>>>>> _OSC
>>>>> is set already once.
>>>>> v2:
>>>>> Unnecessary newline was introduced, removed that in
>>>>> acpi_processor.c
>>>>>
>>>>>  arch/x86/kernel/cpu/mcheck/therm_throt.c |  3 ++
>>>>>  drivers/acpi/acpi_processor.c            | 47
>>>>> ++++++++++++++++++++++++++++++++
>>>>>  drivers/acpi/bus.c                       |  3 ++
>>>>>  drivers/acpi/internal.h                  |  2 ++
>>>>>  4 files changed, 55 insertions(+)
>>>>>
>>>>> diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c
>>>>> b/arch/x86/kernel/cpu/mcheck/therm_throt.c
>>>>> index 2c5aaf8..0553858 100644
>>>>> --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
>>>>> +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
>>>>> @@ -385,6 +385,9 @@ static void intel_thermal_interrupt(void)
>>>>>  {
>>>>>  	__u64 msr_val;
>>>>>  
>>>>> +	if (static_cpu_has(X86_FEATURE_HWP))
>>>>> +		wrmsrl_safe(MSR_HWP_STATUS, 0);
>>>>> +
>>>>>  	rdmsrl(MSR_IA32_THERM_STATUS, msr_val);
>>>>>  
>>>>>  	/* Check for violation of core thermal thresholds*/
>>>>> diff --git a/drivers/acpi/acpi_processor.c
>>>>> b/drivers/acpi/acpi_processor.c
>>>>> index 6979186..18da84f 100644
>>>>> --- a/drivers/acpi/acpi_processor.c
>>>>> +++ b/drivers/acpi/acpi_processor.c
>>>>> @@ -491,6 +491,53 @@ static void acpi_processor_remove(struct
>>>>> acpi_device *device)
>>>>>  }
>>>>>  #endif /* CONFIG_ACPI_HOTPLUG_CPU */
>>>>>  
>>>>> +#ifdef CONFIG_X86
>>>>> +static bool acpi_hwp_native_thermal_lvt_set;
>>>>> +static acpi_status
>>>>> acpi_set_hwp_native_thermal_lvt_osc(acpi_handle
>>>>> handle,
>>>>> +						       u32
>>>>> lvl,
>>>>> void *context,
>>>>> +						       void
>>>>> **rv)
>>>>> +{
>>>>> +	u8 sb_uuid_str[] = "4077A616-290C-47BE-9EBD-
>>>>> D87058713953";
>>>>> +	u32 capbuf[2];
>>>>> +	struct acpi_osc_context osc_context = {
>>>>> +		.uuid_str = sb_uuid_str,
>>>>> +		.rev = 1,
>>>>> +		.cap.length = 8,
>>>>> +		.cap.pointer = capbuf,
>>>>> +	};
>>>>> +
>>>>> +	if (acpi_hwp_native_thermal_lvt_set)
>>>>> +		return AE_CTRL_TERMINATE;
>>>>> +
>>>>> +	capbuf[0] = 0x0000;
>>>>> +	capbuf[1] = 0x1000; /* set bit 12 */
>>>>> +
>>>>> +	if (ACPI_SUCCESS(acpi_run_osc(handle, &osc_context)))
>>>>> {
>>>>> +		acpi_hwp_native_thermal_lvt_set = true;
>>>>> +		kfree(osc_context.ret.pointer);
>>>>> +	}
>>>>> +
>>>>> +	return AE_OK;
>>>>> +}
>>>>> +
>>>>> +void acpi_early_processor_set_osc(void)
>>>>> +{
>>>>> +	if (boot_cpu_has(X86_FEATURE_HWP)) {
>>>>> +		acpi_walk_namespace(ACPI_TYPE_PROCESSOR,
>>>>> ACPI_ROOT_OBJECT,
>>>>> +				    ACPI_UINT32_MAX,
>>>>> +				    acpi_set_hwp_native_therma
>>>>> l_lv
>>>>> t_osc,
>>>>> +				    NULL, NULL, NULL);
>>>>> +		acpi_get_devices(ACPI_PROCESSOR_DEVICE_HID,
>>>>> +				 acpi_set_hwp_native_thermal_l
>>>>> vt_o
>>>>> sc,
>>>>> +				 NULL, NULL);
>>>>> +	}
>>>>> +}
>>>>> +#else
>>>>> +
>>>>> +void acpi_early_processor_set_osc(void) {}
>>>>> +
>>>>> +#endif
>>>>> +
>>>>>  /*
>>>>>   * The following ACPI IDs are known to be suitable for
>>>>> representing as
>>>>>   * processor devices.
>>>>> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
>>>>> index 891c42d..7e73aea 100644
>>>>> --- a/drivers/acpi/bus.c
>>>>> +++ b/drivers/acpi/bus.c
>>>>> @@ -1005,6 +1005,9 @@ static int __init acpi_bus_init(void)
>>>>>  		goto error1;
>>>>>  	}
>>>>>  
>>>>> +	/* Set capability bits for _OSC under processor scope
>>>>> */
>>>>> +	acpi_early_processor_set_osc();
>>>>> +
>>>>>  	/*
>>>>>  	 * _OSC method may exist in module level code,
>>>>>  	 * so it must be run after ACPI_FULL_INITIALIZATION
>>>>> diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
>>>>> index 1e6833a..5c787ac 100644
>>>>> --- a/drivers/acpi/internal.h
>>>>> +++ b/drivers/acpi/internal.h
>>>>> @@ -138,6 +138,8 @@ void acpi_early_processor_set_pdc(void);
>>>>>  static inline void acpi_early_processor_set_pdc(void) {}
>>>>>  #endif
>>>>>  
>>>>> +void acpi_early_processor_set_osc(void);
>>>>> +
>>>>>  /* ---------------------------------------------------------
>>>>> ----
>>>>> -------------
>>>>>                                    Embedded Controller
>>>>>     -----------------------------------------------------------
>>>>> ----
>>>>> ----------- */
>>>>>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4] acpi: Issue _OSC call for native thermal interrupt handling
  2016-03-17 23:44         ` Linda Knippers
@ 2016-03-18  0:17           ` Rafael J. Wysocki
  2016-03-18 15:00             ` Linda Knippers
  0 siblings, 1 reply; 8+ messages in thread
From: Rafael J. Wysocki @ 2016-03-18  0:17 UTC (permalink / raw)
  To: Linda Knippers
  Cc: Srinivas Pandruvada, tony.luck, bp, tglx, mingo, hpa, lenb,
	linux-edac, linux-kernel

On Thursday, March 17, 2016 07:44:47 PM Linda Knippers wrote:
> 
> On 3/17/2016 5:12 PM, Srinivas Pandruvada wrote:
> <snip>
> >>>>> This needs to be done
> >>>>> before SMM code path looks for _OSC capabilities. The bit 12 of
> >>>>> _OSC in processor scope defines whether OS will handle thermal
> >>>>> interrupts.
> >>>>> When bit 12 is set to 1, OS will handle thermal interrupts.
> >>>>> Refer to this document for details on _OSC
> >>>>> http://www.intel.com/content/www/us/en/standards/processor-vend
> >>>>> or-
> >>>>> specific-acpi-specification.html
> >>>> Where is bit 12 documented?
> >>>>
> >>> In the above document.
> >> When I look at that document, I see bit 12 described as
> >> "If set, OSPM supports native interrupt handling for Collaborative
> >> Processor
> >> Performance Control notifications."  Is that the same thing or am
> >> I looking at the wrong table?
> > Yes. If you look at section 14.4 in Intel SDM, you will see that 
> > "HWP is an implementation of the ACPI-defined Collaborative Processor
> > Performance Control (CPPC)". Section 14.4.5 also specifies that HWP
> > uses IA32_THERM_STATUS to communicate if there are notifications, which
> > is notified via thermal interrupt.
> 
> Ok, thanks. That wasn't clear from the commit message.  It
> sounded like bit 12 directly indicated that the OS will handle
> thermal interrupts but it's a bit more indirect than that.
> 
> > You asked above if platform can handle these notification in SMM only.
> > If you do then the notification will arrive as ACPI notifications. We
> > don't have support for such notifications in Linux yet.
> 
> What I meant to ask was if the platform can disregard the _OSC information
> and handle thermal events on it's own, without OS involvement.
> For example, servers typically don't want to rely on the OS to manage
> thermal issues.
> 
> <snip>
> 
> >>>>> This change introduces a new function
> >>>>> acpi_early_processor_set_osc(),
> >>>>> which walks acpi name space and finds acpi processor object and
> >>>>> set capability via _OSC method to take over thermal LVT.
> >>>> Does this change just affect Skylake platforms or all platforms?
> >>> Any platform which has Intel ® Speed Shift Technology (aka HWP)
> >>> feature present and enabled.
> 
> Could this be an unexpected change in behavior for platforms
> with HWP that don't have this bug, assuming they would look at
> the _OSC CPPP bit?  That's actually my main concern here.

Do you have any specific platforms in mind or just in general?

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4] acpi: Issue _OSC call for native thermal interrupt handling
  2016-03-18  0:17           ` Rafael J. Wysocki
@ 2016-03-18 15:00             ` Linda Knippers
  0 siblings, 0 replies; 8+ messages in thread
From: Linda Knippers @ 2016-03-18 15:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Srinivas Pandruvada, tony.luck, bp, tglx, mingo, hpa, lenb,
	linux-edac, linux-kernel

On 3/17/2016 8:17 PM, Rafael J. Wysocki wrote:
<snip>

>>>>>>> This change introduces a new function
>>>>>>> acpi_early_processor_set_osc(),
>>>>>>> which walks acpi name space and finds acpi processor object and
>>>>>>> set capability via _OSC method to take over thermal LVT.
>>>>>> Does this change just affect Skylake platforms or all platforms?
>>>>> Any platform which has Intel ® Speed Shift Technology (aka HWP)
>>>>> feature present and enabled.
>>
>> Could this be an unexpected change in behavior for platforms
>> with HWP that don't have this bug, assuming they would look at
>> the _OSC CPPP bit?  That's actually my main concern here.
> 
> Do you have any specific platforms in mind or just in general?

It's just general right now but as more HWP server platforms come out,
it may become specific.

-- ljk
> 
> Thanks,
> Rafael
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-03-18 15:00 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-17 18:24 [PATCH v4] acpi: Issue _OSC call for native thermal interrupt handling Srinivas Pandruvada
2016-03-17 20:03 ` Linda Knippers
2016-03-17 20:36   ` Srinivas Pandruvada
2016-03-17 20:51     ` Linda Knippers
2016-03-17 21:12       ` Srinivas Pandruvada
2016-03-17 23:44         ` Linda Knippers
2016-03-18  0:17           ` Rafael J. Wysocki
2016-03-18 15:00             ` Linda Knippers

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.