linux-kernel.vger.kernel.org archive mirror
* ACPI OSI disaster on latest HP laptops - critical temperature shutdowns
@ 2008-07-24 15:27 Thomas Renninger
  2008-07-24 15:42 ` Arjan van de Ven
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Thomas Renninger @ 2008-07-24 15:27 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: linux-acpi, Moore, Robert, Linux Kernel Mailing List, Andi Kleen,
	Len Brown, Christian Kornacker

I found this BIOS bug a few days ago.
The upside is that it nicely demonstrates the need for
some things I came up with recently
(points 1 and 2; points 3 and 4 are further suggestions):

1) Do not be transparent to Windows in the ACPI _OSI parts
   -> and, as a long-term goal, do not pretend to be Windows

2) Document _OSI BIOS developer usage in
   Documentation/acpi/known_bios_osi_workarounds

3) Linuxfirmwarekit needs kernel support

4) ACPI AML functionality to report errors to the OS


The problem:

HP makes extensive use of ACPI thermal zones.
It seems they hit a bug in Vista which probably caused their
machines to be shut down through a critical temperature event.
They now work around that Vista bug by returning zero for _CRT
(the critical temperature, in tenths of a Kelvin).
So they return -273 degrees Celsius, which leads to a critical
temperature shutdown as soon as the ACPI thermal driver is loaded.
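
As a worked example of the arithmetic, here is a minimal Python sketch
(not kernel code; the names are made up for illustration). ACPI trip
points are expressed in tenths of a Kelvin, and 2732 corresponds to
0 degrees Celsius:

```python
ZERO_CELSIUS_DECIKELVIN = 2732  # 0 C in tenths of a Kelvin


def decikelvin_to_celsius(value):
    """Convert an ACPI temperature (tenths of a Kelvin) to degrees Celsius."""
    return (value - ZERO_CELSIUS_DECIKELVIN) / 10.0


# _CRT returning zero means absolute zero, below any real sensor reading.
print(decikelvin_to_celsius(0))      # -273.2
# A sane 105 C critical trip point would be reported as 3782.
print(decikelvin_to_celsius(3782))   # 105.0
```

Any real temperature reading is above a threshold of zero, so the
critical trip point fires immediately.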

In short, this is the corresponding ACPI BIOS code:

# The BIOS checks which OS is running (most parts cut off).
# Linux returns true for all of these strings except
# "Windows 2006 SP1" (Vista SP1) and "Linux".
                ...
                If (_OSI ("Windows 2001 SP3"))
                {
                    Store (0x12, OSTB)
                    Store (0x12, TPOS)
                }
                If (_OSI ("Windows 2006"))
                {
                    Store (0x40, OSTB)
                    Store (0x40, TPOS)
                }

                If (_OSI ("Windows 2006 SP1"))
                {
                    Store (0x41, OSTB)
                    Store (0x40, TPOS)
                }

                If (_OSI ("Linux"))
                {
                    Store (One, LINX)
                    Store (0x80, OSTB)
                    Store (0x80, TPOS)
                }


# Valid critical/hot temperature: 105 (0x69)
        Name (TPC, 0x69)
        ...
           Method (_HOT, 0, Serialized)
            {
# Match for Vista only, not for Vista SP1 !
!!!                If (LEqual (TPOS, 0x40))
                {
                    Return (Add (0x0AAC, Multiply (TPC, 0x0A)))
                }
                Else
                {
                    Return (Zero)
                }
            }
            Method (_CRT, 0, Serialized)
            {
# Returns valid values for all Windows versions before Vista
!!!                If (LLess (TPOS, 0x40))
                {
# This is the valid one: 105 C -> (105 * 10) + 2732 (Kelvin * 10)
                    Return (Add (0x0AAC, Multiply (TPC, 0x0A)))
                }
                Else
                {
# This is returned on Windows Vista
                    Return (Zero)
                }
            }
----------------------
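
The effect of the fragment above can be traced with a small Python
model (a sketch, not an AML interpreter). It assumes that TPOS starts
at zero, which the excerpt does not show, and that the OS answers _OSI
exactly as described in the comments above. The helper names are made
up for illustration.

```python
# _OSI strings the fragment queries, in order, and the TPOS value stored.
OSI_TO_TPOS = [
    ("Windows 2001 SP3", 0x12),
    ("Windows 2006", 0x40),
    ("Windows 2006 SP1", 0x40),  # OSTB becomes 0x41, TPOS stays 0x40
    ("Linux", 0x80),
]

TPC = 0x69                         # 105 C, from Name (TPC, 0x69)
VALID_TRIP = 0x0AAC + TPC * 0x0A   # 2732 + 1050 = 3782 tenths of a Kelvin


def tpos_for(supported):
    """Return TPOS after the _OSI checks, given the strings the OS accepts."""
    tpos = 0  # assumption: the initial value is not part of the excerpt
    for name, value in OSI_TO_TPOS:
        if name in supported:
            tpos = value
    return tpos


def hot(tpos):
    """Model of _HOT: valid only when TPOS is exactly 0x40."""
    return VALID_TRIP if tpos == 0x40 else 0


def crt(tpos):
    """Model of _CRT: valid only when TPOS is below 0x40."""
    return VALID_TRIP if tpos < 0x40 else 0


linux = {"Windows 2001 SP3", "Windows 2006"}
tpos = tpos_for(linux)
print(hex(tpos), hot(tpos), crt(tpos))   # 0x40 3782 0
```

With the answers Linux gives, _HOT is valid and _CRT is zero. An OS
that only matches "Windows 2001 SP3" gets the opposite result.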

This is the fix for this from Arjan:

    ACPI: Reject below-freezing temperatures as invalid critical temperatures

    My laptop thinks that it's a good idea to give -73C as the critical
    CPU temperature.... which isn't the best thing since it causes a shutdown
    right at bootup.

    Temperatures below freezing are clearly invalid critical thresholds
    so just reject these as such.

commit a39a2d7c72b358c6253a2ec28e17b023b7f6f41c
@@ -364,10 +364,17 @@ static int acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
        if (flag & ACPI_TRIPS_CRITICAL) {
                status = acpi_evaluate_integer(tz->device->handle,
                                "_CRT", NULL, &tz->trips.critical.temperature);
-               if (ACPI_FAILURE(status)) {
+               /*
+                * Treat freezing temperatures as invalid as well; some
+                * BIOSes return really low values and cause reboots at startup.
+                * Below zero (Celcius) values clearly aren't right for sure..
+                * ... so lets discard those as invalid.
+                */
+               if (ACPI_FAILURE(status) ||
+                               tz->trips.critical.temperature <= 2732) {
                        tz->trips.critical.flags.valid = 0;
                        ACPI_EXCEPTION((AE_INFO, status,
-                                       "No critical threshold"));
+                                       "No or invalid critical threshold"));
                        return -ENODEV;
                } else {
                        tz->trips.critical.flags.valid = 1;


----------------------
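
The condition added by the patch boils down to the following check,
shown here as a Python sketch rather than kernel C (the function name
is made up; the constant 2732 is taken from the patch):

```python
FREEZING_DECIKELVIN = 2732  # 0 C in tenths of a Kelvin, as in the patch


def critical_trip_valid(evaluated_ok, temperature):
    """Return True if _CRT evaluated successfully and is above freezing."""
    return evaluated_ok and temperature > FREEZING_DECIKELVIN


print(critical_trip_valid(True, 0))      # False: the zero described above
print(critical_trip_valid(True, 2732))   # False: exactly 0 C is rejected too
print(critical_trip_valid(True, 3782))   # True: 105 C
```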


What are the consequences of:
  1) The fact that BIOS vendors have to fix Windows bugs/errata through
     ACPI _OSI hooks (this is nearly the only way BIOS vendors use the
     _OSI interface)
  2) The current Linux _OSI implementation being transparent to Windows
  3) The invalid critical temperature simply being ignored and the trip
     point not being shown to userspace


1) One must assume that HP has to roll out such a Vista-only or
   Vista-SP1-only bug workaround to all of their BIOSes, thus breaking
   all ACPI-aware Linux kernels.

2) Vendors who want to provide Linux and Windows support
   have to provide a separate BIOS or patch the Linux kernel so that they
   do not need to run Windows errata workarounds through _OSI hooks.

3) This Vista bug can be worked around by checking for zero.
   Other cases could get more complex.
   In the long term, Linux cannot implement all the bugs of all
   Windows versions.

4) HP certifies (at least some of) their laptops to work with distributions.
   The above patch absorbs the BIOS bug, making it impossible for the current
   Linuxfirmwarekit implementation to detect it.
   The BIOS update above could have been rejected by certification -> this
   needs a kernel facility to report BIOS bugs. Or at least the certified
   distribution could have been patched along with this BIOS update/
   breakage.

5) It is just a matter of time until Windows-version-specific ACPI bugs are
   worked around in BIOSes in the server area as well.

Therefore some suggestions (from above):

1) As a long-term goal, Linux should not be transparent to Windows.
   Nearly all _OSI conditions in which ACPI code checks which OS is
   running implement Windows bug workarounds. Vendors are not able
   to fix the Windows implementation, therefore they have to do it in
   the BIOS. While the next Windows generation might have fixed the cause,
   Linux tries to implement (be compatible with) all Windows bugs.

2) Document Windows bugs worked around via _OSI in
   Documentation/acpi/known_osi_hooks

3) Document Linux _OSI behavior. No ACPI BIOS developer is aware that
   Linux violates the spec. All recent ACPI BIOSes check for "Linux"
   as the running OS, but Linux does not return true for the call.
   I have started to document the current _OSI behavior on Linux. I then
   realized it might be a good idea to extend it a bit and talk about
   general ACPI BIOS problems on Linux. It's here: ftp://ftp.suse.com/pub/people/trenn/ACPI_BIOS_on_Linux_guide/acpi_guideline_for_vendors.pdf
   Comments on enhancements, additions, etc. are appreciated.
   I'll announce that separately.

4) Provide a facility to tell userspace about BIOS bugs.
   This is the
   FIRMWARE_BUG(severity, "Message");
   interface idea I mentioned recently in an unrelated thread.
   The idea is something similar to printk, but one that can be used
   intensively on each possibly bogus value returned from the BIOS (also
   for documentation) and that can be compiled out so as not to waste
   much memory on production kernels.
   At the end is a patch that extends Arjan's patch by also checking the
   return values for hot (already an issue with HP BIOSes), passive and
   active trip points. When the BIOS returns a wrong value, we want to
   inform userspace that something in the BIOS is bogus, so that HW
   vendors who care about Linux see that something could go wrong.

5) Something ACPI-specific; maybe Intel is able to push this into the
   ACPI specification in the (very) long term:
     ACPI BIOS developers cannot report error conditions.
     Therefore you often end up with invalid values, as they have to return
     a value if a function is provided, even if
     they know it does not make any sense at all.
     Ideas:
         1) Provide an error object similar to the debug object.
            -> Just to have something in the logs
         2) Add error values to each ACPI function or to sets of them
            -> cumbersome
         3) Introduce a return_error statement which can be used instead of
            return. If it is used, the kernel must ignore the value
            of the function.
            -> would help a lot; similar functionality to 2), but easier


       Thomas


This patch also covers hot, passive and active trip points: if
zero (or any other value at or below freezing) is returned as the
temperature, the trip point is invalidated.
Hopefully this can be reported as a firmware bug soon.

diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
index 84c795f..f6344f6 100644
--- a/drivers/acpi/thermal.c
+++ b/drivers/acpi/thermal.c
@@ -400,7 +400,8 @@ static int acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
 	if (flag & ACPI_TRIPS_HOT) {
 		status = acpi_evaluate_integer(tz->device->handle,
 				"_HOT", NULL, &tz->trips.hot.temperature);
-		if (ACPI_FAILURE(status)) {
+		if (ACPI_FAILURE(status) ||
+				tz->trips.hot.temperature <= 2732) {
 			tz->trips.hot.flags.valid = 0;
 			ACPI_DEBUG_PRINT((ACPI_DB_INFO,
 					"No hot threshold\n"));
@@ -425,7 +426,8 @@ static int acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
 				"_PSV", NULL, &tz->trips.passive.temperature);
 		}
 
-		if (ACPI_FAILURE(status))
+		if (ACPI_FAILURE(status) ||
+				tz->trips.passive.temperature <= 2732)
 			tz->trips.passive.flags.valid = 0;
 		else {
 			tz->trips.passive.flags.valid = 1;
@@ -480,7 +482,8 @@ static int acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
 		if (flag & ACPI_TRIPS_ACTIVE) {
 			status = acpi_evaluate_integer(tz->device->handle,
 				name, NULL, &tz->trips.active[i].temperature);
-			if (ACPI_FAILURE(status)) {
+			if (ACPI_FAILURE(status) ||
+			    tz->trips.active[i].temperature <= 2732) {
 				tz->trips.active[i].flags.valid = 0;
 				if (i == 0)
 					break;

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdowns
  2008-07-24 15:27 ACPI OSI disaster on latest HP laptops - critical temperature shutdowns Thomas Renninger
@ 2008-07-24 15:42 ` Arjan van de Ven
  2008-07-25  0:04 ` ACPI OSI disaster on latest HP laptops - critical temperature shutdown Len Brown
       [not found] ` <alpine.LFD.1.10.0807261434380.2958@localhost.localdomain>
  2 siblings, 0 replies; 25+ messages in thread
From: Arjan van de Ven @ 2008-07-24 15:42 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: linux-acpi, Moore, Robert, Linux Kernel Mailing List, Andi Kleen,
	Len Brown, Christian Kornacker

Thomas Renninger wrote:
> This is the fix for this from Arjan:
> 
>     ACPI: Reject below-freezing temperatures as invalid critical temperatures
> 
>     My laptop thinks that it's a good idea to give -73C as the critical
>     CPU temperature.... which isn't the best thing since it causes a shutdown
>     right at bootup.
> 
>     Temperatures below freezing are clearly invalid critical thresholds
>     so just reject these as such.
> 

btw on my laptop, it wasn't 0 that was returned, but 2007.
This is suspected to be related to how Windows finds some random other ACPI state to return,
but it was clearly an AML issue on the BIOS side.
The effect was just a trainwreck, so I added a check to the kernel (in addition to getting
full info to Robert for the ACPICA side of the issue).


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdown
  2008-07-24 15:27 ACPI OSI disaster on latest HP laptops - critical temperature shutdowns Thomas Renninger
  2008-07-24 15:42 ` Arjan van de Ven
@ 2008-07-25  0:04 ` Len Brown
  2008-07-25 10:44   ` Andi Kleen
                     ` (2 more replies)
       [not found] ` <alpine.LFD.1.10.0807261434380.2958@localhost.localdomain>
  2 siblings, 3 replies; 25+ messages in thread
From: Len Brown @ 2008-07-25  0:04 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Arjan van de Ven, linux-acpi, Moore, Robert,
	Linux Kernel Mailing List, Andi Kleen, Christian Kornacker

Thomas,

re: OSI(Windows...)

Linux will continue to claim OSI compatibility with Windows
until the day when the majority of Linux systems
have passed a Linux compatibility test rather than
a Windows compatibility test.

Re: OSI(Linux)

I've looked at O(100) DSDT's that look at OSI(Linux),
and all but several systems from two vendors do it by mistake.
They simply copied it from the bugged Intel reference code.

OSI(Linux) will _never_ be restored to Linux, ever.

re: the HP BIOS bug at hand.

Linux deletes the entire thermal zone when we see this.
(arguably, we could have just disabled the CRT
and kept the rest of the thermal zone).
If HP cared about testing Linux on this laptop
and had tools such that they could actually
test Linux compatibility, it would be pretty clear
from user-space that their thermal zone was missing.

thanks,
-Len




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdown
  2008-07-25  0:04 ` ACPI OSI disaster on latest HP laptops - critical temperature shutdown Len Brown
@ 2008-07-25 10:44   ` Andi Kleen
  2008-07-25 11:19   ` Thomas Renninger
  2008-07-25 22:10   ` Eric Piel
  2 siblings, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2008-07-25 10:44 UTC (permalink / raw)
  To: Len Brown
  Cc: Thomas Renninger, Arjan van de Ven, linux-acpi, Moore, Robert,
	Linux Kernel Mailing List, Christian Kornacker

Len Brown wrote:
> Thomas,
> 
> re: OSI(Windows...)

Thomas,

I discussed this with Len here in Ottawa. First I fully agree with his 
reasoning for the current behaviour.

The main problem with OSI(Linux) is that it would be a quickly moving 
target so checking for it wouldn't really help the BIOSes.

Still there might be special cases where BIOSes will still check (e.g.
if they intend to work with specific distribution releases which might 
have specific bugs). Or to check for specific Linux features.

One way to do the latter would be to define new OSI flags for specific
features. Haven't got a good proposal for that currently, but it's a 
possibility.

The other thing that could be done is to define OSI flags specific for
special distribution releases so BIOSes could potentially check for bugs
in SLED10 or RHWS5 or something like this, which are hopefully stable
in that behaviour doesn't move as quickly. The way to do that wouldn't
be to change the kernel though, but just specify them on the command 
line using acpi_osi=...
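
To illustrate, the way acpi_osi= changes the set of strings that _OSI
answers true for can be modelled as below. This is a Python sketch of
the three basic forms of the boot parameter as I understand them from
the kernel-parameters documentation, not the kernel implementation; the
function name and the "SLED10" string are made up for illustration.

```python
def apply_acpi_osi(supported, option):
    """Apply one acpi_osi=<option> argument to a set of _OSI strings."""
    if option == "":
        return set()                 # "acpi_osi=" disables all strings
    result = set(supported)
    if option.startswith("!"):
        result.discard(option[1:])   # "acpi_osi=!string" removes a string
    else:
        result.add(option)           # "acpi_osi=string" adds a string
    return result


builtin = {"Windows 2001 SP3", "Windows 2006"}
with_distro = apply_acpi_osi(builtin, "SLED10")   # hypothetical string
print("SLED10" in with_distro)                    # True
print(sorted(apply_acpi_osi(builtin, "!Windows 2006")))
# ['Windows 2001 SP3']
```

A BIOS could then test for the distribution string with _OSI in the
same way it tests for the Windows strings today.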

-Andi


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdown
  2008-07-25  0:04 ` ACPI OSI disaster on latest HP laptops - critical temperature shutdown Len Brown
  2008-07-25 10:44   ` Andi Kleen
@ 2008-07-25 11:19   ` Thomas Renninger
  2008-07-25 15:26     ` Rafael J. Wysocki
                       ` (2 more replies)
  2008-07-25 22:10   ` Eric Piel
  2 siblings, 3 replies; 25+ messages in thread
From: Thomas Renninger @ 2008-07-25 11:19 UTC (permalink / raw)
  To: Len Brown
  Cc: Arjan van de Ven, linux-acpi, Moore, Robert,
	Linux Kernel Mailing List, Andi Kleen, Christian Kornacker

On Friday 25 July 2008 02:04:32 Len Brown wrote:
> Thomas,
>
> re: OSI(Windows...)
>
> Linux will continue to claim OSI compatibility with Windows
> until the day when the majority of Linux systems
> have passed a Linux compatibility test rather than
> a Windows compatibility test.
And to try that out we need the acpi_osi=windows_false boot
param I sent recently. So will you accept that one?

Also we need this documented.
Will you accept a Documentation/acpi/known_osi_vendor_hooks.txt
file? That way we get an idea of what kind of features come
in through which Windows version and, more importantly, what kind of
ugly Windows bug workarounds exist (the latter will probably be more).

> Re: OSI(Linux)
>
> I've looked at O(100) DSDT's that look at OSI(Linux),
> and all but several systems from two vendors do it by mistake.
> They simply copied it from the bugged Intel reference code.
>
> OSI(Linux) will _never_ be restored to Linux, ever.
But it should not have been removed without announcing it half a
year in advance. It silently moved distributions and vendors into a
situation where they cannot support Linux and Windows with
the same BIOS anymore.

In reality, _OSI is mostly not used for interfaces/features
(as you stated in the other mail), but to work around very
specific Windows version bugs.

While the mainline kernel stays transparent to _OSI, you
advise distributions to do exactly the opposite and provide e.g.
a "SLE 11" or "RHEL X" _OSI string to be able to
support the system on Linux and Windows, is that correct?
Or do you advise them to provide two separate BIOSes?
The last option, "do not implement Windows version bug
fixes", we cannot influence.
I do not see more options with the current implementation.

> re: the HP BIOS bug at hand.
>
> Linux deletes the entire thermal zone when we see this.
OpenSUSE 11.0 (2.6.25) and SLES10-SP2 (2.6.16) shut down when
the thermal driver is loaded. Probably every kernel in every
distribution out there currently is doing that.
> (arguably, we could have just disabled the CRT
> and kept the rest of the thermal zone).
> If HP cared about testing Linux on this laptop
> and had tools such that they could actually
> test Linux compatibility, it would be pretty clear
> from user-space that their thermal zone was missing.

Len, this is not about the thermal zone, it is just
a real-world example of something I told you would happen
if Linux stays _OSI transparent with Windows.

The point is that they have to provide a BIOS hot-fix for
Vista or Vista SP1, thereby breaking Linux, because there
is no way to distinguish them anymore.
Windows 2007 will likely have that fixed, and they will provide
a sane _CRT trip point again.
This is an example of Windows version workarounds that could
get much more complex, like initializing HW differently or
whatever.
_OSI is used by vendors as a convenient way to
adjust/work around Windows bugs in their BIOSes, without
the need to pay millions to Microsoft to fix their things.


          Thomas

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdown
  2008-07-25 11:19   ` Thomas Renninger
@ 2008-07-25 15:26     ` Rafael J. Wysocki
  2008-07-26 12:42       ` Andi Kleen
       [not found]       ` <alpine.LFD.1.10.0807261406230.2958@localhost.localdomain>
       [not found]     ` <alpine.LFD.1.10.0807250948320.3884@localhost.localdomain>
       [not found]     ` <alpine.LFD.1.10.0807261409290.2958@localhost.localdomain>
  2 siblings, 2 replies; 25+ messages in thread
From: Rafael J. Wysocki @ 2008-07-25 15:26 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Len Brown, Arjan van de Ven, linux-acpi, Moore, Robert,
	Linux Kernel Mailing List, Andi Kleen, Christian Kornacker

On Friday, 25 of July 2008, Thomas Renninger wrote:
> On Friday 25 July 2008 02:04:32 Len Brown wrote:
[--snip--]
> 
> Len, this is not about the thermal zone, it is just
> a real-world example of something I told you would happen
> if Linux stays _OSI transparent with Windows.
> 
> The point is that they have to provide a BIOS hot-fix for
> Vista or Vista SP1, thereby breaking Linux, because there
> is no way to distinguish them anymore.
> Windows 2007 will likely have that fixed, and they will provide
> a sane _CRT trip point again.
> This is an example of Windows version workarounds that could
> get much more complex, like initializing HW differently or
> whatever.
> _OSI is used by vendors as a convenient way to
> adjust/work around Windows bugs in their BIOSes, without
> the need to pay millions to Microsoft to fix their things.

This is a valid point, IMO.

If vendors use _OSI(Windows) to work around Windows bugs, we get broken
automatically on those systems unless we put in some DMI-based hacks.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdown
  2008-07-25  0:04 ` ACPI OSI disaster on latest HP laptops - critical temperature shutdown Len Brown
  2008-07-25 10:44   ` Andi Kleen
  2008-07-25 11:19   ` Thomas Renninger
@ 2008-07-25 22:10   ` Eric Piel
  2008-07-25 22:19     ` Moore, Robert
       [not found]     ` <alpine.LFD.1.10.0807261311420.2958@localhost.localdomain>
  2 siblings, 2 replies; 25+ messages in thread
From: Eric Piel @ 2008-07-25 22:10 UTC (permalink / raw)
  To: Len Brown
  Cc: Thomas Renninger, Arjan van de Ven, linux-acpi, Moore, Robert,
	Linux Kernel Mailing List, Andi Kleen, Christian Kornacker

Len Brown wrote:
> Thomas,
> 
> re: OSI(Windows...)
> 
> Linux will continue to claim OSI compatibility with Windows
> until the day when the majority of Linux systems
> have passed a Linux compatibility test rather than
> a Windows compatibility test.
> 
> Re: OSI(Linux)
> 
> I've looked at O(100) DSDT's that look at OSI(Linux),
> and all but several systems from two vendors do it by mistake.
> They simply copied it from the bugged Intel reference code.
> 
> OSI(Linux) will _never_ be restored to Linux, ever.
> 
Just out of curiosity, let's imagine that today HP decides to fix its
BIOS. What would be the way to do it? Of course, without causing
additional problems when Windows is booted.

What they would want is to provide workarounds for each given version of
Windows and provide a completely ACPI-compliant version when Linux is
running. I fail to see how it is possible to do that today.
Well... they could detect Linux by checking that several OSI's for
Windows pass, but that would be really a nasty kludge.

So, am I understanding correctly that we are in desperate need of a
good OSI solution? Until then, we can only bash and complain at the BIOS
developers, but they have no way to fix the problems.

Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: ACPI OSI disaster on latest HP laptops - critical temperature shutdown
  2008-07-25 22:10   ` Eric Piel
@ 2008-07-25 22:19     ` Moore, Robert
       [not found]     ` <alpine.LFD.1.10.0807261311420.2958@localhost.localdomain>
  1 sibling, 0 replies; 25+ messages in thread
From: Moore, Robert @ 2008-07-25 22:19 UTC (permalink / raw)
  To: Eric Piel, Len Brown, Lin, Ming M
  Cc: Thomas Renninger, Arjan van de Ven, linux-acpi,
	Linux Kernel Mailing List, Andi Kleen, Christian Kornacker

The goal for ACPICA has gone from being a complete "reference
implementation" of the ACPI specification to being a "Windows
bug-for-bug compatible" ACPI implementation.

So when we report _OS("Microsoft Windows NT") and respond OK to all _OSI
queries with Microsoft strings, we mean it.

Bob


>-----Original Message-----
>From: Eric Piel [mailto:eric.piel@tremplin-utc.net]
>Sent: Friday, July 25, 2008 3:11 PM
>To: Len Brown
>Cc: Thomas Renninger; Arjan van de Ven; linux-acpi; Moore, Robert;
>Linux Kernel Mailing List; Andi Kleen; Christian Kornacker
>Subject: Re: ACPI OSI disaster on latest HP laptops - critical
>temperature shutdown
>
>Len Brown wrote:
>> Thomas,
>>
>> re: OSI(Windows...)
>>
>> Linux will continue to claim OSI compatibility with Windows
>> until the day when the majority of Linux systems
>> have passed a Linux compatibility test rather than
>> a Windows compatibility test.
>>
>> Re: OSI(Linux)
>>
>> I've looked at O(100) DSDT's that look at OSI(Linux),
>> and all but several systems from two vendors do it by mistake.
>> They simply copied it from the bugged Intel reference code.
>>
>> OSI(Linux) will _never_ be restored to Linux, ever.
>>
>Just out of curiosity, let's imagine that today HP decides to fix its
>BIOS. What would be the way to do it? Of course, without causing
>additional problems when Windows is booted.
>
>What they would want is to provide workarounds for each given version
>of Windows and provide a completely ACPI-compliant version when Linux
>is running. I fail to see how it is possible to do that today.
>Well... they could detect Linux by checking that several OSI's for
>Windows pass, but that would be really a nasty kludge.
>
>So, am I understanding correctly that we are in desperate need of a
>good OSI solution? Until then, we can only bash and complain at the
>BIOS developers, but they have no way to fix the problems.
>
>Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdown
  2008-07-25 15:26     ` Rafael J. Wysocki
@ 2008-07-26 12:42       ` Andi Kleen
       [not found]       ` <alpine.LFD.1.10.0807261406230.2958@localhost.localdomain>
  1 sibling, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2008-07-26 12:42 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Thomas Renninger, Len Brown, Arjan van de Ven, linux-acpi, Moore,
	Robert, Linux Kernel Mailing List, Christian Kornacker


> If vendors use _OSI(Windows) to work around Windows bugs, we get broken
> automatically on those systems unless we put in some DMI-based hacks.

The general goal of ACPICA is to be bug-to-bug compatible with Windows.
So it might be needed for ACPICA to just emulate the respective bugs.

That said, for this case I don't think that's needed; Linux just has to
detect the workarounds (which it already does, I think).

-Andi


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdowns
       [not found] ` <alpine.LFD.1.10.0807261434380.2958@localhost.localdomain>
@ 2008-08-01 21:02   ` Len Brown
  2008-08-13 19:22     ` Pavel Machek
  0 siblings, 1 reply; 25+ messages in thread
From: Len Brown @ 2008-08-01 21:02 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Arjan van de Ven, linux-acpi, Moore, Robert,
	Linux Kernel Mailing List, Andi Kleen, Christian Kornacker

>From lenb@kernel.org Sat Jul 26 14:40:36 2008
Date: Sat, 26 Jul 2008 14:40:35 -0400 (EDT)
From: Len Brown <lenb@kernel.org>
To: Thomas Renninger <trenn@suse.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>, linux-acpi <linux-acpi@vger.kernel.org>, "Moore, Robert" <robert.moore@intel.com>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Andi Kleen <ak@linux.intel.com>, Christian Kornacker <ckornacker@suse.de>
Subject: Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdowns

Thomas,

Thank you for debugging and reporting this issue.
I agree with some of your observations and conclusions,
but not with others, so let's review this carefully.

a39a2d7c72b358c6253a2ec28e17b023b7f6f41c
(ACPI: Reject below-freezing temperatures as invalid critical temperatures)
was a general workaround resulting from a specific HP machine
with a BIOS bug.

The machine functioned properly in 2.6.25, but shut down
in 2.6.26-rc1.  Arjan and I debugged this together.
Unfortunately, we both neglected to put the bug URL
in the commit message, so here it is:

http://bugzilla.kernel.org/show_bug.cgi?id=10686

The failure in bug 10686 is similar, but not identical
to the one you reported here with CRT returning 0.
Arjan's HP has a _CRT with no return statement at all.
In Linux-2.6.25, this _CRT was rejected with

ACPI Exception (thermal-0365): AE_BAD_DATA, No critical threshold [20070126]

and the entire thermal zone was rejected.

4e3156b183aa087bc19804b3295c7c1a71f64752
(ACPICA: changed order of interpretation of operand objects),
ironically an MS bug compatibility patch,
had the side effect of causing the implicit return
workaround applied to _CRT to return 2007 rather than bombing out.
This was interpreted as 200.7K, or -73C.

Bob looked into this one, and determined that the latest
ACPICA will return 0 here.

http://bugzilla.kernel.org/show_bug.cgi?id=10686#c9

Bob,
It may be helpful if you can elaborate on "latest ACPICA"
in this comment -- ie what release, or better yet, what patch
will cause Linux behavior to change on this code fragment?

If we suddenly start returning 0 there, we'll still be okay
because Arjan's patch above will still catch it.

Anyway, we had a choice of simple fixes for Arjan's HP.
At the time, the question was whether to reject
the entire thermal zone -- failing like 2.6.25
(a thermal zone w/o a _CRT is invalid per spec)
or to reject just the _CRT (ala thermal.nocrt).

We decided to keep it simple (and similar to 2.6.25)
and reject the entire thermal zone.  Thinking about this more,
I think it would be a good idea to instead go
the thermal.nocrt route -- for if this machine
had ACPI fan control (this one doesn't),
the rest of the thermal zone
would be pretty important to normal use....

Rui,
as maintainer of ACPI_THERMAL, perhaps you can look into that,
if Thomas doesn't beat you to it?

In light of Thomas' sighting and Bob's mention that the
latest interpreter will return 0 here...

ALL THIS TELLS US is that Vista doesn't fail certification
when _CRT returns 0.

IT DOES NOT TELL US that Vista has any sort of _CRT bug,
or that Vista mandates _CRT=0.
The T61 I'm typing on has a valid _CRT and a Vista sticker...

The AML Thomas showed did this:

                If (_OSI ("Windows 2006")) {
                    Store (0x40, TPOS)
                }

            Method (_CRT, 0, Serialized) {
                If (LLess (TPOS, 0x40)) {
                    Return (...valid...)
                }
                Else {
                    Return (Zero)
                }
            }

I draw a totally different conclusion than Thomas does.

This does not look like a Vista workaround to me,
it looks like a simple BIOS bug that Vista doesn't catch.

We've seen BIOS bugs like this many times.
They are consistent with this conversation:

Morning:
BIOS Manager: "please quickly update this platform to support Vista"
BIOS writer: "I'm busy today, but have 30 minutes if I work through lunch..."

Afternoon:
BIOS Manager: "did you look at that Vista update yet?"
BIOS Writer: "yes, I think I did it in only 20 minutes"
BIOS Manager: "you're awesome!  lets send it through WHQL,
	as I've got something else for you to do."

The BIOS passes WHQL and nobody with a brain ever looks
at the source code again...

It would be useful to find out what Vista actually _does_
with _CRT=0.  ie. do they throw out the thermal zone,
or just the _CRT.  Linux should ideally do the same.
However, the fact that plenty of systems with Vista stickers
are shipping with valid _CRT proves that it isn't Vista that
is mandating _CRT=0.

So I DO NOT BELIEVE that this sighting is proof that we should disable
OSI compatibility with Vista or any other version of Windows.
I feel STRONGLY that it is better to be compatible with the
tested path through the BIOS -- even if that tested
path includes workarounds for BIOS bugs that Windows
doesn't catch.  (or workarounds for real Windows bugs --
though I don't believe this thread is an example of one)
The alternative would be the FAR GREATER EVIL of trying
to be compatible with an entirely untested path
through the BIOS.  We've been there before and it
was horrific.

I think we all agree that the LONG term solution is to have
tools where OEMs can CERTIFY compatibility with Linux
and a large portion of the machines that Linux runs on
having passed that certification.  When that happens,
that is the time to re-visit our current strategy of
being bug compatible with Windows.  While I believe that
this is a realistic and valuable goal in some markets,
it seems unrealistic in the foreseeable future
in other markets.  ie. I think it is valuable and worth pursuing,
but I would not expect universal success in the foreseeable
future.

Andi,
I ACK Thomas' suggestion to check for <= 0C for HOT,
PSV and ACx trip points.  While we don't have such a
failure in hand and thus this is not urgent, it can
only make Linux more bomb proof.  We might dress it
up a bit, however.  I think that with acpi=strict,
we should complain loudly if this workaround is invoked,
if not disable it altogether.  Thus an OEM who can
boot with acpi=strict and not get warnings or failures
knows that they're not requiring any of our out-of-spec
workarounds.
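
To make the suggested check concrete, here is a minimal sketch in C. It is not the kernel's actual thermal code: the helper name `trip_point_usable`, the `strict` flag standing in for acpi=strict, and the message text are all assumptions made for illustration. It only shows the idea of rejecting any trip point at or below 0 C (ACPI reports tenths of a Kelvin, so 0 C is roughly 2732) and complaining when strict mode is requested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* ACPI trip points are in tenths of a Kelvin; 0 C is 273.15 K, about 2732. */
#define FREEZING_DECI_KELVIN 2732L

/*
 * Hypothetical helper, not the kernel's real code: returns true if the
 * trip point should be kept.  "strict" stands in for booting with
 * acpi=strict, where the workaround is reported loudly.
 */
static bool trip_point_usable(const char *name, long deci_kelvin, bool strict)
{
    assert(name != NULL);

    if (deci_kelvin > FREEZING_DECI_KELVIN)
        return true;    /* above 0 C: plausible, keep it */

    if (strict)
        fprintf(stderr, "ACPI: BIOS bug: %s = %ld (<= 0 C), ignoring\n",
                name, deci_kelvin);

    return false;       /* at or below freezing: treat as invalid */
}
```

With this sketch, a _CRT of 0 (the HP case) is dropped, while a _HOT of 3632 (about 90 C) is kept. Whether strict mode should only warn or should disable the workaround altogether is left open above; the sketch picks the warning variant.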

Further, Thomas' sighting demonstrates that it is important
to get Arjan's patch back into the .stable releases.

thanks,
-Len




* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdown
       [not found]     ` <alpine.LFD.1.10.0807250948320.3884@localhost.localdomain>
@ 2008-08-01 21:07       ` Len Brown
  2008-08-01 22:36         ` Henrique de Moraes Holschuh
  0 siblings, 1 reply; 25+ messages in thread
From: Len Brown @ 2008-08-01 21:07 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Arjan van de Ven, linux-acpi, Moore, Robert,
	Linux Kernel Mailing List, Andi Kleen, Christian Kornacker


[resend of a message that apparently didn't make it off my laptop]

On Fri, 25 Jul 2008, Len Brown wrote:

> 
> 
> On Fri, 25 Jul 2008, Thomas Renninger wrote:
> 
> > On Friday 25 July 2008 02:04:32 Len Brown wrote:
> > > Thomas,
> > >
> > > re: OSI(Windows...)
> > >
> > > Linux will continue to claim OSI compatibility with Windows
> > > until the day when the majority of Linux systems
> > > have passed a Linux compatibility test rather than
> > > a Windows compatibility test.
> 
> > And to try that out we need the acpi_osi=windows_false boot
> > param I sent recently. So will you accept that one?
> 
> if you want to disable the OSI strings in ACPICA,
> you can already use "acpi_osi=" to disable _OSI support entirely.
> 
> > Also we need this documented.
> > Will you accept a Documentation/acpi/known_osi_vendor_hooks.txt
> > file. Like that we get an idea of what kind of features come
> > in through which Windows version and more important, what kind of
> > ugly Windows bug workarounds exist (the latter will probably be more).
> 
> I'm certainly open to posting what is known about vendor use of OSI.
> However, this is very close to the line of public reverse engineering
> which Intel employees are not allowed to do.
> 
> > > Re: OSI(Linux)
> > >
> > > I've looked at O(100) DSDT's that look at OSI(Linux),
> > > > and all but several systems from two vendors do it by mistake.
> > > They simply copied it from the bugged Intel reference code.
> > >
> > > OSI(Linux) will _never_ be restored to Linux, ever.
> > But it should not have been removed without announcing it half a
> > year before. It silently moved distributions and vendors into a
> > situation where they cannot support Linux and Windows with
> > the same BIOS anymore.
> 
> nonsense.
> we've always had all the windows OSI strings
> and only a handful of models do anything with OSI Linux.
> 
> > _OSI is mainly not used for interfaces/features in
> > reality (as you stated in the other mail), but to workaround very
> > specific Windows version bugs.
> > 
> > While the mainline kernel stays transparent to _OSI you
> > advise distributions to exactly not do that and provide e.g.
> > a "SLE 11" or "RHEL X" _OSI string to be able to
> > support the system on Linux and Windows, is that correct?
> 
> No, but if a BIOS vendor works with you to get a SKU
> that does something special, you are certainly free
> to do this.
> 
> > Or do you advise them to provide two separate BIOSes?
> 
> No.
> 
> > The last option, "do not implement Windows version bug
> > fixes" we cannot influence.
> > I do not see more options with the current implementation.
> 
> stay the course is the best option.
> It is better to expose ourselves to the known tested
> Windows functionality -- even if it seems arbitrary,
> at least it is tested.  The !Windows case results in
> running _completely_ untested BIOS code.
> 
> > > re: the HP BIOS bug at hand.
> > >
> > > Linux deletes the entire thermal zone when we see this.
> 
> > OpenSUSE 11.0 (2.6.25) and SLES10-SP2 (2.6.16) shut down when
> > the thermal driver is loaded. Probably every kernel in every
> > distribution out there currently is doing that.
> 
> Andi needs to send Arjan's patch to 2.6.25.stable.
> 
> > > (arguably, we could have just disabled the CRT
> > > and kept the rest of the thermal zone).
> > > If HP cared about testing Linux on this laptop
> > > and had tools such that they could actually
> > > test Linux compatibility, it would be pretty clear
> > > from user-space that their thermal zone was missing.
> 
> Note that what triggered the failure on Arjan's machine
> is a change to the implicit return workaround.
> In earlier releases we would get a run-time exception
> when we ran the bogus _CRT that had no return value.
> Bob enhanced the implicit return workaround so that
> it returned something to fix another case and it
> caused the year to be returned by CRT here, which
> in deci-kelvin was a bogus temperature.
> 
> If HP cared about Linux on this box, they would
> test with acpi=strict, which would turn off most
> of these workarounds, and the kernel would point
> out the problem to the tester.
> 
> > Len, this is not about the thermal zone,
> 
> Yes, it is.
> please see "stay the course" above.
> 
> thanks,
> -Len
> 
> > it is just
> > a real-world example of something I told you will happen
> > if Linux stays _OSI transparent with Windows.
> > 
> > This is about that they have to provide a BIOS hot-fix for
> > VISTA or VISTA SP and thus breaking Linux because there
> > is no way to distinguish anymore.
> > Windows 2007 likely will have that fixed and they provide
> > a sane _CRT trip point again.
> > This is an example of Windows versions workarounds that could
> > get much more complex, like initializing HW differently or
> > whatever.
> > _OSI is used by vendors as a convenient possibility to
> > adjust/workaround Windows bugs in their BIOSes, without
> > the need to pay Millions to Microsoft to fix their things.
> > 
> > 
> >           Thomas
> > 
> 


* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdown
       [not found]     ` <alpine.LFD.1.10.0807261311420.2958@localhost.localdomain>
@ 2008-08-01 21:07       ` Len Brown
  0 siblings, 0 replies; 25+ messages in thread
From: Len Brown @ 2008-08-01 21:07 UTC (permalink / raw)
  To: Eric Piel
  Cc: Thomas Renninger, Arjan van de Ven, linux-acpi, Moore, Robert,
	Linux Kernel Mailing List, Andi Kleen, Christian Kornacker

[resend of a message that didn't hit the list]

On Sat, 26 Jul 2008, Len Brown wrote:

> 
> 
> On Sat, 26 Jul 2008, Eric Piel wrote:
> 
> > Just out of curiosity, let's imagine that today HP decides to fix its BIOS.
> > What would be the way to do it? Of course, without putting additional problems
> > when Windows is booted.
> 
> They should return a valid temperature from _CRT,
> and that should work everyplace without any check for OS version.
> 
> We know that this must work, because plenty of Vista compatible
> platforms do it -- such as the T61 that I'm typing on.
> 
> ie. we really don't know why HP returned _CRT=0 for vista in this BIOS,
> we just assume that it didn't cause anything bad to happen.
> 
> -Len
> 
> 


* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdown
       [not found]       ` <alpine.LFD.1.10.0807261406230.2958@localhost.localdomain>
@ 2008-08-01 21:08         ` Len Brown
  2008-08-03 17:23           ` Thomas Renninger
  0 siblings, 1 reply; 25+ messages in thread
From: Len Brown @ 2008-08-01 21:08 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Thomas Renninger, Arjan van de Ven, linux-acpi, Moore, Robert,
	Linux Kernel Mailing List, Andi Kleen, Christian Kornacker

[yet another resend]

On Sat, 26 Jul 2008, Len Brown wrote:

> 
> 
> On Fri, 25 Jul 2008, Rafael J. Wysocki wrote:
> 
> > On Friday, 25 of July 2008, Thomas Renninger wrote:
> > > On Friday 25 July 2008 02:04:32 Len Brown wrote:
> > [--snip--]
> > > 
> > > Len, this is not about the thermal zone, it is just
> > > a real-world example of something I told you will happen
> > > if Linux stays _OSI transparent with Windows.
> > > 
> > > This is about that they have to provide a BIOS hot-fix for
> > > VISTA or VISTA SP and thus breaking Linux because there
> > > is no way to distinguish anymore.
> > > Windows 2007 likely will have that fixed and they provide
> > > a sane _CRT trip point again.
> > > This is an example of Windows versions workarounds that could
> > > get much more complex, like initializing HW differently or
> > > whatever.
> > > _OSI is used by vendors as a convenient possibility to
> > > adjust/workaround Windows bugs in their BIOSes, without
> > > the need to pay Millions to Microsoft to fix their things.
> > 
> > This is a valid point, IMO.
> > 
> > If vendors use _OSI(Windows) to work around Windows bugs, we get broken
> > automatically on those systems unless we put in some DMI-based hacks.
> 
> I believe that the AML Thomas shared does not
> illustrate a Vista bug workaround in a BIOS.
> Rather it is simply a BIOS bug that Vista doesn't catch.
> 
> -Len
> 
> 


* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdown
       [not found]     ` <alpine.LFD.1.10.0807261409290.2958@localhost.localdomain>
@ 2008-08-01 21:08       ` Len Brown
  0 siblings, 0 replies; 25+ messages in thread
From: Len Brown @ 2008-08-01 21:08 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Arjan van de Ven, linux-acpi, Moore, Robert,
	Linux Kernel Mailing List, Andi Kleen, Christian Kornacker

[yet another resend]

On Sat, 26 Jul 2008, Len Brown wrote:

> > > re: OSI(Windows...)
> > >
> > > Linux will continue to claim OSI compatibility with Windows
> > > until the day when the majority of Linux systems
> > > have passed a Linux compatibility test rather than
> > > a Windows compatibility test.
> 
> > And to try that out we need the acpi_osi=windows_false boot
> > param I sent recently. So will you accept that one?
> 
> I believe that adding and using such a parameter would make
> Linux worse, not better.  So it is unlikely that I'd accept it.
> 
> > Also we need this documented.
> > Will you accept a Documentation/acpi/known_osi_vendor_hooks.txt
> > file. Like that we get an idea of what kind of features come
> > in through which Windows version and more important, what kind of
> > ugly Windows bug workarounds exist (the latter will probably be more).
> 
> I believe that this thread illustrates a BIOS bug that Vista
> doesn't catch.  That doesn't mean it is a Vista bug, or
> a BIOS workaround for a Vista bug.
> 
> I believe that there are many such holes in Windows testing.
> They don't have to do a good job validating ACPI --
> they just care if Windows works or not.
> 
> Documenting every BIOS issue that is worked
> around by Linux is an interesting idea for a project.
> It sounds like a pretty big project to me.
> I guess I'd wonder what the return on investment would be
> and if that is the best way to apply our resources.
> 
> > > Re: OSI(Linux)
> > >
> > > I've looked at O(100) DSDT's that look at OSI(Linux),
> > > > and all but several systems from two vendors do it by mistake.
> > > They simply copied it from the bugged Intel reference code.
> > >
> > > OSI(Linux) will _never_ be restored to Linux, ever.
> 
> > But it should not have been removed without announcing it half a
> > year before. It silently moved distributions and vendors into a
> > situation where they cannot support Linux and Windows with
> > the same BIOS anymore.
> 
> Linux started complaining about OSI(Linux) in 2.6.22, a year ago.
> Linux changed the default to disable OSI(Linux) in 2.6.23 --
> 3 months after the warning started.
> 
> If I were to do it again, I would have changed faster, not slower,
> for allowing the spread of OSI(Linux) use is bad for Linux,
> not good.
> 
> Of course a distro is free to maintain whatever OSI strings
> that they think they and their OEMs can support.
> 
> > _OSI is mainly not used for interfaces/features in
> > reality (as you stated in the other mail), but to workaround very
> > specific Windows version bugs.
> > 
> > While the mainline kernel stays transparent to _OSI you
> > advise distributions to exactly not do that and provide e.g.
> > a "SLE 11" or "RHEL X" _OSI string to be able to
> > support the system on Linux and Windows, is that correct?
> > Or do you advise them to provide two separate BIOSes?
> > The last option, "do not implement Windows version bug
> > fixes" we cannot influence.
> > I do not see more options with the current implementation.
> > 
> > > re: the HP BIOS bug at hand.
> > >
> > > Linux deletes the entire thermal zone when we see this.
> > OpenSUSE 11.0 (2.6.25) and SLES10-SP2 (2.6.16) shut down when
> > the thermal driver is loaded. Probably every kernel in every
> > distribution out there currently is doing that.
> 
> Clearly Arjan's patch needs to be backported to the stable releases --
> even though it would have no benefit on Arjan's machine,
> it would benefit the HP that you have.
> 
> Andi,
> Please send 39a2d7c72b358c6253a2ec28e17b023b7f6f41c
> (ACPI: Reject below-freezing temperatures as invalid critical 
> temperatures)
> to 2.6.25.stable (and any stable release that will take it)
> 
> thanks,
> -Len
> 
> 


* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdown
  2008-08-01 21:07       ` Len Brown
@ 2008-08-01 22:36         ` Henrique de Moraes Holschuh
  2008-08-02  5:42           ` Matthew Garrett
  0 siblings, 1 reply; 25+ messages in thread
From: Henrique de Moraes Holschuh @ 2008-08-01 22:36 UTC (permalink / raw)
  To: Len Brown
  Cc: Thomas Renninger, Arjan van de Ven, linux-acpi, Moore, Robert,
	Linux Kernel Mailing List, Andi Kleen, Christian Kornacker

On Fri, 01 Aug 2008, Len Brown wrote:
> It is better to expose ourselves to the known tested Windows functionality
> -- even if it seems arbitrary, at least it is tested.  The !Windows case
> results in running _completely_ untested BIOS code.

Actually, we should masquerade properly as the latest Windows version
available for that machine, then.  AFAIK, Windows does not set ALL the OSI
strings, just one.  We ARE running untested code in some BIOSes because of
it.

Maybe it would be better if every ACPICA-using OS defined a
_OSI(NotWindows), plus the relevant Windows OSI string they want to support,
and Intel would send word that this string is to be used ONLY to disable all
Windows bug workarounds, not to activate or deactivate any specific
functionality?

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdown
  2008-08-01 22:36         ` Henrique de Moraes Holschuh
@ 2008-08-02  5:42           ` Matthew Garrett
  2008-08-02 14:38             ` Henrique de Moraes Holschuh
  0 siblings, 1 reply; 25+ messages in thread
From: Matthew Garrett @ 2008-08-02  5:42 UTC (permalink / raw)
  To: Henrique de Moraes Holschuh
  Cc: Len Brown, Thomas Renninger, Arjan van de Ven, linux-acpi, Moore,
	Robert, Linux Kernel Mailing List, Andi Kleen,
	Christian Kornacker

On Fri, Aug 01, 2008 at 07:36:57PM -0300, Henrique de Moraes Holschuh wrote:
> On Fri, 01 Aug 2008, Len Brown wrote:
> > It is better to expose ourselves to the known tested Windows functionality
> > -- even if it seems arbitrary, at least it is tested.  The !Windows case
> > results in running _completely_ untested BIOS code.
> 
> Actually, we should masquerade properly as the latest Windows version
> available for that machine, then.  AFAIK, Windows does not set ALL the OSI
> strings, just one.  We ARE running untested code in some BIOSes because of
> it.

The BIOSes I've tested check _OSI in order of Windows release, which is 
consistent with Windows returning OSI strings for all previous versions. 
Do you have any examples that suggest this isn't the case?

> Maybe it would be better if every ACPICA-using OS defined a
> _OSI(NotWindows), plus the relevant Windows OSI string they want to support,
> and Intel would send word that this string is to be used ONLY to disable all
> Windows bug workarounds, not to activate or deactivate any specific
> functionality?

Not all BIOSes would support this, so we'd need to support the Windows 
workarounds anyway. At that point, there's no real benefit in having 
multiple codepaths.

-- 
Matthew Garrett | mjg59@srcf.ucam.org


* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdown
  2008-08-02  5:42           ` Matthew Garrett
@ 2008-08-02 14:38             ` Henrique de Moraes Holschuh
  2008-08-02 14:44               ` Norbert Preining
  2008-08-02 15:41               ` Matthew Garrett
  0 siblings, 2 replies; 25+ messages in thread
From: Henrique de Moraes Holschuh @ 2008-08-02 14:38 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Len Brown, Thomas Renninger, Arjan van de Ven, linux-acpi, Moore,
	Robert, Linux Kernel Mailing List, Andi Kleen,
	Christian Kornacker

On Sat, 02 Aug 2008, Matthew Garrett wrote:
> On Fri, Aug 01, 2008 at 07:36:57PM -0300, Henrique de Moraes Holschuh wrote:
> > On Fri, 01 Aug 2008, Len Brown wrote:
> > > It is better to expose ourselves to the known tested Windows functionality
> > > -- even if it seems arbitrary, at least it is tested.  The !Windows case
> > > results in running _completely_ untested BIOS code.
> > 
> > Actually, we should masquerade properly as the latest Windows version
> > available for that machine, then.  AFAIK, Windows does not set ALL the OSI
> > strings, just one.  We ARE running untested code in some BIOSes because of
> > it.
> 
> The BIOSes I've tested check _OSI in order of Windows release, which is 
> consistent with Windows returning OSI strings for all previous versions. 
> Do you have any examples that suggest this isn't the case?

Yes.  IBM ThinkPads store the result of each version test separately, and I
recall I saw at least one DSDT code path that didn't test all of them in
order to select a branch of code to run.

It will be hard to find that DSDT, though. It was some time ago :(

> > Maybe it would be better if every ACPICA-using OS defined a
> > _OSI(NotWindows), plus the relevant Windows OSI string they want to support,
> > and Intel would send word that this string is to be used ONLY to disable all
> > Windows bug workarounds, not to activate or deactivate any specific
> > functionality?
> 
> Not all BIOSes would support this, so we'd need to support the Windows 
> workarounds anyway. At that point, there's no real benefit in having 
> multiple codepaths.

Sorry, but I will disagree.

Anything that can help in the future with the vendors that are better at
Linux support is a good thing.  You are right that we will still have to
deal with the others, but there are such things as vendor-specific windows
workarounds (they didn't want to change their firmware, or they couldn't, or
the others didn't care to add the workaround, etc).  If that vendor uses the
"NotWindows" OSI correctly, we would not need to take any special action.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdown
  2008-08-02 14:38             ` Henrique de Moraes Holschuh
@ 2008-08-02 14:44               ` Norbert Preining
  2008-08-02 14:51                 ` Henrique de Moraes Holschuh
  2008-08-02 15:41               ` Matthew Garrett
  1 sibling, 1 reply; 25+ messages in thread
From: Norbert Preining @ 2008-08-02 14:44 UTC (permalink / raw)
  To: Henrique de Moraes Holschuh, Matthew Garrett, Len Brown,
	Thomas Renninger, Arjan van de Ven, linux-acpi, Moore, Robert,
	Linux Kernel Mailing List, Andi Kleen, Christian Kornacker

Hi everyone,

(please Cc or leave linux-acpi on the recipient list, thanks)

sorry to chime in so late, but I am experiencing something similar: My
Acer TravelMate 3012 sometimes shuts down at the boot sequence telling
me something like "critical temperature" (more I cannot read, it is
quite fast in turning off the laptop).

That happens at strange times (couldn't find a pattern), and after being
switched off for hours, so the laptop is definitely not hot.

Using 2.6.27-rc1, but I am relatively sure that I had similar things in
the past.

There is something else strange, no idea if that is related, sometimes it
just hangs after 
	Setting the clock.
That is a message of Debian/sid and probably calls hwclock or something
similar.

If all that is something else, sorry for disturbing.

If you need .config, dmesg, acpidump, dmidecode, whatever, let me know.

Best wishes

Norbert

-------------------------------------------------------------------------------
Dr. Norbert Preining <preining@logic.at>        Vienna University of Technology
Debian Developer <preining@debian.org>                         Debian TeX Group
gpg DSA: 0x09C5B094      fp: 14DF 2E6C 0307 BE6D AD76  A9C0 D2BF 4AA3 09C5 B094
-------------------------------------------------------------------------------
You step in the stream,
But the water has moved on.
This page is not here.
                       --- Windows Error Haiku


* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdown
  2008-08-02 14:44               ` Norbert Preining
@ 2008-08-02 14:51                 ` Henrique de Moraes Holschuh
  2008-08-04 21:33                   ` Norbert Preining
  0 siblings, 1 reply; 25+ messages in thread
From: Henrique de Moraes Holschuh @ 2008-08-02 14:51 UTC (permalink / raw)
  To: Norbert Preining
  Cc: Matthew Garrett, Len Brown, Thomas Renninger, Arjan van de Ven,
	linux-acpi, Moore, Robert, Linux Kernel Mailing List, Andi Kleen,
	Christian Kornacker

On Sat, 02 Aug 2008, Norbert Preining wrote:
> There is something else strange, no idea if that is related, sometimes it
> just hangs after 
> 	Setting the clock.
> That is a message of Debian/sid and probably calls hwclock or something
> similar.

Yes, it is a call to hwclock.  It recently started to cause lockups on my
ThinkPad T43 too (it is a regression of some sort, either in hwclock or the
kernel), but I was too busy to care.  I just removed the hwclock calls on
the init path completely, since my clock is in UTC anyway and they were just
wasting boot time.

Interestingly enough, the hwclock call to SET the clock on the shutdown path
never causes any problems.   This could be a Debian userspace issue, or
hwclock reading of the RTC is broken, but writes are not.

But the thermal hang is something else entirely.  You have at least two
different bugs causing you grief.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdown
  2008-08-02 14:38             ` Henrique de Moraes Holschuh
  2008-08-02 14:44               ` Norbert Preining
@ 2008-08-02 15:41               ` Matthew Garrett
  2008-08-02 17:49                 ` Henrique de Moraes Holschuh
  1 sibling, 1 reply; 25+ messages in thread
From: Matthew Garrett @ 2008-08-02 15:41 UTC (permalink / raw)
  To: Henrique de Moraes Holschuh
  Cc: Len Brown, Thomas Renninger, Arjan van de Ven, linux-acpi, Moore,
	Robert, Linux Kernel Mailing List, Andi Kleen,
	Christian Kornacker

On Sat, Aug 02, 2008 at 11:38:33AM -0300, Henrique de Moraes Holschuh wrote:

> Yes.  IBM ThinkPads store the result of each version test separately, and I
> recall I saw at least one DSDT code path that didn't test all of them in
> order to select a branch of code to run.

As an example, here's a section from the T61:

                    If (\_OSI ("Windows 2001"))
                    {
                        Store (0x01, \WNTF)
                        Store (0x01, \WXPF)
                        Store (0x00, \WSPV)
                    }

                    If (\_OSI ("Windows 2001 SP1"))
                    {
                        Store (0x01, \WSPV)
                    }

                    If (\_OSI ("Windows 2001 SP2"))
                    {
                        Store (0x02, \WSPV)
                    }

And then later:

           If (LAnd (\WXPF, LGreaterEqual (\WSPV, 0x01)))
            {
                PPMS (0x02)
            }

The only way WXPF can be non-zero and WSPV can be greater or equal to 
one is if more than one of those tests succeeded.
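
The argument can be checked with a small C model of the excerpt. This is only a sketch: the `osi()` lookup, the `t61_probe()` and `t61_calls_ppms()` helpers, and the idea of passing the OS's supported strings as an array are assumptions made for illustration, not real firmware or ACPICA interfaces. Only the flag names and the three strings come from the DSDT above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Flags set by the DSDT excerpt above. */
struct t61_flags {
    int wntf;
    int wxpf;
    int wspv;
};

/* Hypothetical stand-in for _OSI: true if the OS claims the given string. */
static bool osi(const char *const *supported, size_t n, const char *query)
{
    assert(supported != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
        if (strcmp(supported[i], query) == 0)
            return true;
    return false;
}

/* Mirrors the three independent If (\_OSI (...)) blocks, in the same order. */
static struct t61_flags t61_probe(const char *const *supported, size_t n)
{
    struct t61_flags f = { 0, 0, 0 };

    if (osi(supported, n, "Windows 2001")) {
        f.wntf = 1;
        f.wxpf = 1;
        f.wspv = 0;
    }
    if (osi(supported, n, "Windows 2001 SP1"))
        f.wspv = 1;
    if (osi(supported, n, "Windows 2001 SP2"))
        f.wspv = 2;
    return f;
}

/* Mirrors If (LAnd (\WXPF, LGreaterEqual (\WSPV, 0x01))). */
static bool t61_calls_ppms(struct t61_flags f)
{
    return f.wxpf != 0 && f.wspv >= 1;
}
```

In this model, an OS that answers true only to "Windows 2001 SP2" never reaches PPMS, because WXPF stays zero; an OS that also answers true to "Windows 2001" does.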

> > Not all BIOSes would support this, so we'd need to support the Windows 
> > workarounds anyway. At that point, there's no real benefit in having 
> > multiple codepaths.
> 
> Sorry, but I will disagree.
> 
> Anything that can help in the future with the vendors that are better at
> Linux support is a good thing.  You are right that we will still have to
> deal with the others, but there are such things as vendor-specific windows
> workarounds (they didn't want to change their firmware, or they couldn't, or
> the others didn't care to add the workaround, etc).  If that vendor uses the
> "NotWindows" OSI correctly, we would not need to take any special action.

Allowing vendors to special-case Linux means that we have to have a 
special-case path for the minority of vendors who ask for this. It's 
added complexity and we don't actually gain anything from it.

-- 
Matthew Garrett | mjg59@srcf.ucam.org


* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdown
  2008-08-02 15:41               ` Matthew Garrett
@ 2008-08-02 17:49                 ` Henrique de Moraes Holschuh
  2008-08-02 19:49                   ` Matthew Garrett
  0 siblings, 1 reply; 25+ messages in thread
From: Henrique de Moraes Holschuh @ 2008-08-02 17:49 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Len Brown, Thomas Renninger, Arjan van de Ven, linux-acpi, Moore,
	Robert, Linux Kernel Mailing List, Andi Kleen,
	Christian Kornacker

On Sat, 02 Aug 2008, Matthew Garrett wrote:
> On Sat, Aug 02, 2008 at 11:38:33AM -0300, Henrique de Moraes Holschuh wrote:
> > > Not all BIOSes would support this, so we'd need to support the Windows 
> > > workarounds anyway. At that point, there's no real benefit in having 
> > > multiple codepaths.
> > 
> > Sorry, but I will disagree.
> > 
> > Anything that can help in the future with the vendors that are better at
> > Linux support is a good thing.  You are right that we will still have to
> > deal with the others, but there are such things as vendor-specific windows
> > workarounds (they didn't want to change their firmware, or they couldn't, or
> > the others didn't care to add the workaround, etc).  If that vendor uses the
> > "NotWindows" OSI correctly, we would not need to take any special action.
> 
> Allowing vendors to special-case Linux means that we have to have a 
> special-case path for the minority of vendors who ask for this. It's 
> added complexity and we don't actually gain anything from it.

Correction: special case ALL non-windows.  There's a reason why I said
ACPICA should provide OSI(NonWindows).

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdown
  2008-08-02 17:49                 ` Henrique de Moraes Holschuh
@ 2008-08-02 19:49                   ` Matthew Garrett
  0 siblings, 0 replies; 25+ messages in thread
From: Matthew Garrett @ 2008-08-02 19:49 UTC (permalink / raw)
  To: Henrique de Moraes Holschuh
  Cc: Len Brown, Thomas Renninger, Arjan van de Ven, linux-acpi, Moore,
	Robert, Linux Kernel Mailing List, Andi Kleen,
	Christian Kornacker

On Sat, Aug 02, 2008 at 02:49:58PM -0300, Henrique de Moraes Holschuh wrote:

> Correction: special case ALL non-windows.  There's a reason why I said
> ACPICA should provide OSI(NonWindows).

The only two operating systems of any significance that can run on 
generic hardware are Windows and Linux. Given that BIOS authors hate the 
human race, OSI(NonWindows) is equivalent to OSI(Linux).

-- 
Matthew Garrett | mjg59@srcf.ucam.org


* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdown
  2008-08-01 21:08         ` Len Brown
@ 2008-08-03 17:23           ` Thomas Renninger
  0 siblings, 0 replies; 25+ messages in thread
From: Thomas Renninger @ 2008-08-03 17:23 UTC (permalink / raw)
  To: Len Brown
  Cc: Rafael J. Wysocki, Arjan van de Ven, linux-acpi, Moore, Robert,
	Linux Kernel Mailing List, Andi Kleen, Christian Kornacker

On Friday 01 August 2008 11:08:08 pm Len Brown wrote:
> [yet another resend]
>
> On Sat, 26 Jul 2008, Len Brown wrote:
> > On Fri, 25 Jul 2008, Rafael J. Wysocki wrote:
> > > On Friday, 25 of July 2008, Thomas Renninger wrote:
> > > > On Friday 25 July 2008 02:04:32 Len Brown wrote:
> > >
> > > [--snip--]
> > >
> > > > Len, this is not about the thermal zone, it is just
> > > > a real-world example of something I told you will happen
> > > > if Linux stays _OSI transparent with Windows.
> > > >
> > > > This is about that they have to provide a BIOS hot-fix for
> > > > VISTA or VISTA SP and thus breaking Linux because there
> > > > is no way to distinguish anymore.
> > > > Windows 2007 likely will have that fixed and they provide
> > > > a sane _CRT trip point again.
If Windows returns true for every OSI(Windows XY) version that has ever
existed, as Linux does, and sticks to that in the future, then my assumption
above is not true.

Thanks to Matthew Garrett for pointing to the relevant Microsoft documentation.

> > > > This is an example of Windows versions workarounds that could
> > > > get much more complex, like initializing HW differently or
> > > > whatever.
> > > > _OSI is used by vendors as a convenient possibility to
> > > > adjust/workaround Windows bugs in their BIOSes, without
> > > > the need to pay Millions to Microsoft to fix their things.

If the next Windows version also returns true for the string the workaround
applies to, vendors have to take care in a follow-up update that only the one
affected OS matches, e.g.:
And(brokenOS, !new_OSes)
So this should make it reasonably likely that vendors do not misuse this too often.
Also it is ensured that the next Linux kernel generation returning 
OSI(newOSes) will get the correct AML code again.
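
The And(brokenOS, !new_OSes) condition can be written out as a tiny C truth table. This is a sketch only; the function and parameter names are made up for illustration and do not correspond to any firmware or kernel interface.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical predicate: apply the workaround only if the OS claims the
 * affected release string and does NOT claim any newer release string.
 */
static bool apply_workaround(bool osi_broken_release, bool osi_newer_release)
{
    return osi_broken_release && !osi_newer_release;
}

/* Self-check of the truth table; call it from main() if desired. */
static void check_workaround_truth_table(void)
{
    assert(apply_workaround(true, false));   /* the affected release itself */
    assert(!apply_workaround(true, true));   /* newer Windows, or a kernel claiming newer strings */
    assert(!apply_workaround(false, false)); /* OS claiming neither string */
    assert(!apply_workaround(false, true));
}
```

The second case is the one that matters for Linux: once a kernel also answers true for the newer strings, it falls out of the workaround path just as a newer Windows release would.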

And distributions can still offer OSI(supported dist) for emergencies.

Sorry for the loud noise.
I am now convinced that it is not as bad as it first looked.

Thanks for all the input/feedback,

          Thomas


* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdown
  2008-08-02 14:51                 ` Henrique de Moraes Holschuh
@ 2008-08-04 21:33                   ` Norbert Preining
  0 siblings, 0 replies; 25+ messages in thread
From: Norbert Preining @ 2008-08-04 21:33 UTC (permalink / raw)
  To: Henrique de Moraes Holschuh
  Cc: Matthew Garrett, Len Brown, Thomas Renninger, Arjan van de Ven,
	linux-acpi, Moore, Robert, Linux Kernel Mailing List, Andi Kleen,
	Christian Kornacker

Hi Henrique,

On Sat, 02 Aug 2008, Henrique de Moraes Holschuh wrote:
> Yes, it is a call to hwclock.  It recently started to cause lockups on my
> ThinkPad T43 too (it is a regression of some sort, either in hwclock or the

Thanks a lot for that bit, didn't know and was already scared that my
laptop was nearing its end. 

> But the thermal hang is something else entirely.  You have at least two
> different bugs causing you grief.

We will see how rc2++ behaves. Thanks a lot.

Best wishes

Norbert

-------------------------------------------------------------------------------
Dr. Norbert Preining <preining@logic.at>        Vienna University of Technology
Debian Developer <preining@debian.org>                         Debian TeX Group
gpg DSA: 0x09C5B094      fp: 14DF 2E6C 0307 BE6D AD76  A9C0 D2BF 4AA3 09C5 B094
-------------------------------------------------------------------------------
LISTOWEL (n.)
The small mat on the bar designed to be more absorbent than the bar,
but not as absorbent as your elbows.
			--- Douglas Adams, The Meaning of Liff


* Re: ACPI OSI disaster on latest HP laptops - critical temperature shutdowns
  2008-08-01 21:02   ` ACPI OSI disaster on latest HP laptops - critical temperature shutdowns Len Brown
@ 2008-08-13 19:22     ` Pavel Machek
  0 siblings, 0 replies; 25+ messages in thread
From: Pavel Machek @ 2008-08-13 19:22 UTC (permalink / raw)
  To: Len Brown
  Cc: Thomas Renninger, Arjan van de Ven, linux-acpi, Moore, Robert,
	Linux Kernel Mailing List, Andi Kleen, Christian Kornacker

Hi!

> I ACK Thomas' suggestion to check for <= 0C for HOT,
> PSV and ACx trip points.  While we don't have such a

Hmm, I don't think that's a good idea. 0 Celsius is not a special value.

While having _CRT below 25 Celsius would be quite strange (a machine
that cannot run at room temperature?), I could imagine it (cryogenic
cooling, CPU is overclocked, needs -10C to work).

A machine with AC0 == -10C is very easy to imagine, OTOH. It is a normal
notebook that needs its fans at normal temperatures, but does not need
them if it is really cold.
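
To make the arithmetic concrete, here is a minimal C sketch of the
"<= 0C" check being discussed. It assumes trip points are reported in
tenths of Kelvin and uses 2732 (273.2 K) as the offset to Celsius; the
function names are hypothetical and this is not the actual ACPI thermal
driver code.

```c
#include <stdbool.h>

/* Assumption: trip points arrive in tenths of Kelvin, and 0 C is
 * approximated as 273.2 K, i.e. 2732 in those units. */
#define DECI_KELVIN_OFFSET 2732L

/* Convert tenths of Kelvin to tenths of a degree Celsius. */
long deci_kelvin_to_deci_celsius(unsigned long deci_kelvin)
{
	return (long)deci_kelvin - DECI_KELVIN_OFFSET;
}

/* The check under discussion: flag any trip point at or below 0 C. */
bool trip_le_zero_celsius(unsigned long deci_kelvin)
{
	return deci_kelvin_to_deci_celsius(deci_kelvin) <= 0;
}
```

With this check, a _CRT of 0 (about -273.2C) is flagged, but so is an
AC0 of 2632 (about -10C), which is the case described above as entirely
plausible.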

							Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Thread overview: 25+ messages
2008-07-24 15:27 ACPI OSI disaster on latest HP laptops - critical temperature shutdowns Thomas Renninger
2008-07-24 15:42 ` Arjan van de Ven
2008-07-25  0:04 ` ACPI OSI disaster on latest HP laptops - critical temperature shutdown Len Brown
2008-07-25 10:44   ` Andi Kleen
2008-07-25 11:19   ` Thomas Renninger
2008-07-25 15:26     ` Rafael J. Wysocki
2008-07-26 12:42       ` Andi Kleen
     [not found]       ` <alpine.LFD.1.10.0807261406230.2958@localhost.localdomain>
2008-08-01 21:08         ` Len Brown
2008-08-03 17:23           ` Thomas Renninger
     [not found]     ` <alpine.LFD.1.10.0807250948320.3884@localhost.localdomain>
2008-08-01 21:07       ` Len Brown
2008-08-01 22:36         ` Henrique de Moraes Holschuh
2008-08-02  5:42           ` Matthew Garrett
2008-08-02 14:38             ` Henrique de Moraes Holschuh
2008-08-02 14:44               ` Norbert Preining
2008-08-02 14:51                 ` Henrique de Moraes Holschuh
2008-08-04 21:33                   ` Norbert Preining
2008-08-02 15:41               ` Matthew Garrett
2008-08-02 17:49                 ` Henrique de Moraes Holschuh
2008-08-02 19:49                   ` Matthew Garrett
     [not found]     ` <alpine.LFD.1.10.0807261409290.2958@localhost.localdomain>
2008-08-01 21:08       ` Len Brown
2008-07-25 22:10   ` Eric Piel
2008-07-25 22:19     ` Moore, Robert
     [not found]     ` <alpine.LFD.1.10.0807261311420.2958@localhost.localdomain>
2008-08-01 21:07       ` Len Brown
     [not found] ` <alpine.LFD.1.10.0807261434380.2958@localhost.localdomain>
2008-08-01 21:02   ` ACPI OSI disaster on latest HP laptops - critical temperature shutdowns Len Brown
2008-08-13 19:22     ` Pavel Machek
