linux-kernel.vger.kernel.org archive mirror
* Some sort corruption of my Thermal Subsystem after suspend to ram
@ 2008-04-30 20:42 Gabriel C
  2008-04-30 20:46 ` Jay Cliburn
  2008-05-01  5:57 ` Len Brown
  0 siblings, 2 replies; 12+ messages in thread
From: Gabriel C @ 2008-04-30 20:42 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Len Brown, linux-pm, pavel, Rafael J. Wysocki

Hi all,

I thought I had found all the issues with this new box, but I was wrong ;)

On an ASUS P5E-VM DO (4 GB RAM, Q9300 CPU), the Thermal Subsystem gets
corrupted in some way after suspend to RAM.

I was having some reboot/halt problems and, while hunting them down, noticed that
they only occur after the box has been suspended to RAM at least once.

I've tested 2.6.{24*,25,linus-git(before ACPI merge),x86-latest-git} and all of them have this problem.

On x86-latest-git I've also tested with MTRR_SANITIZER on and off; it makes no difference.


lspci output before s2r:

..


00:1f.6 Signal processing controller [1180]: Intel Corporation 82801I (ICH9 Family) Thermal Subsystem [8086:2932] (rev 02)
        Subsystem: ASUSTeK Computer Inc. Device [1043:8277]
        Flags: fast devsel
        Memory at fed08000 (64-bit, non-prefetchable) [size=4K]
        Capabilities: [50] Power Management version 3

...


and after:

...

00:1f.6 Signal processing controller [1180]: Intel Corporation 82801I (ICH9 Family) Thermal Subsystem [8086:2932] (rev ff) (prog-if ff)
        !!! Unknown header type 7f

...
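Why the "after" entry looks the way it does: lspci takes the revision and prog-if bytes from config offsets 0x08 and 0x09, and the header type from offset 0x0e with the multi-function bit (0x80) masked off. If every config read returns all ones, as the ffffffff values in the log below suggest, those bytes come out as rev ff, prog-if ff and header type 7f. A minimal C sketch of that decoding, assuming the standard little-endian PCI header layout; the struct and function names are illustrative and not taken from pciutils:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes lspci shows from config dword 2 (offset 0x08) and dword 3 (0x0c). */
struct hdr_bytes {
    uint8_t revision;    /* offset 0x08 */
    uint8_t prog_if;     /* offset 0x09 */
    uint8_t header_type; /* offset 0x0e, multi-function bit 0x80 masked off */
};

/* Extract the bytes from two little-endian config dwords. */
static struct hdr_bytes decode_hdr(uint32_t dword2, uint32_t dword3)
{
    struct hdr_bytes h;

    h.revision    = (uint8_t)(dword2 & 0xffu);
    h.prog_if     = (uint8_t)((dword2 >> 8) & 0xffu);
    h.header_type = (uint8_t)((dword3 >> 16) & 0x7fu);
    return h;
}
```

With the saved dword 2 (11800002) and dword 3 (0) from the log it yields rev 02, prog-if 00 and header type 0, matching the "before" output; with all-ones reads it yields ff, ff and 7f.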

dmesg | grep 00:1f.6
[21520.103062] PM: Writing back config space on device 0000:00:1f.6 at offset f (was ffffffff, writing 300)
[21520.103066] PM: Writing back config space on device 0000:00:1f.6 at offset e (was ffffffff, writing 0)
[21520.103070] PM: Writing back config space on device 0000:00:1f.6 at offset d (was ffffffff, writing 50)
[21520.103074] PM: Writing back config space on device 0000:00:1f.6 at offset c (was ffffffff, writing 0)
[21520.103078] PM: Writing back config space on device 0000:00:1f.6 at offset b (was ffffffff, writing 82771043)
[21520.103083] PM: Writing back config space on device 0000:00:1f.6 at offset a (was ffffffff, writing 0)
[21520.103087] PM: Writing back config space on device 0000:00:1f.6 at offset 9 (was ffffffff, writing 0)
[21520.103091] PM: Writing back config space on device 0000:00:1f.6 at offset 8 (was ffffffff, writing 0)
[21520.103095] PM: Writing back config space on device 0000:00:1f.6 at offset 7 (was ffffffff, writing 0)
[21520.103099] PM: Writing back config space on device 0000:00:1f.6 at offset 6 (was ffffffff, writing 0)
[21520.103103] PM: Writing back config space on device 0000:00:1f.6 at offset 5 (was ffffffff, writing 0)
[21520.103107] PM: Writing back config space on device 0000:00:1f.6 at offset 4 (was ffffffff, writing fed08004)
[21520.103111] PM: Writing back config space on device 0000:00:1f.6 at offset 3 (was ffffffff, writing 0)
[21520.103115] PM: Writing back config space on device 0000:00:1f.6 at offset 2 (was ffffffff, writing 11800002)
[21520.103119] PM: Writing back config space on device 0000:00:1f.6 at offset 1 (was ffffffff, writing 100002)
[21520.103123] PM: Writing back config space on device 0000:00:1f.6 at offset 0 (was ffffffff, writing 29328086)

..
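A note on reading these messages: the "offset" values run from f down to 0, which looks like dword indices covering the 64-byte standard header, and every "was" value is ffffffff, so the function was not answering config reads when the write-back ran. Below is a userspace C model, not kernel code, of the top-down compare-and-write-back loop the messages imply; the saved values are copied from the log, and restore_header() is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Standard-header values saved before suspend, copied from the log
 * (index = dword offset; entries not listed are 0). */
static const uint32_t saved_1f6[16] = {
    [0x0] = 0x29328086, /* device 2932, vendor 8086 */
    [0x1] = 0x00100002, /* status 0010 (cap list), command 0002 (mem enable) */
    [0x2] = 0x11800002, /* class 1180, prog-if 00, rev 02 */
    [0x4] = 0xfed08004, /* BAR0: 64-bit memory at fed08000 */
    [0xb] = 0x82771043, /* subsystem 8277, subsystem vendor 1043 */
    [0xd] = 0x00000050, /* capabilities pointer 0x50 */
    [0xf] = 0x00000300, /* interrupt pin 3, interrupt line 0 */
};

/* Walk the 16 dwords from the top down and write back every one that no
 * longer matches the saved copy, printing a line like the kernel's.
 * Returns the number of dwords written. */
static int restore_header(const char *name, const uint32_t saved[16],
                          uint32_t live[16])
{
    int writes = 0;

    assert(name != NULL && saved != NULL && live != NULL);
    for (int i = 15; i >= 0; i--) {
        if (live[i] != saved[i]) {
            printf("PM: Writing back config space on device %s at offset %x "
                   "(was %x, writing %x)\n",
                   name, (unsigned)i, (unsigned)live[i], (unsigned)saved[i]);
            /* Model only: on real hardware this is a config write, which a
             * function that reads back as all ones may silently ignore. */
            live[i] = saved[i];
            writes++;
        }
    }
    return writes;
}
```

Starting from sixteen all-ones dwords the model performs 16 write-backs, one per logged line, and a second pass performs none. On the real machine the open question is whether the unresponsive function accepted any of those writes.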


Please let me know if you need my config, dmesg or any other information.


Best Regards,

Gabriel C



* Re: Some sort corruption of my Thermal Subsystem after suspend to ram
  2008-04-30 20:42 Some sort corruption of my Thermal Subsystem after suspend to ram Gabriel C
@ 2008-04-30 20:46 ` Jay Cliburn
  2008-04-30 20:59   ` Gabriel C
  2008-04-30 21:02   ` Jay Cliburn
  2008-05-01  5:57 ` Len Brown
  1 sibling, 2 replies; 12+ messages in thread
From: Jay Cliburn @ 2008-04-30 20:46 UTC (permalink / raw)
  To: Gabriel C
  Cc: Linux Kernel Mailing List, Len Brown, linux-pm, pavel, Rafael J. Wysocki

Gabriel C wrote:
> Hi all,
> 
> I thought I had found all the issues with this new box, but I was wrong ;)
> 
> On an ASUS P5E-VM DO (4 GB RAM, Q9300 CPU), the Thermal Subsystem gets
> corrupted in some way after suspend to RAM.

Does this board contain an Attansic L1 NIC?




* Re: Some sort corruption of my Thermal Subsystem after suspend to ram
  2008-04-30 20:46 ` Jay Cliburn
@ 2008-04-30 20:59   ` Gabriel C
  2008-04-30 21:02   ` Jay Cliburn
  1 sibling, 0 replies; 12+ messages in thread
From: Gabriel C @ 2008-04-30 20:59 UTC (permalink / raw)
  To: jacliburn
  Cc: Linux Kernel Mailing List, Len Brown, linux-pm, pavel, Rafael J. Wysocki

Jay Cliburn wrote:
> Gabriel C wrote:
>> Hi all,
>>
>> I thought I had found all the issues with this new box, but I was wrong ;)
>>
>> On an ASUS P5E-VM DO (4 GB RAM, Q9300 CPU), the Thermal Subsystem gets
>> corrupted in some way after suspend to RAM.
> 
> Does this board contain an Attansic L1 NIC?

No, it's an Intel 82566DM-2.

Here is the full lspci output:

00:00.0 Host bridge [0600]: Intel Corporation 82Q35 Express DRAM Controller [8086:29b0] (rev 02)
00:02.0 VGA compatible controller [0300]: Intel Corporation 82Q35 Express Integrated Graphics Controller [8086:29b2] (rev 02)
00:03.0 Communication controller [0780]: Intel Corporation 82Q35 Express MEI Controller [8086:29b4] (rev 02)
00:03.2 IDE interface [0101]: Intel Corporation 82Q35 Express PT IDER Controller [8086:29b6] (rev 02)
00:03.3 Serial controller [0700]: Intel Corporation 82Q35 Express Serial KT Controller [8086:29b7] (rev 02)
00:19.0 Ethernet controller [0200]: Intel Corporation 82566DM-2 Gigabit Network Connection [8086:10bd] (rev 02)
00:1a.0 USB Controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 [8086:2937] (rev 02)
00:1a.1 USB Controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 [8086:2938] (rev 02)
00:1a.2 USB Controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 [8086:2939] (rev 02)
00:1a.7 USB Controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 [8086:293c] (rev 02)
00:1b.0 Audio device [0403]: Intel Corporation 82801I (ICH9 Family) HD Audio Controller [8086:293e] (rev 02)
00:1c.0 PCI bridge [0604]: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 [8086:2940] (rev 02)
00:1c.4 PCI bridge [0604]: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 [8086:2948] (rev 02)
00:1d.0 USB Controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 [8086:2934] (rev 02)
00:1d.1 USB Controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 [8086:2935] (rev 02)
00:1d.2 USB Controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 [8086:2936] (rev 02)
00:1d.7 USB Controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 [8086:293a] (rev 02)
00:1e.0 PCI bridge [0604]: Intel Corporation 82801 PCI Bridge [8086:244e] (rev 92)
00:1f.0 ISA bridge [0601]: Intel Corporation 82801IO (ICH9DO) LPC Interface Controller [8086:2914] (rev 02)
00:1f.2 SATA controller [0106]: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller [8086:2922] (rev 02)
00:1f.3 SMBus [0c05]: Intel Corporation 82801I (ICH9 Family) SMBus Controller [8086:2930] (rev 02)
00:1f.6 Signal processing controller [1180]: Intel Corporation 82801I (ICH9 Family) Thermal Subsystem [8086:2932] (rev 02)
01:00.0 IDE interface [0101]: JMicron Technologies, Inc. JMB368 IDE controller [197b:2368]
03:02.0 FireWire (IEEE 1394) [0c00]: Agere Systems FW323 [11c1:5811] (rev 70)




* Re: Some sort corruption of my Thermal Subsystem after suspend to ram
  2008-04-30 20:46 ` Jay Cliburn
  2008-04-30 20:59   ` Gabriel C
@ 2008-04-30 21:02   ` Jay Cliburn
  1 sibling, 0 replies; 12+ messages in thread
From: Jay Cliburn @ 2008-04-30 21:02 UTC (permalink / raw)
  To: jacliburn
  Cc: Gabriel C, Linux Kernel Mailing List, Len Brown, linux-pm, pavel,
	Rafael J. Wysocki

Jay Cliburn wrote:
> Gabriel C wrote:
>> Hi all,
>>
>> I thought I had found all the issues with this new box, but I was wrong ;)
>>
>> On an ASUS P5E-VM DO (4 GB RAM, Q9300 CPU), the Thermal Subsystem gets
>> corrupted in some way after suspend to RAM.
> 
> Does this board contain an Attansic L1 NIC?

Belay that.  I see it's an Intel NIC.

http://www.asus.com/products.aspx?modelmenu=2&model=1849&l1=3&l2=11&l3=571&l4=0

I should've looked first.  Sorry for the noise.


* Re: Some sort corruption of my Thermal Subsystem after suspend to ram
  2008-04-30 20:42 Some sort corruption of my Thermal Subsystem after suspend to ram Gabriel C
  2008-04-30 20:46 ` Jay Cliburn
@ 2008-05-01  5:57 ` Len Brown
  2008-05-01 12:53   ` Matthew Garrett
                     ` (2 more replies)
  1 sibling, 3 replies; 12+ messages in thread
From: Len Brown @ 2008-05-01  5:57 UTC (permalink / raw)
  To: Gabriel C; +Cc: Linux Kernel Mailing List, linux-pm, pavel, Rafael J. Wysocki

The ICH9 apparently provides a pair of memory-mapped on-die thermal sensors.

The sensors basically shut off your hardware if it gets too hot.

ICH9 exports the base address for the sensors via a PCI device: D31:F6, aka Linux 00:1f.6.
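To illustrate how that base address shows up: in Gabriel's log the saved dword at offset 4 is fed08004 and the next dword is 0. Decoded with the standard PCI memory-BAR layout, that is the "Memory at fed08000 (64-bit, non-prefetchable)" line lspci printed before suspend. A small C sketch; decode_mem_bar() and struct mem_bar are illustrative names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct mem_bar {
    uint64_t base;     /* address with the low flag bits cleared */
    bool is_64bit;     /* type field, bits 2:1 == 10b */
    bool prefetchable; /* bit 3 */
};

/* Decode a memory BAR from its low dword and, for a 64-bit BAR, the dword
 * that follows it (offsets 0x10 and 0x14 for BAR0). */
static struct mem_bar decode_mem_bar(uint32_t lo, uint32_t hi)
{
    struct mem_bar b;

    assert((lo & 0x1u) == 0); /* bit 0 set would mean an I/O BAR */
    b.is_64bit = ((lo >> 1) & 0x3u) == 0x2u;
    b.prefetchable = (lo & 0x8u) != 0;
    b.base = (uint64_t)(lo & ~(uint32_t)0xf);
    if (b.is_64bit)
        b.base |= (uint64_t)hi << 32;
    return b;
}
```

Bit 0 distinguishes memory from I/O BARs, bits 2:1 give the width and bit 3 the prefetchable flag; the upper dword only matters for 64-bit BARs.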

I'm not aware of a native Linux device driver that talks to this device
(nor can I think of a useful purpose for such a driver).
So it seems that what is in play here is whatever BIOS code talks to this device,
and Linux's standard PCI config space restore.

My guess is that the BIOS is enabling the device on cold boot,
but not enabling it after resume from S3.
Is there a BIOS SETUP option that controls whether the ICH9 thermal device is enabled?

-Len





* Re: Some sort corruption of my Thermal Subsystem after suspend to ram
  2008-05-01  5:57 ` Len Brown
@ 2008-05-01 12:53   ` Matthew Garrett
  2008-05-01 12:57     ` Rafael J. Wysocki
  2008-05-01 13:19   ` Gabriel C
  2008-05-01 16:09   ` Pavel Machek
  2 siblings, 1 reply; 12+ messages in thread
From: Matthew Garrett @ 2008-05-01 12:53 UTC (permalink / raw)
  To: Len Brown
  Cc: Gabriel C, Linux Kernel Mailing List, linux-pm, pavel, Rafael J. Wysocki

On Thu, May 01, 2008 at 01:57:58AM -0400, Len Brown wrote:

> I'm not aware of a native Linux device driver that talks to this device
> (nor can I think of a useful purpose for such a driver).
> So it seems that what is in play here is whatever BIOS code talks to this device,
> and Linux's standard PCI config space restore.

Perhaps we should be more aggressive about restoring PCI config space if 
there's no driver bound to a device. The alternative in this case would 
seem to be to write a driver for this device that does nothing other 
than handle suspend/resume.

-- 
Matthew Garrett | mjg59@srcf.ucam.org


* Re: Some sort corruption of my Thermal Subsystem after suspend to ram
  2008-05-01 12:53   ` Matthew Garrett
@ 2008-05-01 12:57     ` Rafael J. Wysocki
  2008-05-01 13:06       ` Matthew Garrett
  0 siblings, 1 reply; 12+ messages in thread
From: Rafael J. Wysocki @ 2008-05-01 12:57 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Len Brown, Gabriel C, Linux Kernel Mailing List, linux-pm, pavel

On Thursday, 1 of May 2008, Matthew Garrett wrote:
> On Thu, May 01, 2008 at 01:57:58AM -0400, Len Brown wrote:
> 
> > I'm not aware of a native Linux device driver that talks to this device
> > (nor can I think of a useful purpose for such a driver).
> > So it seems that what is in play here is whatever BIOS code talks to this device,
> > and Linux's standard PCI config space restore.
> 
> Perhaps we should be more aggressive about restoring PCI config space if 
> there's no driver bound to a device. The alternative in this case would 
> seem to be to write a driver for this device that does nothing other 
> than handle suspend/resume.

Well, we have default suspend/resume for PCI devices.  They are called for
devices that have no drivers bound to them and execute
pci_save_state()/pci_restore_state(), among other things.  Isn't that
sufficient?

Rafael


* Re: Some sort corruption of my Thermal Subsystem after suspend to ram
  2008-05-01 12:57     ` Rafael J. Wysocki
@ 2008-05-01 13:06       ` Matthew Garrett
  2008-05-01 17:14         ` Rafael J. Wysocki
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Garrett @ 2008-05-01 13:06 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Len Brown, Gabriel C, Linux Kernel Mailing List, linux-pm, pavel

On Thu, May 01, 2008 at 02:57:49PM +0200, Rafael J. Wysocki wrote:
> On Thursday, 1 of May 2008, Matthew Garrett wrote:
> > Perhaps we should be more aggressive about restoring PCI config space if 
> > there's no driver bound to a device. The alternative in this case would 
> > seem to be to write a driver for this device that does nothing other 
> > than handle suspend/resume.
> 
> Well, we have default suspend/resume for PCI devices.  They are called for
> devices that have no drivers bound to them and execute
> pci_save_state()/pci_restore_state(), among other things.  Isn't that
> sufficient?

That only saves the "standard" registers, not the rest of config space.
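To make the distinction concrete: the write-back messages in Gabriel's report cover dwords 0 to f, i.e. the first 64 bytes, while a conventional PCI function has 256 bytes of config space. A hedged userspace sketch of a fuller snapshot and restore, assuming the Linux sysfs file /sys/bus/pci/devices/<domain:bus:dev.fn>/config; snapshot_config() and restore_config() are hypothetical helpers, not kernel or pciutils APIs. It is not what the kernel's own save/restore does, writing config space behind the kernel's back is risky, and it would not help if the function simply stays unresponsive after resume.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to cap bytes of a config-space image, starting at offset 0.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t snapshot_config(const char *path, uint8_t *buf, size_t cap)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror(path);
        return -1;
    }
    ssize_t n = pread(fd, buf, cap, 0);
    close(fd);
    return n;
}

/* Write back bytes [from, len) of a saved image, e.g. from = 0x40 to touch
 * only the area beyond the 64-byte standard header.
 * Returns the number of bytes written, or -1 on error. */
static ssize_t restore_config(const char *path, const uint8_t *buf,
                              size_t len, size_t from)
{
    assert(from <= len);
    int fd = open(path, O_WRONLY);
    if (fd < 0) {
        perror(path);
        return -1;
    }
    ssize_t n = pwrite(fd, buf + from, len - from, (off_t)from);
    close(fd);
    return n;
}
```

For this device one would snapshot /sys/bus/pci/devices/0000:00:1f.6/config as root before suspending and call restore_config(..., 0x40) after resume. Unprivileged readers only get the first 64 bytes of that file.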

-- 
Matthew Garrett | mjg59@srcf.ucam.org


* Re: Some sort corruption of my Thermal Subsystem after suspend to ram
  2008-05-01  5:57 ` Len Brown
  2008-05-01 12:53   ` Matthew Garrett
@ 2008-05-01 13:19   ` Gabriel C
  2008-05-01 16:09   ` Pavel Machek
  2 siblings, 0 replies; 12+ messages in thread
From: Gabriel C @ 2008-05-01 13:19 UTC (permalink / raw)
  To: Len Brown; +Cc: Linux Kernel Mailing List, linux-pm, pavel, Rafael J. Wysocki

Len Brown wrote:
> The ICH9 apparently provides a pair of memory-mapped on-die thermal sensors.
> 
> The sensors basically shut off your hardware if it gets too hot.
> 
> ICH9 exports the base address for the sensors via a PCI device: D31:F6, aka Linux 00:1f.6.
> 
> I'm not aware of a native Linux device driver that talks to this device
> (nor can I think of a useful purpose for such a driver).
> So it seems that what is in play here is whatever BIOS code talks to this device,
> and Linux's standard PCI config space restore.
> 
> My guess is that the BIOS is enabling the device on cold boot,
> but not enabling it after resume from S3.
> Is there a BIOS SETUP option that controls whether the ICH9 thermal device is enabled?

I have some options for the ASUS hardware monitoring where I can set the fan profile and see the fan speeds,
but I cannot find anything that enables or disables the thermal device as a whole.

There is an option to ignore some fans, but that's it.



* Re: Some sort corruption of my Thermal Subsystem after suspend to ram
  2008-05-01  5:57 ` Len Brown
  2008-05-01 12:53   ` Matthew Garrett
  2008-05-01 13:19   ` Gabriel C
@ 2008-05-01 16:09   ` Pavel Machek
  2008-05-01 16:30     ` Alan
  2 siblings, 1 reply; 12+ messages in thread
From: Pavel Machek @ 2008-05-01 16:09 UTC (permalink / raw)
  To: Len Brown
  Cc: Gabriel C, Linux Kernel Mailing List, linux-pm, Rafael J. Wysocki

Hi!

> The ICH9 apparently provides a pair of memory-mapped on-die thermal sensors.
> 
> The sensors basically shut off your hardware if it gets too hot.

Are the sensors driven by ACPI? Probably yes; otherwise they would be
unable to shut down the system.

> ICH9 exports the base address for the sensors via a PCI device: D31:F6, aka Linux 00:1f.6.
> 
> I'm not aware of a native Linux device driver that talks to this device
> (nor can I think of a useful purpose for such a driver).
> So it seems that what is in play here is whatever BIOS code talks to this device,
> and Linux's standard PCI config space restore.
> 
> My guess is that the BIOS is enabling the device on cold boot,
> but not enabling it after resume from S3.

That sounds like a BIOS problem, right? If the piece of hardware is
driven by BIOS, Linux can't be responsible for saving/restoring it.
 									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: Some sort corruption of my Thermal Subsystem after suspend to  ram
  2008-05-01 16:09   ` Pavel Machek
@ 2008-05-01 16:30     ` Alan
  0 siblings, 0 replies; 12+ messages in thread
From: Alan @ 2008-05-01 16:30 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Len Brown, Gabriel C, Linux Kernel Mailing List, linux-pm,
	Rafael J. Wysocki

> Hi!
>
>> The ICH9 apparently provides a pair of memory-mapped on-die thermal
>> sensors.
>>
>> The sensors basically shut off your hardware if it gets too hot.
>
> Are the sensors driven by ACPI? Probably yes; otherwise they would be
> unable to shut down the system.
>
>> ICH9 exports the base address for the sensors via a PCI device:
>> D31:F6, aka Linux 00:1f.6.
>>
>> I'm not aware of a native Linux device driver that talks to this device
>> (nor can I think of a useful purpose for such a driver).
>> So it seems that what is in play here is whatever BIOS code talks to
>> this device, and Linux's standard PCI config space restore.
>>
>> My guess is that the BIOS is enabling the device on cold boot,
>> but not enabling it after resume from S3.
>
> That sounds like a BIOS problem, right? If the piece of hardware is
> driven by BIOS, Linux can't be responsible for saving/restoring it.

I was having a similar problem on my old laptop. It seemed a lot more
sensitive to the problem on later kernels.

What was actually happening is that the last time the laptop went in for
repairs they did not seat the heat sink correctly.  It slowly worked
itself loose.  After regreasing it with heat sink grease (not a heat sink
pad, but actual silver grease) and reseating it and tightening it down,
the machine worked fine.  (I also found some nasty tool marks in the soft
metal next to the heat sink that show that whoever did it slipped at
least once.)

Sometimes hardware problems increase slowly.



* Re: Some sort corruption of my Thermal Subsystem after suspend to ram
  2008-05-01 13:06       ` Matthew Garrett
@ 2008-05-01 17:14         ` Rafael J. Wysocki
  0 siblings, 0 replies; 12+ messages in thread
From: Rafael J. Wysocki @ 2008-05-01 17:14 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Len Brown, Gabriel C, Linux Kernel Mailing List, linux-pm, pavel

On Thursday, 1 of May 2008, Matthew Garrett wrote:
> On Thu, May 01, 2008 at 02:57:49PM +0200, Rafael J. Wysocki wrote:
> > On Thursday, 1 of May 2008, Matthew Garrett wrote:
> > > Perhaps we should be more aggressive about restoring PCI config space if 
> > > there's no driver bound to a device. The alternative in this case would 
> > > seem to be to write a driver for this device that does nothing other 
> > > than handle suspend/resume.
> > 
> > Well, we have default suspend/resume for PCI devices.  They are called for
> > devices that have no drivers bound to them and execute
> > pci_save_state()/pci_restore_state(), among other things.  Isn't that
> > sufficient?
> 
> That only saves the "standard" registers, not the rest of config space.

Hm, in that case I'd probably opt for writing a special driver for this
particular device.

Thanks,
Rafael



Thread overview: 12+ messages
2008-04-30 20:42 Some sort corruption of my Thermal Subsystem after suspend to ram Gabriel C
2008-04-30 20:46 ` Jay Cliburn
2008-04-30 20:59   ` Gabriel C
2008-04-30 21:02   ` Jay Cliburn
2008-05-01  5:57 ` Len Brown
2008-05-01 12:53   ` Matthew Garrett
2008-05-01 12:57     ` Rafael J. Wysocki
2008-05-01 13:06       ` Matthew Garrett
2008-05-01 17:14         ` Rafael J. Wysocki
2008-05-01 13:19   ` Gabriel C
2008-05-01 16:09   ` Pavel Machek
2008-05-01 16:30     ` Alan
