From: "Devriendt, Paul"
Subject: RE: powernow-k8 and stuck change-pending bit
Date: Tue, 7 Jun 2005 00:07:22 -0500
Message-ID: <84EA05E2CA77634C82730353CBE3A843028F4CC5@SAUSEXMB1.amd.com>
To: John Belmonte, cpufreq@lists.linux.org.uk

> symptoms:
>
> After working fine for a while (several minutes to several hours),
> cpufreq seems to get into a bad state where:
>
> * change pending bit set / stuck kernel messages appear
>
> * reloading powernow-k8 module fails, with "change
>   pending bit stuck"

I have seen a very small number of reports of this over the past
couple of years. Perhaps 3 or 4. I can tell you what is happening,
but it may not be much help.

The processor has to communicate externally during a frequency or a
voltage change. On a frequency change, all devices on the
HyperTransport fabric will see an LDT_STOP. On a voltage change, the
new vid (voltage code) has to be driven out to the external VRM
(voltage regulator module), which actually supplies the correct
voltage. Obviously, this external activity takes a little while.

The software interface to this is via a register, and there is a bit
to warn the software that a change is in progress. This is the pending
bit.

A change should never fail to complete. It can fail if the change
request was invalid, but it should not fail to complete.

The driver is written to time out (i.e. not hang the system) if the
pending bit should ever fail to clear. It is also written to never
attempt to make a change if the pending bit is set. This is what you
are seeing.
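
The two driver rules above (time out instead of hanging, and refuse a
new change while the pending bit is set) can be sketched roughly as
below. This is a minimal user-space simulation, not the powernow-k8
code: struct sim_cpu, read_status(), the CHANGE_PENDING bit position
and the MAX_POLLS budget are all made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for illustration; not the real register layout. */
#define CHANGE_PENDING (1u << 31)  /* assumed position of the pending bit */
#define MAX_POLLS      10000       /* assumed poll budget before giving up */

/* Simulated processor state standing in for the hardware register. */
struct sim_cpu {
    uint32_t status;       /* simulated status register */
    int polls_until_done;  /* reads left before pending clears; negative = never */
};

/* Simulated register read: each read advances the pretend hardware one step. */
static uint32_t read_status(struct sim_cpu *cpu)
{
    if ((cpu->status & CHANGE_PENDING) && cpu->polls_until_done >= 0) {
        if (cpu->polls_until_done == 0)
            cpu->status &= ~CHANGE_PENDING;  /* external activity finished */
        else
            cpu->polls_until_done--;
    }
    return cpu->status;
}

/* Poll until the pending bit clears; give up after MAX_POLLS reads. */
static bool wait_not_pending(struct sim_cpu *cpu)
{
    for (int i = 0; i < MAX_POLLS; i++) {
        if (!(read_status(cpu) & CHANGE_PENDING))
            return true;
    }
    return false;  /* timed out: the bit is stuck, but we did not hang */
}

/*
 * Request a transition that takes `latency` reads to complete
 * (negative = the simulated VRM never finishes).
 * Returns 0 on success, -1 if the pending bit is stuck before or after.
 */
static int request_change(struct sim_cpu *cpu, int latency)
{
    if (!wait_not_pending(cpu))
        return -1;  /* never start a change while one is still pending */

    cpu->status |= CHANGE_PENDING;  /* hardware flags the change in progress */
    cpu->polls_until_done = latency;

    return wait_not_pending(cpu) ? 0 : -1;
}
```

In this sketch, a call such as request_change(&cpu, -1) models a
regulator that never finishes: the call times out, the bit stays set,
and every later request is refused, which matches the reported
behaviour on module reload.
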
A transition was attempted, it never completed, and the driver is
not happy about it.

The most likely scenario here is that the VRM was unable to change
voltage, which points to flaky hardware, since it was previously able
to do so successfully. The reason I pick on the VRM and not some sort
of issue with the HyperTransport fabric is that it is unlikely
anything would still be alive if there was some sort of problem
coming back from the LDT_STOP.

Assuming the VRM continues to supply the current voltage within
spec, the system will stay up indefinitely, at the current voltage
and frequency.

The best path to follow would be to investigate warranty
replacement of the board.

Paul.