linux-kernel.vger.kernel.org archive mirror
* Linux 2.4.20 RTC Timer bug
@ 2003-07-16 20:31 Richard B. Johnson
  2003-07-16 21:18 ` Willy Tarreau
  0 siblings, 1 reply; 5+ messages in thread
From: Richard B. Johnson @ 2003-07-16 20:31 UTC
  To: Linux kernel


#if 0

In Linux 2.4.20, some rogue is incrementing the value
in register 0x71 at about 1 second intervals!  This
port is the index register for the CMOS timer chip
and it must be left alone when the chip is not being
accessed and it must be left at an unused offset,
typically 0xff, after access. This is to prevent
destruction of CMOS data during the power-down
transient.

This code clearly shows that somebody is mucking with
this chip. Here, I have reviewed the only drivers
installed, SCSI and network, and have not found anybody
messing with this chip so I think it must be something
in the kernel proper. The numbers increase at 1 second
intervals from 0 to 89 and then restart. This shows that
it is not residual from the system clock code that will
read only the timer registers.

These are the only modules installed...

Module                  Size  Used by
ipchains               41400   7
3c59x                  28224   1  (autoclean)
nls_cp437               4376   4  (autoclean)
BusLogic               35768   7
sd_mod                 10184  14
scsi_mod               54572   2  [BusLogic sd_mod]

This running of the CMOS timer index register is the
reason why the CMOS checksum and parameters are being
lost on several systems that run 2.4.20. If any of
these systems are turned off when the index register
points to checksummed data, the byte at that location
will get smashed to whatever is on the bus when the
power fails. To non-believers, note that the chip-select
goes low to enable ... and that's what a power failure
does ... goes low, while internally, the chip still has
power from its battery.

#endif

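#include <stdio.h>
#include <unistd.h>	/* usleep() */
#include <sys/io.h>	/* iopl(), inb(): glibc, x86 only */

int main(void)
{
    /* Requires root (CAP_SYS_RAWIO): raise the I/O privilege level so
     * this process may execute in/out instructions directly. */
    if (iopl(3) < 0) {
        perror("iopl");
        return 1;
    }

    /* Sample port 0x71 ten times per second and print each byte in decimal. */
    for (;;)
    {
        printf("%d\n", inb(0x71));
        fflush(stdout);
        usleep(100000);
    }
}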


Cheers,
Dick Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
            Note 96.3% of all statistics are fiction.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Linux 2.4.20 RTC Timer bug
  2003-07-16 20:31 Linux 2.4.20 RTC Timer bug Richard B. Johnson
@ 2003-07-16 21:18 ` Willy Tarreau
  2003-07-17 11:25   ` Richard B. Johnson
  0 siblings, 1 reply; 5+ messages in thread
From: Willy Tarreau @ 2003-07-16 21:18 UTC
  To: Richard B. Johnson; +Cc: Linux kernel

Dick,

0x71 is the DATA register ! The thing that modifies what you read from it
is the RTC clock itself, because seconds are stored at index 0x00 IIRC, which
is often assumed if you read without writing.

maybe it's time to go to bed ? :-)

Cheers,
Willy



* Re: Linux 2.4.20 RTC Timer bug
  2003-07-16 21:18 ` Willy Tarreau
@ 2003-07-17 11:25   ` Richard B. Johnson
  2003-07-17 19:17     ` Willy Tarreau
  0 siblings, 1 reply; 5+ messages in thread
From: Richard B. Johnson @ 2003-07-17 11:25 UTC
  To: Willy Tarreau; +Cc: Linux kernel

On Wed, 16 Jul 2003, Willy Tarreau wrote:

> Dick,
>
> 0x71 is the DATA register ! The thing that modifies what you read from it
> is the RTC clock itself, because seconds are stored at index 0x00 IIRC, which
> is often assumed if you read without writing.
>
> maybe it's time to go to bed ? :-)
>
> Cheers,
> Willy

It's so easy to kill the messenger instead of finding the problem.
Most modern RTC emulations will return 0xff when you read the index
register at 0x70 because it's a write-only register. Therefore, to
discover what it has been set to, one must read the data register at
0x71. If it increments at one-second intervals from 0 to 59 (BCD)
(change the "%d" to "%x" to read BCD within that range), then
the index register has been left at 0. This is okay except that the
time may get trashed upon power off.

In machines tested here, running linux-2.4.20, the value read from
0x71 increments from 0 to 99 with a few missing codes in-between so
it's not possible to guess what it's been set to, maybe the
'B' register (status), then something else. That something else
is the killer.

When the power fails, almost all systems running Linux will fail to
restart because of CMOS corruption. You can easily check. Run linux,
`init 1`, dismount drives, then pull the plug. Don't use the
front panel power switch because, again, modern power supplies
protect devices during 'normal' shutdown by using the reset
circuitry.


If you use the same motherboard, but leave it in CMOS setup so
Linux isn't running, you can pull the plug and never cause
corruption.



Cheers,
Dick Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.


* Re: Linux 2.4.20 RTC Timer bug
  2003-07-17 11:25   ` Richard B. Johnson
@ 2003-07-17 19:17     ` Willy Tarreau
       [not found]       ` <Pine.LNX.4.53.0307171624510.12702@chaos>
  0 siblings, 1 reply; 5+ messages in thread
From: Willy Tarreau @ 2003-07-17 19:17 UTC
  To: Richard B. Johnson; +Cc: Willy Tarreau, Linux kernel

Hi Dick,

On Thu, Jul 17, 2003 at 07:25:39AM -0400, Richard B. Johnson wrote:
> On Wed, 16 Jul 2003, Willy Tarreau wrote:
> 
> > Dick,
> >
> > 0x71 is the DATA register ! The thing that modifies what you read from it
> > is the RTC clock itself, because seconds are stored at index 0x00 IIRC, which
> > is often assumed if you read without writing.
> >
> > maybe it's time to go to bed ? :-)
> >
> > Cheers,
> > Willy
> 
> It's so easy to kill the messenger instead of finding the problem.

My apologies Dick, I didn't notice your mention of the range (0 to 89) in your
original mail. I agree with you, it cannot (or at least, should not) be the
clock, so there's certainly something playing with it.

> Most modern RTC emulations will return 0xff when you read the index
> register at 0x70 because it's a write-only register. Therefore, to
> discover what it has been set to, one must read the data register at
> 0x71. If it increments at one-second intervals from 0 to 59 (BCD)
> (change the "%d" to "%x" to read BCD within that range), then
> the index register has been left at 0. This is okay except that the
> time may get trashed upon power off.

I agree. This reminds me of two (broken?) clocks I encountered about 4 years
ago. One of them would increment hours up to 99 if you set it by hand to
something bigger than 23, and after that, it stuck at 99. This clearly shows
the event-driven mechanism which jumps to zero if it changes to 24! The other
one was funnier: it would cycle within the tens digit you initialized it to (it
could only increment the tens digit from 0 to 1, then 2). So if you initialized
it to 35, it would run forever from 30 to 39, then back to 30. And if you set it
to 25, it would run up to 29, then jump to 20 and get back to something normal.

I don't remember if I played with seconds, though.

> In machines tested here, running linux-2.4.20, the value read from
> 0x71 increments from 0 to 99 with a few missing codes in-between so
> it's not possible to guess what it's been set to, maybe the
> 'B' register (status), then something else. That something else
> is the killer.

just a silly question : have you tried within vmware, or on other hardware ?

> When the power fails, most all systems running Linux will fail to
> restart because of CMOS corruption. You can easily check. Run linux,
> `init 1`, dismount drives, then pull the plug. Don't use the
> front panel power switch because, again, modern power supplies
> protect devices during 'normal' shutdown by using the reset
> circuitry.

I never noticed, but will probably try harder. I'm interested in such problems
because I use PC-based semi-embedded boxes at work. If this is the case, it
clearly shows that Linux continually modifies checksummed portions of the CMOS.

BTW, while I was playing with the hours>24, I noticed that both DOS and
Windows95 got a divide error during boot under such conditions. That's what led
me to the real problem in fact, because only Linux booted OK and I was
beginning to scratch my head a lot.

Cheers,
Willy



* Re: Linux 2.4.20 RTC Timer bug
       [not found]         ` <20030718043129.GA3229@alpha.home.local>
@ 2003-07-18 12:39           ` Richard B. Johnson
  0 siblings, 0 replies; 5+ messages in thread
From: Richard B. Johnson @ 2003-07-18 12:39 UTC
  To: Willy Tarreau; +Cc: Linux kernel

On Fri, 18 Jul 2003, Willy Tarreau wrote:

> On Thu, Jul 17, 2003 at 04:38:05PM -0400, Richard B. Johnson wrote:
>
[SNIPPED...]

> > Or leaves the index register somewhere that a power-fail sequence will
> > trash a checksummed register.
>
> yes, that's possible too.
>
>
>
> Is it in the spec that it has to be set to 0xFF, or did you choose this
> value because it's outside the checksummed area? On older chips, it will be
> equal to 0x3F (0xBF if we include NMI) and on newer ones, 0x7F. If the BIOS
> uses these bytes to write the checksum of the advanced setup, it might be
> dangerous. I would personally prefer to set it to something not checked, such
> as the alarm or status registers (unless the chip treats 0xFF as a special case).

Well, yes. It probably should be left at status register 'D'.  Here's
a patch. Let's see if the Linux folks accept it....

Problem:
Power failure can cause CMOS checksum corruption and failure to boot.
Patch (tested -- running now):


--- linux-2.4.20/include/asm-i386/mc146818rtc.h.orig	Thu Jul 17 15:06:33 2003
+++ linux-2.4.20/include/asm-i386/mc146818rtc.h	Fri Jul 18 08:01:58 2003
@@ -10,18 +10,28 @@
 #define RTC_PORT(x)	(0x70 + (x))
 #define RTC_ALWAYS_BCD	1	/* RTC operates in binary mode */
 #endif
+#define RTC_FAILSAFE  0x0d
+#define NMI_ENABLE    0x80

 /*
  * The yet supported machines all access the RTC index register via
  * an ISA port access but the way to access the date register differs ...
+ * Modified 18-JUL-2003
+ * Willy Tarreau <willy@w.ods.org>, Richard Johnson <rjohnson@analogic.com>
+ * Leave index register at status register D so a power failure doesn't
+ * corrupt checksummed CMOS. Keep NMI bit enabled (in case it really exists).
  */
 #define CMOS_READ(addr) ({ \
-outb_p((addr),RTC_PORT(0)); \
-inb_p(RTC_PORT(1)); \
+int __val; \
+outb_p((addr)|NMI_ENABLE,RTC_PORT(0)); \
+__val = inb_p(RTC_PORT(1)); \
+outb_p(RTC_FAILSAFE|NMI_ENABLE, RTC_PORT(0)); \
+__val; \
 })
 #define CMOS_WRITE(val, addr) ({ \
-outb_p((addr),RTC_PORT(0)); \
+outb_p((addr)|NMI_ENABLE,RTC_PORT(0)); \
 outb_p((val),RTC_PORT(1)); \
+outb_p(RTC_FAILSAFE|NMI_ENABLE, RTC_PORT(0)); \
 })

 #define RTC_IRQ 8




Cheers,
Dick Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.



Thread overview: 5+ messages
2003-07-16 20:31 Linux 2.4.20 RTC Timer bug Richard B. Johnson
2003-07-16 21:18 ` Willy Tarreau
2003-07-17 11:25   ` Richard B. Johnson
2003-07-17 19:17     ` Willy Tarreau
     [not found]       ` <Pine.LNX.4.53.0307171624510.12702@chaos>
     [not found]         ` <20030718043129.GA3229@alpha.home.local>
2003-07-18 12:39           ` Richard B. Johnson
