* Re: reading /proc/stat segfaults after long uptimes
From: Martin Schwidefsky @ 2009-10-20 8:00 UTC (permalink / raw)
To: linux-s390
On Tue, 20 Oct 2009 09:13:15 +0200
Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:
> On Mon, 19 Oct 2009 00:17:30 +0200
> Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:
>
> > On Sun, 18 Oct 2009 05:35:29 -0400
> > Mike Frysinger <vapier@gentoo.org> wrote:
> >
> > > this bug has been around for as long as i can remember (before 2.6.16.x), and
> > > it is still in 2.6.27.10. i'm upgrading to 2.6.31.4 now, but it'll be a while
> > > before i can report back since the bug doesn't manifest itself for a long time.
> > > current uptime is ~3 months.
> > >
> > > $ cat /proc/stat
> > > Segmentation fault
> > >
> > > $ dmesg
> > > ------------[ cut here ]------------
> > > Kernel BUG at 001b0c92 [verbose debug info unavailable]
> > > fixpoint divide exception: 0009 [#13] SMP
> > > Modules linked in: ipv6
> > > CPU: 1 Tainted: G D 2.6.27.10 #5
> > > Process cat (pid: 21352, task: 1fb34138, ksp: 1d2a3d98)
> > > Krnl PSW : 070c2000 801b0c92 (show_stat+0x2ca/0x68c)
> > > R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0
> > > Krnl GPRS: 00000001 00001388 00000bb8 0015d2a1
> > > 00000000 00000000 000003e8 0001fd91
> > > 00000000 00000000 0000129d eecd2ff0
> > > 1cc533b9 0036f780 801b0bce 1d2a3cc0
> > > Krnl Code: 801b0c86: f18890abf198 mvo 171(9,%r9),408(9,%r15)
> > > 801b0c8c: 98abf170 lm %r10,%r11,368(%r15)
> > > 801b0c90: 1da1 dr %r10,%r1
> > > >801b0c92: 90abf170 stm %r10,%r11,368(%r15)
> > > 801b0c96: 98abf190 lm %r10,%r11,400(%r15)
> > > 801b0c9a: 1da1 dr %r10,%r1
> > > 801b0c9c: 90abf190 stm %r10,%r11,400(%r15)
> > > 801b0ca0: 18a3 lr %r10,%r3
> > > Call Trace:
> > > ([<00000000001b09f4>] show_stat+0x2c/0x68c)
> > > [<000000000018dcee>] seq_read+0xb2/0x364
> > > [<00000000001a9980>] proc_reg_read+0x68/0x98
> > > [<00000000001705ee>] vfs_read+0x6e/0xe8
> > > [<0000000000170732>] sys_read+0x36/0x78
> > > [<000000000010f750>] sysc_do_restart+0x12/0x16
> > > [<0000000077f3ad6a>] 0x77f3ad6a
> > > <4>---[ end trace 1436ea9559d3de9e ]---
> > >
> > > i'm making sure to enable verbose debug this time in case the bug comes up
> > > again, but perhaps someone can ninja this out before
> >
> > The dr %r10,%r1 got a divide exception because the division result does
> > not fit into a 32-bit register. This does not happen on 64 bit because
> > the target register is larger. It is probably the idle time conversion
> > to ticks. I'll have a look.
>
> The bug should show up on a completely idle machine after 49.7 days.
> cputime64_to_clock_t is broken. I'll post a patch shortly.
This patch should fix it:
--
Subject: [PATCH] cputime: fix overflow on 31 bit systems
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
The cputime_to_msecs, cputime_to_clock_t and cputime64_to_clock_t
functions cause fixpoint divide exceptions if the cputime is too large.
On a machine that has collected 49.7 days' worth of idle time, reading
from /proc/stat will generate oopses like this:
Kernel BUG at 001b0c92 [verbose debug info unavailable]
fixpoint divide exception: 0009 [#13] SMP
Modules linked in: ipv6
CPU: 1 Tainted: G D 2.6.27.10 #5
Process cat (pid: 21352, task: 1fb34138, ksp: 1d2a3d98)
Krnl PSW : 070c2000 801b0c92 (show_stat+0x2ca/0x68c)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0
Krnl GPRS: 00000001 00001388 00000bb8 0015d2a1
00000000 00000000 000003e8 0001fd91
00000000 00000000 0000129d eecd2ff0
1cc533b9 0036f780 801b0bce 1d2a3cc0
Krnl Code: 801b0c86: f18890abf198 mvo 171(9,%r9),408(9,%r15)
801b0c8c: 98abf170 lm %r10,%r11,368(%r15)
801b0c90: 1da1 dr %r10,%r1
>801b0c92: 90abf170 stm %r10,%r11,368(%r15)
801b0c96: 98abf190 lm %r10,%r11,400(%r15)
801b0c9a: 1da1 dr %r10,%r1
801b0c9c: 90abf190 stm %r10,%r11,400(%r15)
801b0ca0: 18a3 lr %r10,%r3
Call Trace:
([<00000000001b09f4>] show_stat+0x2c/0x68c)
[<000000000018dcee>] seq_read+0xb2/0x364
[<00000000001a9980>] proc_reg_read+0x68/0x98
[<00000000001705ee>] vfs_read+0x6e/0xe8
[<0000000000170732>] sys_read+0x36/0x78
[<000000000010f750>] sysc_do_restart+0x12/0x16
[<0000000077f3ad6a>] 0x77f3ad6a
<4>---[ end trace 1436ea9559d3de9e ]---
Reported-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
arch/s390/include/asm/cputime.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff -urpN linux-2.6/arch/s390/include/asm/cputime.h linux-2.6-patched/arch/s390/include/asm/cputime.h
--- linux-2.6/arch/s390/include/asm/cputime.h 2009-10-20 09:48:48.000000000 +0200
+++ linux-2.6-patched/arch/s390/include/asm/cputime.h 2009-10-20 09:48:56.000000000 +0200
@@ -78,7 +78,7 @@ cputime64_to_jiffies64(cputime64_t cputi
 static inline unsigned int
 cputime_to_msecs(const cputime_t cputime)
 {
-	return __div(cputime, 4096000);
+	return cputime_div(cputime, 4096000);
 }
 
 static inline cputime_t
@@ -160,7 +160,7 @@ cputime_to_timeval(const cputime_t cputi
 static inline clock_t
 cputime_to_clock_t(cputime_t cputime)
 {
-	return __div(cputime, 4096000000ULL / USER_HZ);
+	return cputime_div(cputime, 4096000000ULL / USER_HZ);
 }
 
 static inline cputime_t
@@ -175,7 +175,7 @@ clock_t_to_cputime(unsigned long x)
 static inline clock_t
 cputime64_to_clock_t(cputime64_t cputime)
 {
-	return __div(cputime, 4096000000ULL / USER_HZ);
+	return cputime_div(cputime, 4096000000ULL / USER_HZ);
 }
 
 struct s390_idle_data {
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
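
Editor's sketch, not part of the original mail: the failure described above is a 64-bit dividend divided by a 32-bit divisor where the quotient no longer fits the 32-bit result register. The portable C below only models that condition; it is not the kernel's __div or cputime_div. The names div64_by_32_checked and days_until_msec_overflow are hypothetical, and the unsigned 32-bit limit is an assumption (the exact limit of the real instruction sequence may differ). The 4096000 divisor is taken from cputime_to_msecs in the patch, and the 49.7-day figure is consistent with 2^32 milliseconds.

```c
#include <stdint.h>

/*
 * Hypothetical model, not kernel code: divide a 64-bit value by a 32-bit
 * divisor and require the quotient to fit in 32 bits. Returns 0 on
 * success, -1 where a hardware 64-by-32 divide would raise an exception.
 */
static int div64_by_32_checked(uint64_t dividend, uint32_t divisor,
                               uint32_t *quotient)
{
    uint64_t q;

    if (divisor == 0)
        return -1;
    q = dividend / divisor;
    if (q > UINT32_MAX)
        return -1;              /* quotient too large for 32 bits */
    *quotient = (uint32_t)q;
    return 0;
}

/*
 * Days of accumulated time after which a millisecond counter reaches
 * 2^32 (assumed unsigned 32-bit limit).
 */
static double days_until_msec_overflow(void)
{
    return 4294967296.0 / 1000.0 / 86400.0;
}
```

Under these assumptions, a dividend of UINT32_MAX * 4096000 still divides cleanly by 4096000, while one millisecond more makes the helper return -1, and days_until_msec_overflow() evaluates to roughly 49.7.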