* [PATCH] fix oops in updating thread cputime and task time
@ 2012-03-16 16:29 Fawzi Mohamed
2012-04-21 7:23 ` Fawzi Mohamed
0 siblings, 1 reply; 3+ messages in thread
From: Fawzi Mohamed @ 2012-03-16 16:29 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Thomas Dargel, Hillf Danton
We have a machine with a (back ported) fix working without problems since a few days.
We don´t consider it
tested-by: Thomas Dargel <td@no-spam.hu-berlin.de>
because last time it did need 19 days of computation to happen, but it should be fixed.
Fawzi and Thomas.
>From fa4dccbaabaa1e08f3ca1835cc1b40e34a87fcdb Mon Sep 17 00:00:00 2001
From: Fawzi Mohamed <fawzi@gmx.ch>
Date: Mon, 12 Mar 2012 23:10:57 +0100
Subject: [PATCH] fix oops in updating thread cputime and task time
use div64_u64 instead of do_div to divide cputime_t with each other
because if cputime_t was u64 it would crash when overflowing u32.
KERNEL: ./vmlinux-2.6.32.54-0.3-default.gz
DEBUGINFO: .//vmlinux-2.6.32.54-0.3-default.debug
DUMPFILE: ./vmcore
CPUS: 24
DATE: Mon Mar 5 00:58:10 2012
UPTIME: 19 days, 11:01:56
LOAD AVERAGE: 0.74, 0.42, 0.40
TASKS: 731
NODENAME: node65
RELEASE: 2.6.32.54-0.3-default
VERSION: #1 SMP 2012-01-27 17:38:56 +0100
MACHINE: x86_64 (3067 Mhz)
MEMORY: 96 GB
PANIC: ""
PID: 11234
COMMAND: "ricc2_smp"
TASK: ffff881813de2240 [THREAD_INFO: ffff8818153c2000]
CPU: 8
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 11234 TASK: ffff881813de2240 CPU: 8 COMMAND: "ricc2_smp"
RIP: 00007f5e4f418617 RSP: 00007f5e4dd3ebd0 RFLAGS: 00000202
RAX: 0000000000000062 RBX: ffffffff81002f7b RCX: 0000000000000179
RDX: 00007f5e4dd3edb4 RSI: 00007f5e4dd3ec70 RDI: 0000000000000000
RBP: 00007f5e4dd3ecf0 R8: 0000000000000001 R9: 00007f5e4dd3edc4
R10: 0000000000000001 R11: 0000000000000246 R12: ffffffff8105eb81
R13: 00007f5e4dd3ed30 R14: 00007f5e4dd3f2f0 R15: 0000000000000000
ORIG_RAX: 0000000000000062 CS: 0033 SS: 002b
Basically the same problem was found earlier in kernel vmlinux-2.6.32.12-0.7-default.gz.
Signed-off-by: Fawzi Mohamed <fawzi@gmx.ch>
Signed-off-by: Thomas Dargel <td@no-spam.hu-berlin.de>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
kernel/sched/core.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b342f57..f2cc092 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2934,7 +2934,7 @@ void task_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
u64 temp = (__force u64) rtime;
temp *= (__force u64) utime;
- do_div(temp, (__force u32) total);
+ temp = div64_u64(temp, total);
utime = (__force cputime_t) temp;
} else
utime = rtime;
@@ -2967,7 +2967,7 @@ void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
u64 temp = (__force u64) rtime;
temp *= (__force u64) cputime.utime;
- do_div(temp, (__force u32) total);
+ temp = div64_u64(temp, total);
utime = (__force cputime_t) temp;
} else
utime = rtime;
--
1.7.0.4
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] fix oops in updating thread cputime and task time
2012-03-16 16:29 [PATCH] fix oops in updating thread cputime and task time Fawzi Mohamed
@ 2012-04-21 7:23 ` Fawzi Mohamed
2012-04-25 12:10 ` Peter Zijlstra
0 siblings, 1 reply; 3+ messages in thread
From: Fawzi Mohamed @ 2012-04-21 7:23 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Thomas Dargel, Hillf Danton
An update on this, it successfully passed our testing (more than 35 days of uptime/computation without problems), so we rebased, added tested-by it and submit it again.
>From d0a7d56c283205806d122442448063d2c8812669 Mon Sep 17 00:00:00 2001
From: Fawzi Mohamed <fawzi@gmx.ch>
Date: Mon, 12 Mar 2012 23:10:57 +0100
Subject: [PATCH] fix oops in updating thread cputime and task time
use div64_u64 instead of do_div to divide cputime_t with each other
because if cputime_t is u64 it does crash when overflowing u32.
KERNEL: ./vmlinux-2.6.32.54-0.3-default.gz
DEBUGINFO: .//vmlinux-2.6.32.54-0.3-default.debug
DUMPFILE: ./vmcore
CPUS: 24
DATE: Mon Mar 5 00:58:10 2012
UPTIME: 19 days, 11:01:56
LOAD AVERAGE: 0.74, 0.42, 0.40
TASKS: 731
NODENAME: node65
RELEASE: 2.6.32.54-0.3-default
VERSION: #1 SMP 2012-01-27 17:38:56 +0100
MACHINE: x86_64 (3067 Mhz)
MEMORY: 96 GB
PANIC: ""
PID: 11234
COMMAND: "ricc2_smp"
TASK: ffff881813de2240 [THREAD_INFO: ffff8818153c2000]
CPU: 8
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 11234 TASK: ffff881813de2240 CPU: 8 COMMAND: "ricc2_smp"
RIP: 00007f5e4f418617 RSP: 00007f5e4dd3ebd0 RFLAGS: 00000202
RAX: 0000000000000062 RBX: ffffffff81002f7b RCX: 0000000000000179
RDX: 00007f5e4dd3edb4 RSI: 00007f5e4dd3ec70 RDI: 0000000000000000
RBP: 00007f5e4dd3ecf0 R8: 0000000000000001 R9: 00007f5e4dd3edc4
R10: 0000000000000001 R11: 0000000000000246 R12: ffffffff8105eb81
R13: 00007f5e4dd3ed30 R14: 00007f5e4dd3f2f0 R15: 0000000000000000
ORIG_RAX: 0000000000000062 CS: 0033 SS: 002b
Basically the same problem was found earlier in kernel vmlinux-2.6.32.12-0.7-default.gz.
This patch was tested on 2.6.32 (as some kernel modules we need
require it) and the computation ran for more than 35 days without
any problems, before it would crash after 19 days of that ricc2 computation.
Signed-off-by: Fawzi Mohamed <fawzi@gmx.ch>
Signed-off-by: Thomas Dargel <td@no-spam.hu-berlin.de>
Tested-by: Thomas Dargel <td@no-spam.hu-berlin.de>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
kernel/sched/core.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4603b9d..03a2d89 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2966,7 +2966,7 @@ void task_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
u64 temp = (__force u64) rtime;
temp *= (__force u64) utime;
- do_div(temp, (__force u32) total);
+ temp = div64_u64(temp, total);
utime = (__force cputime_t) temp;
} else
utime = rtime;
@@ -2999,7 +2999,7 @@ void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
u64 temp = (__force u64) rtime;
temp *= (__force u64) cputime.utime;
- do_div(temp, (__force u32) total);
+ temp = div64_u64(temp, total);
utime = (__force cputime_t) temp;
} else
utime = rtime;
--
1.7.0.4
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] fix oops in updating thread cputime and task time
2012-04-21 7:23 ` Fawzi Mohamed
@ 2012-04-25 12:10 ` Peter Zijlstra
0 siblings, 0 replies; 3+ messages in thread
From: Peter Zijlstra @ 2012-04-25 12:10 UTC (permalink / raw)
To: Fawzi Mohamed
Cc: linux-kernel, Ingo Molnar, Thomas Dargel, Hillf Danton,
Paul Turner, Martin Schwidefsky
On Sat, 2012-04-21 at 09:23 +0200, Fawzi Mohamed wrote:
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 4603b9d..03a2d89 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2966,7 +2966,7 @@ void task_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
> u64 temp = (__force u64) rtime;
>
> temp *= (__force u64) utime;
> - do_div(temp, (__force u32) total);
> + temp = div64_u64(temp, total);
> utime = (__force cputime_t) temp;
> } else
> utime = rtime;
> @@ -2999,7 +2999,7 @@ void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
> u64 temp = (__force u64) rtime;
>
> temp *= (__force u64) cputime.utime;
> - do_div(temp, (__force u32) total);
> + temp = div64_u64(temp, total);
> utime = (__force cputime_t) temp;
> } else
> utime = rtime;
I'm not entirely sure why it takes 19 days, suppose we have HZ=1000 and
your app never idles, it still takes 2^32/1000 seconds ~50 days to
overflow that u32.
Anyway, yes your patch avoids the /0 issue, but it leaves the other
problem with that code..
rtime * utime / total
The multiplication can overflow the u64 at which point you're staring at
complete rubbish, this happens at about that same point.
So I figure we need to use the shiny new mult_frac() primitive.
32bit platforms are going to be staring at crap either way though, since
their entire time accounting (cputime_t) will start warping at that
point,.. not sure what if anything we should do about that though..
anybody?
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2012-04-25 12:10 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-16 16:29 [PATCH] fix oops in updating thread cputime and task time Fawzi Mohamed
2012-04-21 7:23 ` Fawzi Mohamed
2012-04-25 12:10 ` Peter Zijlstra
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).