linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] cputime: Avoid multiplication overflow on utime scaling
@ 2013-01-26 16:19 Frederic Weisbecker
  2013-01-27 14:37 ` [tip:sched/core] " tip-bot for Frederic Weisbecker
  0 siblings, 1 reply; 2+ messages in thread
From: Frederic Weisbecker @ 2013-01-26 16:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Stanislaw Gruszka, Andrew Morton,
	Oleg Nesterov, Peter Zijlstra

We scale stime, utime values based on rtime (sum_exec_runtime converted
to jiffies). During scaling we multiple rtime * utime, which seems to be
fine, since both values are converted to u64, but it's not.

Let assume HZ is 1000 - 1ms tick. Process consist of 64 threads, run
for 1 day, threads utilize 100% cpu on user space. Machine has 64 cpus.

Process rtime = utime will be 64 * 24 * 60 * 60 * 1000 jiffies, which is
0x149970000. Multiplication rtime * utime result is 0x1a855771100000000,
which can not be covered in 64 bits.

Result of overflow is stall of utime values visible in user space
(prev_utime in kernel), even if application still consume lot of CPU
time.

A solution to solve this is to perform the multiplication on stime
instead of utime. It's easy to grow the utime value fast with a CPU
bound thread in userspace for example. Now we assume that doing so with
stime is much harder. In most cases a task shouldn't ever spend much
time in kernel space as it tends to sleep waiting for jobs completion
when they take long to achieve. IO is the typical example of that.

Hence scaling the cputime by performing the multiplication on stime
instead of utime should considerably reduce the chances of an overflow
on most workloads.

This is largely inspired by a patch from Stanislaw Gruszka:
http://lkml.kernel.org/r/20130107113144.GA7544@redhat.com

Inspired-by: Stanislaw Gruszka <sgruszka@redhat.com>
Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 kernel/sched/cputime.c |   18 +++++++++---------
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 293b202..825a956 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -509,11 +509,11 @@ EXPORT_SYMBOL_GPL(vtime_account);
 # define nsecs_to_cputime(__nsecs)	nsecs_to_jiffies(__nsecs)
 #endif
 
-static cputime_t scale_utime(cputime_t utime, cputime_t rtime, cputime_t total)
+static cputime_t scale_stime(cputime_t stime, cputime_t rtime, cputime_t total)
 {
 	u64 temp = (__force u64) rtime;
 
-	temp *= (__force u64) utime;
+	temp *= (__force u64) stime;
 
 	if (sizeof(cputime_t) == 4)
 		temp = div_u64(temp, (__force u32) total);
@@ -531,10 +531,10 @@ static void cputime_adjust(struct task_cputime *curr,
 			   struct cputime *prev,
 			   cputime_t *ut, cputime_t *st)
 {
-	cputime_t rtime, utime, total;
+	cputime_t rtime, stime, total;
 
-	utime = curr->utime;
-	total = utime + curr->stime;
+	stime = curr->stime;
+	total = stime + curr->utime;
 
 	/*
 	 * Tick based cputime accounting depend on random scheduling
@@ -549,17 +549,17 @@ static void cputime_adjust(struct task_cputime *curr,
 	rtime = nsecs_to_cputime(curr->sum_exec_runtime);
 
 	if (total)
-		utime = scale_utime(utime, rtime, total);
+		stime = scale_stime(stime, rtime, total);
 	else
-		utime = rtime;
+		stime = rtime;
 
 	/*
 	 * If the tick based count grows faster than the scheduler one,
 	 * the result of the scaling may go backward.
 	 * Let's enforce monotonicity.
 	 */
-	prev->utime = max(prev->utime, utime);
-	prev->stime = max(prev->stime, rtime - prev->utime);
+	prev->stime = max(prev->stime, stime);
+	prev->utime = max(prev->utime, rtime - prev->stime);
 
 	*ut = prev->utime;
 	*st = prev->stime;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* [tip:sched/core] cputime: Avoid multiplication overflow on utime scaling
  2013-01-26 16:19 [PATCH] cputime: Avoid multiplication overflow on utime scaling Frederic Weisbecker
@ 2013-01-27 14:37 ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 2+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2013-01-27 14:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, peterz, fweisbec, akpm, tglx, oleg, sgruszka

Commit-ID:  62188451f0d63add7ad0cd2a1ae269d600c1663d
Gitweb:     http://git.kernel.org/tip/62188451f0d63add7ad0cd2a1ae269d600c1663d
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Sat, 26 Jan 2013 17:19:42 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 27 Jan 2013 14:04:44 +0100

cputime: Avoid multiplication overflow on utime scaling

We scale stime, utime values based on rtime (sum_exec_runtime
converted to jiffies). During scaling we multiple rtime * utime,
which seems to be fine, since both values are converted to u64,
but it's not.

Let assume HZ is 1000 - 1ms tick. Process consist of 64 threads,
run for 1 day, threads utilize 100% cpu on user space. Machine
has 64 cpus.

Process rtime = utime will be 64 * 24 * 60 * 60 * 1000 jiffies,
which is 0x149970000. Multiplication rtime * utime result is
0x1a855771100000000, which can not be covered in 64 bits.

Result of overflow is stall of utime values visible in user
space (prev_utime in kernel), even if application still consume
lot of CPU time.

A solution to solve this is to perform the multiplication on
stime instead of utime. It's easy to grow the utime value fast
with a CPU bound thread in userspace for example. Now we assume
that doing so with stime is much harder. In most cases a task
shouldn't ever spend much time in kernel space as it tends to
sleep waiting for jobs completion when they take long to
achieve. IO is the typical example of that.

Hence scaling the cputime by performing the multiplication on
stime instead of utime should considerably reduce the chances of
an overflow on most workloads.

This is largely inspired by a patch from Stanislaw Gruszka:
http://lkml.kernel.org/r/20130107113144.GA7544@redhat.com

Inspired-by: Stanislaw Gruszka <sgruszka@redhat.com>
Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1359217182-25184-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/cputime.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 293b202..825a956 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -509,11 +509,11 @@ EXPORT_SYMBOL_GPL(vtime_account);
 # define nsecs_to_cputime(__nsecs)	nsecs_to_jiffies(__nsecs)
 #endif
 
-static cputime_t scale_utime(cputime_t utime, cputime_t rtime, cputime_t total)
+static cputime_t scale_stime(cputime_t stime, cputime_t rtime, cputime_t total)
 {
 	u64 temp = (__force u64) rtime;
 
-	temp *= (__force u64) utime;
+	temp *= (__force u64) stime;
 
 	if (sizeof(cputime_t) == 4)
 		temp = div_u64(temp, (__force u32) total);
@@ -531,10 +531,10 @@ static void cputime_adjust(struct task_cputime *curr,
 			   struct cputime *prev,
 			   cputime_t *ut, cputime_t *st)
 {
-	cputime_t rtime, utime, total;
+	cputime_t rtime, stime, total;
 
-	utime = curr->utime;
-	total = utime + curr->stime;
+	stime = curr->stime;
+	total = stime + curr->utime;
 
 	/*
 	 * Tick based cputime accounting depend on random scheduling
@@ -549,17 +549,17 @@ static void cputime_adjust(struct task_cputime *curr,
 	rtime = nsecs_to_cputime(curr->sum_exec_runtime);
 
 	if (total)
-		utime = scale_utime(utime, rtime, total);
+		stime = scale_stime(stime, rtime, total);
 	else
-		utime = rtime;
+		stime = rtime;
 
 	/*
 	 * If the tick based count grows faster than the scheduler one,
 	 * the result of the scaling may go backward.
 	 * Let's enforce monotonicity.
 	 */
-	prev->utime = max(prev->utime, utime);
-	prev->stime = max(prev->stime, rtime - prev->utime);
+	prev->stime = max(prev->stime, stime);
+	prev->utime = max(prev->utime, rtime - prev->stime);
 
 	*ut = prev->utime;
 	*st = prev->stime;

^ permalink raw reply related	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2013-01-27 14:37 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-26 16:19 [PATCH] cputime: Avoid multiplication overflow on utime scaling Frederic Weisbecker
2013-01-27 14:37 ` [tip:sched/core] " tip-bot for Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).