linux-kernel.vger.kernel.org archive mirror
* [patch 2.6.16-mm2 0/9] sched throttle tree extract
@ 2006-04-01  8:28 Mike Galbraith
  2006-04-01  8:33 ` [patch 2.6.16-mm2 1/9] sched throttle tree extract - ignore invalid timestamps Mike Galbraith
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Galbraith @ 2006-04-01  8:28 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton; +Cc: lkml, Peter Williams, Nick Piggin, Con Kolivas

Greetings,

The following 9 patches are an extraction of my scheduler throttling
tree against 2.6.16-mm2.  The patch below has already been applied,
and is included only so that anyone applying this series to virgin
source will see no offsets.

	-Mike

Signed-off-by: Mike Galbraith <efault@gmx.de>

--- linux-2.6.16-mm2/kernel/sched.c.org	2006-03-31 09:56:37.000000000 +0200
+++ linux-2.6.16-mm2/kernel/sched.c	2006-03-31 13:32:40.000000000 +0200
@@ -820,7 +820,7 @@ static void __activate_task(task_t *p, r
 {
 	prio_array_t *target = rq->active;
 
-	if (unlikely(batch_task(p) || expired_starving(rq)))
+	if (unlikely(batch_task(p) || (expired_starving(rq) && !rt_task(p))))
 		target = rq->expired;
 	enqueue_task(p, target);
 	inc_nr_running(p, rq);




* Re: [patch 2.6.16-mm2 1/9] sched throttle tree extract - ignore invalid timestamps
  2006-04-01  8:28 [patch 2.6.16-mm2 0/9] sched throttle tree extract Mike Galbraith
@ 2006-04-01  8:33 ` Mike Galbraith
  2006-04-01  8:38   ` [patch 2.6.16-mm2 2/9] sched throttle tree extract - fix potential task uninterruptible bug Mike Galbraith
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Galbraith @ 2006-04-01  8:33 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton; +Cc: lkml, Peter Williams, Nick Piggin, Con Kolivas

On an SMP system, a task can awaken on a different cpu from the one
where it went to sleep.  The attempt to compensate for clock drift
between cpus can then leave the task's timestamp ahead of the waking
cpu's clock.  Ignore the resulting timewarps.
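
For illustration only (not part of the patch): __sleep_time in the
hunk below is unsigned, so a timestamp from the "future" doesn't
yield a small negative sleep, it wraps to an enormous positive one
and earns the task a huge undeserved boost.  A minimal userspace
sketch with made-up numbers:

	#include <stdio.h>

	int main(void)
	{
		/* hypothetical: the waking cpu's clock reads 100ns
		 * earlier than the timestamp taken at sleep time */
		unsigned long long timestamp = 1000000100ULL;
		unsigned long long now = 1000000000ULL;
		unsigned long long sleep_time = now - timestamp;

		/* wraps to 18446744073709551516: centuries of "sleep" */
		printf("%llu\n", sleep_time);

		if (now < timestamp)	/* the guard the patch adds */
			sleep_time = 0;
		printf("%llu\n", sleep_time);
		return 0;
	}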

Signed-off-by: Mike Galbraith <efault@gmx.de>

--- linux-2.6.16-mm2/kernel/sched.c-0.fix_rt	2006-03-23 15:01:41.000000000 +0100
+++ linux-2.6.16-mm2/kernel/sched.c	2006-03-23 15:02:25.000000000 +0100
@@ -845,6 +845,15 @@ static int recalc_task_prio(task_t *p, u
 	unsigned long long __sleep_time = now - p->timestamp;
 	unsigned long sleep_time;
 
+	/*
+	 * On SMP systems, a task can go to sleep on one CPU
+	 * and wake up on another.  When this happens, now can
+	 * end up being less than p->timestamp for short sleeps.
+	 * Ignore these, they're insignificant.
+	 */
+	if (unlikely(now < p->timestamp))
+		__sleep_time = 0;
+
 	if (batch_task(p))
 		sleep_time = 0;
 	else {




* Re: [patch 2.6.16-mm2 2/9] sched throttle tree extract - fix potential task uninterruptible bug
  2006-04-01  8:33 ` [patch 2.6.16-mm2 1/9] sched throttle tree extract - ignore invalid timestamps Mike Galbraith
@ 2006-04-01  8:38   ` Mike Galbraith
  2006-04-01  8:44     ` [patch 2.6.16-mm2 3/9] sched throttle tree extract - remove IO priority barrier Mike Galbraith
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Galbraith @ 2006-04-01  8:38 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton; +Cc: lkml, Peter Williams, Nick Piggin, Con Kolivas

This patch fixes a bug waiting for a place to happen: should anyone
ever combine TASK_NONINTERACTIVE with TASK_UNINTERRUPTIBLE, the
equality tests on p->state would silently stop matching.
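
For the curious, the failure mode in miniature (userspace sketch,
flag values as in 2.6.16's sched.h): TASK_NONINTERACTIVE is OR-ed
into p->state alongside the sleep state, so an equality test stops
matching while a bitwise test still does.

	#include <stdio.h>

	#define TASK_UNINTERRUPTIBLE	2	/* as in 2.6.16 sched.h */
	#define TASK_NONINTERACTIVE	64

	int main(void)
	{
		long state = TASK_UNINTERRUPTIBLE | TASK_NONINTERACTIVE;

		/* the old equality test misses the combined state... */
		printf("==: %d\n", state == TASK_UNINTERRUPTIBLE);	/* 0 */
		/* ...the bitwise test still catches it */
		printf("& : %d\n", !!(state & TASK_UNINTERRUPTIBLE));	/* 1 */
		return 0;
	}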

Signed-off-by: Mike Galbraith <efault@gmx.de>

--- linux-2.6.16-mm2/kernel/sched.c-1.timewarp	2006-03-23 15:04:42.000000000 +0100
+++ linux-2.6.16-mm2/kernel/sched.c	2006-03-23 15:07:08.000000000 +0100
@@ -1457,7 +1457,7 @@ out_set_cpu:
 
 out_activate:
 #endif /* CONFIG_SMP */
-	if (old_state == TASK_UNINTERRUPTIBLE) {
+	if (old_state & TASK_UNINTERRUPTIBLE) {
 		rq->nr_uninterruptible--;
 		/*
 		 * Tasks on involuntary sleep don't earn
@@ -3167,7 +3167,7 @@ need_resched_nonpreemptible:
 				unlikely(signal_pending(prev))))
 			prev->state = TASK_RUNNING;
 		else {
-			if (prev->state == TASK_UNINTERRUPTIBLE)
+			if (prev->state & TASK_UNINTERRUPTIBLE)
 				rq->nr_uninterruptible++;
 			deactivate_task(prev, rq);
 		}




* Re: [patch 2.6.16-mm2 3/9] sched throttle tree extract - remove IO priority barrier
  2006-04-01  8:38   ` [patch 2.6.16-mm2 2/9] sched throttle tree extract - fix potential task uninterruptible bug Mike Galbraith
@ 2006-04-01  8:44     ` Mike Galbraith
  2006-04-01  8:51       ` [patch 2.6.16-mm2 4/9] sched throttle tree extract - remove kthread barrier Mike Galbraith
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Galbraith @ 2006-04-01  8:44 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton; +Cc: lkml, Peter Williams, Nick Piggin, Con Kolivas

This patch removes the barrier preventing IO bound tasks from
competing against highly interactive tasks for cpu time.  This
barrier has been demonstrated to cause starvation of IO bound tasks,
and in testing I've found it to be unnecessary in any case.

Signed-off-by: Mike Galbraith <efault@gmx.de>

--- linux-2.6.16-mm2/kernel/sched.c-2.fix_uninterruptible	2006-03-31 13:34:04.000000000 +0200
+++ linux-2.6.16-mm2/kernel/sched.c	2006-04-01 08:57:40.000000000 +0200
@@ -880,21 +880,6 @@ static int recalc_task_prio(task_t *p, u
 					p->sleep_avg = ceiling;
 		} else {
 			/*
-			 * Tasks waking from uninterruptible sleep are
-			 * limited in their sleep_avg rise as they
-			 * are likely to be waiting on I/O
-			 */
-			if (p->sleep_type == SLEEP_NONINTERACTIVE && p->mm) {
-				if (p->sleep_avg >= INTERACTIVE_SLEEP(p))
-					sleep_time = 0;
-				else if (p->sleep_avg + sleep_time >=
-						INTERACTIVE_SLEEP(p)) {
-					p->sleep_avg = INTERACTIVE_SLEEP(p);
-					sleep_time = 0;
-				}
-			}
-
-			/*
 			 * This code gives a bonus to interactive tasks.
 			 *
 			 * The boost works by updating the 'average sleep time'




* Re: [patch 2.6.16-mm2 4/9] sched throttle tree extract - remove kthread barrier
  2006-04-01  8:44     ` [patch 2.6.16-mm2 3/9] sched throttle tree extract - remove IO priority barrier Mike Galbraith
@ 2006-04-01  8:51       ` Mike Galbraith
  2006-04-01  8:59         ` [patch 2.6.16-mm2 5/9] sched throttle tree extract - correct idle sleep logic Mike Galbraith
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Galbraith @ 2006-04-01  8:51 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton; +Cc: lkml, Peter Williams, Nick Piggin, Con Kolivas

This patch removes the last barrier between normal tasks using
dynamic priorities and kthreads using the same.  In testing, this
separation has proven to be unnecessary.

Signed-off-by: Mike Galbraith <efault@gmx.de>

--- linux-2.6.16-mm2/kernel/sched.c-3.remove_IO_barrier	2006-04-01 08:57:40.000000000 +0200
+++ linux-2.6.16-mm2/kernel/sched.c	2006-04-01 09:01:54.000000000 +0200
@@ -865,13 +865,13 @@ static int recalc_task_prio(task_t *p, u
 
 	if (likely(sleep_time > 0)) {
 		/*
-		 * User tasks that sleep a long time are categorised as
-		 * idle. They will only have their sleep_avg increased to a
+		 * Tasks that sleep a long time are categorised as idle.
+		 * They will only have their sleep_avg increased to a
 		 * level that makes them just interactive priority to stay
 		 * active yet prevent them suddenly becoming cpu hogs and
 		 * starving other processes.
 		 */
-		if (p->mm && sleep_time > INTERACTIVE_SLEEP(p)) {
+		if (sleep_time > INTERACTIVE_SLEEP(p)) {
 				unsigned long ceiling;
 
 				ceiling = JIFFIES_TO_NS(MAX_SLEEP_AVG -




* Re: [patch 2.6.16-mm2 5/9] sched throttle tree extract - correct idle sleep logic
  2006-04-01  8:51       ` [patch 2.6.16-mm2 4/9] sched throttle tree extract - remove kthread barrier Mike Galbraith
@ 2006-04-01  8:59         ` Mike Galbraith
  2006-04-01  9:12           ` [patch 2.6.16-mm2 6/9] sched throttle tree extract - move division to slow path Mike Galbraith
  2006-04-01 16:53           ` [patch 2.6.16-mm2 5/9] sched throttle tree extract - correct idle sleep logic Lee Revell
  0 siblings, 2 replies; 16+ messages in thread
From: Mike Galbraith @ 2006-04-01  8:59 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, lkml, Peter Williams, Nick Piggin, Con Kolivas

This patch corrects the idle sleep logic to place a long sleeping
task into the runqueue at a barely interactive priority such that it
cannot destroy interactivity should it immediately begin consuming
massive cpu.

It further ensures that no task will cross the boundary from normal
task to interactive task without first stopping at the border.  Note
that this in no way hinders long sleeping tasks from waking at max
priority: they stop on the way up, but are not truncated.  This only
prevents long sleeping tasks from slamming straight to max
interactive with one sleep, and causing trouble.
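
To make the stepwise promotion concrete, here is a userspace
distillation of the ceiling logic from the hunk below (helper name
and stand-in constants invented for illustration):

	#include <stdio.h>

	#define NS_PER_DYNPRIO	100UL	/* stand-in: sleep_avg per priority */
	#define MAX_BONUS	10UL

	/* Round sleep_avg up to the next dynamic-priority boundary:
	 * a previously interactive sleeper climbs one step per idle
	 * sleep instead of jumping straight to the top. */
	static unsigned long promote_one_step(unsigned long sleep_avg)
	{
		unsigned long ceiling = sleep_avg / NS_PER_DYNPRIO;

		if (ceiling < MAX_BONUS)
			ceiling++;
		ceiling *= NS_PER_DYNPRIO;
		return ceiling > sleep_avg ? ceiling : sleep_avg;
	}

	int main(void)
	{
		/* 540 -> 600: one boundary up, not straight to 1000 */
		printf("%lu\n", promote_one_step(540));
		return 0;
	}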

Signed-off-by: Mike Galbraith <efault@gmx.de>

--- linux-2.6.16-mm2/kernel/sched.c-4.remove_kthread_barrier	2006-04-01 09:01:54.000000000 +0200
+++ linux-2.6.16-mm2/kernel/sched.c	2006-04-01 09:12:51.000000000 +0200
@@ -99,6 +99,9 @@
 #define MAX_SLEEP_AVG		(DEF_TIMESLICE * MAX_BONUS)
 #define STARVATION_LIMIT	(MAX_SLEEP_AVG)
 #define NS_MAX_SLEEP_AVG	(JIFFIES_TO_NS(MAX_SLEEP_AVG))
+#define NS_MAX_SLEEP_AVG_PCNT	(NS_MAX_SLEEP_AVG / 100)
+#define PCNT_PER_DYNPRIO	(100 / MAX_BONUS)
+#define NS_PER_DYNPRIO		(PCNT_PER_DYNPRIO * NS_MAX_SLEEP_AVG_PCNT)
 
 /*
  * If a task is 'interactive' then we reinsert it in the active
@@ -153,9 +156,23 @@
 #define TASK_INTERACTIVE(p) \
 	((p)->prio <= (p)->static_prio - DELTA(p))
 
-#define INTERACTIVE_SLEEP(p) \
-	(JIFFIES_TO_NS(MAX_SLEEP_AVG * \
-		(MAX_BONUS / 2 + DELTA((p)) + 1) / MAX_BONUS - 1))
+#define INTERACTIVE_SLEEP_AVG(p) \
+	(min(JIFFIES_TO_NS(MAX_SLEEP_AVG * (MAX_BONUS / 2 + DELTA(p)) / \
+	MAX_BONUS), NS_MAX_SLEEP_AVG))
+
+/*
+ * Returns whether a task has been asleep long enough to be considered idle.
+ * The metric is whether this quantity of sleep would promote the task more
+ * than one priority beyond marginally interactive.
+ */
+static int task_interactive_idle(task_t *p, unsigned long sleep_time)
+{
+	unsigned long ceiling = (CURRENT_BONUS(p) + 2) * NS_PER_DYNPRIO;
+
+	if (p->sleep_avg + sleep_time < ceiling)
+		return 0;
+	return p->sleep_avg + sleep_time >= INTERACTIVE_SLEEP_AVG(p);
+}
 
 #define TASK_PREEMPTS_CURR(p, rq) \
 	((p)->prio < (rq)->curr->prio)
@@ -871,13 +888,25 @@ static int recalc_task_prio(task_t *p, u
 		 * active yet prevent them suddenly becoming cpu hogs and
 		 * starving other processes.
 		 */
-		if (sleep_time > INTERACTIVE_SLEEP(p)) {
-				unsigned long ceiling;
+		if (task_interactive_idle(p, sleep_time)) {
+			unsigned long ceiling = INTERACTIVE_SLEEP_AVG(p);
+
+			/*
+			 * Promote previously interactive task.
+			 */
+			if (p->sleep_avg > ceiling) {
+				ceiling = p->sleep_avg / NS_PER_DYNPRIO;
+				if (ceiling < MAX_BONUS)
+					ceiling++;
+				ceiling *= NS_PER_DYNPRIO;
+			} else {
+				ceiling += p->slice_time_ns >> 2;
+				if (ceiling > NS_MAX_SLEEP_AVG)
+					ceiling = NS_MAX_SLEEP_AVG;
+			}
 
-				ceiling = JIFFIES_TO_NS(MAX_SLEEP_AVG -
-					DEF_TIMESLICE);
-				if (p->sleep_avg < ceiling)
-					p->sleep_avg = ceiling;
+			if (p->sleep_avg < ceiling)
+				p->sleep_avg = ceiling;
 		} else {
 			/*
 			 * This code gives a bonus to interactive tasks.




* Re: [patch 2.6.16-mm2 6/9] sched throttle tree extract - move division to slow path
  2006-04-01  8:59         ` [patch 2.6.16-mm2 5/9] sched throttle tree extract - correct idle sleep logic Mike Galbraith
@ 2006-04-01  9:12           ` Mike Galbraith
  2006-04-01  9:23             ` [patch 2.6.16-mm2 7/9] sched throttle tree extract - implement throttling Mike Galbraith
  2006-04-01 16:53           ` [patch 2.6.16-mm2 5/9] sched throttle tree extract - correct idle sleep logic Lee Revell
  1 sibling, 1 reply; 16+ messages in thread
From: Mike Galbraith @ 2006-04-01  9:12 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton; +Cc: lkml, Peter Williams, Nick Piggin, Con Kolivas

This patch moves the division of runtime from the fast path into the
slow path.  In doing so, it converts the task's time slice into
nanoseconds, which also prevents high frequency scheduling tasks from
stealing time from their neighbors by ducking the timer interrupt
(two birds, one stone).

slice_time_ns is identical in function to time_slice, but converting
everything that referred to time_slice to nanoseconds just made a
mess.  Ergo, I chose the lesser of two evils, and duplicated what was
necessary to allow use of nanoseconds only where needed.
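
The accounting model in miniature (userspace sketch; helper names
invented, HZ value arbitrary): the fast path only subtracts the
nanoseconds actually run, and the jiffy-granular time_slice is
derived from that remainder in the slow path.

	#include <stdio.h>

	#define HZ			250
	#define NS_TICK			(1000000000L / HZ)
	#define NS_TO_JIFFIES(t)	((t) / NS_TICK)

	/* Fast path: charge actual runtime, no division. */
	static void update_clock(long *slice_time_ns, long ran_ns)
	{
		*slice_time_ns -= ran_ns;
	}

	/* Slow path: how many whole ticks remain in the slice. */
	static unsigned int ticks_left(long slice_time_ns)
	{
		if (slice_time_ns < NS_TICK)
			return 0;
		return NS_TO_JIFFIES(slice_time_ns);
	}

	int main(void)
	{
		long slice = 100 * NS_TICK;	/* a 100-tick slice */
		int i;

		/* a task running half a tick at a time can no longer
		 * duck the timer: its nanoseconds are charged anyway */
		for (i = 0; i < 300; i++)
			update_clock(&slice, NS_TICK / 2);
		printf("ticks left: %u\n", ticks_left(slice));	/* 0 */
		return 0;
	}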

Signed-off-by: Mike Galbraith <efault@gmx.de>

--- linux-2.6.16-mm2/include/linux/sched.h.org	2006-03-31 11:25:02.000000000 +0200
+++ linux-2.6.16-mm2/include/linux/sched.h	2006-03-31 11:27:32.000000000 +0200
@@ -744,6 +744,7 @@ struct task_struct {
 	unsigned long policy;
 	cpumask_t cpus_allowed;
 	unsigned int time_slice, first_time_slice;
+	long slice_time_ns;
 
 #ifdef CONFIG_SCHEDSTATS
 	struct sched_info sched_info;
--- linux-2.6.16-mm2/kernel/sched.c-5.interactive_idle	2006-04-01 09:12:51.000000000 +0200
+++ linux-2.6.16-mm2/kernel/sched.c	2006-03-31 13:34:12.000000000 +0200
@@ -102,6 +102,7 @@
 #define NS_MAX_SLEEP_AVG_PCNT	(NS_MAX_SLEEP_AVG / 100)
 #define PCNT_PER_DYNPRIO	(100 / MAX_BONUS)
 #define NS_PER_DYNPRIO		(PCNT_PER_DYNPRIO * NS_MAX_SLEEP_AVG_PCNT)
+#define NS_TICK			(1000000000 / HZ)
 
 /*
  * If a task is 'interactive' then we reinsert it in the active
@@ -156,6 +157,8 @@
 #define TASK_INTERACTIVE(p) \
 	((p)->prio <= (p)->static_prio - DELTA(p))
 
+#define SLEEP_AVG_DIVISOR(p) (1 + CURRENT_BONUS(p))
+
 #define INTERACTIVE_SLEEP_AVG(p) \
 	(min(JIFFIES_TO_NS(MAX_SLEEP_AVG * (MAX_BONUS / 2 + DELTA(p)) / \
 	MAX_BONUS), NS_MAX_SLEEP_AVG))
@@ -202,6 +205,11 @@ static inline unsigned int task_timeslic
 	return static_prio_timeslice(p->static_prio);
 }
 
+static inline unsigned int task_timeslice_ns(task_t *p)
+{
+	return static_prio_timeslice(p->static_prio) * NS_TICK;
+}
+
 #define task_hot(p, now, sd) ((long long) ((now) - (p)->last_ran)	\
 				< (long long) (sd)->cache_hot_time)
 
@@ -1563,21 +1571,23 @@ void fastcall sched_fork(task_t *p, int 
 	 * resulting in more scheduling fairness.
 	 */
 	local_irq_disable();
-	p->time_slice = (current->time_slice + 1) >> 1;
+	p->slice_time_ns = current->slice_time_ns >> 1;
+	if (unlikely(p->slice_time_ns < NS_TICK))
+		p->slice_time_ns = NS_TICK;
+	p->time_slice = NS_TO_JIFFIES(p->slice_time_ns);
 	/*
 	 * The remainder of the first timeslice might be recovered by
 	 * the parent if the child exits early enough.
 	 */
 	p->first_time_slice = 1;
-	current->time_slice >>= 1;
 	p->timestamp = sched_clock();
-	if (unlikely(!current->time_slice)) {
+	current->slice_time_ns >>= 1;
+	if (unlikely(current->slice_time_ns < NS_TICK)) {
 		/*
 		 * This case is rare, it happens when the parent has only
 		 * a single jiffy left from its timeslice. Taking the
 		 * runqueue lock is not a problem.
 		 */
-		current->time_slice = 1;
 		scheduler_tick();
 	}
 	local_irq_enable();
@@ -1686,9 +1696,9 @@ void fastcall sched_exit(task_t *p)
 	 */
 	rq = task_rq_lock(p->parent, &flags);
 	if (p->first_time_slice && task_cpu(p) == task_cpu(p->parent)) {
-		p->parent->time_slice += p->time_slice;
-		if (unlikely(p->parent->time_slice > task_timeslice(p)))
-			p->parent->time_slice = task_timeslice(p);
+		p->parent->slice_time_ns += p->slice_time_ns;
+		if (unlikely(p->parent->slice_time_ns > task_timeslice_ns(p)))
+			p->parent->slice_time_ns = task_timeslice_ns(p);
 	}
 	if (p->sleep_avg < p->parent->sleep_avg)
 		p->parent->sleep_avg = p->parent->sleep_avg /
@@ -2709,7 +2719,9 @@ static inline void update_cpu_clock(task
 				    unsigned long long now)
 {
 	unsigned long long last = max(p->timestamp, rq->timestamp_last_tick);
-	p->sched_time += now - last;
+	long run_time = now - last;
+	p->sched_time += run_time;
+	p->slice_time_ns -= run_time;
 }
 
 /*
@@ -2837,13 +2849,21 @@ void scheduler_tick(void)
 	 * priority until it either goes to sleep or uses up its
 	 * timeslice. This makes it possible for interactive tasks
 	 * to use up their timeslices at their highest priority levels.
-	 */
+	 *
+	 * We don't want to update every task at each tick, so we
+	 * update a task's time_slice according to how much cpu
+	 * time it has received since it was last ticked.
+	 */
+	if (p->slice_time_ns < NS_TICK)
+		p->time_slice = 0;
+	else p->time_slice = NS_TO_JIFFIES(p->slice_time_ns);
 	if (rt_task(p)) {
 		/*
 		 * RR tasks need a special form of timeslice management.
 		 * FIFO tasks have no timeslices.
 		 */
-		if ((p->policy == SCHED_RR) && !--p->time_slice) {
+		if ((p->policy == SCHED_RR) && !p->time_slice) {
+			p->slice_time_ns = task_timeslice_ns(p);
 			p->time_slice = task_timeslice(p);
 			p->first_time_slice = 0;
 			set_tsk_need_resched(p);
@@ -2853,11 +2873,22 @@ void scheduler_tick(void)
 		}
 		goto out_unlock;
 	}
-	if (!--p->time_slice) {
+	if (!p->time_slice) {
+		long slice_time_ns = task_timeslice_ns(p);
+		long run_time = (-1 * p->slice_time_ns) + slice_time_ns;
 		dequeue_task(p, rq->active);
 		set_tsk_need_resched(p);
-		p->prio = effective_prio(p);
+		p->slice_time_ns = slice_time_ns;
 		p->time_slice = task_timeslice(p);
+		/*
+		 * Tasks are charged proportionately less run_time at high
+		 * sleep_avg to delay them losing their interactive status
+		 */
+		run_time /= SLEEP_AVG_DIVISOR(p);
+		if (p->sleep_avg >= run_time)
+			p->sleep_avg -= run_time;
+		else p->sleep_avg = 0;
+		p->prio = effective_prio(p);
 		p->first_time_slice = 0;
 
 		if (!rq->expired_timestamp)
@@ -3122,7 +3153,6 @@ asmlinkage void __sched schedule(void)
 	prio_array_t *array;
 	struct list_head *queue;
 	unsigned long long now;
-	unsigned long run_time;
 	int cpu, idx, new_prio;
 
 	/*
@@ -3156,19 +3186,6 @@ need_resched_nonpreemptible:
 
 	schedstat_inc(rq, sched_cnt);
 	now = sched_clock();
-	if (likely((long long)(now - prev->timestamp) < NS_MAX_SLEEP_AVG)) {
-		run_time = now - prev->timestamp;
-		if (unlikely((long long)(now - prev->timestamp) < 0))
-			run_time = 0;
-	} else
-		run_time = NS_MAX_SLEEP_AVG;
-
-	/*
-	 * Tasks charged proportionately less run_time at high sleep_avg to
-	 * delay them losing their interactive status
-	 */
-	run_time /= (CURRENT_BONUS(prev) ? : 1);
-
 	spin_lock_irq(&rq->lock);
 
 	if (unlikely(prev->flags & PF_DEAD))
@@ -3242,7 +3259,6 @@ go_idle:
 		if (next->sleep_type == SLEEP_INTERACTIVE)
 			delta = delta * (ON_RUNQUEUE_WEIGHT * 128 / 100) / 128;
 
-		array = next->array;
 		new_prio = recalc_task_prio(next, next->timestamp + delta);
 
 		if (unlikely(next->prio != new_prio)) {
@@ -3262,9 +3278,6 @@ switch_tasks:
 
 	update_cpu_clock(prev, rq, now);
 
-	prev->sleep_avg -= run_time;
-	if ((long)prev->sleep_avg <= 0)
-		prev->sleep_avg = 0;
 	prev->timestamp = prev->last_ran = now;
 
 	sched_info_switch(prev, next);




* Re: [patch 2.6.16-mm2 7/9] sched throttle tree extract - implement throttling
  2006-04-01  9:12           ` [patch 2.6.16-mm2 6/9] sched throttle tree extract - move division to slow path Mike Galbraith
@ 2006-04-01  9:23             ` Mike Galbraith
  2006-04-01  9:26               ` [patch 2.6.16-mm2 8/9] sched throttle tree extract - maximize timeslice accounting Mike Galbraith
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Galbraith @ 2006-04-01  9:23 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton; +Cc: lkml, Peter Williams, Nick Piggin, Con Kolivas

This patch implements task throttling.  To determine whether a task
needs throttling, we compute slice_avg, the upper limit on what a
task's sleep_avg can sanely be given its actual cpu usage.  A task
which consumes no more than expected accrues cpu credit, which may be
spent later on a slice by slice basis.  This allows interactive tasks
to perform in high cpu usage bursts, yet retains control over their
long term cpu usage.  A task which exhausts its earned credit will
have its sleep_avg trimmed, and consequently its priority trimmed to
match its actual cpu usage.
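
A userspace distillation of the slice_avg bookkeeping (helper name
invented; the real macros are in the hunks below): the idle
percentage of each slice is folded into a weighted average, with
samples far from the average weighted less, so one burst neither
destroys nor instantly builds a task's history.

	#include <stdio.h>

	#define MAX_BONUS	10

	static unsigned int update_slice_avg(unsigned int slice_avg,
					     unsigned int cpu_pct)
	{
		unsigned int idle = 100 - cpu_pct;
		unsigned int delta = slice_avg > idle ?
				slice_avg - idle : idle - slice_avg;
		unsigned int w = 1 + delta / MAX_BONUS;

		return (w * slice_avg + idle) / (w + 1);
	}

	int main(void)
	{
		unsigned int avg = 90;	/* task has been 90% idle */
		int i;

		/* ten slices of pure cpu hog: 81 72 64 ... */
		for (i = 0; i < 10; i++) {
			avg = update_slice_avg(avg, 100);
			printf("slice_avg: %u\n", avg);
		}
		return 0;
	}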

Signed-off-by: Mike Galbraith <efault@gmx.de>

--- linux-2.6.16-mm2/include/linux/sched.h.org	2006-02-28 06:11:17.000000000 +0100
+++ linux-2.6.16-mm2/include/linux/sched.h	2006-02-28 06:11:41.000000000 +0100
@@ -736,14 +736,14 @@ struct task_struct {
 	unsigned short ioprio;
 	unsigned int btrace_seq;
 
-	unsigned long sleep_avg;
+	unsigned long sleep_avg, last_slice, throttle;
 	unsigned long long timestamp, last_ran;
 	unsigned long long sched_time; /* sched_clock time spent running */
 	enum sleep_type sleep_type;
 
 	unsigned long policy;
 	cpumask_t cpus_allowed;
-	unsigned int time_slice, first_time_slice;
+	unsigned int time_slice, slice_info;
 	long slice_time_ns;
 
 #ifdef CONFIG_SCHEDSTATS
--- linux-2.6.16-mm2/kernel/sched.c-6.move_division	2006-03-23 15:18:48.000000000 +0100
+++ linux-2.6.16-mm2/kernel/sched.c	2006-03-23 16:28:04.000000000 +0100
@@ -80,6 +80,21 @@
 #define NS_TO_JIFFIES(TIME)	((TIME) / (1000000000 / HZ))
 #define JIFFIES_TO_NS(TIME)	((TIME) * (1000000000 / HZ))
 
+#if (BITS_PER_LONG < 64)
+#define JIFFIES_TO_NS64(TIME) \
+	((unsigned long long)(TIME) * ((unsigned long) (1000000000 / HZ)))
+
+#define NS64_TO_JIFFIES(TIME) \
+	((((unsigned long long)((TIME)) >> BITS_PER_LONG) * \
+	(1 + NS_TO_JIFFIES(~0UL))) + NS_TO_JIFFIES((unsigned long)(TIME)))
+#else /* BITS_PER_LONG < 64 */
+
+#define NS64_TO_JIFFIES(TIME) NS_TO_JIFFIES(TIME)
+#define JIFFIES_TO_NS64(TIME) JIFFIES_TO_NS(TIME)
+
+#endif /* BITS_PER_LONG < 64 */
+
+
 /*
  * These are the 'tuning knobs' of the scheduler:
  *
@@ -177,6 +192,115 @@ static int task_interactive_idle(task_t 
 	return p->sleep_avg + sleep_time >= INTERACTIVE_SLEEP_AVG(p);
 }
 
+/*
+ * Interactive boost can lead to starvation if the decision to
+ * boost a task turns out to be a bad one.  To combat this, we
+ * compute the sane upper limit for cpu usage 'slice_avg' based
+ * upon a task's sleep_avg, and use this information combined
+ * with a timer to determine when intervention is required.
+ *
+ * When a task is behaving as its sleep_avg indicates it should,
+ * its throttle is moved forward; otherwise it will time out, and
+ * the task's priority will be lowered.
+ *
+ * Throttling tunables.
+ *
+ * CREDIT_C1: The amount of cpu time in seconds that a new task
+ *           will run completely free, ie the head start a task
+ *           has before it has to push its timer forward to avoid
+ *           being throttled.  Each conforming slice thereafter
+ *           increases its stored credit, and vice versa.
+ *
+ * CREDIT_C2: The maximum amount of CPU time in seconds a task
+ *           can store for later use.  When a task has no stored
+ *           credit left, now is time C2.  Tasks begin life with
+ *           C1 seconds credit, ie C2 is C1 seconds in front of
+ *           them, and the 'buffer' will grow in front of them
+ *           if they perform in a conformant manner.  The maximum
+ *           credit that fits in 32 bits jiffies is 42949 seconds.
+ */
+
+#define CREDIT_C1 10
+#define CREDIT_C2 14400
+
+#define C1 (CREDIT_C1 * MAX_BONUS * HZ)
+#define C2 (CREDIT_C2 * MAX_BONUS * HZ + C1)
+#define C3 (MAX_BONUS * C2)
+
+#define credit_exhausted(p, credit) \
+	(time_after_eq(jiffies, (p)->throttle + (credit)))
+
+/*
+ * Masks for p->slice_info, formerly p->first_time_slice.
+ * SLICE_FTS:   0x80000000  Task is in its first ever timeslice.
+ * SLICE_NEW:   0x40000000  Slice refreshed.
+ * SLICE_SPA:   0x3FFE0000  Spare bits.
+ * SLICE_LTS:   0x0001FF80  Last time slice
+ * SLICE_AVG:   0x0000007F  Task slice_avg stored as percentage.
+ */
+#define SLICE_AVG_BITS    7
+#define SLICE_LTS_BITS   10
+#define SLICE_SPA_BITS   13
+#define SLICE_NEW_BITS    1
+#define SLICE_FTS_BITS    1
+
+#define SLICE_AVG_SHIFT   0
+#define SLICE_LTS_SHIFT   (SLICE_AVG_SHIFT + SLICE_AVG_BITS)
+#define SLICE_SPA_SHIFT   (SLICE_LTS_SHIFT + SLICE_LTS_BITS)
+#define SLICE_NEW_SHIFT   (SLICE_SPA_SHIFT + SLICE_SPA_BITS)
+#define SLICE_FTS_SHIFT   (SLICE_NEW_SHIFT + SLICE_NEW_BITS)
+
+#define INFO_MASK(x)      ((1U << (x))-1)
+#define SLICE_AVG_MASK    (INFO_MASK(SLICE_AVG_BITS) << SLICE_AVG_SHIFT)
+#define SLICE_LTS_MASK    (INFO_MASK(SLICE_LTS_BITS) << SLICE_LTS_SHIFT)
+#define SLICE_SPA_MASK    (INFO_MASK(SLICE_SPA_BITS) << SLICE_SPA_SHIFT)
+#define SLICE_NEW_MASK    (INFO_MASK(SLICE_NEW_BITS) << SLICE_NEW_SHIFT)
+#define SLICE_FTS_MASK    (INFO_MASK(SLICE_FTS_BITS) << SLICE_FTS_SHIFT)
+
+/*
+ * p->slice_info access macros.
+ */
+#define first_time_slice(p) ((p)->slice_info & SLICE_FTS_MASK)
+#define set_first_time_slice(p) ((p)->slice_info |= SLICE_FTS_MASK)
+#define clr_first_time_slice(p) ((p)->slice_info &= ~SLICE_FTS_MASK)
+
+#define slice_is_new(p) ((p)->slice_info & SLICE_NEW_MASK)
+#define set_slice_is_new(p) ((p)->slice_info |= SLICE_NEW_MASK)
+#define clr_slice_is_new(p) ((p)->slice_info &= ~SLICE_NEW_MASK)
+
+#define last_slice(p) (((p)->slice_info & SLICE_LTS_MASK) >> SLICE_LTS_SHIFT)
+#define set_last_slice(p, n) ((p)->slice_info = (((p)->slice_info & \
+	~SLICE_LTS_MASK) | (((n) << SLICE_LTS_SHIFT) & SLICE_LTS_MASK)))
+
+#define NS_SLEEP_AVG_PCNT (NS_MAX_SLEEP_AVG / 100)
+
+#define slice_avg(p) ((typeof((p)->sleep_avg)) \
+	((((p)->slice_info & SLICE_AVG_MASK) >> SLICE_AVG_SHIFT) * \
+	NS_SLEEP_AVG_PCNT))
+#define set_slice_avg(p, n) ((p)->slice_info = (((p)->slice_info & \
+	~SLICE_AVG_MASK) | ((((n) / NS_SLEEP_AVG_PCNT) \
+	<< SLICE_AVG_SHIFT) & SLICE_AVG_MASK)))
+#define slice_avg_raw(p)  \
+	(((p)->slice_info & SLICE_AVG_MASK) >> SLICE_AVG_SHIFT)
+#define set_slice_avg_raw(p, n) ((p)->slice_info = (((p)->slice_info & \
+	~SLICE_AVG_MASK) | (((n) << SLICE_AVG_SHIFT) & SLICE_AVG_MASK)))
+
+/*
+ * cpu usage macros.
+ */
+#define cpu_avg(p) \
+	(100 - slice_avg_raw(p))
+
+#define cpu_max(p) \
+	(100 - ((p)->sleep_avg / NS_SLEEP_AVG_PCNT))
+
+#define time_this_slice(p) \
+	(jiffies - (p)->last_slice)
+
+#define cpu_this_slice(p) \
+	(100 * last_slice(p) / max((unsigned) time_this_slice(p), \
+	(unsigned) last_slice(p)))
+
 #define TASK_PREEMPTS_CURR(p, rq) \
 	((p)->prio < (rq)->curr->prio)
 
@@ -890,13 +1014,20 @@ static int recalc_task_prio(task_t *p, u
 
 	if (likely(sleep_time > 0)) {
 		/*
+		 * Update throttle position.
+		 */
+		p->throttle += NS64_TO_JIFFIES(__sleep_time);
+		if (time_before(jiffies, p->throttle))
+			p->throttle = jiffies;
+
+		/*
 		 * Tasks that sleep a long time are categorised as idle.
 		 * They will only have their sleep_avg increased to a
 		 * level that makes them just interactive priority to stay
 		 * active yet prevent them suddenly becoming cpu hogs and
 		 * starving other processes.
 		 */
-		if (task_interactive_idle(p, sleep_time)) {
+		if (C2 && task_interactive_idle(p, sleep_time)) {
 			unsigned long ceiling = INTERACTIVE_SLEEP_AVG(p);
 
 			/*
@@ -951,6 +1082,7 @@ static void activate_task(task_t *p, run
 		runqueue_t *this_rq = this_rq();
 		now = (now - this_rq->timestamp_last_tick)
 			+ rq->timestamp_last_tick;
+
 	}
 #endif
 
@@ -1571,16 +1703,28 @@ void fastcall sched_fork(task_t *p, int 
 	 * resulting in more scheduling fairness.
 	 */
 	local_irq_disable();
-	p->slice_time_ns = current->slice_time_ns >> 1;
-	if (unlikely(p->slice_time_ns < NS_TICK))
-		p->slice_time_ns = NS_TICK;
+	p->slice_time_ns = (current->slice_time_ns + NS_TICK) >> 1;
 	p->time_slice = NS_TO_JIFFIES(p->slice_time_ns);
+	if (unlikely(p->slice_time_ns < NS_TICK)) {
+		p->slice_time_ns = NS_TICK;
+		p->time_slice = 1;
+	}
 	/*
 	 * The remainder of the first timeslice might be recovered by
 	 * the parent if the child exits early enough.
 	 */
-	p->first_time_slice = 1;
+	set_first_time_slice(p);
 	p->timestamp = sched_clock();
+
+	/*
+	 * Set up slice_info for the child.
+	 */
+	set_slice_avg(p, p->sleep_avg);
+	set_last_slice(p, p->time_slice);
+	set_slice_is_new(p);
+	p->last_slice = jiffies;
+	p->throttle = jiffies - C2 + C1;
+
 	current->slice_time_ns >>= 1;
 	if (unlikely(current->slice_time_ns < NS_TICK)) {
 		/*
@@ -1695,7 +1839,7 @@ void fastcall sched_exit(task_t *p)
 	 * the sleep_avg of the parent as well.
 	 */
 	rq = task_rq_lock(p->parent, &flags);
-	if (p->first_time_slice && task_cpu(p) == task_cpu(p->parent)) {
+	if (first_time_slice(p) && task_cpu(p) == task_cpu(p->parent)) {
 		p->parent->slice_time_ns += p->slice_time_ns;
 		if (unlikely(p->parent->slice_time_ns > task_timeslice_ns(p)))
 			p->parent->slice_time_ns = task_timeslice_ns(p);
@@ -2813,6 +2957,89 @@ void account_steal_time(struct task_stru
 }
 
 /*
+ * Refresh timeslice and associated slice information.
+ * @p: the process to refresh.
+ */
+static void refresh_timeslice(task_t *p)
+{
+	unsigned long slice_time = jiffies - p->last_slice;
+	unsigned int slice = last_slice(p);
+	unsigned int slice_avg, cpu, idle;
+	long run_time = -1 * p->slice_time_ns;
+	int w = MAX_BONUS, delta, bonus;
+
+	/*
+	 * Update time_slice.
+	 */
+	p->slice_time_ns = task_timeslice_ns(p);
+	p->time_slice = task_timeslice(p);
+	set_last_slice(p, p->time_slice);
+
+	/*
+	 * Update sleep_avg.
+	 *
+	 * Tasks charged proportionately less run_time at high
+	 * sleep_avg to delay them losing their interactive status
+	 */
+	run_time += JIFFIES_TO_NS(slice);
+	if (C2)
+		run_time /= SLEEP_AVG_DIVISOR(p);
+	if (p->sleep_avg >= run_time)
+		p->sleep_avg -= run_time;
+	else p->sleep_avg = 0;
+
+	/*
+	 * Update slice_avg.
+	 */
+	slice_avg = slice_avg_raw(p);
+	cpu = cpu_this_slice(p);
+	idle = 100 - cpu;
+
+	delta = max(slice_avg, idle) - min(slice_avg, idle);
+	w = 1 + (delta / w);
+	slice_avg = (w * slice_avg + idle) / (w + 1);
+	set_slice_avg_raw(p, slice_avg);
+
+	/*
+	 * If we've hit the timeout, we aren't draining enough sleep_avg
+	 * to catch up with the task's cpu usage.  Up the ante to bring
+	 * the task back toward balance.  This is important, because it
+	 * allows interactive tasks to push their throttle back enough
+	 * that they can both sustain, and rapidly recover from throttling
+	 * instead of descending into C3.
+	 */
+	if (credit_exhausted(p, C2) && p->sleep_avg > slice_avg(p)) {
+		unsigned long run_time = p->sleep_avg - slice_avg(p);
+		run_time /= w;
+		if (p->sleep_avg >= run_time)
+			p->sleep_avg -= run_time;
+	}
+
+	/*
+	 * Update throttle position.
+	 */
+	if (cpu < cpu_max(p) + PCNT_PER_DYNPRIO || !credit_exhausted(p, C1)) {
+		bonus = idle * PCNT_PER_DYNPRIO / 100;
+		p->throttle += (slice_time - slice) * bonus;
+	} else if (cpu >= cpu_max(p) + PCNT_PER_DYNPRIO) {
+		bonus = (cpu - cpu_max(p)) / PCNT_PER_DYNPRIO;
+		p->throttle -= slice_time * bonus;
+	}
+
+	if (time_before(jiffies, p->throttle))
+		p->throttle = jiffies;
+	else if (credit_exhausted(p, C3))
+		p->throttle = jiffies - C3;
+
+	/*
+	 * And finally, stamp and flag the new slice.
+	 */
+	clr_first_time_slice(p);
+	set_slice_is_new(p);
+	p->last_slice = jiffies;
+}
+
+/*
  * This function gets called by the timer code, with HZ frequency.
  * We call it with interrupts disabled.
  *
@@ -2863,9 +3090,7 @@ void scheduler_tick(void)
 		 * FIFO tasks have no timeslices.
 		 */
 		if ((p->policy == SCHED_RR) && !p->time_slice) {
-			p->slice_time_ns = task_timeslice_ns(p);
-			p->time_slice = task_timeslice(p);
-			p->first_time_slice = 0;
+			refresh_timeslice(p);
 			set_tsk_need_resched(p);
 
 			/* put it at the end of the queue: */
@@ -2874,22 +3099,10 @@ void scheduler_tick(void)
 		goto out_unlock;
 	}
 	if (!p->time_slice) {
-		long slice_time_ns = task_timeslice_ns(p);
-		long run_time = (-1 * p->slice_time_ns) + slice_time_ns;
 		dequeue_task(p, rq->active);
 		set_tsk_need_resched(p);
-		p->slice_time_ns = slice_time_ns;
-		p->time_slice = task_timeslice(p);
-		/*
-		 * Tasks are charged proportionately less run_time at high
-		 * sleep_avg to delay them losing their interactive status
-		 */
-		run_time /= SLEEP_AVG_DIVISOR(p);
-		if (p->sleep_avg >= run_time)
-			p->sleep_avg -= run_time;
-		else p->sleep_avg = 0;
+		refresh_timeslice(p);
 		p->prio = effective_prio(p);
-		p->first_time_slice = 0;
 
 		if (!rq->expired_timestamp)
 			rq->expired_timestamp = jiffies;
@@ -3280,6 +3493,14 @@ switch_tasks:
 
 	prev->timestamp = prev->last_ran = now;
 
+	/*
+	 * Tag start of execution of a new timeslice.
+	 */
+	if (unlikely(slice_is_new(next))) {
+		next->last_slice = jiffies;
+		clr_slice_is_new(next);
+	}
+
 	sched_info_switch(prev, next);
 	if (likely(prev != next)) {
 		next->timestamp = now;




* Re: [patch 2.6.16-mm2 8/9] sched throttle tree extract - maximize timeslice accounting
  2006-04-01  9:23             ` [patch 2.6.16-mm2 7/9] sched throttle tree extract - implement throttling Mike Galbraith
@ 2006-04-01  9:26               ` Mike Galbraith
  2006-04-01  9:31                 ` [patch 2.6.16-mm2 9/9] sched throttle tree extract - export tunables Mike Galbraith
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Galbraith @ 2006-04-01  9:26 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, lkml, Peter Williams, Nick Piggin, Con Kolivas

This patch maximizes time slice accounting.  A task which receives
too much CPU time due to missing the timer interrupt will have the
excess deducted from its next slice.
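
In miniature (userspace sketch, helper name invented): the remainder
of the old slice, which goes negative when ticks were missed, is
carried into the new one, unless it is implausibly large, in which
case the patch below assumes rounding noise and resets.

	#include <stdio.h>

	#define NS_TICK	4000000L	/* one tick at HZ=250 */

	static long refresh_slice(long slice_time_ns, long full_slice_ns)
	{
		/* start the next slice from the old remainder, so an
		 * overrun (negative remainder) is paid back */
		slice_time_ns += full_slice_ns;

		/* implausibly large theft: assume rounding artifact */
		if (slice_time_ns < NS_TICK)
			slice_time_ns = full_slice_ns;
		return slice_time_ns;
	}

	int main(void)
	{
		long full = 100 * NS_TICK;

		/* overran by 3 ticks: next slice is 3 ticks shorter */
		printf("%ld ticks\n",
		       refresh_slice(-3 * NS_TICK, full) / NS_TICK);
		return 0;
	}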

Signed-off-by: Mike Galbraith <efault@gmx.de>

--- linux-2.6.16-mm2/kernel/sched.c-7.implement_throttle	2006-03-24 09:36:08.000000000 +0100
+++ linux-2.6.16-mm2/kernel/sched.c	2006-03-24 09:40:33.000000000 +0100
@@ -2966,13 +2966,28 @@ static void refresh_timeslice(task_t *p)
 	unsigned int slice = last_slice(p);
 	unsigned int slice_avg, cpu, idle;
 	long run_time = -1 * p->slice_time_ns;
+	long slice_time_ns = task_timeslice_ns(p);
 	int w = MAX_BONUS, delta, bonus;
 
 	/*
-	 * Update time_slice.
+	 * Update time_slice.  Account for unused fragment,
+	 * or excess time received due to missed tick.
 	 */
-	p->slice_time_ns = task_timeslice_ns(p);
-	p->time_slice = task_timeslice(p);
+	p->slice_time_ns += slice_time_ns;
+	/*
+	 * Not common, but this does happen on SMP systems.
+	 * Timeslice theft of this magnitude has never been
+	 * observed in the wild, so assume that this is BS,
+	 * and give the poor task its full slice.  Theory:
+	 * mostly idle task migrates between CPUs numerous
+	 * times during its slice, timestamp rounding leads
+	 * to wildly inaccurate calculation.  Rounding has
+	 * maximum effect on those who stretch their slice,
+	 * but is also fairly meaningless, so ignore it.
+	 */
+	if (unlikely(p->slice_time_ns < NS_TICK))
+		p->slice_time_ns = slice_time_ns;
+	p->time_slice = NS_TO_JIFFIES(p->slice_time_ns);
 	set_last_slice(p, p->time_slice);
 
 	/*




* Re: [patch 2.6.16-mm2 9/9] sched throttle tree extract - export tunables
  2006-04-01  9:26               ` [patch 2.6.16-mm2 8/9] sched throttle tree extract - maximize timeslice accounting Mike Galbraith
@ 2006-04-01  9:31                 ` Mike Galbraith
  2006-04-05 17:38                   ` [patch 2.6.16-mm2 10/9] sched throttle tree extract - kill interactive task feedback loop Mike Galbraith
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Galbraith @ 2006-04-01  9:31 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton; +Cc: lkml, Peter Williams, Nick Piggin, Con Kolivas

This patch is a convenience for people testing this patch series.  It
exports throttling tunables to userland, and is utterly disposable.
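
With the patch applied, the knobs should show up under
/proc/sys/kernel (paths inferred from the procnames below), e.g.:

	# cat /proc/sys/kernel/credit_c1
	10
	# echo 60 > /proc/sys/kernel/credit_c1

proc_dointvec_minmax clamps writes to the [0, credit_max] range set
up via .extra1/.extra2.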

Signed-off-by: Mike Galbraith <efault@gmx.de>

--- linux-2.6.16-mm2/include/linux/sysctl.h.org	2006-03-31 11:25:16.000000000 +0200
+++ linux-2.6.16-mm2/include/linux/sysctl.h	2006-03-31 14:08:28.000000000 +0200
@@ -148,6 +148,8 @@ enum
 	KERN_SPIN_RETRY=70,	/* int: number of spinlock retries */
 	KERN_ACPI_VIDEO_FLAGS=71, /* int: flags for setting up video after ACPI sleep */
 	KERN_IA64_UNALIGNED=72, /* int: ia64 unaligned userland trap enable */
+	KERN_SCHED_THROTTLE1=73,  /* int: throttling credit period 1 in secs */
+	KERN_SCHED_THROTTLE2=74,  /* int: throttling credit period 2 in secs */
 };
 
 
--- linux-2.6.16-mm2/kernel/sysctl.c.org	2006-03-31 11:24:50.000000000 +0200
+++ linux-2.6.16-mm2/kernel/sysctl.c	2006-03-31 14:08:28.000000000 +0200
@@ -73,6 +73,9 @@ extern int printk_ratelimit_burst;
 extern int pid_max_min, pid_max_max;
 extern int sysctl_drop_caches;
 extern int percpu_pagelist_fraction;
+extern int credit_c1;
+extern int credit_c2;
+extern int credit_max;
 
 #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
 int unknown_nmi_panic;
@@ -230,6 +233,11 @@ static ctl_table root_table[] = {
 	{ .ctl_name = 0 }
 };
 
+/* Constants for minimum and maximum testing in vm_table and
+ * kern_table.  We use these as one-element integer vectors. */
+static int zero;
+static int one_hundred = 100;
+
 static ctl_table kern_table[] = {
 	{
 		.ctl_name	= KERN_OSTYPE,
@@ -684,15 +692,31 @@ static ctl_table kern_table[] = {
 		.proc_handler	= &proc_dointvec,
 	},
 #endif
+	{
+		.ctl_name	= KERN_SCHED_THROTTLE1,
+		.procname	= "credit_c1",
+		.data		= &credit_c1,
+		.maxlen		= sizeof (int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec_minmax,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero,
+		.extra2		= &credit_max,
+	},
+	{
+		.ctl_name	= KERN_SCHED_THROTTLE2,
+		.procname	= "credit_c2",
+		.data		= &credit_c2,
+		.maxlen		= sizeof (int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec_minmax,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero,
+		.extra2		= &credit_max,
+	},
 	{ .ctl_name = 0 }
 };
 
-/* Constants for minimum and maximum testing in vm_table.
-   We use these as one-element integer vectors. */
-static int zero;
-static int one_hundred = 100;
-
-
 static ctl_table vm_table[] = {
 	{
 		.ctl_name	= VM_OVERCOMMIT_MEMORY,
--- linux-2.6.16-mm2/kernel/sched.c-8.slice_accounting	2006-03-24 09:40:33.000000000 +0100
+++ linux-2.6.16-mm2/kernel/sched.c	2006-03-24 09:51:06.000000000 +0100
@@ -220,11 +220,12 @@ static int task_interactive_idle(task_t 
  *           credit that fits in 32 bits jiffies is 42949 seconds.
  */
 
-#define CREDIT_C1 10
-#define CREDIT_C2 14400
+int credit_c1 = 10;
+int credit_c2 = 14400;
+int credit_max = 42949;
 
-#define C1 (CREDIT_C1 * MAX_BONUS * HZ)
-#define C2 (CREDIT_C2 * MAX_BONUS * HZ + C1)
+#define C1 (credit_c1 * MAX_BONUS * HZ)
+#define C2 (credit_c2 * MAX_BONUS * HZ + C1)
 #define C3 (MAX_BONUS * C2)
 
 #define credit_exhausted(p, credit) \




* Re: [patch 2.6.16-mm2 5/9] sched throttle tree extract - correct idle sleep logic
  2006-04-01  8:59         ` [patch 2.6.16-mm2 5/9] sched throttle tree extract - correct idle sleep logic Mike Galbraith
  2006-04-01  9:12           ` [patch 2.6.16-mm2 6/9] sched throttle tree extract - move division to slow path Mike Galbraith
@ 2006-04-01 16:53           ` Lee Revell
  2006-04-01 18:00             ` Mike Galbraith
  1 sibling, 1 reply; 16+ messages in thread
From: Lee Revell @ 2006-04-01 16:53 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Andrew Morton, lkml, Peter Williams, Nick Piggin,
	Con Kolivas

On Sat, 2006-04-01 at 10:59 +0200, Mike Galbraith wrote:
> This patch corrects the idle sleep logic to place a long sleeping task
> into the runqueue at a barely interactive priority such that it can
> not destroy interactivity should it immediately begin consuming
> massive cpu. 

Did you test this extensively with bloated apps like Evolution and
Firefox that need to be scheduled as interactive tasks even though they
often peg the CPU?

Lee



* Re: [patch 2.6.16-mm2 5/9] sched throttle tree extract - correct idle sleep logic
  2006-04-01 16:53           ` [patch 2.6.16-mm2 5/9] sched throttle tree extract - correct idle sleep logic Lee Revell
@ 2006-04-01 18:00             ` Mike Galbraith
  0 siblings, 0 replies; 16+ messages in thread
From: Mike Galbraith @ 2006-04-01 18:00 UTC (permalink / raw)
  To: Lee Revell
  Cc: Ingo Molnar, Andrew Morton, lkml, Peter Williams, Nick Piggin,
	Con Kolivas

On Sat, 2006-04-01 at 11:53 -0500, Lee Revell wrote:
> On Sat, 2006-04-01 at 10:59 +0200, Mike Galbraith wrote:
> > This patch corrects the idle sleep logic to place a long sleeping task
> > into the runqueue at a barely interactive priority such that it can
> > not destroy interactivity should it immediately begin consuming
> > massive cpu. 
> 
> Did you test this extensively with bloated apps like Evolution and
> Firefox that need to be scheduled as interactive tasks even though they
> often peg the CPU?

I use both Evolution and Firefox with no problems.  Well, no problems
isn't quite true: Evolution slogs along terribly when IO is going on,
but that isn't a scheduler issue.  My desktop experience has zero
problems with these patches, but YMMV.

These patches are much more about anti-starvation than interactive task
selection.  Tasks which were scheduled as interactive before should
remain so, but that's not to say that they won't ever be throttled.
There is no way to differentiate between nice cpu hog and evil cpu hog,
and these patches do not even attempt to do so.  These patches simply
enforce a set of rules.  There is no doubt in my mind that these rules
will not be favorable to every interactive task situation.

I can give you one right off the top of my head.  A visualization for
your mp3 player is a pure cpu hog right from the start, and will be
throttled.  If you test this, you'll see that despite that, it does a
pretty darn good job with the problem.

	-Mike



* Re: [patch 2.6.16-mm2 10/9] sched throttle tree extract - kill interactive task feedback loop
  2006-04-01  9:31                 ` [patch 2.6.16-mm2 9/9] sched throttle tree extract - export tunables Mike Galbraith
@ 2006-04-05 17:38                   ` Mike Galbraith
  2006-04-05 23:15                     ` Con Kolivas
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Galbraith @ 2006-04-05 17:38 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton; +Cc: lkml, Peter Williams, Nick Piggin, Con Kolivas

Greetings,

The patch below stops interactive tasks from feeding off each other
during round-robin.

With this 10th patch in place, a busy server with _default_ throttle
settings (ie tunables may now be mostly unneeded) looks like this:

[root]:# w
 18:38:00 up 23 min,  2 users,  load average: 10.07, 9.94, 7.50
USER     TTY        LOGIN@   IDLE   JCPU   PCPU  WHAT
root     tty1      18:15   22:12  30.89s 30.84s  top d1
root     pts/0     18:20    0.00s  0.07s  0.00s  w
[root]:# time netstat|grep :81|wc -l
   1758

real    0m0.304s
user    0m0.144s
sys     0m0.135s
[root]:# time netstat|grep :81|wc -l
   1776

real    0m0.306s
user    0m0.118s
sys     0m0.163s
[root]:# time netstat|grep :81|wc -l
   1799

real    0m0.493s
user    0m0.146s
sys     0m0.141s
[root]:#

My desktop still feels just peachy.

Signed-off-by: Mike Galbraith <efault@gmx.de>

--- linux-2.6.16-mm2/kernel/sched.c-9.export_tunables	2006-03-31 13:37:09.000000000 +0200
+++ linux-2.6.16-mm2/kernel/sched.c	2006-04-05 19:22:01.000000000 +0200
@@ -3480,7 +3480,7 @@ go_idle:
 	queue = array->queue + idx;
 	next = list_entry(queue->next, task_t, run_list);
 
-	if (!rt_task(next) && interactive_sleep(next->sleep_type)) {
+	if (!TASK_INTERACTIVE(next) && interactive_sleep(next->sleep_type)) {
 		unsigned long long delta = now - next->timestamp;
 		if (unlikely((long long)(now - next->timestamp) < 0))
 			delta = 0;




* Re: [patch 2.6.16-mm2 10/9] sched throttle tree extract - kill interactive task feedback loop
  2006-04-05 17:38                   ` [patch 2.6.16-mm2 10/9] sched throttle tree extract - kill interactive task feedback loop Mike Galbraith
@ 2006-04-05 23:15                     ` Con Kolivas
  2006-04-06  4:10                       ` Mike Galbraith
  0 siblings, 1 reply; 16+ messages in thread
From: Con Kolivas @ 2006-04-05 23:15 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Andrew Morton, lkml, Peter Williams, Nick Piggin

On Thursday 06 April 2006 03:38, Mike Galbraith wrote:
> Greetings,
>
> The patch below stops interactive tasks from feeding off each other
> during round-robin.
>
> With this 10th patch in place, a busy server with _default_ throttle
> settings (ie tunables may now be mostly unneeded) looks like this:

> --- linux-2.6.16-mm2/kernel/sched.c-9.export_tunables	2006-03-31
> 13:37:09.000000000 +0200 +++ linux-2.6.16-mm2/kernel/sched.c	2006-04-05
> 19:22:01.000000000 +0200 @@ -3480,7 +3480,7 @@ go_idle:
>  	queue = array->queue + idx;
>  	next = list_entry(queue->next, task_t, run_list);
>
> -	if (!rt_task(next) && interactive_sleep(next->sleep_type)) {
> +	if (!TASK_INTERACTIVE(next) && interactive_sleep(next->sleep_type)) {

You can't remove that rt_task check from there, can you?  We shouldn't
ever requeue an rt task.

Cheers,
Con


* Re: [patch 2.6.16-mm2 10/9] sched throttle tree extract - kill interactive task feedback loop
  2006-04-05 23:15                     ` Con Kolivas
@ 2006-04-06  4:10                       ` Mike Galbraith
  2006-04-06  4:29                         ` Con Kolivas
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Galbraith @ 2006-04-06  4:10 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Ingo Molnar, Andrew Morton, lkml, Peter Williams, Nick Piggin

On Thu, 2006-04-06 at 09:15 +1000, Con Kolivas wrote:
> On Thursday 06 April 2006 03:38, Mike Galbraith wrote:
> > Greetings,
> >
> > The patch below stops interactive tasks from feeding off each other
> > during round-robin.
> >
> > With this 10th patch in place, a busy server with _default_ throttle
> > settings (ie tunables may now be mostly unneeded) looks like this:
> 
> > --- linux-2.6.16-mm2/kernel/sched.c-9.export_tunables	2006-03-31
> > 13:37:09.000000000 +0200 +++ linux-2.6.16-mm2/kernel/sched.c	2006-04-05
> > 19:22:01.000000000 +0200 @@ -3480,7 +3480,7 @@ go_idle:
> >  	queue = array->queue + idx;
> >  	next = list_entry(queue->next, task_t, run_list);
> >
> > -	if (!rt_task(next) && interactive_sleep(next->sleep_type)) {
> > +	if (!TASK_INTERACTIVE(next) && interactive_sleep(next->sleep_type)) {
> 
> You can't remove that rt_task check from there can you? We shouldn't ever 
> requeue a rt task.

RT tasks are always interactive, aren't they?  (I'll check)

Anyway, this is definitely a large part of the ssh-to-a-busy-server
problem, but a complete block of interactive tasks isn't quite the
right solution.  With an SMP kernel, my desktop appeared to be fine, but
a quick spin with UP kernel isn't looking quite so good.  This needs
more investigation.

	-Mike



* Re: [patch 2.6.16-mm2 10/9] sched throttle tree extract - kill interactive task feedback loop
  2006-04-06  4:10                       ` Mike Galbraith
@ 2006-04-06  4:29                         ` Con Kolivas
  0 siblings, 0 replies; 16+ messages in thread
From: Con Kolivas @ 2006-04-06  4:29 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Andrew Morton, lkml, Peter Williams, Nick Piggin

On Thursday 06 April 2006 14:10, Mike Galbraith wrote:
> On Thu, 2006-04-06 at 09:15 +1000, Con Kolivas wrote:
> > On Thursday 06 April 2006 03:38, Mike Galbraith wrote:
> > > -	if (!rt_task(next) && interactive_sleep(next->sleep_type)) {
> > > +	if (!TASK_INTERACTIVE(next) && interactive_sleep(next->sleep_type)) {
> >
> > You can't remove that rt_task check from there can you? We shouldn't ever
> > requeue a rt task.
>
> RT tasks are always interactive aren't they?  (I'll check)

No, they're always equal to their static_prio. This rt_task check was added 
originally because it was found to inappropriately be requeueing SCHED_FIFO 
tasks.

Cheers,
Con

