linux-kernel.vger.kernel.org archive mirror
* [CFT][PATCH] new scheduler policy
@ 2003-08-19  1:53 Nick Piggin
  2003-08-19  2:35 ` William Lee Irwin III
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Nick Piggin @ 2003-08-19  1:53 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 659 bytes --]

Hi everyone,

As per the latest trend these days, I've done some tinkering with
the cpu scheduler. I have gone in the opposite direction of most
of the recent stuff and come out with something that can be nearly
as good interactivity-wise (for me).

I haven't run many tests on it - my mind blanked when I tried to
remember the scores of scheduler "exploits" thrown around. So if
anyone would like to suggest some, or better still, run some,
please do so. And be nice, this isn't my type of scheduler :P

It still does have a few things that need fixing but I thought
I'd get my first hack a bit of exercise.

It's against 2.6.0-test3-mm1

Best regards,
Nick


[-- Attachment #2: sched-policy --]
[-- Type: text/plain, Size: 22017 bytes --]

diff -Nrup -X dontdiff archs/linux-2.6/fs/proc/array.c linux-2.6/fs/proc/array.c
--- archs/linux-2.6/fs/proc/array.c	2003-08-19 01:45:49.000000000 +1000
+++ linux-2.6/fs/proc/array.c	2003-08-18 16:55:23.000000000 +1000
@@ -162,7 +162,7 @@ static inline char * task_state(struct t
 		"Uid:\t%d\t%d\t%d\t%d\n"
 		"Gid:\t%d\t%d\t%d\t%d\n",
 		get_task_state(p),
-		(p->sleep_avg/1024)*100/(1000000000/1024),
+		p->sleep_avg,
 	       	p->tgid,
 		p->pid, p->pid ? p->real_parent->pid : 0,
 		p->pid && p->ptrace ? p->parent->pid : 0,
diff -Nrup -X dontdiff archs/linux-2.6/include/linux/sched.h linux-2.6/include/linux/sched.h
--- archs/linux-2.6/include/linux/sched.h	2003-08-19 01:45:50.000000000 +1000
+++ linux-2.6/include/linux/sched.h	2003-08-18 22:03:49.000000000 +1000
@@ -342,14 +342,17 @@ struct task_struct {
 	struct list_head run_list;
 	prio_array_t *array;
 
+	unsigned long array_sequence;
+	unsigned long timestamp;
+
+	unsigned long sleep_time;
+	unsigned long total_time;
 	unsigned long sleep_avg;
-	long interactive_credit;
-	unsigned long long timestamp;
-	int activated;
 
 	unsigned long policy;
 	cpumask_t cpus_allowed;
 	unsigned int time_slice, first_time_slice;
+	unsigned int used_slice;
 
 	struct list_head tasks;
 	struct list_head ptrace_children;
diff -Nrup -X dontdiff archs/linux-2.6/kernel/sched.c linux-2.6/kernel/sched.c
--- archs/linux-2.6/kernel/sched.c	2003-08-19 01:45:50.000000000 +1000
+++ linux-2.6/kernel/sched.c	2003-08-19 11:20:19.000000000 +1000
@@ -58,8 +58,6 @@
 #define USER_PRIO(p)		((p)-MAX_RT_PRIO)
 #define TASK_USER_PRIO(p)	USER_PRIO((p)->static_prio)
 #define MAX_USER_PRIO		(USER_PRIO(MAX_PRIO))
-#define AVG_TIMESLICE	(MIN_TIMESLICE + ((MAX_TIMESLICE - MIN_TIMESLICE) *\
-			(MAX_PRIO-1-NICE_TO_PRIO(0))/(MAX_USER_PRIO - 1)))
 
 /*
  * Some helpers for converting nanosecond timing to jiffy resolution
@@ -67,6 +65,12 @@
 #define NS_TO_JIFFIES(TIME)	(TIME / (1000000000 / HZ))
 #define JIFFIES_TO_NS(TIME)	(TIME * (1000000000 / HZ))
 
+#define US_TO_JIFFIES(TIME)	(TIME / (1000000 / HZ))
+#define JIFFIES_TO_US(TIME)	(TIME * (1000000 / HZ))
+
+#define NS_TO_US(TIME)		(TIME / 1000)
+#define US_TO_NS(TIME)		(TIME * 1000)
+
 /*
  * These are the 'tuning knobs' of the scheduler:
  *
@@ -74,93 +78,32 @@
  * maximum timeslice is 200 msecs. Timeslices get refilled after
  * they expire.
  */
-#define MIN_TIMESLICE		( 10 * HZ / 1000)
-#define MAX_TIMESLICE		(200 * HZ / 1000)
-#define TIMESLICE_GRANULARITY	(HZ/40 ?: 1)
-#define ON_RUNQUEUE_WEIGHT	30
-#define CHILD_PENALTY		95
-#define PARENT_PENALTY		100
-#define EXIT_WEIGHT		3
-#define PRIO_BONUS_RATIO	25
-#define MAX_BONUS		(MAX_USER_PRIO * PRIO_BONUS_RATIO / 100)
-#define INTERACTIVE_DELTA	2
-#define MAX_SLEEP_AVG		(AVG_TIMESLICE * MAX_BONUS)
-#define STARVATION_LIMIT	(MAX_SLEEP_AVG)
-#define NS_MAX_SLEEP_AVG	(JIFFIES_TO_NS(MAX_SLEEP_AVG))
-#define NODE_THRESHOLD		125
-
-/*
- * If a task is 'interactive' then we reinsert it in the active
- * array after it has expired its current timeslice. (it will not
- * continue to run immediately, it will still roundrobin with
- * other interactive tasks.)
- *
- * This part scales the interactivity limit depending on niceness.
- *
- * We scale it linearly, offset by the INTERACTIVE_DELTA delta.
- * Here are a few examples of different nice levels:
- *
- *  TASK_INTERACTIVE(-20): [1,1,1,1,1,1,1,1,1,0,0]
- *  TASK_INTERACTIVE(-10): [1,1,1,1,1,1,1,0,0,0,0]
- *  TASK_INTERACTIVE(  0): [1,1,1,1,0,0,0,0,0,0,0]
- *  TASK_INTERACTIVE( 10): [1,1,0,0,0,0,0,0,0,0,0]
- *  TASK_INTERACTIVE( 19): [0,0,0,0,0,0,0,0,0,0,0]
- *
- * (the X axis represents the possible -5 ... 0 ... +5 dynamic
- *  priority range a task can explore, a value of '1' means the
- *  task is rated interactive.)
- *
- * Ie. nice +19 tasks can never get 'interactive' enough to be
- * reinserted into the active array. And only heavily CPU-hog nice -20
- * tasks will be expired. Default nice 0 tasks are somewhere between,
- * it takes some effort for them to get interactive, but it's not
- * too hard.
- */
-
-#define SCALE(v1,v1_max,v2_max) \
-	(v1) * (v2_max) / (v1_max)
-
-#define DELTA(p) \
-	(SCALE(TASK_NICE(p), 40, MAX_USER_PRIO*PRIO_BONUS_RATIO/100) + \
-		INTERACTIVE_DELTA)
-
-#define TASK_INTERACTIVE(p) \
-	((p)->prio <= (p)->static_prio - DELTA(p))
-
-#define JUST_INTERACTIVE_SLEEP(p) \
-	(MAX_SLEEP_AVG - (DELTA(p) * AVG_TIMESLICE))
+#define MIN_TIMESLICE		(10 * HZ / 1000)
+#define MAX_TIMESLICE		(100 * HZ / 1000)
 
-#define HIGH_CREDIT(p) \
-	((p)->interactive_credit > MAX_SLEEP_AVG)
-
-#define LOW_CREDIT(p) \
-	((p)->interactive_credit < -MAX_SLEEP_AVG)
-
-#define VARYING_CREDIT(p) \
-	(!(HIGH_CREDIT(p) || LOW_CREDIT(p)))
+#define NODE_THRESHOLD		125
 
 #define TASK_PREEMPTS_CURR(p, rq) \
-	((p)->prio < (rq)->curr->prio || \
-		((p)->prio == (rq)->curr->prio && \
-			(p)->time_slice > (rq)->curr->time_slice * 2))
+	((p)->prio < (rq)->curr->prio)
 
 /*
- * BASE_TIMESLICE scales user-nice values [ -20 ... 19 ]
- * to time slice values.
- *
  * The higher a thread's priority, the bigger timeslices
  * it gets during one round of execution. But even the lowest
  * priority thread gets MIN_TIMESLICE worth of execution time.
- *
- * task_timeslice() is the interface that is used by the scheduler.
  */
-
-#define BASE_TIMESLICE(p) (MIN_TIMESLICE + \
-	((MAX_TIMESLICE - MIN_TIMESLICE) * (MAX_PRIO-1-(p)->static_prio)/(MAX_USER_PRIO - 1)))
-
 static inline unsigned int task_timeslice(task_t *p)
 {
-	return BASE_TIMESLICE(p);
+	unsigned int timeslice = MIN_TIMESLICE +
+		( (MAX_USER_PRIO - USER_PRIO(p->prio))
+		* (MAX_TIMESLICE - MIN_TIMESLICE) )
+		/ MAX_USER_PRIO;
+
+	if (timeslice < MIN_TIMESLICE)
+		timeslice = MIN_TIMESLICE;
+	if (timeslice > MAX_TIMESLICE)
+		timeslice = MAX_TIMESLICE;
+
+	return timeslice;
 }
 
 /*
@@ -186,7 +129,8 @@ struct prio_array {
  */
 struct runqueue {
 	spinlock_t lock;
-	unsigned long nr_running, nr_switches, expired_timestamp,
+	unsigned long array_sequence;
+	unsigned long nr_running, nr_switches,
 			nr_uninterruptible;
 	task_t *curr, *idle;
 	struct mm_struct *prev_mm;
@@ -326,39 +270,50 @@ static inline void enqueue_task(struct t
 	p->array = array;
 }
 
-/*
- * effective_prio - return the priority that is based on the static
- * priority but is modified by bonuses/penalties.
- *
- * We scale the actual sleep average [0 .... MAX_SLEEP_AVG]
- * into the -5 ... 0 ... +5 bonus/penalty range.
- *
- * We use 25% of the full 0...39 priority range so that:
- *
- * 1) nice +19 interactive tasks do not preempt nice 0 CPU hogs.
- * 2) nice -20 CPU hogs do not get preempted by nice 0 tasks.
- *
- * Both properties are important to certain workloads.
- */
-static int effective_prio(task_t *p)
+static unsigned long task_priority(task_t *p)
 {
 	int bonus, prio;
 
 	if (rt_task(p))
 		return p->prio;
 
-	bonus = MAX_USER_PRIO * PRIO_BONUS_RATIO *
-		NS_TO_JIFFIES(p->sleep_avg) / MAX_SLEEP_AVG / 100;
-	bonus -= MAX_USER_PRIO * PRIO_BONUS_RATIO / 100 / 2;
+	bonus = (MAX_USER_PRIO * p->sleep_avg) / 100 / 2;
+	prio = USER_PRIO(p->static_prio) + (MAX_USER_PRIO / 4);
 
-	prio = p->static_prio - bonus;
+	prio = MAX_RT_PRIO + prio - bonus;
 	if (prio < MAX_RT_PRIO)
 		prio = MAX_RT_PRIO;
 	if (prio > MAX_PRIO-1)
 		prio = MAX_PRIO-1;
+
 	return prio;
 }
 
+static void add_task_time(task_t *p, unsigned long time,
+					int sleep)
+{
+	const unsigned long tmax = HZ;
+	
+	if (time == 0)
+		return;
+	
+	if (time > tmax) {
+		time = tmax;
+		p->total_time = 0;
+		p->sleep_time = 0;
+	} else {
+		p->total_time = ((tmax - time) * p->total_time) / tmax;
+		p->sleep_time = ((tmax - time) * p->sleep_time) / tmax;
+	}
+
+	p->total_time += time;
+	if (sleep)
+		p->sleep_time += time;
+
+	if (p->total_time != 0)
+		p->sleep_avg = (100 * p->sleep_time) / p->total_time;
+}
+
 /*
  * __activate_task - move a task to the runqueue.
  */
@@ -368,90 +323,6 @@ static inline void __activate_task(task_
 	nr_running_inc(rq);
 }
 
-static void recalc_task_prio(task_t *p, unsigned long long now)
-{
-	unsigned long long __sleep_time = now - p->timestamp;
-	unsigned long sleep_time;
-
-	if (!p->sleep_avg && VARYING_CREDIT(p))
-		p->interactive_credit--;
-
-	if (__sleep_time > NS_MAX_SLEEP_AVG)
-		sleep_time = NS_MAX_SLEEP_AVG;
-	else
-		sleep_time = (unsigned long)__sleep_time;
-
-	if (likely(sleep_time > 0)) {
-		/*
-		 * User tasks that sleep a long time are categorised as
-		 * idle and will get just interactive status to stay active &
-		 * prevent them suddenly becoming cpu hogs and starving
-		 * other processes.
-		 */
-		if (p->mm && sleep_time >
-			JIFFIES_TO_NS(JUST_INTERACTIVE_SLEEP(p)))
-				p->sleep_avg =
-					JIFFIES_TO_NS(JUST_INTERACTIVE_SLEEP(p));
-		else {
-			/*
-			 * The lower the sleep avg a task has the more
-			 * rapidly it will rise with sleep time.
-			 */
-			sleep_time *= (MAX_BONUS + 1 -
-					(NS_TO_JIFFIES(p->sleep_avg) *
-					MAX_BONUS / MAX_SLEEP_AVG));
-
-			/*
-			 * Tasks with low interactive_credit are limited to
-			 * one timeslice worth of sleep avg bonus.
-			 */
-			if (LOW_CREDIT(p) &&
-				sleep_time > JIFFIES_TO_NS(task_timeslice(p)))
-					sleep_time =
-						JIFFIES_TO_NS(task_timeslice(p));
-
-			/*
-			 * Non high_credit tasks waking from uninterruptible
-			 * sleep are limited in their sleep_avg rise
-			 */
-			if (!HIGH_CREDIT(p) && p->activated == -1){
-				if (p->sleep_avg >=
-					JIFFIES_TO_NS(JUST_INTERACTIVE_SLEEP(p)))
-						sleep_time = 0;
-				else if (p->sleep_avg + sleep_time >=
-					JIFFIES_TO_NS(JUST_INTERACTIVE_SLEEP(p))){
-						p->sleep_avg =
-							JIFFIES_TO_NS(JUST_INTERACTIVE_SLEEP(p));
-						sleep_time = 0;
-					}
-			}
-
-			/*
-			 * This code gives a bonus to interactive tasks.
-			 *
-			 * The boost works by updating the 'average sleep time'
-			 * value here, based on ->timestamp. The more time a task
-			 * spends sleeping, the higher the average gets - and the
-			 * higher the priority boost gets as well.
-			 */
-			p->sleep_avg += sleep_time;
-
-			/*
-			 * 'Overflow' bonus ticks go to the waker as well, so the
-			 * ticks are not lost. This has the effect of further
-			 * boosting tasks that are related to maximum-interactive
-			 * tasks.
-			 */
-			if (p->sleep_avg > NS_MAX_SLEEP_AVG){
-				p->sleep_avg = NS_MAX_SLEEP_AVG;
-				p->interactive_credit += VARYING_CREDIT(p);
-			}
-		}
-	}
-
-	p->prio = effective_prio(p);
-}
-
 /*
  * activate_task - move a task to the runqueue and do priority recalculation
  *
@@ -460,33 +331,33 @@ static void recalc_task_prio(task_t *p, 
  */
 static inline void activate_task(task_t *p, runqueue_t *rq)
 {
-	unsigned long long now = sched_clock();
+	unsigned long now = jiffies;
+	unsigned long s = now - p->timestamp;
+	unsigned long tmax = HZ;
+
+	if (s > tmax)
+		s = tmax;
+
+	if (!in_interrupt() && current->mm) {
+		add_task_time(p, s / 2, 1);
+		add_task_time(current, s / 2, 1);
+	} else
+		add_task_time(p, s, 1);
 
-	recalc_task_prio(p, now);
+	p->prio = task_priority(p);
 
-	/*
-	 * This checks to make sure it's not an uninterruptible task
-	 * that is now waking up.
-	 */
-	if (!p->activated){
-		/*
-		 * Tasks which were woken up by interrupts (ie. hw events)
-		 * are most likely of interactive nature. So we give them
-		 * the credit of extending their sleep time to the period
-		 * of time they spend on the runqueue, waiting for execution
-		 * on a CPU, first time around:
-		 */
-		if (in_interrupt())
-			p->activated = 2;
-		else
-		/*
-		 * Normal first-time wakeups get a credit too for on-runqueue
-		 * time, but it will be weighted down:
-		 */
-			p->activated = 1;
+	if (rq->array_sequence != p->array_sequence) {
+		p->used_slice = 0;
+		p->time_slice = task_timeslice(p);
 	}
 
-	p->timestamp = now;
+	if (!in_interrupt() && current->mm) {
+		unsigned long steal;
+		steal = min((unsigned int)s,
+				(p->time_slice - p->used_slice) / 2);
+		p->time_slice -= steal;
+		current->time_slice += steal;
+	}
 
 	__activate_task(p, rq);
 }
@@ -496,6 +367,7 @@ static inline void activate_task(task_t 
  */
 static inline void deactivate_task(struct task_struct *p, runqueue_t *rq)
 {
+	p->array_sequence = rq->array_sequence;
 	nr_running_dec(rq);
 	if (p->state == TASK_UNINTERRUPTIBLE)
 		rq->nr_uninterruptible++;
@@ -539,7 +411,7 @@ static inline void resched_task(task_t *
  * be called with interrupts off, or it may introduce deadlock with
  * smp_call_function() if an IPI is sent by the same process we are
  * waiting to become inactive.
 */
 void wait_task_inactive(task_t * p)
 {
 	unsigned long flags;
@@ -608,18 +480,10 @@ repeat_lock_task:
 				task_rq_unlock(rq, &flags);
 				goto repeat_lock_task;
 			}
-			if (old_state == TASK_UNINTERRUPTIBLE){
+			if (old_state == TASK_UNINTERRUPTIBLE)
 				rq->nr_uninterruptible--;
-				/*
-				 * Tasks on involuntary sleep don't earn
-				 * sleep_avg beyond just interactive state.
-				 */
-				p->activated = -1;
-			}
-			if (sync)
-				__activate_task(p, rq);
-			else {
-				activate_task(p, rq);
+			activate_task(p, rq);
+			if (!sync) {
 				if (TASK_PREEMPTS_CURR(p, rq))
 					resched_task(rq->curr);
 			}
@@ -658,40 +522,26 @@ int wake_up_state(task_t *p, unsigned in
  * This function will do some initial scheduler statistics housekeeping
  * that must be done for every newly created process.
  */
-void wake_up_forked_process(task_t * p)
+void wake_up_forked_process(task_t *p)
 {
-	unsigned long flags, sleep_avg;
+	unsigned long flags;
 	runqueue_t *rq = task_rq_lock(current, &flags);
 
 	p->state = TASK_RUNNING;
-	/*
-	 * We decrease the sleep average of forking parents
-	 * and children as well, to keep max-interactive tasks
-	 * from forking tasks that are max-interactive.
-	 */
-	sleep_avg = NS_TO_JIFFIES(current->sleep_avg) * MAX_BONUS /
-			MAX_SLEEP_AVG * PARENT_PENALTY / 100 *
-			MAX_SLEEP_AVG / MAX_BONUS;
-	current->sleep_avg = JIFFIES_TO_NS(sleep_avg);
-
-	sleep_avg = NS_TO_JIFFIES(p->sleep_avg) * MAX_BONUS / MAX_SLEEP_AVG *
-			CHILD_PENALTY / 100 * MAX_SLEEP_AVG / MAX_BONUS;
-	p->sleep_avg = JIFFIES_TO_NS(sleep_avg);
 
-	p->interactive_credit = 0;
-
-	p->prio = effective_prio(p);
 	set_task_cpu(p, smp_processor_id());
 
-	if (unlikely(!current->array))
-		__activate_task(p, rq);
-	else {
-		p->prio = current->prio;
-		list_add_tail(&p->run_list, &current->run_list);
-		p->array = current->array;
-		p->array->nr_active++;
-		nr_running_inc(rq);
-	}
+	p->sleep_time = current->sleep_time / 3;
+	p->total_time = current->total_time / 3;
+	p->sleep_avg = current->sleep_avg;
+
+	current->sleep_time = (current->sleep_time*2) / 3;
+	if (current->total_time != 0)
+		current->sleep_avg = (100 * current->sleep_time) / current->total_time;
+
+	p->prio = task_priority(p);
+	__activate_task(p, rq);
+
 	task_rq_unlock(rq, &flags);
 }
 
@@ -710,19 +560,11 @@ void sched_exit(task_t * p)
 
 	local_irq_save(flags);
 	if (p->first_time_slice) {
-		p->parent->time_slice += p->time_slice;
+		p->parent->time_slice += p->time_slice - p->used_slice;
 		if (unlikely(p->parent->time_slice > MAX_TIMESLICE))
 			p->parent->time_slice = MAX_TIMESLICE;
 	}
 	local_irq_restore(flags);
-	/*
-	 * If the child was a (relative-) CPU hog then decrease
-	 * the sleep_avg of the parent as well.
-	 */
-	if (p->sleep_avg < p->parent->sleep_avg)
-		p->parent->sleep_avg = p->parent->sleep_avg /
-		(EXIT_WEIGHT + 1) * EXIT_WEIGHT + p->sleep_avg /
-		(EXIT_WEIGHT + 1);
 }
 
 /**
@@ -1129,25 +971,25 @@ static inline void pull_task(runqueue_t 
 }
 
 /*
- * Previously:
- *
- * #define CAN_MIGRATE_TASK(p,rq,this_cpu)	\
- *	((!idle || (NS_TO_JIFFIES(now - (p)->timestamp) > \
- *		cache_decay_ticks)) && !task_running(rq, p) && \
- *			cpu_isset(this_cpu, (p)->cpus_allowed))
+ * Can tsk be pulled to this_cpu?  We may not pull a task that is
+ * currently running, one that is not allowed on this CPU, or, when
+ * the destination CPU is idle, one that is likely still cache-hot.
  */
 
 static inline int
 can_migrate_task(task_t *tsk, runqueue_t *rq, int this_cpu, int idle)
 {
-	unsigned long delta = sched_clock() - tsk->timestamp;
+	unsigned long delta;
 
-	if (idle && (delta <= JIFFIES_TO_NS(cache_decay_ticks)))
-		return 0;
 	if (task_running(rq, tsk))
 		return 0;
 	if (!cpu_isset(this_cpu, tsk->cpus_allowed))
 		return 0;
+
+	delta = jiffies - tsk->timestamp;
+	if (idle && (delta <= cache_decay_ticks))
+		return 0;
+
 	return 1;
 }
 
@@ -1319,20 +1159,6 @@ DEFINE_PER_CPU(struct kernel_stat, kstat
 EXPORT_PER_CPU_SYMBOL(kstat);
 
 /*
- * We place interactive tasks back into the active array, if possible.
- *
- * To guarantee that this does not starve expired tasks we ignore the
- * interactivity of a task if the first expired task had to wait more
- * than a 'reasonable' amount of time. This deadline timeout is
- * load-dependent, as the frequency of array switched decreases with
- * increasing number of running tasks:
- */
-#define EXPIRED_STARVING(rq) \
-		(STARVATION_LIMIT && ((rq)->expired_timestamp && \
-		(jiffies - (rq)->expired_timestamp >= \
-			STARVATION_LIMIT * ((rq)->nr_running) + 1)))
-
-/*
  * This function gets called by the timer code, with HZ frequency.
  * We call it with interrupts disabled.
  *
@@ -1376,71 +1202,39 @@ void scheduler_tick(int user_ticks, int 
 	 * The task was running during this tick - update the
 	 * time slice counter. Note: we do not update a thread's
 	 * priority until it either goes to sleep or uses up its
-	 * timeslice. This makes it possible for interactive tasks
-	 * to use up their timeslices at their highest priority levels.
+	 * timeslice.
 	 */
 	if (unlikely(rt_task(p))) {
 		/*
 		 * RR tasks need a special form of timeslice management.
 		 * FIFO tasks have no timeslices.
 		 */
-		if ((p->policy == SCHED_RR) && !--p->time_slice) {
-			p->time_slice = task_timeslice(p);
-			p->first_time_slice = 0;
-			set_tsk_need_resched(p);
-
-			/* put it at the end of the queue: */
-			dequeue_task(p, rq->active);
-			enqueue_task(p, rq->active);
+		if (p->policy == SCHED_RR) {
+			p->used_slice++;
+			if (p->used_slice >= p->time_slice) {
+				p->used_slice = 0;
+				p->time_slice = task_timeslice(p);
+				p->first_time_slice = 0;
+				set_tsk_need_resched(p);
+
+				/* put it at the end of the queue: */
+				dequeue_task(p, rq->active);
+				enqueue_task(p, rq->active);
+			}
 		}
 		goto out_unlock;
 	}
-	if (!--p->time_slice) {
+
+	p->used_slice++;
+	if (p->used_slice >= p->time_slice) {
 		dequeue_task(p, rq->active);
 		set_tsk_need_resched(p);
-		p->prio = effective_prio(p);
+		p->prio = task_priority(p);
 		p->time_slice = task_timeslice(p);
+		p->used_slice = 0;
 		p->first_time_slice = 0;
 
-		if (!rq->expired_timestamp)
-			rq->expired_timestamp = jiffies;
-		if (!TASK_INTERACTIVE(p) || EXPIRED_STARVING(rq)) {
-			enqueue_task(p, rq->expired);
-		} else
-			enqueue_task(p, rq->active);
-	} else {
-		/*
-		 * Prevent a too long timeslice allowing a task to monopolize
-		 * the CPU. We do this by splitting up the timeslice into
-		 * smaller pieces.
-		 *
-		 * Note: this does not mean the task's timeslices expire or
-		 * get lost in any way, they just might be preempted by
-		 * another task of equal priority. (one with higher
-		 * priority would have preempted this task already.) We
-		 * requeue this task to the end of the list on this priority
-		 * level, which is in essence a round-robin of tasks with
-		 * equal priority.
-		 *
-		 * This only applies to user tasks in the interactive
-		 * delta range with at least MIN_TIMESLICE left.
-		 */
-		if (p->mm && TASK_INTERACTIVE(p) && !((task_timeslice(p) -
-			p->time_slice) % TIMESLICE_GRANULARITY) &&
-			(p->time_slice > MIN_TIMESLICE) &&
-			(p->array == rq->active)) {
-
-			dequeue_task(p, rq->active);
-			set_tsk_need_resched(p);
-			/*
-			 * Tasks with interactive credit get all their
-			 * time waiting on the run queue credited as sleep
-			 */
-			if (HIGH_CREDIT(p))
-				p->activated = 2;
-			p->prio = effective_prio(p);
-			enqueue_task(p, rq->active);
-		}
+		enqueue_task(p, rq->expired);
 	}
 out_unlock:
 	spin_unlock(&rq->lock);
@@ -1459,7 +1253,7 @@ asmlinkage void schedule(void)
 	runqueue_t *rq;
 	prio_array_t *array;
 	struct list_head *queue;
-	unsigned long long now;
+	unsigned long now;
 	unsigned long run_time;
 	int idx;
 
@@ -1481,21 +1275,9 @@ need_resched:
 	rq = this_rq();
 
 	release_kernel_lock(prev);
-	now = sched_clock();
-	if (likely(now - prev->timestamp < NS_MAX_SLEEP_AVG))
-		run_time = now - prev->timestamp;
-	else
-		run_time = NS_MAX_SLEEP_AVG;
-
-	/*
-	 * Tasks with interactive credits get charged less run_time
-	 * as their sleep_avg decreases to slow them losing their
-	 * priority bonus
-	 */
-	if (HIGH_CREDIT(prev))
-		run_time /= (MAX_BONUS + 1 -
-			(NS_TO_JIFFIES(prev->sleep_avg) * MAX_BONUS /
-			MAX_SLEEP_AVG));
+	now = jiffies;
+	run_time = now - prev->timestamp;
+	add_task_time(prev, run_time, 0);
 
 	spin_lock_irq(&rq->lock);
 
@@ -1525,7 +1307,6 @@ pick_next_task:
 			goto pick_next_task;
 #endif
 		next = rq->idle;
-		rq->expired_timestamp = 0;
 		goto switch_tasks;
 	}
 
@@ -1534,36 +1315,21 @@ pick_next_task:
 		/*
 		 * Switch the active and expired arrays.
 		 */
+		rq->array_sequence++;
 		rq->active = rq->expired;
 		rq->expired = array;
 		array = rq->active;
-		rq->expired_timestamp = 0;
 	}
 
 	idx = sched_find_first_bit(array->bitmap);
 	queue = array->queue + idx;
 	next = list_entry(queue->next, task_t, run_list);
 
-	if (next->activated > 0) {
-		unsigned long long delta = now - next->timestamp;
-
-		if (next->activated == 1)
-			delta = delta * (ON_RUNQUEUE_WEIGHT * 128 / 100) / 128;
-
-		array = next->array;
-		dequeue_task(next, array);
-		recalc_task_prio(next, next->timestamp + delta);
-		enqueue_task(next, array);
-	}
-	next->activated = 0;
 switch_tasks:
 	prefetch(next);
 	clear_tsk_need_resched(prev);
 	RCU_qsctr(task_cpu(prev))++;
 
-	prev->sleep_avg -= run_time;
-	if ((long)prev->sleep_avg < 0)
-		prev->sleep_avg = 0;
 	prev->timestamp = now;
 
 	if (likely(prev != next)) {
@@ -2762,6 +2528,7 @@ void __init sched_init(void)
 		prio_array_t *array;
 
 		rq = cpu_rq(i);
+		rq->array_sequence = 0;
 		rq->active = rq->arrays;
 		rq->expired = rq->arrays + 1;
 		spin_lock_init(&rq->lock);

* Re: [CFT][PATCH] new scheduler policy
  2003-08-19  1:53 [CFT][PATCH] new scheduler policy Nick Piggin
@ 2003-08-19  2:35 ` William Lee Irwin III
  2003-08-19  2:46   ` Nick Piggin
  2003-08-19 10:24 ` Mike Galbraith
  2003-08-22  8:55 ` Roger Luethi
  2 siblings, 1 reply; 23+ messages in thread
From: William Lee Irwin III @ 2003-08-19  2:35 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel

On Tue, Aug 19, 2003 at 11:53:01AM +1000, Nick Piggin wrote:
> As per the latest trend these days, I've done some tinkering with
> the cpu scheduler. I have gone in the opposite direction of most
> of the recent stuff and come out with something that can be nearly
> as good interactivity-wise (for me).
> I haven't run many tests on it - my mind blanked when I tried to
> remember the scores of scheduler "exploits" thrown around. So if
> anyone would like to suggest some, or better still, run some,
> please do so. And be nice, this isn't my type of scheduler :P
> It still does have a few things that need fixing but I thought
> I'd get my first hack a bit of exercise.
> It's against 2.6.0-test3-mm1

Say, any chance you could spray out a brief explanation of your new
heuristics?


-- wli

* Re: [CFT][PATCH] new scheduler policy
  2003-08-19  2:35 ` William Lee Irwin III
@ 2003-08-19  2:46   ` Nick Piggin
  2003-08-19  2:59     ` Nick Piggin
  0 siblings, 1 reply; 23+ messages in thread
From: Nick Piggin @ 2003-08-19  2:46 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-kernel



William Lee Irwin III wrote:

>On Tue, Aug 19, 2003 at 11:53:01AM +1000, Nick Piggin wrote:
>
>>As per the latest trend these days, I've done some tinkering with
>>the cpu scheduler. I have gone in the opposite direction of most
>>of the recent stuff and come out with something that can be nearly
>>as good interactivity-wise (for me).
>>I haven't run many tests on it - my mind blanked when I tried to
>>remember the scores of scheduler "exploits" thrown around. So if
>>anyone would like to suggest some, or better still, run some,
>>please do so. And be nice, this isn't my type of scheduler :P
>>It still does have a few things that need fixing but I thought
>>I'd get my first hack a bit of exercise.
>>It's against 2.6.0-test3-mm1
>>
>
>Say, any chance you could spray out a brief explanation of your new
>heuristics?
>

Oh alright. BTW, this one's not for your big boxes yet! It does funny
things with timeslices. But they will be (pending free time) made much
more dynamic, so it should _hopefully_ context switch even less than
the normal scheduler in a compute intensive load.

OK. timeslices: they are now dynamic. Full priority tasks will get
100ms, minimum priority tasks 10ms (this is what needs fixing, but
should be OK to test "interactiveness")

interactivity estimator is gone: grep -i interactiv sched.c | wc -l
gives 0.

priorities are much the same, although processes are supposed to be
able to change priority much more quickly.

backboost is back. that is what (hopefully) prevents X from starving
due to the quickly changing priorities thing.



* Re: [CFT][PATCH] new scheduler policy
  2003-08-19  2:46   ` Nick Piggin
@ 2003-08-19  2:59     ` Nick Piggin
  2003-08-19  5:15       ` Matt Mackall
  0 siblings, 1 reply; 23+ messages in thread
From: Nick Piggin @ 2003-08-19  2:59 UTC (permalink / raw)
  To: Nick Piggin; +Cc: William Lee Irwin III, linux-kernel



Nick Piggin wrote:

>
>
> William Lee Irwin III wrote:
>
>> On Tue, Aug 19, 2003 at 11:53:01AM +1000, Nick Piggin wrote:
>>
>>> As per the latest trend these days, I've done some tinkering with
>>> the cpu scheduler. I have gone in the opposite direction of most
>>> of the recent stuff and come out with something that can be nearly
>>> as good interactivity-wise (for me).
>>> I haven't run many tests on it - my mind blanked when I tried to
>>> remember the scores of scheduler "exploits" thrown around. So if
>>> anyone would like to suggest some, or better still, run some,
>>> please do so. And be nice, this isn't my type of scheduler :P
>>> It still does have a few things that need fixing but I thought
>>> I'd get my first hack a bit of exercise.
>>> It's against 2.6.0-test3-mm1
>>>
>>
>> Say, any chance you could spray out a brief explanation of your new
>> heuristics?
>>
>
> Oh alright. BTW, this one's not for your big boxes yet! It does funny
> things with timeslices. But they will be (pending free time) made much
> more dynamic, so it should _hopefully_ context switch even less than
> the normal scheduler in a compute intensive load.
>
> OK. timeslices: they are now dynamic. Full priority tasks will get
> 100ms, minimum priority tasks 10ms (this is what needs fixing, but
> should be OK to test "interactiveness")
>
> interactivity estimator is gone: grep -i interactiv sched.c | wc -l
> gives 0.
>
> priorities are much the same, although processes are supposed to be
> able to change priority much more quickly.
>
> backboost is back. that is what (hopefully) prevents X from starving
> due to the quickly changing priorities thing.

  And lack of interactivity estimator.


* Re: [CFT][PATCH] new scheduler policy
  2003-08-19  2:59     ` Nick Piggin
@ 2003-08-19  5:15       ` Matt Mackall
  2003-08-19  5:34         ` Nick Piggin
  0 siblings, 1 reply; 23+ messages in thread
From: Matt Mackall @ 2003-08-19  5:15 UTC (permalink / raw)
  To: Nick Piggin; +Cc: William Lee Irwin III, linux-kernel

On Tue, Aug 19, 2003 at 12:59:57PM +1000, Nick Piggin wrote:
> 
> 
> Nick Piggin wrote:
> 
> >
> >
> >William Lee Irwin III wrote:
> >
> >>On Tue, Aug 19, 2003 at 11:53:01AM +1000, Nick Piggin wrote:
> >>
> >>>As per the latest trend these days, I've done some tinkering with
> >>>the cpu scheduler. I have gone in the opposite direction of most
> >>>of the recent stuff and come out with something that can be nearly
> >>>as good interactivity-wise (for me).
> >>>I haven't run many tests on it - my mind blanked when I tried to
> >>>remember the scores of scheduler "exploits" thrown around. So if
> >>>anyone would like to suggest some, or better still, run some,
> >>>please do so. And be nice, this isn't my type of scheduler :P
> >>>It still does have a few things that need fixing but I thought
> >>>I'd get my first hack a bit of exercise.
> >>>It's against 2.6.0-test3-mm1
> >>>
> >>
> >>Say, any chance you could spray out a brief explanation of your new
> >>heuristics?
> >>
> >
> >Oh alright. BTW, this one's not for your big boxes yet! It does funny
> >things with timeslices. But they will be (pending free time) made much
> >more dynamic, so it should _hopefully_ context switch even less than
> >the normal scheduler in a compute intensive load.
> >
> >OK. timeslices: they are now dynamic. Full priority tasks will get
> >100ms, minimum priority tasks 10ms (this is what needs fixing, but
> >should be OK to test "interactiveness")
> >
> >interactivity estimator is gone: grep -i interactiv sched.c | wc -l
> >gives 0.
> >
> >priorities are much the same, although processes are supposed to be
> >able to change priority much more quickly.
> >
> >backboost is back. that is what (hopefully) prevents X from starving
> >due to the quickly changing priorities thing.
> 
>  And lack of interactivity estimator.

You forgot to mention fork() splitting its timeslice 2/3 to 1/3 parent
to child.

-- 
Matt Mackall : http://www.selenic.com : of or relating to the moon

* Re: [CFT][PATCH] new scheduler policy
  2003-08-19  5:15       ` Matt Mackall
@ 2003-08-19  5:34         ` Nick Piggin
  2003-08-19  5:45           ` Matt Mackall
  0 siblings, 1 reply; 23+ messages in thread
From: Nick Piggin @ 2003-08-19  5:34 UTC (permalink / raw)
  To: Matt Mackall; +Cc: William Lee Irwin III, linux-kernel

Matt Mackall wrote:

>On Tue, Aug 19, 2003 at 12:59:57PM +1000, Nick Piggin wrote:
>
>>
>>Nick Piggin wrote:
>>
>>
>>>
>>>William Lee Irwin III wrote:
>>>
>>>
>>>>On Tue, Aug 19, 2003 at 11:53:01AM +1000, Nick Piggin wrote:
>>>>
>>>>
>>>>>As per the latest trend these days, I've done some tinkering with
>>>>>the cpu scheduler. I have gone in the opposite direction of most
>>>>>of the recent stuff and come out with something that can be nearly
>>>>>as good interactivity-wise (for me).
>>>>>I haven't run many tests on it - my mind blanked when I tried to
>>>>>remember the scores of scheduler "exploits" thrown around. So if
>>>>>anyone would like to suggest some, or better still, run some,
>>>>>please do so. And be nice, this isn't my type of scheduler :P
>>>>>It still does have a few things that need fixing but I thought
>>>>>I'd get my first hack a bit of exercise.
>>>>>It's against 2.6.0-test3-mm1
>>>>>
>>>>>
>>>>Say, any chance you could spray out a brief explanation of your new
>>>>heuristics?
>>>>
>>>>
>>>Oh alright. BTW, this one's not for your big boxes yet! It does funny
>>>things with timeslices. But they will be (pending free time) made much
>>>more dynamic, so it should _hopefully_ context switch even less than
>>>the normal scheduler in a compute intensive load.
>>>
>>>OK. timeslices: they are now dynamic. Full priority tasks will get
>>>100ms, minimum priority tasks 10ms (this is what needs fixing, but
>>>should be OK to test "interactiveness")
>>>
>>>interactivity estimator is gone: grep -i interactiv sched.c | wc -l
>>>gives 0.
>>>
>>>priorities are much the same, although processes are supposed to be
>>>able to change priority much more quickly.
>>>
>>>backboost is back. that is what (hopefully) prevents X from starving
>>>due to the quickly changing priorities thing.
>>>
>> And lack of interactivity estimator.
>>
>
>You forgot to mention fork() splitting its timeslice 2/3 to 1/3 parent
>to child.
>
>

Hmm... did I do that? I don't actually have the code in front of me, but I
think the timeslice split is still 50/50 (see fork.c). It's the priority
points that go 2/3 to 1/3. Actually it's a bit more complex than that even,
and probably not exactly right...
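
To put numbers on it, straight from the patch: say the parent has
sleep_time=600 and total_time=1000 (so sleep_avg 60). The child forks
off with sleep_time=200 and total_time=333 and inherits sleep_avg 60,
while the parent keeps its total_time but is cut back to sleep_time=400,
dropping its sleep_avg to 40. So it's the sleep history, not the
timeslice, that gets split 2/3 to 1/3.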



* Re: [CFT][PATCH] new scheduler policy
  2003-08-19  5:34         ` Nick Piggin
@ 2003-08-19  5:45           ` Matt Mackall
  0 siblings, 0 replies; 23+ messages in thread
From: Matt Mackall @ 2003-08-19  5:45 UTC (permalink / raw)
  To: Nick Piggin; +Cc: William Lee Irwin III, linux-kernel

On Tue, Aug 19, 2003 at 03:34:06PM +1000, Nick Piggin wrote:
> Matt Mackall wrote:
> 
> >
> >You forgot to mention fork() splitting its timeslice 2/3 to 1/3 parent
> >to child.
> >
> >
> 
> Hmm... did I do that? I don't actually have the code in front of me, but I
> think the timeslice split is still 50/50 (see fork.c). It's the priority
> points that go 2/3 to 1/3. Actually it's a bit more complex than that even,
> and probably not exactly right...

Actually, it's as you say. The terms sleeptime and timeslice just
confused me.

-- 
Matt Mackall : http://www.selenic.com : of or relating to the moon

* Re: [CFT][PATCH] new scheduler policy
  2003-08-19  1:53 [CFT][PATCH] new scheduler policy Nick Piggin
  2003-08-19  2:35 ` William Lee Irwin III
@ 2003-08-19 10:24 ` Mike Galbraith
  2003-08-19 13:40   ` Nick Piggin
  2003-08-20  2:13   ` William Lee Irwin III
  2003-08-22  8:55 ` Roger Luethi
  2 siblings, 2 replies; 23+ messages in thread
From: Mike Galbraith @ 2003-08-19 10:24 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel

At 11:53 AM 8/19/2003 +1000, Nick Piggin wrote:
>Hi everyone,
>
>As per the latest trend these days, I've done some tinkering with
>the cpu scheduler. I have gone in the opposite direction of most
>of the recent stuff and come out with something that can be nearly
>as good interactivity-wise (for me).
>
>I haven't run many tests on it - my mind blanked when I tried to
>remember the scores of scheduler "exploits" thrown around. So if
>anyone would like to suggest some, or better still, run some,
>please do so. And be nice, this isn't my type of scheduler :P

Ok, I took it out for a quick spin...

Test-starve.c starvation is back (curable via other means), but irman2 is 
utterly harmless.  Responsiveness under load is very nice until I get to 
the "very hefty" end of the spectrum (expected).  Throughput is down a bit 
at make -j30, and there are many cc1's running at very high priority once 
swap becomes moderately busy.  OTOH, concurrency for the make -jN in 
general appears to be up a bit.  X is pretty choppy when moving windows 
around, but that _appears_ to be the newer/tamer backboost bleeding a 
kdeinit thread a bit too dry.  (I think it'll be easy to correct, will let 
you know if what I have in mind to test that theory works out).  Ending on 
a decidedly positive note, I can no longer reproduce priority inversion 
troubles with xmms's gl thread, nor with blender.

(/me wonders what the reports from wine/game folks will be like)

         -Mike




* Re: [CFT][PATCH] new scheduler policy
  2003-08-19 10:24 ` Mike Galbraith
@ 2003-08-19 13:40   ` Nick Piggin
  2003-08-19 18:49     ` Mike Galbraith
  2003-08-20  2:13   ` William Lee Irwin III
  1 sibling, 1 reply; 23+ messages in thread
From: Nick Piggin @ 2003-08-19 13:40 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-kernel



Mike Galbraith wrote:

> At 11:53 AM 8/19/2003 +1000, Nick Piggin wrote:
>
>> Hi everyone,
>>
>> As per the latest trend these days, I've done some tinkering with
>> the cpu scheduler. I have gone in the opposite direction of most
>> of the recent stuff and come out with something that can be nearly
>> as good interactivity-wise (for me).
>>
>> I haven't run many tests on it - my mind blanked when I tried to
>> remember the scores of scheduler "exploits" thrown around. So if
>> anyone would like to suggest some, or better still, run some,
>> please do so. And be nice, this isn't my type of scheduler :P
>
>
> Ok, I took it out for a quick spin...


Thanks again.

>
> Test-starve.c starvation is back (curable via other means), but irman2 
> is utterly harmless.  Responsiveness under load is very nice until I 
> get to the "very hefty" end of the spectrum (expected).  Throughput is 
> down a bit at make -j30, and there are many cc1's running at very high 
> priority once swap becomes moderately busy.  OTOH, concurrency for the 
> make -jN in general appears to be up a bit.  X is pretty choppy when 
> moving windows around, but that _appears_ to be the newer/tamer 
> backboost bleeding a kdeinit thread a bit too dry.  (I think it'll be 
> easy to correct, will let you know if what I have in mind to test that 
> theory works out).  Ending on a decidedly positive note, I can no 
> longer reproduce priority inversion troubles with xmms's gl thread, 
> nor with blender.


Well, it sounds like a good start, though I'll have to get up to scratch
on the array of scheduler badness programs!

I expect throughput to be down in this release due to the timeslice thing.
This should be fixable.

I think either there is a bug in my accounting somewhere or I have not quite
thought it through properly, because priorities don't seem to get distributed
well. Also it's not using the nanosecond timing stuff (yet). This might help
a bit.

Nick


* Re: [CFT][PATCH] new scheduler policy
  2003-08-19 13:40   ` Nick Piggin
@ 2003-08-19 18:49     ` Mike Galbraith
  0 siblings, 0 replies; 23+ messages in thread
From: Mike Galbraith @ 2003-08-19 18:49 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel

At 11:40 PM 8/19/2003 +1000, Nick Piggin wrote:


>Mike Galbraith wrote:
>
>>At 11:53 AM 8/19/2003 +1000, Nick Piggin wrote:
>>
>>>Hi everyone,
>>>
>>>As per the latest trend these days, I've done some tinkering with
>>>the cpu scheduler. I have gone in the opposite direction of most
>>>of the recent stuff and come out with something that can be nearly
>>>as good interactivity-wise (for me).
>>>
>>>I haven't run many tests on it - my mind blanked when I tried to
>>>remember the scores of scheduler "exploits" thrown around. So if
>>>anyone would like to suggest some, or better still, run some,
>>>please do so. And be nice, this isn't my type of scheduler :P
>>
>>
>>Ok, I took it out for a quick spin...
>
>
>Thanks again.
>
>>
>>Test-starve.c starvation is back (curable via other means), but irman2 is 
>>utterly harmless.  Responsiveness under load is very nice until I get to 
>>the "very hefty" end of the spectrum (expected).  Throughput is down a 
>>bit at make -j30, and there are many cc1's running at very high priority 
>>once swap becomes moderately busy.  OTOH, concurrency for the make -jN in 
>>general appears to be up a bit.  X is pretty choppy when moving windows 
>>around, but that _appears_ to be the newer/tamer backboost bleeding a 
>>kdeinit thread a bit too dry.  (I think it'll be easy to correct, will 
>>let you know if what I have in mind to test that theory works 
>>out).  Ending on a decidedly positive note, I can no longer reproduce 
>>priority inversion troubles with xmms's gl thread, nor with blender.
>
>
>Well, it sounds like a good start, though I'll have to get up to scratch
>on the array of scheduler badness programs!

(looks like a fine start to me.  my box [and subjective driver] give it a 
one thumb up plus change;)

>I expect throughput to be down in this release due to the timeslice thing.
>This should be fixable.
>
>I think either there is a bug in my accounting somewhere or I have not quite
>thought it through properly, because priorities don't seem to get distributed
>well. Also it's not using the nanosecond timing stuff (yet). This might help
>a bit.

Hmm.  I watched priority distribution (eyeballs, not instrumentation), and 
it looked "right" to me until the load reached "fairly hefty"... at the 
point where swap really became a factor, distribution flattened, and the 
mean priority rose (high).  I did see some odd-ball high priority cc1's at
the low to moderate end (historically an indicator of trouble here), but not
much.  At what I call moderate load, it behaved well, and looked/felt good.

         -Mike 


* Re: [CFT][PATCH] new scheduler policy
  2003-08-19 10:24 ` Mike Galbraith
  2003-08-19 13:40   ` Nick Piggin
@ 2003-08-20  2:13   ` William Lee Irwin III
  2003-08-25 13:47     ` Haoqiang Zheng
  1 sibling, 1 reply; 23+ messages in thread
From: William Lee Irwin III @ 2003-08-20  2:13 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: hzheng, Nick Piggin, linux-kernel

On Tue, Aug 19, 2003 at 12:24:17PM +0200, Mike Galbraith wrote:
> Test-starve.c starvation is back (curable via other means), but irman2 is 
> utterly harmless.  Responsiveness under load is very nice until I get to 
> the "very hefty" end of the spectrum (expected).  Throughput is down a bit 
> at make -j30, and there are many cc1's running at very high priority once 
> swap becomes moderately busy.  OTOH, concurrency for the make -jN in 
> general appears to be up a bit.  X is pretty choppy when moving windows 
> around, but that _appears_ to be the newer/tamer backboost bleeding a 
> kdeinit thread a bit too dry.  (I think it'll be easy to correct, will let 
> you know if what I have in mind to test that theory works out).  Ending on 
> a decidedly positive note, I can no longer reproduce priority inversion 
> troubles with xmms's gl thread, nor with blender.
> (/me wonders what the reports from wine/game folks will be like)

Someone else appears to have done some work on the X priority inversion
issue who I'd like to drag into this discussion, though there doesn't
really appear to be an opportune time.

Haoqiang, any chance you could describe your solutions to the X priority
inversion issue?


-- wli

* Re: [CFT][PATCH] new scheduler policy
  2003-08-19  1:53 [CFT][PATCH] new scheduler policy Nick Piggin
  2003-08-19  2:35 ` William Lee Irwin III
  2003-08-19 10:24 ` Mike Galbraith
@ 2003-08-22  8:55 ` Roger Luethi
  2003-08-22 13:08   ` Nick Piggin
  2 siblings, 1 reply; 23+ messages in thread
From: Roger Luethi @ 2003-08-22  8:55 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel

On Tue, 19 Aug 2003 11:53:01 +1000, Nick Piggin wrote:
> I haven't run many tests on it - my mind blanked when I tried to
> remember the scores of scheduler "exploits" thrown around. So if
> anyone would like to suggest some, or better still, run some,
> please do so. And be nice, this isn't my type of scheduler :P

I timed a pathological benchmark from hell I've been playing with lately.
Three consecutive runs following a fresh boot. Time is in seconds:

2.4.21			821	21	25
2.6.0-test3-mm1		724	946	896
2.6.0-test3-mm1-nick	905	987	997

Runtime with ideal scheduling: < 2 seconds (we're thrashing).

If anybody has thrashing test cases closer to the real world, I'd be very
interested to learn about them.

Roger

* Re: [CFT][PATCH] new scheduler policy
  2003-08-22  8:55 ` Roger Luethi
@ 2003-08-22 13:08   ` Nick Piggin
  2003-08-22 15:11     ` Roger Luethi
  0 siblings, 1 reply; 23+ messages in thread
From: Nick Piggin @ 2003-08-22 13:08 UTC (permalink / raw)
  To: Roger Luethi; +Cc: linux-kernel



Roger Luethi wrote:

>On Tue, 19 Aug 2003 11:53:01 +1000, Nick Piggin wrote:
>  
>
>>I haven't run many tests on it - my mind blanked when I tried to
>>remember the scores of scheduler "exploits" thrown around. So if
>>anyone would like to suggest some, or better still, run some,
>>please do so. And be nice, this isn't my type of scheduler :P
>>    
>>
>
>I timed a pathological benchmark from hell I've been playing with lately.
>Three consecutive runs following a fresh boot. Time is in seconds:
>
>2.4.21			821	21	25
>2.6.0-test3-mm1		724	946	896
>2.6.0-test3-mm1-nick	905	987	997
>
>Runtime with ideal scheduling: < 2 seconds (we're thrashing).
>  
>

Cool. Can you post the benchmark source please?



* Re: [CFT][PATCH] new scheduler policy
  2003-08-22 13:08   ` Nick Piggin
@ 2003-08-22 15:11     ` Roger Luethi
  2003-08-23  0:22       ` Nick Piggin
  0 siblings, 1 reply; 23+ messages in thread
From: Roger Luethi @ 2003-08-22 15:11 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel

On Fri, 22 Aug 2003 23:08:40 +1000, Nick Piggin wrote:
> >I timed a pathological benchmark from hell I've been playing with lately.
> >Three consecutive runs following a fresh boot. Time is in seconds:
> >
> >2.4.21			821	21	25
> >2.6.0-test3-mm1		724	946	896
> >2.6.0-test3-mm1-nick	905	987	997
> >
> >Runtime with ideal scheduling: < 2 seconds (we're thrashing).
> >
> 
> Cool. Can you post the benchmark source please?

http://hellgate.ch/code/ploc/thrash.c

A parallel kernel build can generate some decent thrashing, too, but I
wanted a short and simple test case that conveniently provides the
information I need for both the logging daemon and the post-processing tool.

Note: The benchmark could trivially be made more evil which would prevent
2.4.21 from finishing over 30 times faster (as it often does). I
intentionally left it the way it is.

While everybody seems to be working on interactivity, I am currently
looking at this corner case. This should be pretty much orthogonal to your
own work.

Roger

* Re: [CFT][PATCH] new scheduler policy
  2003-08-22 15:11     ` Roger Luethi
@ 2003-08-23  0:22       ` Nick Piggin
  0 siblings, 0 replies; 23+ messages in thread
From: Nick Piggin @ 2003-08-23  0:22 UTC (permalink / raw)
  To: Roger Luethi; +Cc: linux-kernel



Roger Luethi wrote:

>On Fri, 22 Aug 2003 23:08:40 +1000, Nick Piggin wrote:
>
>>>I timed a pathological benchmark from hell I've been playing with lately.
>>>Three consecutive runs following a fresh boot. Time is in seconds:
>>>
>>>2.4.21			821	21	25
>>>2.6.0-test3-mm1		724	946	896
>>>2.6.0-test3-mm1-nick	905	987	997
>>>
>>>Runtime with ideal scheduling: < 2 seconds (we're thrashing).
>>>
>>>
>>Cool. Can you post the benchmark source please?
>>
>
>http://hellgate.ch/code/ploc/thrash.c
>
>A parallel kernel build can generate some decent thrashing, too, but I
>wanted a short and simple test case that conveniently provides the
>information I need for both the logging daemon and the post-processing tool.
>
>Note: The benchmark could trivially be made more evil which would prevent
>2.4.21 from finishing over 30 times faster (as it often does). I
>intentionally left it the way it is.
>
>While everybody seems to be working on interactivity, I am currently
>looking at this corner case. This should be pretty much orthogonal to your
>own work.
>

Yes, improvements for this problem are usually in the form of a
secondary scheduler of sorts somewhere in the VM. Hard problem.



* Re: [CFT][PATCH] new scheduler policy
  2003-08-20  2:13   ` William Lee Irwin III
@ 2003-08-25 13:47     ` Haoqiang Zheng
  2003-08-25 14:03       ` Nick Piggin
  0 siblings, 1 reply; 23+ messages in thread
From: Haoqiang Zheng @ 2003-08-25 13:47 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Mike Galbraith, Nick Piggin, linux-kernel

William Lee Irwin III wrote:

>On Tue, Aug 19, 2003 at 12:24:17PM +0200, Mike Galbraith wrote:
>  
>
>>Test-starve.c starvation is back (curable via other means), but irman2 is 
>>utterly harmless.  Responsiveness under load is very nice until I get to 
>>the "very hefty" end of the spectrum (expected).  Throughput is down a bit 
>>at make -j30, and there are many cc1's running at very high priority once 
>>swap becomes moderately busy.  OTOH, concurrency for the make -jN in 
>>general appears to be up a bit.  X is pretty choppy when moving windows 
>>around, but that _appears_ to be the newer/tamer backboost bleeding a 
>>kdeinit thread a bit too dry.  (I think it'll be easy to correct, will let 
>>you know if what I have in mind to test that theory works out).  Ending on 
>>a decidedly positive note, I can no longer reproduce priority inversion 
>>troubles with xmms's gl thread, nor with blender.
>>(/me wonders what the reports from wine/game folks will be like)
>>    
>>
>
>Someone else appears to have done some work on the X priority inversion
>issue who I'd like to drag into this discussion, though there doesn't
>really appear to be an opportune time.
>
>Haoqiang, any chance you could describe your solutions to the X priority
>inversion issue?
>
>
>-- wli
>  
>
I didn't follow the whole discussion. But from what wli has described to
me, the problem (xmms skips frames) looks pretty much like an X scheduler
problem.

X server works like this:
 "The X server uses select(2) to detect clients with pending input. Once 
the set of clients with pending input is determined, the X server starts 
executing requests from the client with the smallest file descriptor. 
Each client has a buffer which is used to read some data from the 
network connection, that buffer can be resized to hold unusually large 
requests, but is typically 4KB. Requests are executed from each client 
until either the buffer is exhausted of complete requests or after ten 
requests. After requests are read from all of the ready clients, the 
server determines whether any clients still have complete requests in 
their buffers. If so, the server foregoes the select(2) call and goes 
back to processing requests for those clients. When all client input 
buffers are exhausted of complete requests, the X server returns to 
select(2) to await additional data. "
--- Keith Packard, "Efficiently Scheduling {X} Clients",  FREENIX-00,
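
In sketch form (invented helper names, just restating the description
above):

	for (;;) {
		wait_for_client_input();		/* select(2) */
		/* ready clients are visited in fd order */
		for_each_ready_client(c)
			execute_requests(c, 10);	/* at most 10 per round */
		/* select(2) is skipped while complete requests remain */
		while (any_client_has_buffered_requests())
			for_each_such_client(c)
				execute_requests(c, 10);
	}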

Basically, the X server does a round robin over all the clients with
pending input.  It's not surprising that xmms skips frames when there are
a lot of "heavy" X requests pending.  I am not sure if this is the cause of
the problem that you guys are talking about.  But anyway, if it is the
cause, here is my 2 cents:

I think the scheduler of the X server has to be "smarter". It has to know
which X client is more "important" and give the important client a high
priority; otherwise the priority inversion problem will be
unavoidable.  Suppose the system can provide something like
"get_most_important_client()"; the X server can then be fixed this way:
the X server calls get_most_important_client() before it starts to
handle an X request. If the return is not NULL, it handles the request
from this "important" client.  This way, an "important" X client only
needs to wait a maximum of a single X request (instead of an unlimited
number of X requests) to get served.

The problem now is: how can we decide which X client is the most
important?  Well, I guess there are a lot of solutions. I have a
kernel-based solution to this question.  The basic idea is: keep the
processes blocked by the X server in the runqueue. If a certain process
(P) of this kind is scheduled, the kernel switches to the X server
instead. If the X server gets scheduled in this way, it can handle the X
requests from this very process (P). If you're interested, you can take
a look at http://www.ncl.cs.columbia.edu/publications/cucs-005-03.pdf .
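
Very roughly, the mechanism looks like this (just a sketch of the idea,
with an invented blocked_on field; the real implementation is described
in the paper):

	/* In schedule(): a client blocked on the X server stays
	 * runnable. If it wins the runqueue arbitration, donate the
	 * CPU to the server so it can process this client's requests. */
	if (next->blocked_on != NULL)
		next = next->blocked_on;	/* run the X server instead */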

Let me know your comments...


* Re: [CFT][PATCH] new scheduler policy
  2003-08-25 13:47     ` Haoqiang Zheng
@ 2003-08-25 14:03       ` Nick Piggin
  2003-08-25 15:11         ` Haoqiang Zheng
  0 siblings, 1 reply; 23+ messages in thread
From: Nick Piggin @ 2003-08-25 14:03 UTC (permalink / raw)
  To: Haoqiang Zheng; +Cc: William Lee Irwin III, Mike Galbraith, linux-kernel



Haoqiang Zheng wrote:

> William Lee Irwin III wrote:
>
>> On Tue, Aug 19, 2003 at 12:24:17PM +0200, Mike Galbraith wrote:
>>  
>>
>>> Test-starve.c starvation is back (curable via other means), but 
>>> irman2 is utterly harmless.  Responsiveness under load is very nice 
>>> until I get to the "very hefty" end of the spectrum (expected).  
>>> Throughput is down a bit at make -j30, and there are many cc1's 
>>> running at very high priority once swap becomes moderately busy.  
>>> OTOH, concurrency for the make -jN in general appears to be up a 
>>> bit.  X is pretty choppy when moving windows around, but that 
>>> _appears_ to be the newer/tamer backboost bleeding a kdeinit thread 
>>> a bit too dry.  (I think it'll be easy to correct, will let you know 
>>> if what I have in mind to test that theory works out).  Ending on a 
>>> decidedly positive note, I can no longer reproduce priority 
>>> inversion troubles with xmms's gl thread, nor with blender.
>>> (/me wonders what the reports from wine/game folks will be like)
>>>   
>>
>>
>> Someone else appears to have done some work on the X priority inversion
>> issue who I'd like to drag into this discussion, though there doesn't
>> really appear to be an opportune time.
>>
>> Haoqiang, any chance you could describe your solutions to the X priority
>> inversion issue?
>>
>>
>> -- wli
>>  
>>
> I didn't follow the whole discussion. But from what wli has described
> to me, the problem (xmms skips frames) looks pretty much like an X
> scheduler problem.
>
> The X server works like this:
> "The X server uses select(2) to detect clients with pending input. 
> Once the set of clients with pending input is determined, the X server 
> starts executing requests from the client with the smallest file 
> descriptor. Each client has a buffer which is used to read some data 
> from the network connection, that buffer can be resized to hold 
> unusually large requests, but is typically 4KB. Requests are executed 
> from each client until either the buffer is exhausted of complete 
> requests or after ten requests. After requests are read from all of 
> the ready clients, the server determines whether any clients still 
> have complete requests in their buffers. If so, the server foregoes 
> the select(2) call and goes back to processing requests for those 
> clients. When all client input buffers are exhausted of complete 
> requests, the X server returns to select(2) to await additional data. "
> --- Keith Packard, "Efficiently Scheduling {X} Clients",  FREENIX-00,
>
> Basically, the X server does a round robin over all the clients with
> pending input.  It's not surprising that xmms skips frames when there
> are a lot of "heavy" X requests pending.  I am not sure if this is the
> cause of the problem that you guys are talking about.  But anyway, if
> it is the cause, here is my 2 cents:
>
> I think the scheduler of the X server has to be "smarter". It has to
> know which X client is more "important" and give the important client
> a high priority; otherwise the priority inversion problem will be
> unavoidable.  Suppose the system can provide something like
> "get_most_important_client()"; the X server can then be fixed this way:
> the X server calls get_most_important_client() before it starts to
> handle an X request. If the return is not NULL, it handles the request
> from this "important" client.  This way, an "important" X client only
> needs to wait a maximum of a single X request (instead of an unlimited
> number of X requests) to get served.
>
> The problem now is: how can we decide which X client is the most
> important?  Well, I guess there are a lot of solutions. I have a
> kernel-based solution to this question.  The basic idea is: keep the
> processes blocked by the X server in the runqueue. If a certain process
> (P) of this kind is scheduled, the kernel switches to the X server
> instead. If the X server gets scheduled in this way, it can handle the
> X requests from this very process (P). If you're interested, you can
> take a look at
> http://www.ncl.cs.columbia.edu/publications/cucs-005-03.pdf .
>
> Let me know your comments...



Very interesting. I think X could be smarter about scheduling, perhaps
quite easily, by maintaining a bit more state, say a simple dynamic
priority scheme, just to keep heavy users from flooding out the
occasional users. But AFAIK, the X club is pretty exclusive, and you
would need an inside contact to get anything done.
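
Something along these lines maybe (a rough sketch only; none of these
names come from the real X server code):

/*
 * Hypothetical sketch: charge each client for the requests it has had
 * executed, and always service the pending client that has done the
 * least recent work, so light users jump ahead of heavy ones.
 */
struct client_sched {
	int pending;		/* has complete requests buffered */
	long work_done;		/* requests executed recently */
};

static int pick_client(struct client_sched *c, int nr_clients)
{
	int i, best = -1;

	for (i = 0; i < nr_clients; i++) {
		if (!c[i].pending)
			continue;
		if (best < 0 || c[i].work_done < c[best].work_done)
			best = i;
	}
	return best;	/* -1 if nothing is pending */
}

The dispatch loop would execute a few requests from whatever client
pick_client() returns, bump its work_done, and decay everyone's
work_done periodically so that history fades.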

There are still regressions in the CPU scheduler though.

Your last point didn't make sense to me. A client still needs to get CPU
time. I guess I should look at the paper. Having to teach the scheduler
about X doesn't sit well with me. I think the best you could hope for there
_might_ be a config option _if_ you could show some significant
improvements not attainable by modifying either X or the kernel in a more
generic manner.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [CFT][PATCH] new scheduler policy
  2003-08-25 14:03       ` Nick Piggin
@ 2003-08-25 15:11         ` Haoqiang Zheng
  2003-09-02 14:25           ` Pavel Machek
  0 siblings, 1 reply; 23+ messages in thread
From: Haoqiang Zheng @ 2003-08-25 15:11 UTC (permalink / raw)
  To: Nick Piggin; +Cc: William Lee Irwin III, Mike Galbraith, linux-kernel

Nick Piggin wrote:

>
>
> Haoqiang Zheng wrote:
>
>> William Lee Irwin III wrote:
>>
>>> On Tue, Aug 19, 2003 at 12:24:17PM +0200, Mike Galbraith wrote:
>>>  
>>>
>>>> Test-starve.c starvation is back (curable via other means), but 
>>>> irman2 is utterly harmless.  Responsiveness under load is very nice 
>>>> until I get to the "very hefty" end of the spectrum (expected).  
>>>> Throughput is down a bit at make -j30, and there are many cc1's 
>>>> running at very high priority once swap becomes moderately busy.  
>>>> OTOH, concurrency for the make -jN in general appears to be up a 
>>>> bit.  X is pretty choppy when moving windows around, but that 
>>>> _appears_ to be the newer/tamer backboost bleeding a kdeinit thread 
>>>> a bit too dry.  (I think it'll be easy to correct, will let you 
>>>> know if what I have in mind to test that theory works out).  Ending 
>>>> on a decidedly positive note, I can no longer reproduce priority 
>>>> inversion troubles with xmms's gl thread, nor with blender.
>>>> (/me wonders what the reports from wine/game folks will be like)
>>>>   
>>>
>>>
>>>
>>> Someone else appears to have done some work on the X priority inversion
>>> issue who I'd like to drag into this discussion, though there doesn't
>>> really appear to be an opportune time.
>>>
>>> Haoqiang, any chance you could describe your solutions to the X 
>>> priority
>>> inversion issue?
>>>
>>>
>>> -- wli
>>>  
>>>
>> I didn't follow the whole discussion. But from what wli has described 
>> to me, the problem (xmms skips frames) is pretty much like an X 
>> scheduler problem.
>>
>> X server works like this:
>> "The X server uses select(2) to detect clients with pending input. 
>> Once the set of clients with pending input is determined, the X 
>> server starts executing requests from the client with the smallest 
>> file descriptor. Each client has a buffer which is used to read some 
>> data from the network connection, that buffer can be resized to hold 
>> unusually large requests, but is typically 4KB. Requests are executed 
>> from each client until either the buffer is exhausted of complete 
>> requests or after ten requests. After requests are read from all of 
>> the ready clients, the server determines whether any clients still 
>> have complete requests in their buffers. If so, the server foregoes 
>> the select(2) call and goes back to processing requests for those 
>> clients. When all client input buffers are exhausted of complete 
>> requests, the X server returns to select(2) to await additional data. "
>> --- Keith Packard, "Efficiently Scheduling {X} Clients",  FREENIX-00,
>>
>> Basically, the X server does a round robin over all the clients with 
>> pending input.  It's not surprising that xmms skips frames when there 
>> are a lot of "heavy" X requests pending.  I am not sure if this is the 
>> cause of the problem that you guys are talking about.  But anyway, if 
>> this is the cause, here is my 2 cents:
>>
>> I think the scheduler of the X server has to be "smarter". It has to 
>> know which X client is more "important" and give the important client 
>> a high priority, otherwise the priority inversion problem will be 
>> unavoidable.  Suppose the system can provide something like 
>> "get_most_important_client()"; then the X server can be fixed this 
>> way: The X server calls get_most_important_client() before it starts 
>> to handle an X request. If the return value is not NULL, it handles 
>> the request from this "important" client.  This way, an "important" X 
>> client only needs to wait a maximum of a single X request (instead of 
>> an unlimited number of X requests) to get served.
>>
>> The problem now is: how can we decide which X client is the most 
>> important?  Well, I guess there are a lot of solutions. I have a 
>> kernel-based solution to this question.  The basic idea is: keep 
>> the processes blocked by the X server in the runqueue. If a certain 
>> process (P) of this kind is scheduled, the kernel switches to the X 
>> server instead. If the X server gets scheduled in this way, it can 
>> handle the X requests from this very process (P). If you are 
>> interested, you can take a look at 
>> http://www.ncl.cs.columbia.edu/publications/cucs-005-03.pdf .
>>
>> Let me know your comments...
>
>
>
>
> Very interesting. I think X could be smarter about scheduling, perhaps
> quite easily, by maintaining a bit more state, say a simple dynamic
> priority scheme, just to keep heavy users from flooding out the
> occasional users. But AFAIK, the X club is pretty exclusive, and you
> would need an inside contact to get anything done.
>
> There are still regressions in the CPU scheduler though.
>
> Your last point didn't make sense to me. A client still needs to get CPU
> time. I guess I should look at the paper. Having to teach the scheduler
> about X doesn't sit well with me. I think the best you could hope for 
> there
> _might_ be a config option _if_ you could show some significant
> improvements not attainable by modifying either X or the kernel in a more
> generic manner.
>
Yes, this is exactly what Keith Packard did in this paper: 
http://keithp.com/~keithp/talks/usenix2000/smart.html .  The X scheduler 
is certainly "smarter", giving a higher priority to more interactive X 
clients. But I think having the X server itself guess the importance of 
a client is flawed, because the X server doesn't have the whole picture 
of the system. For example, it doesn't know anything about the "nice" 
value of a process.  I think the kernel is in the best position to 
decide which process is more important. That's why I proposed a 
kernel-based approach.
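
With such a facility, the fix on the X side would be tiny. Something
like this (a hypothetical sketch; get_most_important_client() and the
helpers here are assumed interfaces, not existing ones):

/* hypothetical sketch; none of these interfaces exist as such */
struct client;					/* an X client connection */

extern struct client *get_most_important_client(void);	/* kernel hint */
extern struct client *next_ready_client(void);	/* usual round robin */
extern void execute_one_request(struct client *c);

static void dispatch_once(void)
{
	struct client *c = get_most_important_client();

	if (c == NULL)
		c = next_ready_client();	/* no hint: fall back */
	if (c != NULL)
		execute_one_request(c);
}

An "important" client then waits for at most one request from another
client before being served.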

>
> Your last point didn't make sense to me. A client still needs to get CPU
> time. I guess I should look at the paper. Having to teach the scheduler
> about X doesn't sit well with me. I think the best you could hope for 
> there
> _might_ be a config option _if_ you could show some significant
> improvements not attainable by modifying either X or the kernel in a more
> generic manner.
>
My paper is about solving the priority inversion problem in general; it 
is not specific to the X server. But it can be applied in this case.

-- Haoqiang


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [CFT][PATCH] new scheduler policy
  2003-08-25 15:11         ` Haoqiang Zheng
@ 2003-09-02 14:25           ` Pavel Machek
  2003-09-08 13:40             ` Alan Cox
  0 siblings, 1 reply; 23+ messages in thread
From: Pavel Machek @ 2003-09-02 14:25 UTC (permalink / raw)
  To: Haoqiang Zheng
  Cc: Nick Piggin, William Lee Irwin III, Mike Galbraith, linux-kernel

Hi!

> >about X doesn't sit well with me. I think the best you could hope 
> >for there
> >_might_ be a config option _if_ you could show some significant
> >improvements not attainable by modifying either X or the kernel in a 
> >more
> >generic manner.
> >
> Yes, this is exactly what Keith Packard did in this paper: 
> http://keithp.com/~keithp/talks/usenix2000/smart.html .  The X 
> scheduler is certainly "smarter", giving a higher priority to more 
> interactive X clients. But I think having the X server itself guess 
> the importance of a client is flawed, because the X server doesn't 
> have the whole picture of the system. For example, it doesn't know 
> anything about the "nice" value of a process.  I think the kernel is 
> in the best position to decide which process is more important. 
> That's why I proposed a kernel-based approach.

Tasks can easily report their interactivity needs/nice value.
X already depends on clients not trying to screw each other,
so that's okay.
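
Something like this on the client side, say (a sketch only;
getpriority() is real, the hint request itself is made up):

#include <sys/resource.h>
#include <errno.h>

/* stand-in for however the hint actually reaches the server */
extern int send_priority_hint(int nice_value);

int report_my_priority(void)
{
	int prio;

	errno = 0;
	prio = getpriority(PRIO_PROCESS, 0);	/* our nice value */
	if (prio == -1 && errno != 0)
		return -1;			/* getpriority() failed */
	return send_priority_hint(prio);
}
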
-- 
				Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [CFT][PATCH] new scheduler policy
  2003-09-02 14:25           ` Pavel Machek
@ 2003-09-08 13:40             ` Alan Cox
  2003-09-08 14:20               ` Haoqiang Zheng
  0 siblings, 1 reply; 23+ messages in thread
From: Alan Cox @ 2003-09-08 13:40 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Haoqiang Zheng, Nick Piggin, William Lee Irwin III,
	Mike Galbraith, Linux Kernel Mailing List

On Maw, 2003-09-02 at 15:25, Pavel Machek wrote:
> > in the best position to decide which process is more important. 
> > That's why I proposed a kernel-based approach.
> 
> Tasks can easily report their interactivity needs/nice value.
> X already depends on clients not trying to screw each other,
> so that's okay.

There is a slight problem with a kernel based approach too - the app may
be remote. 

Alan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [CFT][PATCH] new scheduler policy
  2003-09-08 13:40             ` Alan Cox
@ 2003-09-08 14:20               ` Haoqiang Zheng
  2003-09-08 14:28                 ` Alan Cox
  0 siblings, 1 reply; 23+ messages in thread
From: Haoqiang Zheng @ 2003-09-08 14:20 UTC (permalink / raw)
  To: Alan Cox
  Cc: Pavel Machek, Nick Piggin, William Lee Irwin III, Mike Galbraith,
	Linux Kernel Mailing List

How do you define "priority inversion" if the app is remote?

Alan Cox wrote:

>On Maw, 2003-09-02 at 15:25, Pavel Machek wrote:
>  
>
>>>in the best position to decide which process is more important. 
>>>That's why I proposed a kernel-based approach.
>>>      
>>>
>>Tasks can easily report their interactivity needs/nice value.
>>X already depends on clients not trying to screw each other,
>>so that's okay.
>>    
>>
>
>There is a slight problem with a kernel based approach too - the app may
>be remote. 
>
>Alan
>  
>



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [CFT][PATCH] new scheduler policy
  2003-09-08 14:20               ` Haoqiang Zheng
@ 2003-09-08 14:28                 ` Alan Cox
  2003-09-08 15:10                   ` Haoqiang Zheng
  0 siblings, 1 reply; 23+ messages in thread
From: Alan Cox @ 2003-09-08 14:28 UTC (permalink / raw)
  To: Haoqiang Zheng
  Cc: Pavel Machek, Nick Piggin, William Lee Irwin III, Mike Galbraith,
	Linux Kernel Mailing List

On Llu, 2003-09-08 at 15:20, Haoqiang Zheng wrote:
> How do you define "priority inversion" if the app is remote?

You have to know the dependencies for the entire system; it's nearly
impossible to do. Once you have the apps also waiting for each other and
for direct communications (e.g. via a database or a shared service), life
gets fun. 

For local apps, one thing that has been suggested, and that some
microkernels have played with, is a syscall that basically says "send
this message and donate the rest of my timeslice to.."



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [CFT][PATCH] new scheduler policy
  2003-09-08 14:28                 ` Alan Cox
@ 2003-09-08 15:10                   ` Haoqiang Zheng
  0 siblings, 0 replies; 23+ messages in thread
From: Haoqiang Zheng @ 2003-09-08 15:10 UTC (permalink / raw)
  To: Alan Cox
  Cc: Pavel Machek, Nick Piggin, William Lee Irwin III, Mike Galbraith,
	Linux Kernel Mailing List

Alan Cox wrote:

>You have to know the dependencies for the entire system; it's nearly
>impossible to do. Once you have the apps also waiting for each other and
>for direct communications (e.g. via a database or a shared service), life
>gets fun. 
>  
>
If we assume the priority of a remote process is the same as its local 
priority (local p->prio), I think something can still be done.
Let's make up an example first.  Assume we have two machines A and B. P1 
runs on machine A while P2 and P3 run on machine B.  P2 is the database 
server.  In this case, I think the priority inversion problem can be 
solved iff:
1. P2 (the database server) handles requests according to the clients' 
priority.
2. P2 inherits the priority of its current client (the client with the 
highest priority), as in the sketch below.
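
Condition 2 could look roughly like this in the server (a sketch only;
setpriority() is real, the bookkeeping is made up, and a real server
would need the privilege to raise its own priority):

#include <sys/resource.h>

#define MAX_CLIENTS 64			/* arbitrary for the sketch */

static int client_nice[MAX_CLIENTS];	/* nice value reported per client */

static void inherit_highest_priority(int nr_clients)
{
	int i, best = 19;		/* weakest possible nice value */

	for (i = 0; i < nr_clients; i++)
		if (client_nice[i] < best)	/* lower nice = higher prio */
			best = client_nice[i];
	setpriority(PRIO_PROCESS, 0, best);	/* run as our best client */
}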

>For local apps, one thing that has been suggested, and that some
>microkernels have played with, is a syscall that basically says "send
>this message and donate the rest of my timeslice to.."
>
>  
>
In the syscall-based approach, the applications have to be rewritten, 
and the application developers have to know exactly where the timeslice 
should be donated. This is not usually feasible.  On the other hand, 
everything is done automatically and transparently in the approach I 
suggested.


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2003-09-08 15:10 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-08-19  1:53 [CFT][PATCH] new scheduler policy Nick Piggin
2003-08-19  2:35 ` William Lee Irwin III
2003-08-19  2:46   ` Nick Piggin
2003-08-19  2:59     ` Nick Piggin
2003-08-19  5:15       ` Matt Mackall
2003-08-19  5:34         ` Nick Piggin
2003-08-19  5:45           ` Matt Mackall
2003-08-19 10:24 ` Mike Galbraith
2003-08-19 13:40   ` Nick Piggin
2003-08-19 18:49     ` Mike Galbraith
2003-08-20  2:13   ` William Lee Irwin III
2003-08-25 13:47     ` Haoqiang Zheng
2003-08-25 14:03       ` Nick Piggin
2003-08-25 15:11         ` Haoqiang Zheng
2003-09-02 14:25           ` Pavel Machek
2003-09-08 13:40             ` Alan Cox
2003-09-08 14:20               ` Haoqiang Zheng
2003-09-08 14:28                 ` Alan Cox
2003-09-08 15:10                   ` Haoqiang Zheng
2003-08-22  8:55 ` Roger Luethi
2003-08-22 13:08   ` Nick Piggin
2003-08-22 15:11     ` Roger Luethi
2003-08-23  0:22       ` Nick Piggin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).