* O(1) Scheduler (tuning problem/live-lock) @ 2002-09-06 18:44 Jim Houston 2002-09-30 16:10 ` Andrea Arcangeli 0 siblings, 1 reply; 5+ messages in thread From: Jim Houston @ 2002-09-06 18:44 UTC (permalink / raw) To: linux-kernel; +Cc: jim.houston The current O(1) scheduler heuristics for calculating sleep_avg and assigning process priorities allows a parent and a small group of compute bound child processes to live-lock the system. We found this problem running a stress test including the LTP test suite. In particular the waitpid06 test in the LTP triggered this problem. We are working with a 2.4.18 kernel with a backport of the O(1) scheduler, but this problem is present in Linux-2.5.32. How it happens. The waitpid06 test forks off 8 child processes. Each child enters an infinite loop waiting for a signal from the parent. Yes, it's a stupid test program. The parent (if it gets to run) immediately sends a signal to each child process and then does a wait() call for each child. The parent process spends all of its time in wait(). When a child exits, the parents sleep_avg is adjusted twice. In sched_exit(): parent->sleep_avg = ((3*parent->sleep_avg) + child->sleep_avg)/4; In activate_task(): p->sleep_avg += sleep_time; if (p->sleep_avg > MAX_SLEEP_AVG) p->sleep_avg = MAX_SLEEP_AVG; The child->sleep_avg is set initially to 95% of the parent->sleep_avg. The child->sleep_avg for the running child is decremented in scheduler_tick(). If you have fewer processors than child processes, child->sleep_avg will, on average, decrease less than 1 each tick. The effect is that the parent sleep_avg will approach MAX_SLEEP_AVG giving it and its children a favorable interactive priority. Since these processes are judged interactive they go back into the active array when they use up their time slice but still with a favorable priority and a new time quantum. The problem is easy to reproduce with the waitpid06 test. It provides options so that you can loop repeating the test and also run multiple copies at once. I have been using: waitpid06 -c 8 -i 10000 This runs 8 copies of the test (64 unruly child processes) and loops 10,000 times. I also run a top(1) and a: while true ; do date; sleep 1; done loop so I can tell if the system has locked up. This sometimes takes a few minutes. How do we fix this? I'm just getting started playing with the code. When I tried changing the EXIT_WEIGHT to 1, the problem still happened. I tried changing sched_exit to give the parent the minimum of the two sleep_avg values. This seems to fix the problem. I suspect that this is really a symptom of a larger problem, that the system can be over-commited with processes which are all judged interactive never putting processes in the expired array and so never triggering the EXPIRED_STARVING case. Please CC: me on answers/comments since I read the archives. Jim Houston - Concurrent Computer Corp. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: O(1) Scheduler (tuning problem/live-lock) 2002-09-06 18:44 O(1) Scheduler (tuning problem/live-lock) Jim Houston @ 2002-09-30 16:10 ` Andrea Arcangeli 2002-10-01 7:20 ` Jim Houston 0 siblings, 1 reply; 5+ messages in thread From: Andrea Arcangeli @ 2002-09-30 16:10 UTC (permalink / raw) To: Jim Houston; +Cc: linux-kernel, jim.houston, Ingo Molnar On Fri, Sep 06, 2002 at 02:44:15PM -0400, Jim Houston wrote: > > The current O(1) scheduler heuristics for calculating sleep_avg > and assigning process priorities allows a parent and a small group of > compute bound child processes to live-lock the system. > We found this problem running a stress test including the LTP > test suite. In particular the waitpid06 test in the LTP triggered > this problem. We are working with a 2.4.18 kernel with a backport of > the O(1) scheduler, but this problem is present in Linux-2.5.32. > > How it happens. > > The waitpid06 test forks off 8 child processes. Each child enters > an infinite loop waiting for a signal from the parent. Yes, it's > a stupid test program. The parent (if it gets to run) immediately sends > a signal to each child process and then does a wait() call for each child. > > The parent process spends all of its time in wait(). When a child > exits, the parents sleep_avg is adjusted twice. > > In sched_exit(): > parent->sleep_avg = ((3*parent->sleep_avg) + child->sleep_avg)/4; > > In activate_task(): > p->sleep_avg += sleep_time; > if (p->sleep_avg > MAX_SLEEP_AVG) > p->sleep_avg = MAX_SLEEP_AVG; > > The child->sleep_avg is set initially to 95% of the parent->sleep_avg. > The child->sleep_avg for the running child is decremented in > scheduler_tick(). If you have fewer processors than child processes, > child->sleep_avg will, on average, decrease less than 1 each tick. > > The effect is that the parent sleep_avg will approach MAX_SLEEP_AVG giving > it and its children a favorable interactive priority. > Since these processes are judged interactive they go back into the active > array when they use up their time slice but still with a favorable priority > and a new time quantum. > > The problem is easy to reproduce with the waitpid06 test. It provides > options so that you can loop repeating the test and also run multiple > copies at once. I have been using: > > waitpid06 -c 8 -i 10000 > > This runs 8 copies of the test (64 unruly child processes) and loops > 10,000 times. I also run a top(1) and a: > while true ; do date; sleep 1; done > loop so I can tell if the system has locked up. This sometimes takes > a few minutes. > > > How do we fix this? > > I'm just getting started playing with the code. When I tried changing the > EXIT_WEIGHT to 1, the problem still happened. I tried changing > sched_exit to give the parent the minimum of the two sleep_avg values. > This seems to fix the problem. I suspect that this is really a symptom > of a larger problem, that the system can be over-commited with processes > which are all judged interactive never putting processes in the expired > array and so never triggering the EXPIRED_STARVING case. > > Please CC: me on answers/comments since I read the archives. can you try this patch? I dropeed the starvation logic, it could reinsert into the active queue any interactive task. Problem is that as you say all these tasks will look as interactive just because they have to stay out of the cpu for long because there are many more tasks than cpus. there was an anti starvation logic as well that limits the effect of the starvation logic to 2 seconds, but there was a 2 second window for these tasks to be starved at every roll of the expired/active queues. Furthmore we are just unfair with the wakenup tasks by putting them always in the active queue, the non immediatly wakenup tasks shouldn't have a backdoor to go back into the active after the timeslice has expired. The child penalty as well was too high, we cannot predict if our child will be interactive or not. 50% sounds much better number. The sleep information tells us how much a task is interactive, and a child of an interactive task doesn't need to be interactive at all. It maybe, like in a gui starting a new browser or similar, but it may not. So a 50% looks good compromise, we just don't know. 50% also guaranetees that the dynamic prio goes down to its miniumum quickly under a fast fork/fork/fork cycle. Reclaiming the sleep information from child to parent was completely broken too. The parent has no relationship at all with the child in terms of interactivity. Giving 50% of sleep time to the child may be ok at the light of the past history of a common parent, but the future of the child has no relationship with the future of the parent. so the exit part had to be dropped. The child-run-first was as well not working, this is fixed now, by using the same prio of the parent at the first child-schedule, by putting the child into the array in fifo order (rather than the usual lifo) and by setting need_resched. Now that I think about it, it would be a bit better to put the child just in front of the parent, doing a list_add_tail on the parent rather than on the head of the queue, but it won't chance much the behaviour for your test. I will make this further modification shortly. In the meantime I'd be interested to know if this fixes your livelock problem. I cannot reproduce lockups here with your waitpid testcase (on a 4-way if that matters). Last but not the least the idle prio for secondary cpus was set to an out of bounds value that would randomly corrupt memory if anybody made any use of it (it wasn't the case before these modifications, this is why such bug is been apparently harmless so far). This is part of 2.4.20pre8aa1, porting to any other o1 based kernel should be trivial, if unsure you can try to test your workload on 2.4.20pre8aa1 and we'll know if you can still reproduce the livelock. thanks, --- 2.4.20pre7aa1/include/linux/sched_runqueue.h.~1~ Sat Sep 21 16:50:02 2002 +++ 2.4.20pre7aa1/include/linux/sched_runqueue.h Wed Sep 25 20:01:36 2002 @@ -57,7 +57,7 @@ struct prio_array { */ struct runqueue { spinlock_t lock; - unsigned long nr_running, nr_switches, expired_timestamp; + unsigned long nr_running, nr_switches; long quiescent; /* RCU */ task_t *curr, *idle; prio_array_t *active, *expired, arrays[2]; --- 2.4.20pre7aa1/kernel/sched.c.~1~ Fri Sep 20 12:43:34 2002 +++ 2.4.20pre7aa1/kernel/sched.c Wed Sep 25 20:02:48 2002 @@ -54,51 +54,10 @@ */ #define MIN_TIMESLICE ( 10 * HZ / 1000) #define MAX_TIMESLICE (300 * HZ / 1000) -#define CHILD_PENALTY 95 +#define CHILD_PENALTY 50 #define PARENT_PENALTY 100 -#define EXIT_WEIGHT 3 #define PRIO_BONUS_RATIO 25 -#define INTERACTIVE_DELTA 2 #define MAX_SLEEP_AVG (2*HZ) -#define STARVATION_LIMIT (2*HZ) - -/* - * If a task is 'interactive' then we reinsert it in the active - * array after it has expired its current timeslice. (it will not - * continue to run immediately, it will still roundrobin with - * other interactive tasks.) - * - * This part scales the interactivity limit depending on niceness. - * - * We scale it linearly, offset by the INTERACTIVE_DELTA delta. - * Here are a few examples of different nice levels: - * - * TASK_INTERACTIVE(-20): [1,1,1,1,1,1,1,1,1,0,0] - * TASK_INTERACTIVE(-10): [1,1,1,1,1,1,1,0,0,0,0] - * TASK_INTERACTIVE( 0): [1,1,1,1,0,0,0,0,0,0,0] - * TASK_INTERACTIVE( 10): [1,1,0,0,0,0,0,0,0,0,0] - * TASK_INTERACTIVE( 19): [0,0,0,0,0,0,0,0,0,0,0] - * - * (the X axis represents the possible -5 ... 0 ... +5 dynamic - * priority range a task can explore, a value of '1' means the - * task is rated interactive.) - * - * Ie. nice +19 tasks can never get 'interactive' enough to be - * reinserted into the active array. And only heavily CPU-hog nice -20 - * tasks will be expired. Default nice 0 tasks are somewhere between, - * it takes some effort for them to get interactive, but it's not - * too hard. - */ - -#define SCALE(v1,v1_max,v2_max) \ - (v1) * (v2_max) / (v1_max) - -#define DELTA(p) \ - (SCALE(TASK_NICE(p), 40, MAX_USER_PRIO*PRIO_BONUS_RATIO/100) + \ - INTERACTIVE_DELTA) - -#define TASK_INTERACTIVE(p) \ - ((p)->prio <= (p)->static_prio - DELTA(p)) /* * TASK_TIMESLICE scales user-nice values [ -20 ... 19 ] @@ -157,9 +116,13 @@ static inline void dequeue_task(struct t __clear_bit(p->prio, array->bitmap); } -static inline void enqueue_task(struct task_struct *p, prio_array_t *array) +#define enqueue_task(p, array) __enqueue_task(p, array, 0) +static inline void __enqueue_task(struct task_struct *p, prio_array_t *array, int forked_child) { - list_add_tail(&p->run_list, array->queue + p->prio); + if (!forked_child) + list_add_tail(&p->run_list, array->queue + p->prio); + else + list_add(&p->run_list, array->queue + p->prio); __set_bit(p->prio, array->bitmap); array->nr_active++; p->array = array; @@ -191,12 +154,14 @@ static inline int effective_prio(task_t return prio; } -static inline void activate_task(task_t *p, runqueue_t *rq) +#define activate_task(p, rq) __activate_task(p, rq, 0) +#define activate_forked_task(p, rq) __activate_task(p, rq, 1) +static inline void __activate_task(task_t *p, runqueue_t *rq, int forked_child) { unsigned long sleep_time = jiffies - p->sleep_timestamp; prio_array_t *array = rq->active; - if (!rt_task(p) && sleep_time) { + if (!forked_child && !rt_task(p) && sleep_time) { /* * This code gives a bonus to interactive tasks. We update * an 'average sleep time' value here, based on @@ -204,12 +169,13 @@ static inline void activate_task(task_t * the higher the average gets - and the higher the priority * boost gets as well. */ + p->sleep_timestamp = jiffies; p->sleep_avg += sleep_time; if (p->sleep_avg > MAX_SLEEP_AVG) p->sleep_avg = MAX_SLEEP_AVG; p->prio = effective_prio(p); } - enqueue_task(p, array); + __enqueue_task(p, array, forked_child); rq->nr_running++; } @@ -335,16 +301,35 @@ void wake_up_forked_process(task_t * p) p->state = TASK_RUNNING; if (!rt_task(p)) { /* - * We decrease the sleep average of forking parents - * and children as well, to keep max-interactive tasks + * We decrease the sleep average of forked + * children, to keep max-interactive tasks * from forking tasks that are max-interactive. + * CHILD_PENALTY is set to 50% since we have + * no clue if this is still an interactive + * task like the parent or if this will be a + * cpu bound task. The parent isn't touched + * as we don't make assumption about the parent + * changing behaviour after the child is forked. */ current->sleep_avg = current->sleep_avg * PARENT_PENALTY / 100; p->sleep_avg = p->sleep_avg * CHILD_PENALTY / 100; - p->prio = effective_prio(p); + + /* + * For its first schedule keep the child at the same + * priority (i.e. in the same list) of the parent, + * activate_forked_task() will take care to put the child + * in front of the parent (lifo) to guarantee a + * schedule-child-first behaviour after fork. + */ + p->prio == current->prio; + BUG_ON(p->prio < MAX_RT_PRIO); + BUG_ON(p->prio > MAX_PRIO - 1); + + /* reschedule the child while returning from fork() */ + set_tsk_need_resched(p); } p->cpu = smp_processor_id(); - activate_task(p, rq); + activate_forked_task(p, rq); spin_unlock_irq(&rq->lock); } @@ -367,12 +352,10 @@ void sched_exit(task_t * p) } __sti(); /* - * If the child was a (relative-) CPU hog then decrease - * the sleep_avg of the parent as well. + * Don't touch sleep_avg here, we cannot make any assumption + * that the parent will change runtime behaviour, in function + * of the historic runtime behaviour of the child. */ - if (p->sleep_avg < current->sleep_avg) - current->sleep_avg = (current->sleep_avg * EXIT_WEIGHT + - p->sleep_avg) / (EXIT_WEIGHT + 1); } #if CONFIG_SMP @@ -658,20 +641,6 @@ static inline void idle_tick(void) #endif /* - * We place interactive tasks back into the active array, if possible. - * - * To guarantee that this does not starve expired tasks we ignore the - * interactivity of a task if the first expired task had to wait more - * than a 'reasonable' amount of time. This deadline timeout is - * load-dependent, as the frequency of array switched decreases with - * increasing number of running tasks: - */ -#define EXPIRED_STARVING(rq) \ - ((rq)->expired_timestamp && \ - (jiffies - (rq)->expired_timestamp >= \ - STARVATION_LIMIT * ((rq)->nr_running) + 1)) - -/* * This function gets called by the timer code, with HZ frequency. * We call it with interrupts disabled. */ @@ -734,12 +703,7 @@ void scheduler_tick(int user_tick, int s p->time_slice = TASK_TIMESLICE(p); p->first_time_slice = 0; - if (!TASK_INTERACTIVE(p) || EXPIRED_STARVING(rq)) { - if (!rq->expired_timestamp) - rq->expired_timestamp = jiffies; - enqueue_task(p, rq->expired); - } else - enqueue_task(p, rq->active); + enqueue_task(p, rq->expired); } out: #if CONFIG_SMP @@ -794,7 +758,6 @@ pick_next_task: goto pick_next_task; #endif next = rq->idle; - rq->expired_timestamp = 0; goto switch_tasks; } @@ -806,7 +769,6 @@ pick_next_task: rq->active = rq->expired; rq->expired = array; array = rq->active; - rq->expired_timestamp = 0; } idx = sched_find_first_bit(array->bitmap); @@ -1049,10 +1011,9 @@ void set_user_nice(task_t *p, long nice) if (array) { enqueue_task(p, array); /* - * If the task is running and lowered its priority, - * or increased its priority then reschedule its CPU: + * If the task is running reschedule its CPU: */ - if ((NICE_TO_PRIO(nice) < p->static_prio) || (p == rq->curr)) + if (p == rq->curr) resched_task(rq->curr); } out_unlock: @@ -1608,7 +1569,7 @@ void __init init_idle(task_t *idle, int idle_rq->curr = idle_rq->idle = idle; deactivate_task(idle, rq); idle->array = NULL; - idle->prio = MAX_PRIO; + idle->prio = MAX_PRIO - 20; idle->state = TASK_RUNNING; idle->cpu = cpu; double_rq_unlock(idle_rq, rq); Andrea ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: O(1) Scheduler (tuning problem/live-lock) 2002-09-30 16:10 ` Andrea Arcangeli @ 2002-10-01 7:20 ` Jim Houston 2002-10-02 6:45 ` Andrea Arcangeli 0 siblings, 1 reply; 5+ messages in thread From: Jim Houston @ 2002-10-01 7:20 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Jim Houston, linux-kernel, Ingo Molnar Hi Andrea, Ingo, Andrea I tried your patch and it does solve the live-lock in the LTP waitpid06 test. The mouse movement gets a bit jerky but atleast it doesn't lock up. I guess the next question is how does it do on normal work loads? I like the idea of making the child processes start with a smaller sleep_avg value. Maybe it should just be a constant rather than a fraction of the parents sleep_avg? Its really the child processes inheriting the favorable sleep_avg that caused the problem with waitpid06. I liked the idea of giving interactive tasks special treatment. Andrea please don't remove this. Always putting processes (which have used up there time slice) into the rq->expired array makes all processes round robin at the same priority. It makes sense to do this to fail gracefully if the system is overloaded but not all the time. I hope this make sense. I'm falling asleep writing it:-) Jim Houston - Concurrent Computer Corp. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: O(1) Scheduler (tuning problem/live-lock) 2002-10-01 7:20 ` Jim Houston @ 2002-10-02 6:45 ` Andrea Arcangeli 2002-10-03 5:50 ` Jim Houston 0 siblings, 1 reply; 5+ messages in thread From: Andrea Arcangeli @ 2002-10-02 6:45 UTC (permalink / raw) To: Jim Houston; +Cc: Jim Houston, linux-kernel, Ingo Molnar On Tue, Oct 01, 2002 at 03:20:57AM -0400, Jim Houston wrote: > Hi Andrea, Ingo, > > Andrea I tried your patch and it does solve the live-lock > in the LTP waitpid06 test. The mouse movement gets a bit > jerky but atleast it doesn't lock up. it's probably because I screwed the wakeup logic here: if (p->prio < rq->curr->prio) resched_task(rq->curr); I didn't realize the idle task ->prio MAX_PRIO out of bounds value was a feature and not a bug. the last comment about the MAX_PRIO setting of idle tasks in the previous email was wrong too. > I guess the next question is how does it do on normal work loads? > > I like the idea of making the child processes start with a smaller > sleep_avg value. Maybe it should just be a constant rather than a > fraction of the parents sleep_avg? Its really the child processes > inheriting the favorable sleep_avg that caused the problem with > waitpid06. I think constant isn't as good as 50%, the history may be significant for the child (like in a GUI), and going down of 50% each fork should be enough to guarantee good fariness. > I liked the idea of giving interactive tasks special treatment. > Andrea please don't remove this. Always putting processes > (which have used up there time slice) into the rq->expired array > makes all processes round robin at the same priority. It makes > sense to do this to fail gracefully if the system is overloaded > but not all the time. yes, when the system isn't overloaded that makes sense, I was just thinking at the overloaded case where the expired tasks would be penalized for two more seconds. But maybe it doesn't matter much, as said I resurrected it so you can test. > I hope this make sense. I'm falling asleep writing it:-) ;) Here the new patch only slightly tested so far. This does the run-child-first right too (if the parent expired at the time of the fork() they will both wait for a reschedule and they will be run serially, that should be the best for throughput, certainly we can't put the parent back in the active queue so it worth to execute the child right before the parent to maximize possible cache effects) You can easily see the effect with this proggy: main() { if (fork()) printf("p\n"); else printf("c\n"); } Could you test this new one too? thanks, --- 2.4.20pre8aa2/kernel/sched.c.~1~ Wed Oct 2 00:25:58 2002 +++ 2.4.20pre8aa2/kernel/sched.c Wed Oct 2 06:27:28 2002 @@ -54,9 +54,8 @@ */ #define MIN_TIMESLICE ( 10 * HZ / 1000) #define MAX_TIMESLICE (300 * HZ / 1000) -#define CHILD_PENALTY 95 +#define CHILD_PENALTY 50 #define PARENT_PENALTY 100 -#define EXIT_WEIGHT 3 #define PRIO_BONUS_RATIO 25 #define INTERACTIVE_DELTA 2 #define MAX_SLEEP_AVG (2*HZ) @@ -157,12 +156,19 @@ static inline void dequeue_task(struct t __clear_bit(p->prio, array->bitmap); } -static inline void enqueue_task(struct task_struct *p, prio_array_t *array) +#define enqueue_task(p, array) __enqueue_task(p, array, NULL) +static inline void __enqueue_task(struct task_struct *p, prio_array_t *array, task_t * parent) { - list_add_tail(&p->run_list, array->queue + p->prio); - __set_bit(p->prio, array->bitmap); + if (!parent) { + list_add_tail(&p->run_list, array->queue + p->prio); + __set_bit(p->prio, array->bitmap); + p->array = array; + __set_bit(p->prio, array->bitmap); + } else { + list_add_tail(&p->run_list, &parent->run_list); + array = p->array = parent->array; + } array->nr_active++; - p->array = array; } static inline int effective_prio(task_t *p) @@ -191,12 +197,13 @@ static inline int effective_prio(task_t return prio; } -static inline void activate_task(task_t *p, runqueue_t *rq) +#define activate_task(p, rq) __activate_task(p, rq, NULL) +static inline void __activate_task(task_t *p, runqueue_t *rq, task_t * parent) { unsigned long sleep_time = jiffies - p->sleep_timestamp; prio_array_t *array = rq->active; - if (!rt_task(p) && sleep_time) { + if (!parent && !rt_task(p) && sleep_time) { /* * This code gives a bonus to interactive tasks. We update * an 'average sleep time' value here, based on @@ -204,12 +211,13 @@ static inline void activate_task(task_t * the higher the average gets - and the higher the priority * boost gets as well. */ + p->sleep_timestamp = jiffies; p->sleep_avg += sleep_time; if (p->sleep_avg > MAX_SLEEP_AVG) p->sleep_avg = MAX_SLEEP_AVG; p->prio = effective_prio(p); } - enqueue_task(p, array); + __enqueue_task(p, array, parent); rq->nr_running++; } @@ -328,23 +336,47 @@ int wake_up_process(task_t * p) void wake_up_forked_process(task_t * p) { runqueue_t *rq; + task_t * parent = current; rq = this_rq(); spin_lock_irq(&rq->lock); p->state = TASK_RUNNING; - if (!rt_task(p)) { + if (likely(!rt_task(p) && parent->array)) { /* - * We decrease the sleep average of forking parents - * and children as well, to keep max-interactive tasks + * We decrease the sleep average of forked + * children, to keep max-interactive tasks * from forking tasks that are max-interactive. + * CHILD_PENALTY is set to 50% since we have + * no clue if this is still an interactive + * task like the parent or if this will be a + * cpu bound task. The parent isn't touched + * as we don't make assumption about the parent + * changing behaviour after the child is forked. */ - current->sleep_avg = current->sleep_avg * PARENT_PENALTY / 100; + parent->sleep_avg = parent->sleep_avg * PARENT_PENALTY / 100; p->sleep_avg = p->sleep_avg * CHILD_PENALTY / 100; + + /* + * For its first schedule keep the child at the same + * priority (i.e. in the same list) of the parent, + * activate_forked_task() will take care to put the + * child in front of the parent (lifo) to guarantee a + * schedule-child-first behaviour after fork. + */ + p->prio = parent->prio; + set_tsk_need_resched(parent); + } else { + /* + * Take the usual wakeup path if it's RT or if + * it's a child of the first idle task (during boot + * only). + */ p->prio = effective_prio(p); + parent = NULL; } - p->cpu = smp_processor_id(); - activate_task(p, rq); + + __activate_task(p, rq, parent); spin_unlock_irq(&rq->lock); } @@ -366,13 +398,6 @@ void sched_exit(task_t * p) current->time_slice = MAX_TIMESLICE; } __sti(); - /* - * If the child was a (relative-) CPU hog then decrease - * the sleep_avg of the parent as well. - */ - if (p->sleep_avg < current->sleep_avg) - current->sleep_avg = (current->sleep_avg * EXIT_WEIGHT + - p->sleep_avg) / (EXIT_WEIGHT + 1); } #if CONFIG_SMP @@ -1027,7 +1052,7 @@ void set_user_nice(task_t *p, long nice) * If the task is running and lowered its priority, * or increased its priority then reschedule its CPU: */ - if ((NICE_TO_PRIO(nice) < p->static_prio) || (p == rq->curr)) + if (p == rq->curr) resched_task(rq->curr); } out_unlock: --- 2.4.20pre8aa2/kernel/fork.c.~1~ Wed Oct 2 00:25:23 2002 +++ 2.4.20pre8aa2/kernel/fork.c Wed Oct 2 06:02:13 2002 @@ -690,8 +690,6 @@ int do_fork(unsigned long clone_flags, u if (p->pid < 0) /* valid pids are >= 0 */ goto bad_fork_cleanup; - INIT_LIST_HEAD(&p->run_list); - p->p_cptr = NULL; init_waitqueue_head(&p->wait_chldexit); p->vfork_done = NULL; @@ -725,7 +723,6 @@ int do_fork(unsigned long clone_flags, u spin_lock_init(&p->sigmask_lock); } #endif - p->array = NULL; p->lock_depth = -1; /* -1 = no lock */ p->start_time = jiffies; Andrea ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: O(1) Scheduler (tuning problem/live-lock) 2002-10-02 6:45 ` Andrea Arcangeli @ 2002-10-03 5:50 ` Jim Houston 0 siblings, 0 replies; 5+ messages in thread From: Jim Houston @ 2002-10-03 5:50 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Jim Houston, linux-kernel, Ingo Molnar Hi Andrea, Ingo, Andrea, I tried your second patch. Again, it keeps on running even with "waitpid06 -c 16 -i 10000". This is good. It still has some jerky mouse behavior (under this load). This is on an old slow Pentium Pro dual processor. If I grab a window and move it around for several seconds, the screen will freeze for a couple seconds. I suspect that my X server fails the TASK_INTERACTIVE test. I have been hacking at sched.c myself trying to avoid the array switch entirely. I'm trying to set up a self-tuning feedback mechanism to adjust priorities so everything gets some cpu time without having to do the array switch. I'm juggling these ideas: 1. Gradually raise the priority of all the processes in the run queue. Do this without having to visit all of the processes. 2. When a process uses up its time slice, move it to a less favorable priority. 3. Tune the sleep_avg. I like the old decaying average approach of old unix systems. The current sleep_avg goes to saturation too often. I would like to be able to tell if a process has been using more than its share of the cpu time. 4. Make the maximum time slice decrease with more favorable priorities. The time slice would depend on the dynamic priority. I have code hacked together for first idea but its not useful without the rest. Jim Houston - Concurrent Computer Corp. ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2002-10-03 5:45 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2002-09-06 18:44 O(1) Scheduler (tuning problem/live-lock) Jim Houston 2002-09-30 16:10 ` Andrea Arcangeli 2002-10-01 7:20 ` Jim Houston 2002-10-02 6:45 ` Andrea Arcangeli 2002-10-03 5:50 ` Jim Houston
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).