linux-kernel.vger.kernel.org archive mirror
* [PATCH] O17int
@ 2003-08-19 15:01 Con Kolivas
  2003-08-19 16:39 ` Måns Rullgård
                   ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Con Kolivas @ 2003-08-19 15:01 UTC (permalink / raw)
  To: linux kernel mailing list; +Cc: Andrew Morton

[-- Attachment #1: clearsigned data --]
[-- Type: Text/Plain, Size: 1323 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Food for the starving masses.

This patch prevents any single task from being able to starve the runqueue 
that it's on. This minimises the impact a poorly behaved application or a 
malicious one has on the rest of the system. If an interactive task (e.g. 
blender) spins waiting on a cpu hog (X), other tasks will be less affected by 
the priority inversion occurring between X and blender (i.e. X will still suffer, 
but the machine won't come to a standstill). Broken apps will improve at least 
partially with this, but they should still be fixed for good performance.

Changes:
A task that uses up its full timeslice will not be allowed to be the very 
next task scheduled unless no other tasks on that runqueue are active. This 
allows the next highest priority task to be scheduled, or even an array switch 
followed by the next task.

Dropped the on-runqueue bonus given to high credit tasks when they are requeued 
at timeslice granularity, so that the above works well.
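
In rough terms the schedule() change looks like this (simplified sketch with
the array switch handling left out; the real code is in the attached diff):

	/* sketch: if the task that just used a whole timeslice would be
	 * picked again and something else is runnable, requeue it and
	 * take the next highest priority task instead */
	if (next == prev && prev->activated == USED_SLICE &&
			prev->mm && rq->nr_running > 1) {
		dequeue_task(prev, array);
		idx = sched_find_first_bit(array->bitmap);
		next = list_entry(array->queue[idx].next, task_t, run_list);
		enqueue_task(prev, array);
	}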

Patch O16.3-O17int and a full patch against 2.6.0-test3 are available at
http://kernel.kolivas.org/2.5

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/QjvMZUg7+tp6mRURAuLpAJ4wZaZ1qa5n4otk/wE+muarpqilJgCff3Bi
I8J81js9UZL3XoPqR0FnjEE=
=1KDG
-----END PGP SIGNATURE-----

[-- Attachment #2: patch-O16.3-O17int --]
[-- Type: text/x-diff, Size: 5130 bytes --]

--- linux-2.6.0-test3-mm3-O16.3/kernel/sched.c	2003-08-19 23:17:29.000000000 +1000
+++ linux-2.6.0-test3-mm3/kernel/sched.c	2003-08-20 00:04:00.000000000 +1000
@@ -360,6 +360,16 @@ static int effective_prio(task_t *p)
 }
 
 /*
+ * Activation classes used in p->activated as flags
+ */
+#define STARVED_WAKER	-3	// on runqueue by potential starvation
+#define USED_SLICE	-2	// used up a full timeslice
+#define UNINT_WAKE	-1	// woke from an uninterruptible sleep
+#define AWAKE		0	// other
+#define PART_RQ		1	// on run queue receives partial sleep credit
+#define FULL_RQ		2	// on run queue receives full sleep credit
+
+/*
  * __activate_task - move a task to the runqueue.
  */
 static inline void __activate_task(task_t *p, runqueue_t *rq)
@@ -410,7 +420,7 @@ static void recalc_task_prio(task_t *p, 
 			 * sleep are limited in their sleep_avg rise as they
 			 * are likely to be waiting on I/O
 			 */
-			if (p->activated == -1 && !HIGH_CREDIT(p) && p->mm){
+			if (p->activated == UNINT_WAKE && !HIGH_CREDIT(p) && p->mm){
 				if (p->sleep_avg >=
 					JIFFIES_TO_NS(JUST_INTERACTIVE_SLEEP(p)))
 						sleep_time = 0;
@@ -458,7 +468,7 @@ static inline void activate_task(task_t 
 	 * This checks to make sure it's not an uninterruptible task
 	 * that is now waking up.
 	 */
-	if (!p->activated){
+	if (p->activated == AWAKE){
 		/*
 		 * Tasks which were woken up by interrupts (ie. hw events)
 		 * are most likely of interactive nature. So we give them
@@ -467,13 +477,13 @@ static inline void activate_task(task_t 
 		 * on a CPU, first time around:
 		 */
 		if (in_interrupt())
-			p->activated = 2;
+			p->activated = FULL_RQ;
 		else
 		/*
 		 * Normal first-time wakeups get a credit too for on-runqueue
 		 * time, but it will be weighted down:
 		 */
-			p->activated = 1;
+			p->activated = PART_RQ;
 		}
 	p->timestamp = now;
 
@@ -603,7 +613,7 @@ repeat_lock_task:
 				 * Tasks on involuntary sleep don't earn
 				 * sleep_avg beyond just interactive state.
 				 */
-				p->activated = -1;
+				p->activated = UNINT_WAKE;
 			}
 			if (sync)
 				__activate_task(p, rq);
@@ -1398,8 +1408,10 @@ void scheduler_tick(int user_ticks, int 
 			rq->expired_timestamp = jiffies;
 		if (!TASK_INTERACTIVE(p) || EXPIRED_STARVING(rq)) {
 			enqueue_task(p, rq->expired);
-		} else
+		} else {
 			enqueue_task(p, rq->active);
+			p->activated = USED_SLICE;
+		}
 	} else {
 		/*
 		 * Prevent a too long timeslice allowing a task to monopolize
@@ -1415,21 +1427,18 @@ void scheduler_tick(int user_ticks, int 
 		 * equal priority.
 		 *
 		 * This only applies to user tasks in the interactive
-		 * delta range with at least MIN_TIMESLICE left.
+		 * delta range or a starved waker with at least MIN_TIMESLICE
+		 * left.
 		 */
-		if (p->mm && TASK_INTERACTIVE(p) && !((task_timeslice(p) -
-			p->time_slice) % TIMESLICE_GRANULARITY) &&
+		if (p->mm && (TASK_INTERACTIVE(p) ||
+			p->activated == STARVED_WAKER) &&
+			!((task_timeslice(p) - p->time_slice) %
+			TIMESLICE_GRANULARITY) &&
 			(p->time_slice > MIN_TIMESLICE) &&
 			(p->array == rq->active)) {
 
 			dequeue_task(p, rq->active);
 			set_tsk_need_resched(p);
-			/*
-			 * Tasks with interactive credit get all their
-			 * time waiting on the run queue credited as sleep
-			 */
-			if (HIGH_CREDIT(p))
-				p->activated = 2;
 			p->prio = effective_prio(p);
 			enqueue_task(p, rq->active);
 		}
@@ -1534,18 +1543,46 @@ pick_next_task:
 	queue = array->queue + idx;
 	next = list_entry(queue->next, task_t, run_list);
 
-	if (next->activated > 0) {
-		unsigned long long delta = now - next->timestamp;
+	/*
+	 * This prevents tasks that have used up a full timeslice from
+	 * being rescheduled immediately, thus avoiding starvation
+	 * and allowing lower priority tasks to run for at least
+	 * TIMESLICE_GRANULARITY or even allowing an array switch
+	 */
+	if (unlikely(next == prev && prev->activated == USED_SLICE &&
+		prev->mm && rq->nr_running > 1)){
+			dequeue_task(prev, array);
+			if (unlikely(!array->nr_active)) {
+				rq->active = rq->expired;
+				rq->expired = array;
+				array = rq->active;
+				rq->expired_timestamp = 0;
+			}
+			idx = sched_find_first_bit(array->bitmap);
+			queue = array->queue + idx;
+			next = list_entry(queue->next, task_t, run_list);
+			enqueue_task(prev, array);
+			prev->activated = AWAKE;
 
-		if (next->activated == 1)
-			delta = delta * (ON_RUNQUEUE_WEIGHT * 128 / 100) / 128;
+			if (!TASK_INTERACTIVE(next))
+				next->activated = STARVED_WAKER;
+			else
+				next->activated = AWAKE;
+	} else {
+		if (next->activated > 0) {
+			unsigned long long delta = now - next->timestamp;
 
-		array = next->array;
-		dequeue_task(next, array);
-		recalc_task_prio(next, next->timestamp + delta);
-		enqueue_task(next, array);
+			if (next->activated == PART_RQ)
+				delta = delta * (ON_RUNQUEUE_WEIGHT * 128 /
+				100) / 128;
+
+			array = next->array;
+			dequeue_task(next, array);
+			recalc_task_prio(next, next->timestamp + delta);
+			enqueue_task(next, array);
+		}
+		next->activated = AWAKE;
 	}
-	next->activated = 0;
 switch_tasks:
 	prefetch(next);
 	clear_tsk_need_resched(prev);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-19 15:01 [PATCH] O17int Con Kolivas
@ 2003-08-19 16:39 ` Måns Rullgård
  2003-08-20  1:23   ` Con Kolivas
  2003-08-19 18:58 ` Måns Rullgård
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 27+ messages in thread
From: Måns Rullgård @ 2003-08-19 16:39 UTC (permalink / raw)
  To: linux-kernel

Con Kolivas <kernel@kolivas.org> writes:

> Food for the starving masses.
>
> This patch prevents any single task from being able to starve the
> runqueue that it's on. This minimises the impact a poorly behaved
> application or a malicious one has on the rest of the system. If an

I have to disagree.  Open a file of a few hundred lines in XEmacs and
do a regexp search for "^[> ]*-*\n\\([> ]*.*\n\\)*[> ]*foo".  The
system will more or less freeze.  It's a very nasty regexp, and it's
an error to try to use it, but it still shouldn't freeze the system.

-- 
Måns Rullgård
mru@users.sf.net


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-19 15:01 [PATCH] O17int Con Kolivas
  2003-08-19 16:39 ` Måns Rullgård
@ 2003-08-19 18:58 ` Måns Rullgård
  2003-08-20  1:19   ` Con Kolivas
  2003-08-20 16:27 ` Wiktor Wodecki
  2003-08-21  5:30 ` Apurva Mehta
  3 siblings, 1 reply; 27+ messages in thread
From: Måns Rullgård @ 2003-08-19 18:58 UTC (permalink / raw)
  To: linux-kernel


Another XEmacs strangeness:

When compiling from xemacs, everything is fine until the compilation
is done.  Then xemacs starts spinning wildly in some loop doing this:

select(1024, [], NULL, NULL, {0, 0})    = 0 (Timeout)
write(5, "F\v\5\0004\0\0\0017\0\0\1\4\0\223\0008\0\r\0F0\5\0004\0"..., 40) = 40
write(5, "F\v\5\0004\0\0\0017\0\0\1\4\0\223\0008\0\r\0F0\5\0004\0"..., 40) = 40
select(1024, [], NULL, NULL, {0, 0})    = 0 (Timeout)
ioctl(5, 0x541b, [0])                   = 0
ioctl(5, 0x541b, [0])                   = 0
gettimeofday({1061288133, 302532}, NULL) = 0
select(20, [3 5 6 7 9 10 12 13 19], [], [], {0, 0}) = 1 (in [13], left {0, 0})
gettimeofday({1061288133, 302779}, NULL) = 0
select(1024, [13], NULL, NULL, {0, 0})  = 1 (in [13], left {0, 0})
read(13, 0x8dc4b08, 512)                = -1 EIO (Input/output error)
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
wait4(1703, 0xbfffe80c, WNOHANG, NULL)  = 0
rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
wait4(1162, 0xbfffe80c, WNOHANG, NULL)  = 0
rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
wait4(1160, 0xbfffe80c, WNOHANG, NULL)  = 0
rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
wait4(1003, 0xbfffe80c, WNOHANG, NULL)  = 0
rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
select(1024, [], NULL, NULL, {0, 0})    = 0 (Timeout)
write(5, "F\v\5\0004\0\0\0017\0\0\1\4\0\223\0008\0\r\0F0\5\0004\0"..., 40) = 40
write(5, "F\v\5\0004\0\0\0017\0\0\1\4\0\223\0008\0\r\0F0\5\0004\0"..., 40) = 40
select(1024, [], NULL, NULL, {0, 0})    = 0 (Timeout)
ioctl(5, 0x541b, [0])                   = 0
ioctl(5, 0x541b, [0])                   = 0
gettimeofday({1061288133, 304385}, NULL) = 0
select(20, [3 5 6 7 9 10 12 13 19], [], [], {0, 0}) = 1 (in [13], left {0, 0})
gettimeofday({1061288133, 304605}, NULL) = 0
select(1024, [13], NULL, NULL, {0, 0})  = 1 (in [13], left {0, 0})
read(13, 0x8dc4b08, 512)                = -1 EIO (Input/output error)
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
wait4(1703, 0xbfffe80c, WNOHANG, NULL)  = 0
rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
wait4(1162, 0xbfffe80c, WNOHANG, NULL)  = 0
rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
wait4(1160, 0xbfffe80c, WNOHANG, NULL)  = 0

This goes on for anything from half a second to several seconds.
During that time other processes, except X, are starved.

I saw this first with 2.6.0-test1 vanilla, then it went away in -test2
and -test3, only to show up again with O16.3int.  My O16.2 kernel
seems ok, which seems strange to me since the difference from O16.2 to
O16.3 is very small.

Any ideas?

-- 
Måns Rullgård
mru@users.sf.net


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-19 18:58 ` Måns Rullgård
@ 2003-08-20  1:19   ` Con Kolivas
  2003-08-20  8:53     ` Måns Rullgård
  0 siblings, 1 reply; 27+ messages in thread
From: Con Kolivas @ 2003-08-20  1:19 UTC (permalink / raw)
  To: Måns Rullgård, linux-kernel

On Wed, 20 Aug 2003 04:58, Måns Rullgård wrote:
> Another XEmacs strangeness:
>
> When compiling from xemacs, everything is fine until the compilation
> is done.  Then xemacs starts spinning wildly in some loop doing this:
> This goes on for anything from half a second to several seconds.
> During that time other processes, except X, are starved.
>
> I saw this first with 2.6.0-test1 vanilla, then it went away in -test2
> and -test3, only to show up again with O16.3int.  My O16.2 kernel
> seems ok, which seems strange to me since the difference from O16.2 to
> O16.3 is very small.

While being a small patch, 16.2-16.3 was a large change. It removed the very 
aggressive starvation avoidance of spin on wait waker/wakee in 16.2, which 
clearly helped your issue.

> Any ideas?

Pretty sure we have another spinner. A reniced -11 batch run of top -d 1 and 
vmstat during the spinning, and a kernel profile for that time will be 
helpful.

Con


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-19 16:39 ` Måns Rullgård
@ 2003-08-20  1:23   ` Con Kolivas
  2003-08-20  9:19     ` Måns Rullgård
  0 siblings, 1 reply; 27+ messages in thread
From: Con Kolivas @ 2003-08-20  1:23 UTC (permalink / raw)
  To: Måns Rullgård, linux-kernel

On Wed, 20 Aug 2003 02:39, Måns Rullgård wrote:
> Con Kolivas <kernel@kolivas.org> writes:
> > Food for the starving masses.
> >
> > This patch prevents any single task from being able to starve the
> > runqueue that it's on. This minimises the impact a poorly behaved
> > application or a malicious one has on the rest of the system. If an
>
> I have to disagree.  Open a file of a few hundred lines in XEmacs and
> do a regexp search for "^[> ]*-*\n\\([> ]*.*\n\\)*[> ]*foo".  The
> system will more or less freeze.  It's a very nasty regexp, and it's
> an error to try to use it, but it still shouldn't freeze the system.

Reasonably sure this is a variation on the starvation which O17 won't address. 
Please top/vmstat/profile this and I'll look into it. There is potential for 
starvation without using up full timeslices and this may be it.

Con


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-20  1:19   ` Con Kolivas
@ 2003-08-20  8:53     ` Måns Rullgård
  0 siblings, 0 replies; 27+ messages in thread
From: Måns Rullgård @ 2003-08-20  8:53 UTC (permalink / raw)
  To: linux-kernel

Con Kolivas <kernel@kolivas.org> writes:

>> When compiling from xemacs, everything is fine until the compilation
>> is done.  Then xemacs starts spinning wildly in some loop doing this:
>> This goes on for anything from half a second to several seconds.
>> During that time other processes, except X, are starved.
>>
>> I saw this first with 2.6.0-test1 vanilla, then it went away in -test2
>> and -test3, only to show up again with O16.3int.  My O16.2 kernel
>> seems ok, which seems strange to me since the difference from O16.2 to
>> O16.3 is very small.
>
> While being a small patch, 16.2-16.3 was a large change. It removed the very 
> aggressive starvation avoidance of spin on wait waker/wakee in 16.2, which 
> clearly helped your issue.

I have confirmed that this change introduced the problem.

>> Any ideas?
>
> Pretty sure we have another spinner. A reniced -11 batch run of top -d 1 and 
> vmstat during the spinning, and a kernel profile for that time will be 
> helpful.

I'll try to get that.

-- 
Måns Rullgård
mru@users.sf.net


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-20  1:23   ` Con Kolivas
@ 2003-08-20  9:19     ` Måns Rullgård
  0 siblings, 0 replies; 27+ messages in thread
From: Måns Rullgård @ 2003-08-20  9:19 UTC (permalink / raw)
  To: linux-kernel

Con Kolivas <kernel@kolivas.org> writes:

>> > This patch prevents any single task from being able to starve the
>> > runqueue that it's on. This minimises the impact a poorly behaved
>> > application or a malicious one has on the rest of the system. If an
>>
>> I have to disagree.  Open a file of a few hundred lines in XEmacs and
>> do a regexp search for "^[> ]*-*\n\\([> ]*.*\n\\)*[> ]*foo".  The
>> system will more or less freeze.  It's a very nasty regexp, and it's
>> an error to try to use it, but it still shouldn't freeze the system.
>
> Reasonably sure this is a variation on the starvation which O17
> won't address.

Well, obviously.

I tried O17 minus O16.2-O16.3, and I don't get the problems, even with
X at nice 0.  Does that tell you anything?

> Please top/vmstat/profile this and I'll look into it. There is
> potential for starvation without using up full timeslices and this
> may be it.

I'll do that when I boot a kernel that has the problem.

-- 
Måns Rullgård
mru@users.sf.net


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-19 15:01 [PATCH] O17int Con Kolivas
  2003-08-19 16:39 ` Måns Rullgård
  2003-08-19 18:58 ` Måns Rullgård
@ 2003-08-20 16:27 ` Wiktor Wodecki
  2003-08-20 16:42   ` Nick Piggin
  2003-08-20 21:23   ` Con Kolivas
  2003-08-21  5:30 ` Apurva Mehta
  3 siblings, 2 replies; 27+ messages in thread
From: Wiktor Wodecki @ 2003-08-20 16:27 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux kernel mailing list, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 650 bytes --]

On Wed, Aug 20, 2003 at 01:01:28AM +1000, Con Kolivas wrote:
> Food for the starving masses.
snip

Sorry, but I still have the starving problem. The more I use O16/O17, the
more problems I encounter. Xterms sometimes wake up after half a second
for another half a second and then fall asleep for a whole second;
after that, they're fully interactive. IO load seems to produce the
problem. A simple tar xf linux-2.6.0-test3.tar seems to halt the system.
New processes take ages to start. This also happened to me on O16.2.
Maybe it's because some AS patches are missing in vanilla but are in
2.6.0-test3-mm?

-- 
Regards,

Wiktor Wodecki

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-20 16:27 ` Wiktor Wodecki
@ 2003-08-20 16:42   ` Nick Piggin
  2003-08-20 21:23   ` Con Kolivas
  1 sibling, 0 replies; 27+ messages in thread
From: Nick Piggin @ 2003-08-20 16:42 UTC (permalink / raw)
  To: Wiktor Wodecki; +Cc: Con Kolivas, linux kernel mailing list, Andrew Morton



Wiktor Wodecki wrote:

>On Wed, Aug 20, 2003 at 01:01:28AM +1000, Con Kolivas wrote:
>
>>Food for the starving masses.
>>
>snip
>
>Sorry, but I still have the starving problem. The more I use O16/O17, the
>more problems I encounter. Xterms sometimes wake up after half a second
>for another half a second and then fall asleep for a whole second;
>after that, they're fully interactive. IO load seems to produce the
>problem. A simple tar xf linux-2.6.0-test3.tar seems to halt the system.
>New processes take ages to start. This also happened to me on O16.2.
>Maybe it's because some AS patches are missing in vanilla but are in
>2.6.0-test3-mm?
>

There are no AS patches missing in vanilla. I don't think there are..
none that would change that.

It's unlikely that it's an IO problem, because it's unlikely that your tar
would be evicting anything that X or the xterm depend on. If the
machine is swapping at this time then try setting /proc/sys/vm/swappiness
to 0.

Oh, you could give my cpu scheduler patch a try if you're bored.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-20 16:27 ` Wiktor Wodecki
  2003-08-20 16:42   ` Nick Piggin
@ 2003-08-20 21:23   ` Con Kolivas
  2003-08-21  5:26     ` Con Kolivas
  1 sibling, 1 reply; 27+ messages in thread
From: Con Kolivas @ 2003-08-20 21:23 UTC (permalink / raw)
  To: Wiktor Wodecki; +Cc: linux kernel mailing list, Andrew Morton

On Thu, 21 Aug 2003 02:27, Wiktor Wodecki wrote:
> On Wed, Aug 20, 2003 at 01:01:28AM +1000, Con Kolivas wrote:
> > Food for the starving masses.
>
> snip
>
> Sorry, but I still have the starving problem. The more I use O16/O17, the
> more problems I encounter. Xterms sometimes wake up after half a second
> for another half a second and then fall asleep for a whole second;
> after that, they're fully interactive. IO load seems to produce the
> problem. A simple tar xf linux-2.6.0-test3.tar seems to halt the system.

I can't reproduce your problem here until I hit swap hard. You sure that's not 
happening? Kernel threads do get a slight extra priority over regular threads 
by now, which should help throughput under load. Perhaps this has tipped the 
balance on your machine when you hit swap pressure hard. Do a vmstat run 
while you can reproduce the problem.

> new processes take ages to start. This also happend to me on O16.2.

But not in 16.3?

> Maybe it's because some AS patches are missing in vanilla but are in
> 2.6.0-test3-mm?

As Nick said, no this isn't the case.

Con


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-20 21:23   ` Con Kolivas
@ 2003-08-21  5:26     ` Con Kolivas
  2003-08-21  7:53       ` Mike Galbraith
  0 siblings, 1 reply; 27+ messages in thread
From: Con Kolivas @ 2003-08-21  5:26 UTC (permalink / raw)
  To: Voluspa; +Cc: linux kernel mailing list, Andrew Morton

Unhappy with this latest O16.3-O17int patch I'm withdrawing it, and 
recommending nothing on top of O16.3 yet.

More and more it just seems to be a bandaid to the priority inverting spin on 
waiters, and it does seem to be of detriment to general interactivity. I can 
now reproduce some loss of interactive feel with O17. 

Something specific for the spin on waiters is required that doesn't affect 
general performance. The fact that I can reproduce the same starvation in 
vanilla 2.6.0-test3 but to a lesser extent means this is an intrinsic problem 
that needs a specific solution.

Con


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-19 15:01 [PATCH] O17int Con Kolivas
                   ` (2 preceding siblings ...)
  2003-08-20 16:27 ` Wiktor Wodecki
@ 2003-08-21  5:30 ` Apurva Mehta
  3 siblings, 0 replies; 27+ messages in thread
From: Apurva Mehta @ 2003-08-21  5:30 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux kernel mailing list, Andrew Morton

* Con Kolivas <kernel@kolivas.org> [19-08-2003 21:26]:
> Food for the starving masses.
[snip]

Yesterday, I got a major lockup with this patch. The system just
completely froze for more than 10 seconds. There was no mouse
movement, I could not switch workspaces.. nothing..

There was quite a bit of load. There were about 180 processes (up from
the normal 40-50) due to a number of procmail/sendmail pairs. I run
spamassassin so each of those processes was testing for spam apart
from my other tests. I was also actively browsing the web and
playing music. That's about all that was going on. 

Unfortunately I could not get the numbers. Once it passed, everything
was perfectly back to normal. I have not been able to reproduce the
freeze because I have not got enough emails at one go to reproduce
that kind of load. I will try to capture the numbers next time round.

	- Apurva

--
Engineers motto: cheap, good, fast: choose any two

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-21  5:26     ` Con Kolivas
@ 2003-08-21  7:53       ` Mike Galbraith
  2003-08-21 11:46         ` Con Kolivas
  0 siblings, 1 reply; 27+ messages in thread
From: Mike Galbraith @ 2003-08-21  7:53 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Voluspa, linux kernel mailing list, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1705 bytes --]

At 03:26 PM 8/21/2003 +1000, Con Kolivas wrote:
>Unhappy with this latest O16.3-O17int patch I'm withdrawing it, and
>recommending nothing on top of O16.3 yet.
>
>More and more it just seems to be a bandaid to the priority inverting spin on
>waiters, and it does seem to be of detriment to general interactivity. I can
>now reproduce some loss of interactive feel with O17.
>
>Something specific for the spin on waiters is required that doesn't affect
>general performance. The fact that I can reproduce the same starvation in
>vanilla 2.6.0-test3 but to a lesser extent means this is an intrinsic problem
>that needs a specific solution.

I can see only one possible answer to this - never allow a normal task to 
hold the cpu for long stretches (define) while there are other tasks 
runnable.  (see attachment)

I think the _easiest_ fix for this particular starvation (without tossing 
baby out with bath water;) is to try to "down-shift" in schedule() when 
next == prev.  This you can do very cheaply with a find_next_bit().  That 
won't help the case where there are multiple tasks involved, but should fix 
the most common case for dirt cheap.  (another simple alternative is to 
globally "down-shift" periodically)

The most generally effective form of the "down-shift" anti-starvation 
tactic that I've tried, is to periodically check the head of all queues 
below current position (can do very quickly), and actively select the 
oldest task who hasn't run since some defined deadline.  Queues are 
serviced based upon priority most of the time, and based upon age some of 
the time.
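
In sketch form, with invented names (SCAN_PERIOD, STARVATION_NS and the
per-runqueue counter don't exist anywhere, and p->timestamp is used loosely
as "last time this task got the cpu"):

	/* sketch: every SCAN_PERIOD passes through schedule(), walk the
	 * heads of the lower priority queues and take anything that has
	 * been waiting longer than STARVATION_NS.  Picking the truly
	 * oldest one would mean tracking the max age instead of breaking
	 * at the first hit. */
	if (unlikely(++rq->age_scan >= SCAN_PERIOD)) {
		int i;
		rq->age_scan = 0;
		for (i = idx + 1; i < MAX_PRIO; i++) {
			task_t *p;
			if (!test_bit(i, array->bitmap))
				continue;
			p = list_entry(array->queue[i].next, task_t, run_list);
			if (now - p->timestamp > STARVATION_NS) {
				next = p;
				idx = i;
				break;
			}
		}
	}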

Everything I've tried along these lines has a common upside: works, and 
downside: butt ugly :)

         -Mike 

[-- Attachment #2: strace-blender --]
[-- Type: application/octet-stream, Size: 3377 bytes --]

<SNIP>
1061445810.812454 socket(PF_UNIX, SOCK_STREAM, 0) = 3 <0.000073>
1061445810.812704 uname({sys="Linux", node="mikeg", ...}) = 0 <0.000020>
1061445810.813014 uname({sys="Linux", node="mikeg", ...}) = 0 <0.000019>
1061445810.813400 connect(3, {sin_family=AF_UNIX, path="/tmp/.X11-unix/X0"}, 19) = 0 <0.000372>
1061445810.813988 uname({sys="Linux", node="mikeg", ...}) = 0 <0.000021>
1061445810.814345 fcntl64(3, F_SETFD, FD_CLOEXEC) = 0 <0.000021>
1061445810.814600 access("/root/.Xauthority", R_OK) = 0 <0.000036>
1061445810.814804 open("/root/.Xauthority", O_RDONLY) = 4 <0.000057>
1061445810.815040 fstat64(4, {st_mode=S_IFREG|0600, st_size=103, ...}) = 0 <0.000022>
1061445810.815233 old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x404cd000 <0.000031>
1061445810.815413 read(4, "\1\0\0\tel-kaboom\0\0010\0\22MIT-MAGIC-COOK"..., 4096) = 103 <0.000084>
1061445810.815723 read(4, "", 4096)     = 0 <0.000022>
1061445810.815909 close(4)              = 0 <0.000026>
<SNIP>
1061445810.824626 socket(PF_UNIX, SOCK_STREAM, 0) = 4 <0.000047>
1061445810.824789 uname({sys="Linux", node="mikeg", ...}) = 0 <0.000020>
1061445810.825095 uname({sys="Linux", node="mikeg", ...}) = 0 <0.000019>
1061445810.825392 connect(4, {sin_family=AF_UNIX, path="/tmp/.X11-unix/X0"}, 19) = 0 <0.000242>
<SNIP 40k lines>
1061445834.368068 select(4, [3], NULL, NULL, NULL) = 1 (in [3]) <0.000163>
1061445834.368469 read(3, "\1\0\354\"\0\0\0\0\0\0\0\0\1\0\0\0\0\27\0\0\0\0\0\0\0\0"..., 32) = 32 <0.000025>
1061445834.368698 write(3, "\221\v\3\0\1\0\0\0\2\0\300\1", 12) = 12 <0.000026>
1061445834.368876 writev(3, [{"\221\0012\0\1\0\0\0", 8}, {"\24\0\277\0\0\0\0\0\34\1\0\0\0\5\0\0\312\2\0\0\24\0g\0"..., 192}], 2) = 200 <0.000029>
1061445834.369152 write(3, "\221l\2\0\1\0\0\0", 8) = 8 <0.000026>
1061445834.369328 read(3, 0xbffff4c8, 32) = -1 EAGAIN (Resource temporarily unavailable) <0.000022>
1061445834.369462 select(4, [3], NULL, NULL, NULL) = 1 (in [3]) <0.032704>
1061445834.402553 read(3, "\1\0\357\"\0\0\0\0\0\0\0\0\1\0\0\0\0\27\0\0\0\0\0\0\0\0"..., 32) = 32 <0.000031>
1061445834.402773 gettimeofday({1061445834, 402818}, NULL) = 0 <0.000018>
1061445834.402913 ioctl(3, FIONREAD, [0]) = 0 <0.000021>
1061445834.403056 gettimeofday({1061445834, 403099}, NULL) = 0 <0.000018>
1061445834.403184 gettimeofday({1061445834, 403228}, NULL) = 0 <0.000018>
1061445834.403355 ioctl(3, FIONREAD, [0]) = 0 <0.000020>
1061445834.403489 gettimeofday({1061445834, 403532}, NULL) = 0 <0.000019>
1061445834.403617 gettimeofday({1061445834, 403661}, NULL) = 0 <0.000018>
1061445834.403745 ioctl(3, FIONREAD, [0]) = 0 <0.000019>
1061445834.403876 gettimeofday({1061445834, 403919}, NULL) = 0 <0.000018>
1061445834.404002 gettimeofday({1061445834, 404045}, NULL) = 0 <0.000018>
1061445834.404176 ioctl(3, FIONREAD, [0]) = 0 <0.000021>

<ZZZZZ... and after ~2 seconds uninterrupted thumb twiddling>

1061445836.306958 ioctl(3, FIONREAD, [32]) = 0 <0.001824>
1061445836.308935 read(3, "\6\0\357\"\23\36\32#?\0\0\0\2\0\300\1\0\0\0\0\0\0p\1\0"..., 32) = 32 <0.000028>
1061445836.309194 gettimeofday({1061445836, 309240}, NULL) = 0 <0.000019>
1061445836.309331 ioctl(3, FIONREAD, [0]) = 0 <0.000021>
1061445836.309496 write(3, "(l\4\0?\0\0\0\2\0\300\1\0\0p\1", 16) = 16 <0.000029>
1061445836.309672 read(3, 0xbffff2c4, 32) = -1 EAGAIN (Resource temporarily unavailable) <0.000022>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-21  7:53       ` Mike Galbraith
@ 2003-08-21 11:46         ` Con Kolivas
  2003-08-21 15:14           ` Mike Galbraith
  0 siblings, 1 reply; 27+ messages in thread
From: Con Kolivas @ 2003-08-21 11:46 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Voluspa, linux kernel mailing list, Andrew Morton

On Thu, 21 Aug 2003 17:53, Mike Galbraith wrote:
> At 03:26 PM 8/21/2003 +1000, Con Kolivas wrote:
> >Unhappy with this latest O16.3-O17int patch I'm withdrawing it, and
> >recommending nothing on top of O16.3 yet.
> >
> >More and more it just seems to be a bandaid to the priority inverting spin
> > on waiters, and it does seem to be of detriment to general interactivity.
> > I can now reproduce some loss of interactive feel with O17.
> >
> >Something specific for the spin on waiters is required that doesn't affect
> >general performance. The fact that I can reproduce the same starvation in
> >vanilla 2.6.0-test3 but to a lesser extent means this is an intrinsic
> > problem that needs a specific solution.
>
> I can see only one possible answer to this - never allow a normal task to
> hold the cpu for long stretches (define) while there are other tasks
> runnable.  (see attachment)

I assume you mean the strace ? That was the only attachment, and it just looks 
like shiploads of schedule() from the get time of day. Yes?

>
> I think the _easiest_ fix for this particular starvation (without tossing
> baby out with bath water;) is to try to "down-shift" in schedule() when
> next == prev.  This you can do very cheaply with a find_next_bit().  That
> won't help the case where there are multiple tasks involved, but should fix
> the most common case for dirt cheap.  (another simple alternative is to
> globally "down-shift" periodically)

Err funny you should say that; that's what O17 did. But it hurt because it 
would never allow a task that used a full timeslice to be next==prev. The 
less I throttled that, the less effective the antistarvation was. However 
this is clearly a problem without using up full timeslices. I originally 
thought they weren't trying to schedule lots because of the drop in ctx 
during starvation but I forgot that rescheduling the same task doesnt count 
as a ctx. 

Also I recall that winegames got much better in O10 when everything was 
charged at least one jiffy (pre nanosecond timing) suggesting those that were 
waking up for minute amounts of time repeatedly were being penalised; thus 
taking out the possibility of the starving task staying high priority for 
long.
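
From memory that amounted to something like this at deschedule time (sketch
only, pre-nanosecond field names approximate):

	/* sketch: charge at least one tick of run time when a task comes
	 * off the cpu, so tasks that wake, poll and sleep again in well
	 * under a jiffy still have their sleep_avg drained */
	run_time = jiffies - prev->last_run;
	if (!run_time)
		run_time = 1;
	if (prev->sleep_avg > run_time)
		prev->sleep_avg -= run_time;
	else
		prev->sleep_avg = 0;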

> The most generally effective form of the "down-shift" anti-starvation
> tactic that I've tried, is to periodically check the head of all queues
> below current position (can do very quickly), and actively select the
> oldest task who hasn't run since some defined deadline.  Queues are
> serviced based upon priority most of the time, and based upon age some of
> the time.

Hmm also sounds fudgy. 

> Everything I've tried along these lines has a common upside: works, and
> downside: butt ugly :)

Well, if it works well, who cares? My starvation throttler (O17) hurt 
interactivity, though, which is why I dropped it. 

I'm guessing I need to rewrite the O10 semantics to work with nanosecond 
timing instead. I'll have to give that a go.

Thanks for your comments Mike. Hopefully the munchkins can stack high enough 
;)

Con


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-21 11:46         ` Con Kolivas
@ 2003-08-21 15:14           ` Mike Galbraith
  2003-08-21 22:18             ` Wes Janzen
  2003-08-21 23:59             ` Con Kolivas
  0 siblings, 2 replies; 27+ messages in thread
From: Mike Galbraith @ 2003-08-21 15:14 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Voluspa, linux kernel mailing list, Andrew Morton

At 09:46 PM 8/21/2003 +1000, Con Kolivas wrote:
>On Thu, 21 Aug 2003 17:53, Mike Galbraith wrote:
> > At 03:26 PM 8/21/2003 +1000, Con Kolivas wrote:
> > >Unhappy with this latest O16.3-O17int patch I'm withdrawing it, and
> > >recommending nothing on top of O16.3 yet.
> > >
> > >More and more it just seems to be a bandaid to the priority inverting spin
> > > on waiters, and it does seem to be of detriment to general interactivity.
> > > I can now reproduce some loss of interactive feel with O17.
> > >
> > >Something specific for the spin on waiters is required that doesn't affect
> > >general performance. The fact that I can reproduce the same starvation in
> > >vanilla 2.6.0-test3 but to a lesser extent means this is an intrinsic
> > > problem that needs a specific solution.
> >
> > I can see only one possible answer to this - never allow a normal task to
> > hold the cpu for long stretches (define) while there are other tasks
> > runnable.  (see attachment)
>
>I assume you mean the strace ? That was the only attachment, and it just 
>looks
>like shiploads of schedule() from the get time of day. Yes?

(no, ~2 seconds of X being awol)

> > I think the _easiest_ fix for this particular starvation (without tossing
> > baby out with bath water;) is to try to "down-shift" in schedule() when
> > next == prev.  This you can do very cheaply with a find_next_bit().  That
> > won't help the case where there are multiple tasks involved, but should fix
> > the most common case for dirt cheap.  (another simple alternative is to
> > globally "down-shift" periodically)
>
>Err funny you should say that; that's what O17 did. But it hurt because it
>would never allow a task that used a full timeslice to be next==prev. The

If someone is marked for resched, it means we want to give someone else the 
cpu right?  In this case at least, re-selecting blender is not the right 
thing to do.  Looks like he's burning rubber... going nowhere fast.

>  less I throttled that, the less effective the antistarvation was. However
>this is clearly a problem without using up full timeslices. I originally
>thought they weren't trying to schedule lots because of the drop in ctx
>during starvation but I forgot that rescheduling the same task doesn't count
>as a ctx.

Hmm.  In what way did it hurt interactivity?  I know that if you pass the 
cpu off to a non-gui task who's going to use his full 100 ms slice, you'll 
definitely feel it.  (made workaround, will spare delicate tummies;)  If 
you mean that, say, X releases the cpu and has only a couple of ms left on 
its slice and is alone in its queue, that preempting it at the end of 
its slice after having only had the cpu for such a short time after wakeup 
hurts, you can qualify the preempt decision with a cpu possession time check.
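
i.e. roughly this around the granularity requeue in scheduler_tick(), sketch
only (MIN_POSSESSION_NS is an invented tunable, and the other qualifying
tests from scheduler_tick() are elided):

	/* sketch: only do the granularity preempt if current has actually
	 * owned the cpu for some minimum stretch since it last got on;
	 * p->timestamp taken here as the time it was last scheduled in */
	if (sched_clock() - p->timestamp > MIN_POSSESSION_NS &&
			p->time_slice > MIN_TIMESLICE &&
			p->array == rq->active) {
		dequeue_task(p, rq->active);
		set_tsk_need_resched(p);
		p->prio = effective_prio(p);
		enqueue_task(p, rq->active);
	}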

>  Also I recall that winegames got much better in O10 when everything was
>charged at least one jiffy (pre nanosecond timing) suggesting those that were
>waking up for minute amounts of time repeatedly were being penalised; thus
>taking out the possibility of the starving task staying high priority for
>long.

(unsure what you mean here)

> > The most generally effective form of the "down-shift" anti-starvation
> > tactic that I've tried, is to periodically check the head of all queues
> > below current position (can do very quickly), and actively select the
> > oldest task who hasn't run since some defined deadline.  Queues are
> > serviced based upon priority most of the time, and based upon age some of
> > the time.
>
>Hmm also sounds fudgy.

Yeah.  I crossbred it with a ~deadline scheduler, and created a mutt.

         -Mike (2'6"") 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-21 15:14           ` Mike Galbraith
@ 2003-08-21 22:18             ` Wes Janzen
  2003-08-22  0:09               ` Con Kolivas
                                 ` (2 more replies)
  2003-08-21 23:59             ` Con Kolivas
  1 sibling, 3 replies; 27+ messages in thread
From: Wes Janzen @ 2003-08-21 22:18 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Con Kolivas, Voluspa, linux kernel mailing list, Andrew Morton



Mike Galbraith wrote:

>>  less I throttled that, the less effective the antistarvation was. 
>> However
>> this is clearly a problem without using up full timeslices. I originally
>> thought they weren't trying to schedule lots because of the drop in ctx
>> during starvation but I forgot that rescheduling the same task doesn't 
>> count
>> as a ctx.
>
>
> Hmm.  In what way did it hurt interactivity?  I know that if you pass 
> the cpu off to non-gui task who's going to use his full 100 ms slice, 
> you'll definitely feel it.  (made workaround, will spare delicate 
> tummies;)  If you mean that, say X releases the cpu and has only a 
> couple of ms left on it's slice and is alone in it's queue, that 
> preempting it at the end of it's slice after having only had the cpu 
> for such a short time after wakeup hurts, you can qualify the preempt 
> decision with a cpu possession time check.

I wish I could get mm3 running so I could evaluate those interactivity 
statements.  I can't imagine it being worse than what I'm experiencing now:

 9  0  63968  21992  19672 319056    0    0     0    46 1231   289 87 13  0  0
 5  0  63968  21096  19704 320020    0    0     0    26 1202   300 86 14  0  0
14  0  63968  21104  19704 320020    0    0     0     0 1578   385 86 14  0  0
 8  0  63968  20448  20152 320048    0    0     7    41 1189   294 91  9  0  0
15  0  63968  19552  20132 321052    0    0     6    76 5926  1330 87 13  0  0
13  0  63968  18992  20132 321664    0    0     0    31 5464  1266 87 13  0  0
11  0  63968  18676  20100 321976    0    0     0    65 4613  1008 87 13  0  0
 8  0  63968  18044  20100 322660    0    0     0    14 4643  1132 87 13  0  0
16  0  63968  17660  20100 323024    0    0     0    20 5273  1272 87 13  0  0
14  0  63968  12092  20060 328596    0    0     0    24 5110  1221 87 13  0  0
14  0  63968  11644  20052 329052    0    0     0    68 5264  1285 87 13  0  0
13  0  63968  11324  19924 329560    0    0     0    34 4473  1125 87 13  0  0
16  0  63968  10740  19964 329880    0    0     1    74 1494   445 90 10  0  0
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free  inact active   si   so    bi    bo   in    cs us sy id wa
22  0  63968  10452  19968 330300    0    0     1    51 4584  1059 88 12  0  0
14  0  63968  10004  19940 330736    0    0     0    90 6567  1547 87 13  0  0
15  0  63968   9188  20048 331472    5    0    10   106 5176  1289 87 13  0  0
13  0  63968  13932  20016 326824    6    0     6    26 6026  1318 87 13  0  0
12  0  63968  13412  19984 327324    0    0     0    24 4683  1083 87 13  0  0
13  1  63968  12700  20012 327956    0    0    16    46 5673  1313 87 13  0  0
12  0  63968  12252  20084 328296    0    0     6    78 3499   780 88 12  0  0
18  0  63968  11468  20060 329016    0    0     0    35 5456  1231 87 13  0  0
13  0  63968  12052  20096 328580    0    0     3    96 4430   988 87 13  0  0
13  0  63968   5012  19144 336532    0    0     0    62 5672  1225 88 12  0  0
16  0  63964   6624  17976 336088    0    0     0   297 4808  1041 88 12  0  0
11  0  63964   5616  17964 337124    0    0     0    69 5219  1094 89 11  0  0
16  1  63964   6732  15688 338276   13    0    20   155 4984  1156 87 13  0  0
 4  1  63964   7772  12544 340436   13    0    20    11 1070   201 90 10  0  0
 3  1  63964   4364  11560 344840    0    0    86   338 1038   250 92  8  0  0


That would be compiling the kernel, bunzipping a file, and some cron 
mandb thing that was running gzip in the background niced.  Plus X and 
Mozilla, which probably starts the problem.  At the end there, you see 
things calm down.  That's also the way it starts out, then something 
sets off the "priority inversion" and the machine becomes completely 
worthless.  Even the tasks that are running aren't really accomplishing 
anything.  So the load goes from around 4/5 into the teens and the 
context switching makes a corresponding jump.  And then both 
interactivity AND throughput fall through the floor.

I can't imagine any interactivity regressions that are worse than this 
behavior...

And this happens with just X and Mozilla running.  It happens less often 
without X running, but still happens.  Even if I'm at a VT, it could 
take 5-6 seconds for my text to appear after I type.  This happens all 
the time, about once every few minutes and correlates with a 
simultaneous increase in context switches and load.

>
>>  Also I recall that winegames got much better in O10 when everything was
>> charged at least one jiffy (pre nanosecond timing) suggesting those 
>> that were
>> waking up for minute amounts of time repeatedly were being penalised; 
>> thus
>> taking out the possibility of the starving task staying high priority 
>> for
>> long.
>
>
> (unsure what you mean here) 

Can you set a cutoff point where if the process uses less than X percent 
of the max timeslice, it is penalized?  I don't know if it's possible  
to do a loop of empty spins at some point and time it to find out what 
the cut-off point should be...otherwise I imagine it would need to be 
tuned for every processor speed.  Could you use the bogomips to gauge 
the speed of the machine and use that to determine the min timeslice?  
From what I understand above, that would perhaps be more selective than 
just penalizing every process and have a positive effect on 
everything...of course I'm open to the possibility that I have it all 
wrong ;-)

Wes



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-21 15:14           ` Mike Galbraith
  2003-08-21 22:18             ` Wes Janzen
@ 2003-08-21 23:59             ` Con Kolivas
  2003-08-22  5:11               ` Mike Galbraith
  1 sibling, 1 reply; 27+ messages in thread
From: Con Kolivas @ 2003-08-21 23:59 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Voluspa, linux kernel mailing list, Andrew Morton

Quoting Mike Galbraith <efault@gmx.de>:

> At 09:46 PM 8/21/2003 +1000, Con Kolivas wrote:
> >On Thu, 21 Aug 2003 17:53, Mike Galbraith wrote:
> > > At 03:26 PM 8/21/2003 +1000, Con Kolivas wrote:
> > > >Unhappy with this latest O16.3-O17int patch I'm withdrawing it, and
> > > >recommending nothing on top of O16.3 yet.
> > > >
> > > >More and more it just seems to be a bandaid to the priority inverting
> spin
> > > > on waiters, and it does seem to be of detriment to general
> interactivity.
> > > > I can now reproduce some loss of interactive feel with O17.
> > > >
> > > >Something specific for the spin on waiters is required that doesn't
> affect
> > > >general performance. The fact that I can reproduce the same starvation
> in
> > > >vanilla 2.6.0-test3 but to a lesser extent means this is an intrinsic
> > > > problem that needs a specific solution.
> > >
> > > I can see only one possible answer to this - never allow a normal task
> to
> > > hold the cpu for long stretches (define) while there are other tasks
> > > runnable.  (see attachment)
> >
> >I assume you mean the strace ? That was the only attachment, and it just 
> >looks
> >like shiploads of schedule() from the get time of day. Yes?
> 
> (no, ~2 seconds of X being awol)
> 
> > > I think the _easiest_ fix for this particular starvation (without
> tossing
> > > baby out with bath water;) is to try to "down-shift" in schedule() when
> > > next == prev.  This you can do very cheaply with a find_next_bit(). 
> That
> > > won't help the case where there are multiple tasks involved, but should
> fix
> > > the most common case for dirt cheap.  (another simple alternative is to
> > > globally "down-shift" periodically)
> >
> >Err funny you should say that; that's what O17 did. But it hurt because it
> >would never allow a task that used a full timeslice to be next==prev. The
> 
> If someone is marked for resched, it means we want to give someone else the 
> cpu right?  In this case at least, re-selecting blender is not the right 
> thing to do.  Looks like he's burning rubber... going nowhere fast.
> 
> >  less I throttled that, the less effective the antistarvation was. However
> >this is clearly a problem without using up full timeslices. I originally
> >thought they weren't trying to schedule lots because of the drop in ctx
> >during starvation but I forgot that rescheduling the same task doesn't count
> >as a ctx.
> 
> Hmm.  In what way did it hurt interactivity?  I know that if you pass the 
> cpu off to non-gui task who's going to use his full 100 ms slice, you'll 
> definitely feel it.  (made workaround, will spare delicate tummies;)  If 

Well I made it so that if a full timeslice is used, and then the same task is 
next==prev, the next candidate for scheduling gets up to TIMESLICE_GRANULARITY 
of timeslice, not a full task timeslice. Even this was palpable as a loss of the 
smoothness of waving X windows around with anything else burning cpu (kernel 
compile, bunzip or whatever). Basically X is somewhere between a frequent sleeper 
with short periods of cpu time and a task using bursts of full timeslices when 
used heavily. When it's using full timeslices, it suffers under the same things 
that throttle spinners.

> you mean that, say X releases the cpu and has only a couple of ms left on 
> it's slice and is alone in it's queue, that preempting it at the end of 
> it's slice after having only had the cpu for such a short time after wakeup 
> hurts, you can qualify the preempt decision with a cpu possession time
> check.

I was letting it run out the full timeslice unabated, and only preventing it 
from getting immediately rescheduled.

> 
> >  Also I recall that winegames got much better in O10 when everything was
> >charged at least one jiffy (pre nanosecond timing) suggesting those that
> were
> >waking up for minute amounts of time repeatedly were being penalised; thus
> >taking out the possibility of the starving task staying high priority for
> >long.
> 
> (unsure what you mean here)

Ok I think blender (we say blenndah in .au) is waking up, polling for X, then 
going back to sleep only to do it all over again. I don't think it's wasting a 
full timeslice at all. There seem to be two variants of this spin on wait; one 
is the task that uses a full timeslice spinning (wine games), and the other is 
waking up, rescheduling and going back to sleep only to do it all over again. 

> 
> > > The most generally effective form of the "down-shift" anti-starvation
> > > tactic that I've tried, is to periodically check the head of all queues
> > > below current position (can do very quickly), and actively select the
> > > oldest task who hasn't run since some defined deadline.  Queues are
> > > serviced based upon priority most of the time, and based upon age some
> of
> > > the time.
> >
> >Hmm also sounds fudgy.
> 
> Yeah.  I crossbred it with a ~deadline scheduler, and created a mutt.

But how did this mutt perform?

>          -Mike (2'6"") 

+ another 2'6"" doesn't quite get us tall enough. Maybe we need another 
munchkin.

Con

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-21 22:18             ` Wes Janzen
@ 2003-08-22  0:09               ` Con Kolivas
  2003-08-22 21:17                 ` Wes Janzen
  2003-08-22  0:42               ` Felipe Alfaro Solana
  2003-08-22  5:34               ` Mike Galbraith
  2 siblings, 1 reply; 27+ messages in thread
From: Con Kolivas @ 2003-08-22  0:09 UTC (permalink / raw)
  To: Wes Janzen
  Cc: Mike Galbraith, Voluspa, linux kernel mailing list, Andrew Morton

Quoting Wes Janzen <superchkn@sbcglobal.net>:

> I wish I could get mm3 running so I could evaluate those interactivity 
> statements.  I can't imagine it being worse than what I'm experiencing now:

Umm. You didn't mention which kernel/patch? I seem to recall you were using 
Osomething but which?

> That would be compiling the kernel, bunzipping a file, and some cron 
> mandb thing that was running gzip in the background niced.  Plus X and 
> Mozilla, which probably starts the problem.  At the end there, you see 
> things calm down.  That's also the way it starts out, then something 
> sets off the "priority inversion" and the machine becomes completely 
> worthless.  Even the task that are running aren't really accomplishing 
> anything.  So the load goes from around 4/5 into the teens and the 
> context switching makes a corresponding jump.  And then both 
> interactivity AND throughput fall through the floor.
> 
> I can't imagine any interactivity regressions that are worse than this 
> behavior...

If this is Osomething, can you tell me when it started doing this and what 
vanilla does?

> And this happens with just X and Mozilla running.  It happens less often 
> without X running, but still happens.  Even if I'm at a VT, it could 
> take 5-6 seconds for my text to appear after I type.  This happens all 
> the time, about once every few minutes and correlates with a 
> simultaneous increase in context switches and load.

Ditto

> Can you set a cutoff point where if the process uses less than X percent 
> of the max timeslice, it is penalized?  I don't know if it's possible  
> to do a loop of empty spins at some point and time it to find out what 
> the cut-off point should be...otherwise I imagine it would need to be 
> tuned for every processor speed.  Could you use the bogomips to gauge 
> the speed of the machine and use that to determine the min timeslice?  
>  From what I understand above, that would perhaps be more selective than 
> just penalizing every process and have a positive effect on 
> everything...of course I'm open to the possibility that I have it all 
> wrong ;-)

A thought, but unfortunately some arbitrary limit would just be hit by 
everything randomly rather than with any meaningful frequency. The amount of 
timeslice that interactive tasks use up is extremely variable depending on 
circumstances. 

Con

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-21 22:18             ` Wes Janzen
  2003-08-22  0:09               ` Con Kolivas
@ 2003-08-22  0:42               ` Felipe Alfaro Solana
  2003-08-22  5:34               ` Mike Galbraith
  2 siblings, 0 replies; 27+ messages in thread
From: Felipe Alfaro Solana @ 2003-08-22  0:42 UTC (permalink / raw)
  To: Wes Janzen; +Cc: linux kernel mailing list

On Fri, 2003-08-22 at 00:18, Wes Janzen wrote:

> I wish I could get mm3 running so I could evaluate those interactivity 
> statements.  I can't imagine it being worse than what I'm experiencing now:

If -mm3 doesn't work, then you can download patches against vanilla
2.6.0-test3 from
http://members.optusnet.com.au/ckolivas/kernel/2.5/2.6.0-test3/patch-test3-O16.3int


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-21 23:59             ` Con Kolivas
@ 2003-08-22  5:11               ` Mike Galbraith
  0 siblings, 0 replies; 27+ messages in thread
From: Mike Galbraith @ 2003-08-22  5:11 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Voluspa, linux kernel mailing list, Andrew Morton

At 09:59 AM 8/22/2003 +1000, Con Kolivas wrote:
>Quoting Mike Galbraith <efault@gmx.de>:
>
> > > > The most generally effective form of the "down-shift" anti-starvation
> > > > tactic that I've tried, is to periodically check the head of all queues
> > > > below current position (can do very quickly), and actively select the
> > > > oldest task who hasn't run since some defined deadline.  Queues are
> > > > serviced based upon priority most of the time, and based upon age some
> > of
> > > > the time.
> > >
> > >Hmm also sounds fudgy.
> >
> > Yeah.  I crossbred it with a ~deadline scheduler, and created a mutt.
>
>But how did this mutt perform?

At the time, I was more concerned by the very long semaphore hold times I 
was seeing than anything else.  It helped that quite a lot.  It didn't hurt 
throughput in any way I could see, and it improved irman's latency 
numbers.  (process load was routinely hitting ~1.5 seconds max latency at 
the time, that tree cut it to roughly 400-500ms iirc)  Just like anything 
else that increases fairness though, it hurt X feel somewhat in the 
presence of load.

         -Mike 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-21 22:18             ` Wes Janzen
  2003-08-22  0:09               ` Con Kolivas
  2003-08-22  0:42               ` Felipe Alfaro Solana
@ 2003-08-22  5:34               ` Mike Galbraith
  2003-08-22 20:48                 ` Wes Janzen
  2 siblings, 1 reply; 27+ messages in thread
From: Mike Galbraith @ 2003-08-22  5:34 UTC (permalink / raw)
  To: Wes Janzen; +Cc: Con Kolivas, Voluspa, linux kernel mailing list, Andrew Morton

At 05:18 PM 8/21/2003 -0500, Wes Janzen wrote:


>Mike Galbraith wrote:
>
>>>  less I throttled that, the less effective the antistarvation was. However
>>>this is clearly a problem without using up full timeslices. I originally
>>>thought they weren't trying to schedule lots because of the drop in ctx
>>>during starvation but I forgot that rescheduling the same task doesn't count
>>>as a ctx.
>>
>>
>>Hmm.  In what way did it hurt interactivity?  I know that if you pass the 
>>cpu off to non-gui task who's going to use his full 100 ms slice, you'll 
>>definitely feel it.  (made workaround, will spare delicate tummies;)  If 
>>you mean that, say X releases the cpu and has only a couple of ms left on 
>>it's slice and is alone in it's queue, that preempting it at the end of 
>>it's slice after having only had the cpu for such a short time after 
>>wakeup hurts, you can qualify the preempt decision with a cpu possession 
>>time check.
>
>I wish I could get mm3 running so I could evaluate those interactivity 
>statements.  I can't imagine it being worse than what I'm experiencing now:
>
>9  0  63968  21992  19672 319056    0    0     0    46 1231   289 87 13  0  0
>5  0  63968  21096  19704 320020    0    0     0    26 1202   300 86 14  0  0
>14  0  63968  21104  19704 320020    0    0     0     0 1578   385 86 14  0  0
>8  0  63968  20448  20152 320048    0    0     7    41 1189   294 91  9  0  0
>15  0  63968  19552  20132 321052    0    0     6    76 5926  1330 87 13  0  0
>13  0  63968  18992  20132 321664    0    0     0    31 5464  1266 87 13  0  0
>11  0  63968  18676  20100 321976    0    0     0    65 4613  1008 87 13  0  0
>8  0  63968  18044  20100 322660    0    0     0    14 4643  1132 87 13  0  0
>16  0  63968  17660  20100 323024    0    0     0    20 5273  1272 87 13  0  0
>14  0  63968  12092  20060 328596    0    0     0    24 5110  1221 87 13  0  0
>14  0  63968  11644  20052 329052    0    0     0    68 5264  1285 87 13  0  0
>13  0  63968  11324  19924 329560    0    0     0    34 4473  1125 87 13  0  0
>16  0  63968  10740  19964 329880    0    0     1    74 1494   445 90 10  0  0
>procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>r  b   swpd   free  inact active   si   so    bi    bo   in    cs us sy id wa
>22  0  63968  10452  19968 330300    0    0     1    51 4584  1059 88 12  0  0
>14  0  63968  10004  19940 330736    0    0     0    90 6567  1547 87 13  0  0
>15  0  63968   9188  20048 331472    5    0    10   106 5176  1289 87 13  0  0
>13  0  63968  13932  20016 326824    6    0     6    26 6026  1318 87 13  0  0
>12  0  63968  13412  19984 327324    0    0     0    24 4683  1083 87 13  0  0
>13  1  63968  12700  20012 327956    0    0    16    46 5673  1313 87 13  0  0
>12  0  63968  12252  20084 328296    0    0     6    78 3499   780 88 12  0  0
>18  0  63968  11468  20060 329016    0    0     0    35 5456  1231 87 13  0  0
>13  0  63968  12052  20096 328580    0    0     3    96 4430   988 87 13  0  0
>13  0  63968   5012  19144 336532    0    0     0    62 5672  1225 88 12  0  0
>16  0  63964   6624  17976 336088    0    0     0   297 4808  1041 88 12  0  0
>11  0  63964   5616  17964 337124    0    0     0    69 5219  1094 89 11  0  0
>16  1  63964   6732  15688 338276   13    0    20   155 4984  1156 87 13  0  0
>4  1  63964   7772  12544 340436   13    0    20    11 1070   201 90 10  0  0
>3  1  63964   4364  11560 344840    0    0    86   338 1038   250 92  8  0  0
>
>
>That would be compiling the kernel, bunzipping a file, and some cron mandb 
>thing that was running gzip in the background niced.  Plus X and Mozilla, 
>which probably starts the problem.  At the end there, you see things calm 
>down.  That's also the way it starts out, then something sets off the 
>"priority inversion" and the machine becomes completely worthless.  Even 
>the tasks that are running aren't really accomplishing anything.  So the 
>load goes from around 4/5 into the teens and the context switching makes a 
>corresponding jump.  And then both interactivity AND throughput fall 
>through the floor.
>
>I can't imagine any interactivity regressions that are worse than this 
>behavior...
>
>And this happens with just X and Mozilla running.  It happens less often 
>without X running, but still happens.  Even if I'm at a VT, it could take 
>5-6 seconds for my text to appear after I type.  This happens all the 
>time, about once every few minutes and correlates with a simultaneous 
>increase in context switches and load.

Those high interrupt counts are all stalls?  What kernel is that?



>>>  Also I recall that winegames got much better in O10 when everything was
>>>charged at least one jiffy (pre nanosecond timing) suggesting those that 
>>>were
>>>waking up for minute amounts of time repeatedly were being penalised; thus
>>>taking out the possibility of the starving task staying high priority for
>>>long.
>>
>>
>>(unsure what you mean here)
>
>Can you set a cutoff point where if the process uses less than X percent 
>of the max timeslice, it is penalized?  I don't know if it's possible
>to do a loop of empty spins at some point and time it to find out what the 
>cut-off point should be...otherwise I imagine it would need to be tuned 
>for every processor speed.  Could you use the bogomips to gauge the speed 
>of the machine and use that to determine the min timeslice?
> From what I understand above, that would perhaps be more selective than 
> just penalizing every process and have a positive effect on 
> everything...of course I'm open to the possibility that I have it all wrong ;-)
>
>Wes
>
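For what it's worth, the two ideas floated above (qualifying the end-of-slice
preempt with a cpu possession time check, and a cutoff so a task that ran for
only a tiny fraction of its slice is still charged a minimum amount) could both
be expressed as small helpers.  This is a purely illustrative sketch, not from
any posted patch; the constants and function names here are made up:

	/*
	 * Hypothetical sketch only.  Charge at least one jiffy worth of
	 * run time, so tasks waking for minute amounts of time repeatedly
	 * still pay for the cpu they occupied.  A bogomips- or cpu-speed-
	 * scaled cutoff would only change how the constant is derived.
	 */
	#define MIN_RUN_CHARGE_NS	(1000000000ULL / HZ)

	static unsigned long long min_charge_run_time(unsigned long long run_time)
	{
		if (run_time < MIN_RUN_CHARGE_NS)
			run_time = MIN_RUN_CHARGE_NS;
		return run_time;
	}

	/*
	 * Hypothetical sketch only.  Let the end-of-slice preempt fire only
	 * if the task has actually held the cpu for some minimum period
	 * since it last woke up.
	 */
	#define MIN_POSSESSION_NS	(1000000000ULL / HZ)

	static int held_cpu_long_enough(unsigned long long ran_since_wakeup)
	{
		return ran_since_wakeup >= MIN_POSSESSION_NS;
	}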


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-22  5:34               ` Mike Galbraith
@ 2003-08-22 20:48                 ` Wes Janzen
  0 siblings, 0 replies; 27+ messages in thread
From: Wes Janzen @ 2003-08-22 20:48 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Con Kolivas, Voluspa, linux kernel mailing list, Andrew Morton

Yep, those are very bad stalls, much worse than the normal temporary 
stalls when something is running in the background.  I think part of 
that can be attributed to mozilla, which likes to do very little for a 
while, then jump up to 5% for a little bit and then up to 35% for 5 
seconds or so. 
With something running in the background, things get very bad.  Those 
vmstat lines were supposed to be reported at 5-second intervals, but they were 
not coming out at that rate during the problem.  Those represent 
about 5 minutes of stalling.  I could live with a short stall, but 5 
minutes where the computer barely takes input is crazy.  X becomes 
totally unresponsive to the point I cannot switch to a VT. 

I started a shutdown one time with the acpid daemon watching for power 
button events.  It took 1 hour 30 minutes from the time it said that it 
was shutting down (I could hear the beep from the shutdown process) to 
the point I got to "Stopping at daemon" which is barely into the 
shutdown cycle.  Even then I waited another 10 minutes for it to 
complete the shutdown and it never did.  All I was doing there was 
compiling in a gnome-terminal, and had just clicked on a bug-buddy 
window to get it to show debugging information.  I couldn't get to a VT 
so I wasn't able to get a vmstat log of that one. 

Even if this is due to a bad interaction between a program and X, it 
shouldn't be able to bring the system to its knees.

Kernel version:
2.6.0-test3-mm2 + O16.3int



Mike Galbraith wrote:

>
> Those high interrupt counts are all stalls?  What kernel is that?
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-22  0:09               ` Con Kolivas
@ 2003-08-22 21:17                 ` Wes Janzen
  0 siblings, 0 replies; 27+ messages in thread
From: Wes Janzen @ 2003-08-22 21:17 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Mike Galbraith, Voluspa, linux kernel mailing list, Andrew Morton

My last reply didn't make it to the list due to an html attachment.  
Mozilla is supposed to strip those but apparently sometimes does not...

Con Kolivas wrote:

>Umm. You didn't mention which kernel/patch? I seem to recall you were using 
>Osomething but which?
>  
>
For the record, that's 2.6.0-test3-mm2 + O16.3int

>
>If this is Osomething, can you tell me when it started doing this and what 
>vanilla does.
>  
>
Again, for posterity, this started with mm2; mm1 is similar to vanilla 
except that it has better interactivity.  I've been running vanilla overnight 
and today but haven't found any cases where it stalls like mm2 does.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
@ 2003-08-20 22:51 Voluspa
  0 siblings, 0 replies; 27+ messages in thread
From: Voluspa @ 2003-08-20 22:51 UTC (permalink / raw)
  To: linux-kernel


I'm doing a backup now, basically a "cp -a" of all relevant directories
on the system, and smoothness _has_ taken a penalty from the starvation
work. Scrolling a picture-filled webpage (e.g. cnn.com) in Opera 6.12 jerks
with each disk write, or thereabouts. Load is ca 2.5 to 3. Doing a "dd"
from /dev/zero to a large file doesn't affect anything since it's a long
steady write.

Ah well :-)
Mats Johannesson

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-20  8:00 ` Mike Galbraith
@ 2003-08-20 11:21   ` Con Kolivas
  0 siblings, 0 replies; 27+ messages in thread
From: Con Kolivas @ 2003-08-20 11:21 UTC (permalink / raw)
  To: Mike Galbraith, Voluspa; +Cc: linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, 20 Aug 2003 18:00, Mike Galbraith wrote:
> At 06:55 AM 8/20/2003 +0200, Voluspa wrote:
> >Blender 2.28 can not starve xmms one iota. Within blender itself, I can
> >cause 1 to 5 second freezes while doing a slow "world rotate", but that
> >is something the application programmers have to fix.
>
> I'm not so sure that it's an application bug.  With Nick's patch, I cannot
> trigger any delay whatsoever, whereas with stock, or with Ingo's changes
> [as well as my own, damn the bad luck] I can.  I'm not saying it's _not_ a
> bug mind you, but color me suspicious ;-)

/me giggles like a 12 year old girl (teehee)

Try an earlier version of blender and you'll see it goes away. Other ones to 
try are opening a gpg-signed mail (like this mail) in kmail. The slower the 
sleep avg decay in any tree, the longer the spin. Nick's changes, I believe, 
cover up the flaw. I'm not discounting Nick's work, but I do believe the apps 
are broken and it's only the current scheduler design that makes it visible. 
I would also like to make it impossible for priority inversion to happen, but 
at the moment I've just got to make sure they don't starve anything but their 
dependent cpu hog. Still looking for some useful task dependency tracking.
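The kind of application behaviour I'm calling broken is, roughly, a busy-wait
on another process with only tiny sleeps in between; the waiter keeps earning
sleep credit while the task it depends on gets starved.  A hypothetical
userspace sketch, not code from blender or kmail (reply_ready() just stands in
for whatever check the real application does):

	#include <unistd.h>

	int reply_ready(int conn);	/* stand-in for the app's real check */

	/*
	 * Spin until the other task has answered, sleeping only briefly on
	 * each pass.  With slow sleep avg decay this loop keeps a high
	 * priority while the task it is waiting on starves.
	 */
	static void wait_for_reply(int conn)
	{
		while (!reply_ready(conn))
			usleep(100);
	}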

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/Q1nQZUg7+tp6mRURAgmIAJ0f6jGLZFjjguhYv+MGEz5S1DuMCwCeO+Id
ii0V4YOXlL9bB7wJ6rn8QEo=
=PygI
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
  2003-08-20  4:55 Voluspa
@ 2003-08-20  8:00 ` Mike Galbraith
  2003-08-20 11:21   ` Con Kolivas
  0 siblings, 1 reply; 27+ messages in thread
From: Mike Galbraith @ 2003-08-20  8:00 UTC (permalink / raw)
  To: Voluspa; +Cc: linux-kernel

At 06:55 AM 8/20/2003 +0200, Voluspa wrote:

>Blender 2.28 can not starve xmms one iota. Within blender itself, I can
>cause 1 to 5 second freezes while doing a slow "world rotate", but that
>is something the application programmers have to fix.

I'm not so sure that it's an application bug.  With Nick's patch, I cannot 
trigger any delay whatsoever, whereas with stock, or with Ingo's changes 
[as well as my own, damn the bad luck] I can.  I'm not saying it's _not_ a 
bug mind you, but color me suspicious ;-)

         -Mike 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] O17int
@ 2003-08-20  4:55 Voluspa
  2003-08-20  8:00 ` Mike Galbraith
  0 siblings, 1 reply; 27+ messages in thread
From: Voluspa @ 2003-08-20  4:55 UTC (permalink / raw)
  To: linux-kernel


On 2003-08-19 15:01:28 Con Kolivas wrote:

> Food for the starving masses.
>
> This patch prevents any single task from being able to starve the
> runqueue that it's on.

This is completely true in my two test scenarios. Winex 3.1 running
"Baldurs Gate I" is as smooth as -O10 or -A3, and it is impossible to
trigger a permanent starvation. Switching to a text console and back, I
only hear brief, ca 1.5 second, "sound repeats", after which the game
recovers totally. Clicking the "Software standard BLT[on/off]", which
even triggered a short priority inversion in A3, has no impact at all.
Playability, for those who wonder, is an impressive 9 out of 10 ;-)

Blender 2.28 can not starve xmms one iota. Within blender itself, I can
cause 1 to 5 second freezes while doing a slow "world rotate", but that
is something the application programmers have to fix.

I see no throughput regression, and overall "feel" of the system is
great. A real keeper, this one.

Mvh
Mats Johannesson

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2003-08-22 21:15 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-08-19 15:01 [PATCH] O17int Con Kolivas
2003-08-19 16:39 ` Måns Rullgård
2003-08-20  1:23   ` Con Kolivas
2003-08-20  9:19     ` Måns Rullgård
2003-08-19 18:58 ` Måns Rullgård
2003-08-20  1:19   ` Con Kolivas
2003-08-20  8:53     ` Måns Rullgård
2003-08-20 16:27 ` Wiktor Wodecki
2003-08-20 16:42   ` Nick Piggin
2003-08-20 21:23   ` Con Kolivas
2003-08-21  5:26     ` Con Kolivas
2003-08-21  7:53       ` Mike Galbraith
2003-08-21 11:46         ` Con Kolivas
2003-08-21 15:14           ` Mike Galbraith
2003-08-21 22:18             ` Wes Janzen
2003-08-22  0:09               ` Con Kolivas
2003-08-22 21:17                 ` Wes Janzen
2003-08-22  0:42               ` Felipe Alfaro Solana
2003-08-22  5:34               ` Mike Galbraith
2003-08-22 20:48                 ` Wes Janzen
2003-08-21 23:59             ` Con Kolivas
2003-08-22  5:11               ` Mike Galbraith
2003-08-21  5:30 ` Apurva Mehta
2003-08-20  4:55 Voluspa
2003-08-20  8:00 ` Mike Galbraith
2003-08-20 11:21   ` Con Kolivas
2003-08-20 22:51 Voluspa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).