All of lore.kernel.org
* [PATCH] 2.6.1 Hyperthread smart "nice"
@ 2004-01-29  8:17 Con Kolivas
  2004-01-29  9:39 ` Jos Hulzink
  0 siblings, 1 reply; 14+ messages in thread
From: Con Kolivas @ 2004-01-29  8:17 UTC (permalink / raw)
  To: linux kernel mailing list

[-- Attachment #1: Type: text/plain, Size: 1551 bytes --]

Hi all

Not pushing this for mainline because this is not the "right way" to do it, 
but it works here and now for those who have P4HT processors. 

A while back we had an lkml thread about the problem of running low priority 
tasks on hyperthread enabled cpus in SMP mode. Brief summary: if you run a 
P4HT in uniprocessor mode and run a cpu intensive task at nice +20 (like 
setiathome), the most cpu it will get during periods of heavy usage is about 
8%. If you boot a P4HT in SMP mode and run a cpu intensive task at nice +20 
concurrently with a task even at nice -20, the nice +20 task will still get 
50% of the cpu time despite the presence of a very high priority task. So, 
ironically, booting in SMP mode makes your machine slower for running 
background tasks.

This patch (together with the ht base patch) will not allow two tasks whose 
priorities differ by more than 10 to run concurrently on the two siblings, 
instead putting the low priority one to sleep. Overall, if you run concurrent 
nice 0 and nice +20 tasks with this patch, your cpu throughput during heavy 
periods will drop by up to 10% (the hyperthread benefit), but your nice 0 
task will run about 90% faster. It has no effect if you don't run tasks at 
different "nice" levels. It does not modify real time tasks or kernel 
threads, and it will allow niced tasks to run while a high priority kernel 
thread is running on the sibling cpu.

http://ck.kolivas.org/patches/2.6/2.6.1/experimental/
There are other patches that go with it, which is why these hunks apply with 
slight offsets, but they should work ok.

Con

[-- Attachment #2: patch-2.6.1.O21-htbase1 --]
[-- Type: text/x-diff, Size: 902 bytes --]

--- linux-2.6.1/kernel/sched.c	2004-01-27 16:28:49.295067104 +1100
+++ linux-2.6.1-ck1/kernel/sched.c	2004-01-27 16:29:12.683511520 +1100
@@ -208,6 +208,7 @@ struct runqueue {
 	atomic_t *node_nr_running;
 	int prev_node_load[MAX_NUMNODES];
 #endif
+	unsigned long cpu;
 	task_t *migration_thread;
 	struct list_head migration_queue;
 
@@ -221,6 +222,10 @@ static DEFINE_PER_CPU(struct runqueue, r
 #define task_rq(p)		cpu_rq(task_cpu(p))
 #define cpu_curr(cpu)		(cpu_rq(cpu)->curr)
 
+#define ht_active		(cpu_has_ht && smp_num_siblings > 1)
+#define ht_siblings(cpu1, cpu2)	(ht_active && \
+	cpu_sibling_map[(cpu1)] == (cpu2))
+
 /*
  * Default context-switch locking:
  */
@@ -2814,6 +2819,7 @@ void __init sched_init(void)
 		prio_array_t *array;
 
 		rq = cpu_rq(i);
+		rq->cpu = (unsigned long)(i);
 		rq->active = rq->arrays;
 		rq->expired = rq->arrays + 1;
 		rq->best_expired_prio = MAX_PRIO;

[-- Attachment #3: patch-2.6.1.httweak1-htnice1 --]
[-- Type: text/x-diff, Size: 1939 bytes --]

--- linux-2.6.1/kernel/sched.c	2004-01-27 16:34:48.582447120 +1100
+++ linux-2.6.1-ck1/kernel/sched.c	2004-01-27 16:35:02.671305288 +1100
@@ -1561,6 +1561,20 @@ need_resched:
 		if (!rq->nr_running) {
 			next = rq->idle;
 			rq->expired_timestamp = 0;
+#ifdef CONFIG_SMP
+			if (ht_active) {
+				/*
+				 * If a HT sibling task is sleeping due to
+				 * priority reasons wake it up now
+				 */
+				runqueue_t *htrq;
+				htrq = cpu_rq(cpu_sibling_map[(rq->cpu)]);
+
+				if (htrq->curr == htrq->idle &&
+					htrq->nr_running)
+						resched_task(htrq->idle);
+			}
+#endif
 			goto switch_tasks;
 		}
 	}
@@ -1581,6 +1595,47 @@ need_resched:
 	queue = array->queue + idx;
 	next = list_entry(queue->next, task_t, run_list);
 
+#ifdef CONFIG_SMP
+	if (ht_active && next->mm && !rt_task(next)) {
+		runqueue_t *htrq;
+		htrq = cpu_rq(cpu_sibling_map[(rq->cpu)]);
+		task_t *htcurr;
+		htcurr = htrq->curr;
+
+		if (likely(htcurr->mm && !rt_task(htcurr))){
+			/*
+			 * If a user task with >10 dynamic +
+			 * static priority difference from another
+			 * running user task on the hyperthread sibling
+			 * is trying to schedule, delay it to prevent a
+			 * lower priority task from using an unfair
+			 * proportion of the physical cpu resources.
+			 */
+			if (next->prio + next->static_prio >
+				htcurr->prio + htcurr->static_prio + 10) {
+					next = rq->idle;
+					goto switch_tasks;
+			}
+
+			/*
+			 * Reschedule a lower priority task
+			 * on the HT sibling if present.
+			 */
+			if (htcurr->prio + htcurr->static_prio >
+				next->prio + next->static_prio + 10)
+					resched_task(htcurr);
+			else
+				/*
+				 * If a HT sibling task has been put to sleep
+				 * previously for priority reasons wake it up
+				 * now.
+				 */
+				if (htcurr == htrq->idle && htrq->nr_running)
+					resched_task(htcurr);
+		}
+	}
+#endif
+
 	if (next->activated > 0) {
 		unsigned long long delta = now - next->timestamp;
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] 2.6.1 Hyperthread smart "nice"
  2004-01-29  8:17 [PATCH] 2.6.1 Hyperthread smart "nice" Con Kolivas
@ 2004-01-29  9:39 ` Jos Hulzink
  2004-01-29 10:28   ` Con Kolivas
  2004-02-02  9:27   ` [PATCH] 2.6.1 Hyperthread smart "nice" 2 Con Kolivas
  0 siblings, 2 replies; 14+ messages in thread
From: Jos Hulzink @ 2004-01-29  9:39 UTC (permalink / raw)
  To: Con Kolivas, linux kernel mailing list

On Thursday 29 Jan 2004 09:17, Con Kolivas wrote:
> Hi all
>
> This patch (together with the ht base patch) will not allow a priority >10
> difference to run concurrently on both siblings, instead putting the low
> priority one to sleep. Overall if you run concurrent nice 0 and nice 20
> tasks with this patch your cpu throughput will drop during heavy periods by
> up to 10% (the hyperthread benefit), but your nice 0 task will run about
> 90% faster. It has no effect if you don't run any tasks at different "nice"
> levels. It does not modify real time tasks or kernel threads, and will
> allow niced tasks to run while a high priority kernel thread is running on
> the sibling cpu.

If I read you correctly: if one sibling has nothing else to do but the nice 0 
task, the nice +20 task will never be scheduled at all? That doesn't sound 
like the perfect solution to me...

Jos



* Re: [PATCH] 2.6.1 Hyperthread smart "nice"
  2004-01-29  9:39 ` Jos Hulzink
@ 2004-01-29 10:28   ` Con Kolivas
  2004-01-29 10:36     ` Con Kolivas
  2004-02-02  9:27   ` [PATCH] 2.6.1 Hyperthread smart "nice" 2 Con Kolivas
  1 sibling, 1 reply; 14+ messages in thread
From: Con Kolivas @ 2004-01-29 10:28 UTC (permalink / raw)
  To: Jos Hulzink, linux kernel mailing list

On Thu, 29 Jan 2004 20:39, Jos Hulzink wrote:
> On Thursday 29 Jan 2004 09:17, Con Kolivas wrote:
> > Hi all
> >
> > This patch (together with the ht base patch) will not allow a priority
> > >10 difference to run concurrently on both siblings, instead putting the
> > low priority one to sleep. Overall if you run concurrent nice 0 and nice
> > 20 tasks with this patch your cpu throughput will drop during heavy
> > periods by up to 10% (the hyperthread benefit), but your nice 0 task will
> > run about 90% faster. It has no effect if you don't run any tasks at
> > different "nice" levels. It does not modify real time tasks or kernel
> > threads, and will allow niced tasks to run while a high priority kernel
> > thread is running on the sibling cpu.
>
> If I read you correctly, if one thread has nothing else to do but the nice
> 0 task, the nice 20 task will never be scheduled at all ? Sounds like not
> the perfect solution to me...

Wrong.. there is the matter of the other runqueue in smp mode :)

Con



* Re: [PATCH] 2.6.1 Hyperthread smart "nice"
  2004-01-29 10:28   ` Con Kolivas
@ 2004-01-29 10:36     ` Con Kolivas
  0 siblings, 0 replies; 14+ messages in thread
From: Con Kolivas @ 2004-01-29 10:36 UTC (permalink / raw)
  To: Jos Hulzink, linux kernel mailing list

On Thu, 29 Jan 2004 21:28, Con Kolivas wrote:
> On Thu, 29 Jan 2004 20:39, Jos Hulzink wrote:
> > On Thursday 29 Jan 2004 09:17, Con Kolivas wrote:
> > > Hi all
> > >
> > > This patch (together with the ht base patch) will not allow a priority
> > >
> > > >10 difference to run concurrently on both siblings, instead putting
> > > > the
> > >
> > > low priority one to sleep. Overall if you run concurrent nice 0 and
> > > nice 20 tasks with this patch your cpu throughput will drop during
> > > heavy periods by up to 10% (the hyperthread benefit), but your nice 0
> > > task will run about 90% faster. It has no effect if you don't run any
> > > tasks at different "nice" levels. It does not modify real time tasks or
> > > kernel threads, and will allow niced tasks to run while a high priority
> > > kernel thread is running on the sibling cpu.
> >
> > If I read you correctly, if one thread has nothing else to do but the
> > nice 0 task, the nice 20 task will never be scheduled at all ? Sounds
> > like not the perfect solution to me...
>
> Wrong.. there is the matter of the other runqueue in smp mode :)

Oops, I should have been clearer than that; I shouldn't email in a hurry. Yes, 
the solution is not the right one, and yes, you can get longer periods of 
starvation compared with UP mode. But if the constant bouncing and balancing 
of tasks puts the low priority task on the same runqueue as the high priority 
one, it will get scheduled. This is why Nick's idea of unbalancing runqueues 
by priority difference makes sense. However, pushing and pulling tasks very 
frequently may be expensive, so it's hard to know how well that will work.

Con



* [PATCH] 2.6.1 Hyperthread smart "nice" 2
  2004-01-29  9:39 ` Jos Hulzink
  2004-01-29 10:28   ` Con Kolivas
@ 2004-02-02  9:27   ` Con Kolivas
  2004-02-02 10:31     ` Ingo Molnar
  1 sibling, 1 reply; 14+ messages in thread
From: Con Kolivas @ 2004-02-02  9:27 UTC (permalink / raw)
  To: linux kernel mailing list; +Cc: Jos Hulzink

[-- Attachment #1: Type: text/plain, Size: 2011 bytes --]

Following on from the previous hyperthread smart nice patch;

>A while back we had an lkml thread about the problem of running low priority 
>tasks on hyperthread enabled cpus in SMP mode. Brief summary: If you run a 
>P4HT in uniprocessor mode and run a cpu intensive task at nice +20 (like 
>setiathome), the most cpu it will get during periods of heavy usage is about 
>8%. If you boot a P4HT in SMP mode and run a cpu intensive task at nice +20 
>then if you run a task even at nice -20 concurrently, the nice +20 task will 
>get 50% of the cpu time even though you have a very high priority task. So 
>ironically booting in SMP mode makes your machine slower for running 
>background tasks.

Criticism was levelled at the previous patch because a more "nice" task might 
never run on the sibling cpu while a high priority task was running. This 
patch is a much better solution.

What this one does is the following: if there is a "nice" difference between 
tasks running on logical cores of the same cpu, the more "nice" one will run 
for a proportion of time equal to the timeslice it would have been given 
relative to the less "nice" task. 
I.e. with a nice 19 task running on one core and a nice 0 task running on the 
other core, the nice 0 task runs continuously (102ms is the normal nice 0 
timeslice) and the nice 19 task runs only for the last 10ms of the time the 
nice 0 task is running. This makes for a much more balanced resource 
distribution, gives significant preference to the higher priority task, but 
allows both to benefit from running on both logical cores.

This seems to me a satisfactory solution to the hyperthread vs nice problem. 
Once again this is too arch-specific a change to sched.c for mainline, but as 
proof of concept I believe it works well for those who need something usable 
now.

http://ck.kolivas.org/patches/2.6/2.6.1/experimental/

The stuff on my website is incremental with my other experiments, but the 
attached patch applies cleanly to 2.6.1

Con

[-- Attachment #2: patch-2.6.1-htn2 --]
[-- Type: text/x-diff, Size: 2894 bytes --]

--- linux-2.6.1-base/kernel/sched.c	2004-01-09 22:57:04.000000000 +1100
+++ linux-2.6.1-htn2/kernel/sched.c	2004-02-02 20:01:17.042394133 +1100
@@ -208,6 +208,7 @@ struct runqueue {
 	atomic_t *node_nr_running;
 	int prev_node_load[MAX_NUMNODES];
 #endif
+	unsigned long cpu;
 	task_t *migration_thread;
 	struct list_head migration_queue;
 
@@ -221,6 +222,10 @@ static DEFINE_PER_CPU(struct runqueue, r
 #define task_rq(p)		cpu_rq(task_cpu(p))
 #define cpu_curr(cpu)		(cpu_rq(cpu)->curr)
 
+#define ht_active		(cpu_has_ht && smp_num_siblings > 1)
+#define ht_siblings(cpu1, cpu2)	(ht_active && \
+	cpu_sibling_map[(cpu1)] == (cpu2))
+
 /*
  * Default context-switch locking:
  */
@@ -1380,6 +1385,10 @@ void scheduler_tick(int user_ticks, int 
 			cpustat->iowait += sys_ticks;
 		else
 			cpustat->idle += sys_ticks;
+		if (rq->nr_running) {
+			resched_task(p);
+			goto out;
+		}
 		rebalance_tick(rq, 1);
 		return;
 	}
@@ -1536,6 +1545,20 @@ need_resched:
 		if (!rq->nr_running) {
 			next = rq->idle;
 			rq->expired_timestamp = 0;
+#ifdef CONFIG_SMP
+			if (ht_active) {
+				/*
+				 * If a HT sibling task is sleeping due to
+				 * priority reasons wake it up now
+				 */
+				runqueue_t *htrq;
+				htrq = cpu_rq(cpu_sibling_map[(rq->cpu)]);
+
+				if (htrq->curr == htrq->idle &&
+					htrq->nr_running)
+						resched_task(htrq->idle);
+			}
+#endif
 			goto switch_tasks;
 		}
 	}
@@ -1555,6 +1578,42 @@ need_resched:
 	queue = array->queue + idx;
 	next = list_entry(queue->next, task_t, run_list);
 
+#ifdef CONFIG_SMP
+	if (ht_active) {
+		runqueue_t *htrq;
+		htrq = cpu_rq(cpu_sibling_map[(rq->cpu)]);
+		task_t *htcurr;
+		htcurr = htrq->curr;
+
+		/*
+		 * If a user task with lower static priority than the
+		 * running task on the hyperthread sibling is trying
+		 * to schedule, delay it till there is equal timeslice
+		 * left of the hyperthread task to prevent a lower priority
+		 * task from using an unfair proportion of the physical
+		 * cpu's resources.
+		 */
+		if (next->mm && htcurr->mm && !rt_task(next) && 
+			((next->static_prio > 
+			htcurr->static_prio && htcurr->time_slice > 
+			task_timeslice(next)) || rt_task(htcurr))) {
+				next = rq->idle;
+				goto switch_tasks;
+		}
+		
+		/*
+		 * Reschedule a lower priority task
+		 * on the HT sibling, or wake it up if it has been
+		 * put to sleep for priority reasons.
+		 */
+		if ((htcurr != htrq->idle && 
+			htcurr->static_prio > next->static_prio) ||
+			(rt_task(next) && !rt_task(htcurr)) ||
+			(htcurr == htrq->idle && htrq->nr_running))
+				resched_task(htcurr);
+	}
+#endif
+
 	if (next->activated > 0) {
 		unsigned long long delta = now - next->timestamp;
 
@@ -2809,6 +2868,7 @@ void __init sched_init(void)
 		prio_array_t *array;
 
 		rq = cpu_rq(i);
+		rq->cpu = (unsigned long)(i);
 		rq->active = rq->arrays;
 		rq->expired = rq->arrays + 1;
 		spin_lock_init(&rq->lock);


* Re: [PATCH] 2.6.1 Hyperthread smart "nice" 2
  2004-02-02  9:27   ` [PATCH] 2.6.1 Hyperthread smart "nice" 2 Con Kolivas
@ 2004-02-02 10:31     ` Ingo Molnar
  2004-02-03 10:52       ` Con Kolivas
  0 siblings, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2004-02-02 10:31 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux kernel mailing list, Jos Hulzink


* Con Kolivas <kernel@kolivas.org> wrote:

> What this one does is the following; If there is a "nice" difference
> between tasks running on logical cores of the same cpu, the more
> "nice" one will run a proportion of time equal to the timeslice it
> would have been given relative to the less "nice" task.  ie a nice 19
> task running on one core and the nice 0 task running on the other core
> will let the nice 0 task run continuously (102ms is normal timeslice)
> and the nice 19 task will only run for the last 10ms of time the nice
> 0 task is running. This makes for a much more balanced resource
> distribution, gives significant preference to the higher priority
> task, but allows them to benefit from running on both logical cores.

this is a really good rule conceptually - the higher prio task will get
at least as much raw (unshared) physical CPU slice as it would get
without HT.

	Ingo


* Re: [PATCH] 2.6.1 Hyperthread smart "nice" 2
  2004-02-02 10:31     ` Ingo Molnar
@ 2004-02-03 10:52       ` Con Kolivas
  2004-02-03 10:58         ` Ingo Molnar
  0 siblings, 1 reply; 14+ messages in thread
From: Con Kolivas @ 2004-02-03 10:52 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux kernel mailing list, Jos Hulzink

On Mon, 2 Feb 2004 21:31, Ingo Molnar wrote:
> * Con Kolivas <kernel@kolivas.org> wrote:
> > What this one does is the following; If there is a "nice" difference
> > between tasks running on logical cores of the same cpu, the more
> > "nice" one will run a proportion of time equal to the timeslice it
> > would have been given relative to the less "nice" task.  ie a nice 19
> > task running on one core and the nice 0 task running on the other core
> > will let the nice 0 task run continuously (102ms is normal timeslice)
> > and the nice 19 task will only run for the last 10ms of time the nice
> > 0 task is running. This makes for a much more balanced resource
> > distribution, gives significant preference to the higher priority
> > task, but allows them to benefit from running on both logical cores.
>
> this is a really good rule conceptually - the higher prio task will get
> at least as much raw (unshared) physical CPU slice as it would get
> without HT.

Glad you agree.

From the Anandtech website, a description of the P4 Prescott (next generation 
IA32) with hyperthreading notes this about the new SSE3 instruction set:

"Finally we have the two thread synchronization instructions – monitor and 
mwait. These two instructions work hand in hand to improve Hyper Threading 
performance. The instructions work by determining whether a thread being sent 
to the core is the OS’ idle thread or other non-productive threads generated 
by device drivers and then instructing the core to worry about those threads 
after working on whatever more useful thread it is working on at the time."

At least it appears Intel are well aware of the priority problem, but full 
priority support across logical cores is not likely. However I guess these 
new instructions are probably enough to work with if someone can do the 
coding.

Con


* Re: [PATCH] 2.6.1 Hyperthread smart "nice" 2
  2004-02-03 10:52       ` Con Kolivas
@ 2004-02-03 10:58         ` Ingo Molnar
  2004-02-03 11:07           ` Con Kolivas
  2004-02-03 22:59           ` Andrew Morton
  0 siblings, 2 replies; 14+ messages in thread
From: Ingo Molnar @ 2004-02-03 10:58 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux kernel mailing list, Jos Hulzink


* Con Kolivas <kernel@kolivas.org> wrote:

> At least it appears Intel are well aware of the priority problem, but
> full priority support across logical cores is not likely. However I
> guess these new instructions are probably enough to work with if
> someone can do the coding.

these instructions can be used in the idle=poll code instead of rep-nop. 
This way idle-wakeup can be done via the memory bus in essence, and the
idle threads won't waste CPU time. (right now idle=poll wastes lots of
cycles on HT boxes and is thus unusable.)

for lowprio tasks they are of little use, unless you modify gcc to
sprinkle mwait yields all around the 'lowprio code' - not very practical
i think.

	Ingo


* Re: [PATCH] 2.6.1 Hyperthread smart "nice" 2
  2004-02-03 10:58         ` Ingo Molnar
@ 2004-02-03 11:07           ` Con Kolivas
  2004-02-03 11:12             ` Ingo Molnar
  2004-02-03 11:19             ` Nick Piggin
  2004-02-03 22:59           ` Andrew Morton
  1 sibling, 2 replies; 14+ messages in thread
From: Con Kolivas @ 2004-02-03 11:07 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux kernel mailing list, Nick Piggin

On Tue, 3 Feb 2004 21:58, Ingo Molnar wrote:
> * Con Kolivas <kernel@kolivas.org> wrote:
> > At least it appears Intel are well aware of the priority problem, but
> > full priority support across logical cores is not likely. However I
> > guess these new instructions are probably enough to work with if
> > someone can do the coding.
>
> these instructions can be used in the idle=poll code instead of rep-nop.
> This way idle-wakeup can be done via the memory bus in essence, and the
> idle threads wont waste CPU time. (right now idle=poll wastes lots of
> cycles on HT boxes and is thus unusable.)

Thanks for explaining.

> for lowprio tasks they are of little use, unless you modify gcc to
> sprinkle mwait yields all around the 'lowprio code' - not very practical
> i think.

Yuck!

Looks like the kernel is the only thing likely to be smart enough to do this 
correctly for some time yet. 

Nick, any chance of seeing something like this in your sched domains? (that 
would be the right way unlike my hacking sched.c directly for a specific 
architecture).

Con


* Re: [PATCH] 2.6.1 Hyperthread smart "nice" 2
  2004-02-03 11:07           ` Con Kolivas
@ 2004-02-03 11:12             ` Ingo Molnar
  2004-02-03 11:14               ` Con Kolivas
  2004-02-03 11:19             ` Nick Piggin
  1 sibling, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2004-02-03 11:12 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux kernel mailing list, Nick Piggin


* Con Kolivas <kernel@kolivas.org> wrote:

> > for lowprio tasks they are of little use, unless you modify gcc to
> > sprinkle mwait yields all around the 'lowprio code' - not very practical
> > i think.
> 
> Yuck!
> 
> Looks like the kernel is the only thing likely to be smart enough to
> do this correctly for some time yet. 

no, there's no way for the kernel to do this 'correctly', without
further hardware help. mwait is suspending the current virtual CPU a bit
better than rep-nop did. This can be exploited for the idle loop because
the idle loop does nothing so it can execute the rep-nop. (mwait can
likely also be used for spinlocks but that is another issue.)

user-space code that is 'low-prio' cannot be slowed down via mwait,
without interleaving user-space instructions with mwait (or with
rep-nop).

this is a problem area that is not solved by mwait - giving priority to
virtual CPUs should be offered by CPUs, once the number of logical cores
increases significantly - if the interaction between those cores is
significant. (there are SMT designs where this isn't an issue.)

	Ingo


* Re: [PATCH] 2.6.1 Hyperthread smart "nice" 2
  2004-02-03 11:12             ` Ingo Molnar
@ 2004-02-03 11:14               ` Con Kolivas
  2004-02-03 11:47                 ` Ingo Molnar
  0 siblings, 1 reply; 14+ messages in thread
From: Con Kolivas @ 2004-02-03 11:14 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux kernel mailing list, Nick Piggin

On Tue, 3 Feb 2004 22:12, Ingo Molnar wrote:
> * Con Kolivas <kernel@kolivas.org> wrote:
> > > for lowprio tasks they are of little use, unless you modify gcc to
> > > sprinkle mwait yields all around the 'lowprio code' - not very
> > > practical i think.
> >
> > Yuck!
> >
> > Looks like the kernel is the only thing likely to be smart enough to
> > do this correctly for some time yet.
>
> no, there's no way for the kernel to do this 'correctly', without
> further hardware help. mwait is suspending the current virtual CPU a bit
> better than rep-nop did. This can be exploited for the idle loop because
> the idle loop does nothing so it can execute the rep-nop. (mwait can
> likely also be used for spinlocks but that is another issue.)
>
> user-space code that is 'low-prio' cannot be slowed down via mwait,
> without interleaving user-space instructions with mwait (or with
> rep-nop).
>
> this is a problem area that is not solved by mwait - giving priority to
> virtual CPUs should be offered by CPUs, once the number of logical cores
> increases significantly - if the interaction between those cores is
> significant. (there are SMT designs where this isnt an issue.)

Actually I was trying to say something like my patch, but done correctly. I 
agree with new instructions not helping at the moment.

Con


* Re: [PATCH] 2.6.1 Hyperthread smart "nice" 2
  2004-02-03 11:07           ` Con Kolivas
  2004-02-03 11:12             ` Ingo Molnar
@ 2004-02-03 11:19             ` Nick Piggin
  1 sibling, 0 replies; 14+ messages in thread
From: Nick Piggin @ 2004-02-03 11:19 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Ingo Molnar, linux kernel mailing list



Con Kolivas wrote:

>On Tue, 3 Feb 2004 21:58, Ingo Molnar wrote:
>
>>* Con Kolivas <kernel@kolivas.org> wrote:
>>
>>>At least it appears Intel are well aware of the priority problem, but
>>>full priority support across logical cores is not likely. However I
>>>guess these new instructions are probably enough to work with if
>>>someone can do the coding.
>>>
>>these instructions can be used in the idle=poll code instead of rep-nop.
>>This way idle-wakeup can be done via the memory bus in essence, and the
>>idle threads wont waste CPU time. (right now idle=poll wastes lots of
>>cycles on HT boxes and is thus unusable.)
>>
>
>Thanks for explaining.
>
>
>>for lowprio tasks they are of little use, unless you modify gcc to
>>sprinkle mwait yields all around the 'lowprio code' - not very practical
>>i think.
>>
>
>Yuck!
>
>Looks like the kernel is the only thing likely to be smart enough to do this 
>correctly for some time yet. 
>
>Nick, any chance of seeing something like this in your sched domains? (that 
>would be the right way unlike my hacking sched.c directly for a specific 
>architecture).
>
>

Yeah, it wouldn't be too difficult, Con. Basically you can add a flag to
a domain to enable some scheduling "quirk".

In this case you would add a flag to the domain which balances logical
cores in the physical CPU. You can then look up your lowest domain
with cpu_sched_domain(cpu). If the domain has the required flag set,
you can look at its ->span - which in this case would give you all
logical CPUs (siblings) on this package.

I need to actually do a bit more work and verification on the SMT
setup and make sure it plays nicely with non-ht systems, but after
that I'll probably look at this issue if someone hasn't beaten me to
it.

At the moment I've got my hands pretty full though so it might take
a while...



* Re: [PATCH] 2.6.1 Hyperthread smart "nice" 2
  2004-02-03 11:14               ` Con Kolivas
@ 2004-02-03 11:47                 ` Ingo Molnar
  0 siblings, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2004-02-03 11:47 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux kernel mailing list, Nick Piggin


* Con Kolivas <kernel@kolivas.org> wrote:

> Actually I was trying to say something like my patch, but done
> correctly. I agree with new instructions not helping at the moment.

ok - for that the sched-domains code is the right solution.

	Ingo


* Re: [PATCH] 2.6.1 Hyperthread smart "nice" 2
  2004-02-03 10:58         ` Ingo Molnar
  2004-02-03 11:07           ` Con Kolivas
@ 2004-02-03 22:59           ` Andrew Morton
  1 sibling, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2004-02-03 22:59 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: kernel, linux-kernel, josh

Ingo Molnar <mingo@elte.hu> wrote:
>
> * Con Kolivas <kernel@kolivas.org> wrote:
> 
> > At least it appears Intel are well aware of the priority problem, but
> > full priority support across logical cores is not likely. However I
> > guess these new instructions are probably enough to work with if
> > someone can do the coding.
> 
> these instructions can be used in the idle=poll code instead of rep-nop. 
> This way idle-wakeup can be done via the memory bus in essence, and the
> idle threads wont waste CPU time. (right now idle=poll wastes lots of
> cycles on HT boxes and is thus unusable.)

The code to do this was merged quite a while ago.  See
arch/i386/kernel/process.c:mwait_idle().

I was hoping to see a spinlock patch using mwait(), but nothing yet..
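The function Andrew mentions was roughly of this shape. This is 
kernel-context pseudocode reconstructed from memory of the 2.6-era tree, not 
a verbatim quote, and it will not run outside the kernel:

```c
/* Rough sketch of arch/i386/kernel/process.c:mwait_idle().
 * monitor arms a wakeup on writes to the thread flags word (where
 * TIF_NEED_RESCHED lives); mwait then halts the logical CPU until
 * such a write or an interrupt arrives, so the idle sibling stops
 * consuming execution resources. */
static void mwait_idle(void)
{
	local_irq_enable();

	while (!need_resched()) {
		__monitor((void *)&current_thread_info()->flags, 0, 0);
		smp_mb();
		if (need_resched())
			break;
		__mwait(0, 0);
	}
}
```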


