* [PATCH] 2.6.1 Hyperthread smart "nice"
@ 2004-01-29 8:17 Con Kolivas
2004-01-29 9:39 ` Jos Hulzink
0 siblings, 1 reply; 14+ messages in thread
From: Con Kolivas @ 2004-01-29 8:17 UTC (permalink / raw)
To: linux kernel mailing list
[-- Attachment #1: Type: text/plain, Size: 1551 bytes --]
Hi all
Not pushing this for mainline because this is not the "right way" to do it,
but it works here and now for those who have P4HT processors.
A while back we had an lkml thread about the problem of running low priority
tasks on hyperthread enabled cpus in SMP mode. Brief summary: If you run a
P4HT in uniprocessor mode and run a cpu intensive task at nice +20 (like
setiathome), the most cpu it will get during periods of heavy usage is about
8%. If you boot a P4HT in SMP mode and run a cpu intensive task at nice +20
then if you run a task even at nice -20 concurrently, the nice +20 task will
get 50% of the cpu time even though you have a very high priority task. So
ironically booting in SMP mode makes your machine slower for running
background tasks.
This patch (together with the ht base patch) will not allow two tasks whose
priorities differ by more than 10 to run concurrently on sibling cpus, instead
putting the low priority one to sleep. Overall, if you run concurrent nice 0 and nice 20 tasks
with this patch your cpu throughput will drop during heavy periods by up to
10% (the hyperthread benefit), but your nice 0 task will run about 90%
faster. It has no effect if you don't run any tasks at different "nice"
levels. It does not modify real time tasks or kernel threads, and will allow
niced tasks to run while a high priority kernel thread is running on the
sibling cpu.
http://ck.kolivas.org/patches/2.6/2.6.1/experimental/
There are other patches that go with it, which is why these apply with slight
offsets, but they should work OK.
Con
[-- Attachment #2: patch-2.6.1.O21-htbase1 --]
[-- Type: text/x-diff, Size: 902 bytes --]
--- linux-2.6.1/kernel/sched.c 2004-01-27 16:28:49.295067104 +1100
+++ linux-2.6.1-ck1/kernel/sched.c 2004-01-27 16:29:12.683511520 +1100
@@ -208,6 +208,7 @@ struct runqueue {
atomic_t *node_nr_running;
int prev_node_load[MAX_NUMNODES];
#endif
+ unsigned long cpu;
task_t *migration_thread;
struct list_head migration_queue;
@@ -221,6 +222,10 @@ static DEFINE_PER_CPU(struct runqueue, r
#define task_rq(p) cpu_rq(task_cpu(p))
#define cpu_curr(cpu) (cpu_rq(cpu)->curr)
+#define ht_active (cpu_has_ht && smp_num_siblings > 1)
+#define ht_siblings(cpu1, cpu2) (ht_active && \
+ cpu_sibling_map[(cpu1)] == (cpu2))
+
/*
* Default context-switch locking:
*/
@@ -2814,6 +2819,7 @@ void __init sched_init(void)
prio_array_t *array;
rq = cpu_rq(i);
+ rq->cpu = (unsigned long)(i);
rq->active = rq->arrays;
rq->expired = rq->arrays + 1;
rq->best_expired_prio = MAX_PRIO;
[-- Attachment #3: patch-2.6.1.httweak1-htnice1 --]
[-- Type: text/x-diff, Size: 1939 bytes --]
--- linux-2.6.1/kernel/sched.c 2004-01-27 16:34:48.582447120 +1100
+++ linux-2.6.1-ck1/kernel/sched.c 2004-01-27 16:35:02.671305288 +1100
@@ -1561,6 +1561,20 @@ need_resched:
if (!rq->nr_running) {
next = rq->idle;
rq->expired_timestamp = 0;
+#ifdef CONFIG_SMP
+ if (ht_active) {
+ /*
+ * If a HT sibling task is sleeping due to
+ * priority reasons wake it up now
+ */
+ runqueue_t *htrq;
+ htrq = cpu_rq(cpu_sibling_map[(rq->cpu)]);
+
+ if (htrq->curr == htrq->idle &&
+ htrq->nr_running)
+ resched_task(htrq->idle);
+ }
+#endif
goto switch_tasks;
}
}
@@ -1581,6 +1595,47 @@ need_resched:
queue = array->queue + idx;
next = list_entry(queue->next, task_t, run_list);
+#ifdef CONFIG_SMP
+ if (ht_active && next->mm && !rt_task(next)) {
+ runqueue_t *htrq;
+ task_t *htcurr;
+ htrq = cpu_rq(cpu_sibling_map[(rq->cpu)]);
+ htcurr = htrq->curr;
+
+ if (likely(htcurr->mm && !rt_task(htcurr))) {
+ /*
+ * If a user task with >10 dynamic +
+ * static priority difference from another
+ * running user task on the hyperthread sibling
+ * is trying to schedule, delay it to prevent a
+ * lower priority task from using an unfair
+ * proportion of the physical cpu resources.
+ */
+ if (next->prio + next->static_prio >
+ htcurr->prio + htcurr->static_prio + 10) {
+ next = rq->idle;
+ goto switch_tasks;
+ }
+
+ /*
+ * Reschedule a lower priority task
+ * on the HT sibling if present.
+ */
+ if (htcurr->prio + htcurr->static_prio >
+ next->prio + next->static_prio + 10)
+ resched_task(htcurr);
+ else
+ /*
+ * If a HT sibling task has been put to sleep
+ * previously for priority reasons wake it up
+ * now.
+ */
+ if (htcurr == htrq->idle && htrq->nr_running)
+ resched_task(htcurr);
+ }
+ }
+#endif
+
if (next->activated > 0) {
unsigned long long delta = now - next->timestamp;
* Re: [PATCH] 2.6.1 Hyperthread smart "nice"
2004-01-29 8:17 [PATCH] 2.6.1 Hyperthread smart "nice" Con Kolivas
@ 2004-01-29 9:39 ` Jos Hulzink
2004-01-29 10:28 ` Con Kolivas
2004-02-02 9:27 ` [PATCH] 2.6.1 Hyperthread smart "nice" 2 Con Kolivas
0 siblings, 2 replies; 14+ messages in thread
From: Jos Hulzink @ 2004-01-29 9:39 UTC (permalink / raw)
To: Con Kolivas, linux kernel mailing list
On Thursday 29 Jan 2004 09:17, Con Kolivas wrote:
> Hi all
>
> This patch (together with the ht base patch) will not allow a priority >10
> difference to run concurrently on both siblings, instead putting the low
> priority one to sleep. Overall if you run concurrent nice 0 and nice 20
> tasks with this patch your cpu throughput will drop during heavy periods by
> up to 10% (the hyperthread benefit), but your nice 0 task will run about
> 90% faster. It has no effect if you don't run any tasks at different "nice"
> levels. It does not modify real time tasks or kernel threads, and will
> allow niced tasks to run while a high priority kernel thread is running on
> the sibling cpu.
If I read you correctly, if one thread has nothing else to do but the nice 0
task, the nice 20 task will never be scheduled at all? Sounds like not the
perfect solution to me...
Jos
* Re: [PATCH] 2.6.1 Hyperthread smart "nice"
2004-01-29 9:39 ` Jos Hulzink
@ 2004-01-29 10:28 ` Con Kolivas
2004-01-29 10:36 ` Con Kolivas
2004-02-02 9:27 ` [PATCH] 2.6.1 Hyperthread smart "nice" 2 Con Kolivas
1 sibling, 1 reply; 14+ messages in thread
From: Con Kolivas @ 2004-01-29 10:28 UTC (permalink / raw)
To: Jos Hulzink, linux kernel mailing list
On Thu, 29 Jan 2004 20:39, Jos Hulzink wrote:
> On Thursday 29 Jan 2004 09:17, Con Kolivas wrote:
> > Hi all
> >
> > This patch (together with the ht base patch) will not allow a priority
> > >10 difference to run concurrently on both siblings, instead putting the
> > low priority one to sleep. Overall if you run concurrent nice 0 and nice
> > 20 tasks with this patch your cpu throughput will drop during heavy
> > periods by up to 10% (the hyperthread benefit), but your nice 0 task will
> > run about 90% faster. It has no effect if you don't run any tasks at
> > different "nice" levels. It does not modify real time tasks or kernel
> > threads, and will allow niced tasks to run while a high priority kernel
> > thread is running on the sibling cpu.
>
> If I read you correctly, if one thread has nothing else to do but the nice
> 0 task, the nice 20 task will never be scheduled at all ? Sounds like not
> the perfect solution to me...
Wrong... there is the matter of the other runqueue in SMP mode :)
Con
* Re: [PATCH] 2.6.1 Hyperthread smart "nice"
2004-01-29 10:28 ` Con Kolivas
@ 2004-01-29 10:36 ` Con Kolivas
0 siblings, 0 replies; 14+ messages in thread
From: Con Kolivas @ 2004-01-29 10:36 UTC (permalink / raw)
To: Jos Hulzink, linux kernel mailing list
On Thu, 29 Jan 2004 21:28, Con Kolivas wrote:
> On Thu, 29 Jan 2004 20:39, Jos Hulzink wrote:
> > On Thursday 29 Jan 2004 09:17, Con Kolivas wrote:
> > > Hi all
> > >
> > > This patch (together with the ht base patch) will not allow a priority
> > >
> > > >10 difference to run concurrently on both siblings, instead putting
> > > > the
> > >
> > > low priority one to sleep. Overall if you run concurrent nice 0 and
> > > nice 20 tasks with this patch your cpu throughput will drop during
> > > heavy periods by up to 10% (the hyperthread benefit), but your nice 0
> > > task will run about 90% faster. It has no effect if you don't run any
> > > tasks at different "nice" levels. It does not modify real time tasks or
> > > kernel threads, and will allow niced tasks to run while a high priority
> > > kernel thread is running on the sibling cpu.
> >
> > If I read you correctly, if one thread has nothing else to do but the
> > nice 0 task, the nice 20 task will never be scheduled at all ? Sounds
> > like not the perfect solution to me...
>
> Wrong.. there is the matter of the other runqueue in smp mode :)
Oops, I should have been clearer than that; I shouldn't email in a hurry. Yes,
the solution is not the right one, and yes, you can get longer periods of starvation
compared with UP mode, but if the constant bouncing and balancing of tasks
puts the low priority task on the same runqueue as the high priority one it
will get scheduled. This is why Nick's idea of unbalancing runqueues for
priority difference makes sense. However pushing and pulling tasks very
frequently may be expensive so it's hard to know how well that will work.
Con
* [PATCH] 2.6.1 Hyperthread smart "nice" 2
2004-01-29 9:39 ` Jos Hulzink
2004-01-29 10:28 ` Con Kolivas
@ 2004-02-02 9:27 ` Con Kolivas
2004-02-02 10:31 ` Ingo Molnar
1 sibling, 1 reply; 14+ messages in thread
From: Con Kolivas @ 2004-02-02 9:27 UTC (permalink / raw)
To: linux kernel mailing list; +Cc: Jos Hulzink
[-- Attachment #1: Type: text/plain, Size: 2011 bytes --]
Following on from the previous hyperthread smart nice patch:
>A while back we had an lkml thread about the problem of running low priority
>tasks on hyperthread enabled cpus in SMP mode. Brief summary: If you run a
>P4HT in uniprocessor mode and run a cpu intensive task at nice +20 (like
>setiathome), the most cpu it will get during periods of heavy usage is about
>8%. If you boot a P4HT in SMP mode and run a cpu intensive task at nice +20
>then if you run a task even at nice -20 concurrently, the nice +20 task will
>get 50% of the cpu time even though you have a very high priority task. So
>ironically booting in SMP mode makes your machine slower for running
>background tasks.
Criticism was levelled at the previous patch for the way a more "nice" task
might never run on the sibling cpu while a high priority task was running. This
patch is a much better solution.
What this one does is the following: if there is a "nice" difference between
tasks running on logical cores of the same cpu, the more "nice" one will run
for a proportion of time equal to the timeslice it would have been given
relative to the less "nice" task.
i.e. a nice 19 task running on one core and a nice 0 task running on the other
core will let the nice 0 task run continuously (102ms is the normal timeslice),
and the nice 19 task will only run for the last 10ms of the time the nice 0
task is running. This makes for a much more balanced resource distribution,
gives significant preference to the higher priority task, but allows both to
benefit from running on the two logical cores.
This seems to me a satisfactory solution to the hyperthread vs nice problem.
Once again this is too architecture-specific a change to sched.c for mainline,
but as a proof of concept I believe it works well for those who need something
usable now.
http://ck.kolivas.org/patches/2.6/2.6.1/experimental/
The stuff on my website is incremental with my other experiments, but the
attached patch applies cleanly to 2.6.1.
Con
[-- Attachment #2: patch-2.6.1-htn2 --]
[-- Type: text/x-diff, Size: 2894 bytes --]
--- linux-2.6.1-base/kernel/sched.c 2004-01-09 22:57:04.000000000 +1100
+++ linux-2.6.1-htn2/kernel/sched.c 2004-02-02 20:01:17.042394133 +1100
@@ -208,6 +208,7 @@ struct runqueue {
atomic_t *node_nr_running;
int prev_node_load[MAX_NUMNODES];
#endif
+ unsigned long cpu;
task_t *migration_thread;
struct list_head migration_queue;
@@ -221,6 +222,10 @@ static DEFINE_PER_CPU(struct runqueue, r
#define task_rq(p) cpu_rq(task_cpu(p))
#define cpu_curr(cpu) (cpu_rq(cpu)->curr)
+#define ht_active (cpu_has_ht && smp_num_siblings > 1)
+#define ht_siblings(cpu1, cpu2) (ht_active && \
+ cpu_sibling_map[(cpu1)] == (cpu2))
+
/*
* Default context-switch locking:
*/
@@ -1380,6 +1385,10 @@ void scheduler_tick(int user_ticks, int
cpustat->iowait += sys_ticks;
else
cpustat->idle += sys_ticks;
+ if (rq->nr_running) {
+ resched_task(p);
+ goto out;
+ }
rebalance_tick(rq, 1);
return;
}
@@ -1536,6 +1545,20 @@ need_resched:
if (!rq->nr_running) {
next = rq->idle;
rq->expired_timestamp = 0;
+#ifdef CONFIG_SMP
+ if (ht_active) {
+ /*
+ * If a HT sibling task is sleeping due to
+ * priority reasons wake it up now
+ */
+ runqueue_t *htrq;
+ htrq = cpu_rq(cpu_sibling_map[(rq->cpu)]);
+
+ if (htrq->curr == htrq->idle &&
+ htrq->nr_running)
+ resched_task(htrq->idle);
+ }
+#endif
goto switch_tasks;
}
}
@@ -1555,6 +1578,42 @@ need_resched:
queue = array->queue + idx;
next = list_entry(queue->next, task_t, run_list);
+#ifdef CONFIG_SMP
+ if (ht_active) {
+ runqueue_t *htrq;
+ task_t *htcurr;
+ htrq = cpu_rq(cpu_sibling_map[(rq->cpu)]);
+ htcurr = htrq->curr;
+
+ /*
+ * If a user task with lower static priority than the
+ * running task on the hyperthread sibling is trying
+ * to schedule, delay it till there is equal timeslice
+ * left of the hyperthread task to prevent a lower priority
+ * task from using an unfair proportion of the physical
+ * cpu's resources.
+ */
+ if (next->mm && htcurr->mm && !rt_task(next) &&
+ ((next->static_prio > htcurr->static_prio &&
+ htcurr->time_slice > task_timeslice(next)) ||
+ rt_task(htcurr))) {
+ next = rq->idle;
+ goto switch_tasks;
+ }
+
+ /*
+ * Reschedule a lower priority task
+ * on the HT sibling, or wake it up if it has been
+ * put to sleep for priority reasons.
+ */
+ if ((htcurr != htrq->idle &&
+ htcurr->static_prio > next->static_prio) ||
+ (rt_task(next) && !rt_task(htcurr)) ||
+ (htcurr == htrq->idle && htrq->nr_running))
+ resched_task(htcurr);
+ }
+#endif
+
if (next->activated > 0) {
unsigned long long delta = now - next->timestamp;
@@ -2809,6 +2868,7 @@ void __init sched_init(void)
prio_array_t *array;
rq = cpu_rq(i);
+ rq->cpu = (unsigned long)(i);
rq->active = rq->arrays;
rq->expired = rq->arrays + 1;
spin_lock_init(&rq->lock);
* Re: [PATCH] 2.6.1 Hyperthread smart "nice" 2
2004-02-02 9:27 ` [PATCH] 2.6.1 Hyperthread smart "nice" 2 Con Kolivas
@ 2004-02-02 10:31 ` Ingo Molnar
2004-02-03 10:52 ` Con Kolivas
0 siblings, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2004-02-02 10:31 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux kernel mailing list, Jos Hulzink
* Con Kolivas <kernel@kolivas.org> wrote:
> What this one does is the following; If there is a "nice" difference
> between tasks running on logical cores of the same cpu, the more
> "nice" one will run a proportion of time equal to the timeslice it
> would have been given relative to the less "nice" task. ie a nice 19
> task running on one core and the nice 0 task running on the other core
> will let the nice 0 task run continuously (102ms is normal timeslice)
> and the nice 19 task will only run for the last 10ms of time the nice
> 0 task is running. This makes for a much more balanced resource
> distribution, gives significant preference to the higher priority
> task, but allows them to benefit from running on both logical cores.
this is a really good rule conceptually - the higher prio task will get
at least as much raw (unshared) physical CPU slice as it would get
without HT.
Ingo
* Re: [PATCH] 2.6.1 Hyperthread smart "nice" 2
2004-02-02 10:31 ` Ingo Molnar
@ 2004-02-03 10:52 ` Con Kolivas
2004-02-03 10:58 ` Ingo Molnar
0 siblings, 1 reply; 14+ messages in thread
From: Con Kolivas @ 2004-02-03 10:52 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux kernel mailing list, Jos Hulzink
On Mon, 2 Feb 2004 21:31, Ingo Molnar wrote:
> * Con Kolivas <kernel@kolivas.org> wrote:
> > What this one does is the following; If there is a "nice" difference
> > between tasks running on logical cores of the same cpu, the more
> > "nice" one will run a proportion of time equal to the timeslice it
> > would have been given relative to the less "nice" task. ie a nice 19
> > task running on one core and the nice 0 task running on the other core
> > will let the nice 0 task run continuously (102ms is normal timeslice)
> > and the nice 19 task will only run for the last 10ms of time the nice
> > 0 task is running. This makes for a much more balanced resource
> > distribution, gives significant preference to the higher priority
> > task, but allows them to benefit from running on both logical cores.
>
> this is a really good rule conceptually - the higher prio task will get
> at least as much raw (unshared) physical CPU slice as it would get
> without HT.
Glad you agree.
From the Anandtech website, a description of the P4 Prescott (the next
generation IA32 core) with hyperthreading shows this about the new SSE3
instruction set:
"Finally we have the two thread synchronization instructions – monitor and
mwait. These two instructions work hand in hand to improve Hyper Threading
performance. The instructions work by determining whether a thread being sent
to the core is the OS’ idle thread or other non-productive threads generated
by device drivers and then instructing the core to worry about those threads
after working on whatever more useful thread it is working on at the time."
At least it appears Intel are well aware of the priority problem, but full
priority support across logical cores is not likely. However I guess these
new instructions are probably enough to work with if someone can do the
coding.
Con
* Re: [PATCH] 2.6.1 Hyperthread smart "nice" 2
2004-02-03 10:52 ` Con Kolivas
@ 2004-02-03 10:58 ` Ingo Molnar
2004-02-03 11:07 ` Con Kolivas
2004-02-03 22:59 ` Andrew Morton
0 siblings, 2 replies; 14+ messages in thread
From: Ingo Molnar @ 2004-02-03 10:58 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux kernel mailing list, Jos Hulzink
* Con Kolivas <kernel@kolivas.org> wrote:
> At least it appears Intel are well aware of the priority problem, but
> full priority support across logical cores is not likely. However I
> guess these new instructions are probably enough to work with if
> someone can do the coding.
these instructions can be used in the idle=poll code instead of rep-nop.
This way idle-wakeup can in essence be done via the memory bus, and the
idle threads won't waste CPU time. (right now idle=poll wastes lots of
cycles on HT boxes and is thus unusable.)
for lowprio tasks they are of little use, unless you modify gcc to
sprinkle mwait yields all around the 'lowprio code' - not very practical,
I think.
Ingo
* Re: [PATCH] 2.6.1 Hyperthread smart "nice" 2
2004-02-03 10:58 ` Ingo Molnar
@ 2004-02-03 11:07 ` Con Kolivas
2004-02-03 11:12 ` Ingo Molnar
2004-02-03 11:19 ` Nick Piggin
2004-02-03 22:59 ` Andrew Morton
1 sibling, 2 replies; 14+ messages in thread
From: Con Kolivas @ 2004-02-03 11:07 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux kernel mailing list, Nick Piggin
On Tue, 3 Feb 2004 21:58, Ingo Molnar wrote:
> * Con Kolivas <kernel@kolivas.org> wrote:
> > At least it appears Intel are well aware of the priority problem, but
> > full priority support across logical cores is not likely. However I
> > guess these new instructions are probably enough to work with if
> > someone can do the coding.
>
> these instructions can be used in the idle=poll code instead of rep-nop.
> This way idle-wakeup can be done via the memory bus in essence, and the
> idle threads wont waste CPU time. (right now idle=poll wastes lots of
> cycles on HT boxes and is thus unusable.)
Thanks for explaining.
> for lowprio tasks they are of little use, unless you modify gcc to
> sprinkle mwait yields all around the 'lowprio code' - not very practical
> i think.
Yuck!
Looks like the kernel is the only thing likely to be smart enough to do this
correctly for some time yet.
Nick, any chance of seeing something like this in your sched domains? (That
would be the right way, unlike my hacking sched.c directly for a specific
architecture.)
Con
* Re: [PATCH] 2.6.1 Hyperthread smart "nice" 2
2004-02-03 11:07 ` Con Kolivas
@ 2004-02-03 11:12 ` Ingo Molnar
2004-02-03 11:14 ` Con Kolivas
2004-02-03 11:19 ` Nick Piggin
1 sibling, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2004-02-03 11:12 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux kernel mailing list, Nick Piggin
* Con Kolivas <kernel@kolivas.org> wrote:
> > for lowprio tasks they are of little use, unless you modify gcc to
> > sprinkle mwait yields all around the 'lowprio code' - not very practical
> > i think.
>
> Yuck!
>
> Looks like the kernel is the only thing likely to be smart enough to
> do this correctly for some time yet.
no, there's no way for the kernel to do this 'correctly', without
further hardware help. mwait is suspending the current virtual CPU a bit
better than rep-nop did. This can be exploited for the idle loop because
the idle loop does nothing so it can execute the rep-nop. (mwait can
likely also be used for spinlocks but that is another issue.)
user-space code that is 'low-prio' cannot be slowed down via mwait,
without interleaving user-space instructions with mwait (or with
rep-nop).
this is a problem area that is not solved by mwait - giving priority to
virtual CPUs should be offered by CPUs, once the number of logical cores
increases significantly - if the interaction between those cores is
significant. (there are SMT designs where this isn't an issue.)
Ingo
* Re: [PATCH] 2.6.1 Hyperthread smart "nice" 2
2004-02-03 11:12 ` Ingo Molnar
@ 2004-02-03 11:14 ` Con Kolivas
2004-02-03 11:47 ` Ingo Molnar
0 siblings, 1 reply; 14+ messages in thread
From: Con Kolivas @ 2004-02-03 11:14 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux kernel mailing list, Nick Piggin
On Tue, 3 Feb 2004 22:12, Ingo Molnar wrote:
> * Con Kolivas <kernel@kolivas.org> wrote:
> > > for lowprio tasks they are of little use, unless you modify gcc to
> > > sprinkle mwait yields all around the 'lowprio code' - not very
> > > practical i think.
> >
> > Yuck!
> >
> > Looks like the kernel is the only thing likely to be smart enough to
> > do this correctly for some time yet.
>
> no, there's no way for the kernel to do this 'correctly', without
> further hardware help. mwait is suspending the current virtual CPU a bit
> better than rep-nop did. This can be exploited for the idle loop because
> the idle loop does nothing so it can execute the rep-nop. (mwait can
> likely also be used for spinlocks but that is another issue.)
>
> user-space code that is 'low-prio' cannot be slowed down via mwait,
> without interleaving user-space instructions with mwait (or with
> rep-nop).
>
> this is a problem area that is not solved by mwait - giving priority to
> virtual CPUs should be offered by CPUs, once the number of logical cores
> increases significantly - if the interaction between those cores is
> significant. (there are SMT designs where this isnt an issue.)
Actually I was trying to suggest something like my patch, but done correctly. I
agree that the new instructions don't help at the moment.
Con
* Re: [PATCH] 2.6.1 Hyperthread smart "nice" 2
2004-02-03 11:07 ` Con Kolivas
2004-02-03 11:12 ` Ingo Molnar
@ 2004-02-03 11:19 ` Nick Piggin
1 sibling, 0 replies; 14+ messages in thread
From: Nick Piggin @ 2004-02-03 11:19 UTC (permalink / raw)
To: Con Kolivas; +Cc: Ingo Molnar, linux kernel mailing list
Con Kolivas wrote:
>On Tue, 3 Feb 2004 21:58, Ingo Molnar wrote:
>
>>* Con Kolivas <kernel@kolivas.org> wrote:
>>
>>>At least it appears Intel are well aware of the priority problem, but
>>>full priority support across logical cores is not likely. However I
>>>guess these new instructions are probably enough to work with if
>>>someone can do the coding.
>>>
>>these instructions can be used in the idle=poll code instead of rep-nop.
>>This way idle-wakeup can be done via the memory bus in essence, and the
>>idle threads wont waste CPU time. (right now idle=poll wastes lots of
>>cycles on HT boxes and is thus unusable.)
>>
>
>Thanks for explaining.
>
>
>>for lowprio tasks they are of little use, unless you modify gcc to
>>sprinkle mwait yields all around the 'lowprio code' - not very practical
>>i think.
>>
>
>Yuck!
>
>Looks like the kernel is the only thing likely to be smart enough to do this
>correctly for some time yet.
>
>Nick, any chance of seeing something like this in your sched domains? (that
>would be the right way unlike my hacking sched.c directly for a specific
>architecture).
>
>
Yeah, it wouldn't be too difficult, Con. Basically you can add a flag to
a domain to enable some scheduling "quirk".
In this case you would add a flag to the domain which balances logical
cores in the physical CPU. You can then look up your lowest domain
with cpu_sched_domain(cpu). If the domain has the required flag set,
you can look at its ->span - which in this case would give you all
logical CPUs (siblings) on this package.
I need to actually do a bit more work and verification on the SMT
setup and make sure it plays nicely with non-ht systems, but after
that I'll probably look at this issue if someone hasn't beaten me to
it.
At the moment I've got my hands pretty full though so it might take
a while...
* Re: [PATCH] 2.6.1 Hyperthread smart "nice" 2
2004-02-03 11:14 ` Con Kolivas
@ 2004-02-03 11:47 ` Ingo Molnar
0 siblings, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2004-02-03 11:47 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux kernel mailing list, Nick Piggin
* Con Kolivas <kernel@kolivas.org> wrote:
> Actually I was trying to say something like my patch, but done
> correctly. I agree with new instructions not helping at the moment.
ok - for that the sched-domains code is the right solution.
Ingo
* Re: [PATCH] 2.6.1 Hyperthread smart "nice" 2
2004-02-03 10:58 ` Ingo Molnar
2004-02-03 11:07 ` Con Kolivas
@ 2004-02-03 22:59 ` Andrew Morton
1 sibling, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2004-02-03 22:59 UTC (permalink / raw)
To: Ingo Molnar; +Cc: kernel, linux-kernel, josh
Ingo Molnar <mingo@elte.hu> wrote:
>
> * Con Kolivas <kernel@kolivas.org> wrote:
>
> > At least it appears Intel are well aware of the priority problem, but
> > full priority support across logical cores is not likely. However I
> > guess these new instructions are probably enough to work with if
> > someone can do the coding.
>
> these instructions can be used in the idle=poll code instead of rep-nop.
> This way idle-wakeup can be done via the memory bus in essence, and the
> idle threads wont waste CPU time. (right now idle=poll wastes lots of
> cycles on HT boxes and is thus unusable.)
The code to do this was merged quite a while ago. See
arch/i386/kernel/process.c:mwait_idle().
I was hoping to see a spinlock patch using mwait(), but nothing yet...