* HT schedulers' performance on single HT processor
From: Con Kolivas @ 2003-12-12 14:57 UTC
To: linux kernel mailing list
Cc: Nick Piggin, Ingo Molnar

I set out to find how the hyper-thread schedulers would affect the
all-important kernel compile benchmark on the machine most of us are
likely to encounter soon: the single-processor HT machine.

Usual benchmark precautions taken; best of five runs (curiously, the
fastest was almost always the second run). For confirmation I really
did this twice.

Tested a kernel compile with make vmlinux, make -j2 and make -j8.

make vmlinux     - tests to ensure the sequential single-threaded make
                   doesn't suffer as a result of these tweaks
make -j2 vmlinux - tests to see how well wasted idle time is avoided
make -j8 vmlinux - maximum throughput test (4x nr_cpus seems to be the
                   ceiling for this)

Hardware: P4 HT 3.066

Legend:
UP  - Uniprocessor 2.6.0-test11 kernel
SMP - SMP kernel
C1  - With Ingo's C1 hyperthread patch
w26 - With Nick's w26 sched-rollup (hyperthread included)

make vmlinux
        kernel time
UP      65.96
SMP     65.80
C1      66.54
w26     66.25

I was concerned this might happen, and indeed the sequential
single-threaded compile is slightly worse on both HT schedulers. (1)

make -j2 vmlinux
        kernel time
UP      65.17
SMP     57.77
C1      66.01
w26     57.94

Shows the SMP kernel nicely utilises HT whereas the UP kernel doesn't.
The C1 result was very repeatable and I was unable to get it lower than
this. (2)

make -j8 vmlinux
        kernel time
UP      65.00
SMP     57.85
C1      58.25
w26     57.94

Results are not obviously better (3), but C1 is still a little
slower. (2)

Ok, so what happened as I see it?

(1) My concern with the HT patches and single compiles was that, in an
effort to keep both logical cores busy, the next task would bounce to
the other logical core. While very cheap on HT, that is still more
expensive than staying on the same core. I can't prove that happened.

(2) We know the C1 patch has trouble booting on some hardware, so maybe
there's a bug in there affecting performance too.

(3) There is a very real performance advantage in this benchmark to
enabling SMP on a HT cpu. However, in the best case it only amounts to
11%. This means that if a specialised HT scheduler patch gained say 10%,
it would only amount to 1% overall - hardly an exciting amount. 1%
should have been on the edge of statistical significance, but I haven't
even been able to show any difference at all. This does _not_ mean there
aren't performance benefits elsewhere, but they obviously need evidence.

Conclusion?
If you run nothing but kernel compiles all day on a P4 HT, make sure
you compile it for SMP ;-)
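Con's "best of five runs" methodology can be sketched as a small timing
harness. The command and run count below are placeholders standing in
for the real build invocations, not the exact setup used above:

```python
import subprocess
import sys
import time

def best_of(cmd, runs=5):
    """Run cmd repeatedly and return the fastest wall-clock time in
    seconds -- the 'best of five runs' methodology described above."""
    times = []
    for _ in range(runs):
        start = time.monotonic()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        times.append(time.monotonic() - start)
    return min(times)

# For the real benchmark cmd would be something like
# ["make", "-j2", "vmlinux"] with the tree cleaned between runs;
# a trivial command stands in here.
fastest = best_of([sys.executable, "-c", "pass"], runs=3)
print(f"fastest of 3 runs: {fastest:.3f}s")
```

Taking the minimum rather than the mean discards runs perturbed by
unrelated system activity, which is why the fastest run (often the
second, once caches are warm) is the one reported.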
* Re: HT schedulers' performance on single HT processor
From: Nathan Fredrickson @ 2003-12-14 19:49 UTC
To: Con Kolivas
Cc: Linux Kernel Mailing List, Nick Piggin, Ingo Molnar

On Fri, 2003-12-12 at 09:57, Con Kolivas wrote:
> I set out to find how the hyper-thread schedulers would affect the all
> important kernel compile benchmark on machines that most of us are likely to
> encounter soon. The single processor HT machine.

I ran some further tests since I have access to some SMP systems with HT
(1, 2 and 4 physical processors).

Tested a kernel compile with make -jX vmlinux, where X = 1...16.
Results are the best real time out of five runs.

Hardware: Xeon HT 2GHz

Test cases:
1phys (uniproc)  - UP test11 kernel with HT disabled in the BIOS
1phys w/HT       - SMP test11 kernel on 1 physical proc with HT enabled
1phys w/HT (w26) - same as above with Nick's w26 sched-rollup patch
1phys w/HT (C1)  - same as above with Ingo's C1 patch
2phys            - SMP test11 kernel on 2 physical procs with HT disabled
2phys w/HT       - SMP test11 kernel on 2 physical procs with HT enabled
2phys w/HT (w26) - same as above with Nick's w26 sched-rollup patch
2phys w/HT (C1)  - same as above with Ingo's C1 patch

I can also run the same on four physical processors if there is
interest.

Here are some of the results. The units are time in seconds, so lower
is better.
The complete results and some graphs are available at:
http://nrf.sortof.com/kbench/test11-kbench.html

j =                   1       2       3       4       8
1phys (uniproc)  305.86  306.07  306.47  306.63  306.69
1phys w/HT       311.70  311.01  267.05  267.16  267.62
1phys w/HT (w26) 311.85  311.58  267.20  267.53  267.76
1phys w/HT (C1)  313.72  312.89  268.16  269.17  268.67
2phys            306.00  305.00  161.15  161.31  161.51
2phys w/HT       309.02  308.36  196.91  151.70  145.80
2phys w/HT (w26) 310.65  309.34  167.16  151.37  145.22
2phys w/HT (C1)  310.86  307.90  162.05  152.16  145.82

Same table as above, normalized to the j=1 uniproc case to make
comparisons easier. Lower is still better.

j =                 1     2     3     4     8
1phys (uniproc)  1.00  1.00  1.00  1.00  1.00
1phys w/HT       1.02  1.02  0.87  0.87  0.87
1phys w/HT (w26) 1.02  1.02  0.87  0.87  0.88
1phys w/HT (C1)  1.03  1.02  0.88  0.88  0.88
2phys            1.00  1.00  0.53  0.53  0.53
2phys w/HT       1.01  1.01  0.64  0.50  0.48
2phys w/HT (w26) 1.02  1.01  0.55  0.49  0.47
2phys w/HT (C1)  1.02  1.01  0.53  0.50  0.48

Con Kolivas wrote:
> I was concerned this might happen and indeed the sequential single threaded
> compile is slightly worse on both HT schedulers. (1)

My test showed the same (assuming -j1 is the same as omitting the
option). The slowdown of the -j1 case with HT is 1-3%.

There was not much benefit from either HT or SMP with j=2. Maximum
speedup was not realized until j=3 for one physical processor and j=5
for 2 physical processors. This suggests that j should be set to at
least the number of logical processors + 1.

> (3) There is a very real performance advantage in this benchmark to enabling
> SMP on a HT cpu. However, in the best case it only amounts to 11%. This means
> that if a specialised HT scheduler patch gained say 10% it would only amount
> to 1% overall - hardly an exciting amount.

Agree, there is certainly an advantage to using HT as long as there are
enough runnable processes (j>=3). Running additional processes in
parallel (j=16) does not increase performance any further, nor does it
decrease it.
My best case speedup amounts to 15%, which is right in the middle of
the 10-20% range that Intel talks about.

> Conclusion?
> If you run nothing but kernel compiles all day on a P4 HT, make sure you
> compile it for SMP ;-)

And make sure you compile with the -jX option with X >= logical_procs+1.

Nathan
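The normalization Nathan describes (dividing every time by the j=1
uniprocessor baseline) is easy to reproduce from the raw table; the
figures below are copied from his results:

```python
# Reproduce the normalized table from the raw times: divide each result
# by the j=1 uniprocessor baseline and round to two decimal places.
baseline = 305.86  # 1phys (uniproc) at j=1

raw = {
    "1phys w/HT": [311.70, 311.01, 267.05, 267.16, 267.62],
    "2phys":      [306.00, 305.00, 161.15, 161.31, 161.51],
}

normalized = {cfg: [round(t / baseline, 2) for t in times]
              for cfg, times in raw.items()}

for cfg, row in normalized.items():
    print(cfg, row)
# 2phys comes out as [1.0, 1.0, 0.53, 0.53, 0.53], matching the table.
```

Values below 1.0 mean faster than the uniprocessor baseline; the 2phys
row dropping to 0.53 only at j=3 is the anomaly Adam picks up on in the
next message.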
* Re: HT schedulers' performance on single HT processor
From: Adam Kropelin @ 2003-12-14 20:35 UTC
To: Nathan Fredrickson
Cc: Con Kolivas, Linux Kernel Mailing List, Nick Piggin, Ingo Molnar, sam

On Sun, Dec 14, 2003 at 02:49:24PM -0500, Nathan Fredrickson wrote:
> Same table as above normalized to the j=1 uniproc case to make
> comparisons easier. Lower is still better.
>
> j =                 1     2     3     4     8
> 1phys (uniproc)  1.00  1.00  1.00  1.00  1.00
> 1phys w/HT       1.02  1.02  0.87  0.87  0.87
> 1phys w/HT (w26) 1.02  1.02  0.87  0.87  0.88
> 1phys w/HT (C1)  1.03  1.02  0.88  0.88  0.88
> 2phys            1.00  1.00  0.53  0.53  0.53
                         ^^^^^ ^^^^
Ummm...

> 2phys w/HT       1.01  1.01  0.64  0.50  0.48
> 2phys w/HT (w26) 1.02  1.01  0.55  0.49  0.47
> 2phys w/HT (C1)  1.02  1.01  0.53  0.50  0.48
>
> There was not much benefit from either HT or SMP with j=2. Maximum
> speedup was not realized until j=3 for one physical processor and j=5
> for 2 physical processors.

This is mighty suspicious. With -j2 did you check to see that there
were indeed two parallel gcc's running? Since -test6 I've found that
-j2 only results in a single gcc instance. I've seen this on both an
old hacked-up RH 7.3 installation and a brand new RH 9 + updates
installation.

> This suggests that j should be set to at least the number of logical
> processors + 1.

Since -test6 I've found this to be the case for kernel builds, yes. But
I don't think it has anything to do with the scheduler or HT vs SMP
platforms.

--Adam
* Re: HT schedulers' performance on single HT processor
From: Nathan Fredrickson @ 2003-12-14 21:15 UTC
To: Adam Kropelin
Cc: Con Kolivas, Linux Kernel Mailing List, Nick Piggin, Ingo Molnar, sam

On Sun, 2003-12-14 at 15:35, Adam Kropelin wrote:
> On Sun, Dec 14, 2003 at 02:49:24PM -0500, Nathan Fredrickson wrote:
> > Same table as above normalized to the j=1 uniproc case to make
> > comparisons easier. Lower is still better.
> >
> > j =                 1     2     3     4     8
> > 1phys (uniproc)  1.00  1.00  1.00  1.00  1.00
> > 1phys w/HT       1.02  1.02  0.87  0.87  0.87
> > 1phys w/HT (w26) 1.02  1.02  0.87  0.87  0.88
> > 1phys w/HT (C1)  1.03  1.02  0.88  0.88  0.88
> > 2phys            1.00  1.00  0.53  0.53  0.53
>                          ^^^^^ ^^^^
>
> Ummm...
>
> This is mighty suspicious. With -j2 did you check to see that there
> were indeed two parallel gcc's running? Since -test6 I've found that
> -j2 only results in a single gcc instance. I've seen this on both an
> old hacked-up RH 7.3 installation and a brand new RH 9 + updates
> installation.

I just checked and you're right: the number of compilers that actually
run is j-1, for all j>1. I assume this is a problem with the parallel
build process, but it does not invalidate these results for comparing
the scheduler performance with different patches.

> > This suggests that j should be set to at least the number of logical
> > processors + 1.
>
> Since -test6 I've found this to be the case for kernel builds, yes. But
> I don't think it has anything to do with the scheduler or HT vs SMP
> platforms.

The 1-3% performance loss when HT is enabled for -j1 is still very real.

Nathan
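The check Nathan did -- whether make -jX really spawns X compilers --
can be done by sampling the process table in the middle of a build.
This Linux-only sketch scans /proc directly rather than shelling out
to ps; the process name "cc1" is the usual gcc compiler proper:

```python
import os

def count_procs(name):
    """Count running processes whose comm equals `name` by scanning
    /proc (Linux-specific). Sampling this during a 'make -jX vmlinux'
    shows how many compiler instances are actually running."""
    count = 0
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/comm") as f:
                if f.read().strip() == name:
                    count += 1
        except OSError:
            continue  # process exited between listdir() and open()
    return count

# During 'make -j2 vmlinux' one would hope count_procs("cc1") == 2;
# the observation above is that only j-1 compilers actually run.
print(count_procs("cc1"))
```

A single sample can miss compilers between jobs, so in practice one
would poll this in a loop and take the maximum over the build.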
* Re: HT schedulers' performance on single HT processor
From: Con Kolivas @ 2003-12-15 10:11 UTC
To: Nathan Fredrickson
Cc: Linux Kernel Mailing List, Nick Piggin, Ingo Molnar, Adam Kropelin

On Mon, 15 Dec 2003 06:49, Nathan Fredrickson wrote:
> On Fri, 2003-12-12 at 09:57, Con Kolivas wrote:
> > I set out to find how the hyper-thread schedulers would affect the all
> > important kernel compile benchmark on machines that most of us are likely
> > to encounter soon. The single processor HT machine.
>
> I ran some further tests since I have access to some SMP systems with HT
> (1, 2 and 4 physical processors).
>
> I can also run the same on four physical processors if there is
> interest.
>
> j =                 1     2     3     4     8
> 1phys (uniproc)  1.00  1.00  1.00  1.00  1.00
> 1phys w/HT       1.02  1.02  0.87  0.87  0.87
> 1phys w/HT (w26) 1.02  1.02  0.87  0.87  0.88
> 1phys w/HT (C1)  1.03  1.02  0.88  0.88  0.88
> 2phys            1.00  1.00  0.53  0.53  0.53
> 2phys w/HT       1.01  1.01  0.64  0.50  0.48
> 2phys w/HT (w26) 1.02  1.01  0.55  0.49  0.47
> 2phys w/HT (C1)  1.02  1.01  0.53  0.50  0.48

The specific HT scheduler benefits only start appearing with more
physical cpus, which is to be expected. Just for demonstration the four
processor run would be nice (and would obviously take you less time to
do ;). I think it will demonstrate it even more. It would be nice to
help the most common case of one HT cpu, though, instead of hindering
it.

Adam already pointed out that your -j2 didn't really get you 2 jobs. I
was using a 2.4 kernel tree for the benchmarks and j2 was giving me two
jobs, although perhaps something about the C1 patch was preventing the
second job from ever taking off, which is why the result is the same as
one job in my benches. Curious.

> > Conclusion?
> > If you run nothing but kernel compiles all day on a P4 HT, make sure you
> > compile it for SMP ;-)
>
> And make sure you compile with the -jX option with X >= logical_procs+1

Of course. For now, on the uniprocessor HT setup I'd recommend the
unmodified scheduler in SMP mode.

Con
* Re: HT schedulers' performance on single HT processor
From: Nathan Fredrickson @ 2003-12-16 0:16 UTC
To: Con Kolivas
Cc: Linux Kernel Mailing List, Nick Piggin, Ingo Molnar, Adam Kropelin

On Mon, 2003-12-15 at 05:11, Con Kolivas wrote:
> On Mon, 15 Dec 2003 06:49, Nathan Fredrickson wrote:
> > I can also run the same on four physical processors if there is
> > interest.
>
> The specific HT scheduler benefits only start appearing with more physical
> cpus which is to be expected. Just for demonstration the four processor run
> would be nice (and obviously take you less time to do ;). I think it will
> demonstrate it even more. It would be nice to help the most common case of
> one HT cpu, though, instead of hindering it.

Here are some results on four physical processors. Unfortunately my
quad systems are a different speed than the dual systems used for the
previous tests, so the results are not directly comparable.

Same test as before, a 2.6.0 kernel compile with make -jX vmlinux.
Results are the best real time out of five runs.

Hardware: Xeon HT 1.4GHz

Test cases:
1phys UP       - UP test11 kernel with HT disabled in the BIOS
4phys SMP      - SMP test11 kernel on 4 physical procs with HT disabled
4phys HT       - SMP test11 kernel on 4 physical procs with HT enabled
4phys HT (w26) - same as above with Nick's w26 sched-rollup patch
4phys HT (C1)  - same as above with Ingo's C1 patch

Here are the results normalized to the X=1 UP case to make comparisons
easier. Lower is better.
X =             1     2     3     4     5     6     7     8     9    16
1phys UP     1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
4phys SMP    1.00  0.99  0.51  0.35  0.27  0.27  0.27  0.27  0.27  0.27
4phys HT     1.01  1.00  0.55  0.40  0.33  0.29  0.27  0.26  0.25  0.26
4phys HT(w26) 1.01  1.01  0.54  0.37  0.31  0.27  0.26  0.26  0.26  0.26
4phys HT(C1)  1.01  1.00  0.52  0.36  0.29  0.28  0.27  0.26  0.25  0.26

Interesting that the overhead due to HT in the X=1 column is only 1%
with 4 physical processors. It was 1-3% before with 1 or 2 physical
processors.

In the partial-load columns, where there are fewer compiler processes
than logical CPUs (X=3,4,5,6,7), it appears that both patches are doing
a better job of scheduling than the standard scheduler. At full load
(X>=8) all three HT test cases perform about equally and beat standard
SMP by 1-2%.

Hope these results are helpful. I'd be happy to run more cases and/or
other patches.

Nathan
* Re: HT schedulers' performance on single HT processor
From: Con Kolivas @ 2003-12-16 0:55 UTC
To: Nathan Fredrickson
Cc: Linux Kernel Mailing List

Quoting Nathan Fredrickson <8nrf@qlink.queensu.ca>:

> X =             1     2     3     4     5     6     7     8     9    16
> 1phys UP     1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
> 4phys SMP    1.00  0.99  0.51  0.35  0.27  0.27  0.27  0.27  0.27  0.27
> 4phys HT     1.01  1.00  0.55  0.40  0.33  0.29  0.27  0.26  0.25  0.26
> 4phys HT(w26) 1.01  1.01  0.54  0.37  0.31  0.27  0.26  0.26  0.26  0.26
> 4phys HT(C1)  1.01  1.00  0.52  0.36  0.29  0.28  0.27  0.26  0.25  0.26
>
> Interesting that the overhead due to HT in the X=1 column is only 1%
> with 4 physical processors. It was 1-3% before with 1 or 2 physical
> processors.
>
> In the partial-load columns, where there are fewer compiler processes
> than logical CPUs (X=3,4,5,6,7), it appears that both patches are doing
> a better job of scheduling than the standard scheduler. At full load
> (X>=8) all three HT test cases perform about equally and beat standard
> SMP by 1-2%.
>
> Hope these results are helpful. I'd be happy to run more cases and/or
> other patches.

(cc list stripped)

Well, since you asked... I've been looking for someone with more HT
cpus to give a much simpler approach a try. Here's a sample patch for
vanilla test11 with HT. This one actually helps UP HT performance ever
so slightly, and I'd be curious to see if it does anything on more
cpus.
Con

[-- Attachment #2: patch-test11-ht-3 --]
[-- Type: application/octet-stream, Size: 2612 bytes --]

--- linux-2.6.0-test11-base/kernel/sched.c	2003-11-24 22:18:56.000000000 +1100
+++ linux-2.6.0-test11-ht3/kernel/sched.c	2003-12-15 23:38:33.250059542 +1100
@@ -204,6 +204,7 @@ struct runqueue {
 	struct mm_struct *prev_mm;
 	prio_array_t *active, *expired, arrays[2];
 	int prev_cpu_load[NR_CPUS];
+	unsigned long cpu;
 #ifdef CONFIG_NUMA
 	atomic_t *node_nr_running;
 	int prev_node_load[MAX_NUMNODES];
@@ -221,6 +222,10 @@ static DEFINE_PER_CPU(struct runqueue, r
 #define task_rq(p)		cpu_rq(task_cpu(p))
 #define cpu_curr(cpu)		(cpu_rq(cpu)->curr)
 
+#define ht_active		(cpu_has_ht && smp_num_siblings > 1)
+#define ht_siblings(cpu1, cpu2)	(ht_active && \
+	cpu_sibling_map[(cpu1)] == (cpu2))
+
 /*
  * Default context-switch locking:
  */
@@ -1157,8 +1162,9 @@ can_migrate_task(task_t *tsk, runqueue_t
 {
 	unsigned long delta = sched_clock() - tsk->timestamp;
 
-	if (!idle && (delta <= JIFFIES_TO_NS(cache_decay_ticks)))
-		return 0;
+	if (!idle && (delta <= JIFFIES_TO_NS(cache_decay_ticks)) &&
+		!ht_siblings(this_cpu, task_cpu(tsk)))
+			return 0;
 	if (task_running(rq, tsk))
 		return 0;
 	if (!cpu_isset(this_cpu, tsk->cpus_allowed))
@@ -1193,15 +1199,23 @@ static void load_balance(runqueue_t *thi
 	imbalance /= 2;
 
 	/*
+	 * For hyperthread siblings take tasks from the active array
+	 * to get cache-warm tasks since they share caches.
+	 */
+	if (ht_siblings(this_cpu, busiest->cpu))
+		array = busiest->active;
+	/*
 	 * We first consider expired tasks. Those will likely not be
 	 * executed in the near future, and they are most likely to
 	 * be cache-cold, thus switching CPUs has the least effect
 	 * on them.
 	 */
-	if (busiest->expired->nr_active)
-		array = busiest->expired;
-	else
-		array = busiest->active;
+	else {
+		if (busiest->expired->nr_active)
+			array = busiest->expired;
+		else
+			array = busiest->active;
+	}
 
 new_array:
 	/* Start searching at priority 0: */
@@ -1212,9 +1226,16 @@ skip_bitmap:
 	else
 		idx = find_next_bit(array->bitmap, MAX_PRIO, idx);
 	if (idx >= MAX_PRIO) {
-		if (array == busiest->expired) {
-			array = busiest->active;
-			goto new_array;
+		if (ht_siblings(this_cpu, busiest->cpu)){
+			if (array == busiest->active) {
+				array = busiest->expired;
+				goto new_array;
+			}
+		} else {
+			if (array == busiest->expired) {
+				array = busiest->active;
+				goto new_array;
+			}
 		}
 		goto out_unlock;
 	}
@@ -2812,6 +2833,7 @@ void __init sched_init(void)
 		prio_array_t *array;
 
 		rq = cpu_rq(i);
+		rq->cpu = (unsigned long)(i);
 		rq->active = rq->arrays;
 		rq->expired = rq->arrays + 1;
 		spin_lock_init(&rq->lock);
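The policy change in the patch above is small: when the stealing CPU is
an HT sibling of the busiest runqueue's CPU, load_balance() searches the
*active* (cache-warm) array first, since siblings share a cache; all
other CPUs keep preferring the *expired* (cache-cold) array. That
selection logic can be captured as a behavioural sketch (not kernel
code, and the function name is mine):

```python
def pick_array(sibling, active_count, expired_count):
    """Which priority array load_balance() searches first under the
    ht-3 patch: HT siblings steal cache-warm tasks from the active
    array because they share a cache; other CPUs prefer the expired
    array whose tasks are likely cache-cold anyway. Either way the
    other array is the fallback when the preferred one is empty."""
    if sibling:
        return "active" if active_count else "expired"
    return "expired" if expired_count else "active"

print(pick_array(True, 3, 5))   # sibling -> active
print(pick_array(False, 3, 5))  # ordinary CPU -> expired
```

The fallback mirrors the goto new_array path in the patch: if the
preferred array turns up no migratable task, the search restarts on the
other array before giving up.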
* Re: HT schedulers' performance on single HT processor
From: Nathan Fredrickson @ 2003-12-16 3:57 UTC
To: Con Kolivas
Cc: Linux Kernel Mailing List

On Mon, 2003-12-15 at 19:55, Con Kolivas wrote:
> Well since you asked... I've been looking for someone with more HT cpus
> to give a much simpler approach a try. Here's a sample patch for vanilla
> test11 with HT. This one actually helps UP HT performance ever so
> slightly and I'd be curious to see if it does anything on more cpus.

Not much change with this patch. The new result is most similar to
vanilla test11 with HT. Both perform worse than no-HT under partial
load. Here are the results from earlier with the new test case
appended:

X =             1     2     3     4     5     6     7     8     9    16
1phys UP     1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
4phys SMP    1.00  0.99  0.51  0.35  0.27  0.27  0.27  0.27  0.27  0.27
4phys HT     1.01  1.00  0.55  0.40  0.33  0.29  0.27  0.26  0.25  0.26
4phys HT(w26) 1.01  1.01  0.54  0.37  0.31  0.27  0.26  0.26  0.26  0.26
4phys HT(C1)  1.01  1.00  0.52  0.36  0.29  0.28  0.27  0.26  0.25  0.26
4phys HT(ht3) 1.01  1.00  0.53  0.39  0.33  0.29  0.27  0.26  0.26  0.26

Nathan
* Re: HT schedulers' performance on single HT processor
From: Bill Davidsen @ 2004-01-03 17:56 UTC
To: Con Kolivas
Cc: linux kernel mailing list, Nick Piggin, Ingo Molnar

Con Kolivas wrote:
> I set out to find how the hyper-thread schedulers would affect the all
> important kernel compile benchmark on machines that most of us are likely to
> encounter soon. The single processor HT machine.
>
> Usual benchmark precautions taken; best of five runs (curiously the fastest
> was almost always the second run). Although for confirmation I really did
> this twice.
>
> Tested a kernel compile with make vmlinux, make -j2 and make -j8.
>
> make vmlinux - tests to ensure the sequential single threaded make doesn't
> suffer as a result of these tweaks
>
> make -j2 vmlinux - tests to see how well wasted idle time is avoided
>
> make -j8 vmlinux - maximum throughput test (4x nr_cpus seems to be ceiling
> for this).
>
> Hardware: P4 HT 3.066
>
> Legend:
> UP - Uniprocessor 2.6.0-test11 kernel
> SMP - SMP kernel
> C1 - With Ingo's C1 hyperthread patch
> w26 - With Nick's w26 sched-rollup (hyperthread included)
>
> make vmlinux
>         kernel time
> UP      65.96
> SMP     65.80
> C1      66.54
> w26     66.25
>
> I was concerned this might happen and indeed the sequential single threaded
> compile is slightly worse on both HT schedulers. (1)
>
> make -j2 vmlinux
>         kernel time
> UP      65.17
> SMP     57.77
> C1      66.01
> w26     57.94
>
> Shows the smp kernel nicely utilises HT whereas the UP kernel doesn't. The
> C1 result was very repeatable and I was unable to get it lower than this.(2)
>
> make -j8 vmlinux
>         kernel time
> UP      65.00
> SMP     57.85
> C1      58.25
> w26     57.94

If you could make one more test, do the compile with -pipe set in the
top level Makefile.
I don't have play access to a HT uni; the only machines available to me
at the moment are SMP, and production at that. I did try it just for
grins on a non-HT uni and saw this:

opt        real    user   sys   idle
-j1       406.2   308.1  19.0   79.1
-j1 -pipe 398.6   308.2  19.0   71.4
-j3       391.6   308.3  19.0   64.3
-j3 -pipe 388.7   308.4  19.0   61.3

P4 2.4GHz, 256MB, compiling 2.5.47-ac6 with just "make." Using -pipe
*may* allow both siblings to cooperate better.

I assume that CPU affinity should apply to all siblings in a package?

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
Doing interesting things with small computers since 1979