* Mathieu Desnoyers wrote:

> (on a uniprocessor 2.0 GHz Pentium M)
>
> * Without the patch:
>
> - wakeup-latency with SIGEV_THREAD in parallel with youtube video and
>   make -j10
>
> maximum latency: 50107.8 µs
> average latency: 6609.2 µs
> missed timer events: 0

I tried your patches on a similar UP system, using wakeup-latency.c. I
also measured the vanilla upstream kernel (cced86a) with the default
granularity settings, and vanilla with a sched_min_granularity/3 tune
(patch attached below for that).

I got the following results (make -j10 kbuild load, average of 3 runs):

  vanilla:

    maximum latency: 38278.9 µs
    average latency:  7730.1 µs

  mathieu-dyn:

    maximum latency: 28698.8 µs
    average latency:  7757.1 µs

  peterz-min_gran/3:

    maximum latency: 22702.1 µs
    average latency:  6684.8 µs

A couple of notes:

 - As can be seen from the raw results further below, the max-latency
   numbers from wakeup-latency.c were very noisy with all 3 kernels.
   (This is typical of most maximum latency metrics.) But it can be said
   that within statistical noise both your patches and peterz's patch
   reduced maximum latencies - as expected.

 - Average latency seems to have gone down a bit more via the min_gran/3
   patch. Your patch produced a faster-than-vanilla result in one of the
   runs - but the numbers are too noisy in general.

 - ( Measurement methodology: find below the raw results of the 3 runs
     pasted, and find attached the kernel config I used. (I applied your
     second patch with a trivial conflict resolved.) For measurement I
     used your scheduling latency benchmark:

       http://www.efficios.com/pub/elc2010/wakeup-latency-0.1.tar.bz2
   )

In general, your patches have indeed produced a max-latency improvement -
and so has the simpler min_gran/3 patch. So as Peter has suggested in his
review, much of the same latency improvement can be had from the implicit
/3 tune your patches do to min-granularity.

So perhaps it would be better to investigate/measure your series by
making the min_gran/3 patch below your patch #1 - and thus your other
changes (the nr_running dependency) could be evaluated relative to that.

I.e. please re-phrase your series as: "what else does it give us beyond
tuning down the minimum granularity to 33% of its current value?"

Your patches might have more merit than these numbers alone show - here
I tried to limit my measurements to the ones you yourself used. Maybe
your approach can handle the granularity tradeoffs better in some other
workload, etc.

Thanks,

	Ingo

---------------------------->

vanilla (cced86a):

maximum latency: 46980.9 µs
average latency: 7696.9 µs
missed timer events: 0

maximum latency: 35636.3 µs
average latency: 7736.6 µs
missed timer events: 0

maximum latency: 32219.6 µs
average latency: 7757.0 µs

mathieu-dyn (cced86a+the-2-patches-in-this-thread):

maximum latency: 33999.4 µs
average latency: 9410.9 µs
missed timer events: 0

maximum latency: 26125.7 µs
average latency: 7083.2 µs
missed timer events: 0

maximum latency: 25971.5 µs
average latency: 6777.3 µs
missed timer events: 0

peterz-min_gran/3 (cced86a+the-patch-attached-below):

maximum latency: 22366.3 µs
average latency: 7163.5 µs
missed timer events: 0

maximum latency: 15166.4 µs
average latency: 6788.6 µs
missed timer events: 0

maximum latency: 30573.8 µs
average latency: 6102.5 µs

---
 kernel/sched_fair.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -54,13 +54,13 @@ enum sched_tunable_scaling sysctl_sched_
  * Minimal preemption granularity for CPU-bound tasks:
  * (default: 2 msec * (1 + ilog(ncpus)), units: nanoseconds)
  */
-unsigned int sysctl_sched_min_granularity = 2000000ULL;
-unsigned int normalized_sysctl_sched_min_granularity = 2000000ULL;
+unsigned int sysctl_sched_min_granularity = 750000ULL;
+unsigned int normalized_sysctl_sched_min_granularity = 750000ULL;
 
 /*
  * is kept at sysctl_sched_latency / sysctl_sched_min_granularity
  */
-static unsigned int sched_nr_latency = 3;
+static unsigned int sched_nr_latency = 8;
 
 /*
  * After fork, child runs first. If set to 0 (default) then