From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751938AbcFNP2S (ORCPT );
	Tue, 14 Jun 2016 11:28:18 -0400
Received: from mail-wm0-f67.google.com ([74.125.82.67]:36695 "EHLO
	mail-wm0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751590AbcFNP2Q (ORCPT );
	Tue, 14 Jun 2016 11:28:16 -0400
From: Frederic Weisbecker
To: Peter Zijlstra
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Mike Galbraith, Thomas Gleixner
Subject: [PATCH 2/3] sched: Unloop sched avg decaying
Date: Tue, 14 Jun 2016 17:28:01 +0200
Message-Id: <1465918082-27005-3-git-send-email-fweisbec@gmail.com>
X-Mailer: git-send-email 2.7.0
In-Reply-To: <1465918082-27005-1-git-send-email-fweisbec@gmail.com>
References: <1465918082-27005-1-git-send-email-fweisbec@gmail.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

The code that decays the scheduler's rt_avg periodically (every 0.5
seconds) catches up with lost updates in a loop. This is fine when CPUs
sleep in dynticks mode for short periods of time, as they only miss a
few updates that are caught up within a few iterations. But CPUs can
sleep for arbitrarily long periods of time, leading to an unbounded
number of loop iterations to catch up. As a result, the execution time
of sched_avg_update() is unbounded and can introduce latency issues.

This patch converts the loop into a single division.

Testing under idle load has shown positive results. Below is a function
profile comparison between the upstream version (renamed
sched_avg_update_old) and the new version (sched_avg_update), with both
called alternately. The new version can be up to 2 times faster on
average.

  Function                                     Hit      Time           Avg        s^2
  --------                                     ---      ----           ---        ---
  trace_stat/function0: sched_avg_update_old   135120   47612.36 us    0.352 us   1049.606 us
  trace_stat/function0: sched_avg_update       135120   42668.44 us    0.315 us   1617.970 us
  trace_stat/function1: sched_avg_update_old   132074   46618.41 us    0.352 us   3653.566 us
  trace_stat/function1: sched_avg_update       132074   41356.76 us    0.313 us   3161.585 us
  trace_stat/function2: sched_avg_update_old   121335   43381.10 us    0.357 us   1262.240 us
  trace_stat/function2: sched_avg_update       121335   38240.72 us    0.315 us    970.400 us
  trace_stat/function3: sched_avg_update_old    37563   17640.67 us    0.469 us    148.206 us
  trace_stat/function3: sched_avg_update        37563   11059.65 us    0.294 us    126.548 us
  trace_stat/function4: sched_avg_update_old    24200   13557.05 us    0.560 us     57.124 us
  trace_stat/function4: sched_avg_update        24200    6687.281 us   0.276 us      9.528 us
  trace_stat/function5: sched_avg_update_old    28572   15158.76 us    0.530 us    674.049 us
  trace_stat/function5: sched_avg_update        28573    8012.361 us   0.280 us    181.687 us
  trace_stat/function6: sched_avg_update_old    23424   12987.64 us    0.554 us     27.639 us
  trace_stat/function6: sched_avg_update        23425    6264.433 us   0.267 us      2.965 us
  trace_stat/function7: sched_avg_update_old    22192   13083.69 us    0.589 us     25.605 us
  trace_stat/function7: sched_avg_update        22191    5947.785 us   0.268 us      1.748 us
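For illustration only (this snippet is not part of the patch), here is a
minimal user-space sketch of the conversion: shifting rt_avg right by the
number of pending periods performs one halving per period, and age_stamp
advances by that same whole number of periods. The decay_loop()/decay_div()
helpers and the sample values are made up for this example; the two variants
only diverge when the elapsed delta lands exactly on a period boundary.

/*
 * Standalone sketch, not kernel code: compare the catch-up loop with the
 * division-based variant of the same decay.
 */
#include <stdio.h>
#include <stdint.h>

static void decay_loop(uint64_t *age_stamp, uint64_t *rt_avg,
		       uint64_t now, int64_t period)
{
	/* Old scheme: halve rt_avg once per missed period. */
	while ((int64_t)(now - *age_stamp) > period) {
		*age_stamp += period;
		*rt_avg /= 2;
	}
}

static void decay_div(uint64_t *age_stamp, uint64_t *rt_avg,
		      uint64_t now, int64_t period)
{
	int64_t delta = (int64_t)(now - *age_stamp);
	int64_t pending, rem;

	if (delta <= period)
		return;

	pending = delta / period;	/* whole periods missed */
	rem = delta % period;

	*age_stamp += delta - rem;	/* i.e. pending * period */
	*rt_avg >>= pending;		/* one halving per missed period */
}

int main(void)
{
	uint64_t a1 = 0, v1 = 1 << 20, a2 = 0, v2 = 1 << 20;
	int64_t period = 500000000LL;			/* 0.5s in ns */
	uint64_t now = 7 * (uint64_t)period + 12345;	/* a bit over 7 periods */

	decay_loop(&a1, &v1, now, period);
	decay_div(&a2, &v2, now, period);

	/* Both print age_stamp=3500000000 rt_avg=8192 (2^20 >> 7). */
	printf("loop: age_stamp=%llu rt_avg=%llu\n",
	       (unsigned long long)a1, (unsigned long long)v1);
	printf("div:  age_stamp=%llu rt_avg=%llu\n",
	       (unsigned long long)a2, (unsigned long long)v2);
	return 0;
}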
Below is a snapshot of the same function profile under full busy load
(perf bench sched messaging). The old and new versions show roughly the
same results there, with a tiny 2 or 3 ns advantage for the old version,
which shouldn't matter much as the function is called every 0.5 seconds.

  Function                                     Hit      Time           Avg        s^2
  --------                                     ---      ----           ---        ---
  trace_stat/function0: sched_avg_update       106699   8029.961 us    0.075 us    0.244 us
  trace_stat/function0: sched_avg_update_old   106698   7852.948 us    0.073 us    0.313 us
  trace_stat/function1: sched_avg_update       106547   8066.833 us    0.075 us    1.256 us
  trace_stat/function1: sched_avg_update_old   106547   7794.896 us    0.073 us    1.521 us
  trace_stat/function2: sched_avg_update       106527   8049.326 us    0.075 us    1.141 us
  trace_stat/function2: sched_avg_update_old   106528   7818.155 us    0.073 us    1.052 us
  trace_stat/function3: sched_avg_update       106534   8056.079 us    0.075 us    0.342 us
  trace_stat/function3: sched_avg_update_old   106535   7815.416 us    0.073 us    0.369 us
  trace_stat/function4: sched_avg_update       106433   8090.462 us    0.076 us    5.359 us
  trace_stat/function4: sched_avg_update_old   106433   7879.694 us    0.074 us    0.433 us
  trace_stat/function5: sched_avg_update       106426   8127.800 us    0.076 us    1.304 us
  trace_stat/function5: sched_avg_update_old   106425   7854.538 us    0.073 us    2.466 us
  trace_stat/function6: sched_avg_update       106436   8067.921 us    0.075 us    0.257 us
  trace_stat/function6: sched_avg_update_old   106436   7830.492 us    0.073 us    0.334 us
  trace_stat/function7: sched_avg_update       106427   8135.786 us    0.076 us   13.181 us
  trace_stat/function7: sched_avg_update_old   106428   7940.925 us    0.074 us    0.982 us

In conclusion, the new version behaves roughly the same under busy load
but is faster and much better bounded under idle load.

Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Ingo Molnar
Signed-off-by: Frederic Weisbecker
---
 kernel/sched/core.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 385c947..0c0578a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -666,17 +666,17 @@ bool sched_can_stop_tick(struct rq *rq)
 void sched_avg_update(struct rq *rq)
 {
 	s64 period = sched_avg_period();
+	s64 delta;
+	u64 rem;
+	int pending;
 
-	while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
-		/*
-		 * Inline assembly required to prevent the compiler
-		 * optimising this loop into a divmod call.
-		 * See __iter_div_u64_rem() for another example of this.
-		 */
-		asm("" : "+rm" (rq->age_stamp));
-		rq->age_stamp += period;
-		rq->rt_avg /= 2;
-	}
+	delta = (s64)(rq_clock(rq) - rq->age_stamp);
+	if (delta <= period)
+		return;
+
+	pending = div64_u64_rem(delta, period, &rem);
+	rq->age_stamp += delta - rem;
+	rq->rt_avg >>= pending;
 }
 
 #endif /* CONFIG_SMP */
-- 
2.7.0