From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vincent Guittot
Date: Wed, 3 Jun 2020 16:59:00 +0200
Subject: Re: [PATCH 1/2] sched/uclamp: Add a new sysctl to control RT default boost value
In-Reply-To: <20200603101022.GG3070@suse.de>
References: <20200511154053.7822-1-qais.yousef@arm.com>
 <20200528132327.GB706460@hirez.programming.kicks-ass.net>
 <20200528155800.yjrmx3hj72xreryh@e107158-lin.cambridge.arm.com>
 <20200528161112.GI2483@worktop.programming.kicks-ass.net>
 <20200529100806.GA3070@suse.de>
 <87v9k84knx.derkling@matbug.net>
 <20200603101022.GG3070@suse.de>
To: Mel Gorman
Cc: Patrick Bellasi, Dietmar Eggemann, Peter Zijlstra, Qais Yousef,
 Ingo Molnar, Randy Dunlap, Jonathan Corbet, Juri Lelli, Steven Rostedt,
 Ben Segall, Luis Chamberlain, Kees Cook, Iurii Zaikin, Quentin Perret,
 Valentin Schneider, Pavan Kondeti, linux-doc@vger.kernel.org,
 linux-kernel, linux-fs

On Wed, 3 Jun 2020 at 12:10, Mel Gorman wrote:
>
> On Wed, Jun 03, 2020 at 10:29:22AM +0200, Patrick Bellasi wrote:
> >
> > Hi Dietmar,
> > thanks for sharing these numbers.
> >
> > On Tue, Jun 02, 2020 at 18:46:00 +0200, Dietmar Eggemann wrote...
> >
> > [...]
> >
> > > I ran these tests on 'Ubuntu 18.04 Desktop' on Intel E5-2690 v2
> > > (2 sockets * 10 cores * 2 threads) with powersave governor as:
> > >
> > > $ numactl -N 0 ./run-mmtests.sh XXX
> >
> > Great setup, it's worth ruling out all possible noise sources (freq
> > scaling, thermal throttling, NUMA scheduler, etc.).
>
> config-network-netperf-cross-socket will do the binding of the server
> and client to two CPUs that are on one socket. However, it does not take
> care to avoid HT siblings, although that could be implemented. The same
> configuration should limit the CPU to C1. It does not change the governor,
> but all that would take is adding "cpupower frequency-set -g performance"
> to the end of the configuration.
>
> > Wondering if disabling HT can also help here in reducing results "noise"?
> >
> > > w/ config-network-netperf-unbound.
> > >
> > > Running w/o 'numactl -N 0' gives slightly worse results.
> > >
> > > without-clamp     : CONFIG_UCLAMP_TASK is not set
> > > with-clamp        : CONFIG_UCLAMP_TASK=y,
> > >                     CONFIG_UCLAMP_TASK_GROUP is not set
> > > with-clamp-tskgrp : CONFIG_UCLAMP_TASK=y,
> > >                     CONFIG_UCLAMP_TASK_GROUP=y
> > >
> > >
> > > netperf-udp
> > >                        ./5.7.0-rc7            ./5.7.0-rc7            ./5.7.0-rc7
> > >                      without-clamp             with-clamp      with-clamp-tskgrp
> >
> > Can you please specify how to read the following scores? I gave it a run
> > with my local netperf and it reports Throughput, thus I would expect the
> > higher the better... but... this seems something different.
> >
> > > Hmean     send-64         153.62 (   0.00%)      151.80 *  -1.19%*      155.60 *   1.28%*
> > > Hmean     send-128        306.77 (   0.00%)      306.27 *  -0.16%*      309.39 *   0.85%*
> > > Hmean     send-256        608.54 (   0.00%)      604.28 *  -0.70%*      613.42 *   0.80%*
> > > Hmean     send-1024      2395.80 (   0.00%)     2365.67 *  -1.26%*     2409.50 *   0.57%*
> > > Hmean     send-2048      4608.70 (   0.00%)     4544.02 *  -1.40%*     4665.96 *   1.24%*
> > > Hmean     send-3312      7223.97 (   0.00%)     7158.88 *  -0.90%*     7331.23 *   1.48%*
> > > Hmean     send-4096      8729.53 (   0.00%)     8598.78 *  -1.50%*     8860.47 *   1.50%*
> > > Hmean     send-8192     14961.77 (   0.00%)    14418.92 *  -3.63%*    14908.36 *  -0.36%*
> > > Hmean     send-16384    25799.50 (   0.00%)    25025.64 *  -3.00%*    25831.20 *   0.12%*
> >
> > If I read it as the lower the score the better, all the above results
> > tell us that with-clamp is even better, while with-clamp-tskgrp
> > is not that much worse.
> >
>
> The figures are throughput, so taking the first line:
>
> without-clamp      153.62
> with-clamp         151.80 (worse, so the percentage difference is negative)
> with-clamp-tskgrp  155.60 (better, so the percentage difference is positive)
>
> > The other way around (the higher the score the better) would look odd:
> > since we definitely add more code and complexity when uclamp has
> > the TG support enabled, we would not expect better scores.
> >
>
> Netperf for small differences is very fickle as small differences in timing
> or code layout can make a difference. Boot-to-boot variance can also be
> an issue and bisection is generally unreliable. In this case, I relied on
> the perf annotation and differences in ftrace function_graph to determine
> that uclamp was introducing enough overhead to be considered a problem.

When I want to stress the fast path, I usually use "perf bench sched pipe -T".

The tip/sched/core on my arm octo core gives the following results for 20
iterations of perf bench sched pipe -T -l 50000:

  all uclamp config disabled:  50035.4 (+/- 0.334%)
  all uclamp config enabled:   48749.8 (+/- 0.339%)   -2.64%

It's quite easy to reproduce and probably makes it easier to study the impact.

> > > Hmean     recv-64         153.62 (   0.00%)      151.80 *  -1.19%*      155.60 *   1.28%*
> > > Hmean     recv-128        306.77 (   0.00%)      306.27 *  -0.16%*      309.39 *   0.85%*
> > > Hmean     recv-256        608.54 (   0.00%)      604.28 *  -0.70%*      613.42 *   0.80%*
> > > Hmean     recv-1024      2395.80 (   0.00%)     2365.67 *  -1.26%*     2409.50 *   0.57%*
> > > Hmean     recv-2048      4608.70 (   0.00%)     4544.02 *  -1.40%*     4665.95 *   1.24%*
> > > Hmean     recv-3312      7223.97 (   0.00%)     7158.88 *  -0.90%*     7331.23 *   1.48%*
> > > Hmean     recv-4096      8729.53 (   0.00%)     8598.78 *  -1.50%*     8860.47 *   1.50%*
> > > Hmean     recv-8192     14961.61 (   0.00%)    14418.88 *  -3.63%*    14908.30 *  -0.36%*
> > > Hmean     recv-16384    25799.39 (   0.00%)    25025.49 *  -3.00%*    25831.00 *   0.12%*
> > >
> > > netperf-tcp
> > >
> > > Hmean     64              818.65 (   0.00%)      812.98 *  -0.69%*      826.17 *   0.92%*
> > > Hmean     128            1569.55 (   0.00%)     1555.79 *  -0.88%*     1586.94 *   1.11%*
> > > Hmean     256            2952.86 (   0.00%)     2915.07 *  -1.28%*     2968.15 *   0.52%*
> > > Hmean     1024          10425.91 (   0.00%)    10296.68 *  -1.24%*    10418.38 *  -0.07%*
> > > Hmean     2048          17454.51 (   0.00%)    17369.57 *  -0.49%*    17419.24 *  -0.20%*
> > > Hmean     3312          22509.95 (   0.00%)    22229.69 *  -1.25%*    22373.32 *  -0.61%*
> > > Hmean     4096          25033.23 (   0.00%)    24859.59 *  -0.69%*    24912.50 *  -0.48%*
> > > Hmean     8192          32080.51 (   0.00%)    31744.51 *  -1.05%*    31800.45 *  -0.87%*
> > > Hmean     16384         36531.86 (   0.00%)    37064.68 *   1.46%*    37397.71 *   2.37%*
> > >
> > > The diffs are smaller than on openSUSE Leap 15.1 and some of the
> > > uclamp taskgroup results are better?
> > >
> > > With this test setup we now can play with the uclamp code in
> > > enqueue_task() and dequeue_task().
> > >
> > > ---
> > >
> > > W/ config-network-netperf-unbound (only netperf-udp and buffer size 64):
> > >
> > > $ perf diff 5.7.0-rc7_without-clamp/perf.data 5.7.0-rc7_with-clamp/perf.data | grep activate_task
> > >
> > > # Event 'cycles:ppp'
> > > #
> > > # Baseline  Delta Abs  Shared Object     Symbol
> > >
> > >      0.02%     +0.54%  [kernel.vmlinux]  [k] activate_task
> > >      0.02%     +0.38%  [kernel.vmlinux]  [k] deactivate_task
> > >
> > > $ perf diff 5.7.0-rc7_without-clamp/perf.data 5.7.0-rc7_with-clamp-tskgrp/perf.data | grep activate_task
> > >
> > >      0.02%     +0.35%  [kernel.vmlinux]  [k] activate_task
> > >      0.02%     +0.34%  [kernel.vmlinux]  [k] deactivate_task
> >
> > These data make more sense to me; AFAIR we measured <1% impact in the
> > wakeup path using cyclictest.
> >
>
> 1% doesn't sound like a lot but UDP_STREAM is an example of a load with
> a *lot* of wakeups so even though the impact on each individual wakeup
> is small, it builds up.
>
> > I would also suggest always reporting the overhead of
> > __update_load_avg_cfs_rq()
> > as a reference point. We use that code quite a lot in the wakeup path
> > and it's a good proxy for relative comparisons.
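
As a side note on where that cost sits (a simplified sketch of the 5.7-era
enqueue/dequeue paths, not the literal kernel/sched/core.c code, so take the
details with a grain of salt): the uclamp bucket accounting is called
unconditionally on every enqueue and dequeue, which is why it shows up under
activate_task()/deactivate_task() in the perf diff above:

  static inline void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
  {
          /* ... clock update, sched_info, psi accounting ... */

          /* walks each clamp id and refcounts the per-rq clamp buckets */
          uclamp_rq_inc(rq, p);
          p->sched_class->enqueue_task(rq, p, flags);
  }

  static inline void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
  {
          /* ... clock update, sched_info, psi accounting ... */

          /* mirror of the above on the dequeue side */
          uclamp_rq_dec(rq, p);
          p->sched_class->dequeue_task(rq, p, flags);
  }

So every wakeup/sleep cycle pays for the bucket bookkeeping even when no task
ever sets a clamp value, which is what the netperf and sched pipe numbers
above are picking up.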
> >
> > > I still see 20 out of 90 tests with the warning message that the
> > > desired confidence was not achieved though.
> >
> > Where does the 90 come from? From the above table we run 9 sizes for
> > {udp-send, udp-recv, tcp} and 3 kernels. Should that not give us 81 results?
> >
> > Maybe the warnings are generated only when a test has to be repeated?
>
> The warning is issued when it could not get a reliable result within the
> iterations allowed.
>
> > > "
> > > !!! WARNING
> > > !!! Desired confidence was not achieved within the specified iterations.
> > > !!! This implies that there was variability in the test environment that
> > > !!! must be investigated before going further.
> > > !!! Confidence intervals: Throughput      : 6.727% <-- more than 5% !!!
> > >                           Local CPU util  : 0.000%
> > >                           Remote CPU util : 0.000%
> > > "
> > >
> > > mmtests seems to run netperf with the following '-i' and '-I' parameters
> > > hardcoded: 'netperf -t UDP_STREAM -i 3,3 -I 95,5'
> >
> > This means that we compute a score (average +-2.5%) with a 95% confidence.
> >
> > Does that not mean that every +-2.5% difference in the results
> > above should be considered in the noise?
> >
>
> Usually yes, but the impact is small enough to be within noise yet
> still detectable. Where we get hurt is when there are multiple problems
> introduced where each contributes overhead that is within the noise but,
> when all added together, there is a regression outside the noise. Uclamp is
> not special in this respect, it just happens to be the current focus. We met
> this type of problem before with PSI; that was resolved by e0c274472d5d
> ("psi: make disabling/enabling easier for vendor kernels").
>
> > I would say that it could be useful to run with more iterations
> > and, given the small numbers we are looking at (apparently we are
> > scared by a 1% overhead), we had better use a more aggressive CI.
> >
> > What about something like:
> >
> >   netperf -t UDP_STREAM -i 3,30 -I 99,1
> >
> > ?
> >
>
> You could, but the runtime of netperf will be variable, it will not be
> guaranteed to give consistent results and it may mask the true variability
> of the workload. While we could debate which is a valid approach, I
> think it makes sense to minimise the overhead of uclamp when it's not
> configured, even if that means putting it behind a static branch that is
> enabled via a command-line parameter or a Kconfig that specifies whether
> it's on or off by default.
>
> --
> Mel Gorman
> SUSE Labs
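
To make that last point concrete, a rough illustration of the static branch
idea (just an untested sketch; the key name and the enable hook are made up
for the example, not an actual patch): a jump label that is only flipped once
somebody actually requests a clamp value would reduce the unconfigured case
to a patched-out branch in the enqueue/dequeue fast path:

  /* false by default: no clamp value has been requested yet */
  DEFINE_STATIC_KEY_FALSE(sched_uclamp_used);

  static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
  {
          /* patched out at runtime, so the common case costs roughly a nop */
          if (!static_branch_unlikely(&sched_uclamp_used))
                  return;

          /* existing per-clamp-id bucket accounting would go here */
  }

  /* called from the sysctl/cgroup side the first time a clamp is set */
  static void uclamp_enable(void)
  {
          static_branch_enable(&sched_uclamp_used);
  }

The same key could instead be driven by the command-line parameter or the
Kconfig default mentioned above.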