Date: Thu, 27 Aug 2020 14:53:31 +0100
From: Qais Yousef
To: Greg Kroah-Hartman, "Peter Zijlstra (Intel)", Mel Gorman
Cc: linux-kernel@vger.kernel.org, stable@vger.kernel.org, Lukasz Luba, Sasha Levin
Subject: Re: [PATCH 5.8 130/232] sched/uclamp: Protect uclamp fast path code with static key
Message-ID: <20200827135330.246pbwc7h5gvdli7@e107158-lin.cambridge.arm.com>
References: <20200820091612.692383444@linuxfoundation.org> <20200820091619.114657136@linuxfoundation.org>
In-Reply-To:
 <20200820091619.114657136@linuxfoundation.org>

On 08/20/20 11:19, Greg Kroah-Hartman wrote:
> From: Qais Yousef
>
> [ Upstream commit 46609ce227039fd192e0ecc7d940bed587fd2c78 ]
>
> There is a report that when uclamp is enabled, a netperf UDP test
> regresses compared to a kernel compiled without uclamp.
>
> https://lore.kernel.org/lkml/20200529100806.GA3070@suse.de/
>
> While investigating the root cause, there were no sign that the uclamp
> code is doing anything particularly expensive but could suffer from bad
> cache behavior under certain circumstances that are yet to be
> understood.
>
> https://lore.kernel.org/lkml/20200616110824.dgkkbyapn3io6wik@e107158-lin/
>
> To reduce the pressure on the fast path anyway, add a static key that is
> by default will skip executing uclamp logic in the
> enqueue/dequeue_task() fast path until it's needed.
>
> As soon as the user start using util clamp by:
>
> 	1. Changing uclamp value of a task with sched_setattr()
> 	2. Modifying the default sysctl_sched_util_clamp_{min, max}
> 	3. Modifying the default cpu.uclamp.{min, max} value in cgroup
>
> We flip the static key now that the user has opted to use util clamp.
> Effectively re-introducing uclamp logic in the enqueue/dequeue_task()
> fast path. It stays on from that point forward until the next reboot.
>
> This should help minimize the effect of util clamp on workloads that
> don't need it but still allow distros to ship their kernels with uclamp
> compiled in by default.
>
> SCHED_WARN_ON() in uclamp_rq_dec_id() was removed since now we can end
> up with unbalanced call to uclamp_rq_dec_id() if we flip the key while
> a task is running in the rq. Since we know it is harmless we just
> quietly return if we attempt a uclamp_rq_dec_id() when
> rq->uclamp[].bucket[].tasks is 0.
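For anyone following along, here is a rough userspace sketch of the behaviour the changelog describes up to this point. It is not the kernel code: a plain bool stands in for the static key, and the names uclamp_used, enable_uclamp(), rq_inc_id(), rq_dec_id() and the bucket count are made up for illustration. It only shows the fast path skipping the accounting until the flag is flipped, and the decrement returning quietly when the bucket's tasks count is already 0.

```c
#include <assert.h>
#include <stdbool.h>

#define UCLAMP_CNT     2   /* one entry for the min clamp, one for the max */
#define UCLAMP_BUCKETS 5   /* assumed bucket count, for illustration only */

struct uclamp_bucket { unsigned int tasks; };
struct uclamp_rq     { struct uclamp_bucket bucket[UCLAMP_BUCKETS]; };
struct rq            { struct uclamp_rq uclamp[UCLAMP_CNT]; };

/* Stand-in for the static key: a plain flag, off by default. */
static bool uclamp_used;

/* Hypothetical hook for "the user started using util clamp". */
void enable_uclamp(void)
{
	uclamp_used = true;
}

void rq_inc_id(struct rq *rq, int clamp_id, int bucket_id)
{
	assert(clamp_id >= 0 && clamp_id < UCLAMP_CNT);
	assert(bucket_id >= 0 && bucket_id < UCLAMP_BUCKETS);
	rq->uclamp[clamp_id].bucket[bucket_id].tasks++;
}

void rq_dec_id(struct rq *rq, int clamp_id, int bucket_id)
{
	assert(clamp_id >= 0 && clamp_id < UCLAMP_CNT);
	assert(bucket_id >= 0 && bucket_id < UCLAMP_BUCKETS);

	struct uclamp_bucket *b = &rq->uclamp[clamp_id].bucket[bucket_id];

	/*
	 * A task enqueued before the flag was flipped was never counted,
	 * so its dequeue is unbalanced. Return quietly instead of warning.
	 */
	if (b->tasks == 0)
		return;
	b->tasks--;
}

void enqueue_task(struct rq *rq, int bucket_id)
{
	if (!uclamp_used)	/* fast path: no uclamp accounting at all */
		return;
	for (int clamp_id = 0; clamp_id < UCLAMP_CNT; clamp_id++)
		rq_inc_id(rq, clamp_id, bucket_id);
}

void dequeue_task(struct rq *rq, int bucket_id)
{
	if (!uclamp_used)
		return;
	for (int clamp_id = 0; clamp_id < UCLAMP_CNT; clamp_id++)
		rq_dec_id(rq, clamp_id, bucket_id);
}
```

In the sketch the flag is never cleared once set, mirroring the "stays on until the next reboot" behaviour. A real static key patches the branch in the instruction stream rather than testing a variable, which is where the fast-path saving comes from.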
>
> In schedutil, we introduce a new uclamp_is_enabled() helper which takes
> the static key into account to ensure RT boosting behavior is retained.
>
> The following results demonstrates how this helps on 2 Sockets Xeon E5
> 2x10-Cores system.
>
>                                 nouclamp              uclamp   uclamp-static-key
> Hmean     send-64       162.43 (  0.00%)    157.84 * -2.82%*    163.39 *  0.59%*
> Hmean     send-128      324.71 (  0.00%)    314.78 * -3.06%*    326.18 *  0.45%*
> Hmean     send-256      641.55 (  0.00%)    628.67 * -2.01%*    648.12 *  1.02%*
> Hmean     send-1024    2525.28 (  0.00%)   2448.26 * -3.05%*   2543.73 *  0.73%*
> Hmean     send-2048    4836.14 (  0.00%)   4712.08 * -2.57%*   4867.69 *  0.65%*
> Hmean     send-3312    7540.83 (  0.00%)   7425.45 * -1.53%*   7621.06 *  1.06%*
> Hmean     send-4096    9124.53 (  0.00%)   8948.82 * -1.93%*   9276.25 *  1.66%*
> Hmean     send-8192   15589.67 (  0.00%)  15486.35 * -0.66%*  15819.98 *  1.48%*
> Hmean     send-16384  26386.47 (  0.00%)  25752.25 * -2.40%*  26773.74 *  1.47%*
>
> The perf diff between nouclamp and uclamp-static-key when uclamp is
> disabled in the fast path:
>
>      8.73%     -1.55%  [kernel.kallsyms]  [k] try_to_wake_up
>      0.07%     +0.04%  [kernel.kallsyms]  [k] deactivate_task
>      0.13%     -0.02%  [kernel.kallsyms]  [k] activate_task
>
> The diff between nouclamp and uclamp-static-key when uclamp is enabled
> in the fast path:
>
>      8.73%     -0.72%  [kernel.kallsyms]  [k] try_to_wake_up
>      0.13%     +0.39%  [kernel.kallsyms]  [k] activate_task
>      0.07%     +0.38%  [kernel.kallsyms]  [k] deactivate_task
>
> Fixes: 69842cba9ace ("sched/uclamp: Add CPU's clamp buckets refcounting")
> Reported-by: Mel Gorman
> Signed-off-by: Qais Yousef
> Signed-off-by: Peter Zijlstra (Intel)
> Tested-by: Lukasz Luba
> Link: https://lkml.kernel.org/r/20200630112123.12076-3-qais.yousef@arm.com
> Signed-off-by: Sasha Levin
> ---

Greg/Peter/Mel

Should this go to 5.4 too? Not saying it should, but I don't know
whether distros would care about the potential performance hit that
this patch addresses.

Thanks

--
Qais Yousef