From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2B29BC7618B for ; Wed, 15 Mar 2023 15:27:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232370AbjCOP1L (ORCPT ); Wed, 15 Mar 2023 11:27:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34260 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232262AbjCOP1J (ORCPT ); Wed, 15 Mar 2023 11:27:09 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D674F22C9F for ; Wed, 15 Mar 2023 08:27:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=K5ZrtxAQhAg4taBIjn5Ggqft0vr/AgFW0+J9EI0A4Iw=; b=pr2EXFWcfmlInLFnmMlP9TMUSh mFPaikx4ZU2hoTYIBSvcDksRp+NzoLTJNzmhIdUkRyzSGZEcc4+C4JrAxVfn86ARELhqWdbQ2Z6TP ATM7eiYiuvm0YUuXxWGaHGL7CyDHMb7X0yMBVYmNKewESvfX7PZapme8+GulcCwgO+kQyVpyTJ3lF T7dWy9Bv4yRCIE+5nRy6TN8xPnaDC284scJeNmWQbVfly+l1OFB3WCNopAQc+AF2/GW/fOimrQ8Wa sih2vP/PTJPDtahcUTz+zDLs5uHASs3f1MvJXKSBUmgfiqr8eWjlgNlC+Bl3tn6OoTCdGxq/pEpJM RHYfSnQg==; Received: from j130084.upc-j.chello.nl ([24.132.130.84] helo=noisy.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1pcT0n-00DwgJ-U1; Wed, 15 Mar 2023 15:25:58 +0000 Received: from hirez.programming.kicks-ass.net (hirez.programming.kicks-ass.net [192.168.1.225]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (Client did not present a certificate) by noisy.programming.kicks-ass.net (Postfix) with ESMTPS id 25023300237; Wed, 15 Mar 2023 16:25:53 +0100 (CET) Received: by hirez.programming.kicks-ass.net (Postfix, from userid 1000) id 0D42B20FF20AC; Wed, 15 Mar 2023 16:25:53 +0100 (CET) Date: Wed, 15 Mar 2023 16:25:52 +0100 From: Peter Zijlstra To: Chen Yu Cc: Vincent Guittot , Ingo Molnar , Juri Lelli , Mel Gorman , Tim Chen , Dietmar Eggemann , Steven Rostedt , Ben Segall , K Prateek Nayak , Abel Wu , Yicong Yang , "Gautham R . Shenoy" , Honglei Wang , Len Brown , Chen Yu , Tianchen Ding , Joel Fernandes , Josh Don , Hillf Danton , kernel test robot , Arjan Van De Ven , linux-kernel@vger.kernel.org Subject: Re: [PATCH v6 2/2] sched/fair: Introduce SIS_SHORT to wake up short task on current CPU Message-ID: <20230315152552.GF2006103@hirez.programming.kicks-ass.net> References: <373e6886e274f198608fa1b5f1c254e32b43845d.1677069490.git.yu.c.chen@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <373e6886e274f198608fa1b5f1c254e32b43845d.1677069490.git.yu.c.chen@intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Feb 22, 2023 at 10:09:55PM +0800, Chen Yu wrote: > will-it-scale > ============= > case load baseline compare% > context_switch1 224 groups 1.00 +946.68% > > There is a huge improvement in fast context switch test case, especially > when the number of groups equals the CPUs. > > netperf > ======= > case load baseline(std%) compare%( std%) > TCP_RR 56-threads 1.00 ( 1.12) -0.05 ( 0.97) > TCP_RR 112-threads 1.00 ( 0.50) +0.31 ( 0.35) > TCP_RR 168-threads 1.00 ( 3.46) +5.50 ( 2.08) > TCP_RR 224-threads 1.00 ( 2.52) +665.38 ( 3.38) > TCP_RR 280-threads 1.00 ( 38.59) +22.12 ( 11.36) > TCP_RR 336-threads 1.00 ( 15.88) -0.00 ( 19.96) > TCP_RR 392-threads 1.00 ( 27.22) +0.26 ( 24.26) > TCP_RR 448-threads 1.00 ( 37.88) +0.04 ( 27.87) > UDP_RR 56-threads 1.00 ( 2.39) -0.36 ( 8.33) > UDP_RR 112-threads 1.00 ( 22.62) -0.65 ( 24.66) > UDP_RR 168-threads 1.00 ( 15.72) +3.97 ( 5.02) > UDP_RR 224-threads 1.00 ( 15.90) +134.98 ( 28.59) > UDP_RR 280-threads 1.00 ( 32.43) +0.26 ( 29.68) > UDP_RR 336-threads 1.00 ( 39.21) -0.05 ( 39.71) > UDP_RR 392-threads 1.00 ( 31.76) -0.22 ( 32.00) > UDP_RR 448-threads 1.00 ( 44.90) +0.06 ( 31.83) > > There is significant 600+% improvement for TCP_RR and 100+% for UDP_RR > when the number of threads equals the CPUs. > > tbench > ====== > case load baseline(std%) compare%( std%) > loopback 56-threads 1.00 ( 0.15) +0.88 ( 0.08) > loopback 112-threads 1.00 ( 0.06) -0.41 ( 0.52) > loopback 168-threads 1.00 ( 0.17) +45.42 ( 39.54) > loopback 224-threads 1.00 ( 36.93) +24.10 ( 0.06) > loopback 280-threads 1.00 ( 0.04) -0.04 ( 0.04) > loopback 336-threads 1.00 ( 0.06) -0.16 ( 0.14) > loopback 392-threads 1.00 ( 0.05) +0.06 ( 0.02) > loopback 448-threads 1.00 ( 0.07) -0.02 ( 0.07) > > There is no noticeable impact on tbench. Although there is run-to-run variance > in 168/224 threads case, with or without this patch applied. So there is a very narrow, but significant, win at 4x overload. What about 3x/5x overload, they only have very marginal gains. So these patches are briliant if you run at exactly 4x overload, and very meh otherwise. Why do we care about 4x overload?