Subject: Re: [PATCH v4 1/4] sched/fair: Introduce primitives for CFS bandwidth burst
From: changhuaixin
Date: Sat, 20 Mar 2021 10:06:52 +0800
To: Peter Zijlstra
Cc: changhuaixin, Benjamin Segall, dietmar.eggemann@arm.com, juri.lelli@redhat.com, khlebnikov@yandex-team.ru, open list, mgorman@suse.de, mingo@redhat.com, Odin Ugedal, Odin Ugedal, pauld@redhead.com, Paul Turner, rostedt@goodmis.org, Shanpei Chen, Tejun Heo, Vincent Guittot, xiyou.wangcong@gmail.com
In-Reply-To: <2F207CE6-F849-457A-B0A6-3A8BFFE0AFFB@linux.alibaba.com>
References: <20210316044931.39733-1-changhuaixin@linux.alibaba.com> <20210316044931.39733-2-changhuaixin@linux.alibaba.com> <2F207CE6-F849-457A-B0A6-3A8BFFE0AFFB@linux.alibaba.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> On Mar 19, 2021, at 8:39 PM, changhuaixin wrote:
>
>> On Mar 18, 2021, at 11:05 PM, Peter Zijlstra wrote:
>>
>> On Thu, Mar 18, 2021 at 09:26:58AM +0800, changhuaixin wrote:
>>>> On Mar 17, 2021, at 4:06 PM, Peter Zijlstra wrote:
>>
>>>> So what is the typical avg,stdev,max and mode for the workloads where you find
>>>> you need this?
>>>>
>>>> I would really like to put a limit on the burst. IMO a workload that has
>>>> a burst many times longer than the quota is plain broken.
>>>
>>> I see. Then the problem comes down to how large the limit on burst shall be.
>>>
>>> I have sampled the CPU usage of a bursty container in 100ms periods. The statistics are:
>>
>> So CPU usage isn't exactly what is required, job execution time is what
>> you're after. Assuming there is a relation...
>>
>
> Yes, job execution time is important. To be specific, it is to improve the CPU usage of the whole
> system to reduce the total cost of ownership, while not damaging job execution time. This
> requires lower the average CPU resource of underutilized cgroups, and allowing their bursts
> at the same time.
>
>>> average : 42.2%
>>> stddev : 81.5%
>>> max : 844.5%
>>> P95 : 183.3%
>>> P99 : 437.0%
>>
>> Then your WCET is 844% of 100ms ? , which is .84s.
>>
>> But you forgot your mode; what is the most common duration, given P95 is
>> so high, I doubt that avg is representative of the most common duration.
>>
>
> It is true.
>
>>> If quota is 100000ms, burst buffer needs to be 8 times more in order
>>> for this workload not to be throttled.
>>
>> Where does that 100s come from? And an 800s burst is bizarre.
>>
>> Did you typo [us] as [ms] ?
>>
>
> Sorry, it should be 100000us.
>
>>> I can't say this is typical, but these workloads exist. On a machine
>>> running Kubernetes containers, where there is often room for such
>>> burst and the interference is hard to notice, users would prefer
>>> allowing such burst to being throttled occasionally.
>>
>> Users also want ponies. I've no idea what kubernetes actually is or what
>> it has to do with containers. That's all just word salad.
>>
>>> In this sense, I suggest limit burst buffer to 16 times of quota or
>>> around. That should be enough for users to improve tail latency caused
>>> by throttling. And users might choose a smaller one or even none, if
>>> the interference is unacceptable. What do you think?
>>
>> Well, normal RT theory would suggest you pick your runtime around 200%
>> to get that P95 and then allow a full period burst to get your P99, but
>> that same RT theory would also have you calculate the resulting
>> interference and see if that works with the rest of the system...
>>
>
> I am sorry that I don't know much about the RT theory you mentioned, and can't provide
> the desired calculation now. But I'd like to try and do some reading if that is needed.
>
>> 16 times is horrific.
>
> So can we decide on a more relative value now? Or is the interference probabilities still the
> missing piece?

A more [realistic] value, I mean.

>
> Is the paper you mentioned about called "Insensitivity results in statistical bandwidth sharing",
> or some related ones on statistical bandwidth results under some kind of fairness?
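
For reference, the arithmetic behind the numbers quoted above can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration, not anything from the patch: the helper names (cpu_time_us, burst_needed_us) are made up here, it looks at a single isolated 100ms period and ignores back-to-back bursts, and the 200000us quota is my reading of the "runtime around 200%" suggestion.

```python
PERIOD_US = 100_000  # 100ms period, as used for the sampling in this thread

# CPU usage per 100ms period, in percent of one CPU (figures quoted in the thread)
usage_pct = {"average": 42.2, "P95": 183.3, "P99": 437.0, "max": 844.5}


def cpu_time_us(pct, period_us=PERIOD_US):
    """CPU time consumed within one period at the given usage percentage."""
    return pct / 100.0 * period_us


def burst_needed_us(pct, quota_us, period_us=PERIOD_US):
    """Runtime beyond the quota needed to get through ONE period unthrottled.

    Assumption: the period is isolated, so no earlier burst has already
    drained the accumulated buffer. Hypothetical helper, not a kernel API.
    """
    return max(0.0, cpu_time_us(pct, period_us) - quota_us)


def report(quota_us):
    """Print the burst each quoted statistic would need for a given quota."""
    for name, pct in usage_pct.items():
        burst = burst_needed_us(pct, quota_us)
        print(f"quota={quota_us}us {name:>7}: used {cpu_time_us(pct):9.0f}us, "
              f"burst needed {burst:9.0f}us ({burst / quota_us:.2f}x quota)")


if __name__ == "__main__":
    report(100_000)  # the quota mentioned in the thread (100000us)
    report(200_000)  # assumed reading of "runtime around 200%"
```

Under these assumptions, the single-period maximum with a 100000us quota needs a burst of about 7.4 times the quota, which is where the "8 times" figure roughly comes from. It says nothing about the interference imposed on the rest of the system.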