From: "Doug Smythies"
To: "'Alexander Monakov'"
Cc: "'Linux PM'", "'Peter Zijlstra'", "'Giovanni Gherdovich'", "'Valentin Schneider'", "'Vincent Guittot'", "'Rafael J. Wysocki'"
Subject: RE: schedutil issue with serial workloads
Date: Sun, 7 Jun 2020 10:24:24 -0700
Message-ID: <000201d63cf0$7ff6d9f0$7fe48dd0$@net>

On 2020.06.05 Rafael J. Wysocki wrote:
> On 6/4/2020 11:29 PM, Alexander Monakov wrote:
>> Hello,
>
> Hi,
>
> Let's make more people see your report.
>
> +Peter, Giovanni, Quentin, Juri, Valentin, Vincent, Doug, and linux-pm.
>
>> this is a question/bugreport about behavior of schedutil on serial workloads
>> such as rsync, or './configure', or 'make install'. These workloads are
>> such that there's no single task that takes a substantial portion of CPU
>> time, but at any moment there's at least one runnable task, and overall
>> the workload is compute-bound. To run the workload efficiently, cpufreq
>> governor should select a high frequency.
>>
>> Assume the system is idle except for the workload in question.
>>
>> Sadly, schedutil will select the lowest frequency, unless the workload is
>> confined to one core with taskset (in which case it will select the
>> highest frequency, correctly though somewhat paradoxically).
>
> That's because the CPU utilization generated by the workload on all CPUs
> is small.
>
> Confining it to one CPU causes the utilization of this one to grow and
> so schedutil selects a higher frequency for it.
>
>> This sounds like it should be a known problem, but I couldn't find any
>> mention of it in the documentation.

Yes, this issue is very well known, and has been discussed on this list
several times, going back many years (and I likely missed some of the
discussions). In recent years, Giovanni's git "make test" has been the
go-to example for this. From that test, which has run-to-run variability
due to disk I/O, I made some tests that vary PIDs per second versus time.

Giovanni's recent work on frequency invariance made a huge difference in
the schedutil response to this type of serialized workflow.

For my part, I only ever focused on a new-PID-per-work-packet serialized
workflow. Since my last testing on this subject in January, I fell behind
due to system issues and infrastructure updates. Your workflow example is
fascinating and rather revealing. I will make use of it going forward.
Thank you.

Yes, schedutil basically responds poorly, as it did for the PIDs/second
based workflow before frequency invariance, but... (digression follows)...

Typically, I merely set the performance governor whenever I know I will
be doing serialized workflow, or whenever I just want the job done the
fastest (e.g. a kernel compile). If I use performance mode (HWP disabled,
either active or passive, doesn't matter), then I cannot get the CPU
frequency to max, even if I set:

$ grep .
/sys/devices/system/cpu/intel_pstate/m??_perf_pct
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:100

I have to increase EPB all the way to 1 to get to max CPU frequency.
There is also extreme hysteresis, as I have to go back to 9 for the
frequency to drop again. The above was an i5-9600K. My much older
i7-9600K works fine with the default EPB of 6. I had not previously
realized there was so much difference between processors and EPB. I
don't have time to dig deeper right now, but will in future.

>> I was able to replicate the effect with a pair of 'ping-pong' programs
>> that get a token, burn some cycles to simulate work, and pass the token.
>> Thus, each program has 50% CPU utilization. To repeat my test:
>>
>> gcc -O2 pingpong.c -o pingpong
>> mkfifo ping
>> mkfifo pong
>> taskset -c 0 ./pingpong 1000000 < ping > pong &
>> taskset -c 1 ./pingpong 1000000 < pong > ping &
>> echo > ping
>>
>> #include <stdio.h>
>> #include <unistd.h>
>> int main(int argc, char *argv[])
>> {
>>     unsigned i, n;
>>     sscanf(argv[1], "%u", &n);
>>     for (;;) {
>>         char c;
>>         read(0, &c, 1);
>>         for (i = n; i; i--)
>>             asm("" :: "r"(i));
>>         write(1, &c, 1);
>>     }
>> }
>>
>> Alexander

It was not obvious to me what the approximate work/sleep frequency would
be for your workflow. For my version of it, I made the loop time slower
on purpose, since I could simply adjust "N" to compensate. I measured a
100 hertz work/sleep frequency per CPU, but my pipeline is 6 deep instead
of 2. Just for the record, this is what I did:

doug@s18:~/c$ cat pingpong.c
#include <stdio.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
    unsigned i, n, k;
    sscanf(argv[1], "%u", &n);
    while (1) {
        char c;
        read(0, &c, 1);
        for (i = n; i; i--) {
            k = i;
            k = k + 1; /* busy work */
        }
        write(1, &c, 1);
    }
}

Compiled with: cc pingpong.c -o pingpong

and run with (on purpose, I did not force CPU affinity, as I wanted
schedutil to decide (when it was the governor, at least)):

#!/bin/dash
#
# ping-pong-test Smythies 2019.06.06
# serialized workflow, but same PID.
# from Alexander, but modified.
#
# because I always forget from last time
killall pingpong
rm --force pong1
rm --force pong2
rm --force pong3
rm --force pong4
rm --force pong5
rm --force pong6

mkfifo pong1
mkfifo pong2
mkfifo pong3
mkfifo pong4
mkfifo pong5
mkfifo pong6

~/c/pingpong 1000000 < pong1 > pong2 &
~/c/pingpong 1000000 < pong2 > pong3 &
~/c/pingpong 1000000 < pong3 > pong4 &
~/c/pingpong 1000000 < pong4 > pong5 &
~/c/pingpong 1000000 < pong5 > pong6 &
~/c/pingpong 1000000 < pong6 > pong1 &

echo > pong1

To measure the work/sleep frequency, I made a version that would only
run, say, 10,000 times, and timed it.

... Doug