From: "Doug Smythies"
To: "'Alexander Monakov'"
Cc: "'Linux PM'", "'Peter Zijlstra'", "'Giovanni Gherdovich'", "'Valentin Schneider'", "'Vincent Guittot'", "'Rafael J. Wysocki'"
Subject: RE: schedutil issue with serial workloads
Date: Sun, 7 Jun 2020 10:24:24 -0700
Message-ID: <000201d63cf0$7ff6d9f0$7fe48dd0$@net>

On 2020.06.05 Rafael J. Wysocki wrote:
> On 6/4/2020 11:29 PM, Alexander Monakov wrote:
>> Hello,
>
> Hi,
>
> Let's make more people see your report.
>
> +Peter, Giovanni, Quentin, Juri, Valentin, Vincent, Doug, and linux-pm.
>
>> this is a question/bugreport about behavior of schedutil on serial workloads
>> such as rsync, or './configure', or 'make install'. These workloads are
>> such that there's no single task that takes a substantial portion of CPU
>> time, but at any moment there's at least one runnable task, and overall
>> the workload is compute-bound. To run the workload efficiently, cpufreq
>> governor should select a high frequency.
>>
>> Assume the system is idle except for the workload in question.
>>
>> Sadly, schedutil will select the lowest frequency, unless the workload is
>> confined to one core with taskset (in which case it will select the
>> highest frequency, correctly though somewhat paradoxically).
>
> That's because the CPU utilization generated by the workload on all CPUs
> is small.
>
> Confining it to one CPU causes the utilization of this one to grow and
> so schedutil selects a higher frequency for it.
>
>> This sounds like it should be a known problem, but I couldn't find any
>> mention of it in the documentation.

Yes, this issue is very well known, and has been discussed on this list
several times, going back many years (and I likely missed some of the
discussions). In recent years, Giovanni's git "make test" has been the
go-to example for this. From that test, which has run-to-run variability
due to disk I/O, I made some tests that vary PIDs per second versus time.

Giovanni's recent work on frequency invariance made a huge difference in
the schedutil response to this type of serialized workflow.

For my part, I only ever focused on a new-PID-per-work-packet serialized
workflow. Since my last testing on this subject in January, I fell behind
due to system issues and infrastructure updates. Your workflow example is
fascinating and rather revealing. I will make use of it going forward.
Thank you.

Yes, schedutil basically responds poorly, as it did for the PIDs/second
based workflow before frequency invariance, but... (digression follows)...

Typically, I merely set the performance governor whenever I know I will
be doing serialized workflow, or whenever I just want the job done the
fastest (e.g. a kernel compile). If I use performance mode (HWP disabled,
either active or passive, doesn't matter), then I cannot get the CPU
frequency to max, even if I set:

$ grep .
/sys/devices/system/cpu/intel_pstate/m??_perf_pct
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:100

I have to increase EPB all the way to 1 to get to max CPU frequency.
There is also extreme hysteresis, as I have to go back to 9 for the
frequency to drop again. The above was an i5-9600K. My much older
i7-9600K works fine with the default EPB of 6. I had not previously
realized there was so much difference between processors and EPB. I
don't have time to dig deeper right now, but will in future.

>> I was able to replicate the effect with a pair of 'ping-pong' programs
>> that get a token, burn some cycles to simulate work, and pass the token.
>> Thus, each program has 50% CPU utilization. To repeat my test:
>>
>> gcc -O2 pingpong.c -o pingpong
>> mkfifo ping
>> mkfifo pong
>> taskset -c 0 ./pingpong 1000000 < ping > pong &
>> taskset -c 1 ./pingpong 1000000 < pong > ping &
>> echo > ping
>>
>> #include <stdio.h>
>> #include <unistd.h>
>> int main(int argc, char *argv[])
>> {
>>     unsigned i, n;
>>     sscanf(argv[1], "%u", &n);
>>     for (;;) {
>>         char c;
>>         read(0, &c, 1);
>>         for (i = n; i; i--)
>>             asm("" :: "r"(i));
>>         write(1, &c, 1);
>>     }
>> }
>>
>> Alexander

It was not obvious to me what the approximate work/sleep frequency would
be for your workflow. For my version of it, I made the loop time slower
on purpose, since I could simply adjust "N" to compensate. I measured a
100 hertz work/sleep frequency per CPU, but my pipeline is 6 deep instead
of 2. Just for the record, this is what I did:

doug@s18:~/c$ cat pingpong.c
#include <stdio.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
    unsigned i, n, k;
    sscanf(argv[1], "%u", &n);
    while (1) {
        char c;
        read(0, &c, 1);
        for (i = n; i; i--) {
            k = i;
            k = k + 1; /* busy work */
        }
        write(1, &c, 1);
    }
}

Compiled with: cc pingpong.c -o pingpong

and run with (on purpose, I did not force CPU affinity, as I wanted
schedutil to decide (when it was the governor, at least)):

#!/bin/dash
#
# ping-pong-test Smythies 2019.06.06
# serialized workflow, but same PID.
# from Alexander, but modified.
#
# because I always forget from last time
killall pingpong
rm --force pong1
rm --force pong2
rm --force pong3
rm --force pong4
rm --force pong5
rm --force pong6

mkfifo pong1
mkfifo pong2
mkfifo pong3
mkfifo pong4
mkfifo pong5
mkfifo pong6

~/c/pingpong 1000000 < pong1 > pong2 &
~/c/pingpong 1000000 < pong2 > pong3 &
~/c/pingpong 1000000 < pong3 > pong4 &
~/c/pingpong 1000000 < pong4 > pong5 &
~/c/pingpong 1000000 < pong5 > pong6 &
~/c/pingpong 1000000 < pong6 > pong1 &

echo > pong1

To measure the work/sleep frequency, I made a version that would only
run, say, 10,000 times, and timed it.

... Doug