linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Wyes Karny <wkarny@gmail.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Qais Yousef <qyousef@layalina.io>, Ingo Molnar <mingo@kernel.org>,
	linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>
Subject: Re: [GIT PULL] Scheduler changes for v6.8
Date: Sun, 14 Jan 2024 18:07:59 +0530	[thread overview]
Message-ID: <20240114123759.pjs7ctexcpc6pshl@wyes-pc> (raw)
In-Reply-To: <ZaPC7o44lEswxOXp@vingu-book>

On Sun, Jan 14, 2024 at 12:18:06PM +0100, Vincent Guittot wrote:
> Hi Wyes,
> 
> Le dimanche 14 janv. 2024 à 14:42:40 (+0530), Wyes Karny a écrit :
> > On Wed, Jan 10, 2024 at 02:57:14PM -0800, Linus Torvalds wrote:
> > > On Wed, 10 Jan 2024 at 14:41, Linus Torvalds
> > > <torvalds@linux-foundation.org> wrote:
> > > >
> > > > It's one of these two:
> > > >
> > > >   f12560779f9d sched/cpufreq: Rework iowait boost
> > > >   9c0b4bb7f630 sched/cpufreq: Rework schedutil governor performance estimation
> > > >
> > > > one more boot to go, then I'll try to revert whichever causes my
> > > > machine to perform horribly much worse.
> > > 
> > > I guess it should come as no surprise that the result is
> > > 
> > >    9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d is the first bad commit
> > > 
> > > but to revert cleanly I will have to revert all of
> > > 
> > >       b3edde44e5d4 ("cpufreq/schedutil: Use a fixed reference frequency")
> > >       f12560779f9d ("sched/cpufreq: Rework iowait boost")
> > >       9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor
> > > performance estimation")
> > > 
> > > This is on a 32-core (64-thread) AMD Ryzen Threadripper 3970X, fwiw.
> > > 
> > > I'll keep that revert in my private test-tree for now (so that I have
> > > a working machine again), but I'll move it to my main branch soon
> > > unless somebody has a quick fix for this problem.
> > 
> > Hi Linus,
> > 
> > I'm able to reproduce this issue with my AMD Ryzen 5600G system.  But
> > only if I disable CPPC in BIOS and boot with acpi-cpufreq + schedutil.
> > (I believe for your case also CPPC is diabled as log "_CPC object is not
> > present" came). Enabling CPPC in BIOS issue not seen in my system.  For
> > AMD acpi-cpufreq also uses _CPC object to determine the boost ratio.
> > When CPPC is disabled in BIOS something is going wrong and max
> > capacity is becoming zero.
> > 
> > Hi Vincent, Qais,
> > 
> > I have collected some data with bpftracing:
> 
> Thanks for your tests results
> 
> > 
> > sudo bpftrace -e 'kretprobe:effective_cpu_util /cpu == 1/ { @eff_util = lhist(retval, 0, 1200, 50);} kprobe:get_next_freq /cpu == 1/ { @sugov_eff_util = lhist(arg1, 0, 1200, 50); @sugov_max_cap = lhist(arg2, 0, 1000, 2);} kretprobe:get_next_freq /cpu == 1/ { @sugov_freq = lhist(retval, 1000000, 5000000, 100000);}'
> > 
> > with running: taskset -c 1 make
> > 
> > issue case:
> > 
> > Attaching 3 probes...
> > @eff_util:
> > [0, 50)             1263 |@                                                   |
> > [50, 100)            517 |                                                    |
> > [100, 150)           233 |                                                    |
> > [150, 200)           297 |                                                    |
> > [200, 250)           162 |                                                    |
> > [250, 300)            98 |                                                    |
> > [300, 350)            75 |                                                    |
> > [350, 400)           205 |                                                    |
> > [400, 450)           210 |                                                    |
> > [450, 500)            16 |                                                    |
> > [500, 550)          1532 |@                                                   |
> > [550, 600)          1026 |                                                    |
> > [600, 650)           761 |                                                    |
> > [650, 700)           876 |                                                    |
> > [700, 750)          1085 |                                                    |
> > [750, 800)           891 |                                                    |
> > [800, 850)           816 |                                                    |
> > [850, 900)           983 |                                                    |
> > [900, 950)           661 |                                                    |
> > [950, 1000)          759 |                                                    |
> > [1000, 1050)       57433 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > 
> 
> ok so the output of effective_cpu_util() seems correct or at least to maw utilization
> value. In order to be correct, it means that arch_scale_cpu_capacity(cpu) is not zero
> because of :
> 
> unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> 				 unsigned long *min,
> 				 unsigned long *max)
> {
> 	unsigned long util, irq, scale;
> 	struct rq *rq = cpu_rq(cpu);
> 
> 	scale = arch_scale_cpu_capacity(cpu);
> 
> 	/*
> 	 * Early check to see if IRQ/steal time saturates the CPU, can be
> 	 * because of inaccuracies in how we track these -- see
> 	 * update_irq_load_avg().
> 	 */
> 	irq = cpu_util_irq(rq);
> 	if (unlikely(irq >= scale)) {
> 		if (min)
> 			*min = scale;
> 		if (max)
> 			*max = scale;
> 		return scale;
> 	}
> ...
> }
> 
> If arch_scale_cpu_capacity(cpu) returns 0 then effective_cpu_util() should returns
> 0 too.
> 
> Now see below
> 
> > @sugov_eff_util:
> > [0, 50)             1074 |                                                    |
> > [50, 100)            571 |                                                    |
> > [100, 150)           259 |                                                    |
> > [150, 200)           169 |                                                    |
> > [200, 250)           237 |                                                    |
> > [250, 300)           156 |                                                    |
> > [300, 350)            91 |                                                    |
> > [350, 400)            46 |                                                    |
> > [400, 450)            52 |                                                    |
> > [450, 500)           195 |                                                    |
> > [500, 550)           175 |                                                    |
> > [550, 600)            46 |                                                    |
> > [600, 650)           493 |                                                    |
> > [650, 700)          1424 |@                                                   |
> > [700, 750)           646 |                                                    |
> > [750, 800)           628 |                                                    |
> > [800, 850)           612 |                                                    |
> > [850, 900)           840 |                                                    |
> > [900, 950)           893 |                                                    |
> > [950, 1000)          640 |                                                    |
> > [1000, 1050)       60679 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > 
> > @sugov_freq:
> > [1400000, 1500000)   69911 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > 
> > @sugov_max_cap:
> > [0, 2)             69926 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> 
> In get_next_freq(struct sugov_policy *sg_policy, unsigned long util, unsigned long max)
> 
> max is 0 and we comes from this path:
> 
> static void sugov_update_single_freq(struct update_util_data *hook, u64 time,
> 				     unsigned int flags)
> {
> 
> ...
> 	max_cap = arch_scale_cpu_capacity(sg_cpu->cpu);
> 
> 	if (!sugov_update_single_common(sg_cpu, time, max_cap, flags))
> 		return;
> 
> 	next_f = get_next_freq(sg_policy, sg_cpu->util, max_cap);
> ...
> 
> so here arch_scale_cpu_capacity(sg_cpu->cpu) returns 0 ...
> 
> AFAICT, AMD platform uses the default 
> static __always_inline
> unsigned long arch_scale_cpu_capacity(int cpu)
> {
> 	return SCHED_CAPACITY_SCALE;
> }
> 
> I'm missing something here
> 
> > 
> > 
> > good case:
> > 
> > Attaching 3 probes...
> > @eff_util:
> > [0, 50)              246 |@                                                   |
> > [50, 100)            150 |@                                                   |
> > [100, 150)           191 |@                                                   |
> > [150, 200)           239 |@                                                   |
> > [200, 250)           117 |                                                    |
> > [250, 300)          2101 |@@@@@@@@@@@@@@@                                     |
> > [300, 350)          2284 |@@@@@@@@@@@@@@@@                                    |
> > [350, 400)           713 |@@@@@                                               |
> > [400, 450)           151 |@                                                   |
> > [450, 500)           154 |@                                                   |
> > [500, 550)          1121 |@@@@@@@@                                            |
> > [550, 600)          1901 |@@@@@@@@@@@@@                                       |
> > [600, 650)          1208 |@@@@@@@@                                            |
> > [650, 700)           606 |@@@@                                                |
> > [700, 750)           557 |@@@                                                 |
> > [750, 800)           872 |@@@@@@                                              |
> > [800, 850)          1092 |@@@@@@@                                             |
> > [850, 900)          1416 |@@@@@@@@@@                                          |
> > [900, 950)          1107 |@@@@@@@                                             |
> > [950, 1000)         1051 |@@@@@@@                                             |
> > [1000, 1050)        7260 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > 
> > @sugov_eff_util:
> > [0, 50)              241 |                                                    |
> > [50, 100)            149 |                                                    |
> > [100, 150)            72 |                                                    |
> > [150, 200)            95 |                                                    |
> > [200, 250)            43 |                                                    |
> > [250, 300)            49 |                                                    |
> > [300, 350)            19 |                                                    |
> > [350, 400)            56 |                                                    |
> > [400, 450)            22 |                                                    |
> > [450, 500)            29 |                                                    |
> > [500, 550)          1840 |@@@@@@                                              |
> > [550, 600)          1476 |@@@@@                                               |
> > [600, 650)          1027 |@@@                                                 |
> > [650, 700)           473 |@                                                   |
> > [700, 750)           366 |@                                                   |
> > [750, 800)           627 |@@                                                  |
> > [800, 850)           930 |@@@                                                 |
> > [850, 900)          1285 |@@@@                                                |
> > [900, 950)           971 |@@@                                                 |
> > [950, 1000)          946 |@@@                                                 |
> > [1000, 1050)       13839 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > 
> > @sugov_freq:
> > [1400000, 1500000)     648 |@                                                   |
> > [1500000, 1600000)       0 |                                                    |
> > [1600000, 1700000)       0 |                                                    |
> > [1700000, 1800000)      25 |                                                    |
> > [1800000, 1900000)       0 |                                                    |
> > [1900000, 2000000)       0 |                                                    |
> > [2000000, 2100000)       0 |                                                    |
> > [2100000, 2200000)       0 |                                                    |
> > [2200000, 2300000)       0 |                                                    |
> > [2300000, 2400000)       0 |                                                    |
> > [2400000, 2500000)       0 |                                                    |
> > [2500000, 2600000)       0 |                                                    |
> > [2600000, 2700000)       0 |                                                    |
> > [2700000, 2800000)       0 |                                                    |
> > [2800000, 2900000)       0 |                                                    |
> > [2900000, 3000000)       0 |                                                    |
> > [3000000, 3100000)       0 |                                                    |
> > [3100000, 3125K)       0 |                                                    |
> > [3125K, 3300000)       0 |                                                    |
> > [3300000, 3400000)       0 |                                                    |
> > [3400000, 3500000)       0 |                                                    |
> > [3500000, 3600000)       0 |                                                    |
> > [3600000, 3700000)       0 |                                                    |
> > [3700000, 3800000)       0 |                                                    |
> > [3800000, 3900000)       0 |                                                    |
> > [3900000, 4000000)   23879 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > 
> > @sugov_max_cap:
> > [0, 2)             24555 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > 
> > In both case max_cap is zero but selected freq is incorrect in bad case.
> 
> Also we have in get_next_freq():
> 	freq = map_util_freq(util, freq, max);
> 	       --> util * freq /max
> 
> If max was 0, we should have been an error ?
> 
> There is something strange that I don't understand
> 
> Could you trace on the return of sugov_get_util()
> the value of sg_cpu->util ?

Yeah, correct something was wrong in the bpftrace readings, max_cap is
not zero in traces.

             git-5511    [001] d.h1.   427.159763: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024
             git-5511    [001] d.h1.   427.163733: sugov_get_util: [DEBUG] : util 1024, sg_cpu->util 1024
             git-5511    [001] d.h1.   427.163735: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024
             git-5511    [001] d.h1.   427.167706: sugov_get_util: [DEBUG] : util 1024, sg_cpu->util 1024
             git-5511    [001] d.h1.   427.167708: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024
             git-5511    [001] d.h1.   427.171678: sugov_get_util: [DEBUG] : util 1024, sg_cpu->util 1024
             git-5511    [001] d.h1.   427.171679: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024
             git-5511    [001] d.h1.   427.175653: sugov_get_util: [DEBUG] : util 1024, sg_cpu->util 1024
             git-5511    [001] d.h1.   427.175655: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024
             git-5511    [001] d.s1.   427.175665: sugov_get_util: [DEBUG] : util 1024, sg_cpu->util 1024
             git-5511    [001] d.s1.   427.175665: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024

Debug patch applied:

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 95c3c097083e..5c9b3e1de7a0 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -166,6 +166,7 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,

        freq = get_capacity_ref_freq(policy);
        freq = map_util_freq(util, freq, max);
+       trace_printk("[DEBUG] : freq %llu, util %llu, max %llu\n", freq, util, max);

        if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
                return sg_policy->next_freq;
@@ -199,6 +200,7 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu, unsigned long boost)
        util = max(util, boost);
        sg_cpu->bw_min = min;
        sg_cpu->util = sugov_effective_cpu_perf(sg_cpu->cpu, util, min, max);
+       trace_printk("[DEBUG] : util %llu, sg_cpu->util %llu\n", util, sg_cpu->util);
 }

 /**


So, I guess map_util_freq going wrong somewhere.

Thanks,
Wyes
> 
> Thanks for you help
> Vincent
> 
> > 
> > Thanks,
> > Wyes
> > 

  reply	other threads:[~2024-01-14 12:38 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-28 12:23 [GIT PULL] Scheduler changes for v6.7 Ingo Molnar
2023-10-30 23:50 ` pr-tracker-bot
2024-01-08 14:07 ` [GIT PULL] Scheduler changes for v6.8 Ingo Molnar
2024-01-09  4:04   ` pr-tracker-bot
2024-01-10 22:19   ` Linus Torvalds
2024-01-10 22:41     ` Linus Torvalds
2024-01-10 22:57       ` Linus Torvalds
2024-01-11  8:11         ` Vincent Guittot
2024-01-11 17:45           ` Linus Torvalds
2024-01-11 17:53             ` Linus Torvalds
2024-01-11 18:16               ` Vincent Guittot
2024-01-12 14:23                 ` Dietmar Eggemann
2024-01-12 16:58                   ` Vincent Guittot
2024-01-12 18:18                   ` Qais Yousef
2024-01-12 19:03                     ` Vincent Guittot
2024-01-12 20:30                       ` Linus Torvalds
2024-01-12 20:49                         ` Linus Torvalds
2024-01-12 21:04                           ` Linus Torvalds
2024-01-13  1:04                             ` Qais Yousef
2024-01-13  1:24                               ` Linus Torvalds
2024-01-13  1:31                                 ` Linus Torvalds
2024-01-13 10:47                                   ` Vincent Guittot
2024-01-13 18:33                                     ` Qais Yousef
2024-01-13 18:37                                 ` Qais Yousef
2024-01-11 11:09         ` [GIT PULL] scheduler fixes Ingo Molnar
2024-01-11 13:04           ` Vincent Guittot
2024-01-11 20:48             ` [PATCH] Revert "sched/cpufreq: Rework schedutil governor performance estimation" and dependent commit Ingo Molnar
2024-01-11 22:22               ` Vincent Guittot
2024-01-12 18:24               ` Ingo Molnar
2024-01-12 18:26         ` [GIT PULL] Scheduler changes for v6.8 Ingo Molnar
2024-01-14  9:12         ` Wyes Karny
2024-01-14 11:18           ` Vincent Guittot
2024-01-14 12:37             ` Wyes Karny [this message]
2024-01-14 13:02               ` Dietmar Eggemann
2024-01-14 13:05                 ` Vincent Guittot
2024-01-14 13:03               ` Vincent Guittot
2024-01-14 15:12                 ` Qais Yousef
2024-01-14 15:20                   ` Vincent Guittot
2024-01-14 19:58                     ` Qais Yousef
2024-01-14 23:37                       ` Qais Yousef
2024-01-15  6:25                         ` Wyes Karny
2024-01-15 11:59                           ` Qais Yousef
2024-01-15  8:21                       ` Vincent Guittot
2024-01-15 12:09                         ` Qais Yousef
2024-01-15 13:26                           ` Vincent Guittot
2024-01-15 14:03                             ` Dietmar Eggemann
2024-01-15 15:26                               ` Vincent Guittot
2024-01-15 20:05                                 ` Dietmar Eggemann
2024-01-15  8:42                       ` David Laight
2024-01-14 18:11                 ` Wyes Karny
2024-01-14 18:18                   ` Vincent Guittot
2024-01-11  9:33     ` Ingo Molnar
2024-01-11 11:14     ` [tip: sched/urgent] Revert "sched/cpufreq: Rework schedutil governor performance estimation" and dependent commits tip-bot2 for Ingo Molnar
2024-01-11 20:55     ` [tip: sched/urgent] Revert "sched/cpufreq: Rework schedutil governor performance estimation" and dependent commit tip-bot2 for Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240114123759.pjs7ctexcpc6pshl@wyes-pc \
    --to=wkarny@gmail.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=qyousef@layalina.io \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).