From: Wyes Karny <wkarny@gmail.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Qais Yousef <qyousef@layalina.io>, Ingo Molnar <mingo@kernel.org>,
	linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>
Subject: Re: [GIT PULL] Scheduler changes for v6.8
Date: Sun, 14 Jan 2024 18:07:59 +0530
Message-ID: <20240114123759.pjs7ctexcpc6pshl@wyes-pc>
In-Reply-To: <ZaPC7o44lEswxOXp@vingu-book>

On Sun, Jan 14, 2024 at 12:18:06PM +0100, Vincent Guittot wrote:
> Hi Wyes,
> 
> On Sunday 14 Jan 2024 at 14:42:40 (+0530), Wyes Karny wrote:
> > On Wed, Jan 10, 2024 at 02:57:14PM -0800, Linus Torvalds wrote:
> > > On Wed, 10 Jan 2024 at 14:41, Linus Torvalds
> > > <torvalds@linux-foundation.org> wrote:
> > > >
> > > > It's one of these two:
> > > >
> > > >   f12560779f9d sched/cpufreq: Rework iowait boost
> > > >   9c0b4bb7f630 sched/cpufreq: Rework schedutil governor performance estimation
> > > >
> > > > one more boot to go, then I'll try to revert whichever causes my
> > > > machine to perform horribly much worse.
> > > 
> > > I guess it should come as no surprise that the result is
> > > 
> > >    9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d is the first bad commit
> > > 
> > > but to revert cleanly I will have to revert all of
> > > 
> > >       b3edde44e5d4 ("cpufreq/schedutil: Use a fixed reference frequency")
> > >       f12560779f9d ("sched/cpufreq: Rework iowait boost")
> > >       9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor performance estimation")
> > > 
> > > This is on a 32-core (64-thread) AMD Ryzen Threadripper 3970X, fwiw.
> > > 
> > > I'll keep that revert in my private test-tree for now (so that I have
> > > a working machine again), but I'll move it to my main branch soon
> > > unless somebody has a quick fix for this problem.
> > 
> > Hi Linus,
> > 
> > I'm able to reproduce this issue on my AMD Ryzen 5600G system, but
> > only if I disable CPPC in the BIOS and boot with acpi-cpufreq + schedutil.
> > (I believe CPPC is disabled in your case too, since the log shows "_CPC
> > object is not present".) With CPPC enabled in the BIOS, the issue is not
> > seen on my system. On AMD, acpi-cpufreq also uses the _CPC object to
> > determine the boost ratio. When CPPC is disabled in the BIOS, something
> > goes wrong and the max capacity becomes zero.
> > 
> > Hi Vincent, Qais,
> > 
> > I have collected some data with bpftrace:
> 
> Thanks for your test results
> 
> > 
> > sudo bpftrace -e 'kretprobe:effective_cpu_util /cpu == 1/ { @eff_util = lhist(retval, 0, 1200, 50);} kprobe:get_next_freq /cpu == 1/ { @sugov_eff_util = lhist(arg1, 0, 1200, 50); @sugov_max_cap = lhist(arg2, 0, 1000, 2);} kretprobe:get_next_freq /cpu == 1/ { @sugov_freq = lhist(retval, 1000000, 5000000, 100000);}'
> > 
> > while running: taskset -c 1 make
> > 
> > issue case:
> > 
> > Attaching 3 probes...
> > @eff_util:
> > [0, 50)             1263 |@                                                   |
> > [50, 100)            517 |                                                    |
> > [100, 150)           233 |                                                    |
> > [150, 200)           297 |                                                    |
> > [200, 250)           162 |                                                    |
> > [250, 300)            98 |                                                    |
> > [300, 350)            75 |                                                    |
> > [350, 400)           205 |                                                    |
> > [400, 450)           210 |                                                    |
> > [450, 500)            16 |                                                    |
> > [500, 550)          1532 |@                                                   |
> > [550, 600)          1026 |                                                    |
> > [600, 650)           761 |                                                    |
> > [650, 700)           876 |                                                    |
> > [700, 750)          1085 |                                                    |
> > [750, 800)           891 |                                                    |
> > [800, 850)           816 |                                                    |
> > [850, 900)           983 |                                                    |
> > [900, 950)           661 |                                                    |
> > [950, 1000)          759 |                                                    |
> > [1000, 1050)       57433 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > 
> 
> OK, so the output of effective_cpu_util() seems correct, or at least it reaches the max
> utilization value. For it to be correct, arch_scale_cpu_capacity(cpu) must not be zero,
> because of:
> 
> unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> 				 unsigned long *min,
> 				 unsigned long *max)
> {
> 	unsigned long util, irq, scale;
> 	struct rq *rq = cpu_rq(cpu);
> 
> 	scale = arch_scale_cpu_capacity(cpu);
> 
> 	/*
> 	 * Early check to see if IRQ/steal time saturates the CPU, can be
> 	 * because of inaccuracies in how we track these -- see
> 	 * update_irq_load_avg().
> 	 */
> 	irq = cpu_util_irq(rq);
> 	if (unlikely(irq >= scale)) {
> 		if (min)
> 			*min = scale;
> 		if (max)
> 			*max = scale;
> 		return scale;
> 	}
> ...
> }
> 
> If arch_scale_cpu_capacity(cpu) returned 0, then effective_cpu_util() would return
> 0 too.
> 
> Now see below
> 
> > @sugov_eff_util:
> > [0, 50)             1074 |                                                    |
> > [50, 100)            571 |                                                    |
> > [100, 150)           259 |                                                    |
> > [150, 200)           169 |                                                    |
> > [200, 250)           237 |                                                    |
> > [250, 300)           156 |                                                    |
> > [300, 350)            91 |                                                    |
> > [350, 400)            46 |                                                    |
> > [400, 450)            52 |                                                    |
> > [450, 500)           195 |                                                    |
> > [500, 550)           175 |                                                    |
> > [550, 600)            46 |                                                    |
> > [600, 650)           493 |                                                    |
> > [650, 700)          1424 |@                                                   |
> > [700, 750)           646 |                                                    |
> > [750, 800)           628 |                                                    |
> > [800, 850)           612 |                                                    |
> > [850, 900)           840 |                                                    |
> > [900, 950)           893 |                                                    |
> > [950, 1000)          640 |                                                    |
> > [1000, 1050)       60679 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > 
> > @sugov_freq:
> > [1400000, 1500000)   69911 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > 
> > @sugov_max_cap:
> > [0, 2)             69926 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> 
> In get_next_freq(struct sugov_policy *sg_policy, unsigned long util, unsigned long max)
> 
> max is 0, and we come from this path:
> 
> static void sugov_update_single_freq(struct update_util_data *hook, u64 time,
> 				     unsigned int flags)
> {
> 
> ...
> 	max_cap = arch_scale_cpu_capacity(sg_cpu->cpu);
> 
> 	if (!sugov_update_single_common(sg_cpu, time, max_cap, flags))
> 		return;
> 
> 	next_f = get_next_freq(sg_policy, sg_cpu->util, max_cap);
> ...
> 
> So here arch_scale_cpu_capacity(sg_cpu->cpu) returns 0...
> 
> AFAICT, the AMD platform uses the default:
> static __always_inline
> unsigned long arch_scale_cpu_capacity(int cpu)
> {
> 	return SCHED_CAPACITY_SCALE;
> }
> 
> I'm missing something here.
> 
> > 
> > 
> > good case:
> > 
> > Attaching 3 probes...
> > @eff_util:
> > [0, 50)              246 |@                                                   |
> > [50, 100)            150 |@                                                   |
> > [100, 150)           191 |@                                                   |
> > [150, 200)           239 |@                                                   |
> > [200, 250)           117 |                                                    |
> > [250, 300)          2101 |@@@@@@@@@@@@@@@                                     |
> > [300, 350)          2284 |@@@@@@@@@@@@@@@@                                    |
> > [350, 400)           713 |@@@@@                                               |
> > [400, 450)           151 |@                                                   |
> > [450, 500)           154 |@                                                   |
> > [500, 550)          1121 |@@@@@@@@                                            |
> > [550, 600)          1901 |@@@@@@@@@@@@@                                       |
> > [600, 650)          1208 |@@@@@@@@                                            |
> > [650, 700)           606 |@@@@                                                |
> > [700, 750)           557 |@@@                                                 |
> > [750, 800)           872 |@@@@@@                                              |
> > [800, 850)          1092 |@@@@@@@                                             |
> > [850, 900)          1416 |@@@@@@@@@@                                          |
> > [900, 950)          1107 |@@@@@@@                                             |
> > [950, 1000)         1051 |@@@@@@@                                             |
> > [1000, 1050)        7260 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > 
> > @sugov_eff_util:
> > [0, 50)              241 |                                                    |
> > [50, 100)            149 |                                                    |
> > [100, 150)            72 |                                                    |
> > [150, 200)            95 |                                                    |
> > [200, 250)            43 |                                                    |
> > [250, 300)            49 |                                                    |
> > [300, 350)            19 |                                                    |
> > [350, 400)            56 |                                                    |
> > [400, 450)            22 |                                                    |
> > [450, 500)            29 |                                                    |
> > [500, 550)          1840 |@@@@@@                                              |
> > [550, 600)          1476 |@@@@@                                               |
> > [600, 650)          1027 |@@@                                                 |
> > [650, 700)           473 |@                                                   |
> > [700, 750)           366 |@                                                   |
> > [750, 800)           627 |@@                                                  |
> > [800, 850)           930 |@@@                                                 |
> > [850, 900)          1285 |@@@@                                                |
> > [900, 950)           971 |@@@                                                 |
> > [950, 1000)          946 |@@@                                                 |
> > [1000, 1050)       13839 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > 
> > @sugov_freq:
> > [1400000, 1500000)     648 |@                                                   |
> > [1500000, 1600000)       0 |                                                    |
> > [1600000, 1700000)       0 |                                                    |
> > [1700000, 1800000)      25 |                                                    |
> > [1800000, 1900000)       0 |                                                    |
> > [1900000, 2000000)       0 |                                                    |
> > [2000000, 2100000)       0 |                                                    |
> > [2100000, 2200000)       0 |                                                    |
> > [2200000, 2300000)       0 |                                                    |
> > [2300000, 2400000)       0 |                                                    |
> > [2400000, 2500000)       0 |                                                    |
> > [2500000, 2600000)       0 |                                                    |
> > [2600000, 2700000)       0 |                                                    |
> > [2700000, 2800000)       0 |                                                    |
> > [2800000, 2900000)       0 |                                                    |
> > [2900000, 3000000)       0 |                                                    |
> > [3000000, 3100000)       0 |                                                    |
> > [3100000, 3200000)       0 |                                                    |
> > [3200000, 3300000)       0 |                                                    |
> > [3300000, 3400000)       0 |                                                    |
> > [3400000, 3500000)       0 |                                                    |
> > [3500000, 3600000)       0 |                                                    |
> > [3600000, 3700000)       0 |                                                    |
> > [3700000, 3800000)       0 |                                                    |
> > [3800000, 3900000)       0 |                                                    |
> > [3900000, 4000000)   23879 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > 
> > @sugov_max_cap:
> > [0, 2)             24555 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > 
> > In both cases max_cap is zero, but the selected frequency is incorrect in the bad case.
> 
> Also we have in get_next_freq():
> 	freq = map_util_freq(util, freq, max);
> 	       --> util * freq /max
> 
> If max were 0, shouldn't we have hit a division-by-zero error?
> 
> There is something strange that I don't understand.
> 
> Could you trace the value of sg_cpu->util on return from
> sugov_get_util()?

Yes, you're right: something was wrong in the bpftrace readings. max_cap is
not zero in the traces.

             git-5511    [001] d.h1.   427.159763: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024
             git-5511    [001] d.h1.   427.163733: sugov_get_util: [DEBUG] : util 1024, sg_cpu->util 1024
             git-5511    [001] d.h1.   427.163735: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024
             git-5511    [001] d.h1.   427.167706: sugov_get_util: [DEBUG] : util 1024, sg_cpu->util 1024
             git-5511    [001] d.h1.   427.167708: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024
             git-5511    [001] d.h1.   427.171678: sugov_get_util: [DEBUG] : util 1024, sg_cpu->util 1024
             git-5511    [001] d.h1.   427.171679: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024
             git-5511    [001] d.h1.   427.175653: sugov_get_util: [DEBUG] : util 1024, sg_cpu->util 1024
             git-5511    [001] d.h1.   427.175655: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024
             git-5511    [001] d.s1.   427.175665: sugov_get_util: [DEBUG] : util 1024, sg_cpu->util 1024
             git-5511    [001] d.s1.   427.175665: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024

Debug patch applied:

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 95c3c097083e..5c9b3e1de7a0 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -166,6 +166,7 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,

        freq = get_capacity_ref_freq(policy);
        freq = map_util_freq(util, freq, max);
+       trace_printk("[DEBUG] : freq %u, util %lu, max %lu\n", freq, util, max);

        if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
                return sg_policy->next_freq;
@@ -199,6 +200,7 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu, unsigned long boost)
        util = max(util, boost);
        sg_cpu->bw_min = min;
        sg_cpu->util = sugov_effective_cpu_perf(sg_cpu->cpu, util, min, max);
+       trace_printk("[DEBUG] : util %lu, sg_cpu->util %lu\n", util, sg_cpu->util);
 }

 /**


So I guess map_util_freq() is going wrong somewhere.

Thanks,
Wyes
> 
> Thanks for your help
> Vincent
> 
> > 
> > Thanks,
> > Wyes
> > 

