linux-kernel.vger.kernel.org archive mirror
From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Ricardo Neri <ricardo.neri@intel.com>,
	"Ravi V. Shankar" <ravi.v.shankar@intel.com>,
	Ben Segall <bsegall@google.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Len Brown <len.brown@intel.com>, Mel Gorman <mgorman@suse.de>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Valentin Schneider <vschneid@redhat.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	"Tim C . Chen" <tim.c.chen@intel.com>
Subject: Re: [RFC PATCH 08/23] sched/fair: Compute task-class performance scores for load balancing
Date: Wed, 26 Oct 2022 20:30:37 -0700
Message-ID: <20221027033037.GA9946@ranerica-svr.sc.intel.com>
In-Reply-To: <Y1j170UQc5DJPYgR@hirez.programming.kicks-ass.net>

On Wed, Oct 26, 2022 at 10:55:11AM +0200, Peter Zijlstra wrote:
> On Tue, Oct 25, 2022 at 08:57:24PM -0700, Ricardo Neri wrote:
> 
> > Do you want me to add your Signed-off-by and Co-developed-by tags?
> 
> Nah; who cares ;-)
> 
> 
> > > @@ -8749,32 +8747,18 @@ static void compute_ilb_sg_task_class_sc
> > >  	if (!busy_cpus)
> > >  		return;
> > >  
> > > -	score_on_dst_cpu = arch_get_task_class_score(class_sgs->p_min_score->class,
> > > -						     dst_cpu);
> > > +	score_on_dst_cpu = arch_get_task_class_score(sgcs->min_class, dst_cpu);
> > >  
> > > -	/*
> > > -	 * The simplest case. The single busy CPU in the current group will
> > > -	 * become idle after pulling its current task. The destination CPU is
> > > -	 * idle.
> > > -	 */
> > > -	if (busy_cpus == 1) {
> > > -		sgs->task_class_score_before = class_sgs->sum_score;
> > > -		sgs->task_class_score_after = score_on_dst_cpu;
> > > -		return;
> > > -	}
> > > +	before = sgcs->sum_score;
> > > +	after  = before - sgcs->min_score + score_on_dst_cpu;
> > 
> > This works when the sched group being evaluated has only one busy CPU
> > because it will become idle if the destination CPU (which was idle) pulls
> > the current task.
> > 
> > >  
> > > -	/*
> > > -	 * Now compute the group score with and without the task with the
> > > -	 * lowest score. We assume that the tasks that remain in the group share
> > > -	 * the CPU resources equally.
> > > -	 */
> > > -	group_score = class_sgs->sum_score / busy_cpus;
> > > -
> > > -	group_score_without =  (class_sgs->sum_score - class_sgs->min_score) /
> > > -			       (busy_cpus - 1);
> > > +	if (busy_cpus > 1) {
> > > +		before /= busy_cpus;
> > > +		after  /= busy_cpus;
> > 
> > 
> > However, I don't think this works when the sched group has more than one
> > busy CPU. 'before' and 'after' reflect the total throughput score of both
> > the sched group *and* the destination CPU.
> > 
> > One of the CPUs in the sched group will become idle after the balance.
> > 
> > Also, at this point we have already added score_on_dst_cpu. We are incorrectly
> > scaling it by the number of busy CPUs in the sched group.
> > 
> > We instead must scale 'after' by busy_cpus - 1 and then add score_on_dst_cpu.
> 
> So none of that makes sense.
> 
> 'x/n + y' != '(x+y)/(n+1)'
> 
> IOW:
> 
> > > +	}
> > >  
> > > -	sgs->task_class_score_after = group_score_without + score_on_dst_cpu;
> > > -	sgs->task_class_score_before = group_score;
> > > +	sgs->task_class_score_before = before;
> > > +	sgs->task_class_score_after  = after;
> 
> your task_class_score_after is a sum value for 2 cpus worth not a value
> for a single cpu,

Agreed.

> while your task_class_score_before is a single cpu
> average.

Agreed. You can also regard task_class_score_before as a value worth 2 CPUs,
except that the contribution of dst_cpu to throughput is 0.

> You can't compare these numbers and have a sensible outcome.
> 
> If you have a number of values: x_1....x_n, their average is
> Sum(x_1...x_n) / n, which is a single value again.
> 
> If you want to update one of the x's, say x_i->x'_i, then the average
> changes like:
> 
> 	Sum(x_1...x_n-x_i+x'_i) / n
> 
> If you want to remove one of the x's, then you get:
> 
> 	Sum(x_1...x_n-x_i) / (n-1) ; 0<i<=n
> 
> if you want to add an x:
> 
> 	Sum(x_1...x_n+x_i) / (n+1) ; i>n
> 
> Nowhere would you ever get:
> 
> 	Sum(x_1...x_n) / n + x_i
> 
> That's just straight up nonsense.
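
The average updates quoted above can be illustrated with a minimal C sketch. The helper names are hypothetical (they are not part of the patch set), scores are assumed to be plain long values, and division truncates as it would in kernel code.

```c
#include <assert.h>

/* Sum(x_1..x_n) / n, with truncating integer division. */
static long avg(long sum, long n)
{
	assert(n > 0);
	return sum / n;
}

/* Replace x_i by x'_i: Sum(x_1..x_n - x_i + x'_i) / n */
static long avg_update(long sum, long n, long x_i, long x_i_new)
{
	return avg(sum - x_i + x_i_new, n);
}

/* Remove x_i: Sum(x_1..x_n - x_i) / (n - 1); requires n > 1. */
static long avg_remove(long sum, long n, long x_i)
{
	return avg(sum - x_i, n - 1);
}

/* Add a new entry x: Sum(x_1..x_n + x) / (n + 1) */
static long avg_add(long sum, long n, long x)
{
	return avg(sum + x, n + 1);
}
```

For scores {100, 200, 300} (sum 600, n = 3), avg() gives 200 and avg_add(600, 3, 400) gives 250, while avg(600, 3) + 400 gives 600: an average plus a new value is not the average of the enlarged set.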

But we are not computing the average throughput. We are computing the
*total* throughput of two CPUs. Hence, what we need is the sum of the
throughput score of both CPUs.

We may be here because the sched group of sgcs is composed of SMT
siblings. Hence, we divide by busy_cpus, assuming that all busy siblings
share the core resources evenly. (For non-SMT sched groups, busy_cpus is
at most 1.)
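
In the quoted diff, score_on_dst_cpu is added to 'after' before the division by busy_cpus, so the score of the idle destination CPU is scaled as if it shared a core with the busy siblings. A minimal sketch of that computation, with a hypothetical function name and assumed long types:

```c
#include <assert.h>

/*
 * Hypothetical restatement of the quoted diff. Note that
 * score_on_dst_cpu is added before the division, so it is also
 * divided by busy_cpus when busy_cpus > 1.
 */
static void diff_scores(long sum_score, long min_score, int busy_cpus,
			long score_on_dst_cpu, long *before, long *after)
{
	assert(busy_cpus > 0);

	*before = sum_score;
	*after = *before - min_score + score_on_dst_cpu;

	if (busy_cpus > 1) {
		*before /= busy_cpus;
		*after /= busy_cpus;
	}
}
```

With two busy siblings scoring 200 and 400 (sum_score = 600, min_score = 200) and score_on_dst_cpu = 500, this yields before = 300 and after = (600 - 200 + 500) / 2 = 450.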

> 
> So I might buy an argument for:
> 
> 	if (busy_cpus > 1) {
> 		before /= (busy_cpus-1);
> 		after  /= (busy_cpus+1);
> 	}
> 
> Or something along those lines (where you remove an entry from before
> and add it to after), but not this.

The entry that we remove from 'before' goes to 'after'. It will be placed
in the local group, which has a different busy_cpus: all of its SMT
siblings, if any, are idle (otherwise, we would not be here). Its
busy_cpus will become 1 after the balance.
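
Under the model described above (busy SMT siblings share the core evenly, and the pulled task runs alone on the idle dst_cpu), 'before' and 'after' are both totals for two CPUs. A minimal sketch, assuming long scores; two_cpu_throughput() and struct score_pair are hypothetical names, and the formulas mirror the lines removed by the quoted diff:

```c
#include <assert.h>

/* Hypothetical container for the two totals being compared. */
struct score_pair {
	long before;
	long after;
};

/*
 * Combined throughput score of the busiest group and the idle
 * dst_cpu, before and after dst_cpu pulls the task with the lowest
 * score. Assumes busy siblings share the core evenly.
 */
static struct score_pair
two_cpu_throughput(long sum_score, long min_score, int busy_cpus,
		   long score_on_dst_cpu)
{
	struct score_pair s;

	assert(busy_cpus > 0);

	/* Before: the group's share of the core; dst_cpu is idle and adds 0. */
	s.before = sum_score / busy_cpus;

	/* After: the pulled task runs alone on dst_cpu, so it is not scaled. */
	s.after = score_on_dst_cpu;

	/* Remaining busy siblings, if any, share the core among busy_cpus - 1. */
	if (busy_cpus > 1)
		s.after += (sum_score - min_score) / (busy_cpus - 1);

	return s;
}
```

For two busy siblings scoring 200 and 400 and score_on_dst_cpu = 500, this gives before = 300 and after = 400 + 500 = 900. With a single busy CPU (sum_score = 300), it gives before = 300 and after = 500, matching the removed busy_cpus == 1 special case.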
