From: Morten Rasmussen <morten.rasmussen@arm.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: "peterz@infradead.org" <peterz@infradead.org>,
	"mingo@kernel.org" <mingo@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"preeti@linux.vnet.ibm.com" <preeti@linux.vnet.ibm.com>,
	"kamalesh@linux.vnet.ibm.com" <kamalesh@linux.vnet.ibm.com>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"riel@redhat.com" <riel@redhat.com>,
	"efault@gmx.de" <efault@gmx.de>,
	"nicolas.pitre@linaro.org" <nicolas.pitre@linaro.org>,
	"linaro-kernel@lists.linaro.org" <linaro-kernel@lists.linaro.org>
Subject: Re: [PATCH v9 05/10] sched: make scale_rt invariant with frequency
Date: Mon, 24 Nov 2014 17:05:02 +0000
Message-ID: <20141124170502.GK23177@e105550-lin.cambridge.arm.com>
In-Reply-To: <CAKfTPtAo3PzZ=-KtH-YS2nf9R9srMGAUJ2A2qkHrsPZmj18-Jw@mail.gmail.com>

On Mon, Nov 24, 2014 at 02:24:00PM +0000, Vincent Guittot wrote:
> On 21 November 2014 at 13:35, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
> > On Mon, Nov 03, 2014 at 04:54:42PM +0000, Vincent Guittot wrote:
> 
> [snip]
> 
> >> The average running time of RT tasks is used to estimate the remaining compute
> >> @@ -5801,19 +5801,12 @@ static unsigned long scale_rt_capacity(int cpu)
> >>
> >>       total = sched_avg_period() + delta;
> >>
> >> -     if (unlikely(total < avg)) {
> >> -             /* Ensures that capacity won't end up being negative */
> >> -             available = 0;
> >> -     } else {
> >> -             available = total - avg;
> >> -     }
> >> +     used = div_u64(avg, total);
> >
> > I haven't looked through all the details of the rt avg tracking, but if
> > 'used' is in the range [0..SCHED_CAPACITY_SCALE], I believe it should
> > work. Is it guaranteed that total > 0 so we don't get division by zero?
> 
> static inline u64 sched_avg_period(void)
> {
>         return (u64)sysctl_sched_time_avg * NSEC_PER_MSEC / 2;
> }
>

I see.

> >
> > It does get slightly more complicated if we want to figure out the
> > available capacity at the current frequency (current < max) later. Say,
> > rt eats 25% of the compute capacity, but the current frequency is only
> > 50%. In that case we get:
> >
> > curr_avail_capacity = (arch_scale_cpu_capacity() *
> >   (arch_scale_freq_capacity() - (SCHED_CAPACITY_SCALE - scale_rt_capacity())))
> >   >> SCHED_CAPACITY_SHIFT
> 
> You don't have to be so complicated but simply need to do:
> curr_avail_capacity for CFS = (capacity_of(CPU) *
> arch_scale_freq_capacity())  >> SCHED_CAPACITY_SHIFT
> 
> capacity_of(CPU) = 600 is the max available capacity for CFS tasks
> once we have removed the 25% of capacity that is used by RT tasks
> arch_scale_freq_capacity = 512 because we currently run at 50% of max freq
> 
> so curr_avail_capacity for CFS = 300

I don't think that is correct. It is at least not what I had in mind.

capacity_orig_of(cpu) = 800, we run at 50% frequency which means:

curr_capacity = capacity_orig_of(cpu) * arch_scale_freq_capacity()
                  >> SCHED_CAPACITY_SHIFT
              = 400

So the total capacity at the current frequency (50%) is 400, without
considering RT. scale_rt_capacity() is frequency invariant, so it takes
away capacity_orig_of(cpu) - capacity_of(cpu) = 200 worth of capacity
for RT.  We need to subtract that from the current capacity to get the
available capacity at the current frequency.

curr_available_capacity = curr_capacity - (capacity_orig_of(cpu) -
capacity_of(cpu)) = 200

In other words, 800 is the max capacity, we are currently running at 50%
frequency, which gives us 400. RT takes away 25% of 800
(frequency-invariant) from the 400, which leaves us with 200 left for
CFS tasks at the current frequency.
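
As a rough sketch of that arithmetic: curr_available_capacity() below is
a made-up helper, not a kernel function. Its arguments stand in for
capacity_orig_of(cpu), capacity_of(cpu) and arch_scale_freq_capacity(),
and the clamp at zero is an assumption added so the unsigned result
cannot wrap.

```c
#include <assert.h>

#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1UL << SCHED_CAPACITY_SHIFT)

/*
 * Hypothetical helper, for illustration only.
 *
 * capacity_orig: max capacity of the cpu, like capacity_orig_of(cpu)
 * capacity:      capacity left for CFS at max frequency, like capacity_of(cpu)
 * freq_scale:    current/max frequency in [0..SCHED_CAPACITY_SCALE]
 */
static unsigned long curr_available_capacity(unsigned long capacity_orig,
					     unsigned long capacity,
					     unsigned long freq_scale)
{
	unsigned long curr_capacity, rt_capacity;

	assert(capacity <= capacity_orig);
	assert(freq_scale <= SCHED_CAPACITY_SCALE);

	/* Total capacity at the current frequency, RT not considered. */
	curr_capacity = (capacity_orig * freq_scale) >> SCHED_CAPACITY_SHIFT;

	/* Frequency-invariant capacity consumed by RT. */
	rt_capacity = capacity_orig - capacity;

	/* Assumption: report 0 when RT needs more than is available. */
	if (rt_capacity >= curr_capacity)
		return 0;

	return curr_capacity - rt_capacity;
}
```

With the numbers above, curr_available_capacity(800, 600, 512) gives
(800 * 512 >> 10) - (800 - 600) = 400 - 200 = 200.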

In your calculations you subtract the RT load before computing the
current capacity using arch_scale_freq_capacity(), whereas I think it
should be done after. You find the amount of spare capacity you would
have at the maximum frequency once RT has been subtracted and then scale
the result by frequency. That indirectly scales the RT load contribution
again (the rt avg has already been scaled). So instead of taking away
200 of the 400 (current capacity @ 50% frequency), it only takes away
100, which isn't right.

scale_rt_capacity() is frequency-invariant, so if the RT load is 50% and
the frequency is 50%, there are no spare cycles left.
curr_avail_capacity should be 0. But using your expression above you
would get capacity_of(cpu) = 400 after removing RT,
arch_scale_freq_capacity = 512 and you get 200. I don't think that is
right.
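
To make the double scaling visible, here is a small sketch of what the
quoted expression effectively removes for RT. Both function names are
invented for illustration; the arguments stand in for
capacity_orig_of(cpu), capacity_of(cpu) and arch_scale_freq_capacity().

```c
#define SCHED_CAPACITY_SHIFT	10

/* The quoted expression: scale capacity_of(cpu) by the current frequency. */
static unsigned long scaled_cfs_capacity(unsigned long capacity,
					 unsigned long freq_scale)
{
	return (capacity * freq_scale) >> SCHED_CAPACITY_SHIFT;
}

/*
 * Capacity that the expression above effectively takes away for RT at
 * the current frequency. Assumes capacity <= capacity_orig.
 */
static unsigned long rt_removed_by_scaling(unsigned long capacity_orig,
					   unsigned long capacity,
					   unsigned long freq_scale)
{
	unsigned long curr_capacity;

	curr_capacity = (capacity_orig * freq_scale) >> SCHED_CAPACITY_SHIFT;

	return curr_capacity - scaled_cfs_capacity(capacity, freq_scale);
}
```

For the 25% case, rt_removed_by_scaling(800, 600, 512) is 400 - 300 =
100, where the frequency-invariant RT load is 200. For the 50% case,
rt_removed_by_scaling(800, 400, 512) is 400 - 200 = 200, where RT
actually consumes all 400.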

Morten
