From: Patrick Bellasi <patrick.bellasi@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
	Ingo Molnar <mingo@redhat.com>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Paul Turner <pjt@google.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Juri Lelli <juri.lelli@redhat.com>, Todd Kjos <tkjos@android.com>,
	Joel Fernandes <joelaf@google.com>
Subject: Re: [PATCH v2 2/4] sched/fair: add util_est on top of PELT
Date: Fri, 15 Dec 2017 15:22:25 +0000
Message-ID: <20171215152225.GD19821@e110439-lin>
In-Reply-To: <20171215140751.3ajilhsmj4zhzhzy@hirez.programming.kicks-ass.net>

On 15-Dec 15:07, Peter Zijlstra wrote:
> On Fri, Dec 15, 2017 at 02:02:18PM +0000, Patrick Bellasi wrote:
> > On 13-Dec 17:05, Peter Zijlstra wrote:
> > > On Tue, Dec 05, 2017 at 05:10:16PM +0000, Patrick Bellasi wrote:
> > > > +	if (cfs_rq->nr_running > 0) {
> > > > +		util_est  = cfs_rq->util_est_runnable;
> > > > +		util_est -= task_util_est(p);
> > > > +		if (util_est < 0)
> > > > +			util_est = 0;
> > > > +		cfs_rq->util_est_runnable = util_est;
> > > > +	} else {
> > > 
> > > I'm thinking that's an explicit load-store to avoid intermediate values
> > > landing in cfs_rq->util_est_runnable, right?
> > 
> > Was mainly to have an unsigned util_est for the following "sub"...
> > 
> > 
> > > That would need READ_ONCE() / WRITE_ONCE() I think, without that the
> > > compiler is free to munge the lot together.
> > 
> > ... do we still need the {READ,WRITE}_ONCE() in this case?
> > I guess adding them does not hurt, though.
> 

This is just so that I understand it better...

> I think the compiler is free to munge it into something like:
> 
> 	cfs_rq->util_est_runnable -= task_util_est(p);
> 	if (cfs_rq->util_est_runnable < 0)
> 		cfs_rq->util_est_runnable = 0;
> 

I'm still confused. We have:

            long util_est
   unsigned long cfs_rq->util_est_runnable

The optimization above could make the unsigned value wrap around, couldn't it?
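
To make the signedness question concrete, here is a minimal user-space
sketch (not kernel code). The struct and function names are hypothetical
stand-ins; only the util_est_runnable field and the clamp sequence come
from the patch. It contrasts the clamped result with the value that
would transiently sit in the unsigned field if the subtraction were done
in place:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the single cfs_rq field discussed here. */
struct cfs_rq_sketch {
	unsigned long util_est_runnable;
};

/*
 * The form from the patch: compute in a signed local, clamp, then
 * store once. Converting an out-of-range unsigned long to long is
 * implementation-defined in ISO C; GCC and Clang wrap modulo 2^N,
 * which is the behaviour assumed here.
 */
static void dequeue_clamped(struct cfs_rq_sketch *rq, unsigned long task_est)
{
	long util_est;

	assert(rq);
	util_est  = rq->util_est_runnable;
	util_est -= task_est;
	if (util_est < 0)
		util_est = 0;
	rq->util_est_runnable = util_est;
}

/*
 * The value that would be visible in memory between the subtraction
 * and the clamp if the subtraction were applied directly to the
 * unsigned field.
 */
static unsigned long unclamped_intermediate(unsigned long runnable,
					    unsigned long task_est)
{
	return runnable - task_est;	/* unsigned: wraps, never negative */
}
```

With runnable = 100 and task_est = 300, the clamped form stores 0, while
the in-place intermediate is ULONG_MAX - 199, which is the value a
lockless reader could observe.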

> and it's a fairly simple optimization at that, it just needs to observe
> that util_est is an alias for cfs_rq->util_est_runnable.

Since the first is signed and the last unsigned, can the compiler still
consider them aliases?

At least on ARM I would expect a load of an unsigned value, some
computations on "signed registers", and finally a store of an unsigned
value. This is what I get:


        if (cfs_rq->nr_running > 0) {
    51e4:       91020000        add     x0, x0, #0x80
    51e8:       b9401802        ldr     w2, [x0,#24]
    51ec:       340004a2        cbz     w2, 5280 <dequeue_task_fair+0xb20>
	// skip branch for nr_running == 0

        return max(p->util_est.ewma, p->util_est.last);
    51f0:       f9403ba2        ldr     x2, [x29,#112]
    51f4:       f9418044        ldr     x4, [x2,#768]
    51f8:       f9418443        ldr     x3, [x2,#776]
	// x3 := p->util_est.ewma
	// x4 := p->util_est.last

                util_est -= task_util_est(p);
    51fc:       f9405002        ldr     x2, [x0,#160]
	// x2 := cfs_rq->util_est_runnable

        return max(p->util_est.ewma, p->util_est.last);
    5200:       eb04007f        cmp     x3, x4
    5204:       9a842063        csel    x3, x3, x4, cs
	// x3 := task_util_est(p) := max(p->util_est.ewma, p->util_est.last)

                cfs_rq->util_est_runnable = util_est;
    5208:       eb030042        subs    x2, x2, x3
	// x2 := util_est -= task_util_est(p);

    520c:       9a9f5042        csel    x2, x2, xzr, pl
	// x2 := max(0, util_est)

    5210:       f9005002        str     x2, [x0,#160]
	// store back into cfs_rq->util_est_runnable


And by adding {READ,WRITE}_ONCE I still get the same code.

When compiling for x86, however, I get two different versions; here is
the one without {READ,WRITE}_ONCE:

       if (cfs_rq->nr_running > 0) {
    3e3e:       8b 90 98 00 00 00       mov    0x98(%rax),%edx
    3e44:       85 d2                   test   %edx,%edx
    3e46:       0f 84 e0 00 00 00       je     3f2c <dequeue_task_fair+0xf9c>
                util_est  = cfs_rq->util_est_runnable;
                util_est -= task_util_est(p);
    3e4c:       48 8b 74 24 28          mov    0x28(%rsp),%rsi
    3e51:       48 8b 96 80 02 00 00    mov    0x280(%rsi),%rdx
    3e58:       48 39 96 88 02 00 00    cmp    %rdx,0x288(%rsi)
    3e5f:       48 0f 43 96 88 02 00    cmovae 0x288(%rsi),%rdx
    3e66:       00 
                if (util_est < 0)
                        util_est = 0;
                cfs_rq->util_est_runnable = util_est;
    3e67:       48 8b b0 20 01 00 00    mov    0x120(%rax),%rsi
    3e6e:       48 29 d6                sub    %rdx,%rsi
    3e71:       48 89 f2                mov    %rsi,%rdx
    3e74:       be 00 00 00 00          mov    $0x0,%esi
    3e79:       48 0f 48 d6             cmovs  %rsi,%rdx
    3e7d:       48 89 90 20 01 00 00    mov    %rdx,0x120(%rax)

but I'm not confident I'm parsing it correctly...


> Using the volatile load/store completely destroys that optimization
> though, so yes, I'd say it's definitely needed.

Ok, since it's definitely not harmful, I'll add it.
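
For illustration only, the annotated sequence could look roughly like
the user-space sketch below. The two macros are simplified stand-ins for
the kernel's READ_ONCE()/WRITE_ONCE() (the real ones are more
elaborate), they rely on the GNU __typeof__ extension, and the struct
and function names are hypothetical:

```c
#include <assert.h>

/* Simplified stand-ins: force one volatile access of the object's type. */
#define READ_ONCE_SKETCH(x)	(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical stand-in for the single cfs_rq field discussed here. */
struct cfs_rq_sketch {
	unsigned long util_est_runnable;
};

static void util_est_dequeue_sketch(struct cfs_rq_sketch *rq,
				    unsigned long task_est)
{
	long util_est;

	assert(rq);

	/* Single load: the field is read once into the signed local. */
	util_est  = READ_ONCE_SKETCH(rq->util_est_runnable);
	util_est -= task_est;
	if (util_est < 0)
		util_est = 0;

	/* Single store: only the clamped value ever reaches memory. */
	WRITE_ONCE_SKETCH(rq->util_est_runnable, util_est);
}
```

Because both accesses are volatile, the compiler has to keep the
arithmetic and the clamp in the local variable, so it cannot fold them
into read-modify-write operations on the field itself.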

-- 
#include <best/regards.h>

Patrick Bellasi
