From: Chris Wilson <chris@chris-wilson.co.uk>
To: Peter Zijlstra <peterz@infradead.org>,
	"Josef Bacik" <josef@toxicpanda.com>
Cc: "Rik van Riel" <riel@redhat.com>,
	linux-kernel@vger.kernel.org, jhladky@redhat.com,
	mingo@kernel.org, mgorman@suse.de
Subject: Re: [PATCH] sched/fair: Fix wake_affine() for !NUMA_BALANCING
Date: Thu, 24 Aug 2017 23:29:55 +0100
Message-ID: <150361379597.22505.2216754249003350303@mail.alporthouse.com>
In-Reply-To: <20170801214307.ig62jhe7smtisvlr@hirez.programming.kicks-ass.net>

We've stumbled across the same regression on Broxton/Apollolake (J3455).

Quoting Peter Zijlstra (2017-08-01 22:43:07)
> On Tue, Aug 01, 2017 at 03:26:51PM -0400, Josef Bacik wrote:
> > > @@ -7574,6 +7607,18 @@ static inline void update_sd_lb_stats(st
> > >                     env->dst_rq->rd->overload = overload;
> > >     }
> > >  
> > > +   /*
> > > +    * Since these are sums over groups, they can count some CPUs
> > > +    * multiple times for the NUMA domains.
> > > +    *
> > > +    * Currently only wake_affine_llc() and find_busiest_group()
> > > +    * use these numbers, and only the latter is affected by this problem.
> > > +    *
> > > +    * XXX fix that.
> > > +    */
> > > +   WRITE_ONCE(shared->nr_running,  sds->total_running);
> > > +   WRITE_ONCE(shared->load,        sds->total_load);
> > > +   WRITE_ONCE(shared->capacity,    sds->total_capacity);
> > 
> > This panics on boot for me because shared is NULL.  The same happens in
> > select_task_rq_fair when it tries to do the READ_ONCE.  Here is my .config in
> > case it's something strange with my config.  Thanks,
> 
> Nah, it's just me being an idiot and booting the wrong kernel. Unless I
> messed up again, this one boots.
> 
> There is state during boot and hotplug where there are no domains, and
> thus also no shared. Simply ignoring things when that happens should be
> good enough, I think.
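For reference, the guard Peter describes amounts to something like the
sketch below (assuming the usual sd_llc_shared/sd_llc_size per-cpu
variables; the actual hunk is not quoted in this reply):

	static void get_llc_stats(struct llc_stats *stats, int cpu)
	{
		struct sched_domain_shared *sds =
			rcu_dereference(per_cpu(sd_llc_shared, cpu));

		/* No domains yet (early boot / hotplug): report an empty LLC */
		if (!sds) {
			memset(stats, 0, sizeof(*stats));
			return;
		}

		stats->nr_running   = READ_ONCE(sds->nr_running);
		stats->load         = READ_ONCE(sds->load);
		stats->capacity     = READ_ONCE(sds->capacity);
		stats->has_capacity = stats->nr_running < per_cpu(sd_llc_size, cpu);
	}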

This is still not as effective as the previous code at spreading load
across siblings.

> +/*
> + * Can a task be moved from prev_cpu to this_cpu without causing a load
> + * imbalance that would trigger the load balancer?
> + *
> + * Since we're running on 'stale' values, we might in fact create an
> + * imbalance, but recomputing these values is expensive, as that would mean
> + * iterating over two cache domains' worth of CPUs.
> + */
> +static bool
> +wake_affine_llc(struct sched_domain *sd, struct task_struct *p,
> +               int this_cpu, int prev_cpu, int sync)
> +{
> +       struct llc_stats prev_stats, this_stats;
> +       s64 this_eff_load, prev_eff_load;
> +       unsigned long task_load;
> +
> +       get_llc_stats(&prev_stats, prev_cpu);
> +       get_llc_stats(&this_stats, this_cpu);
> +
> +       /*
> +        * If sync wakeup then subtract the (maximum possible)
> +        * effect of the currently running task from the load
> +        * of the current LLC.
> +        */
> +       if (sync) {
> +               unsigned long current_load = task_h_load(current);
> +
> +               /* in this case load hits 0 and this LLC is considered 'idle' */
> +               if (current_load > this_stats.load)
> +                       return true;
> +
> +               this_stats.load -= current_load;
> +       }
> +
> +       /*
> +        * The has_capacity stuff is not SMT aware, but by trying to balance
> +        * the nr_running on both ends we try to fill the domain at equal
> +        * rates, thereby consuming full cores before SMT siblings.
> +        */
> +
> +       /* if the old cache has capacity, stay there */
> +       if (prev_stats.has_capacity && prev_stats.nr_running < this_stats.nr_running+1)
> +               return false;
> +
> +       /* if this cache has capacity, come here */
> +       if (this_stats.has_capacity && this_stats.nr_running < prev_stats.nr_running+1)

I think you mean,

	if (this_stats.has_capacity && this_stats.nr_running + 1 < prev_stats.nr_running)

and with that our particular graphics benchmark behaves much as it did
before (the regression appears fixed). But I'll let Eero double-check.
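
For example (numbers picked purely for illustration): with
this_stats.nr_running == prev_stats.nr_running == 4, the quoted test
(4 < 5) returns true and pulls the waking task over even though both
caches are equally busy, which matches the over-packing we see; the
stricter test (5 < 4) fails and falls through to the effective-load
comparison below instead.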

> +               return true;
> +
> +       /*
> +        * Check to see if we can move the load without causing too much
> +        * imbalance.
> +        */
> +       task_load = task_h_load(p);
> +
> +       this_eff_load = 100;
> +       this_eff_load *= prev_stats.capacity;
> +
> +       prev_eff_load = 100 + (sd->imbalance_pct - 100) / 2;
> +       prev_eff_load *= this_stats.capacity;
> +
> +       this_eff_load *= this_stats.load + task_load;
> +       prev_eff_load *= prev_stats.load - task_load;
> +
> +       return this_eff_load <= prev_eff_load;
> +}
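
As a worked example of that last comparison (taking sd->imbalance_pct ==
117 purely as an illustrative value): note the capacities cross-multiply,
so with equal capacities on both ends they cancel, prev_eff_load scales by
100 + (117 - 100)/2 = 108, and the move is allowed only while

	100 * (this_stats.load + task_load) <= 108 * (prev_stats.load - task_load)

i.e. the waking LLC may end up at most ~8% heavier than the previous one
after the task moves.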

