From: Chris Wilson <chris@chris-wilson.co.uk>
To: Peter Zijlstra <peterz@infradead.org>,
	"Josef Bacik" <josef@toxicpanda.com>
Cc: "Rik van Riel" <riel@redhat.com>,
	linux-kernel@vger.kernel.org, jhladky@redhat.com,
	mingo@kernel.org, mgorman@suse.de
Subject: Re: [PATCH] sched/fair: Fix wake_affine() for !NUMA_BALANCING
Date: Thu, 24 Aug 2017 23:29:55 +0100
Message-ID: <150361379597.22505.2216754249003350303@mail.alporthouse.com>
In-Reply-To: <20170801214307.ig62jhe7smtisvlr@hirez.programming.kicks-ass.net>

We've stumbled across the same regression on Broxton/Apollolake (J3455).

Quoting Peter Zijlstra (2017-08-01 22:43:07)
> On Tue, Aug 01, 2017 at 03:26:51PM -0400, Josef Bacik wrote:
> > > @@ -7574,6 +7607,18 @@ static inline void update_sd_lb_stats(st
> > >                     env->dst_rq->rd->overload = overload;
> > >     }
> > >  
> > > +   /*
> > > +    * Since these are sums over groups they can contain some CPUs
> > > +    * multiple times for the NUMA domains.
> > > +    *
> > > +    * Currently only wake_affine_llc() and find_busiest_group()
> > > +    * use these numbers; only the latter is affected by this problem.
> > > +    *
> > > +    * XXX fix that.
> > > +    */
> > > +   WRITE_ONCE(shared->nr_running,  sds->total_running);
> > > +   WRITE_ONCE(shared->load,        sds->total_load);
> > > +   WRITE_ONCE(shared->capacity,    sds->total_capacity);
> > 
> > This panics on boot for me because shared is NULL.  The same happens in
> > select_task_rq_fair when it tries to do the READ_ONCE.  Here is my .config in
> > case it's something strange with my config.  Thanks,
> 
> Nah, it's just me being an idiot and booting the wrong kernel. Unless I
> messed up again, this one boots.
> 
> There is state during boot and hotplug where there are no domains, and
> thus also no shared. Simply ignoring things when that happens should be
> good enough I think.
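
For the archive: I'm assuming the guard lands on the read side too,
roughly like the below (my reconstruction from the description, not
the actual hunk):

static void get_llc_stats(struct llc_stats *stats, int cpu)
{
        struct sched_domain_shared *sds =
                rcu_dereference(per_cpu(sd_llc_shared, cpu));

        /* no domains yet (boot/hotplug): report an empty LLC */
        if (!sds) {
                memset(stats, 0, sizeof(*stats));
                return;
        }

        stats->nr_running   = READ_ONCE(sds->nr_running);
        stats->load         = READ_ONCE(sds->load);
        stats->capacity     = READ_ONCE(sds->capacity);
        stats->has_capacity = stats->nr_running < per_cpu(sd_llc_size, cpu);
}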

This is still not as effective as the previous code at spreading tasks
across siblings.

> +/*
> + * Can a task be moved from prev_cpu to this_cpu without causing a load
> + * imbalance that would trigger the load balancer?
> + *
> + * Since we're running on 'stale' values, we might in fact create an imbalance
> + * but recomputing these values is expensive, as that'd mean iterating over
> + * 2 cache domains' worth of CPUs.
> + */
> +static bool
> +wake_affine_llc(struct sched_domain *sd, struct task_struct *p,
> +               int this_cpu, int prev_cpu, int sync)
> +{
> +       struct llc_stats prev_stats, this_stats;
> +       s64 this_eff_load, prev_eff_load;
> +       unsigned long task_load;
> +
> +       get_llc_stats(&prev_stats, prev_cpu);
> +       get_llc_stats(&this_stats, this_cpu);
> +
> +       /*
> +        * If sync wakeup then subtract the (maximum possible)
> +        * effect of the currently running task from the load
> +        * of the current LLC.
> +        */
> +       if (sync) {
> +               unsigned long current_load = task_h_load(current);
> +
> +               /* in this case load hits 0 and this LLC is considered 'idle' */
> +               if (current_load > this_stats.load)
> +                       return true;
> +
> +               this_stats.load -= current_load;
> +       }
> +
> +       /*
> +        * The has_capacity stuff is not SMT aware, but by trying to balance
> +        * the nr_running on both ends we try and fill the domain at equal
> +        * rates, thereby first consuming cores before siblings.
> +        */
> +
> +       /* if the old cache has capacity, stay there */
> +       if (prev_stats.has_capacity && prev_stats.nr_running < this_stats.nr_running+1)
> +               return false;
> +
> +       /* if this cache has capacity, come here */
> +       if (this_stats.has_capacity && this_stats.nr_running < prev_stats.nr_running+1)

I think you mean,

	if (this_stats.has_capacity && this_stats.nr_running + 1 < prev_stats.nr_running)

and with that our particular graphics benchmark behaves much as it did
before (the regression appears fixed). But I'll let Eero double check.
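
To make the off-by-one concrete: with one runner on each LLC,
prev_stats.nr_running == this_stats.nr_running == 1, the posted test
still pulls the wakeup. A throwaway userspace check of the two
predicates (standalone, made-up values):

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
        unsigned long prev_nr = 1, this_nr = 1; /* one task per LLC */

        /* as posted: pulls even though both LLCs are equally busy */
        bool posted = this_nr < prev_nr + 1;

        /* as suggested: pull only if, counting the wakee, this LLC
         * still runs fewer tasks than prev */
        bool suggested = this_nr + 1 < prev_nr;

        printf("posted=%d suggested=%d\n", posted, suggested);
        return 0;
}

prints posted=1 suggested=0, i.e. the posted test keeps stacking
wakeups onto the already-occupied cache instead of leaving them
spread across siblings.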

> +               return true;
> +
> +       /*
> +        * Check to see if we can move the load without causing too much
> +        * imbalance.
> +        */
> +       task_load = task_h_load(p);
> +
> +       this_eff_load = 100;
> +       this_eff_load *= prev_stats.capacity;
> +
> +       prev_eff_load = 100 + (sd->imbalance_pct - 100) / 2;
> +       prev_eff_load *= this_stats.capacity;
> +
> +       this_eff_load *= this_stats.load + task_load;
> +       prev_eff_load *= prev_stats.load - task_load;
> +
> +       return this_eff_load <= prev_eff_load;
> +}
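
For anyone else staring at the tail: with equal capacities the final
comparison reduces to

  100 * (this_load + task_load) <=
      (100 + (imbalance_pct - 100) / 2) * (prev_load - task_load)

i.e. the imbalance factor gives the pull some slack: we migrate unless
this LLC would end up more than ~8% busier (capacity-relative) than
prev. A quick standalone check with made-up numbers (117 is only an
illustrative imbalance_pct, not necessarily what this domain uses):

#include <stdio.h>

int main(void)
{
        long capacity = 1024;     /* equal on both LLCs */
        long imbalance_pct = 117; /* illustrative only */
        long this_load = 800, prev_load = 1000, task_load = 150;

        /* cross-multiplied by the other side's capacity, as above */
        long this_eff = 100 * capacity * (this_load + task_load);
        long prev_eff = (100 + (imbalance_pct - 100) / 2) *
                        capacity * (prev_load - task_load);

        printf("this_eff=%ld prev_eff=%ld -> %s\n", this_eff, prev_eff,
               this_eff <= prev_eff ? "migrate" : "stay");
        return 0;
}

which prints "stay": 100 * 950 > 108 * 850, so the wakeup remains on
prev_cpu even though this LLC currently carries less load.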
