From: Rik van Riel <riel@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, jhladky@redhat.com,
	mingo@kernel.org, mgorman@suse.de
Subject: Re: [PATCH 4/4] sched,fair: remove effective_load
Date: Tue, 27 Jun 2017 10:55:58 -0400
Message-ID: <1498575358.20270.114.camel@redhat.com>
In-Reply-To: <20170627053906.GA7287@worktop>

On Tue, 2017-06-27 at 07:39 +0200, Peter Zijlstra wrote:
> On Mon, Jun 26, 2017 at 03:34:49PM -0400, Rik van Riel wrote:
> > On Mon, 2017-06-26 at 18:12 +0200, Peter Zijlstra wrote:
> > > On Mon, Jun 26, 2017 at 11:20:54AM -0400, Rik van Riel wrote:
> > > 
> > > > Oh, indeed.  I guess in wake_affine() we should test
> > > > whether the CPUs are in the same NUMA node, rather than
> > > > doing cpus_share_cache() ?
> > > 
> > > Well, since select_idle_sibling() is on LLC; the early test on
> > > cpus_share_cache(prev,this) seems to actually make sense.
> > > 
> > > But then cutting out all the other bits seems wrong. Not in the
> > > least because !NUMA_BALANCING should also still keep working.
> > 
> > Even when !NUMA_BALANCING, I suspect it makes little sense
> > to compare the loads just on the cores in question, since
> > select_idle_sibling() will likely move the task somewhere
> > else.
> > 
> > I suspect we want to compare the load on the whole LLC
> > for that reason, even with NUMA_BALANCING disabled.
> 
> But we don't have that data around :/ One thing we could do is try and
> keep a copy of the last s*_lb_stats around in the sched_domain_shared
> stuff or something and try and use that.
> 
> That way we can strictly keep things at the LLC level and not confuse
> things with NUMA.
> 
> Similarly, we could use that same data to then avoid re-computing things
> for the NUMA domain as well and do away with numa_stats.

That does seem like a useful optimization, though
I guess we would have to invalidate the cached data
every time we actually move a task?
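Something like the following user-space sketch of that caching idea, with
invalidation on task movement (all names here are hypothetical; in the
kernel the cache would presumably hang off sched_domain_shared and the
invalidation off the migration path):

```c
#include <assert.h>

/* Hypothetical cached copy of the last load-balance statistics. */
struct cached_stats {
	unsigned long load;
	unsigned long capacity;
	int valid;		/* cleared whenever a task moves */
};

static struct cached_stats llc_stats;

/* Recomputing would walk the CPUs; modeled here as fixed values. */
static void recompute_stats(struct cached_stats *cs)
{
	cs->load = 1024;
	cs->capacity = 2048;
	cs->valid = 1;
}

/* Invalidate the cache on task migration, as discussed above. */
static void on_task_move(struct cached_stats *cs)
{
	cs->valid = 0;
}

/* Consumers recompute lazily, only when the cache is stale. */
static unsigned long get_load(struct cached_stats *cs)
{
	if (!cs->valid)
		recompute_stats(cs);
	return cs->load;
}
```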

The current code simply walks all the CPUs in the
cpumask_t, and adds up capacity and load. The cost
of doing that appears to be lower than the cost of
poor task placement (Jirka's numbers speak for
themselves), but optimizing this code path does seem
like a worthwhile goal.
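That walk amounts to something like this stand-alone model (the struct
and function names here are illustrative, not the kernel's, which does
the equivalent in update_numa_stats() over cpumask_of_node()):

```c
#define NR_CPUS 8

struct cpu_stat {
	unsigned long load;
	unsigned long capacity;
};

struct group_stats {
	unsigned long load;
	unsigned long compute_capacity;
};

/* Sum load and capacity across all CPUs whose bit is set in @mask. */
static void sum_group_stats(struct group_stats *ns,
			    const struct cpu_stat cpus[],
			    unsigned int mask)
{
	ns->load = 0;
	ns->compute_capacity = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (mask & (1u << cpu)) {
			ns->load += cpus[cpu].load;
			ns->compute_capacity += cpus[cpu].capacity;
		}
	}
}
```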

I'll look into it.

> > > > Or, alternatively, have an update_numa_stats() variant
> > > > for numa_wake_affine() that works on the LLC level?
> > > 
> > > I think we want to retain the existing behaviour for everything
> > > larger than LLC, and when NUMA_BALANCING, smaller than NUMA.
> > 
> > What do you mean by this, exactly?
> 
> As you noted, when prev and this are in the same LLC, it doesn't matter
> and select_idle_sibling() will do its thing. So anything smaller than
> the LLC need not do anything.
> 
> When NUMA_BALANCING we have the numa_stats thing and we can, as you
> propose use that.
> 
> If LLC < NUMA or !NUMA_BALANCING we have a region that needs to do
> _something_.
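So the three regions you describe come down to something like this
rough encoding (names made up; in reality the middle case would also
check whether prev and this fall on the same NUMA node, rather than
taking a bare flag):

```c
enum wa_region {
	WA_LLC,		/* same LLC: select_idle_sibling() handles it  */
	WA_NUMA_STATS,	/* NUMA_BALANCING: numa_stats can be used      */
	WA_GAP,		/* LLC < NUMA or !NUMA_BALANCING: needs work   */
};

/* Classify a wakeup by which mechanism can cover it. */
static enum wa_region classify(int share_llc, int numa_balancing)
{
	if (share_llc)
		return WA_LLC;
	if (numa_balancing)
		return WA_NUMA_STATS;
	return WA_GAP;
}
```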

Agreed. I will fix this. Given that this is a bit
of a corner case, I guess I can fix this with follow-up
patches, to be merged into -tip before the whole series
is sent on to Linus?

> > > Also note that your use of task_h_load() in the new numa thing
> > > suffers
> > > from exactly the problem effective_load() is trying to solve.
> > 
> > Are you saying task_h_load is wrong in task_numa_compare()
> > too, then?  Should both use effective_load()?
> 
> I need more than the few minutes I currently have, but probably. The
> question is of course, how much does it matter and how painful will it
> be to do it better.

I suspect it does not matter at all currently, since the
load balancing code does not use effective_load, and
having the wake_affine logic calculate things differently
from the load balancer is likely to result in both pieces
of code fighting against each other.

I suspect we should either use task_h_load everywhere,
or effective_load everywhere, but not have a mix and
match situation where one is used in some places, and
the other in others.
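For reference, a toy model of the distinction being discussed:
task_h_load() scales a task's weight down through its ancestor group
shares as a static snapshot, while effective_load() tries to account
for how those shares shift once the task actually moves. Sketching just
the snapshot side (hypothetical user-space arithmetic, not the kernel's
actual computation):

```c
/*
 * Toy hierarchical task load: scale the task's weight by its group's
 * share of the total weight at each level of the hierarchy.  This is
 * a static snapshot -- it ignores that moving the task changes the
 * weights themselves, which is what effective_load() tries to
 * correct for.
 */
static unsigned long toy_h_load(unsigned long task_weight,
				const unsigned long grp_weight[],
				const unsigned long grp_total[],
				int depth)
{
	unsigned long load = task_weight;

	for (int i = 0; i < depth; i++)
		load = load * grp_weight[i] / grp_total[i];

	return load;
}
```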

-- 
All rights reversed

Thread overview: 27+ messages
2017-06-23 16:55 [PATCH 0/4] NUMA improvements with task wakeup and load balancing riel
2017-06-23 16:55 ` [PATCH 1/4] sched,numa: override part of migrate_degrades_locality when idle balancing riel
2017-06-24  6:58   ` Ingo Molnar
2017-06-24 23:45     ` Rik van Riel
2017-06-24  7:22   ` [tip:sched/core] sched/numa: Override part of migrate_degrades_locality() " tip-bot for Rik van Riel
2017-06-23 16:55 ` [PATCH 2/4] sched: simplify wake_affine for single socket case riel
2017-06-24  7:22   ` [tip:sched/core] sched/fair: Simplify wake_affine() for the " tip-bot for Rik van Riel
2017-06-23 16:55 ` [PATCH 3/4] sched,numa: implement numa node level wake_affine riel
2017-06-24  7:23   ` [tip:sched/core] sched/numa: Implement NUMA node level wake_affine() tip-bot for Rik van Riel
2017-06-26 14:43   ` [PATCH 3/4] sched,numa: implement numa node level wake_affine Peter Zijlstra
2017-06-23 16:55 ` [PATCH 4/4] sched,fair: remove effective_load riel
2017-06-24  7:23   ` [tip:sched/core] sched/fair: Remove effective_load() tip-bot for Rik van Riel
2017-06-26 14:44   ` [PATCH 4/4] sched,fair: remove effective_load Peter Zijlstra
2017-06-26 14:46     ` Peter Zijlstra
2017-06-26 14:55       ` Rik van Riel
2017-06-26 15:04         ` Peter Zijlstra
2017-06-26 15:20           ` Rik van Riel
2017-06-26 16:12             ` Peter Zijlstra
2017-06-26 19:34               ` Rik van Riel
2017-06-27  5:39                 ` Peter Zijlstra
2017-06-27 14:55                   ` Rik van Riel [this message]
2017-08-01 12:19                     ` [PATCH] sched/fair: Fix wake_affine() for !NUMA_BALANCING Peter Zijlstra
2017-08-01 19:26                       ` Josef Bacik
2017-08-01 21:43                         ` Peter Zijlstra
2017-08-24 22:29                           ` Chris Wilson
2017-08-25 15:46                           ` Chris Wilson
2017-06-27 18:27               ` [PATCH 4/4] sched,fair: remove effective_load Rik van Riel
