From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu,
	laijs@cn.fujitsu.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca,
	josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de,
	rostedt@goodmis.org, Valdis.Kletnieks@vt.edu,
	dhowells@redhat.com, eric.dumazet@gmail.com, darren@dvhart.com,
	fweisbec@gmail.com, patches@linaro.org
Subject: Re: [PATCH RFC tip/core/rcu 6/6] rcu: Reduce cache-miss initialization latencies for large systems
Date: Fri, 27 Apr 2012 07:17:00 -0700	[thread overview]
Message-ID: <20120427141700.GA12854@linux.vnet.ibm.com> (raw)
In-Reply-To: <1335477685.2463.128.camel@laptop>

On Fri, Apr 27, 2012 at 12:01:25AM +0200, Peter Zijlstra wrote:
> On Thu, 2012-04-26 at 13:28 -0700, Paul E. McKenney wrote:

[ . . . ]

> > I think I see what you
> > are getting at, though I am having a hard time seeing how to pack
> > it into a linear array.
> 
> Yeah, I'm not sure you can either. Hence me building a tree ;-) But you
> too have a tree, it's tree-rcu after all.
> 
> > The idea seems to be to compute a per-CPU list of CPU masks, with the first
> > entry having bits set for the CPUs closest to the CPU corresponding to
> > the list, and subsequent entries adding more-distant CPUs.  The last
> > CPU mask would presumably have bits set for all CPUs.
> 
> Indeed. So the scheduler already knows about nodes (included in the
> default_topology thing); here we're constructing masks spanning nodes
> based on distance.
> 
> So the first level is all nodes that are directly connected, the second
> level is all nodes that have one intermediate hop, etc., with the last
> level indeed being the entire machine.
> 
> > I take it that there is no data structure listing per-node CPU masks,
> > indicating which CPUs are members of a given node?  Or is something else
> > going on here?
> 
> There is, it's cpumask_of_node(); you'll find it used in the above
> code :-) We do the for_each_cpu loop because we need the mask per node,
> and there's no such thing as per-node memory, so we fudge it using
> per-cpu memory.
> 
> This could be optimized to reduce overhead if this all turns out to work
> well.
> 
> So in short: for every 'i < level', for every cpu, we build a mask of
> which cpus are '<= i' hops away from our current node.
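
Just to make sure I understand, here is a toy userspace illustration of
that construction -- not your patch, and the topology, hop matrix, and
names below are made up for the example:

	/*
	 * For each distance level i and each cpu, build the mask of cpus
	 * whose node is at most i hops away from cpu's node.
	 */
	#include <stdio.h>

	#define NR_NODES 4
	#define NR_CPUS  8

	/* Hypothetical machine: four nodes in a line, two cpus per node. */
	static const int hops[NR_NODES][NR_NODES] = {
		{ 0, 1, 2, 3 },
		{ 1, 0, 1, 2 },
		{ 2, 1, 0, 1 },
		{ 3, 2, 1, 0 },
	};
	static const int cpu_node[NR_CPUS] = { 0, 0, 1, 1, 2, 2, 3, 3 };

	int main(void)
	{
		int level = 4;	/* number of distinct hop counts: 0..3 */

		for (int i = 0; i < level; i++) {
			for (int cpu = 0; cpu < NR_CPUS; cpu++) {
				unsigned long mask = 0;

				/* cpus whose node is <= i hops from cpu's node */
				for (int c = 0; c < NR_CPUS; c++)
					if (hops[cpu_node[cpu]][cpu_node[c]] <= i)
						mask |= 1UL << c;
				printf("level %d, cpu %d: %#lx\n", i, cpu, mask);
			}
		}
		return 0;
	}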

So this information could be used to create a cache-friendly CPU ordering,
such that CPU i and CPU i+1 tend to be electrically close to each other.
One could solve the traveling salesman problem to get an optimal ordering,
but doing a traversal of the CPUs following the node tree should be much
simpler and come pretty close.
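
For example (a rough sketch only, using an assumed greedy walk over the
same made-up hops[]/cpu_node[] tables as in the snippet above, not taken
from any existing code), the ordering could be generated by starting at
node 0, always stepping to the nearest not-yet-visited node, and emitting
each node's CPUs as we go:

	int visited[NR_NODES] = { 0 };
	int order[NR_CPUS], n = 0, node = 0;

	for (int step = 0; step < NR_NODES; step++) {
		visited[node] = 1;

		/* emit this node's cpus next to each other */
		for (int cpu = 0; cpu < NR_CPUS; cpu++)
			if (cpu_node[cpu] == node)
				order[n++] = cpu;

		/* step to the closest node we have not visited yet */
		int next = -1;
		for (int cand = 0; cand < NR_NODES; cand++)
			if (!visited[cand] &&
			    (next < 0 || hops[node][cand] < hops[node][next]))
				next = cand;
		if (next < 0)
			break;	/* all nodes visited */
		node = next;
	}

Consecutive entries of order[] then tend to be electrically close, which
is all RCU would need from such an ordering.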

If someone were to show significant performance degradation due to
RCU's using the smp_processor_id() ordering for its rcu_node tree,
I would try this ordering.  It should make rcu_node tree performance
much less sensitive to that tree's geometry.

> > > +
> > > +     tl = kzalloc((ARRAY_SIZE(default_topology) + level) *
> > > +                     sizeof(struct sched_domain_topology_level), GFP_KERNEL);
> > > +     if (!tl)
> > > +             return;
> > > +
> > > +     for (i = 0; default_topology[i].init; i++)
> > > +             tl[i] = default_topology[i];
> > > +
> > > +     for (j = 0; j < level; i++, j++) {
> > > +             tl[i] = (struct sched_domain_topology_level){
> > 
> > tl[j]?
> 
> No, [i]. See how we allocate an array of ARRAY_SIZE(default_topology) +
> level, then copy the default topology array, and then continue i for j
> additional levels.

OK, good thing I correctly characterized my comments.  ;-)
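
In other words (just restating to check my understanding, so please
correct me if I have it wrong): if default_topology has D real entries
plus its zero terminator, the copy loop leaves i == D, and the second
loop fills tl[D] .. tl[D + level - 1] with numa_level 0 .. level - 1.
The one remaining kzalloc()ed slot stays zeroed and terminates the new
array:

	tl: [ default[0] ... default[D-1] | NUMA j=0 ... j=level-1 | 0 ]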

							Thanx, Paul

> > > +                     .init = sd_init_NUMA,
> > > +                     .mask = sd_numa_mask,
> > > +                     .flags = SDTL_OVERLAP,
> > > +                     .numa_level = j,
> > > +             };
> > > +     }
> > > +
> > > +     sched_domain_topology = tl;
> > > +} 
> 


Thread overview: 32+ messages
2012-04-23 16:41 [PATCH RFC 0/6] Miscellaneous RCU fixes for 3.5 Paul E. McKenney
2012-04-23 16:42 ` [PATCH RFC tip/core/rcu 1/6] rcu: Stabilize use of num_online_cpus() for GP short circuit Paul E. McKenney
2012-04-23 16:42   ` [PATCH RFC tip/core/rcu 2/6] rcu: List-debug variants of rcu list routines Paul E. McKenney
2012-04-23 16:42   ` [PATCH RFC tip/core/rcu 3/6] rcu: Replace list_first_entry_rcu() with list_first_or_null_rcu() Paul E. McKenney
2012-04-23 16:42   ` [PATCH RFC tip/core/rcu 4/6] rcu: Clarify help text for RCU_BOOST_PRIO Paul E. McKenney
2012-04-26 12:46     ` Peter Zijlstra
2012-04-26 17:28       ` Paul E. McKenney
2012-04-23 16:42   ` [PATCH RFC tip/core/rcu 5/6] rcu: Make __kfree_rcu() less dependent on compiler choices Paul E. McKenney
2012-04-26 12:48     ` Peter Zijlstra
2012-04-26 13:29       ` Jan Engelhardt
2012-04-26 13:50         ` Peter Zijlstra
2012-04-23 16:42   ` [PATCH RFC tip/core/rcu 6/6] rcu: Reduce cache-miss initialization latencies for large systems Paul E. McKenney
2012-04-26 12:51     ` Peter Zijlstra
2012-04-26 14:12       ` Paul E. McKenney
2012-04-26 15:28         ` Peter Zijlstra
2012-04-26 16:15           ` Paul E. McKenney
2012-04-26 19:41             ` Peter Zijlstra
2012-04-26 19:47               ` Peter Zijlstra
2012-04-26 20:29                 ` Paul E. McKenney
2012-04-26 22:04                   ` Peter Zijlstra
2012-04-26 20:28               ` Paul E. McKenney
2012-04-26 22:01                 ` Peter Zijlstra
2012-04-27 14:17                   ` Paul E. McKenney [this message]
2012-04-27  4:36     ` Mike Galbraith
2012-04-27 15:15       ` Paul E. McKenney
2012-04-28  4:42         ` Mike Galbraith
2012-04-28 17:21           ` Paul E. McKenney
2012-04-29  3:54             ` Mike Galbraith
2012-04-24 15:35   ` [PATCH RFC tip/core/rcu 1/6] rcu: Stabilize use of num_online_cpus() for GP short circuit Srivatsa S. Bhat
2012-04-24 16:50     ` Paul E. McKenney
2012-04-24 17:46       ` Srivatsa S. Bhat
2012-05-07  3:47       ` Rusty Russell
