From: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>,
	Arjan van de Ven <arjan@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>,
	"Siddha, Suresh B" <suresh.b.siddha@intel.com>,
	Yinghai Lu <yinghai@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Jesse Barnes <jbarnes@virtuousgeek.org>,
	David Miller <davem@davemloft.net>,
	"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [PATCH v6] x86/apic: limit irq affinity
Date: Sat, 05 Dec 2009 02:38:09 -0800
Message-ID: <1260009489.3565.34.camel@localhost>
In-Reply-To: <m1eina9vw1.fsf@fess.ebiederm.org>

On Fri, 2009-12-04 at 15:12 -0800, Eric W. Biederman wrote:
> Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> writes:
> 
> >
> >> > 
> >> > > 
> >> > > Also, can we add a restricted mask as I mention above into this scheme?  If we can't send an IRQ to some node, we don't want to bother attempting to change affinity to cpus on that node (hopefully code in the kernel will eventually restrict this).
> >> > > 
> >> > 
> >> > The interface allows you to put in any CPU mask.  The way it's written 
> >> > now, whatever mask you put in, irqbalance *only* balances within that 
> >> > mask.  It won't ever try and go outside that mask.
> >> 
> >> OK.  Given that, it might be nice to combine the restricted cpus that I'm describing with your node_affinity mask, but we could expose them as separate masks (node_affinity and restricted_affinity, as I describe above).
> >> 
> >
> > I think this might be getting too complicated.  The only thing
> > irqbalance is lacking today, in my mind, is the feedback mechanism,
> > telling it what subset of CPU masks to balance within.
> 
> You mean besides knowing that devices can have more than one irq?

Why does it matter if it does or doesn't?  The interrupts have to go
somewhere.

> You mean besides making good on its promise not to move networking
> irqs?  A policy of BALANCE_CORE sure doesn't look like a policy of
> don't touch.

Not moving network irqs is something Arjan said could be a bug, and he'd
be happy to either look into it, or welcome a patch if it really is
broken.   As for BALANCE_CORE, I have no idea what you're talking about.

> You mean besides realizing that irqs can only be directed at one cpu on
> x86?  At least when you have more than 8 logical cores in the system, the
> cases that matter.
> 

Huh?  I can have all of my interrupts directed to a single CPU on x86.
Can you give me an example here?

> > There is an
> > allowed_mask, but that is used for a different purpose.  Hence why I
> > added another.  But I think your needs can be met 100% with what I have
> > already, and we can come up with a different name that's more generic.
> > The flows would be something like this:
> 
> Two masks?  You are asking the kernel to move irqs for you then?

Absolutely not.  Were you not following this thread earlier when this
was being discussed with Thomas?

> > Driver:
> > - Driver comes online, allocates memory in a sensible NUMA fashion
> > - Driver requests interrupts from the kernel, ties them into handlers
> > - Driver now sets a NUMA-friendly affinity for each interrupt, to match
> > with its initial memory allocation
> > - irqbalance balances interrupts within their new "hinted" affinities.
> >
> > Other:
> > - System comes online
> > - In your case, interrupts must be kept away from certain CPUs.
> > - Some mechanism in your architecture init can set the "hinted" affinity
> > mask for each interrupt.
> > - irqbalance will not move interrupts to the CPUs you left out of the
> > "hinted" affinity.
> >
> > Does this make more sense?
> 
> 
> >> > > As a matter of fact, driver's allocating rings, buffers, queues on other nodes should optimally be made aware of the restriction.
> >> > 
> >> > The idea is that the driver will do its memory allocations for everything 
> >> > across nodes.  When it does that, it will use the kernel interface 
> >> > (function call) to set the corresponding mask it wants for those queue 
> >> > resources.  That is my end-goal for this code.
> >> > 
> >> 
> >> OK, but we will eventually have to reject any irqbalance attempts to send irqs to restricted nodes.
> >
> > See above.
> 
> Either I am parsing this conversation wrong or there is a strong
> reality distortion field in place.  It appears you are asking that we
> depend on a user space application to not attempt the physically
> impossible, when we could just as easily ignore the request or report
> -EINVAL.
> 

You are parsing this conversation incorrectly.  I also don't understand
why you always have a very negative view of how impossible everything
is.  Do you think we get no work done in the kernel?  We deal with
countless issues across the kernel that are hard.  Being hard doesn't
mean they're impossible, it just means we may have to try something new
and unknown.

What I'm asking is that we create some mechanism for drivers to manage
their interrupt affinities.  Today drivers have no influence or control
over where their interrupts land.  This is a limitation, plain and
simple.  We need a mechanism to allow a driver to say "hey, this
interrupt needs to run only on these CPUs.  Going elsewhere can severely
impact performance of your network."  Whatever listens and acts on that
mechanism is irrelevant.
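
To make the idea concrete, here is a minimal user-space sketch in Python
of a balancer that only places interrupts inside a per-interrupt hinted
mask.  It is a toy model, not irqbalance or kernel code: the names
parse_cpumask, pick_cpu and balance, the least-loaded placement rule,
and the fallback to all online CPUs when a hint is unusable are all
assumptions made for illustration.

```python
# Toy model of "balance only within a hinted affinity mask".
# Every name below is hypothetical; this is not irqbalance or kernel code.

def parse_cpumask(hex_mask):
    """Turn a hex CPU mask string such as "f0" into a set of CPU numbers.

    Commas are dropped so comma-separated hex groups are also accepted.
    """
    value = int(hex_mask.replace(",", ""), 16)
    return {cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1}


def pick_cpu(hint, online, load):
    """Pick the least-loaded online CPU inside the hint.

    Assumption: if the hint names no online CPU, fall back to all
    online CPUs rather than failing.
    """
    candidates = (hint & online) or online
    # sorted() makes ties deterministic: the lowest CPU number wins.
    return min(sorted(candidates), key=lambda cpu: load.get(cpu, 0))


def balance(irq_hints, online):
    """Assign each IRQ to one CPU without leaving its hinted mask."""
    load = {cpu: 0 for cpu in online}
    placement = {}
    for irq, hint in sorted(irq_hints.items()):
        cpu = pick_cpu(hint, online, load)
        placement[irq] = cpu
        load[cpu] += 1  # count one IRQ per placement as the "load"
    return placement


if __name__ == "__main__":
    hints = {
        40: parse_cpumask("0f"),  # driver hinted CPUs 0-3
        41: parse_cpumask("0f"),
        42: parse_cpumask("f0"),  # driver hinted CPUs 4-7
    }
    print(balance(hints, online=set(range(8))))
```

With IRQs 40 and 41 hinted to CPUs 0-3 and IRQ 42 hinted to CPUs 4-7 on
an 8-CPU system, this places them on CPUs 0, 1 and 4.  The same model
covers the "keep interrupts away from certain CPUs" case: leave those
CPUs out of the hint and they are never chosen.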

> We really have two separate problems here.
> - How to avoid the impossible.

Really man, this type of view is neither helpful nor useful.  Either help
people solve problems, or keep your negative views on proposed solutions
to problems to yourself.

> - How to deal with NUMA affinity.

More generally, how to deal with a device's preferred affinity.  That is
the real issue I'm trying to solve.

Cheers,
-PJ


