From: Suresh Siddha <suresh.b.siddha@intel.com>
To: Alexander Gordeev <agordeev@redhat.com>
Cc: yinghai@kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org,
	gorcunov@openvz.org, Ingo Molnar <mingo@kernel.org>
Subject: Re: [PATCH 1/2] x86, irq: update irq_cfg domain unless the new affinity is a subset of the current domain
Date: Wed, 06 Jun 2012 16:02:05 -0700
Message-ID: <1339023725.28766.55.camel@sbsiddha-desk.sc.intel.com>
In-Reply-To: <20120606172017.GA4777@dhcp-26-207.brq.redhat.com>

On Wed, 2012-06-06 at 19:20 +0200, Alexander Gordeev wrote:
> On Mon, May 21, 2012 at 04:58:01PM -0700, Suresh Siddha wrote:
> > Until now, irq_cfg domain is mostly static. Either all CPUs (used by flat
> > mode) or one CPU (first CPU in the irq affinity mask) to which irq is being
> > migrated (this is used by the rest of apic modes).
> > 
> > Upcoming x2apic cluster mode optimization patch allows the irq to be sent
> > to any cpu in the x2apic cluster (if supported by the HW). So irq_cfg
> > domain changes on the fly (depending on which cpu in the x2apic cluster
> > is online).
> > 
> > Instead of checking for any intersection between the new irq affinity
> > mask and the current irq_cfg domain, check if the new irq affinity mask
> > is a subset of the current irq_cfg domain. Otherwise proceed with
> > updating the irq_cfg domain as well as assigning vectors on all the CPUs
> > specified in the new mask.
> > 
> > This also cleans up a workaround in updating irq_cfg domain for legacy IRQs
> > that are handled by the IO-APIC.
> 
> Suresh,
> 
> I thought you posted these patches for reference, so I held off on my comments
> until you had collected the data. But since Ingo picked the patches up, I will
> voice my concerns in this thread.

These are tested patches, and I am OK with Ingo picking them up so that
they can bake further in -tip. As for the data collection, I still have
to find the right system/BIOS to run the tests for power-aware/round-robin
interrupt routing. Anyway, logical xAPIC mode already has this capability,
and here we are adding the same capability for x2apic cluster mode.
Ultimately, irqbalance also has to take advantage of this by specifying
multiple CPUs when migrating an interrupt.

The only concern I have with this patchset is the one I already mentioned
in the changelog of the second patch: it reduces the number of IRQs that
the platform can handle, because reserving each vector on every member of
an x2apic cluster (up to 16 CPUs) reduces the available number of vectors
by a factor of 16.

If this does become a problem, there are a few options: either reserve
the vectors based on the IRQ destination mask (rather than on all the
cluster members), or reduce the grouping from 16 to a smaller number,
etc. I can post another patch for this shortly.

> > 
> > Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> > ---
> >  arch/x86/kernel/apic/io_apic.c |   15 ++++++---------
> >  1 files changed, 6 insertions(+), 9 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> > index ffdc152..bbf8c43 100644
> > --- a/arch/x86/kernel/apic/io_apic.c
> > +++ b/arch/x86/kernel/apic/io_apic.c
> > @@ -1137,8 +1137,7 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
> >  	old_vector = cfg->vector;
> >  	if (old_vector) {
> >  		cpumask_and(tmp_mask, mask, cpu_online_mask);
> > -		cpumask_and(tmp_mask, cfg->domain, tmp_mask);
> > -		if (!cpumask_empty(tmp_mask)) {
> > +		if (cpumask_subset(tmp_mask, cfg->domain)) {
> 
> Imagine that the passed mask is a subset of cfg->domain and also contains at least
> one online CPU from a different cluster. Since domains are always one cluster
> wide, this condition ^^^ will fail and we go further.
> 
> >  			free_cpumask_var(tmp_mask);
> >  			return 0;
> >  		}
> > @@ -1152,6 +1151,11 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
> >  
> >  		apic->vector_allocation_domain(cpu, tmp_mask);
> >  
> > +		if (cpumask_subset(tmp_mask, cfg->domain)) {
> 
> Because the mask intersects with cfg->domain, this condition ^^^ may succeed and
> we could return with no change from here.
> 
> That raises a few concerns for me:
> - The first check is not perfect, because it failed to recognize the
> intersection right away. Instead, we possibly lost multiple loops through the
> mask before we realized we do not need any change at all. Therefore...
> 
> - It would be better to recognize the intersection even before entering the
> loop. But that is exactly what the removed code was doing before.
> 
> - Depending on the passed mask, we could equally likely have selected another
> cluster and switched to it, even though the current cfg->domain is contained
> within the requested mask. Besides being just not nice, this also switches
> away from a cache-hot cluster. If you are suggesting that it is enough to pick
> the first cluster found (rather than select the best possible one), then there
> is even less reason to switch away from cfg->domain here.

A few things to keep in perspective.

This is the generic portion of the vector handling code, and it has to
work across the different apic drivers and their cfg domains. Also, most
of the intelligence lies in irqbalance, which specifies the IRQ
destination mask. Traditionally, the kernel code has selected the first
possible destination, not the best destination, among those in the
specified mask.

Anyway, the above hunks are trying to address a scenario like this one:
during boot, all the IO-APIC interrupts (legacy and non-legacy) are
routed to cpu-0, with only cpu-0 in their cfg->domain (as we don't yet
know which other CPUs fall into the same x2apic cluster, we can't
pre-set them in the cfg->domain). Consider a single-socket system. After
the SMP bringup of the other siblings, setup_ioapic_dest() changes the
affinity of those IO-APIC IRQs to all CPUs. With the current code,
assign_irq_vector() bails out immediately without reserving the
corresponding vector on the cluster members that are now online. So the
interrupt ends up going only to cpu-0, and this is never corrected as
long as cpu-0 is in the specified interrupt destination mask.

> > +			free_cpumask_var(tmp_mask);
> > +			return 0;
> > +		}
> > +
> >  		vector = current_vector;
> >  		offset = current_offset;
> >  next:
> > @@ -1357,13 +1361,6 @@ static void setup_ioapic_irq(unsigned int irq, struct irq_cfg *cfg,
> >  
> >  	if (!IO_APIC_IRQ(irq))
> >  		return;
> > -	/*
> > -	 * For legacy irqs, cfg->domain starts with cpu 0 for legacy
> > -	 * controllers like 8259. Now that IO-APIC can handle this irq, update
> > -	 * the cfg->domain.
> > -	 */
> > -	if (irq < legacy_pic->nr_legacy_irqs && cpumask_test_cpu(0, cfg->domain))
> > -		apic->vector_allocation_domain(0, cfg->domain);
> 
> 
> This hunk reverts your 69c89ef commit. Regression?
> 

As I mentioned in the changelog, this patch removes the need for that
hacky workaround. Commit 69c89ef didn't really fix the underlying
problem (which is why we ran into a similar issue, the one described
above, in the context of x2apic cluster mode). The clean fix is to
address the issue in assign_irq_vector(), which is what this patch does.

thanks,
suresh



Thread overview: 69+ messages
2012-05-18 10:26 [PATCH 2/3] x86: x2apic/cluster: Make use of lowest priority delivery mode Alexander Gordeev
2012-05-18 14:41 ` Cyrill Gorcunov
2012-05-18 15:42   ` Alexander Gordeev
2012-05-18 15:51     ` Cyrill Gorcunov
2012-05-19 10:47       ` Cyrill Gorcunov
2012-05-21  7:11         ` Alexander Gordeev
2012-05-21  9:46           ` Cyrill Gorcunov
2012-05-19 20:53 ` Yinghai Lu
2012-05-21  8:13   ` Alexander Gordeev
2012-05-21 23:02     ` Yinghai Lu
2012-05-21 23:33       ` Yinghai Lu
2012-05-22  9:36         ` Alexander Gordeev
2012-05-21 23:44     ` Suresh Siddha
2012-05-21 23:58       ` [PATCH 1/2] x86, irq: update irq_cfg domain unless the new affinity is a subset of the current domain Suresh Siddha
2012-05-21 23:58         ` [PATCH 2/2] x2apic, cluster: use all the members of one cluster specified in the smp_affinity mask for the interrupt desintation Suresh Siddha
2012-05-22  7:04           ` Ingo Molnar
2012-05-22  7:34             ` Cyrill Gorcunov
2012-05-22 17:21             ` Suresh Siddha
2012-05-22 17:39               ` Cyrill Gorcunov
2012-05-22 17:42                 ` Suresh Siddha
2012-05-22 17:45                   ` Cyrill Gorcunov
2012-05-22 20:03           ` Yinghai Lu
2012-06-06 15:04           ` [tip:x86/apic] x86/x2apic/cluster: Use all the members of one cluster specified in the smp_affinity mask for the interrupt destination tip-bot for Suresh Siddha
2012-06-06 22:21             ` Yinghai Lu
2012-06-06 23:14               ` Suresh Siddha
2012-06-06 15:03         ` [tip:x86/apic] x86/irq: Update irq_cfg domain unless the new affinity is a subset of the current domain tip-bot for Suresh Siddha
2012-08-07 15:31           ` Robert Richter
2012-08-07 15:41             ` do_IRQ: 1.55 No irq handler for vector (irq -1) Borislav Petkov
2012-08-07 16:24               ` Suresh Siddha
2012-08-07 17:28                 ` Robert Richter
2012-08-07 17:47                   ` Suresh Siddha
2012-08-07 17:45                 ` Eric W. Biederman
2012-08-07 20:57                   ` Borislav Petkov
2012-08-07 22:39                     ` Suresh Siddha
2012-08-08  8:58                       ` Robert Richter
2012-08-08 11:04                         ` Borislav Petkov
2012-08-08 19:16                           ` Suresh Siddha
2012-08-14 17:02                             ` [tip:x86/urgent] x86, apic: fix broken legacy interrupts in the logical apic mode tip-bot for Suresh Siddha
2012-06-06 17:20         ` [PATCH 1/2] x86, irq: update irq_cfg domain unless the new affinity is a subset of the current domain Alexander Gordeev
2012-06-06 23:02           ` Suresh Siddha [this message]
2012-06-16  0:25           ` Suresh Siddha
2012-06-18  9:17             ` Alexander Gordeev
2012-06-19  0:51               ` Suresh Siddha
2012-06-19 23:43                 ` [PATCH 1/2] x86, apic: optimize cpu traversal in __assign_irq_vector() using domain membership Suresh Siddha
2012-06-19 23:43                   ` [PATCH 2/2] x86, x2apic: limit the vector reservation to the user specified mask Suresh Siddha
2012-06-20  5:56                     ` Yinghai Lu
2012-06-21  9:04                     ` Alexander Gordeev
2012-06-21 21:51                       ` Suresh Siddha
2012-06-20  5:53                   ` [PATCH 1/2] x86, apic: optimize cpu traversal in __assign_irq_vector() using domain membership Yinghai Lu
2012-06-21  8:31                   ` Alexander Gordeev
2012-06-21 21:53                     ` Suresh Siddha
2012-06-20  0:18               ` [PATCH 1/2] x86, irq: update irq_cfg domain unless the new affinity is a subset of the current domain Suresh Siddha
2012-06-21 11:00                 ` Alexander Gordeev
2012-06-21 21:58                   ` Suresh Siddha
2012-05-22 10:12       ` [PATCH 2/3] x86: x2apic/cluster: Make use of lowest priority delivery mode Alexander Gordeev
2012-05-21  8:22 ` Ingo Molnar
2012-05-21  9:36   ` Alexander Gordeev
2012-05-21 12:40     ` Ingo Molnar
2012-05-21 14:48       ` Alexander Gordeev
2012-05-21 14:59         ` Ingo Molnar
2012-05-21 15:22           ` Alexander Gordeev
2012-05-21 15:34           ` Cyrill Gorcunov
2012-05-21 15:36           ` Linus Torvalds
2012-05-21 18:07             ` Suresh Siddha
2012-05-21 18:18               ` Linus Torvalds
2012-05-21 18:37                 ` Suresh Siddha
2012-05-21 19:30                   ` Ingo Molnar
2012-05-21 19:15             ` Ingo Molnar
2012-05-21 19:56               ` Suresh Siddha
