Linux-HyperV Archive on lore.kernel.org
 help / color / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: David Woodhouse <dwmw2@infradead.org>, x86@kernel.org
Cc: iommu <iommu@lists.linux-foundation.org>,
	kvm <kvm@vger.kernel.org>,
	linux-hyperv@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()
Date: Wed, 07 Oct 2020 15:37:39 +0200
Message-ID: <87tuv640nw.fsf@nanos.tec.linutronix.de> (raw)
In-Reply-To: <75d79c50d586c18f0b1509423ed673670fc76431.camel@infradead.org>

On Wed, Oct 07 2020 at 08:19, David Woodhouse wrote:
> On Tue, 2020-10-06 at 23:26 +0200, Thomas Gleixner wrote:
>> On Mon, Oct 05 2020 at 16:28, David Woodhouse wrote:
>> > From: David Woodhouse <dwmw@amazon.co.uk>
>> > 
>> > This is the maximum possible set of CPUs which can be used. Use it
>> > to calculate the default affinity requested from __irq_alloc_descs()
>> > by first attempting to find the intersection with irq_default_affinity,
>> > or falling back to using just the max_affinity if the intersection
>> > would be empty.
>> 
>> And why do we need that as yet another argument?
>> 
>> This is an optional property of the irq domain, really and no caller has
>> any business with that. 
>
> Because irq_domain_alloc_descs() doesn't actually *take* the domain as
> an argument. It's more of an internal function, which is only non-
> static because it's used from kernel/irq/ipi.c too for some reason. If
> we convert the IPI code to just call __irq_alloc_descs() directly,
> perhaps that we can actually make irq_domain_alloc_decs() static.

What is preventing you to change the function signature? But handing
down irqdomain here is not cutting it. The right thing to do is to
replace 'struct irq_affinity_desc *affinity' with something more
flexible.

>> >  int irq_domain_alloc_descs(int virq, unsigned int cnt, irq_hw_number_t hwirq,
>> > -			   int node, const struct irq_affinity_desc *affinity)
>> > +			   int node, const struct irq_affinity_desc *affinity,
>> > +			   const struct cpumask *max_affinity)
>> >  {
>> > +	cpumask_var_t default_affinity;
>> >  	unsigned int hint;
>> > +	int i;
>> > +
>> > +	/* Check requested per-IRQ affinities are in the possible range */
>> > +	if (affinity && max_affinity) {
>> > +		for (i = 0; i < cnt; i++)
>> > +			if (!cpumask_subset(&affinity[i].mask, max_affinity))
>> > +				return -EINVAL;
>> 
>> https://lore.kernel.org/r/alpine.DEB.2.20.1701171956290.3645@nanos
>> 
>> What is preventing the affinity spreading code from spreading the masks
>> out to unusable CPUs? The changelog is silent about that part.
>
> I'm coming to the conclusion that we should allow unusable CPUs to be
> specified at this point, just as we do offline CPUs. That's largely
> driven by the realisation that our x86_non_ir_cpumask is only going to
> contain online CPUs anyway, and hotplugged CPUs only get added to it as
> they are brought online.

Can you please stop looking at this from a x86 only perspective. It's
largely irrelevant what particular needs x86 or virt or whatever has.

Fact is, that if there are CPUs which cannot be targeted by device
interrupts then the multiqueue affinity mechanism has to be fixed to
handle this. Right now it's just broken.

Passing yet more cpumasks and random pointers around through device
drivers and whatever is just not going to happen. Neither are we going
to have

        arch_can_be_used_for_device_interrupts_mask

or whatever you come up with and claim it to be 'generic'.

The whole affinity control mechanism needs to be refactored from ground
up and the information about CPUs which can be targeted has to be
retrievable through the irqdomain hierarchy.

Anything else is just tinkering and I have zero interest in mopping up
after you.

It's pretty obvious that the irq domains are the right place to store
that information:

const struct cpumask *irqdomain_get_possible_affinity(struct irq_domain *d)
{
        while (d) {
        	if (d->get_possible_affinity)
                	return d->get_possible_affinity(d);
                d = d->parent;
        }
        return cpu_possible_mask;
}

So if you look at X86 then you have either:

   [VECTOR] ----------------- [IO/APIC]
                          |-- [MSI]
                          |-- [WHATEVER]

or

   [VECTOR] ---[REMAP]------- [IO/APIC]
             |            |-- [MSI]
             |----------------[WHATEVER]

So if REMAP allows cpu_possible_mask and VECTOR some restricted subset
then irqdomain_get_possible_affinity() will return the correct result
independent whether remapping is enabled or not.

This allows to use that for other things like per node restrictions or
whatever people come up with, without sprinkling more insanities through
the tree.

Thanks,

        tglx

  reply index

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-05 15:28 [PATCH 0/13] Fix per-domain IRQ affinity, allow >255 CPUs on x86 without IRQ remapping David Woodhouse
2020-10-05 15:28 ` [PATCH 01/13] x86/apic: Use x2apic in guest kernels even with unusable CPUs David Woodhouse
2020-10-05 15:28   ` [PATCH 02/13] x86/msi: Only use high bits of MSI address for DMAR unit David Woodhouse
2020-10-06 20:45     ` Thomas Gleixner
2020-10-05 15:28   ` [PATCH 03/13] x86/ioapic: Handle Extended Destination ID field in RTE David Woodhouse
2020-10-05 15:28   ` [PATCH 04/13] x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available David Woodhouse
2020-10-05 15:28   ` [PATCH 05/13] genirq: Prepare for default affinity to be passed to __irq_alloc_descs() David Woodhouse
2020-10-06 21:01     ` Thomas Gleixner
2020-10-06 21:07       ` David Woodhouse
2020-10-05 15:28   ` [PATCH 06/13] genirq: Add default_affinity argument " David Woodhouse
2020-10-06 21:06     ` Thomas Gleixner
2020-10-05 15:28   ` [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs() David Woodhouse
2020-10-06 21:26     ` Thomas Gleixner
2020-10-07  7:19       ` David Woodhouse
2020-10-07 13:37         ` Thomas Gleixner [this message]
2020-10-07 14:10           ` David Woodhouse
2020-10-07 15:57             ` Thomas Gleixner
2020-10-07 16:11               ` David Woodhouse
2020-10-07 20:53                 ` Thomas Gleixner
2020-10-08  7:21               ` David Woodhouse
2020-10-08  9:34                 ` Thomas Gleixner
2020-10-08 11:10                   ` David Woodhouse
2020-10-08 12:40                     ` Thomas Gleixner
2020-10-09  7:54                       ` David Woodhouse
2020-10-05 15:28   ` [PATCH 08/13] genirq: Add irq_domain_set_affinity() David Woodhouse
2020-10-06 21:32     ` Thomas Gleixner
2020-10-07  7:22       ` David Woodhouse
2020-10-05 15:28   ` [PATCH 09/13] x86/irq: Add x86_non_ir_cpumask David Woodhouse
2020-10-06 21:42     ` Thomas Gleixner
2020-10-07  7:25       ` David Woodhouse
2020-10-05 15:28   ` [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR David Woodhouse
2020-10-06 21:54     ` Thomas Gleixner
2020-10-07  7:48       ` David Woodhouse
2020-10-07 12:59         ` Thomas Gleixner
2020-10-07 13:08           ` David Woodhouse
2020-10-07 14:05             ` Thomas Gleixner
2020-10-07 14:23               ` David Woodhouse
2020-10-07 16:02                 ` Thomas Gleixner
2020-10-07 16:15                   ` David Woodhouse
2020-10-07 15:05               ` David Woodhouse
2020-10-07 15:25                 ` Thomas Gleixner
2020-10-07 15:46                   ` David Woodhouse
2020-10-07 17:23                     ` Thomas Gleixner
2020-10-07 17:34                       ` David Woodhouse
2020-10-05 15:28   ` [PATCH 11/13] x86/smp: Allow more than 255 CPUs even without interrupt remapping David Woodhouse
2020-10-05 15:28   ` [PATCH 12/13] iommu/irq_remapping: Kill most of hyperv-iommu.c now it's redundant David Woodhouse
2020-10-05 15:28   ` [PATCH 13/13] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID David Woodhouse
2020-10-07  8:14     ` Paolo Bonzini
2020-10-07  8:59       ` David Woodhouse
2020-10-07 11:15         ` Paolo Bonzini
2020-10-07 12:04           ` David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87tuv640nw.fsf@nanos.tec.linutronix.de \
    --to=tglx@linutronix.de \
    --cc=dwmw2@infradead.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-HyperV Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-hyperv/0 linux-hyperv/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-hyperv linux-hyperv/ https://lore.kernel.org/linux-hyperv \
		linux-hyperv@vger.kernel.org
	public-inbox-index linux-hyperv

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-hyperv


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git