From: David Woodhouse <dwmw2@infradead.org>
To: Thomas Gleixner <tglx@linutronix.de>, x86@kernel.org
Cc: iommu <iommu@lists.linux-foundation.org>,
	kvm <kvm@vger.kernel.org>,
	linux-hyperv@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()
Date: Thu, 08 Oct 2020 08:21:17 +0100	[thread overview]
Message-ID: <119c2f993cac5d57c54d4720addc9f32bf1daadd.camel@infradead.org> (raw)
In-Reply-To: <87a6wy3u6n.fsf@nanos.tec.linutronix.de>

On Wed, 2020-10-07 at 17:57 +0200, Thomas Gleixner wrote:
> TLDR & HTF;
> 
> Multiqueue devices want to have at max 1 queue per CPU or if the device
> has less queues than CPUs they want the queues to have a fixed
> association to a set of CPUs.
> 
> At setup time this is established considering possible CPUs to handle
> 'physical' hotplug correctly.
> 
> If a queue has no online CPUs it cannot be started. If it's active and
> the last CPU goes down then it's quiesced and stopped and the core code
> shuts down the interrupt and does not move it to a still online CPU.
> 
> So with your hackery, we end up in a situation where we have a large
> possible mask, but not all CPUs in that mask can be reached, which means
> in a 1 queue per CPU scenario all unreachable CPUs would have
> disfunctional queues.
> 
> So that spreading algorithm needs to know about this limitation.

OK, thanks. So the queue exists, with an MSI assigned that points at
currently offline CPU(s), but it cannot actually be used until/unless
at least one CPU in its mask comes online.

So when I said I wanted to try treating "reachable" the same way as
"online", that would mean the queue can't start until/unless at least
one *reachable* CPU in its mask comes online.
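
To make that concrete, roughly what I have in mind is something like
this (only a sketch, not code from the series; "reachable" stands for
whichever cpumask the irqdomain ends up exposing, and
queue_startable() is an invented name):

#include <linux/cpumask.h>

/*
 * Sketch only: a queue may be started if at least one CPU in its
 * affinity mask is both online and reachable by its MSI. The
 * "reachable" mask and queue_startable() are illustrative names,
 * not anything from the actual patches.
 */
static bool queue_startable(const struct cpumask *affinity,
			    const struct cpumask *reachable)
{
	unsigned int cpu;

	for_each_cpu_and(cpu, affinity, cpu_online_mask) {
		if (cpumask_test_cpu(cpu, reachable))
			return true;
	}
	return false;
}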


The underlying problem here is that until a CPU comes online, we don't
actually *know* if it's reachable or not.

So if we want to carefully create the affinity masks at setup time so
that they don't include any unreachable CPUs... that basically means
we don't include any non-present CPUs at all (unless they've been
added once and then removed).

That's because — at least on x86 — we don't assign CPU numbers to CPUs
which haven't yet been added. Theoretically we *could* but if there are
more than NR_CPUS listed in (e.g.) the MADT then we could run out of
CPU numbers. Then if the admin hotplugs one of the CPUs we *didn't*
have space for, we'd not be able to handle it.
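
In mask-building terms that approach is basically just this (again a
sketch only; the helper name is invented):

#include <linux/cpumask.h>

/*
 * Sketch: the "only CPUs we already know about" variant. Absent CPUs
 * have no CPU number yet, so restricting the prebuilt affinity to
 * cpu_present_mask is the best we can do at setup time. Helper name
 * is invented for illustration.
 */
static void restrict_affinity_to_present(struct cpumask *affinity)
{
	cpumask_and(affinity, affinity, cpu_present_mask);
}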

I suppose there might be options like pre-assigning CPU numbers only to
non-present APIC IDs below 256 (or 32768). Or *grouping* the CPU
numbers, so that some not-yet-assigned CPU#s are reserved for low
APIC IDs and some for high APIC IDs, without it actually having to be
a 1:1 predetermined mapping.

But those really do seem like hacks which might only apply on x86,
while the generic approach of treating "reachable" like "online" seems
like it would work in other cases too.

Fundamentally, there are three sets of CPUs. There are those known to
be reachable, those known not to be, and those which are not yet known.

So another approach we could use is to work with a cpumask of those
*known* not to be reachable, and to filter those *out* of the prebuilt
affinities. That gives us basically the right behaviour without
hotplug, but it does include absent CPUs in the mask which, *if* they
are ever added, wouldn't be able to receive the IRQ. Which does mean
we'd have to refrain from bringing up the corresponding queue.
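
i.e. something along these lines (sketch again; known_unreachable_mask
is an invented name for whatever mask we'd maintain):

#include <linux/cpumask.h>

/*
 * Sketch: maintain a mask of CPUs *known* to be unreachable and strip
 * it from the prebuilt affinity. Not-yet-present CPUs are left in, so
 * if one of them is later added and turns out to be unreachable, the
 * corresponding queue must not be brought up. All names invented.
 */
static struct cpumask known_unreachable_mask;

static void filter_known_unreachable(struct cpumask *affinity)
{
	cpumask_andnot(affinity, affinity, &known_unreachable_mask);
}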

