On Wed, 2020-10-07 at 17:57 +0200, Thomas Gleixner wrote:
> TLDR & HTF;
>
> Multiqueue devices want to have at max 1 queue per CPU or if the device
> has less queues than CPUs they want the queues to have a fixed
> association to a set of CPUs.
>
> At setup time this is established considering possible CPUs to handle
> 'physical' hotplug correctly.
>
> If a queue has no online CPUs it cannot be started. If it's active and
> the last CPU goes down then it's quiesced and stopped and the core code
> shuts down the interrupt and does not move it to a still online CPU.
>
> So with your hackery, we end up in a situation where we have a large
> possible mask, but not all CPUs in that mask can be reached, which means
> in a 1 queue per CPU scenario all unreachable CPUs would have
> disfunctional queues.
>
> So that spreading algorithm needs to know about this limitation.

OK, thanks. So the queue exists, with an MSI assigned to point to
offline CPU(s), but it cannot actually be used until/unless at least
one CPU in its mask comes online.

So when I said I wanted to try treating "reachable" the same way as
"online", that would mean the queue can't start until/unless at least
one *reachable* CPU in its mask comes online.

The underlying problem here is that until a CPU comes online, we don't
actually *know* if it's reachable or not.

So if we want to carefully create the affinity masks at setup time so
that they don't include any unreachable CPUs... that basically means we
don't include any non-present CPUs at all (unless they've been added
once and then removed).

That's because, at least on x86, we don't assign CPU numbers to CPUs
which haven't yet been added. Theoretically we *could*, but if there
are more than NR_CPUS listed in (e.g.) the MADT then we could run out
of CPU numbers. Then if the admin hotplugs one of the CPUs we *didn't*
have space for, we'd not be able to handle it.
I suppose there might be options like pre-assigning CPU numbers only to
non-present APIC IDs below 256 (or 32768). Or *grouping* the CPU
numbers, so some not-yet-assigned CPU#s are for low APIC IDs and some
are for high APIC IDs, but it doesn't actually have to be a 1:1
predetermined mapping.

But those really do seem like hacks which might only apply on x86,
while the generic approach of treating "reachable" like "online" seems
like it would work in other cases too.

Fundamentally, there are three sets of CPUs. There are those known to
be reachable, those known not to be, and those which are not yet known.

So another approach we could use is to work with a cpumask of those
*known* not to be reachable, and to filter those *out* of the prebuilt
affinities. That gives us basically the right behaviour without
hotplug, but does leave in the masks some absent CPUs which, *if* they
are ever added, wouldn't be able to receive the IRQ. Which does mean
we'd have to refrain from bringing up the corresponding queue.
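
To make that last option a bit more concrete, here is a rough userspace
sketch in C. It is not kernel code and not a proposed patch: the toy_*
names and the 64-bit toy_cpumask_t are invented purely for illustration,
standing in for struct cpumask and for whatever the real hotplug hooks
would be. It only assumes that a CPU's reachability becomes known at the
point where it comes online.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: bit n is CPU n (max 64). */
typedef uint64_t toy_cpumask_t;

struct toy_cpu_state {
	toy_cpumask_t online;      /* CPUs currently online */
	toy_cpumask_t unreachable; /* CPUs *known* not to be reachable */
};

/*
 * Record a CPU coming online. This is the first point at which we
 * learn whether the interrupt can actually be targeted at it.
 */
static void toy_cpu_online(struct toy_cpu_state *st, unsigned int cpu,
			   bool reachable)
{
	toy_cpumask_t bit = (toy_cpumask_t)1 << cpu;

	st->online |= bit;
	if (!reachable)
		st->unreachable |= bit;
}

/* Filter the known-unreachable CPUs out of a prebuilt affinity mask. */
static toy_cpumask_t toy_filter_affinity(toy_cpumask_t prebuilt,
					 const struct toy_cpu_state *st)
{
	return prebuilt & ~st->unreachable;
}

/*
 * Treat "reachable" like "online": a queue may start only if at least
 * one CPU in its mask is online and not known to be unreachable.
 */
static bool toy_queue_can_start(toy_cpumask_t affinity,
				const struct toy_cpu_state *st)
{
	return (affinity & st->online & ~st->unreachable) != 0;
}
```

Note that a not-yet-present CPU stays in the filtered mask, because
nothing is known about it yet. If it later comes online and turns out to
be unreachable, it lands in the unreachable set and a queue whose mask
contains only such CPUs is simply never started.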