On Wed, 2020-10-07 at 17:25 +0200, Thomas Gleixner wrote:
> It's clearly how the hardware works. MSI has a message store of some
> sorts and if the entry is enabled then the MSI chip (in PCI or
> elsewhere) will send exactly the message which is in that message
> store. It knows absolutely nothing about what the message means and how
> it is composed. The only things which MSI knows about is whether the
> message address is 64bit wide or not and whether the entries are
> maskable or not and how many entries it can store.
>
> Software allocates a message target at the underlying irq domain (vector
> or remap) and that underlying irq domain defines the properties.
>
> If qemu emulates it differently then it's qemu's problem, but that does
> not make it in anyway something which influences the irq domain
> abstractions which are correctly modeled after how the hardware works.
>
> > Not really the important part to deal with right now, either way.
>
> Oh yes it is. We define that upfront and not after the fact.

The way the hardware works is that something handles physical address
cycles to addresses in the (on x86) 0xFEExxxxx range, and turns them
into actual interrupts on the appropriate CPU, where the APIC ID and
vector (etc.) are directly encoded in the bits of the address and the
data written. That compatibility x86 APIC MSI format is where the 8-bit
(or 15-bit) limit comes from.

Then interrupt remapping comes along, and now those physical address
cycles are actually handled by the IOMMU, which can either handle the
compatibility format as before, or use a different format of
address/data bits and perform a lookup in its IRTE table.

The PCI MSI domain, HPET, and even the IOAPIC are just the things out
there on the bus which might perform those physical address cycles. And
yes, as you say they're just a message store sending exactly the
message that was composed for them. They know absolutely nothing about
what the message means and how it is composed.
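
To make the "directly encoded in the bits" point concrete, here is a rough sketch of composing a compatibility-format message for a fixed, edge-triggered interrupt in physical destination mode. The struct and function names are made up for illustration and are not the kernel's own. The bit positions are as I understand the format: the low 8 bits of the APIC ID in address bits 19:12, the extra 7 bits of the 15-bit extension in address bits 11:5, and the vector in data bits 7:0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the kernel's structures. */
#define MSI_ADDR_BASE 0xFEE00000u

struct msi_message {
	uint32_t address_lo;
	uint32_t data;
};

/*
 * Compose a compatibility-format MSI message (physical destination
 * mode, fixed delivery, edge triggered, so all other fields are zero).
 * With ext_dest_id == 0 only 8 bits of APIC ID fit; with the extended
 * destination ID format there is room for 15.
 */
static struct msi_message compose_compat_msi(uint32_t apic_id,
					     uint8_t vector,
					     int ext_dest_id)
{
	struct msi_message msg;
	uint32_t limit = ext_dest_id ? (1u << 15) : (1u << 8);

	/* This is the 8-bit (or 15-bit) limit of the format itself. */
	assert(apic_id < limit);

	msg.address_lo = MSI_ADDR_BASE
		| ((apic_id & 0xFFu) << 12)         /* APIC ID bits 0-7  */
		| (((apic_id >> 8) & 0x7Fu) << 5);  /* APIC ID bits 8-14 */
	msg.data = vector;                          /* vector in bits 7:0 */

	return msg;
}
```

Whatever sits on the bus (a PCI device, the HPET, the IOAPIC) would simply store and later emit the resulting address/data pair; the limit on the destination comes from the format, not from the device holding the message.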

It so happens that in Linux, we don't really architect the software
like that. So each of the PCI MSI domain, HPET, and IOAPIC has its
*own* message composer, which has the same limits and composes
basically the same messages as if it were *their* format, not dictated
to them by the APIC upstream. And that's what we're both getting our
panties in a knot about, I think.

It really doesn't matter that much to the underlying generic irqdomain
support for limited affinities. Except that you want to make the
generic code support the concept of a child domain supporting *more*
CPUs than its parent, which really doesn't make much sense if you think
about it.

But it isn't that hard to do, and if it means we don't have to argue
any more about the x86 hierarchy not matching the hardware then it's a
price worth paying.