A recent discussion about the PCI/MSI management for virtio unearthed a violation of the MSI-X specification vs. writing the MSI-X message: under certain circumstances the entry is written without being masked. While looking at that and the related violation of the x86 non-remapped interrupt affinity mechanism a few other issues were discovered by inspection. The following series addresses these. Note this does not fix the virtio issue, but while staring at the above problems I came up with a plan to address this. I'm still trying to convince myself that I can get away without sprinkling locking all over the place, so don't hold your breath that this will materialize tomorrow. The series is also available from git: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git irq/msi V1 can be found here: https://lore.kernel.org/r/20210721191126.274946280@linutronix.de Changes vs. V1: - Identified and addressed more inconsistencies, especially the lack of serialization for multi-MSI masking - Removed the extra vector masking in S390 - Addressed review comments and picked up tags where applicable - Clean up of the naming of msi_desc::masked as discussed in the V1 thread - Consolidation of the mask/unmask functions Thanks, tglx --- arch/s390/pci/pci_irq.c | 4 arch/x86/kernel/apic/io_apic.c | 6 arch/x86/kernel/apic/msi.c | 11 + arch/x86/kernel/hpet.c | 2 drivers/base/core.c | 1 drivers/pci/msi.c | 274 ++++++++++++++++++++++------------------- include/linux/device.h | 1 include/linux/irq.h | 2 include/linux/msi.h | 10 - kernel/irq/chip.c | 5 10 files changed, 178 insertions(+), 138 deletions(-)
From: Thomas Gleixner <tglx@linutronix.de> The ordering of MSI-X enable in hardware is dysfunctional: 1) MSI-X is disabled in the control register 2) Various setup functions 3) pci_msi_setup_msi_irqs() is invoked which ends up accessing the MSI-X table entries 4) MSI-X is enabled and masked in the control register with the comment that enabling is required for some hardware to access the MSI-X table Step #4 obviously contradicts #3. The history of this is an issue with the NIU hardware. When #4 was introduced the table access actually happened in msix_program_entries() which was invoked after enabling and masking MSI-X. This was changed in commit d71d6432e105 ("PCI/MSI: Kill redundant call of irq_set_msi_desc() for MSI-X interrupts") which removed the table write from msix_program_entries(). Interestingly enough nobody noticed and either NIU still works or it did not get any testing with a kernel 3.19 or later. Nevertheless this is inconsistent and there is no reason why MSI-X can't be enabled and masked in the control register early on, i.e. move step #4 above to step #1. This preserves the NIU workaround and has no side effects on other hardware. Fixes: d71d6432e105 ("PCI/MSI: Kill redundant call of irq_set_msi_desc() for MSI-X interrupts") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> --- drivers/pci/msi.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-) --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -772,18 +772,25 @@ static int msix_capability_init(struct p u16 control; void __iomem *base; - /* Ensure MSI-X is disabled while it is set up */ - pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0); + /* + * Some devices require MSI-X to be enabled before the MSI-X + * registers can be accessed. Mask all the vectors to prevent + * interrupts coming in before they're fully set up. + */ + pci_msix_clear_and_set_ctrl(dev, 0, PCI_MSIX_FLAGS_MASKALL | + PCI_MSIX_FLAGS_ENABLE); pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control); /* Request & Map MSI-X table region */ base = msix_map_region(dev, msix_table_size(control)); - if (!base) - return -ENOMEM; + if (!base) { + ret = -ENOMEM; + goto out_disable; + } ret = msix_setup_entries(dev, base, entries, nvec, affd); if (ret) - return ret; + goto out_disable; ret = pci_msi_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSIX); if (ret) @@ -794,14 +801,6 @@ static int msix_capability_init(struct p if (ret) goto out_free; - /* - * Some devices require MSI-X to be enabled before we can touch the - * MSI-X registers. We need to mask all the vectors to prevent - * interrupts coming in before they're fully set up. - */ - pci_msix_clear_and_set_ctrl(dev, 0, - PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE); - msix_program_entries(dev, entries); ret = populate_msi_sysfs(dev); @@ -836,6 +835,9 @@ static int msix_capability_init(struct p out_free: free_msi_irqs(dev); +out_disable: + pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0); + return ret; }
From: Thomas Gleixner <tglx@linutronix.de> When MSI-X is enabled the ordering of calls is: msix_map_region(); msix_setup_entries(); pci_msi_setup_msi_irqs(); msix_program_entries(); This has a few interesting issues: 1) msix_setup_entries() allocates the MSI descriptors and initializes them except for the msi_desc:masked member which is left zero initialized. 2) pci_msi_setup_msi_irqs() allocates the interrupt descriptors and sets up the MSI interrupts which ends up in pci_write_msi_msg() unless the interrupt chip provides its own irq_write_msi_msg() function. 3) msix_program_entries() does not do what the name suggests. It solely updates the entries array (if not NULL) and initializes the masked member for each MSI descriptor by reading the hardware state and then masks the entry. Obviously this has some issues: 1) The uninitialized masked member of msi_desc prevents the enforcement of masking the entry in pci_write_msi_msg() depending on the cached masked bit. Aside of that half initialized data is a NONO in general 2) msix_program_entries() only ensures that the actually allocated entries are masked. This is wrong as experimentation with crash testing and crash kernel kexec has shown. This limited testing unearthed that when the production kernel had more entries in use and unmasked when it crashed and the crash kernel allocated a smaller amount of entries, then a full scan of all entries found unmasked entries which were in use in the production kernel. This is obviously a device or emulation issue as the device reset should mask all MSI-X table entries, but obviously that's just part of the paper specification. Cure this by: 1) Masking all table entries in hardware 2) Initializing msi_desc::masked in msix_setup_entries() 3) Removing the mask dance in msix_program_entries() 4) Renaming msix_program_entries() to msix_update_entries() to reflect the purpose of that function. As the masking of unused entries has never been done the Fixes tag refers to a commit in: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> --- drivers/pci/msi.c | 45 +++++++++++++++++++++++++++------------------ 1 file changed, 27 insertions(+), 18 deletions(-) --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -691,6 +691,7 @@ static int msix_setup_entries(struct pci { struct irq_affinity_desc *curmsk, *masks = NULL; struct msi_desc *entry; + void __iomem *addr; int ret, i; int vec_count = pci_msix_vec_count(dev); @@ -711,6 +712,7 @@ static int msix_setup_entries(struct pci entry->msi_attrib.is_msix = 1; entry->msi_attrib.is_64 = 1; + if (entries) entry->msi_attrib.entry_nr = entries[i].entry; else @@ -722,6 +724,10 @@ static int msix_setup_entries(struct pci entry->msi_attrib.default_irq = dev->irq; entry->mask_base = base; + addr = pci_msix_desc_addr(entry); + if (addr) + entry->masked = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL); + list_add_tail(&entry->list, dev_to_msi_list(&dev->dev)); if (masks) curmsk++; @@ -732,26 +738,25 @@ static int msix_setup_entries(struct pci return ret; } -static void msix_program_entries(struct pci_dev *dev, - struct msix_entry *entries) +static void msix_update_entries(struct pci_dev *dev, struct msix_entry *entries) { struct msi_desc *entry; - int i = 0; - void __iomem *desc_addr; for_each_pci_msi_entry(entry, dev) { - if (entries) - entries[i++].vector = entry->irq; + if (entries) { + entries->vector = entry->irq; + entries++; + } + } +} - desc_addr = pci_msix_desc_addr(entry); - if (desc_addr) - entry->masked = readl(desc_addr + - PCI_MSIX_ENTRY_VECTOR_CTRL); - else - entry->masked = 0; +static void msix_mask_all(void __iomem *base, int tsize) +{ + u32 ctrl = PCI_MSIX_ENTRY_CTRL_MASKBIT; + int i; - msix_mask_irq(entry, 1); - } + for (i = 0; i < tsize; i++, base += PCI_MSIX_ENTRY_SIZE) + writel(ctrl, base + PCI_MSIX_ENTRY_VECTOR_CTRL); } /** @@ -768,9 +773,9 @@ static void msix_program_entries(struct static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries, int nvec, struct irq_affinity *affd) { - int ret; - u16 control; void __iomem *base; + int ret, tsize; + u16 control; /* * Some devices require MSI-X to be enabled before the MSI-X @@ -782,12 +787,16 @@ static int msix_capability_init(struct p pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control); /* Request & Map MSI-X table region */ - base = msix_map_region(dev, msix_table_size(control)); + tsize = msix_table_size(control); + base = msix_map_region(dev, tsize); if (!base) { ret = -ENOMEM; goto out_disable; } + /* Ensure that all table entries are masked. */ + msix_mask_all(base, tsize); + ret = msix_setup_entries(dev, base, entries, nvec, affd); if (ret) goto out_disable; @@ -801,7 +810,7 @@ static int msix_capability_init(struct p if (ret) goto out_free; - msix_program_entries(dev, entries); + msix_update_entries(dev, entries); ret = populate_msi_sysfs(dev); if (ret)
From: Thomas Gleixner <tglx@linutronix.de> The specification (PCIe r5.0, sec 6.1.4.5) states: For MSI-X, a function is permitted to cache Address and Data values from unmasked MSI-X Table entries. However, anytime software unmasks a currently masked MSI-X Table entry either by clearing its Mask bit or by clearing the Function Mask bit, the function must update any Address or Data values that it cached from that entry. If software changes the Address or Data value of an entry while the entry is unmasked, the result is undefined. The Linux kernel's MSI-X support never enforced that the entry is masked before the entry is modified hence the Fixes tag refers to a commit in: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Enforce the entry to be masked across the update. There is no point in enforcing this to be handled at all possible call sites as this is just pointless code duplication and the common update function is the obvious place to enforce this. Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support") Reported-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> --- drivers/pci/msi.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -289,13 +289,28 @@ void __pci_write_msi_msg(struct msi_desc /* Don't touch the hardware now */ } else if (entry->msi_attrib.is_msix) { void __iomem *base = pci_msix_desc_addr(entry); + bool unmasked = !(entry->masked & PCI_MSIX_ENTRY_CTRL_MASKBIT); if (!base) goto skip; + /* + * The specification mandates that the entry is masked + * when the message is modified: + * + * "If software changes the Address or Data value of an + * entry while the entry is unmasked, the result is + * undefined." + */ + if (unmasked) + __pci_msix_desc_mask_irq(entry, PCI_MSIX_ENTRY_CTRL_MASKBIT); + writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR); writel(msg->address_hi, base + PCI_MSIX_ENTRY_UPPER_ADDR); writel(msg->data, base + PCI_MSIX_ENTRY_DATA); + + if (unmasked) + __pci_msix_desc_mask_irq(entry, 0); } else { int pos = dev->msi_cap; u16 msgctl;
From: Thomas Gleixner <tglx@linutronix.de> Nothing enforces the posted writes to be visible when the function returns. Flush them even if the flush might be redundant when the entry is masked already as the unmask will flush as well. This is either setup or a rare affinity change event so the extra flush is not the end of the world. While this is more a theoretical issue especially the logic in the X86 specific msi_set_affinity() function relies on the assumption that the update has reached the hardware when the function returns. Again, as this never has been enforced the Fixes tag refers to a commit in: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> --- drivers/pci/msi.c | 5 +++++ 1 file changed, 5 insertions(+) --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -311,6 +311,9 @@ void __pci_write_msi_msg(struct msi_desc if (unmasked) __pci_msix_desc_mask_irq(entry, 0); + + /* Ensure that the writes are visible in the device */ + readl(base + PCI_MSIX_ENTRY_DATA); } else { int pos = dev->msi_cap; u16 msgctl; @@ -331,6 +334,8 @@ void __pci_write_msi_msg(struct msi_desc pci_write_config_word(dev, pos + PCI_MSI_DATA_32, msg->data); } + /* Ensure that the writes are visible in the device */ + pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl); } skip:
msi_mask_irq() takes a mask and a flags argument. The mask argument is used to mask out bits from the cached mask and the flags argument to set bits. Some places invoke it with a flags argument which sets bits which are not used by the device, i.e. when the device supports up to 8 vectors a full unmask in some places sets the mask to 0xFFFFFF00. While devices probably do not care, it's still bad practice. Fixes: 7ba1930db02f ("PCI MSI: Unmask MSI if setup failed") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- V2: New patch --- drivers/pci/msi.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -656,21 +656,21 @@ static int msi_capability_init(struct pc /* Configure MSI capability structure */ ret = pci_msi_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSI); if (ret) { - msi_mask_irq(entry, mask, ~mask); + msi_mask_irq(entry, mask, 0); free_msi_irqs(dev); return ret; } ret = msi_verify_entries(dev); if (ret) { - msi_mask_irq(entry, mask, ~mask); + msi_mask_irq(entry, mask, 0); free_msi_irqs(dev); return ret; } ret = populate_msi_sysfs(dev); if (ret) { - msi_mask_irq(entry, mask, ~mask); + msi_mask_irq(entry, mask, 0); free_msi_irqs(dev); return ret; } @@ -962,7 +962,7 @@ static void pci_msi_shutdown(struct pci_ /* Return the device with MSI unmasked as initial states */ mask = msi_mask(desc->msi_attrib.multi_cap); /* Keep cached state to be restored */ - __pci_msi_desc_mask_irq(desc, mask, ~mask); + __pci_msi_desc_mask_irq(desc, mask, 0); /* Restore dev->irq to its default pin-assertion IRQ */ dev->irq = desc->msi_attrib.default_irq;
The comments about preserving the cached state in pci_msi[x]_shutdown() are misleading as the MSI descriptors are freed right after those functions return. So there is nothing to restore. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- V2: New patch --- drivers/pci/msi.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -961,7 +961,6 @@ static void pci_msi_shutdown(struct pci_ /* Return the device with MSI unmasked as initial states */ mask = msi_mask(desc->msi_attrib.multi_cap); - /* Keep cached state to be restored */ __pci_msi_desc_mask_irq(desc, mask, 0); /* Restore dev->irq to its default pin-assertion IRQ */ @@ -1047,10 +1046,8 @@ static void pci_msix_shutdown(struct pci } /* Return the device with MSI-X masked as initial states */ - for_each_pci_msi_entry(entry, dev) { - /* Keep cached states to be restored */ + for_each_pci_msi_entry(entry, dev) __pci_msix_desc_mask_irq(entry, 1); - } pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0); pci_intx_for_msi(dev, 1);
No point in using the raw write function from shutdown. Preparatory change to introduce proper serialization for the msi_desc::masked cache. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- V2: New patch --- drivers/pci/msi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -961,7 +961,7 @@ static void pci_msi_shutdown(struct pci_ /* Return the device with MSI unmasked as initial states */ mask = msi_mask(desc->msi_attrib.multi_cap); - __pci_msi_desc_mask_irq(desc, mask, 0); + msi_mask_irq(desc, mask, 0); /* Restore dev->irq to its default pin-assertion IRQ */ dev->irq = desc->msi_attrib.default_irq;
Multi-MSI uses a single MSI descriptor and there is a single mask register when the device supports per vector masking. To avoid reading back the mask register the value is cached in the MSI descriptor and updates are done by clearing and setting bits in the cache and writing it to the device. But nothing protects msi_desc::masked and the mask register from being modified concurrently on two different CPUs for two different Linux interrupts which belong to the same multi-MSI descriptor. Add a lock to struct device and protect any operation on the mask and the mask register with it. This makes the update of msi_desc::masked unconditional, but there is no place which requires a modification of the hardware register without updating the masked cache. msi_mask_irq() is now an empty wrapper which will be cleaned up in follow up changes. The problem goes way back to the initial support of multi-MSI, but picking the commit which introduced the mask cache is a valid cut off point (2.6.30). Fixes: f2440d9acbe8 ("PCI MSI: Refactor interrupt masking code") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- drivers/base/core.c | 1 + drivers/pci/msi.c | 19 ++++++++++--------- include/linux/device.h | 1 + include/linux/msi.h | 2 +- 4 files changed, 13 insertions(+), 10 deletions(-) --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -2837,6 +2837,7 @@ void device_initialize(struct device *de device_pm_init(dev); set_dev_node(dev, -1); #ifdef CONFIG_GENERIC_MSI_IRQ + raw_spin_lock_init(&dev->msi_lock); INIT_LIST_HEAD(&dev->msi_list); #endif INIT_LIST_HEAD(&dev->links.consumers); --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -143,24 +143,25 @@ static inline __attribute_const__ u32 ms * reliably as devices without an INTx disable bit will then generate a * level IRQ which will never be cleared. */ -u32 __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) +void __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) { - u32 mask_bits = desc->masked; + raw_spinlock_t *lock = &desc->dev->msi_lock; + unsigned long flags; if (pci_msi_ignore_mask || !desc->msi_attrib.maskbit) - return 0; + return; - mask_bits &= ~mask; - mask_bits |= flag; + raw_spin_lock_irqsave(lock, flags); + desc->masked &= ~mask; + desc->masked |= flag; pci_write_config_dword(msi_desc_to_pci_dev(desc), desc->mask_pos, - mask_bits); - - return mask_bits; + desc->masked); + raw_spin_unlock_irqrestore(lock, flags); } static void msi_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) { - desc->masked = __pci_msi_desc_mask_irq(desc, mask, flag); + __pci_msi_desc_mask_irq(desc, mask, flag); } static void __iomem *pci_msix_desc_addr(struct msi_desc *desc) --- a/include/linux/device.h +++ b/include/linux/device.h @@ -506,6 +506,7 @@ struct device { struct dev_pin_info *pins; #endif #ifdef CONFIG_GENERIC_MSI_IRQ + raw_spinlock_t msi_lock; struct list_head msi_list; #endif #ifdef CONFIG_DMA_OPS --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -233,7 +233,7 @@ void __pci_read_msi_msg(struct msi_desc void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg); u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag); -u32 __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag); +void __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag); void pci_msi_mask_irq(struct irq_data *data); void pci_msi_unmask_irq(struct irq_data *data);
From: Thomas Gleixner <tglx@linutronix.de> X86 IO/APIC and MSI interrupts (when used without interrupts remapping) require that the affinity setup on startup is done before the interrupt is enabled for the first time as the non-remapped operation mode cannot safely migrate enabled interrupts from arbitrary contexts. Provide a new irq chip flag which allows affected hardware to request this. This has to be opt-in because there have been reports in the past that some interrupt chips cannot handle affinity setting before startup. Fixes: 18404756765c ("genirq: Expose default irq affinity mask (take 3)") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <maz@kernel.org> --- include/linux/irq.h | 2 ++ kernel/irq/chip.c | 5 ++++- 2 files changed, 6 insertions(+), 1 deletion(-) --- a/include/linux/irq.h +++ b/include/linux/irq.h @@ -569,6 +569,7 @@ struct irq_chip { * IRQCHIP_SUPPORTS_NMI: Chip can deliver NMIs, only for root irqchips * IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND: Invokes __enable_irq()/__disable_irq() for wake irqs * in the suspend path if they are in disabled state + * IRQCHIP_AFFINITY_PRE_STARTUP: Default affinity update before startup */ enum { IRQCHIP_SET_TYPE_MASKED = (1 << 0), @@ -581,6 +582,7 @@ enum { IRQCHIP_SUPPORTS_LEVEL_MSI = (1 << 7), IRQCHIP_SUPPORTS_NMI = (1 << 8), IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND = (1 << 9), + IRQCHIP_AFFINITY_PRE_STARTUP = (1 << 10), }; #include <linux/irqdesc.h> --- a/kernel/irq/chip.c +++ b/kernel/irq/chip.c @@ -265,8 +265,11 @@ int irq_startup(struct irq_desc *desc, b } else { switch (__irq_startup_managed(desc, aff, force)) { case IRQ_STARTUP_NORMAL: + if (d->chip->flags & IRQCHIP_AFFINITY_PRE_STARTUP) + irq_setup_affinity(desc); ret = __irq_startup(desc); - irq_setup_affinity(desc); + if (!(d->chip->flags & IRQCHIP_AFFINITY_PRE_STARTUP)) + irq_setup_affinity(desc); break; case IRQ_STARTUP_MANAGED: irq_do_set_affinity(d, aff, false);
From: Thomas Gleixner <tglx@linutronix.de> The IO/APIC cannot handle interrupt affinity changes safely after startup other than from an interrupt handler. The startup sequence in the generic interrupt code violates that assumption. Mark the irq chip with the new IRQCHIP_AFFINITY_PRE_STARTUP flag so that the default interrupt setting happens before the interrupt is started up for the first time. Fixes: 18404756765c ("genirq: Expose default irq affinity mask (take 3)") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <maz@kernel.org> --- arch/x86/kernel/apic/io_apic.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -1986,7 +1986,8 @@ static struct irq_chip ioapic_chip __rea .irq_set_affinity = ioapic_set_affinity, .irq_retrigger = irq_chip_retrigger_hierarchy, .irq_get_irqchip_state = ioapic_irq_get_chip_state, - .flags = IRQCHIP_SKIP_SET_WAKE, + .flags = IRQCHIP_SKIP_SET_WAKE | + IRQCHIP_AFFINITY_PRE_STARTUP, }; static struct irq_chip ioapic_ir_chip __read_mostly = { @@ -1999,7 +2000,8 @@ static struct irq_chip ioapic_ir_chip __ .irq_set_affinity = ioapic_set_affinity, .irq_retrigger = irq_chip_retrigger_hierarchy, .irq_get_irqchip_state = ioapic_irq_get_chip_state, - .flags = IRQCHIP_SKIP_SET_WAKE, + .flags = IRQCHIP_SKIP_SET_WAKE | + IRQCHIP_AFFINITY_PRE_STARTUP, }; static inline void init_IO_APIC_traps(void)
From: Thomas Gleixner <tglx@linutronix.de> The X86 MSI mechanism cannot handle interrupt affinity changes safely after startup other than from an interrupt handler, unless interrupt remapping is enabled. The startup sequence in the generic interrupt code violates that assumption. Mark the irq chips with the new IRQCHIP_AFFINITY_PRE_STARTUP flag so that the default interrupt setting happens before the interrupt is started up for the first time. While the interrupt remapping MSI chip does not require this, there is no point in treating it differently as this might spare an interrupt to a CPU which is not in the default affinity mask. For the non-remapping case go to the direct write path when the interrupt is not yet started similar to the not yet activated case. Fixes: 18404756765c ("genirq: Expose default irq affinity mask (take 3)") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <maz@kernel.org> --- arch/x86/kernel/apic/msi.c | 11 ++++++++--- arch/x86/kernel/hpet.c | 2 +- 2 files changed, 9 insertions(+), 4 deletions(-) --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -58,11 +58,13 @@ msi_set_affinity(struct irq_data *irqd, * The quirk bit is not set in this case. * - The new vector is the same as the old vector * - The old vector is MANAGED_IRQ_SHUTDOWN_VECTOR (interrupt starts up) + * - The interrupt is not yet started up * - The new destination CPU is the same as the old destination CPU */ if (!irqd_msi_nomask_quirk(irqd) || cfg->vector == old_cfg.vector || old_cfg.vector == MANAGED_IRQ_SHUTDOWN_VECTOR || + !irqd_is_started(irqd) || cfg->dest_apicid == old_cfg.dest_apicid) { irq_msi_update_msg(irqd, cfg); return ret; @@ -150,7 +152,8 @@ static struct irq_chip pci_msi_controlle .irq_ack = irq_chip_ack_parent, .irq_retrigger = irq_chip_retrigger_hierarchy, .irq_set_affinity = msi_set_affinity, - .flags = IRQCHIP_SKIP_SET_WAKE, + .flags = IRQCHIP_SKIP_SET_WAKE | + IRQCHIP_AFFINITY_PRE_STARTUP, }; int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec, @@ -219,7 +222,8 @@ static struct irq_chip pci_msi_ir_contro .irq_mask = pci_msi_mask_irq, .irq_ack = irq_chip_ack_parent, .irq_retrigger = irq_chip_retrigger_hierarchy, - .flags = IRQCHIP_SKIP_SET_WAKE, + .flags = IRQCHIP_SKIP_SET_WAKE | + IRQCHIP_AFFINITY_PRE_STARTUP, }; static struct msi_domain_info pci_msi_ir_domain_info = { @@ -273,7 +277,8 @@ static struct irq_chip dmar_msi_controll .irq_retrigger = irq_chip_retrigger_hierarchy, .irq_compose_msi_msg = dmar_msi_compose_msg, .irq_write_msi_msg = dmar_msi_write_msg, - .flags = IRQCHIP_SKIP_SET_WAKE, + .flags = IRQCHIP_SKIP_SET_WAKE | + IRQCHIP_AFFINITY_PRE_STARTUP, }; static int dmar_msi_init(struct irq_domain *domain, --- a/arch/x86/kernel/hpet.c +++ b/arch/x86/kernel/hpet.c @@ -508,7 +508,7 @@ static struct irq_chip hpet_msi_controll .irq_set_affinity = msi_domain_set_affinity, .irq_retrigger = irq_chip_retrigger_hierarchy, .irq_write_msi_msg = hpet_msi_write_msg, - .flags = IRQCHIP_SKIP_SET_WAKE, + .flags = IRQCHIP_SKIP_SET_WAKE | IRQCHIP_AFFINITY_PRE_STARTUP, }; static int hpet_msi_init(struct irq_domain *domain,
The PCI core already ensures that the MSI[-X] state is correct when MSI[-X] is disabled. For MSI the reset state is all entries unmasked and for MSI-X all vectors are masked. S390 masks all MSI entries and masks the already masked MSI-X entries again. Remove it and let the device in the correct state. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-s390@vger.kernel.org Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> --- arch/s390/pci/pci_irq.c | 4 ---- drivers/pci/msi.c | 4 ++-- include/linux/msi.h | 2 -- 3 files changed, 2 insertions(+), 8 deletions(-) --- a/arch/s390/pci/pci_irq.c +++ b/arch/s390/pci/pci_irq.c @@ -365,10 +365,6 @@ void arch_teardown_msi_irqs(struct pci_d for_each_pci_msi_entry(msi, pdev) { if (!msi->irq) continue; - if (msi->msi_attrib.is_msix) - __pci_msix_desc_mask_irq(msi, 1); - else - __pci_msi_desc_mask_irq(msi, 1, 1); irq_set_msi_desc(msi->irq, NULL); irq_free_desc(msi->irq); msi->msg.address_lo = 0; --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -143,7 +143,7 @@ static inline __attribute_const__ u32 ms * reliably as devices without an INTx disable bit will then generate a * level IRQ which will never be cleared. */ -void __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) +static void __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) { raw_spinlock_t *lock = &desc->dev->msi_lock; unsigned long flags; @@ -180,7 +180,7 @@ static void __iomem *pci_msix_desc_addr( * file. This saves a few milliseconds when initialising devices with lots * of MSI-X interrupts. */ -u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag) +static u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag) { u32 mask_bits = desc->masked; void __iomem *desc_addr; --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -232,8 +232,6 @@ void free_msi_entry(struct msi_desc *ent void __pci_read_msi_msg(struct msi_desc *entry, struct msi_msg *msg); void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg); -u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag); -void __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag); void pci_msi_mask_irq(struct irq_data *data); void pci_msi_unmask_irq(struct irq_data *data);
From: Thomas Gleixner <tglx@linutronix.de> No point in looping over all entries when 64bit addressing mode is enabled for nothing. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> --- drivers/pci/msi.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -613,8 +613,11 @@ static int msi_verify_entries(struct pci { struct msi_desc *entry; + if (!dev->no_64bit_msi) + return 0; + for_each_pci_msi_entry(entry, dev) { - if (entry->msg.address_hi && dev->no_64bit_msi) { + if (entry->msg.address_hi) { pci_err(dev, "arch assigned 64-bit MSI address %#x%08x but device only supports 32 bits\n", entry->msg.address_hi, entry->msg.address_lo); return -EIO;
msi_desc::masked is a misnomer. For MSI it's used to cache the MSI mask bits when the device supports per vector masking. For MSI-X it's used to cache the content of the vector control word which contains the mask bit for the vector. Replace it with a union of msi_mask and msix_ctrl to make the purpose clear and fix up the usage sites. No functional change Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- V2: New patch --- drivers/pci/msi.c | 30 +++++++++++++++--------------- include/linux/msi.h | 8 ++++++-- 2 files changed, 21 insertions(+), 17 deletions(-) --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -152,10 +152,10 @@ static void __pci_msi_desc_mask_irq(stru return; raw_spin_lock_irqsave(lock, flags); - desc->masked &= ~mask; - desc->masked |= flag; + desc->msi_mask &= ~mask; + desc->msi_mask |= flag; pci_write_config_dword(msi_desc_to_pci_dev(desc), desc->mask_pos, - desc->masked); + desc->msi_mask); raw_spin_unlock_irqrestore(lock, flags); } @@ -182,7 +182,7 @@ static void __iomem *pci_msix_desc_addr( */ static u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag) { - u32 mask_bits = desc->masked; + u32 ctrl = desc->msix_ctrl; void __iomem *desc_addr; if (pci_msi_ignore_mask) @@ -192,18 +192,18 @@ static u32 __pci_msix_desc_mask_irq(stru if (!desc_addr) return 0; - mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT; - if (flag & PCI_MSIX_ENTRY_CTRL_MASKBIT) - mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT; + ctrl &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT; + if (ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT) + ctrl |= PCI_MSIX_ENTRY_CTRL_MASKBIT; - writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL); + writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL); - return mask_bits; + return ctrl; } static void msix_mask_irq(struct msi_desc *desc, u32 flag) { - desc->masked = __pci_msix_desc_mask_irq(desc, flag); + desc->msix_ctrl = __pci_msix_desc_mask_irq(desc, flag); } static void msi_set_mask_bit(struct irq_data *data, u32 flag) @@ -290,7 +290,7 @@ void __pci_write_msi_msg(struct msi_desc /* Don't touch the hardware now */ } else if (entry->msi_attrib.is_msix) { void __iomem *base = pci_msix_desc_addr(entry); - bool unmasked = !(entry->masked & PCI_MSIX_ENTRY_CTRL_MASKBIT); + bool unmasked = !(entry->msix_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT); if (!base) goto skip; @@ -430,7 +430,7 @@ static void __pci_restore_msi_state(stru pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, &control); msi_mask_irq(entry, msi_mask(entry->msi_attrib.multi_cap), - entry->masked); + entry->msi_mask); control &= ~PCI_MSI_FLAGS_QSIZE; control |= (entry->msi_attrib.multiple << 4) | PCI_MSI_FLAGS_ENABLE; pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control); @@ -461,7 +461,7 @@ static void __pci_restore_msix_state(str arch_restore_msi_irqs(dev); for_each_pci_msi_entry(entry, dev) - msix_mask_irq(entry, entry->masked); + msix_mask_irq(entry, entry->msix_ctrl); pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0); } @@ -602,7 +602,7 @@ msi_setup_entry(struct pci_dev *dev, int /* Save the initial mask status */ if (entry->msi_attrib.maskbit) - pci_read_config_dword(dev, entry->mask_pos, &entry->masked); + pci_read_config_dword(dev, entry->mask_pos, &entry->msi_mask); out: kfree(masks); @@ -750,7 +750,7 @@ static int msix_setup_entries(struct pci addr = pci_msix_desc_addr(entry); if (addr) - entry->masked = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL); + entry->msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL); list_add_tail(&entry->list, dev_to_msi_list(&dev->dev)); if (masks) --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -107,7 +107,8 @@ struct ti_sci_inta_msi_desc { * address or data changes * @write_msi_msg_data: Data parameter for the callback. * - * @masked: [PCI MSI/X] Mask bits + * @msi_mask: [PCI MSI] MSI cached mask bits + * @msix_ctrl: [PCI MSI-X] MSI-X cached per vector control bits * @is_msix: [PCI MSI/X] True if MSI-X * @multiple: [PCI MSI/X] log2 num of messages allocated * @multi_cap: [PCI MSI/X] log2 num of messages supported @@ -139,7 +140,10 @@ struct msi_desc { union { /* PCI MSI/X specific data */ struct { - u32 masked; + union { + u32 msi_mask; + u32 msix_ctrl; + }; struct { u8 is_msix : 1; u8 multiple : 3;
Three error exits doing exactly the same ask for a common error exit point. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- V2: New patch --- drivers/pci/msi.c | 26 +++++++++++--------------- 1 file changed, 11 insertions(+), 15 deletions(-) --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -659,25 +659,16 @@ static int msi_capability_init(struct pc /* Configure MSI capability structure */ ret = pci_msi_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSI); - if (ret) { - msi_mask_irq(entry, mask, 0); - free_msi_irqs(dev); - return ret; - } + if (ret) + goto err; ret = msi_verify_entries(dev); - if (ret) { - msi_mask_irq(entry, mask, 0); - free_msi_irqs(dev); - return ret; - } + if (ret) + goto err; ret = populate_msi_sysfs(dev); - if (ret) { - msi_mask_irq(entry, mask, 0); - free_msi_irqs(dev); - return ret; - } + if (ret) + goto err; /* Set MSI enabled bits */ pci_intx_for_msi(dev, 0); @@ -687,6 +678,11 @@ static int msi_capability_init(struct pc pcibios_free_irq(dev); dev->irq = entry->irq; return 0; + +err: + msi_mask_irq(entry, mask, 0); + free_msi_irqs(dev); + return ret; } static void __iomem *msix_map_region(struct pci_dev *dev, unsigned nr_entries)
Handling of virtual MSI-X is obfuscated by letting pci_msix_desc_addr() return NULL and checking the pointer. Just use msi_desc::msi_attrib.is_virtual at the call sites and get rid of that pointer check. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- V2: New patch --- drivers/pci/msi.c | 25 ++++++++----------------- 1 file changed, 8 insertions(+), 17 deletions(-) --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -166,11 +166,7 @@ static void msi_mask_irq(struct msi_desc static void __iomem *pci_msix_desc_addr(struct msi_desc *desc) { - if (desc->msi_attrib.is_virtual) - return NULL; - - return desc->mask_base + - desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE; + return desc->mask_base + desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE; } /* @@ -182,14 +178,10 @@ static void __iomem *pci_msix_desc_addr( */ static u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag) { + void __iomem *desc_addr = pci_msix_desc_addr(desc); u32 ctrl = desc->msix_ctrl; - void __iomem *desc_addr; - - if (pci_msi_ignore_mask) - return 0; - desc_addr = pci_msix_desc_addr(desc); - if (!desc_addr) + if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual) return 0; ctrl &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT; @@ -256,10 +248,8 @@ void __pci_read_msi_msg(struct msi_desc if (entry->msi_attrib.is_msix) { void __iomem *base = pci_msix_desc_addr(entry); - if (!base) { - WARN_ON(1); + if (WARN_ON_ONCE(entry->msi_attrib.is_virtual)) return; - } msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR); msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR); @@ -292,7 +282,7 @@ void __pci_write_msi_msg(struct msi_desc void __iomem *base = pci_msix_desc_addr(entry); bool unmasked = !(entry->msix_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT); - if (!base) + if (entry->msi_attrib.is_virtual) goto skip; /* @@ -744,9 +734,10 @@ static int msix_setup_entries(struct pci entry->msi_attrib.default_irq = dev->irq; entry->mask_base = base; - addr = pci_msix_desc_addr(entry); - if (addr) + if (!entry->msi_attrib.is_virtual) { + addr = pci_msix_desc_addr(entry); entry->msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL); + } list_add_tail(&entry->list, dev_to_msi_list(&dev->dev)); if (masks)
msi_mask() is calculating the possible mask bits for MSI per vector masking. Rename it to msi_multi_mask() and hand the MSI descriptor pointer into it to simplify the call sites. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- V2: New patch --- drivers/pci/msi.c | 25 +++++++++++-------------- 1 file changed, 11 insertions(+), 14 deletions(-) --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -129,14 +129,6 @@ void __weak arch_restore_msi_irqs(struct return default_restore_msi_irqs(dev); } -static inline __attribute_const__ u32 msi_mask(unsigned x) -{ - /* Don't shift by >= width of type */ - if (x >= 5) - return 0xffffffff; - return (1 << (1 << x)) - 1; -} - /* * PCI 2.3 does not specify mask bits for each MSI interrupt. Attempting to * mask all MSI interrupts by clearing the MSI enable bit does not work @@ -211,6 +203,14 @@ static void msi_set_mask_bit(struct irq_ } } +static inline __attribute_const__ u32 msi_multi_mask(struct msi_desc *desc) +{ + /* Don't shift by >= width of type */ + if (desc->msi_attrib.multi_cap >= 5) + return 0xffffffff; + return (1 << (1 << desc->msi_attrib.multi_cap)) - 1; +} + /** * pci_msi_mask_irq - Generic IRQ chip callback to mask PCI/MSI interrupts * @data: pointer to irqdata associated to that interrupt @@ -419,8 +419,7 @@ static void __pci_restore_msi_state(stru arch_restore_msi_irqs(dev); pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, &control); - msi_mask_irq(entry, msi_mask(entry->msi_attrib.multi_cap), - entry->msi_mask); + msi_mask_irq(entry, msi_multi_mask(entry), entry->msi_mask); control &= ~PCI_MSI_FLAGS_QSIZE; control |= (entry->msi_attrib.multiple << 4) | PCI_MSI_FLAGS_ENABLE; pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control); @@ -642,7 +641,7 @@ static int msi_capability_init(struct pc return -ENOMEM; /* All MSIs are unmasked by default; mask them all */ - mask = msi_mask(entry->msi_attrib.multi_cap); + mask = msi_multi_mask(entry); msi_mask_irq(entry, mask, mask); list_add_tail(&entry->list, dev_to_msi_list(&dev->dev)); @@ -938,7 +937,6 @@ EXPORT_SYMBOL(pci_msi_vec_count); static void pci_msi_shutdown(struct pci_dev *dev) { struct msi_desc *desc; - u32 mask; if (!pci_msi_enable || !dev || !dev->msi_enabled) return; @@ -951,8 +949,7 @@ static void pci_msi_shutdown(struct pci_ dev->msi_enabled = 0; /* Return the device with MSI unmasked as initial states */ - mask = msi_mask(desc->msi_attrib.multi_cap); - msi_mask_irq(desc, mask, 0); + msi_mask_irq(desc, msi_multi_mask(desc), 0); /* Restore dev->irq to its default pin-assertion IRQ */ dev->irq = desc->msi_attrib.default_irq;
The existing mask/unmask functions are convoluted and generate suboptimal assembly code. Provide a new set of functions which will be used in later patches to replace the exisiting ones. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- V2: New patch --- drivers/pci/msi.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 72 insertions(+) --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -211,6 +211,78 @@ static inline __attribute_const__ u32 ms return (1 << (1 << desc->msi_attrib.multi_cap)) - 1; } +static noinline void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 set) +{ + raw_spinlock_t *lock = &desc->dev->msi_lock; + unsigned long flags; + + raw_spin_lock_irqsave(lock, flags); + desc->msi_mask &= ~clear; + desc->msi_mask |= set; + pci_write_config_dword(msi_desc_to_pci_dev(desc), desc->mask_pos, + desc->msi_mask); + raw_spin_unlock_irqrestore(lock, flags); +} + +static inline void pci_msi_mask(struct msi_desc *desc, u32 mask) +{ + pci_msi_update_mask(desc, 0, mask); +} + +static inline void pci_msi_unmask(struct msi_desc *desc, u32 mask) +{ + pci_msi_update_mask(desc, mask, 0); +} + +/* + * This internal function does not flush PCI writes to the device. All + * users must ensure that they read from the device before either assuming + * that the device state is up to date, or returning out of this file. + * It does not affect the msi_desc::msix_ctrl cache either. Use with care! + */ +static void pci_msix_write_vector_ctrl(struct msi_desc *desc, u32 ctrl) +{ + void __iomem *desc_addr = pci_msix_desc_addr(desc); + + writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL); +} + +static inline void pci_msix_mask(struct msi_desc *desc) +{ + desc->msix_ctrl |= PCI_MSIX_ENTRY_CTRL_MASKBIT; + pci_msix_write_vector_ctrl(desc, desc->msix_ctrl); + /* Flush write to device */ + readl(desc->mask_base); +} + +static inline void pci_msix_unmask(struct msi_desc *desc) +{ + desc->msix_ctrl &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT; + pci_msix_write_vector_ctrl(desc, desc->msix_ctrl); +} + +static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask) +{ + if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual) + return; + + if (desc->msi_attrib.is_msix) + pci_msix_mask(desc); + else if (!desc->msi_attrib.maskbit) + pci_msi_mask(desc, mask); +} + +static void __pci_msi_unmask_desc(struct msi_desc *desc, u32 mask) +{ + if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual) + return; + + if (desc->msi_attrib.is_msix) + pci_msix_unmask(desc); + else if (!desc->msi_attrib.maskbit) + pci_msi_unmask(desc, mask); +} + /** * pci_msi_mask_irq - Generic IRQ chip callback to mask PCI/MSI interrupts * @data: pointer to irqdata associated to that interrupt
Switch the PCI/MSI core to use the new mask/unmask functions. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- V2: New patch --- drivers/pci/msi.c | 102 +++++++++++------------------------------------------- 1 file changed, 21 insertions(+), 81 deletions(-) --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -135,74 +135,6 @@ void __weak arch_restore_msi_irqs(struct * reliably as devices without an INTx disable bit will then generate a * level IRQ which will never be cleared. */ -static void __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) -{ - raw_spinlock_t *lock = &desc->dev->msi_lock; - unsigned long flags; - - if (pci_msi_ignore_mask || !desc->msi_attrib.maskbit) - return; - - raw_spin_lock_irqsave(lock, flags); - desc->msi_mask &= ~mask; - desc->msi_mask |= flag; - pci_write_config_dword(msi_desc_to_pci_dev(desc), desc->mask_pos, - desc->msi_mask); - raw_spin_unlock_irqrestore(lock, flags); -} - -static void msi_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) -{ - __pci_msi_desc_mask_irq(desc, mask, flag); -} - -static void __iomem *pci_msix_desc_addr(struct msi_desc *desc) -{ - return desc->mask_base + desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE; -} - -/* - * This internal function does not flush PCI writes to the device. - * All users must ensure that they read from the device before either - * assuming that the device state is up to date, or returning out of this - * file. This saves a few milliseconds when initialising devices with lots - * of MSI-X interrupts. - */ -static u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag) -{ - void __iomem *desc_addr = pci_msix_desc_addr(desc); - u32 ctrl = desc->msix_ctrl; - - if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual) - return 0; - - ctrl &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT; - if (ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT) - ctrl |= PCI_MSIX_ENTRY_CTRL_MASKBIT; - - writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL); - - return ctrl; -} - -static void msix_mask_irq(struct msi_desc *desc, u32 flag) -{ - desc->msix_ctrl = __pci_msix_desc_mask_irq(desc, flag); -} - -static void msi_set_mask_bit(struct irq_data *data, u32 flag) -{ - struct msi_desc *desc = irq_data_get_msi_desc(data); - - if (desc->msi_attrib.is_msix) { - msix_mask_irq(desc, flag); - readl(desc->mask_base); /* Flush write to device */ - } else { - unsigned offset = data->irq - desc->irq; - msi_mask_irq(desc, 1 << offset, flag << offset); - } -} - static inline __attribute_const__ u32 msi_multi_mask(struct msi_desc *desc) { /* Don't shift by >= width of type */ @@ -234,6 +166,11 @@ static inline void pci_msi_unmask(struct pci_msi_update_mask(desc, mask, 0); } +static inline void __iomem *pci_msix_desc_addr(struct msi_desc *desc) +{ + return desc->mask_base + desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE; +} + /* * This internal function does not flush PCI writes to the device. All * users must ensure that they read from the device before either assuming @@ -289,7 +226,9 @@ static void __pci_msi_unmask_desc(struct */ void pci_msi_mask_irq(struct irq_data *data) { - msi_set_mask_bit(data, 1); + struct msi_desc *desc = irq_data_get_msi_desc(data); + + __pci_msi_mask_desc(desc, BIT(data->irq - desc->irq)); } EXPORT_SYMBOL_GPL(pci_msi_mask_irq); @@ -299,7 +238,9 @@ EXPORT_SYMBOL_GPL(pci_msi_mask_irq); */ void pci_msi_unmask_irq(struct irq_data *data) { - msi_set_mask_bit(data, 0); + struct msi_desc *desc = irq_data_get_msi_desc(data); + + __pci_msi_unmask_desc(desc, BIT(data->irq - desc->irq)); } EXPORT_SYMBOL_GPL(pci_msi_unmask_irq); @@ -352,7 +293,8 @@ void __pci_write_msi_msg(struct msi_desc /* Don't touch the hardware now */ } else if (entry->msi_attrib.is_msix) { void __iomem *base = pci_msix_desc_addr(entry); - bool unmasked = !(entry->msix_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT); + u32 ctrl = entry->msix_ctrl; + bool unmasked = !(ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT); if (entry->msi_attrib.is_virtual) goto skip; @@ -366,14 +308,14 @@ void __pci_write_msi_msg(struct msi_desc * undefined." */ if (unmasked) - __pci_msix_desc_mask_irq(entry, PCI_MSIX_ENTRY_CTRL_MASKBIT); + pci_msix_write_vector_ctrl(entry, ctrl | PCI_MSIX_ENTRY_CTRL_MASKBIT); writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR); writel(msg->address_hi, base + PCI_MSIX_ENTRY_UPPER_ADDR); writel(msg->data, base + PCI_MSIX_ENTRY_DATA); if (unmasked) - __pci_msix_desc_mask_irq(entry, 0); + pci_msix_write_vector_ctrl(entry, ctrl); /* Ensure that the writes are visible in the device */ readl(base + PCI_MSIX_ENTRY_DATA); @@ -491,7 +433,7 @@ static void __pci_restore_msi_state(stru arch_restore_msi_irqs(dev); pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, &control); - msi_mask_irq(entry, msi_multi_mask(entry), entry->msi_mask); + pci_msi_update_mask(entry, 0, 0); control &= ~PCI_MSI_FLAGS_QSIZE; control |= (entry->msi_attrib.multiple << 4) | PCI_MSI_FLAGS_ENABLE; pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control); @@ -522,7 +464,7 @@ static void __pci_restore_msix_state(str arch_restore_msi_irqs(dev); for_each_pci_msi_entry(entry, dev) - msix_mask_irq(entry, entry->msix_ctrl); + pci_msix_write_vector_ctrl(entry, entry->msix_ctrl); pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0); } @@ -704,7 +646,6 @@ static int msi_capability_init(struct pc { struct msi_desc *entry; int ret; - unsigned mask; pci_msi_set_enable(dev, 0); /* Disable MSI during set up */ @@ -713,8 +654,7 @@ static int msi_capability_init(struct pc return -ENOMEM; /* All MSIs are unmasked by default; mask them all */ - mask = msi_multi_mask(entry); - msi_mask_irq(entry, mask, mask); + pci_msi_mask(entry, msi_multi_mask(entry)); list_add_tail(&entry->list, dev_to_msi_list(&dev->dev)); @@ -741,7 +681,7 @@ static int msi_capability_init(struct pc return 0; err: - msi_mask_irq(entry, mask, 0); + pci_msi_unmask(entry, msi_multi_mask(entry)); free_msi_irqs(dev); return ret; } @@ -1021,7 +961,7 @@ static void pci_msi_shutdown(struct pci_ dev->msi_enabled = 0; /* Return the device with MSI unmasked as initial states */ - msi_mask_irq(desc, msi_multi_mask(desc), 0); + pci_msi_unmask(desc, msi_multi_mask(desc)); /* Restore dev->irq to its default pin-assertion IRQ */ dev->irq = desc->msi_attrib.default_irq; @@ -1107,7 +1047,7 @@ static void pci_msix_shutdown(struct pci /* Return the device with MSI-X masked as initial states */ for_each_pci_msi_entry(entry, dev) - __pci_msix_desc_mask_irq(entry, 1); + pci_msix_mask(entry); pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0); pci_intx_for_msi(dev, 1);
On Thu, 2021-07-29 at 23:51 +0200, Thomas Gleixner wrote:
> The PCI core already ensures that the MSI[-X] state is correct when MSI[-X]
> is disabled. For MSI the reset state is all entries unmasked and for MSI-X
> all vectors are masked.
>
> S390 masks all MSI entries and masks the already masked MSI-X entries
> again. Remove it and let the device in the correct state.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: linux-s390@vger.kernel.org
> Cc: Niklas Schnelle <schnelle@linux.ibm.com>
> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
> arch/s390/pci/pci_irq.c | 4 ----
> drivers/pci/msi.c | 4 ++--
> include/linux/msi.h | 2 --
> 3 files changed, 2 insertions(+), 8 deletions(-)
>
> --- a/arch/s390/pci/pci_irq.c
> +++ b/arch/s390/pci/pci_irq.c
> @@ -365,10 +365,6 @@ void arch_teardown_msi_irqs(struct pci_d
> for_each_pci_msi_entry(msi, pdev) {
> if (!msi->irq)
> continue;
> - if (msi->msi_attrib.is_msix)
> - __pci_msix_desc_mask_irq(msi, 1);
> - else
> - __pci_msi_desc_mask_irq(msi, 1, 1);
> irq_set_msi_desc(msi->irq, NULL);
> irq_free_desc(msi->irq);
> msi->msg.address_lo = 0;
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -143,7 +143,7 @@ static inline __attribute_const__ u32 ms
> * reliably as devices without an INTx disable bit will then generate a
> * level IRQ which will never be cleared.
> */
> -void __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag)
> +static void __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag)
> {
> raw_spinlock_t *lock = &desc->dev->msi_lock;
> unsigned long flags;
> @@ -180,7 +180,7 @@ static void __iomem *pci_msix_desc_addr(
> * file. This saves a few milliseconds when initialising devices with lots
> * of MSI-X interrupts.
> */
> -u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag)
> +static u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag)
> {
> u32 mask_bits = desc->masked;
> void __iomem *desc_addr;
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -232,8 +232,6 @@ void free_msi_entry(struct msi_desc *ent
> void __pci_read_msi_msg(struct msi_desc *entry, struct msi_msg *msg);
> void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg);
>
> -u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag);
> -void __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag);
> void pci_msi_mask_irq(struct irq_data *data);
> void pci_msi_unmask_irq(struct irq_data *data);
>
>
I gave this patch a try, adapted for v5.14-rc4 where
__pci_msi_desc_msg() returns a u32, and all looks good. I tested with
our pretty quirky ISM devices too, which are the only MSI ones on
s390x.
It also makes sense to me to let the common code handle this so feel
free to add my:
Acked-by: Niklas Schnelle <schnelle@linux.ibm.com>
and/or
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Thanks.
On Fri, Aug 06 2021 at 15:39, Marc Zyngier wrote:
> On Thu, 29 Jul 2021 22:51:57 +0100,
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> +static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask)
>> +{
>> + if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
>> + return;
>> +
>> + if (desc->msi_attrib.is_msix)
>> + pci_msix_mask(desc);
>> + else if (!desc->msi_attrib.maskbit)
>
> This negation is preventing one of my boxes from working correctly (no
> idea why the i350 driver refuses to use MSI-X and sticks to a single
> MSI, but hey, that's another story), as the device supports MSI
> masking, and we definitely don't try to mask/unmask in this case...
>
> Dropping the '!' here and on the unmask path fixes it for me.
Duh. I'm a moron. Of course this needs to check maskbit if it wants to
mask. Sigh.
The existing mask/unmask functions are convoluted and generate suboptimal assembly code. Provide a new set of functions which will be used in later patches to replace the exisiting ones. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210729222543.257079238@linutronix.de --- V3: Check maskbit when masking (Marc) V2: New patch --- drivers/pci/msi.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 72 insertions(+) --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -211,6 +211,78 @@ static inline __attribute_const__ u32 ms return (1 << (1 << desc->msi_attrib.multi_cap)) - 1; } +static noinline void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 set) +{ + raw_spinlock_t *lock = &desc->dev->msi_lock; + unsigned long flags; + + raw_spin_lock_irqsave(lock, flags); + desc->msi_mask &= ~clear; + desc->msi_mask |= set; + pci_write_config_dword(msi_desc_to_pci_dev(desc), desc->mask_pos, + desc->msi_mask); + raw_spin_unlock_irqrestore(lock, flags); +} + +static inline void pci_msi_mask(struct msi_desc *desc, u32 mask) +{ + pci_msi_update_mask(desc, 0, mask); +} + +static inline void pci_msi_unmask(struct msi_desc *desc, u32 mask) +{ + pci_msi_update_mask(desc, mask, 0); +} + +/* + * This internal function does not flush PCI writes to the device. All + * users must ensure that they read from the device before either assuming + * that the device state is up to date, or returning out of this file. + * It does not affect the msi_desc::msix_ctrl cache either. Use with care! + */ +static void pci_msix_write_vector_ctrl(struct msi_desc *desc, u32 ctrl) +{ + void __iomem *desc_addr = pci_msix_desc_addr(desc); + + writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL); +} + +static inline void pci_msix_mask(struct msi_desc *desc) +{ + desc->msix_ctrl |= PCI_MSIX_ENTRY_CTRL_MASKBIT; + pci_msix_write_vector_ctrl(desc, desc->msix_ctrl); + /* Flush write to device */ + readl(desc->mask_base); +} + +static inline void pci_msix_unmask(struct msi_desc *desc) +{ + desc->msix_ctrl &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT; + pci_msix_write_vector_ctrl(desc, desc->msix_ctrl); +} + +static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask) +{ + if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual) + return; + + if (desc->msi_attrib.is_msix) + pci_msix_mask(desc); + else if (desc->msi_attrib.maskbit) + pci_msi_mask(desc, mask); +} + +static void __pci_msi_unmask_desc(struct msi_desc *desc, u32 mask) +{ + if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual) + return; + + if (desc->msi_attrib.is_msix) + pci_msix_unmask(desc); + else if (desc->msi_attrib.maskbit) + pci_msi_unmask(desc, mask); +} + /** * pci_msi_mask_irq - Generic IRQ chip callback to mask PCI/MSI interrupts * @data: pointer to irqdata associated to that interrupt
On 2021-07-29 22:51, Thomas Gleixner wrote:
> A recent discussion about the PCI/MSI management for virtio unearthed a
> violation of the MSI-X specification vs. writing the MSI-X message:
> under
> certain circumstances the entry is written without being masked.
>
> While looking at that and the related violation of the x86 non-remapped
> interrupt affinity mechanism a few other issues were discovered by
> inspection.
>
> The following series addresses these.
>
> Note this does not fix the virtio issue, but while staring at the above
> problems I came up with a plan to address this. I'm still trying to
> convince myself that I can get away without sprinkling locking all over
> the
> place, so don't hold your breath that this will materialize tomorrow.
For the patches I haven't acked yet, and with the fix on patch 18:
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Thanks,
M.
The following commit has been merged into the irq/core branch of tip: Commit-ID: 446a98b19fd6da97a1fb148abb1766ad89c9b767 Gitweb: https://git.kernel.org/tip/446a98b19fd6da97a1fb148abb1766ad89c9b767 Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Thu, 29 Jul 2021 23:51:58 +02:00 Committer: Thomas Gleixner <tglx@linutronix.de> CommitterDate: Tue, 10 Aug 2021 11:03:30 +02:00 PCI/MSI: Use new mask/unmask functions Switch the PCI/MSI core to use the new mask/unmask functions. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210729222543.311207034@linutronix.de --- drivers/pci/msi.c | 102 +++++++++------------------------------------ 1 file changed, 21 insertions(+), 81 deletions(-) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 26dd91f..ce841f3 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -135,74 +135,6 @@ void __weak arch_restore_msi_irqs(struct pci_dev *dev) * reliably as devices without an INTx disable bit will then generate a * level IRQ which will never be cleared. */ -static void __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) -{ - raw_spinlock_t *lock = &desc->dev->msi_lock; - unsigned long flags; - - if (pci_msi_ignore_mask || !desc->msi_attrib.maskbit) - return; - - raw_spin_lock_irqsave(lock, flags); - desc->msi_mask &= ~mask; - desc->msi_mask |= flag; - pci_write_config_dword(msi_desc_to_pci_dev(desc), desc->mask_pos, - desc->msi_mask); - raw_spin_unlock_irqrestore(lock, flags); -} - -static void msi_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) -{ - __pci_msi_desc_mask_irq(desc, mask, flag); -} - -static void __iomem *pci_msix_desc_addr(struct msi_desc *desc) -{ - return desc->mask_base + desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE; -} - -/* - * This internal function does not flush PCI writes to the device. - * All users must ensure that they read from the device before either - * assuming that the device state is up to date, or returning out of this - * file. This saves a few milliseconds when initialising devices with lots - * of MSI-X interrupts. - */ -static u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag) -{ - void __iomem *desc_addr = pci_msix_desc_addr(desc); - u32 ctrl = desc->msix_ctrl; - - if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual) - return 0; - - ctrl &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT; - if (ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT) - ctrl |= PCI_MSIX_ENTRY_CTRL_MASKBIT; - - writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL); - - return ctrl; -} - -static void msix_mask_irq(struct msi_desc *desc, u32 flag) -{ - desc->msix_ctrl = __pci_msix_desc_mask_irq(desc, flag); -} - -static void msi_set_mask_bit(struct irq_data *data, u32 flag) -{ - struct msi_desc *desc = irq_data_get_msi_desc(data); - - if (desc->msi_attrib.is_msix) { - msix_mask_irq(desc, flag); - readl(desc->mask_base); /* Flush write to device */ - } else { - unsigned offset = data->irq - desc->irq; - msi_mask_irq(desc, 1 << offset, flag << offset); - } -} - static inline __attribute_const__ u32 msi_multi_mask(struct msi_desc *desc) { /* Don't shift by >= width of type */ @@ -234,6 +166,11 @@ static inline void pci_msi_unmask(struct msi_desc *desc, u32 mask) pci_msi_update_mask(desc, mask, 0); } +static inline void __iomem *pci_msix_desc_addr(struct msi_desc *desc) +{ + return desc->mask_base + desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE; +} + /* * This internal function does not flush PCI writes to the device. All * users must ensure that they read from the device before either assuming @@ -289,7 +226,9 @@ static void __pci_msi_unmask_desc(struct msi_desc *desc, u32 mask) */ void pci_msi_mask_irq(struct irq_data *data) { - msi_set_mask_bit(data, 1); + struct msi_desc *desc = irq_data_get_msi_desc(data); + + __pci_msi_mask_desc(desc, BIT(data->irq - desc->irq)); } EXPORT_SYMBOL_GPL(pci_msi_mask_irq); @@ -299,7 +238,9 @@ EXPORT_SYMBOL_GPL(pci_msi_mask_irq); */ void pci_msi_unmask_irq(struct irq_data *data) { - msi_set_mask_bit(data, 0); + struct msi_desc *desc = irq_data_get_msi_desc(data); + + __pci_msi_unmask_desc(desc, BIT(data->irq - desc->irq)); } EXPORT_SYMBOL_GPL(pci_msi_unmask_irq); @@ -352,7 +293,8 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg) /* Don't touch the hardware now */ } else if (entry->msi_attrib.is_msix) { void __iomem *base = pci_msix_desc_addr(entry); - bool unmasked = !(entry->msix_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT); + u32 ctrl = entry->msix_ctrl; + bool unmasked = !(ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT); if (entry->msi_attrib.is_virtual) goto skip; @@ -366,14 +308,14 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg) * undefined." */ if (unmasked) - __pci_msix_desc_mask_irq(entry, PCI_MSIX_ENTRY_CTRL_MASKBIT); + pci_msix_write_vector_ctrl(entry, ctrl | PCI_MSIX_ENTRY_CTRL_MASKBIT); writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR); writel(msg->address_hi, base + PCI_MSIX_ENTRY_UPPER_ADDR); writel(msg->data, base + PCI_MSIX_ENTRY_DATA); if (unmasked) - __pci_msix_desc_mask_irq(entry, 0); + pci_msix_write_vector_ctrl(entry, ctrl); /* Ensure that the writes are visible in the device */ readl(base + PCI_MSIX_ENTRY_DATA); @@ -491,7 +433,7 @@ static void __pci_restore_msi_state(struct pci_dev *dev) arch_restore_msi_irqs(dev); pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, &control); - msi_mask_irq(entry, msi_multi_mask(entry), entry->msi_mask); + pci_msi_update_mask(entry, 0, 0); control &= ~PCI_MSI_FLAGS_QSIZE; control |= (entry->msi_attrib.multiple << 4) | PCI_MSI_FLAGS_ENABLE; pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control); @@ -522,7 +464,7 @@ static void __pci_restore_msix_state(struct pci_dev *dev) arch_restore_msi_irqs(dev); for_each_pci_msi_entry(entry, dev) - msix_mask_irq(entry, entry->msix_ctrl); + pci_msix_write_vector_ctrl(entry, entry->msix_ctrl); pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0); } @@ -704,7 +646,6 @@ static int msi_capability_init(struct pci_dev *dev, int nvec, { struct msi_desc *entry; int ret; - unsigned mask; pci_msi_set_enable(dev, 0); /* Disable MSI during set up */ @@ -713,8 +654,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec, return -ENOMEM; /* All MSIs are unmasked by default; mask them all */ - mask = msi_multi_mask(entry); - msi_mask_irq(entry, mask, mask); + pci_msi_mask(entry, msi_multi_mask(entry)); list_add_tail(&entry->list, dev_to_msi_list(&dev->dev)); @@ -741,7 +681,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec, return 0; err: - msi_mask_irq(entry, mask, 0); + pci_msi_unmask(entry, msi_multi_mask(entry)); free_msi_irqs(dev); return ret; } @@ -1021,7 +961,7 @@ static void pci_msi_shutdown(struct pci_dev *dev) dev->msi_enabled = 0; /* Return the device with MSI unmasked as initial states */ - msi_mask_irq(desc, msi_multi_mask(desc), 0); + pci_msi_unmask(desc, msi_multi_mask(desc)); /* Restore dev->irq to its default pin-assertion IRQ */ dev->irq = desc->msi_attrib.default_irq; @@ -1107,7 +1047,7 @@ static void pci_msix_shutdown(struct pci_dev *dev) /* Return the device with MSI-X masked as initial states */ for_each_pci_msi_entry(entry, dev) - __pci_msix_desc_mask_irq(entry, 1); + pci_msix_mask(entry); pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0); pci_intx_for_msi(dev, 1);
The following commit has been merged into the irq/core branch of tip: Commit-ID: 7327cefebb85d440fa6a589fdf53979d55b29a5a Gitweb: https://git.kernel.org/tip/7327cefebb85d440fa6a589fdf53979d55b29a5a Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Thu, 29 Jul 2021 23:51:56 +02:00 Committer: Thomas Gleixner <tglx@linutronix.de> CommitterDate: Tue, 10 Aug 2021 11:03:30 +02:00 PCI/MSI: Cleanup msi_mask() msi_mask() is calculating the possible mask bits for MSI per vector masking. Rename it to msi_multi_mask() and hand the MSI descriptor pointer into it to simplify the call sites. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210729222543.203905260@linutronix.de --- drivers/pci/msi.c | 25 +++++++++++-------------- 1 file changed, 11 insertions(+), 14 deletions(-) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 1bde9ec..2eab07b 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -129,14 +129,6 @@ void __weak arch_restore_msi_irqs(struct pci_dev *dev) return default_restore_msi_irqs(dev); } -static inline __attribute_const__ u32 msi_mask(unsigned x) -{ - /* Don't shift by >= width of type */ - if (x >= 5) - return 0xffffffff; - return (1 << (1 << x)) - 1; -} - /* * PCI 2.3 does not specify mask bits for each MSI interrupt. Attempting to * mask all MSI interrupts by clearing the MSI enable bit does not work @@ -211,6 +203,14 @@ static void msi_set_mask_bit(struct irq_data *data, u32 flag) } } +static inline __attribute_const__ u32 msi_multi_mask(struct msi_desc *desc) +{ + /* Don't shift by >= width of type */ + if (desc->msi_attrib.multi_cap >= 5) + return 0xffffffff; + return (1 << (1 << desc->msi_attrib.multi_cap)) - 1; +} + /** * pci_msi_mask_irq - Generic IRQ chip callback to mask PCI/MSI interrupts * @data: pointer to irqdata associated to that interrupt @@ -419,8 +419,7 @@ static void __pci_restore_msi_state(struct pci_dev *dev) arch_restore_msi_irqs(dev); pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, &control); - msi_mask_irq(entry, msi_mask(entry->msi_attrib.multi_cap), - entry->msi_mask); + msi_mask_irq(entry, msi_multi_mask(entry), entry->msi_mask); control &= ~PCI_MSI_FLAGS_QSIZE; control |= (entry->msi_attrib.multiple << 4) | PCI_MSI_FLAGS_ENABLE; pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control); @@ -642,7 +641,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec, return -ENOMEM; /* All MSIs are unmasked by default; mask them all */ - mask = msi_mask(entry->msi_attrib.multi_cap); + mask = msi_multi_mask(entry); msi_mask_irq(entry, mask, mask); list_add_tail(&entry->list, dev_to_msi_list(&dev->dev)); @@ -938,7 +937,6 @@ EXPORT_SYMBOL(pci_msi_vec_count); static void pci_msi_shutdown(struct pci_dev *dev) { struct msi_desc *desc; - u32 mask; if (!pci_msi_enable || !dev || !dev->msi_enabled) return; @@ -951,8 +949,7 @@ static void pci_msi_shutdown(struct pci_dev *dev) dev->msi_enabled = 0; /* Return the device with MSI unmasked as initial states */ - mask = msi_mask(desc->msi_attrib.multi_cap); - msi_mask_irq(desc, mask, 0); + msi_mask_irq(desc, msi_multi_mask(desc), 0); /* Restore dev->irq to its default pin-assertion IRQ */ dev->irq = desc->msi_attrib.default_irq;
The following commit has been merged into the irq/core branch of tip: Commit-ID: fcacdfbef5a1633211ebfac1b669a7739f5b553e Gitweb: https://git.kernel.org/tip/fcacdfbef5a1633211ebfac1b669a7739f5b553e Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Mon, 09 Aug 2021 21:08:56 +02:00 Committer: Thomas Gleixner <tglx@linutronix.de> CommitterDate: Tue, 10 Aug 2021 11:03:30 +02:00 PCI/MSI: Provide a new set of mask and unmask functions The existing mask/unmask functions are convoluted and generate suboptimal assembly code. Provide a new set of functions which will be used in later patches to replace the exisiting ones. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/875ywetozb.ffs@tglx --- drivers/pci/msi.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 72 insertions(+) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 2eab07b..26dd91f 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -211,6 +211,78 @@ static inline __attribute_const__ u32 msi_multi_mask(struct msi_desc *desc) return (1 << (1 << desc->msi_attrib.multi_cap)) - 1; } +static noinline void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 set) +{ + raw_spinlock_t *lock = &desc->dev->msi_lock; + unsigned long flags; + + raw_spin_lock_irqsave(lock, flags); + desc->msi_mask &= ~clear; + desc->msi_mask |= set; + pci_write_config_dword(msi_desc_to_pci_dev(desc), desc->mask_pos, + desc->msi_mask); + raw_spin_unlock_irqrestore(lock, flags); +} + +static inline void pci_msi_mask(struct msi_desc *desc, u32 mask) +{ + pci_msi_update_mask(desc, 0, mask); +} + +static inline void pci_msi_unmask(struct msi_desc *desc, u32 mask) +{ + pci_msi_update_mask(desc, mask, 0); +} + +/* + * This internal function does not flush PCI writes to the device. All + * users must ensure that they read from the device before either assuming + * that the device state is up to date, or returning out of this file. + * It does not affect the msi_desc::msix_ctrl cache either. Use with care! + */ +static void pci_msix_write_vector_ctrl(struct msi_desc *desc, u32 ctrl) +{ + void __iomem *desc_addr = pci_msix_desc_addr(desc); + + writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL); +} + +static inline void pci_msix_mask(struct msi_desc *desc) +{ + desc->msix_ctrl |= PCI_MSIX_ENTRY_CTRL_MASKBIT; + pci_msix_write_vector_ctrl(desc, desc->msix_ctrl); + /* Flush write to device */ + readl(desc->mask_base); +} + +static inline void pci_msix_unmask(struct msi_desc *desc) +{ + desc->msix_ctrl &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT; + pci_msix_write_vector_ctrl(desc, desc->msix_ctrl); +} + +static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask) +{ + if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual) + return; + + if (desc->msi_attrib.is_msix) + pci_msix_mask(desc); + else if (desc->msi_attrib.maskbit) + pci_msi_mask(desc, mask); +} + +static void __pci_msi_unmask_desc(struct msi_desc *desc, u32 mask) +{ + if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual) + return; + + if (desc->msi_attrib.is_msix) + pci_msix_unmask(desc); + else if (desc->msi_attrib.maskbit) + pci_msi_unmask(desc, mask); +} + /** * pci_msi_mask_irq - Generic IRQ chip callback to mask PCI/MSI interrupts * @data: pointer to irqdata associated to that interrupt
The following commit has been merged into the irq/core branch of tip: Commit-ID: b296ababcc4bbf8efbb603d3aec6024a78662c1b Gitweb: https://git.kernel.org/tip/b296ababcc4bbf8efbb603d3aec6024a78662c1b Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Thu, 29 Jul 2021 23:51:55 +02:00 Committer: Thomas Gleixner <tglx@linutronix.de> CommitterDate: Tue, 10 Aug 2021 11:03:30 +02:00 PCI/MSI: Deobfuscate virtual MSI-X Handling of virtual MSI-X is obfuscated by letting pci_msix_desc_addr() return NULL and checking the pointer. Just use msi_desc::msi_attrib.is_virtual at the call sites and get rid of that pointer check. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210729222543.151522318@linutronix.de --- drivers/pci/msi.c | 25 ++++++++----------------- 1 file changed, 8 insertions(+), 17 deletions(-) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index d2821bb..1bde9ec 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -166,11 +166,7 @@ static void msi_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) static void __iomem *pci_msix_desc_addr(struct msi_desc *desc) { - if (desc->msi_attrib.is_virtual) - return NULL; - - return desc->mask_base + - desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE; + return desc->mask_base + desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE; } /* @@ -182,14 +178,10 @@ static void __iomem *pci_msix_desc_addr(struct msi_desc *desc) */ static u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag) { + void __iomem *desc_addr = pci_msix_desc_addr(desc); u32 ctrl = desc->msix_ctrl; - void __iomem *desc_addr; - - if (pci_msi_ignore_mask) - return 0; - desc_addr = pci_msix_desc_addr(desc); - if (!desc_addr) + if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual) return 0; ctrl &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT; @@ -256,10 +248,8 @@ void __pci_read_msi_msg(struct msi_desc *entry, struct msi_msg *msg) if (entry->msi_attrib.is_msix) { void __iomem *base = pci_msix_desc_addr(entry); - if (!base) { - WARN_ON(1); + if (WARN_ON_ONCE(entry->msi_attrib.is_virtual)) return; - } msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR); msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR); @@ -292,7 +282,7 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg) void __iomem *base = pci_msix_desc_addr(entry); bool unmasked = !(entry->msix_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT); - if (!base) + if (entry->msi_attrib.is_virtual) goto skip; /* @@ -744,9 +734,10 @@ static int msix_setup_entries(struct pci_dev *dev, void __iomem *base, entry->msi_attrib.default_irq = dev->irq; entry->mask_base = base; - addr = pci_msix_desc_addr(entry); - if (addr) + if (!entry->msi_attrib.is_virtual) { + addr = pci_msix_desc_addr(entry); entry->msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL); + } list_add_tail(&entry->list, dev_to_msi_list(&dev->dev)); if (masks)
The following commit has been merged into the irq/core branch of tip: Commit-ID: 8eb5ce3f78a5e5d3f1a12248f6b7dc64ebf71da6 Gitweb: https://git.kernel.org/tip/8eb5ce3f78a5e5d3f1a12248f6b7dc64ebf71da6 Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Thu, 29 Jul 2021 23:51:54 +02:00 Committer: Thomas Gleixner <tglx@linutronix.de> CommitterDate: Tue, 10 Aug 2021 11:03:29 +02:00 PCI/MSI: Consolidate error handling in msi_capability_init() Three error exits doing exactly the same ask for a common error exit point. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210729222543.098828720@linutronix.de --- drivers/pci/msi.c | 26 +++++++++++--------------- 1 file changed, 11 insertions(+), 15 deletions(-) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 175f4d6..d2821bb 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -659,25 +659,16 @@ static int msi_capability_init(struct pci_dev *dev, int nvec, /* Configure MSI capability structure */ ret = pci_msi_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSI); - if (ret) { - msi_mask_irq(entry, mask, 0); - free_msi_irqs(dev); - return ret; - } + if (ret) + goto err; ret = msi_verify_entries(dev); - if (ret) { - msi_mask_irq(entry, mask, 0); - free_msi_irqs(dev); - return ret; - } + if (ret) + goto err; ret = populate_msi_sysfs(dev); - if (ret) { - msi_mask_irq(entry, mask, 0); - free_msi_irqs(dev); - return ret; - } + if (ret) + goto err; /* Set MSI enabled bits */ pci_intx_for_msi(dev, 0); @@ -687,6 +678,11 @@ static int msi_capability_init(struct pci_dev *dev, int nvec, pcibios_free_irq(dev); dev->irq = entry->irq; return 0; + +err: + msi_mask_irq(entry, mask, 0); + free_msi_irqs(dev); + return ret; } static void __iomem *msix_map_region(struct pci_dev *dev, unsigned nr_entries)
The following commit has been merged into the irq/core branch of tip: Commit-ID: 67961e77a39b8e975dd1906179b9224f29150357 Gitweb: https://git.kernel.org/tip/67961e77a39b8e975dd1906179b9224f29150357 Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Thu, 29 Jul 2021 23:51:53 +02:00 Committer: Thomas Gleixner <tglx@linutronix.de> CommitterDate: Tue, 10 Aug 2021 11:03:29 +02:00 PCI/MSI: Rename msi_desc::masked msi_desc::masked is a misnomer. For MSI it's used to cache the MSI mask bits when the device supports per vector masking. For MSI-X it's used to cache the content of the vector control word which contains the mask bit for the vector. Replace it with a union of msi_mask and msix_ctrl to make the purpose clear and fix up the usage sites. No functional change Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210729222543.045993608@linutronix.de --- drivers/pci/msi.c | 30 +++++++++++++++--------------- include/linux/msi.h | 8 ++++++-- 2 files changed, 21 insertions(+), 17 deletions(-) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index b59957c..175f4d6 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -152,10 +152,10 @@ static void __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) return; raw_spin_lock_irqsave(lock, flags); - desc->masked &= ~mask; - desc->masked |= flag; + desc->msi_mask &= ~mask; + desc->msi_mask |= flag; pci_write_config_dword(msi_desc_to_pci_dev(desc), desc->mask_pos, - desc->masked); + desc->msi_mask); raw_spin_unlock_irqrestore(lock, flags); } @@ -182,7 +182,7 @@ static void __iomem *pci_msix_desc_addr(struct msi_desc *desc) */ static u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag) { - u32 mask_bits = desc->masked; + u32 ctrl = desc->msix_ctrl; void __iomem *desc_addr; if (pci_msi_ignore_mask) @@ -192,18 +192,18 @@ static u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag) if (!desc_addr) return 0; - mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT; - if (flag & PCI_MSIX_ENTRY_CTRL_MASKBIT) - mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT; + ctrl &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT; + if (ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT) + ctrl |= PCI_MSIX_ENTRY_CTRL_MASKBIT; - writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL); + writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL); - return mask_bits; + return ctrl; } static void msix_mask_irq(struct msi_desc *desc, u32 flag) { - desc->masked = __pci_msix_desc_mask_irq(desc, flag); + desc->msix_ctrl = __pci_msix_desc_mask_irq(desc, flag); } static void msi_set_mask_bit(struct irq_data *data, u32 flag) @@ -290,7 +290,7 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg) /* Don't touch the hardware now */ } else if (entry->msi_attrib.is_msix) { void __iomem *base = pci_msix_desc_addr(entry); - bool unmasked = !(entry->masked & PCI_MSIX_ENTRY_CTRL_MASKBIT); + bool unmasked = !(entry->msix_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT); if (!base) goto skip; @@ -430,7 +430,7 @@ static void __pci_restore_msi_state(struct pci_dev *dev) pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, &control); msi_mask_irq(entry, msi_mask(entry->msi_attrib.multi_cap), - entry->masked); + entry->msi_mask); control &= ~PCI_MSI_FLAGS_QSIZE; control |= (entry->msi_attrib.multiple << 4) | PCI_MSI_FLAGS_ENABLE; pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control); @@ -461,7 +461,7 @@ static void __pci_restore_msix_state(struct pci_dev *dev) arch_restore_msi_irqs(dev); for_each_pci_msi_entry(entry, dev) - msix_mask_irq(entry, entry->masked); + msix_mask_irq(entry, entry->msix_ctrl); pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0); } @@ -602,7 +602,7 @@ msi_setup_entry(struct pci_dev *dev, int nvec, struct irq_affinity *affd) /* Save the initial mask status */ if (entry->msi_attrib.maskbit) - pci_read_config_dword(dev, entry->mask_pos, &entry->masked); + pci_read_config_dword(dev, entry->mask_pos, &entry->msi_mask); out: kfree(masks); @@ -750,7 +750,7 @@ static int msix_setup_entries(struct pci_dev *dev, void __iomem *base, addr = pci_msix_desc_addr(entry); if (addr) - entry->masked = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL); + entry->msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL); list_add_tail(&entry->list, dev_to_msi_list(&dev->dev)); if (masks) diff --git a/include/linux/msi.h b/include/linux/msi.h index 3d0e747..a20dc66 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -107,7 +107,8 @@ struct ti_sci_inta_msi_desc { * address or data changes * @write_msi_msg_data: Data parameter for the callback. * - * @masked: [PCI MSI/X] Mask bits + * @msi_mask: [PCI MSI] MSI cached mask bits + * @msix_ctrl: [PCI MSI-X] MSI-X cached per vector control bits * @is_msix: [PCI MSI/X] True if MSI-X * @multiple: [PCI MSI/X] log2 num of messages allocated * @multi_cap: [PCI MSI/X] log2 num of messages supported @@ -139,7 +140,10 @@ struct msi_desc { union { /* PCI MSI/X specific data */ struct { - u32 masked; + union { + u32 msi_mask; + u32 msix_ctrl; + }; struct { u8 is_msix : 1; u8 multiple : 3;
The following commit has been merged into the irq/core branch of tip: Commit-ID: a6e8b946508cda3c3bf0f9b0e133d293dc9754f6 Gitweb: https://git.kernel.org/tip/a6e8b946508cda3c3bf0f9b0e133d293dc9754f6 Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Thu, 29 Jul 2021 23:51:52 +02:00 Committer: Thomas Gleixner <tglx@linutronix.de> CommitterDate: Tue, 10 Aug 2021 11:03:29 +02:00 PCI/MSI: Simplify msi_verify_entries() No point in looping over all entries when 64bit addressing mode is enabled for nothing. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20210729222542.992849326@linutronix.de --- drivers/pci/msi.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 95e6ce4..b59957c 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -613,8 +613,11 @@ static int msi_verify_entries(struct pci_dev *dev) { struct msi_desc *entry; + if (!dev->no_64bit_msi) + return 0; + for_each_pci_msi_entry(entry, dev) { - if (entry->msg.address_hi && dev->no_64bit_msi) { + if (entry->msg.address_hi) { pci_err(dev, "arch assigned 64-bit MSI address %#x%08x but device only supports 32 bits\n", entry->msg.address_hi, entry->msg.address_lo); return -EIO;
The following commit has been merged into the irq/core branch of tip: Commit-ID: 3998527d2e3ee2bfdf710a45b7b90968ff87babc Gitweb: https://git.kernel.org/tip/3998527d2e3ee2bfdf710a45b7b90968ff87babc Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Thu, 29 Jul 2021 23:51:51 +02:00 Committer: Thomas Gleixner <tglx@linutronix.de> CommitterDate: Tue, 10 Aug 2021 11:03:29 +02:00 s390/pci: Do not mask MSI[-X] entries on teardown The PCI core already ensures that the MSI[-X] state is correct when MSI[-X] is disabled. For MSI the reset state is all entries unmasked and for MSI-X all vectors are masked. S390 masks all MSI entries and masks the already masked MSI-X entries again. Remove it and let the device in the correct state. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20210729222542.939798136@linutronix.de --- arch/s390/pci/pci_irq.c | 4 ---- drivers/pci/msi.c | 4 ++-- include/linux/msi.h | 2 -- 3 files changed, 2 insertions(+), 8 deletions(-) diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c index 9c7de90..3823e15 100644 --- a/arch/s390/pci/pci_irq.c +++ b/arch/s390/pci/pci_irq.c @@ -365,10 +365,6 @@ void arch_teardown_msi_irqs(struct pci_dev *pdev) for_each_pci_msi_entry(msi, pdev) { if (!msi->irq) continue; - if (msi->msi_attrib.is_msix) - __pci_msix_desc_mask_irq(msi, 1); - else - __pci_msi_desc_mask_irq(msi, 1, 1); irq_set_msi_desc(msi->irq, NULL); irq_free_desc(msi->irq); msi->msg.address_lo = 0; diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index e5e7533..95e6ce4 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -143,7 +143,7 @@ static inline __attribute_const__ u32 msi_mask(unsigned x) * reliably as devices without an INTx disable bit will then generate a * level IRQ which will never be cleared. */ -void __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) +static void __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) { raw_spinlock_t *lock = &desc->dev->msi_lock; unsigned long flags; @@ -180,7 +180,7 @@ static void __iomem *pci_msix_desc_addr(struct msi_desc *desc) * file. This saves a few milliseconds when initialising devices with lots * of MSI-X interrupts. */ -u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag) +static u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag) { u32 mask_bits = desc->masked; void __iomem *desc_addr; diff --git a/include/linux/msi.h b/include/linux/msi.h index e8bdcb8..3d0e747 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -232,8 +232,6 @@ void free_msi_entry(struct msi_desc *entry); void __pci_read_msi_msg(struct msi_desc *entry, struct msi_msg *msg); void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg); -u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag); -void __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag); void pci_msi_mask_irq(struct irq_data *data); void pci_msi_unmask_irq(struct irq_data *data);
The following commit has been merged into the irq/core branch of tip: Commit-ID: 0c0e37dc11671384e53ba6ede53a4d91162a2cc5 Gitweb: https://git.kernel.org/tip/0c0e37dc11671384e53ba6ede53a4d91162a2cc5 Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Thu, 29 Jul 2021 23:51:49 +02:00 Committer: Thomas Gleixner <tglx@linutronix.de> CommitterDate: Tue, 10 Aug 2021 10:59:21 +02:00 x86/ioapic: Force affinity setup before startup The IO/APIC cannot handle interrupt affinity changes safely after startup other than from an interrupt handler. The startup sequence in the generic interrupt code violates that assumption. Mark the irq chip with the new IRQCHIP_AFFINITY_PRE_STARTUP flag so that the default interrupt setting happens before the interrupt is started up for the first time. Fixes: 18404756765c ("genirq: Expose default irq affinity mask (take 3)") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.832143400@linutronix.de --- arch/x86/kernel/apic/io_apic.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index d5c691a..39224e0 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -1986,7 +1986,8 @@ static struct irq_chip ioapic_chip __read_mostly = { .irq_set_affinity = ioapic_set_affinity, .irq_retrigger = irq_chip_retrigger_hierarchy, .irq_get_irqchip_state = ioapic_irq_get_chip_state, - .flags = IRQCHIP_SKIP_SET_WAKE, + .flags = IRQCHIP_SKIP_SET_WAKE | + IRQCHIP_AFFINITY_PRE_STARTUP, }; static struct irq_chip ioapic_ir_chip __read_mostly = { @@ -1999,7 +2000,8 @@ static struct irq_chip ioapic_ir_chip __read_mostly = { .irq_set_affinity = ioapic_set_affinity, .irq_retrigger = irq_chip_retrigger_hierarchy, .irq_get_irqchip_state = ioapic_irq_get_chip_state, - .flags = IRQCHIP_SKIP_SET_WAKE, + .flags = IRQCHIP_SKIP_SET_WAKE | + IRQCHIP_AFFINITY_PRE_STARTUP, }; static inline void init_IO_APIC_traps(void)
The following commit has been merged into the irq/core branch of tip: Commit-ID: ff363f480e5997051dd1de949121ffda3b753741 Gitweb: https://git.kernel.org/tip/ff363f480e5997051dd1de949121ffda3b753741 Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Thu, 29 Jul 2021 23:51:50 +02:00 Committer: Thomas Gleixner <tglx@linutronix.de> CommitterDate: Tue, 10 Aug 2021 10:59:21 +02:00 x86/msi: Force affinity setup before startup The X86 MSI mechanism cannot handle interrupt affinity changes safely after startup other than from an interrupt handler, unless interrupt remapping is enabled. The startup sequence in the generic interrupt code violates that assumption. Mark the irq chips with the new IRQCHIP_AFFINITY_PRE_STARTUP flag so that the default interrupt setting happens before the interrupt is started up for the first time. While the interrupt remapping MSI chip does not require this, there is no point in treating it differently as this might spare an interrupt to a CPU which is not in the default affinity mask. For the non-remapping case go to the direct write path when the interrupt is not yet started similar to the not yet activated case. Fixes: 18404756765c ("genirq: Expose default irq affinity mask (take 3)") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.886722080@linutronix.de --- arch/x86/kernel/apic/msi.c | 11 ++++++++--- arch/x86/kernel/hpet.c | 2 +- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c index 44ebe25..dbacb9e 100644 --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -58,11 +58,13 @@ msi_set_affinity(struct irq_data *irqd, const struct cpumask *mask, bool force) * The quirk bit is not set in this case. * - The new vector is the same as the old vector * - The old vector is MANAGED_IRQ_SHUTDOWN_VECTOR (interrupt starts up) + * - The interrupt is not yet started up * - The new destination CPU is the same as the old destination CPU */ if (!irqd_msi_nomask_quirk(irqd) || cfg->vector == old_cfg.vector || old_cfg.vector == MANAGED_IRQ_SHUTDOWN_VECTOR || + !irqd_is_started(irqd) || cfg->dest_apicid == old_cfg.dest_apicid) { irq_msi_update_msg(irqd, cfg); return ret; @@ -150,7 +152,8 @@ static struct irq_chip pci_msi_controller = { .irq_ack = irq_chip_ack_parent, .irq_retrigger = irq_chip_retrigger_hierarchy, .irq_set_affinity = msi_set_affinity, - .flags = IRQCHIP_SKIP_SET_WAKE, + .flags = IRQCHIP_SKIP_SET_WAKE | + IRQCHIP_AFFINITY_PRE_STARTUP, }; int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec, @@ -219,7 +222,8 @@ static struct irq_chip pci_msi_ir_controller = { .irq_mask = pci_msi_mask_irq, .irq_ack = irq_chip_ack_parent, .irq_retrigger = irq_chip_retrigger_hierarchy, - .flags = IRQCHIP_SKIP_SET_WAKE, + .flags = IRQCHIP_SKIP_SET_WAKE | + IRQCHIP_AFFINITY_PRE_STARTUP, }; static struct msi_domain_info pci_msi_ir_domain_info = { @@ -273,7 +277,8 @@ static struct irq_chip dmar_msi_controller = { .irq_retrigger = irq_chip_retrigger_hierarchy, .irq_compose_msi_msg = dmar_msi_compose_msg, .irq_write_msi_msg = dmar_msi_write_msg, - .flags = IRQCHIP_SKIP_SET_WAKE, + .flags = IRQCHIP_SKIP_SET_WAKE | + IRQCHIP_AFFINITY_PRE_STARTUP, }; static int dmar_msi_init(struct irq_domain *domain, diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c index 08651a4..42fc41d 100644 --- a/arch/x86/kernel/hpet.c +++ b/arch/x86/kernel/hpet.c @@ -508,7 +508,7 @@ static struct irq_chip hpet_msi_controller __ro_after_init = { .irq_set_affinity = msi_domain_set_affinity, .irq_retrigger = irq_chip_retrigger_hierarchy, .irq_write_msi_msg = hpet_msi_write_msg, - .flags = IRQCHIP_SKIP_SET_WAKE, + .flags = IRQCHIP_SKIP_SET_WAKE | IRQCHIP_AFFINITY_PRE_STARTUP, }; static int hpet_msi_init(struct irq_domain *domain,
The following commit has been merged into the irq/core branch of tip: Commit-ID: 77e89afc25f30abd56e76a809ee2884d7c1b63ce Gitweb: https://git.kernel.org/tip/77e89afc25f30abd56e76a809ee2884d7c1b63ce Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Thu, 29 Jul 2021 23:51:47 +02:00 Committer: Thomas Gleixner <tglx@linutronix.de> CommitterDate: Tue, 10 Aug 2021 10:59:20 +02:00 PCI/MSI: Protect msi_desc::masked for multi-MSI Multi-MSI uses a single MSI descriptor and there is a single mask register when the device supports per vector masking. To avoid reading back the mask register the value is cached in the MSI descriptor and updates are done by clearing and setting bits in the cache and writing it to the device. But nothing protects msi_desc::masked and the mask register from being modified concurrently on two different CPUs for two different Linux interrupts which belong to the same multi-MSI descriptor. Add a lock to struct device and protect any operation on the mask and the mask register with it. This makes the update of msi_desc::masked unconditional, but there is no place which requires a modification of the hardware register without updating the masked cache. msi_mask_irq() is now an empty wrapper which will be cleaned up in follow up changes. The problem goes way back to the initial support of multi-MSI, but picking the commit which introduced the mask cache is a valid cut off point (2.6.30). Fixes: f2440d9acbe8 ("PCI MSI: Refactor interrupt masking code") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.726833414@linutronix.de --- drivers/base/core.c | 1 + drivers/pci/msi.c | 19 ++++++++++--------- include/linux/device.h | 1 + include/linux/msi.h | 2 +- 4 files changed, 13 insertions(+), 10 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index f636049..6c0ef9d 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -2837,6 +2837,7 @@ void device_initialize(struct device *dev) device_pm_init(dev); set_dev_node(dev, -1); #ifdef CONFIG_GENERIC_MSI_IRQ + raw_spin_lock_init(&dev->msi_lock); INIT_LIST_HEAD(&dev->msi_list); #endif INIT_LIST_HEAD(&dev->links.consumers); diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index f0f7026..e5e7533 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -143,24 +143,25 @@ static inline __attribute_const__ u32 msi_mask(unsigned x) * reliably as devices without an INTx disable bit will then generate a * level IRQ which will never be cleared. */ -u32 __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) +void __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) { - u32 mask_bits = desc->masked; + raw_spinlock_t *lock = &desc->dev->msi_lock; + unsigned long flags; if (pci_msi_ignore_mask || !desc->msi_attrib.maskbit) - return 0; + return; - mask_bits &= ~mask; - mask_bits |= flag; + raw_spin_lock_irqsave(lock, flags); + desc->masked &= ~mask; + desc->masked |= flag; pci_write_config_dword(msi_desc_to_pci_dev(desc), desc->mask_pos, - mask_bits); - - return mask_bits; + desc->masked); + raw_spin_unlock_irqrestore(lock, flags); } static void msi_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) { - desc->masked = __pci_msi_desc_mask_irq(desc, mask, flag); + __pci_msi_desc_mask_irq(desc, mask, flag); } static void __iomem *pci_msix_desc_addr(struct msi_desc *desc) diff --git a/include/linux/device.h b/include/linux/device.h index 59940f1..e53aa50 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -506,6 +506,7 @@ struct device { struct dev_pin_info *pins; #endif #ifdef CONFIG_GENERIC_MSI_IRQ + raw_spinlock_t msi_lock; struct list_head msi_list; #endif #ifdef CONFIG_DMA_OPS diff --git a/include/linux/msi.h b/include/linux/msi.h index 6aff469..e8bdcb8 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -233,7 +233,7 @@ void __pci_read_msi_msg(struct msi_desc *entry, struct msi_msg *msg); void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg); u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag); -u32 __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag); +void __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag); void pci_msi_mask_irq(struct irq_data *data); void pci_msi_unmask_irq(struct irq_data *data);
The following commit has been merged into the irq/core branch of tip: Commit-ID: 826da771291fc25a428e871f9e7fb465e390f852 Gitweb: https://git.kernel.org/tip/826da771291fc25a428e871f9e7fb465e390f852 Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Thu, 29 Jul 2021 23:51:48 +02:00 Committer: Thomas Gleixner <tglx@linutronix.de> CommitterDate: Tue, 10 Aug 2021 10:59:20 +02:00 genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUP X86 IO/APIC and MSI interrupts (when used without interrupts remapping) require that the affinity setup on startup is done before the interrupt is enabled for the first time as the non-remapped operation mode cannot safely migrate enabled interrupts from arbitrary contexts. Provide a new irq chip flag which allows affected hardware to request this. This has to be opt-in because there have been reports in the past that some interrupt chips cannot handle affinity setting before startup. Fixes: 18404756765c ("genirq: Expose default irq affinity mask (take 3)") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.779791738@linutronix.de --- include/linux/irq.h | 2 ++ kernel/irq/chip.c | 5 ++++- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/include/linux/irq.h b/include/linux/irq.h index 8e9a9ae..c8293c8 100644 --- a/include/linux/irq.h +++ b/include/linux/irq.h @@ -569,6 +569,7 @@ struct irq_chip { * IRQCHIP_SUPPORTS_NMI: Chip can deliver NMIs, only for root irqchips * IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND: Invokes __enable_irq()/__disable_irq() for wake irqs * in the suspend path if they are in disabled state + * IRQCHIP_AFFINITY_PRE_STARTUP: Default affinity update before startup */ enum { IRQCHIP_SET_TYPE_MASKED = (1 << 0), @@ -581,6 +582,7 @@ enum { IRQCHIP_SUPPORTS_LEVEL_MSI = (1 << 7), IRQCHIP_SUPPORTS_NMI = (1 << 8), IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND = (1 << 9), + IRQCHIP_AFFINITY_PRE_STARTUP = (1 << 10), }; #include <linux/irqdesc.h> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c index 7f04c7d..a98bcfc 100644 --- a/kernel/irq/chip.c +++ b/kernel/irq/chip.c @@ -265,8 +265,11 @@ int irq_startup(struct irq_desc *desc, bool resend, bool force) } else { switch (__irq_startup_managed(desc, aff, force)) { case IRQ_STARTUP_NORMAL: + if (d->chip->flags & IRQCHIP_AFFINITY_PRE_STARTUP) + irq_setup_affinity(desc); ret = __irq_startup(desc); - irq_setup_affinity(desc); + if (!(d->chip->flags & IRQCHIP_AFFINITY_PRE_STARTUP)) + irq_setup_affinity(desc); break; case IRQ_STARTUP_MANAGED: irq_do_set_affinity(d, aff, false);
The following commit has been merged into the irq/core branch of tip: Commit-ID: 689e6b5351573c38ccf92a0dd8b3e2c2241e4aff Gitweb: https://git.kernel.org/tip/689e6b5351573c38ccf92a0dd8b3e2c2241e4aff Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Thu, 29 Jul 2021 23:51:45 +02:00 Committer: Thomas Gleixner <tglx@linutronix.de> CommitterDate: Tue, 10 Aug 2021 10:59:20 +02:00 PCI/MSI: Correct misleading comments The comments about preserving the cached state in pci_msi[x]_shutdown() are misleading as the MSI descriptors are freed right after those functions return. So there is nothing to restore. Preparatory change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.621609423@linutronix.de --- drivers/pci/msi.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index e27ac6b..b3f5807 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -961,7 +961,6 @@ static void pci_msi_shutdown(struct pci_dev *dev) /* Return the device with MSI unmasked as initial states */ mask = msi_mask(desc->msi_attrib.multi_cap); - /* Keep cached state to be restored */ __pci_msi_desc_mask_irq(desc, mask, 0); /* Restore dev->irq to its default pin-assertion IRQ */ @@ -1047,10 +1046,8 @@ static void pci_msix_shutdown(struct pci_dev *dev) } /* Return the device with MSI-X masked as initial states */ - for_each_pci_msi_entry(entry, dev) { - /* Keep cached states to be restored */ + for_each_pci_msi_entry(entry, dev) __pci_msix_desc_mask_irq(entry, 1); - } pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0); pci_intx_for_msi(dev, 1);
The following commit has been merged into the irq/core branch of tip: Commit-ID: d28d4ad2a1aef27458b3383725bb179beb8d015c Gitweb: https://git.kernel.org/tip/d28d4ad2a1aef27458b3383725bb179beb8d015c Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Thu, 29 Jul 2021 23:51:46 +02:00 Committer: Thomas Gleixner <tglx@linutronix.de> CommitterDate: Tue, 10 Aug 2021 10:59:20 +02:00 PCI/MSI: Use msi_mask_irq() in pci_msi_shutdown() No point in using the raw write function from shutdown. Preparatory change to introduce proper serialization for the msi_desc::masked cache. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.674391354@linutronix.de --- drivers/pci/msi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index b3f5807..f0f7026 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -961,7 +961,7 @@ static void pci_msi_shutdown(struct pci_dev *dev) /* Return the device with MSI unmasked as initial states */ mask = msi_mask(desc->msi_attrib.multi_cap); - __pci_msi_desc_mask_irq(desc, mask, 0); + msi_mask_irq(desc, mask, 0); /* Restore dev->irq to its default pin-assertion IRQ */ dev->irq = desc->msi_attrib.default_irq;
The following commit has been merged into the irq/core branch of tip: Commit-ID: 361fd37397f77578735907341579397d5bed0a2d Gitweb: https://git.kernel.org/tip/361fd37397f77578735907341579397d5bed0a2d Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Thu, 29 Jul 2021 23:51:44 +02:00 Committer: Thomas Gleixner <tglx@linutronix.de> CommitterDate: Tue, 10 Aug 2021 10:59:20 +02:00 PCI/MSI: Do not set invalid bits in MSI mask msi_mask_irq() takes a mask and a flags argument. The mask argument is used to mask out bits from the cached mask and the flags argument to set bits. Some places invoke it with a flags argument which sets bits which are not used by the device, i.e. when the device supports up to 8 vectors a full unmask in some places sets the mask to 0xFFFFFF00. While devices probably do not care, it's still bad practice. Fixes: 7ba1930db02f ("PCI MSI: Unmask MSI if setup failed") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.568173099@linutronix.de --- drivers/pci/msi.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 434c704..e27ac6b 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -656,21 +656,21 @@ static int msi_capability_init(struct pci_dev *dev, int nvec, /* Configure MSI capability structure */ ret = pci_msi_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSI); if (ret) { - msi_mask_irq(entry, mask, ~mask); + msi_mask_irq(entry, mask, 0); free_msi_irqs(dev); return ret; } ret = msi_verify_entries(dev); if (ret) { - msi_mask_irq(entry, mask, ~mask); + msi_mask_irq(entry, mask, 0); free_msi_irqs(dev); return ret; } ret = populate_msi_sysfs(dev); if (ret) { - msi_mask_irq(entry, mask, ~mask); + msi_mask_irq(entry, mask, 0); free_msi_irqs(dev); return ret; } @@ -962,7 +962,7 @@ static void pci_msi_shutdown(struct pci_dev *dev) /* Return the device with MSI unmasked as initial states */ mask = msi_mask(desc->msi_attrib.multi_cap); /* Keep cached state to be restored */ - __pci_msi_desc_mask_irq(desc, mask, ~mask); + __pci_msi_desc_mask_irq(desc, mask, 0); /* Restore dev->irq to its default pin-assertion IRQ */ dev->irq = desc->msi_attrib.default_irq;
The following commit has been merged into the irq/core branch of tip: Commit-ID: b9255a7cb51754e8d2645b65dd31805e282b4f3e Gitweb: https://git.kernel.org/tip/b9255a7cb51754e8d2645b65dd31805e282b4f3e Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Thu, 29 Jul 2021 23:51:43 +02:00 Committer: Thomas Gleixner <tglx@linutronix.de> CommitterDate: Tue, 10 Aug 2021 10:59:20 +02:00 PCI/MSI: Enforce MSI[X] entry updates to be visible Nothing enforces the posted writes to be visible when the function returns. Flush them even if the flush might be redundant when the entry is masked already as the unmask will flush as well. This is either setup or a rare affinity change event so the extra flush is not the end of the world. While this is more a theoretical issue especially the logic in the X86 specific msi_set_affinity() function relies on the assumption that the update has reached the hardware when the function returns. Again, as this never has been enforced the Fixes tag refers to a commit in: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.515188147@linutronix.de --- drivers/pci/msi.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 7ee1ac4..434c704 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -311,6 +311,9 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg) if (unmasked) __pci_msix_desc_mask_irq(entry, 0); + + /* Ensure that the writes are visible in the device */ + readl(base + PCI_MSIX_ENTRY_DATA); } else { int pos = dev->msi_cap; u16 msgctl; @@ -331,6 +334,8 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg) pci_write_config_word(dev, pos + PCI_MSI_DATA_32, msg->data); } + /* Ensure that the writes are visible in the device */ + pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl); } skip:
The following commit has been merged into the irq/core branch of tip: Commit-ID: da181dc974ad667579baece33c2c8d2d1e4558d5 Gitweb: https://git.kernel.org/tip/da181dc974ad667579baece33c2c8d2d1e4558d5 Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Thu, 29 Jul 2021 23:51:42 +02:00 Committer: Thomas Gleixner <tglx@linutronix.de> CommitterDate: Tue, 10 Aug 2021 10:59:20 +02:00 PCI/MSI: Enforce that MSI-X table entry is masked for update The specification (PCIe r5.0, sec 6.1.4.5) states: For MSI-X, a function is permitted to cache Address and Data values from unmasked MSI-X Table entries. However, anytime software unmasks a currently masked MSI-X Table entry either by clearing its Mask bit or by clearing the Function Mask bit, the function must update any Address or Data values that it cached from that entry. If software changes the Address or Data value of an entry while the entry is unmasked, the result is undefined. The Linux kernel's MSI-X support never enforced that the entry is masked before the entry is modified hence the Fixes tag refers to a commit in: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Enforce the entry to be masked across the update. There is no point in enforcing this to be handled at all possible call sites as this is just pointless code duplication and the common update function is the obvious place to enforce this. Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support") Reported-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.462096385@linutronix.de --- drivers/pci/msi.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 57c9ec9..7ee1ac4 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -289,13 +289,28 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg) /* Don't touch the hardware now */ } else if (entry->msi_attrib.is_msix) { void __iomem *base = pci_msix_desc_addr(entry); + bool unmasked = !(entry->masked & PCI_MSIX_ENTRY_CTRL_MASKBIT); if (!base) goto skip; + /* + * The specification mandates that the entry is masked + * when the message is modified: + * + * "If software changes the Address or Data value of an + * entry while the entry is unmasked, the result is + * undefined." + */ + if (unmasked) + __pci_msix_desc_mask_irq(entry, PCI_MSIX_ENTRY_CTRL_MASKBIT); + writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR); writel(msg->address_hi, base + PCI_MSIX_ENTRY_UPPER_ADDR); writel(msg->data, base + PCI_MSIX_ENTRY_DATA); + + if (unmasked) + __pci_msix_desc_mask_irq(entry, 0); } else { int pos = dev->msi_cap; u16 msgctl;
The following commit has been merged into the irq/core branch of tip: Commit-ID: 7d5ec3d3612396dc6d4b76366d20ab9fc06f399f Gitweb: https://git.kernel.org/tip/7d5ec3d3612396dc6d4b76366d20ab9fc06f399f Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Thu, 29 Jul 2021 23:51:41 +02:00 Committer: Thomas Gleixner <tglx@linutronix.de> CommitterDate: Tue, 10 Aug 2021 10:59:20 +02:00 PCI/MSI: Mask all unused MSI-X entries When MSI-X is enabled the ordering of calls is: msix_map_region(); msix_setup_entries(); pci_msi_setup_msi_irqs(); msix_program_entries(); This has a few interesting issues: 1) msix_setup_entries() allocates the MSI descriptors and initializes them except for the msi_desc:masked member which is left zero initialized. 2) pci_msi_setup_msi_irqs() allocates the interrupt descriptors and sets up the MSI interrupts which ends up in pci_write_msi_msg() unless the interrupt chip provides its own irq_write_msi_msg() function. 3) msix_program_entries() does not do what the name suggests. It solely updates the entries array (if not NULL) and initializes the masked member for each MSI descriptor by reading the hardware state and then masks the entry. Obviously this has some issues: 1) The uninitialized masked member of msi_desc prevents the enforcement of masking the entry in pci_write_msi_msg() depending on the cached masked bit. Aside of that half initialized data is a NONO in general 2) msix_program_entries() only ensures that the actually allocated entries are masked. This is wrong as experimentation with crash testing and crash kernel kexec has shown. This limited testing unearthed that when the production kernel had more entries in use and unmasked when it crashed and the crash kernel allocated a smaller amount of entries, then a full scan of all entries found unmasked entries which were in use in the production kernel. This is obviously a device or emulation issue as the device reset should mask all MSI-X table entries, but obviously that's just part of the paper specification. Cure this by: 1) Masking all table entries in hardware 2) Initializing msi_desc::masked in msix_setup_entries() 3) Removing the mask dance in msix_program_entries() 4) Renaming msix_program_entries() to msix_update_entries() to reflect the purpose of that function. As the masking of unused entries has never been done the Fixes tag refers to a commit in: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.403833459@linutronix.de --- drivers/pci/msi.c | 45 +++++++++++++++++++++++++++------------------ 1 file changed, 27 insertions(+), 18 deletions(-) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 5d39ed8..57c9ec9 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -691,6 +691,7 @@ static int msix_setup_entries(struct pci_dev *dev, void __iomem *base, { struct irq_affinity_desc *curmsk, *masks = NULL; struct msi_desc *entry; + void __iomem *addr; int ret, i; int vec_count = pci_msix_vec_count(dev); @@ -711,6 +712,7 @@ static int msix_setup_entries(struct pci_dev *dev, void __iomem *base, entry->msi_attrib.is_msix = 1; entry->msi_attrib.is_64 = 1; + if (entries) entry->msi_attrib.entry_nr = entries[i].entry; else @@ -722,6 +724,10 @@ static int msix_setup_entries(struct pci_dev *dev, void __iomem *base, entry->msi_attrib.default_irq = dev->irq; entry->mask_base = base; + addr = pci_msix_desc_addr(entry); + if (addr) + entry->masked = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL); + list_add_tail(&entry->list, dev_to_msi_list(&dev->dev)); if (masks) curmsk++; @@ -732,26 +738,25 @@ out: return ret; } -static void msix_program_entries(struct pci_dev *dev, - struct msix_entry *entries) +static void msix_update_entries(struct pci_dev *dev, struct msix_entry *entries) { struct msi_desc *entry; - int i = 0; - void __iomem *desc_addr; for_each_pci_msi_entry(entry, dev) { - if (entries) - entries[i++].vector = entry->irq; + if (entries) { + entries->vector = entry->irq; + entries++; + } + } +} - desc_addr = pci_msix_desc_addr(entry); - if (desc_addr) - entry->masked = readl(desc_addr + - PCI_MSIX_ENTRY_VECTOR_CTRL); - else - entry->masked = 0; +static void msix_mask_all(void __iomem *base, int tsize) +{ + u32 ctrl = PCI_MSIX_ENTRY_CTRL_MASKBIT; + int i; - msix_mask_irq(entry, 1); - } + for (i = 0; i < tsize; i++, base += PCI_MSIX_ENTRY_SIZE) + writel(ctrl, base + PCI_MSIX_ENTRY_VECTOR_CTRL); } /** @@ -768,9 +773,9 @@ static void msix_program_entries(struct pci_dev *dev, static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries, int nvec, struct irq_affinity *affd) { - int ret; - u16 control; void __iomem *base; + int ret, tsize; + u16 control; /* * Some devices require MSI-X to be enabled before the MSI-X @@ -782,12 +787,16 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries, pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control); /* Request & Map MSI-X table region */ - base = msix_map_region(dev, msix_table_size(control)); + tsize = msix_table_size(control); + base = msix_map_region(dev, tsize); if (!base) { ret = -ENOMEM; goto out_disable; } + /* Ensure that all table entries are masked. */ + msix_mask_all(base, tsize); + ret = msix_setup_entries(dev, base, entries, nvec, affd); if (ret) goto out_disable; @@ -801,7 +810,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries, if (ret) goto out_free; - msix_program_entries(dev, entries); + msix_update_entries(dev, entries); ret = populate_msi_sysfs(dev); if (ret)
The following commit has been merged into the irq/core branch of tip: Commit-ID: 438553958ba19296663c6d6583d208dfb6792830 Gitweb: https://git.kernel.org/tip/438553958ba19296663c6d6583d208dfb6792830 Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Thu, 29 Jul 2021 23:51:40 +02:00 Committer: Thomas Gleixner <tglx@linutronix.de> CommitterDate: Tue, 10 Aug 2021 10:59:20 +02:00 PCI/MSI: Enable and mask MSI-X early The ordering of MSI-X enable in hardware is dysfunctional: 1) MSI-X is disabled in the control register 2) Various setup functions 3) pci_msi_setup_msi_irqs() is invoked which ends up accessing the MSI-X table entries 4) MSI-X is enabled and masked in the control register with the comment that enabling is required for some hardware to access the MSI-X table Step #4 obviously contradicts #3. The history of this is an issue with the NIU hardware. When #4 was introduced the table access actually happened in msix_program_entries() which was invoked after enabling and masking MSI-X. This was changed in commit d71d6432e105 ("PCI/MSI: Kill redundant call of irq_set_msi_desc() for MSI-X interrupts") which removed the table write from msix_program_entries(). Interestingly enough nobody noticed and either NIU still works or it did not get any testing with a kernel 3.19 or later. Nevertheless this is inconsistent and there is no reason why MSI-X can't be enabled and masked in the control register early on, i.e. move step #4 above to step #1. This preserves the NIU workaround and has no side effects on other hardware. Fixes: d71d6432e105 ("PCI/MSI: Kill redundant call of irq_set_msi_desc() for MSI-X interrupts") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.344136412@linutronix.de --- drivers/pci/msi.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 9232255..5d39ed8 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -772,18 +772,25 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries, u16 control; void __iomem *base; - /* Ensure MSI-X is disabled while it is set up */ - pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0); + /* + * Some devices require MSI-X to be enabled before the MSI-X + * registers can be accessed. Mask all the vectors to prevent + * interrupts coming in before they're fully set up. + */ + pci_msix_clear_and_set_ctrl(dev, 0, PCI_MSIX_FLAGS_MASKALL | + PCI_MSIX_FLAGS_ENABLE); pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control); /* Request & Map MSI-X table region */ base = msix_map_region(dev, msix_table_size(control)); - if (!base) - return -ENOMEM; + if (!base) { + ret = -ENOMEM; + goto out_disable; + } ret = msix_setup_entries(dev, base, entries, nvec, affd); if (ret) - return ret; + goto out_disable; ret = pci_msi_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSIX); if (ret) @@ -794,14 +801,6 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries, if (ret) goto out_free; - /* - * Some devices require MSI-X to be enabled before we can touch the - * MSI-X registers. We need to mask all the vectors to prevent - * interrupts coming in before they're fully set up. - */ - pci_msix_clear_and_set_ctrl(dev, 0, - PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE); - msix_program_entries(dev, entries); ret = populate_msi_sysfs(dev); @@ -836,6 +835,9 @@ out_avail: out_free: free_msi_irqs(dev); +out_disable: + pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0); + return ret; }