* [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
@ 2020-01-18  0:25 Evan Green
  2020-01-22 11:25 ` Rajat Jain
  2020-01-25 18:32 ` Jacob Pan
  0 siblings, 2 replies; 26+ messages in thread
From: Evan Green @ 2020-01-18  0:25 UTC
  To: Bjorn Helgaas; +Cc: Evan Green, linux-pci, linux-kernel

__pci_write_msi_msg() updates three registers in the device: address
high, address low, and data. On x86 systems, address low contains
CPU targeting info, and data contains the vector. The order of writes
is address, then data.

This is problematic if an interrupt comes in after address has
been written, but before data is updated, and both the SMP affinity
and target vector are being changed. In this case, the interrupt targets
the wrong vector on the new CPU.

This case is pretty easy to stumble into using xhci and CPU hotplugging.
Create a script that repeatedly targets interrupts at a set of cores and
then offlines those cores. Put some stress on USB, and then watch xhci
lose an interrupt and die.

Avoid this by disabling MSIs during the update.

Signed-off-by: Evan Green <evgreen@chromium.org>
---

Changes in v2:
- Also mask msi-x interrupts during the update
- Restore the enable/mask bit to its previous value, rather than
unconditionally enabling interrupts


Bjorn,
I was unsure whether disabling MSIs temporarily is actually an okay
thing to do. I considered using the mask bit, but got the impression
that not all devices support the mask bit. Let me know if this is going
to cause problems or if there's a better way. I can include the repro
script I used to cause mayhem if needed.
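For illustration, here is the gist of the repro as a sketch (the IRQ
number and CPU below are placeholders; the real run also keeps USB busy
in parallel):

/*
 * Sketch only: bounce the IRQ's affinity onto a CPU, then offline and
 * online that CPU in a tight loop. Needs root. /proc/irq/123 stands in
 * for the xhci interrupt.
 */
#include <stdio.h>

static void write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (f) {
		fputs(val, f);
		fclose(f);
	}
}

int main(void)
{
	for (;;) {
		/* Target the interrupt at CPU1 (affinity mask 0x2) */
		write_str("/proc/irq/123/smp_affinity", "2");
		/* Offline CPU1; the kernel must migrate the IRQ away */
		write_str("/sys/devices/system/cpu/cpu1/online", "0");
		/* Bring CPU1 back and repeat */
		write_str("/sys/devices/system/cpu/cpu1/online", "1");
	}
	return 0;
}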

---
 drivers/pci/msi.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 6b43a5455c7af..bb21a7739fa2c 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -311,6 +311,7 @@ void __pci_read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
 void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
 {
 	struct pci_dev *dev = msi_desc_to_pci_dev(entry);
+	u16 msgctl;
 
 	if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
 		/* Don't touch the hardware now */
@@ -320,15 +321,25 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
 		if (!base)
 			goto skip;
 
+		pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS,
+				     &msgctl);
+
+		pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS,
+				      msgctl | PCI_MSIX_FLAGS_MASKALL);
+
 		writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR);
 		writel(msg->address_hi, base + PCI_MSIX_ENTRY_UPPER_ADDR);
 		writel(msg->data, base + PCI_MSIX_ENTRY_DATA);
+		pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS,
+				      msgctl);
+
 	} else {
 		int pos = dev->msi_cap;
-		u16 msgctl;
+		u16 enabled;
 
 		pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl);
-		msgctl &= ~PCI_MSI_FLAGS_QSIZE;
+		enabled = msgctl & PCI_MSI_FLAGS_ENABLE;
+		msgctl &= ~(PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
 		msgctl |= entry->msi_attrib.multiple << 4;
 		pci_write_config_word(dev, pos + PCI_MSI_FLAGS, msgctl);
 
@@ -343,6 +354,9 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
 			pci_write_config_word(dev, pos + PCI_MSI_DATA_32,
 					      msg->data);
 		}
+
+		msgctl |= enabled;
+		pci_write_config_word(dev, pos + PCI_MSI_FLAGS, msgctl);
 	}
 
 skip:
-- 
2.24.1


* Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
  2020-01-18  0:25 [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs Evan Green
@ 2020-01-22 11:25 ` Rajat Jain
  2020-01-22 18:00   ` Evan Green
  2020-01-25 18:32 ` Jacob Pan
  1 sibling, 1 reply; 26+ messages in thread
From: Rajat Jain @ 2020-01-22 11:25 UTC
  To: Evan Green; +Cc: Bjorn Helgaas, linux-pci, Linux Kernel Mailing List

On Fri, Jan 17, 2020 at 4:26 PM Evan Green <evgreen@chromium.org> wrote:
>
> __pci_write_msi_msg() updates three registers in the device: address
> high, address low, and data. On x86 systems, address low contains
> CPU targeting info, and data contains the vector. The order of writes
> is address, then data.
>
> This is problematic if an interrupt comes in after address has
> been written, but before data is updated, and both the SMP affinity
> and target vector are being changed. In this case, the interrupt targets
> the wrong vector on the new CPU.
>
> This case is pretty easy to stumble into using xhci and CPU hotplugging.
> Create a script that repeatedly targets interrupts at a set of cores and
> then offlines those cores. Put some stress on USB, and then watch xhci
> lose an interrupt and die.

Do I understand it right that, even with this patch, the driver might
still miss the same interrupt (because we are disabling the interrupt
for that window), and that the improvement this patch brings is that it
will at least not be delivered to the wrong CPU or via a wrong vector?

Thanks,
Rajat

>
> Avoid this by disabling MSIs during the update.
>
> Signed-off-by: Evan Green <evgreen@chromium.org>
> ---
>
> Changes in v2:
> - Also mask msi-x interrupts during the update
> - Restore the enable/mask bit to its previous value, rather than
> unconditionally enabling interrupts
>
>
> Bjorn,
> I was unsure whether disabling MSIs temporarily is actually an okay
> thing to do. I considered using the mask bit, but got the impression
> that not all devices support the mask bit. Let me know if this is going
> to cause problems or if there's a better way. I can include the repro
> script I used to cause mayhem if needed.
>
> ---
>  drivers/pci/msi.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 6b43a5455c7af..bb21a7739fa2c 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -311,6 +311,7 @@ void __pci_read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
>  void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
>  {
>         struct pci_dev *dev = msi_desc_to_pci_dev(entry);
> +       u16 msgctl;
>
>         if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
>                 /* Don't touch the hardware now */
> @@ -320,15 +321,25 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
>                 if (!base)
>                         goto skip;
>
> +               pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS,
> +                                    &msgctl);
> +
> +               pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS,
> +                                     msgctl | PCI_MSIX_FLAGS_MASKALL);
> +
>                 writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR);
>                 writel(msg->address_hi, base + PCI_MSIX_ENTRY_UPPER_ADDR);
>                 writel(msg->data, base + PCI_MSIX_ENTRY_DATA);
> +               pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS,
> +                                     msgctl);
> +
>         } else {
>                 int pos = dev->msi_cap;
> -               u16 msgctl;
> +               u16 enabled;
>
>                 pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl);
> -               msgctl &= ~PCI_MSI_FLAGS_QSIZE;
> +               enabled = msgctl & PCI_MSI_FLAGS_ENABLE;
> +               msgctl &= ~(PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
>                 msgctl |= entry->msi_attrib.multiple << 4;
>                 pci_write_config_word(dev, pos + PCI_MSI_FLAGS, msgctl);
>
> @@ -343,6 +354,9 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
>                         pci_write_config_word(dev, pos + PCI_MSI_DATA_32,
>                                               msg->data);
>                 }
> +
> +               msgctl |= enabled;
> +               pci_write_config_word(dev, pos + PCI_MSI_FLAGS, msgctl);
>         }
>
>  skip:
> --
> 2.24.1
>

* Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
  2020-01-22 11:25 ` Rajat Jain
@ 2020-01-22 18:00   ` Evan Green
  2020-01-23  8:49     ` Thomas Gleixner
  0 siblings, 1 reply; 26+ messages in thread
From: Evan Green @ 2020-01-22 18:00 UTC
  To: Rajat Jain; +Cc: Bjorn Helgaas, linux-pci, Linux Kernel Mailing List

On Wed, Jan 22, 2020 at 3:26 AM Rajat Jain <rajatja@google.com> wrote:
>
> On Fri, Jan 17, 2020 at 4:26 PM Evan Green <evgreen@chromium.org> wrote:
> >
> > __pci_write_msi_msg() updates three registers in the device: address
> > high, address low, and data. On x86 systems, address low contains
> > CPU targeting info, and data contains the vector. The order of writes
> > is address, then data.
> >
> > This is problematic if an interrupt comes in after address has
> > been written, but before data is updated, and both the SMP affinity
> > and target vector are being changed. In this case, the interrupt targets
> > the wrong vector on the new CPU.
> >
> > This case is pretty easy to stumble into using xhci and CPU hotplugging.
> > Create a script that repeatedly targets interrupts at a set of cores and
> > then offlines those cores. Put some stress on USB, and then watch xhci
> > lose an interrupt and die.
>
> Do I understand it right that, even with this patch, the driver might
> still miss the same interrupt (because we are disabling the interrupt
> for that window), and that the improvement this patch brings is that it
> will at least not be delivered to the wrong CPU or via a wrong vector?

In my experiments, the driver no longer misses the interrupt. XHCI is
particularly sensitive to this: if it misses one interrupt, it seems to
completely wedge the driver.

I think in my case the device pends the interrupts until MSIs are
re-enabled, because I don't see anything other than MSI for xhci in
/proc/interrupts. But I'm not sure if other devices may fall back to
line-based interrupts for a moment, and if that's a problem.

Although, I already see we call pci_msi_set_enable(0) whenever we set
up MSIs, presumably for this same reason of avoiding torn MSIs. So my
fix is really just doing the same thing for an additional case. And if
getting stuck in a never-to-be-handled line based interrupt were a
problem, you'd think it would also be a problem in
pci_restore_msi_state(), where the same thing is done.
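For reference, that restore path looks roughly like this (condensed from
drivers/pci/msi.c around this kernel version; a sketch from memory, not
authoritative):

static void __pci_restore_msi_state(struct pci_dev *dev)
{
	struct msi_desc *entry;
	u16 control;

	if (!dev->msi_enabled)
		return;

	entry = irq_get_msi_desc(dev->irq);

	pci_intx_for_msi(dev, 0);
	pci_msi_set_enable(dev, 0);	/* MSI off while rewriting */
	arch_restore_msi_irqs(dev);

	pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, &control);
	msi_mask_irq(entry, msi_mask(entry->msi_attrib.multi_cap),
		     entry->masked);
	control &= ~PCI_MSI_FLAGS_QSIZE;
	control |= (entry->msi_attrib.multiple << 4) | PCI_MSI_FLAGS_ENABLE;
	pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control);
}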

Maybe my fix is at the wrong level, and should be up in
pci_msi_domain_write_msg() instead? Though I see a lot of callers to
pci_write_msi_msg() that I worry have the same problem.
-Evan

* Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
  2020-01-22 18:00   ` Evan Green
@ 2020-01-23  8:49     ` Thomas Gleixner
  2020-01-23 18:16       ` Thomas Gleixner
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2020-01-23  8:49 UTC
  To: Evan Green, Rajat Jain
  Cc: Bjorn Helgaas, linux-pci, Linux Kernel Mailing List

Evan Green <evgreen@chromium.org> writes:
> In my experiments, the driver no longer misses the interrupt. XHCI is
> particularly sensitive to this: if it misses one interrupt, it seems to
> completely wedge the driver.

That does not make the approach more correct.

> I think in my case the device pends the interrupts until MSIs are
> re-enabled, because I don't see anything other than MSI for xhci in
> /proc/interrupts. But I'm not sure if other devices may fall back to
> line-based interrupts for a moment, and if that's a problem.

Yes they can according to standard and it _IS_ a problem.

> Although, I already see we call pci_msi_set_enable(0) whenever we set
> up MSIs, presumably for this same reason of avoiding torn MSIs.

Please stop making random assumptions. This has absolutely nothing to do
with torn MSIs. The way MSI setup works requires this. And this is
happening on init _before_ any interrupt can be requested on the device.
Different reason, different context.

> So my fix is really just doing the same thing for an additional
> case.

No, it's absolutely not the same. Your device is active and not in
reset/init state.

> And if getting stuck in a never-to-be-handled line based interrupt
> were a problem, you'd think it would also be a problem in
> pci_restore_msi_state(), where the same thing is done.

Again. I told you already it's not the same thing.

> Maybe my fix is at the wrong level, and should be up in
> pci_msi_domain_write_msg() instead? Though I see a lot of callers to
> pci_write_msi_msg() that I worry have the same problem.

This is not yet debugged fully and as this is happening on MSI-X I'm not
really convinced yet that your 'torn write' theory holds.

Thanks,

        tglx

* Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
  2020-01-23  8:49     ` Thomas Gleixner
@ 2020-01-23 18:16       ` Thomas Gleixner
       [not found]         ` <CAE=gft6YiM5S1A7iJYJTd5zmaAa8=nhLE3B94JtWa+XW-qVSqQ@mail.gmail.com>
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2020-01-23 18:16 UTC
  To: Evan Green, Rajat Jain
  Cc: Bjorn Helgaas, linux-pci, Linux Kernel Mailing List

Evan,

Thomas Gleixner <tglx@linutronix.de> writes:
> This is not yet debugged fully and as this is happening on MSI-X I'm not
> really convinced yet that your 'torn write' theory holds.

can you please apply the debug patch below and run your test. When the
failure happens, stop the tracer and collect the trace.

Another question. Did you ever try to change the affinity of that
interrupt without hotplug rapidly while the device makes traffic? If
not, it would be interesting whether this leads to a failure as well.

Thanks

        tglx

8<---------------

--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -964,6 +964,8 @@ void irq_force_complete_move(struct irq_
 	if (!vector)
 		goto unlock;
 
+	trace_printk("IRQ %u vector %u inprogress %u\n", irqd->irq,
+		     vector, apicd->move_in_progress);
 	/*
 	 * This is tricky. If the cleanup of the old vector has not been
 	 * done yet, then the following setaffinity call will fail with
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -244,6 +244,8 @@ u64 arch_irq_stat(void)
 
 	desc = __this_cpu_read(vector_irq[vector]);
 	if (likely(!IS_ERR_OR_NULL(desc))) {
+		trace_printk("Handle vector %u IRQ %u\n", vector,
+			     desc->irq_data.irq);
 		if (IS_ENABLED(CONFIG_X86_32))
 			handle_irq(desc, regs);
 		else
@@ -252,10 +254,18 @@ u64 arch_irq_stat(void)
 		ack_APIC_irq();
 
 		if (desc == VECTOR_UNUSED) {
+			trace_printk("Handle unused vector %u\n", vector);
 			pr_emerg_ratelimited("%s: %d.%d No irq handler for vector\n",
 					     __func__, smp_processor_id(),
 					     vector);
 		} else {
+			if (desc == VECTOR_SHUTDOWN) {
+				trace_printk("Handle shutdown vector %u\n",
+					     vector);
+			} else if (desc == VECTOR_RETRIGGERED) {
+				trace_printk("Handle retriggered vector %u\n",
+					     vector);
+			}
 			__this_cpu_write(vector_irq[vector], VECTOR_UNUSED);
 		}
 	}
@@ -373,9 +383,14 @@ void fixup_irqs(void)
 		if (IS_ERR_OR_NULL(__this_cpu_read(vector_irq[vector])))
 			continue;
 
+		desc = __this_cpu_read(vector_irq[vector]);
+		trace_printk("FIXUP: %u\n", desc->irq_data.irq);
+
 		irr = apic_read(APIC_IRR + (vector / 32 * 0x10));
 		if (irr  & (1 << (vector % 32))) {
 			desc = __this_cpu_read(vector_irq[vector]);
+			trace_printk("FIXUP: %u IRR pending\n",
+				     desc->irq_data.irq);
 
 			raw_spin_lock(&desc->lock);
 			data = irq_desc_get_irq_data(desc);
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -122,6 +122,10 @@ static bool migrate_one_irq(struct irq_d
 		affinity = cpu_online_mask;
 		brokeaff = true;
 	}
+
+	trace_printk("IRQ: %d maskchip %d wasmasked %d break %d\n",
+		     d->irq, maskchip, irqd_irq_masked(d), brokeaff);
+
 	/*
 	 * Do not set the force argument of irq_do_set_affinity() as this
 	 * disables the masking of offline CPUs from the supplied affinity

* Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
       [not found]         ` <CAE=gft6YiM5S1A7iJYJTd5zmaAa8=nhLE3B94JtWa+XW-qVSqQ@mail.gmail.com>
@ 2020-01-23 22:59           ` Evan Green
  2020-01-24  0:29             ` Evan Green
  2020-01-24  0:50             ` [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs Thomas Gleixner
  0 siblings, 2 replies; 26+ messages in thread
From: Evan Green @ 2020-01-23 22:59 UTC
  To: Thomas Gleixner
  Cc: Rajat Jain, Bjorn Helgaas, linux-pci, Linux Kernel Mailing List

On Thu, Jan 23, 2020 at 12:59 PM Evan Green <evgreen@chromium.org> wrote:
>
> On Thu, Jan 23, 2020 at 10:17 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > Evan,
> >
> > Thomas Gleixner <tglx@linutronix.de> writes:
> > > This is not yet debugged fully and as this is happening on MSI-X I'm not
> > > really convinced yet that your 'torn write' theory holds.
> >
> > can you please apply the debug patch below and run your test. When the
> > failure happens, stop the tracer and collect the trace.
> >
> > Another question. Did you ever try to change the affinity of that
> > interrupt without hotplug rapidly while the device makes traffic? If
> > not, it would be interesting whether this leads to a failure as well.
>
> Thanks for the patch. Looks pretty familiar :)
> I ran into issues where trace_printks on offlined cores seem to
> disappear. I even made sure the cores were back online when I
> collected the trace. So your logs might not be useful. Known issue
> with the tracer?
>
> I figured I'd share my own debug chicken scratch, in case you could
> glean anything from it. The LOG entries print out timestamps (divide
> by 1000000) that you can match up back to earlier in the log (ie so
> the last XHCI MSI change occurred at 74.032501, the last interrupt
> came in at 74.032405). Forgive the mess.
>
> I also tried changing the affinity rapidly without CPU hotplug, but
> didn't see the issue, at least not in the few minutes I waited
> (normally repros easily within 1 minute). An interesting datapoint.

One additional datapoint. The Intel guys suggested enabling
CONFIG_IRQ_REMAP, which does seem to eliminate the issue for me. I'm
still hoping there's a smaller fix so I don't have to add all that in.
-Evan

* Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
  2020-01-23 22:59           ` Evan Green
@ 2020-01-24  0:29             ` Evan Green
  2020-01-24 14:34               ` Thomas Gleixner
  2020-01-24  0:50             ` [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs Thomas Gleixner
  1 sibling, 1 reply; 26+ messages in thread
From: Evan Green @ 2020-01-24  0:29 UTC
  To: Thomas Gleixner
  Cc: Rajat Jain, Bjorn Helgaas, linux-pci, Linux Kernel Mailing List

On Thu, Jan 23, 2020 at 2:59 PM Evan Green <evgreen@chromium.org> wrote:
>
> On Thu, Jan 23, 2020 at 12:59 PM Evan Green <evgreen@chromium.org> wrote:
> >
> > On Thu, Jan 23, 2020 at 10:17 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> > >
> > > Evan,
> > >
> > > Thomas Gleixner <tglx@linutronix.de> writes:
> > > > This is not yet debugged fully and as this is happening on MSI-X I'm not
> > > > really convinced yet that your 'torn write' theory holds.
> > >
> > > can you please apply the debug patch below and run your test. When the
> > > failure happens, stop the tracer and collect the trace.
> > >
> > > Another question. Did you ever try to change the affinity of that
> > > interrupt without hotplug rapidly while the device makes traffic? If
> > > not, it would be interesting whether this leads to a failure as well.
> >
> > Thanks for the patch. Looks pretty familiar :)
> > I ran into issues where trace_printks on offlined cores seem to
> > disappear. I even made sure the cores were back online when I
> > collected the trace. So your logs might not be useful. Known issue
> > with the tracer?
> >
> > I figured I'd share my own debug chicken scratch, in case you could
> > glean anything from it. The LOG entries print out timestamps (divide
> > by 1000000) that you can match up back to earlier in the log (ie so
> > the last XHCI MSI change occurred at 74.032501, the last interrupt
> > came in at 74.032405). Forgive the mess.
> >
> > I also tried changing the affinity rapidly without CPU hotplug, but
> > didn't see the issue, at least not in the few minutes I waited
> > (normally repros easily within 1 minute). An interesting datapoint.
>
> One additional datapoint. The Intel guys suggested enabling
> CONFIG_IRQ_REMAP, which does seem to eliminate the issue for me. I'm
> still hoping there's a smaller fix so I don't have to add all that in.

I did another experiment that I think lends credibility to my torn MSI
hypothesis. I have the following change:

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 1f69b12d5bb86..0336d23f9ba9a 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1798,6 +1798,7 @@ void (*machine_check_vector)(struct pt_regs *,
long error_code) =

 dotraplinkage void do_mce(struct pt_regs *regs, long error_code)
 {
+printk("EVAN MACHINE CHECK HC died");
        machine_check_vector(regs, error_code);
 }

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 23a363fd4c59c..31f683da857e3 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -315,6 +315,11 @@ void __pci_write_msi_msg(struct msi_desc *entry,
struct msi_msg *msg)
                msgctl |= entry->msi_attrib.multiple << 4;
                pci_write_config_word(dev, pos + PCI_MSI_FLAGS, msgctl);

+if (entry->msi_attrib.is_64) {
+pci_write_config_word(dev, pos + PCI_MSI_DATA_64, 0x4012);
+} else {
+pci_write_config_word(dev, pos + PCI_MSI_DATA_32, 0x4012);
+}
                pci_write_config_dword(dev, pos + PCI_MSI_ADDRESS_LO,
                                       msg->address_lo);
                if (entry->msi_attrib.is_64) {

And indeed, I get a machine check, despite the fact that MSI_DATA is
overwritten just after address is updated. (The low byte of the MSI data
is the vector, and 0x12 is vector 18, the machine check vector, so any
interrupt that sneaks into the torn window is delivered as a spurious
#MC.)

[   79.937179] smpboot: CPU 1 is now offline
[   80.001685] smpboot: CPU 3 is now offline
[   80.025210] smpboot: CPU 5 is now offline
[   80.049517] smpboot: CPU 7 is now offline
[   80.094263] x86: Booting SMP configuration:
[   80.099394] smpboot: Booting Node 0 Processor 1 APIC 0x1
[   80.136233] smpboot: Booting Node 0 Processor 3 APIC 0x3
[   80.155732] smpboot: Booting Node 0 Processor 5 APIC 0x5
[   80.173632] smpboot: Booting Node 0 Processor 7 APIC 0x7
[   80.297198] smpboot: CPU 1 is now offline
[   80.331347] EVAN MACHINE CHECK HC died
[   82.281555] Kernel panic - not syncing: Timeout: Not all CPUs
entered broadcast exception handler
[   82.295775] Kernel Offset: disabled
[   82.301740] gsmi: Log Shutdown Reason 0x02
[   82.313942] Rebooting in 30 seconds..
[  112.204113] ACPI MEMORY or I/O RESET_REG.

-Evan

* Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
  2020-01-23 22:59           ` Evan Green
  2020-01-24  0:29             ` Evan Green
@ 2020-01-24  0:50             ` Thomas Gleixner
  1 sibling, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2020-01-24  0:50 UTC
  To: Evan Green
  Cc: Rajat Jain, Bjorn Helgaas, linux-pci, Linux Kernel Mailing List

Evan Green <evgreen@chromium.org> writes:
> On Thu, Jan 23, 2020 at 12:59 PM Evan Green <evgreen@chromium.org> wrote:
>>
>> On Thu, Jan 23, 2020 at 10:17 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>> >
>> > Evan,
>> >
>> > Thomas Gleixner <tglx@linutronix.de> writes:
>> > > This is not yet debugged fully and as this is happening on MSI-X I'm not
>> > > really convinced yet that your 'torn write' theory holds.

As you pointed out, this is not on MSI-X, so I'm considering the torn
write theory to be more likely. :)

>> > can you please apply the debug patch below and run your test. When the
>> > failure happens, stop the tracer and collect the trace.
>> >
>> > Another question. Did you ever try to change the affinity of that
>> > interrupt without hotplug rapidly while the device makes traffic? If
>> > not, it would be interesting whether this leads to a failure as well.
>>
>> Thanks for the patch. Looks pretty familiar :)
>> I ran into issues where trace_printks on offlined cores seem to
>> disappear. I even made sure the cores were back online when I
>> collected the trace. So your logs might not be useful. Known issue
>> with the tracer?

No. I tried the patch myself to verify that it does what I want.

The only information I'm missing right now is the interrupt number to
look for. But I'll stare at it with brain awake tomorrow morning again.

>> I also tried changing the affinity rapidly without CPU hotplug, but
>> didn't see the issue, at least not in the few minutes I waited
>> (normally repros easily within 1 minute). An interesting datapoint.

That's what I expected. The main difference is that the vector
modification happens at a point where a device is not supposed to send
an interrupt: it happens while the device's interrupt is being serviced,
before the driver handler is invoked, and at that point the device
should not send another one.

> One additional datapoint. The Intel guys suggested enabling
> CONFIG_IRQ_REMAP, which does seem to eliminate the issue for me. I'm
> still hoping there's a smaller fix so I don't have to add all that in.

Right, I wanted to ask you that as well and forgot. With interrupt
remapping the migration happens at the remapping unit which does not
have the horrible 'move it while servicing' requirement and supports
proper masking.

Thanks,

        tglx


* Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
  2020-01-24  0:29             ` Evan Green
@ 2020-01-24 14:34               ` Thomas Gleixner
  2020-01-24 21:53                 ` Evan Green
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2020-01-24 14:34 UTC
  To: Evan Green
  Cc: Rajat Jain, Bjorn Helgaas, linux-pci, Linux Kernel Mailing List, x86

Evan,

Evan Green <evgreen@chromium.org> writes:
> I did another experiment that I think lends credibility to my torn MSI
> hypothesis. I have the following change:
>
> And indeed, I get a machine check, despite the fact that MSI_DATA is
> overwritten just after address is updated.

I don't have to understand why a SoC released in 2019 still has
unmaskable MSI especially as Inhell's own XHCI spec clearly documents
and recommends MSI-X.

While your workaround (disabling MSI) works in this particular case it's
not really a good option:

 1) Quite some devices have a bug where the legacy INTX disable does not
    work reliably or is outright broken. That means MSI disable will
    reroute to INTX.

 2) I dug out old debug data which confirms that some silly devices
    lose interrupts across MSI disable/reenable if the INTX fallback is
    disabled.

    And no, it's not a random weird device, it's part of a chipset which
    was pretty popular a few years ago. I leave it as an exercise for
    the reader to guess the vendor.

Can you please apply the patch below? It enforces an IPI to the new
vector/target CPU when the interrupt is MSI w/o masking. It should
cure the issue. It goes without saying that I'm not proud of it.

Thanks,

        tglx

8<--------------

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -498,6 +498,7 @@ extern bool default_check_apicid_used(ph
 extern void default_ioapic_phys_id_map(physid_mask_t *phys_map, physid_mask_t *retmap);
 extern int default_cpu_present_to_apicid(int mps_cpu);
 extern int default_check_phys_apicid_present(int phys_apicid);
+extern bool apic_hotplug_force_retrigger(struct irq_data *irqd);
 
 #endif /* CONFIG_X86_LOCAL_APIC */
 
--- a/arch/x86/include/asm/irqdomain.h
+++ b/arch/x86/include/asm/irqdomain.h
@@ -10,6 +10,7 @@ enum {
 	/* Allocate contiguous CPU vectors */
 	X86_IRQ_ALLOC_CONTIGUOUS_VECTORS		= 0x1,
 	X86_IRQ_ALLOC_LEGACY				= 0x2,
+	X86_IRQ_MSI_NOMASK_TRAINWRECK			= 0x4,
 };
 
 extern struct irq_domain *x86_vector_domain;
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -103,6 +103,14 @@ int pci_msi_prepare(struct irq_domain *d
 	} else {
 		arg->type = X86_IRQ_ALLOC_TYPE_MSI;
 		arg->flags |= X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
+		/*
+		 * If the MSI implementation does not provide masking
+		 * enable the workaround for the CPU hotplug forced
+		 * migration problem which is caused by the torn write of
+		 * the address/data pair.
+		 */
+		if (!desc->msi_attrib.maskbit)
+			arg->flags |= X86_IRQ_MSI_NOMASK_TRAINWRECK;
 	}
 
 	return 0;
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -34,7 +34,8 @@ struct apic_chip_data {
 	unsigned int		move_in_progress	: 1,
 				is_managed		: 1,
 				can_reserve		: 1,
-				has_reserved		: 1;
+				has_reserved		: 1,
+				force_retrigger		: 1;
 };
 
 struct irq_domain *x86_vector_domain;
@@ -99,6 +100,18 @@ struct irq_cfg *irq_cfg(unsigned int irq
 	return irqd_cfg(irq_get_irq_data(irq));
 }
 
+bool apic_hotplug_force_retrigger(struct irq_data *irqd)
+{
+	struct apic_chip_data *apicd;
+
+	irqd = __irq_domain_get_irq_data(x86_vector_domain, irqd);
+	if (!irqd)
+		return false;
+
+	apicd = apic_chip_data(irqd);
+	return apicd && apicd->force_retrigger;
+}
+
 static struct apic_chip_data *alloc_apic_chip_data(int node)
 {
 	struct apic_chip_data *apicd;
@@ -552,6 +565,8 @@ static int x86_vector_alloc_irqs(struct
 		}
 
 		apicd->irq = virq + i;
+		if (info->flags & X86_IRQ_MSI_NOMASK_TRAINWRECK)
+			apicd->force_retrigger = true;
 		irqd->chip = &lapic_controller;
 		irqd->chip_data = apicd;
 		irqd->hwirq = virq + i;
@@ -624,6 +639,7 @@ static void x86_vector_debug_show(struct
 	seq_printf(m, "%*scan_reserve:      %u\n", ind, "", apicd.can_reserve ? 1 : 0);
 	seq_printf(m, "%*shas_reserved:     %u\n", ind, "", apicd.has_reserved ? 1 : 0);
 	seq_printf(m, "%*scleanup_pending:  %u\n", ind, "", !hlist_unhashed(&apicd.clist));
+	seq_printf(m, "%*sforce_retrigger:  %u\n", ind, "", apicd.force_retrigger ? 1 : 0);
 }
 #endif
 
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -350,6 +350,7 @@ void fixup_irqs(void)
 	struct irq_desc *desc;
 	struct irq_data *data;
 	struct irq_chip *chip;
+	bool retrigger;
 
 	irq_migrate_all_off_this_cpu();
 
@@ -370,24 +371,29 @@ void fixup_irqs(void)
 	 * nothing else will touch it.
 	 */
 	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
-		if (IS_ERR_OR_NULL(__this_cpu_read(vector_irq[vector])))
+		desc = __this_cpu_read(vector_irq[vector]);
+		if (IS_ERR_OR_NULL(desc))
 			continue;
 
+		raw_spin_lock(&desc->lock);
+		data = irq_desc_get_irq_data(desc);
+		retrigger = apic_hotplug_force_retrigger(data);
+
 		irr = apic_read(APIC_IRR + (vector / 32 * 0x10));
-		if (irr  & (1 << (vector % 32))) {
-			desc = __this_cpu_read(vector_irq[vector]);
+		if (irr  & (1 << (vector % 32)))
+			retrigger = true;
 
-			raw_spin_lock(&desc->lock);
-			data = irq_desc_get_irq_data(desc);
+		if (!retrigger) {
+			__this_cpu_write(vector_irq[vector], VECTOR_UNUSED);
+		} else {
 			chip = irq_data_get_irq_chip(data);
 			if (chip->irq_retrigger) {
 				chip->irq_retrigger(data);
-				__this_cpu_write(vector_irq[vector], VECTOR_RETRIGGERED);
+				__this_cpu_write(vector_irq[vector],
+						 VECTOR_RETRIGGERED);
 			}
-			raw_spin_unlock(&desc->lock);
 		}
-		if (__this_cpu_read(vector_irq[vector]) != VECTOR_RETRIGGERED)
-			__this_cpu_write(vector_irq[vector], VECTOR_UNUSED);
+		raw_spin_unlock(&desc->lock);
 	}
 }
 #endif
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -432,6 +432,8 @@ int irq_reserve_ipi(struct irq_domain *d
 int irq_destroy_ipi(unsigned int irq, const struct cpumask *dest);
 
 /* V2 interfaces to support hierarchy IRQ domains. */
+struct irq_data *__irq_domain_get_irq_data(struct irq_domain *domain,
+					   struct irq_data *irq_data);
 extern struct irq_data *irq_domain_get_irq_data(struct irq_domain *domain,
 						unsigned int virq);
 extern void irq_domain_set_info(struct irq_domain *domain, unsigned int virq,
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -1165,6 +1165,21 @@ static int irq_domain_alloc_irq_data(str
 }
 
 /**
+ * __irq_domain_get_irq_data - Get irq_data associated with @domain for @data
+ * @domain:	domain to match
+ * @irq_data:	initial irq data to start hierarchy search
+ */
+struct irq_data *__irq_domain_get_irq_data(struct irq_domain *domain,
+					   struct irq_data *irq_data)
+{
+	for (; irq_data; irq_data = irq_data->parent_data) {
+		if (irq_data->domain == domain)
+			return irq_data;
+	}
+	return NULL;
+}
+
+/**
  * irq_domain_get_irq_data - Get irq_data associated with @virq and @domain
  * @domain:	domain to match
  * @virq:	IRQ number to get irq_data
@@ -1172,14 +1187,7 @@ static int irq_domain_alloc_irq_data(str
 struct irq_data *irq_domain_get_irq_data(struct irq_domain *domain,
 					 unsigned int virq)
 {
-	struct irq_data *irq_data;
-
-	for (irq_data = irq_get_irq_data(virq); irq_data;
-	     irq_data = irq_data->parent_data)
-		if (irq_data->domain == domain)
-			return irq_data;
-
-	return NULL;
+	return __irq_domain_get_irq_data(domain, irq_get_irq_data(virq));
 }
 EXPORT_SYMBOL_GPL(irq_domain_get_irq_data);
 

* Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
  2020-01-24 14:34               ` Thomas Gleixner
@ 2020-01-24 21:53                 ` Evan Green
  2020-01-24 22:50                   ` Thomas Gleixner
  0 siblings, 1 reply; 26+ messages in thread
From: Evan Green @ 2020-01-24 21:53 UTC
  To: Thomas Gleixner
  Cc: Rajat Jain, Bjorn Helgaas, linux-pci, Linux Kernel Mailing List, x86

On Fri, Jan 24, 2020 at 6:34 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Evan,
>
> Evan Green <evgreen@chromium.org> writes:
> > I did another experiment that I think lends credibility to my torn MSI
> > hypothesis. I have the following change:
> >
> > And indeed, I get a machine check, despite the fact that MSI_DATA is
> > overwritten just after address is updated.
>
> I don't have to understand why a SoC released in 2019 still has
> unmaskable MSI especially as Inhell's own XHCI spec clearly documents
> and recommends MSI-X.
>
> While your workaround (disabling MSI) works in this particular case it's
> not really a good option:
>
>  1) Quite some devices have a bug where the legacy INTX disable does not
>     work reliably or is outright broken. That means MSI disable will
>     reroute to INTX.
>
>  2) I dug out old debug data which confirms that some silly devices
>     lose interrupts across MSI disable/reenable if the INTX fallback is
>     disabled.
>
>     And no, it's not a random weird device, it's part of a chipset which
>     was pretty popular a few years ago. I leave it as an exercise for
>     the reader to guess the vendor.
>
> Can you please apply the patch below? It enforces an IPI to the new
> vector/target CPU when the interrupt is MSI w/o masking. It should
> cure the issue. It goes without saying that I'm not proud of it.

I'll feel just as dirty putting a tested-by on it :)

I don't think this patch is complete. As written, it creates "recovery
interrupts" for MSIs that are not maskable. However, through the
pci_msi_domain_write_msg() path, which is the one I seem to use, we
make no effort to mask the MSI while changing affinity. So at the very
least it would need a follow-on patch that attempts to mask the MSI,
for MSIs that are maskable. __pci_restore_msi_state(), called in the
resume path, does have this masking, but for some reason not
pci_msi_domain_write_msg().

I'm also a bit concerned about all the spurious interrupts we'll be
introducing. Not just the retriggering introduced here, but the fact
that we never dealt with the torn interrupt. So in my case, XHCI will
be sending an interrupt on the old vector to the new CPU, which could
be registered to anything. I'm worried that not every driver in the
system is hardened to receiving interrupts it's not prepared for.
Perhaps the driver misbehaves, or perhaps it's a "bad" interrupt like
the MCE interrupt that takes the system down. (I realize the MCE
interrupt itself is not in the device vector region, but some other
bad interrupt then).

Now that you're on board with the torn write theory, what do you think
about my "transit vector" proposal? The idea is this:
 - Reserve a single vector number on all CPUs for interrupts in
transit between CPUs.
 - Interrupts in transit between CPUs are added to some sort of list,
or maybe the transit vector itself.
 - __pci_msi_write_msg() would, after proper abstractions, essentially
be doing this:
    pci_write(MSI_DATA, TRANSIT_VECTOR);
    pci_write(MSI_ADDRESS, new_affinity);
    pci_write(MSI_DATA, new_vector);
 - In the rare torn case I've found here, the interrupt will come in
on <new CPU, transit_vector>, or <old CPU, transit_vector>.
 - The ISR for TRANSIT_VECTOR would go through and call the ISR for
every IRQ in transit across CPUs. This does still result in a couple
extra ISR calls, since multiple interrupts might be in transit across
CPUs, but at least it's very rare.
 - CPU hotplug would keep the same logic it already has, retriggering
TRANSIT_VECTOR if it happened to land on <old CPU, old vector>.
 - When the interrupt is confirmed on <new CPU, new vector>, remove
the ISR from the TRANSIT_VECTOR list.

If you think it's a worthwhile idea I can try to code it up.

I've been running your patch for about 30 minutes, with no repro case.
-Evan

* Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
  2020-01-24 21:53                 ` Evan Green
@ 2020-01-24 22:50                   ` Thomas Gleixner
  2020-01-28 14:38                     ` Thomas Gleixner
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2020-01-24 22:50 UTC
  To: Evan Green
  Cc: Rajat Jain, Bjorn Helgaas, linux-pci, Linux Kernel Mailing List, x86

Evan Green <evgreen@chromium.org> writes:
> On Fri, Jan 24, 2020 at 6:34 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>> Can you please apply the patch below? It enforces an IPI to the new
>> vector/target CPU when the interrupt is MSI w/o masking. It should
>> cure the issue. It goes without saying that I'm not proud of it.
>
> I'll feel just as dirty putting a tested-by on it :)

Hehehe.

> I don't think this patch is complete. As written, it creates "recovery
> interrupts" for MSIs that are not maskable, however through the
> pci_msi_domain_write_msg() path, which is the one I seem to use, we
> make no effort to mask the MSI while changing affinity. So at the very
> least it would need a follow-on patch that attempts to mask the MSI,
> for MSIs that are maskable. __pci_restore_msi_state(), called in the
> resume path, does have this masking, but for some reason not
> pci_msi_domain_write_msg().

Wrong. The core code already does the masking, because it's required
for more than just MSI.

For regular affinity changes in the context of the serviced interrupt
it's done in __irq_move_irq() and for the hotplug case it's done in
migrate_one_irq().
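Condensed, the hotplug side looks like this (paraphrasing
kernel/irq/cpuhotplug.c with details elided; a sketch for orientation,
not the literal code):

static bool migrate_one_irq(struct irq_desc *desc)
{
	struct irq_data *d = irq_desc_get_irq_data(desc);
	struct irq_chip *chip = irq_data_get_irq_chip(d);
	bool maskchip, brokeaff = false;

	/* ... validity checks and target affinity selection elided ... */

	/* Mask the chip across the move unless it is masked already */
	maskchip = chip->irq_mask && !irq_can_move_pcntxt(d) &&
		   !irqd_irq_masked(d);
	if (maskchip)
		chip->irq_mask(d);

	irq_do_set_affinity(d, affinity, false);

	if (maskchip)
		chip->irq_unmask(d);
	return brokeaff;
}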

You really need to look at the big picture of this and not just at
random bits and pieces of MSI code which are unrelated to this.

> I'm also a bit concerned about all the spurious interrupts we'll be
> introducing. Not just the retriggering introduced here, but the fact
> that we never dealt with the torn interrupt. So in my case, XHCI will
> be sending an interrupt on the old vector to the new CPU, which could
> be registered to anything. I'm worried that not every driver in the
> system is hardened to receiving interrupts it's not prepared for.
> Perhaps the driver misbehaves, or perhaps it's a "bad" interrupt like
> the MCE interrupt that takes the system down. (I realize the MCE
> interrupt itself is not in the device vector region, but some other
> bad interrupt then).

There are no bad or dangerous vectors in the range which can be assigned
to a device.

Drivers which cannot deal with spurious interrupts are broken already
today. Spurious interrupts can happen and do happen for various reasons.

Unhandled spurious interrupts are not a problem as long as there are not
gazillions of them within a split second, which is not the case here.

> Now that you're on board with the torn write theory, what do you think
> about my "transit vector" proposal? The idea is this:
>  - Reserve a single vector number on all CPUs for interrupts in
> transit between CPUs.
>  - Interrupts in transit between CPUs are added to some sort of list,
> or maybe the transit vector itself.

You need a list or some other form of storage for this because migration
can happen in parallel (not the hotplug one, but the regular ones).

>  - __pci_msi_write_msg() would, after proper abstractions, essentially
> be doing this:
>     pci_write(MSI_DATA, TRANSIT_VECTOR);
>     pci_write(MSI_ADDRESS, new_affinity);
>     pci_write(MSI_DATA, new_vector);

That doesn't work. You have to write in the proper order to make all
variants of MSI devices happy. So it's actually two consecutive
full __pci_msi_write_msg() invocations.

>  - The ISR for TRANSIT_VECTOR would go through and call the ISR for
> every IRQ in transit across CPUs. This does still result in a couple
> extra ISR calls, since multiple interrupts might be in transit across
> CPUs, but at least it's very rare.

That's not trivial especially from the locking POV. I thought about it
for a while before hacking up that retrigger thing and everything I came
up with resulted in nasty deadlocks at least on the drawing board.

And for the hotplug case it's even less trivial because the target CPU
sits in stop machine with interrupts disabled and waits for the outgoing
CPU to die. So it cannot handle the interrupt before the outgoing one
has cleaned up in fixup_irqs(), and you cannot query the APIC IRR on the
target from the outgoing one.

Even in a regular migration the same problem exists because the other
CPU might either be unable to service it or service it _before_ the CPU
which does the migration has completed the process.

> If you think it's a worthwhile idea I can try to code it up.

It's worthwhile, but that needs some deep thoughts about locking and
ordering plus the inevitable race conditions this creates. If it would
be trivial, I surely wouldn't have hacked up the retrigger mess.

Thanks,

        tglx

* [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
  2020-01-18  0:25 [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs Evan Green
  2020-01-22 11:25 ` Rajat Jain
@ 2020-01-25 18:32 ` Jacob Pan
  2020-01-26  8:09   ` Thomas Gleixner
  1 sibling, 1 reply; 26+ messages in thread
From: Jacob Pan @ 2020-01-25 18:32 UTC
  To: LKML; +Cc: Thomas Gleixner, Evan Green

It seems we should be able to avoid force_retrigger once IR is turned
on. I have tried this patch with IR turned on, but I am still seeing
force_retrigger=1 for MSIs w/o masking.


* Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
  2020-01-25 18:32 ` Jacob Pan
@ 2020-01-26  8:09   ` Thomas Gleixner
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2020-01-26  8:09 UTC
  To: Jacob Pan, LKML; +Cc: Evan Green

Jacob Pan <jacob.jun.pan@linux.intel.com> writes:

> It seems we could avoid force_retrigger once IR is turned on. I have tried
> this patch with IR turned on, still seeing force_retrigger=1 for MSIs w/o
> masking.

Yes. That patch was a POC and surely not a final solution. Aside from
that, I'm still looking into a solution to avoid it which does not
involve horrible locking and tons of scary code.

Thanks,

        tglx

* Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
  2020-01-24 22:50                   ` Thomas Gleixner
@ 2020-01-28 14:38                     ` Thomas Gleixner
  2020-01-28 22:22                       ` Evan Green
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2020-01-28 14:38 UTC
  To: Evan Green
  Cc: Rajat Jain, Bjorn Helgaas, linux-pci, Linux Kernel Mailing List,
	x86, Marc Zyngier

Evan,

Thomas Gleixner <tglx@linutronix.de> writes:
> It's worthwhile, but that needs some deep thoughts about locking and
> ordering plus the inevitable race conditions this creates. If it would
> be trivial, I surely wouldn't have hacked up the retrigger mess.

So after staring at it for a while, I came up with the patch below.

Your idea of going through some well-defined transition vector is just
not feasible due to locking and lifetime issues.

I'm taking a similar but easier to handle approach.

    1) Move the interrupt to the new vector on the old (local) CPU

    2) Move it to the new CPU

    3) Check if the new vector is pending on the local CPU. If yes
       retrigger it on the new CPU.

That might give a spurious interrupt if the new vector on the local CPU
is in use. But as I said before this is nothing to worry about. If the
affected device driver fails to handle that spurious interrupt then it
is broken anyway.

In theory we could teach the vector allocation logic to search for an
unused pair of vectors on both CPUs, but the required code for that is
hardly worth the trouble. In the end the situation that no pair is found
has to be handled anyway. So rather than making this the corner case
which is never tested and then leads to hard to debug issues, I prefer
to make it more likely to happen.

The patch is only lightly tested, but so far it survived.

Thanks,

        tglx

8<----------------
 arch/x86/include/asm/apic.h |    8 +++
 arch/x86/kernel/apic/msi.c  |  115 ++++++++++++++++++++++++++++++++++++++++++--
 include/linux/irq.h         |   18 ++++++
 include/linux/irqdomain.h   |    7 ++
 kernel/irq/debugfs.c        |    1 
 kernel/irq/msi.c            |    5 +
 6 files changed, 150 insertions(+), 4 deletions(-)

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -452,6 +452,14 @@ static inline void ack_APIC_irq(void)
 	apic_eoi();
 }
 
+
+static inline bool lapic_vector_set_in_irr(unsigned int vector)
+{
+	u32 irr = apic_read(APIC_IRR + (vector / 32 * 0x10));
+
+	return !!(irr & (1U << (vector % 32)));
+}
+
 static inline unsigned default_get_apic_id(unsigned long x)
 {
 	unsigned int ver = GET_APIC_VERSION(apic_read(APIC_LVR));
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -23,10 +23,8 @@
 
 static struct irq_domain *msi_default_domain;
 
-static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
 {
-	struct irq_cfg *cfg = irqd_cfg(data);
-
 	msg->address_hi = MSI_ADDR_BASE_HI;
 
 	if (x2apic_enabled())
@@ -47,6 +45,114 @@ static void irq_msi_compose_msg(struct i
 		MSI_DATA_VECTOR(cfg->vector);
 }
 
+static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	__irq_msi_compose_msg(irqd_cfg(data), msg);
+}
+
+static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
+{
+	struct msi_msg msg[2] = { [1] = { }, };
+
+	__irq_msi_compose_msg(cfg, msg);
+	irq_data_get_irq_chip(irqd)->irq_write_msi_msg(irqd, msg);
+}
+
+static int
+msi_set_affinity(struct irq_data *irqd, const struct cpumask *mask, bool force)
+{
+	struct irq_cfg old_cfg, *cfg = irqd_cfg(irqd);
+	struct irq_data *parent = irqd->parent_data;
+	unsigned int cpu;
+	int ret;
+
+	/* Save the current configuration */
+	cpu = cpumask_first(irq_data_get_effective_affinity_mask(irqd));
+	old_cfg = *cfg;
+
+	/* Allocate a new target vector */
+	ret = parent->chip->irq_set_affinity(parent, mask, force);
+	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
+		return ret;
+
+	/*
+	 * For non-maskable and non-remapped MSI interrupts the migration
+	 * to a different destination CPU and a different vector has to be
+	 * done careful to handle the possible stray interrupt which can be
+	 * caused by the non-atomic update of the address/data pair.
+	 *
+	 * Direct update is possible when:
+	 * - The MSI is maskable (remapped MSI does not use this code path).
+	 *   The quirk bit is not set in this case.
+	 * - The new vector is the same as the old vector
+	 * - The old vector is MANAGED_IRQ_SHUTDOWN_VECTOR (interrupt starts up)
+	 * - The new destination CPU is the same as the old destination CPU
+	 */
+	if (!irqd_msi_nomask_quirk(irqd) ||
+	    cfg->vector == old_cfg.vector ||
+	    old_cfg.vector == MANAGED_IRQ_SHUTDOWN_VECTOR ||
+	    cfg->dest_apicid == old_cfg.dest_apicid) {
+		irq_msi_update_msg(irqd, cfg);
+		return ret;
+	}
+
+	/*
+	 * Paranoia: Validate that the interrupt target is the local
+	 * CPU.
+	 */
+	if (WARN_ON_ONCE(cpu != smp_processor_id())) {
+		irq_msi_update_msg(irqd, cfg);
+		return ret;
+	}
+
+	/*
+	 * Redirect the interrupt to the new vector on the current CPU
+	 * first. This might cause a spurious interrupt on this vector if
+	 * the device raises an interrupt right between this update and the
+	 * update to the final destination CPU.
+	 *
+	 * If the vector is in use then the installed device handler will
+	 * denote it as spurious which is no harm as this is a rare event
+	 * and interrupt handlers have to cope with spurious interrupts
+	 * anyway. If the vector is unused, then it is marked so it won't
+	 * trigger the 'No irq handler for vector' warning in do_IRQ().
+	 *
+	 * This requires to hold vector lock to prevent concurrent updates to
+	 * the affected vector.
+	 */
+	lock_vector_lock();
+
+	/*
+	 * Mark the new target vector on the local CPU if it is currently
+	 * unused. Reuse the VECTOR_RETRIGGERED state which is also used in
+	 * the CPU hotplug path for a similar purpose. This cannot be
+	 * undone here as the current CPU has interrupts disabled and
+	 * cannot handle the interrupt before the whole set_affinity()
+	 * section is done. In the CPU unplug case, the current CPU is
+	 * about to vanish and will not handle any interrupts anymore. The
+	 * vector is cleaned up when the CPU comes online again.
+	 */
+	if (IS_ERR_OR_NULL(this_cpu_read(vector_irq[cfg->vector])))
+		this_cpu_write(vector_irq[cfg->vector], VECTOR_RETRIGGERED);
+
+	/* Redirect it to the new vector on the local CPU temporarily */
+	old_cfg.vector = cfg->vector;
+	irq_msi_update_msg(irqd, &old_cfg);
+
+	/* Now transition it to the target CPU */
+	irq_msi_update_msg(irqd, cfg);
+
+	/*
+	 * All interrupts after this point are now targeted at the new
+	 * vector/CPU. Check whether the transition raced with a device
+	 * interrupt and is pending in the local APICs IRR.
+	 */
+	if (lapic_vector_set_in_irr(cfg->vector))
+		irq_data_get_irq_chip(irqd)->irq_retrigger(irqd);
+	unlock_vector_lock();
+	return ret;
+}
+
 /*
  * IRQ Chip for MSI PCI/PCI-X/PCI-Express Devices,
  * which implement the MSI or MSI-X Capability Structure.
@@ -58,6 +164,7 @@ static struct irq_chip pci_msi_controlle
 	.irq_ack		= irq_chip_ack_parent,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
 	.irq_compose_msi_msg	= irq_msi_compose_msg,
+	.irq_set_affinity	= msi_set_affinity,
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
 
@@ -146,6 +253,8 @@ void __init arch_init_msi_domain(struct
 	}
 	if (!msi_default_domain)
 		pr_warn("failed to initialize irqdomain for MSI/MSI-x.\n");
+	else
+		msi_default_domain->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK;
 }
 
 #ifdef CONFIG_IRQ_REMAP
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -209,6 +209,8 @@ struct irq_data {
  * IRQD_SINGLE_TARGET		- IRQ allows only a single affinity target
  * IRQD_DEFAULT_TRIGGER_SET	- Expected trigger already been set
  * IRQD_CAN_RESERVE		- Can use reservation mode
+ * IRQD_MSI_NOMASK_QUIRK	- Non-maskable MSI quirk for affinity change
+ *				  required
  */
 enum {
 	IRQD_TRIGGER_MASK		= 0xf,
@@ -231,6 +233,7 @@ enum {
 	IRQD_SINGLE_TARGET		= (1 << 24),
 	IRQD_DEFAULT_TRIGGER_SET	= (1 << 25),
 	IRQD_CAN_RESERVE		= (1 << 26),
+	IRQD_MSI_NOMASK_QUIRK		= (1 << 27),
 };
 
 #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)
@@ -390,6 +393,21 @@ static inline bool irqd_can_reserve(stru
 	return __irqd_to_state(d) & IRQD_CAN_RESERVE;
 }
 
+static inline void irqd_set_msi_nomask_quirk(struct irq_data *d)
+{
+	__irqd_to_state(d) |= IRQD_MSI_NOMASK_QUIRK;
+}
+
+static inline void irqd_clr_msi_nomask_quirk(struct irq_data *d)
+{
+	__irqd_to_state(d) &= ~IRQD_MSI_NOMASK_QUIRK;
+}
+
+static inline bool irqd_msi_nomask_quirk(struct irq_data *d)
+{
+	return __irqd_to_state(d) & IRQD_MSI_NOMASK_QUIRK;
+}
+
 #undef __irqd_to_state
 
 static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -207,6 +207,13 @@ enum {
 	IRQ_DOMAIN_FLAG_MSI_REMAP	= (1 << 5),
 
 	/*
+	 * Quirk to handle MSI implementations which do not provide
+	 * masking. Currently known to affect x86, but partially
+	 * handled in core code.
+	 */
+	IRQ_DOMAIN_MSI_NOMASK_QUIRK	= (1 << 6),
+
+	/*
 	 * Flags starting from IRQ_DOMAIN_FLAG_NONCORE are reserved
 	 * for implementation specific purposes and ignored by the
 	 * core code.
--- a/kernel/irq/debugfs.c
+++ b/kernel/irq/debugfs.c
@@ -114,6 +114,7 @@ static const struct irq_bit_descr irqdat
 	BIT_MASK_DESCR(IRQD_AFFINITY_MANAGED),
 	BIT_MASK_DESCR(IRQD_MANAGED_SHUTDOWN),
 	BIT_MASK_DESCR(IRQD_CAN_RESERVE),
+	BIT_MASK_DESCR(IRQD_MSI_NOMASK_QUIRK),
 
 	BIT_MASK_DESCR(IRQD_FORWARDED_TO_VCPU),
 
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -453,8 +453,11 @@ int msi_domain_alloc_irqs(struct irq_dom
 			continue;
 
 		irq_data = irq_domain_get_irq_data(domain, desc->irq);
-		if (!can_reserve)
+		if (!can_reserve) {
 			irqd_clr_can_reserve(irq_data);
+			if (domain->flags & IRQ_DOMAIN_MSI_NOMASK_QUIRK)
+				irqd_set_msi_nomask_quirk(irq_data);
+		}
 		ret = irq_domain_activate_irq(irq_data, can_reserve);
 		if (ret)
 			goto cleanup;

* Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
  2020-01-28 14:38                     ` Thomas Gleixner
@ 2020-01-28 22:22                       ` Evan Green
  2020-01-28 22:48                         ` Thomas Gleixner
  0 siblings, 1 reply; 26+ messages in thread
From: Evan Green @ 2020-01-28 22:22 UTC
  To: Thomas Gleixner
  Cc: Rajat Jain, Bjorn Helgaas, linux-pci, Linux Kernel Mailing List,
	x86, Marc Zyngier

On Tue, Jan 28, 2020 at 6:38 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Evan,
>
> Thomas Gleixner <tglx@linutronix.de> writes:
> > It's worthwhile, but that needs some deep thoughts about locking and
> > ordering plus the inevitable race conditions this creates. If it would
> > be trivial, I surely wouldn't have hacked up the retrigger mess.
>
> So after staring at it for a while, I came up with the patch below.
>
> Your idea of going through some well defined transition vector is just
> not feasible due to locking and life-time issues.
>
> I'm taking a similar but easier to handle approach.
>
>     1) Move the interrupt to the new vector on the old (local) CPU
>
>     2) Move it to the new CPU
>
>     3) Check if the new vector is pending on the local CPU. If yes
>        retrigger it on the new CPU.
>
> That might give a spurious interrupt if the new vector on the local CPU
> is in use. But as I said before this is nothing to worry about. If the
> affected device driver fails to handle that spurious interrupt then it
> is broken anyway.
>
> In theory we could teach the vector allocation logic to search for an
> unused pair of vectors on both CPUs, but the required code for that is
> hardly worth the trouble. In the end the situation that no pair is found
> has to be handled anyway. So rather than making this the corner case
> which is never tested and then leads to hard to debug issues, I prefer
> to make it more likely to happen.
>
> The patch is only lightly tested, but so far it survived.
>

Hi Thomas,
Thanks for the patch; I gave it a try. I get the following splat, then a hang:

[   62.173778] ============================================
[   62.179723] WARNING: possible recursive locking detected
[   62.185657] 4.19.96 #2 Not tainted
[   62.189453] --------------------------------------------
[   62.195388] migration/1/17 is trying to acquire lock:
[   62.201031] 000000006885da2d (vector_lock){-.-.}, at:
apic_retrigger_irq+0x31/0x63
[   62.209508]
[   62.209508] but task is already holding lock:
[   62.216026] 000000006885da2d (vector_lock){-.-.}, at:
msi_set_affinity+0x13c/0x27b
[   62.224498]
[   62.224498] other info that might help us debug this:
[   62.231791]  Possible unsafe locking scenario:
[   62.231791]
[   62.238406]        CPU0
[   62.241135]        ----
[   62.243863]   lock(vector_lock);
[   62.247467]   lock(vector_lock);
[   62.251071]
[   62.251071]  *** DEADLOCK ***
[   62.251071]
[   62.257687]  May be due to missing lock nesting notation
[   62.257687]
[   62.265274] 2 locks held by migration/1/17:
[   62.269946]  #0: 00000000cfa9d8c3 (&irq_desc_lock_class){-.-.}, at:
irq_migrate_all_off_this_cpu+0x44/0x28f
[   62.280846]  #1: 000000006885da2d (vector_lock){-.-.}, at:
msi_set_affinity+0x13c/0x27b
[   62.289801]
[   62.289801] stack backtrace:
[   62.294669] CPU: 1 PID: 17 Comm: migration/1 Not tainted 4.19.96 #2
[   62.310713] Call Trace:
[   62.313446]  dump_stack+0xac/0x11e
[   62.317255]  __lock_acquire+0x64f/0x19bc
[   62.321646]  ? find_held_lock+0x3d/0xb8
[   62.325936]  ? pci_conf1_write+0x4f/0xdf
[   62.330320]  lock_acquire+0x1b2/0x1fa
[   62.334413]  ? apic_retrigger_irq+0x31/0x63
[   62.339097]  _raw_spin_lock_irqsave+0x51/0x7d
[   62.343972]  ? apic_retrigger_irq+0x31/0x63
[   62.348646]  apic_retrigger_irq+0x31/0x63
[   62.353124]  msi_set_affinity+0x25a/0x27b
[   62.357606]  irq_do_set_affinity+0x37/0xaa
[   62.362191]  irq_migrate_all_off_this_cpu+0x1c1/0x28f
[   62.367841]  fixup_irqs+0x15/0xd2
[   62.371544]  cpu_disable_common+0x20a/0x217
[   62.376217]  native_cpu_disable+0x1f/0x24
[   62.380696]  take_cpu_down+0x41/0x95
[   62.384691]  multi_cpu_stop+0xbd/0x14b
[   62.388878]  ? _raw_spin_unlock_irq+0x2c/0x40
[   62.393746]  ? stop_two_cpus+0x2c5/0x2c5
[   62.398127]  cpu_stopper_thread+0x84/0x100
[   62.402705]  smpboot_thread_fn+0x1a9/0x25f
[   62.407281]  ? cpu_report_death+0x81/0x81
[   62.411760]  kthread+0x146/0x14e
[   62.415364]  ? cpu_report_death+0x81/0x81
[   62.419846]  ? kthread_blkcg+0x31/0x31
[   62.424042]  ret_from_fork+0x24/0x50

-Evan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
  2020-01-28 22:22                       ` Evan Green
@ 2020-01-28 22:48                         ` Thomas Gleixner
  2020-01-29 18:00                           ` Evan Green
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2020-01-28 22:48 UTC (permalink / raw)
  To: Evan Green
  Cc: Rajat Jain, Bjorn Helgaas, linux-pci, Linux Kernel Mailing List,
	x86, Marc Zyngier

Evan,

Evan Green <evgreen@chromium.org> writes:
> On Tue, Jan 28, 2020 at 6:38 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>> The patch is only lightly tested, but so far it survived.
>>
>
> Hi Thomas,
> Thanks for the patch, I gave it a try. I get the following splat, then a hang:
>
> [   62.238406]        CPU0
> [   62.241135]        ----
> [   62.243863]   lock(vector_lock);
> [   62.247467]   lock(vector_lock);
> [   62.251071]
> [   62.251071]  *** DEADLOCK ***
> [   62.251071]
> [   62.257687]  May be due to missing lock nesting notation
> [   62.257687]
> [   62.265274] 2 locks held by migration/1/17:
> [   62.269946]  #0: 00000000cfa9d8c3 (&irq_desc_lock_class){-.-.}, at:
> irq_migrate_all_off_this_cpu+0x44/0x28f
> [   62.280846]  #1: 000000006885da2d (vector_lock){-.-.}, at:
> msi_set_affinity+0x13c/0x27b
> [   62.289801]
> [   62.289801] stack backtrace:
> [   62.294669] CPU: 1 PID: 17 Comm: migration/1 Not tainted 4.19.96 #2
> [   62.310713] Call Trace:
> [   62.313446]  dump_stack+0xac/0x11e
> [   62.317255]  __lock_acquire+0x64f/0x19bc
> [   62.321646]  ? find_held_lock+0x3d/0xb8
> [   62.325936]  ? pci_conf1_write+0x4f/0xdf
> [   62.330320]  lock_acquire+0x1b2/0x1fa
> [   62.334413]  ? apic_retrigger_irq+0x31/0x63
> [   62.339097]  _raw_spin_lock_irqsave+0x51/0x7d
> [   62.343972]  ? apic_retrigger_irq+0x31/0x63
> [   62.348646]  apic_retrigger_irq+0x31/0x63
> [   62.353124]  msi_set_affinity+0x25a/0x27b

Bah. I'm sure I looked at that call chain, noticed the double vector
lock and then forgot. Delta patch below.

Thanks,

        tglx

8<--------------
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -64,6 +64,7 @@ msi_set_affinity(struct irq_data *irqd,
 	struct irq_cfg old_cfg, *cfg = irqd_cfg(irqd);
 	struct irq_data *parent = irqd->parent_data;
 	unsigned int cpu;
+	bool pending;
 	int ret;
 
 	/* Save the current configuration */
@@ -147,9 +148,13 @@ msi_set_affinity(struct irq_data *irqd,
 	 * vector/CPU. Check whether the transition raced with a device
 	 * interrupt and is pending in the local APICs IRR.
 	 */
-	if (lapic_vector_set_in_irr(cfg->vector))
-		irq_data_get_irq_chip(irqd)->irq_retrigger(irqd);
+	pending = lapic_vector_set_in_irr(cfg->vector);
+
 	unlock_vector_lock();
+
+	if (pending)
+		irq_data_get_irq_chip(irqd)->irq_retrigger(irqd);
+
 	return ret;
 }
 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
  2020-01-28 22:48                         ` Thomas Gleixner
@ 2020-01-29 18:00                           ` Evan Green
  2020-01-29 21:00                             ` Thomas Gleixner
  0 siblings, 1 reply; 26+ messages in thread
From: Evan Green @ 2020-01-29 18:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rajat Jain, Bjorn Helgaas, linux-pci, Linux Kernel Mailing List,
	x86, Marc Zyngier

On Tue, Jan 28, 2020 at 2:48 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Evan,
>
> Evan Green <evgreen@chromium.org> writes:
> > On Tue, Jan 28, 2020 at 6:38 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >> The patch is only lightly tested, but so far it survived.
> >>
> >
> > Hi Thomas,
> > Thanks for the patch, I gave it a try. I get the following splat, then a hang:
> >
> > [   62.238406]        CPU0
> > [   62.241135]        ----
> > [   62.243863]   lock(vector_lock);
> > [   62.247467]   lock(vector_lock);
> > [   62.251071]
> > [   62.251071]  *** DEADLOCK ***
> > [   62.251071]
> > [   62.257687]  May be due to missing lock nesting notation
> > [   62.257687]
> > [   62.265274] 2 locks held by migration/1/17:
> > [   62.269946]  #0: 00000000cfa9d8c3 (&irq_desc_lock_class){-.-.}, at:
> > irq_migrate_all_off_this_cpu+0x44/0x28f
> > [   62.280846]  #1: 000000006885da2d (vector_lock){-.-.}, at:
> > msi_set_affinity+0x13c/0x27b
> > [   62.289801]
> > [   62.289801] stack backtrace:
> > [   62.294669] CPU: 1 PID: 17 Comm: migration/1 Not tainted 4.19.96 #2
> > [   62.310713] Call Trace:
> > [   62.313446]  dump_stack+0xac/0x11e
> > [   62.317255]  __lock_acquire+0x64f/0x19bc
> > [   62.321646]  ? find_held_lock+0x3d/0xb8
> > [   62.325936]  ? pci_conf1_write+0x4f/0xdf
> > [   62.330320]  lock_acquire+0x1b2/0x1fa
> > [   62.334413]  ? apic_retrigger_irq+0x31/0x63
> > [   62.339097]  _raw_spin_lock_irqsave+0x51/0x7d
> > [   62.343972]  ? apic_retrigger_irq+0x31/0x63
> > [   62.348646]  apic_retrigger_irq+0x31/0x63
> > [   62.353124]  msi_set_affinity+0x25a/0x27b
>
> Bah. I'm sure I looked at that call chain, noticed the double vector
> lock and then forgot. Delta patch below.

It's working well with the delta patch, been running for about an hour
with no issues.
-Evan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
  2020-01-29 18:00                           ` Evan Green
@ 2020-01-29 21:00                             ` Thomas Gleixner
  2020-01-29 22:53                               ` Evan Green
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2020-01-29 21:00 UTC (permalink / raw)
  To: Evan Green
  Cc: Rajat Jain, Bjorn Helgaas, linux-pci, Linux Kernel Mailing List,
	x86, Marc Zyngier

Evan,

Evan Green <evgreen@chromium.org> writes:
> On Tue, Jan 28, 2020 at 2:48 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> Bah. I'm sure I looked at that call chain, noticed the double vector
>> lock and then forgot. Delta patch below.
>
> It's working well with the delta patch, been running for about an hour
> with no issues.

thanks for the info and for testing!

Could you please add some instrumentation to see how often this stuff
actually triggers spurious interrupts?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
  2020-01-29 21:00                             ` Thomas Gleixner
@ 2020-01-29 22:53                               ` Evan Green
  2020-01-29 23:16                                 ` Thomas Gleixner
  0 siblings, 1 reply; 26+ messages in thread
From: Evan Green @ 2020-01-29 22:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rajat Jain, Bjorn Helgaas, linux-pci, Linux Kernel Mailing List,
	x86, Marc Zyngier

On Wed, Jan 29, 2020 at 1:01 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Evan,
>
> Evan Green <evgreen@chromium.org> writes:
> > On Tue, Jan 28, 2020 at 2:48 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >>
> >> Bah. I'm sure I looked at that call chain, noticed the double vector
> >> lock and then forgot. Delta patch below.
> >
> > It's working well with the delta patch, been running for about an hour
> > with no issues.
>
> thanks for the info and for testing!
>
> Could you please add some instrumentation to see how often this stuff
> actually triggers spurious interrupts?

In about 10 minutes of this script running, I got 142 hits. My script
can toggle the HT cpus on and off about twice per second.
Here's my diff (sorry it's mangled by gmail). If you're looking for
something else, let me know, or I can run a patch.

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 90baf2c66bd40..f9c46fc30d658 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -61,6 +61,8 @@ static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
        irq_data_get_irq_chip(irqd)->irq_write_msi_msg(irqd, msg);
 }

+int evanpending;
+
 static int
 msi_set_affinity(struct irq_data *irqd, const struct cpumask *mask, bool force)
 {
@@ -155,8 +157,10 @@ msi_set_affinity(struct irq_data *irqd, const struct cpumask *mask, bool force)

        unlock_vector_lock();

-       if (pending)
+       if (pending) {
+               printk("EVAN pending %d", ++evanpending);
                irq_data_get_irq_chip(irqd)->irq_retrigger(irqd);
+       }

        return ret;
 }

-Evan

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
  2020-01-29 22:53                               ` Evan Green
@ 2020-01-29 23:16                                 ` Thomas Gleixner
  2020-01-29 23:48                                   ` Evan Green
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2020-01-29 23:16 UTC (permalink / raw)
  To: Evan Green
  Cc: Rajat Jain, Bjorn Helgaas, linux-pci, Linux Kernel Mailing List,
	x86, Marc Zyngier

Evan,

Evan Green <evgreen@chromium.org> writes:
> On Wed, Jan 29, 2020 at 1:01 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
>> Could you please add some instrumentation to see how often this stuff
>> actually triggers spurious interrupts?
>
> In about 10 minutes of this script running, I got 142 hits. My script
> can toggle the HT cpus on and off about twice per second.
> Here's my diff (sorry it's mangled by gmail). If you're looking for
> something else, let me know, or I can run a patch.
>
No, that's good data. Your testing is hitting the critical path and as
you did not complain about negative side effects it seems to hold up to
the expectations. I'm going to convert this to a real patch with a
proper changelog tomorrow.

Thanks for your help!

       tglx



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
  2020-01-29 23:16                                 ` Thomas Gleixner
@ 2020-01-29 23:48                                   ` Evan Green
  2020-01-31 11:27                                     ` [PATCH] x86/apic/msi: Plug non-maskable MSI affinity race Thomas Gleixner
  0 siblings, 1 reply; 26+ messages in thread
From: Evan Green @ 2020-01-29 23:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rajat Jain, Bjorn Helgaas, linux-pci, Linux Kernel Mailing List,
	x86, Marc Zyngier

On Wed, Jan 29, 2020 at 3:16 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Evan,
>
> Evan Green <evgreen@chromium.org> writes:
> > On Wed, Jan 29, 2020 at 1:01 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> >> Could you please add some instrumentation to see how often this stuff
> >> actually triggers spurious interrupts?
> >
> > In about 10 minutes of this script running, I got 142 hits. My script
> > can toggle the HT cpus on and off about twice per second.
> > Here's my diff (sorry it's mangled by gmail). If you're looking for
> > something else, let me know, or I can run a patch.
> >
> No, that's good data. Your testing is hitting the critical path and as
> you did not complain about negative side effects it seems to hold up to
> the expectations. I'm going to convert this to a real patch with a
> proper changelog tomorrow.
>
> Thanks for your help!

Sounds good, please CC me on it and I'll be sure to test the final
result as well.
-Evan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH] x86/apic/msi: Plug non-maskable MSI affinity race
  2020-01-29 23:48                                   ` Evan Green
@ 2020-01-31 11:27                                     ` Thomas Gleixner
  2020-01-31 14:26                                       ` [PATCH V2] " Thomas Gleixner
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2020-01-31 11:27 UTC (permalink / raw)
  To: Evan Green
  Cc: Rajat Jain, Bjorn Helgaas, linux-pci, Linux Kernel Mailing List,
	x86, Marc Zyngier

Evan tracked down a subtle race between the update of the MSI message and
the device raising an interrupt internally on PCI devices which do not
support MSI masking. The update of the MSI message is non-atomic and
consists of either 2 or 3 sequential 32bit wide writes to the PCI config
space. 

   - Write address low 32bits
   - Write address high 32bits (If supported by device)
   - Write data

When an interrupt is migrated then both address and data might change, so
the kernel attempts to mask the MSI interrupt first. But for MSI, masking is
optional, so there exist devices which do not provide it. That means that
if the device raises an interrupt internally between the writes, an MSI
message built from half-updated state is sent.

On x86 this can lead to spurious interrupts on the wrong interrupt
vector when the affinity setting changes both address and data. As a
consequence the device interrupt can be lost causing the device to
become stuck or malfunctioning.
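
Schematically, the unmasked update and the race window look like this
(an illustrative sketch mirroring the MSI path of __pci_write_msi_msg()
in drivers/pci/msi.c, not the literal code):

	int pos = dev->msi_cap;

	pci_write_config_dword(dev, pos + PCI_MSI_ADDRESS_LO, msg->address_lo);
	pci_write_config_dword(dev, pos + PCI_MSI_ADDRESS_HI, msg->address_hi);
	/* A device interrupt raised here is sent with the new address
	   (new CPU) but the old data word (old vector). */
	pci_write_config_word(dev, pos + PCI_MSI_DATA_64, msg->data);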

Evan tried to handle that by disabling MSI across an MSI message
update. That's not feasible because disabling MSI has issues on its own:

 If MSI is disabled the PCI device is routing an interrupt to the legacy
 INTx mechanism. The INTx delivery can be disabled, but the disablement is
 not working on all devices.

 Some devices lose interrupts when both MSI and INTx delivery are disabled.

Another way to solve this would be to enforce the allocation of the same
vector on all CPUs in the system for this kind of screwed devices. That
could be done, but it would bring back the vector space exhaustion problems
which got solved a few years ago.

Fortunately the high address (if supported by the device) is only relevant
when X2APIC is enabled which implies interrupt remapping. In the interrupt
remapping case the affinity setting is happening at the interrupt remapping
unit and the PCI MSI message is programmed only once when the PCI device is
initialized.

That makes it possible to solve it with a two step update:

  1) Target the MSI msg to the new vector on the current target CPU

  2) Target the MSI msg to the new vector on the new target CPU

In both cases writing the MSI message is only changing a single 32bit word
which prevents the issue of inconsistency.
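
Condensed as a sketch (using irq_msi_update_msg() from the patch below):

	/* 1) New vector, old CPU: only the data word changes */
	old_cfg.vector = cfg->vector;
	irq_msi_update_msg(irqd, &old_cfg);

	/* 2) New vector, new CPU: only the low address word changes */
	irq_msi_update_msg(irqd, cfg);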

After writing the final destination it is necessary to check whether the
device issued an interrupt while the intermediate state #1 (new vector,
current CPU) was in effect.

This is possible because the affinity change is always happening on the
current target CPU. The code runs with interrupts disabled, so the
interrupt can be detected by checking the IRR of the local APIC. If the
vector is pending in the IRR then the interrupt is retriggered on the new
target CPU by sending an IPI for the associated vector on the target CPU.
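
As a worked example of the IRR lookup which lapic_vector_set_in_irr()
below performs, assume the new vector is 0x41 (decimal 65):

	/* IRR register: APIC_IRR + (65 / 32) * 0x10 = APIC_IRR + 0x20 */
	/* Bit within the register: 65 % 32 = 1, i.e. irr & (1U << 1)  */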

This can cause spurious interrupts on both the local and the new target
CPU.

 1) If the new vector is not in use on the local CPU and the device
    affected by the affinity change raised an interrupt during the
    transitional state (step #1 above) then interrupt entry code will
    ignore that spurious interrupt. The vector is marked so that the
    'No irq handler for vector' warning is suppressed once.

 2) If the new vector is in use already on the local CPU then the IRR check
    might see a pending interrupt from the device which is using this
    vector. The IPI to the new target CPU will then invoke the handler of
    the device, which got the affinity change, even if that device did not
    issue an interrupt.

 3) If the new vector is in use already on the local CPU and the device
    affected by the affinity change raised an interrupt during the
    transitional state (step #1 above) then the handler of the device which
    uses that vector on the local CPU will be invoked.

#1 is uninteresting and has no unintended side effects. #2 and #3 might
expose issues in device driver interrupt handlers which are not prepared to
handle a spurious interrupt correctly. This is not a regression, it's just
exposing something which was already broken as spurious interrupts can
happen for a lot of reasons and all driver handlers need to be able to deal
with them.
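
The usual defensive handler pattern deals with exactly that; a minimal
sketch, with hypothetical FOO_* register names and foo_dev type:

	static irqreturn_t foo_irq(int irq, void *dev_id)
	{
		struct foo_dev *fd = dev_id;
		u32 stat = readl(fd->regs + FOO_INT_STATUS);	/* hypothetical */

		if (!stat)
			return IRQ_NONE;	/* spurious interrupt, not ours */

		writel(stat, fd->regs + FOO_INT_ACK);		/* hypothetical */
		return IRQ_HANDLED;
	}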

Reported-by: Evan Green <evgreen@chromium.org>
Debugged-by: Evan Green <evgreen@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rajat Jain <rajatja@google.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Cc: x86@kernel.org
Cc: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
---

@stable: The Fixes tag is missing intentionally, as this problem has
been there forever. The rework of the vector allocation logic just made
it more likely to be observable.

---
 arch/x86/include/asm/apic.h |    8 ++
 arch/x86/kernel/apic/msi.c  |  128 ++++++++++++++++++++++++++++++++++++++++++--
 include/linux/irq.h         |   18 ++++++
 include/linux/irqdomain.h   |    7 ++
 kernel/irq/debugfs.c        |    1 
 kernel/irq/msi.c            |    5 +
 6 files changed, 163 insertions(+), 4 deletions(-)

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -452,6 +452,14 @@ static inline void ack_APIC_irq(void)
 	apic_eoi();
 }
 
+
+static inline bool lapic_vector_set_in_irr(unsigned int vector)
+{
+	u32 irr = apic_read(APIC_IRR + (vector / 32 * 0x10));
+
+	return !!(irr & (1U << (vector % 32)));
+}
+
 static inline unsigned default_get_apic_id(unsigned long x)
 {
 	unsigned int ver = GET_APIC_VERSION(apic_read(APIC_LVR));
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -23,10 +23,8 @@
 
 static struct irq_domain *msi_default_domain;
 
-static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
 {
-	struct irq_cfg *cfg = irqd_cfg(data);
-
 	msg->address_hi = MSI_ADDR_BASE_HI;
 
 	if (x2apic_enabled())
@@ -47,6 +45,127 @@ static void irq_msi_compose_msg(struct i
 		MSI_DATA_VECTOR(cfg->vector);
 }
 
+static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	__irq_msi_compose_msg(irqd_cfg(data), msg);
+}
+
+static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
+{
+	struct msi_msg msg[2] = { [1] = { }, };
+
+	__irq_msi_compose_msg(cfg, msg);
+	irq_data_get_irq_chip(irqd)->irq_write_msi_msg(irqd, msg);
+}
+
+static int
+msi_set_affinity(struct irq_data *irqd, const struct cpumask *mask, bool force)
+{
+	struct irq_cfg old_cfg, *cfg = irqd_cfg(irqd);
+	struct irq_data *parent = irqd->parent_data;
+	unsigned int cpu;
+	int ret;
+
+	/* Save the current configuration */
+	cpu = cpumask_first(irq_data_get_effective_affinity_mask(irqd));
+	old_cfg = *cfg;
+
+	/* Allocate a new target vector */
+	ret = parent->chip->irq_set_affinity(parent, mask, force);
+	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
+		return ret;
+
+	/*
+	 * For non-maskable and non-remapped MSI interrupts the migration
+	 * to a different destination CPU and a different vector has to be
+	 * done carefully to handle the possible stray interrupt which can be
+	 * caused by the non-atomic update of the address/data pair.
+	 *
+	 * Direct update is possible when:
+	 * - The MSI is maskable (remapped MSI does not use this code path).
+	 *   The quirk bit is not set in this case.
+	 * - The new vector is the same as the old vector
+	 * - The old vector is MANAGED_IRQ_SHUTDOWN_VECTOR (interrupt starts up)
+	 * - The new destination CPU is the same as the old destination CPU
+	 */
+	if (!irqd_msi_nomask_quirk(irqd) ||
+	    cfg->vector == old_cfg.vector ||
+	    old_cfg.vector == MANAGED_IRQ_SHUTDOWN_VECTOR ||
+	    cfg->dest_apicid == old_cfg.dest_apicid) {
+		irq_msi_update_msg(irqd, cfg);
+		return ret;
+	}
+
+	/*
+	 * Paranoia: Validate that the interrupt target is the local
+	 * CPU.
+	 */
+	if (WARN_ON_ONCE(cpu != smp_processor_id())) {
+		irq_msi_update_msg(irqd, cfg);
+		return ret;
+	}
+
+	/*
+	 * Redirect the interrupt to the new vector on the current CPU
+	 * first. This might cause a spurious interrupt on this vector if
+	 * the device raises an interrupt right between this update and the
+	 * update to the final destination CPU.
+	 *
+	 * If the vector is in use then the installed device handler will
+	 * denote it as spurious which is no harm as this is a rare event
+	 * and interrupt handlers have to cope with spurious interrupts
+	 * anyway. If the vector is unused, then it is marked so it won't
+	 * trigger the 'No irq handler for vector' warning in do_IRQ().
+	 *
+	 * This requires to hold vector lock to prevent concurrent updates to
+	 * the affected vector.
+	 */
+	lock_vector_lock();
+
+	/*
+	 * Mark the new target vector on the local CPU if it is currently
+	 * unused. Reuse the VECTOR_RETRIGGERED state which is also used in
+	 * the CPU hotplug path for a similar purpose. This cannot be
+	 * undone here as the current CPU has interrupts disabled and
+	 * cannot handle the interrupt before the whole set_affinity()
+	 * section is done. In the CPU unplug case, the current CPU is
+	 * about to vanish and will not handle any interrupts anymore. The
+	 * vector is cleaned up when the CPU comes online again.
+	 */
+	if (IS_ERR_OR_NULL(this_cpu_read(vector_irq[cfg->vector])))
+		this_cpu_write(vector_irq[cfg->vector], VECTOR_RETRIGGERED);
+
+	/* Redirect it to the new vector on the local CPU temporarily */
+	old_cfg.vector = cfg->vector;
+	irq_msi_update_msg(irqd, &old_cfg);
+
+	/* Now transition it to the target CPU */
+	irq_msi_update_msg(irqd, cfg);
+
+	/*
+	 * All interrupts after this point are now targeted at the new
+	 * vector/CPU.
+	 *
+	 * Drop vector lock before testing whether the temporary assignment
+	 * to the local CPU was hit by an interrupt raised in the device,
+	 * because the retrigger function acquires vector lock again.
+	 */
+	unlock_vector_lock();
+
+	/*
+	 * Check whether the transition raced with a device interrupt and
+	 * is pending in the local APICs IRR. It is safe to do this outside
+	 * of vector lock as the irq_desc::lock of this interrupt is still
+	 * held and interrupts are disabled: The check is not accessing the
+	 * underlying vector store. It's just checking the local APIC's
+	 * IRR.
+	 */
+	if (lapic_vector_set_in_irr(cfg->vector)
+		irq_data_get_irq_chip(irqd)->irq_retrigger(irqd);
+
+	return ret;
+}
+
 /*
  * IRQ Chip for MSI PCI/PCI-X/PCI-Express Devices,
  * which implement the MSI or MSI-X Capability Structure.
@@ -58,6 +177,7 @@ static struct irq_chip pci_msi_controlle
 	.irq_ack		= irq_chip_ack_parent,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
 	.irq_compose_msi_msg	= irq_msi_compose_msg,
+	.irq_set_affinity	= msi_set_affinity,
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
 
@@ -146,6 +266,8 @@ void __init arch_init_msi_domain(struct
 	}
 	if (!msi_default_domain)
 		pr_warn("failed to initialize irqdomain for MSI/MSI-x.\n");
+	else
+		msi_default_domain->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK;
 }
 
 #ifdef CONFIG_IRQ_REMAP
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -209,6 +209,8 @@ struct irq_data {
  * IRQD_SINGLE_TARGET		- IRQ allows only a single affinity target
  * IRQD_DEFAULT_TRIGGER_SET	- Expected trigger already been set
  * IRQD_CAN_RESERVE		- Can use reservation mode
+ * IRQD_MSI_NOMASK_QUIRK	- Non-maskable MSI quirk for affinity change
+ *				  required
  */
 enum {
 	IRQD_TRIGGER_MASK		= 0xf,
@@ -231,6 +233,7 @@ enum {
 	IRQD_SINGLE_TARGET		= (1 << 24),
 	IRQD_DEFAULT_TRIGGER_SET	= (1 << 25),
 	IRQD_CAN_RESERVE		= (1 << 26),
+	IRQD_MSI_NOMASK_QUIRK		= (1 << 27),
 };
 
 #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)
@@ -390,6 +393,21 @@ static inline bool irqd_can_reserve(stru
 	return __irqd_to_state(d) & IRQD_CAN_RESERVE;
 }
 
+static inline void irqd_set_msi_nomask_quirk(struct irq_data *d)
+{
+	__irqd_to_state(d) |= IRQD_MSI_NOMASK_QUIRK;
+}
+
+static inline void irqd_clr_msi_nomask_quirk(struct irq_data *d)
+{
+	__irqd_to_state(d) &= ~IRQD_MSI_NOMASK_QUIRK;
+}
+
+static inline bool irqd_msi_nomask_quirk(struct irq_data *d)
+{
+	return __irqd_to_state(d) & IRQD_MSI_NOMASK_QUIRK;
+}
+
 #undef __irqd_to_state
 
 static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -207,6 +207,13 @@ enum {
 	IRQ_DOMAIN_FLAG_MSI_REMAP	= (1 << 5),
 
 	/*
+	 * Quirk to handle MSI implementations which do not provide
+	 * masking. Currently known to affect x86, but partially
+	 * handled in core code.
+	 */
+	IRQ_DOMAIN_MSI_NOMASK_QUIRK	= (1 << 6),
+
+	/*
 	 * Flags starting from IRQ_DOMAIN_FLAG_NONCORE are reserved
 	 * for implementation specific purposes and ignored by the
 	 * core code.
--- a/kernel/irq/debugfs.c
+++ b/kernel/irq/debugfs.c
@@ -114,6 +114,7 @@ static const struct irq_bit_descr irqdat
 	BIT_MASK_DESCR(IRQD_AFFINITY_MANAGED),
 	BIT_MASK_DESCR(IRQD_MANAGED_SHUTDOWN),
 	BIT_MASK_DESCR(IRQD_CAN_RESERVE),
+	BIT_MASK_DESCR(IRQD_MSI_NOMASK_QUIRK),
 
 	BIT_MASK_DESCR(IRQD_FORWARDED_TO_VCPU),
 
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -453,8 +453,11 @@ int msi_domain_alloc_irqs(struct irq_dom
 			continue;
 
 		irq_data = irq_domain_get_irq_data(domain, desc->irq);
-		if (!can_reserve)
+		if (!can_reserve) {
 			irqd_clr_can_reserve(irq_data);
+			if (domain->flags & IRQ_DOMAIN_MSI_NOMASK_QUIRK)
+				irqd_set_msi_nomask_quirk(irq_data);
+		}
 		ret = irq_domain_activate_irq(irq_data, can_reserve);
 		if (ret)
 			goto cleanup;

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH V2] x86/apic/msi: Plug non-maskable MSI affinity race
  2020-01-31 11:27                                     ` [PATCH] x86/apic/msi: Plug non-maskable MSI affinity race Thomas Gleixner
@ 2020-01-31 14:26                                       ` Thomas Gleixner
  2020-01-31 20:32                                         ` Evan Green
  2020-02-01  8:36                                         ` [tip: x86/urgent] " tip-bot2 for Thomas Gleixner
  0 siblings, 2 replies; 26+ messages in thread
From: Thomas Gleixner @ 2020-01-31 14:26 UTC (permalink / raw)
  To: Evan Green
  Cc: Rajat Jain, Bjorn Helgaas, linux-pci, Linux Kernel Mailing List,
	x86, Marc Zyngier

Thomas Gleixner <tglx@linutronix.de> writes:

Evan tracked down a subtle race between the update of the MSI message and
the device raising an interrupt internally on PCI devices which do not
support MSI masking. The update of the MSI message is non-atomic and
consists of either 2 or 3 sequential 32bit wide writes to the PCI config
space. 

   - Write address low 32bits
   - Write address high 32bits (If supported by device)
   - Write data

When an interrupt is migrated then both address and data might change, so
the kernel attempts to mask the MSI interrupt first. But for MSI, masking is
optional, so there exist devices which do not provide it. That means that
if the device raises an interrupt internally between the writes, an MSI
message built from half-updated state is sent.

On x86 this can lead to spurious interrupts on the wrong interrupt
vector when the affinity setting changes both address and data. As a
consequence the device interrupt can be lost causing the device to
become stuck or malfunctioning.

Evan tried to handle that by disabling MSI across an MSI message
update. That's not feasible because disabling MSI has issues on its own:

 If MSI is disabled the PCI device is routing an interrupt to the legacy
 INTx mechanism. The INTx delivery can be disabled, but the disablement is
 not working on all devices.

 Some devices lose interrupts when both MSI and INTx delivery are disabled.

Another way to solve this would be to enforce the allocation of the same
vector on all CPUs in the system for this kind of screwed devices. That
could be done, but it would bring back the vector space exhaustion problems
which got solved a few years ago.

Fortunately the high address (if supported by the device) is only relevant
when X2APIC is enabled which implies interrupt remapping. In the interrupt
remapping case the affinity setting is happening at the interrupt remapping
unit and the PCI MSI message is programmed only once when the PCI device is
initialized.

That makes it possible to solve it with a two step update:

  1) Target the MSI msg to the new vector on the current target CPU

  2) Target the MSI msg to the new vector on the new target CPU

In both cases writing the MSI message is only changing a single 32bit word
which prevents the issue of inconsistency.

After writing the final destination it is necessary to check whether the
device issued an interrupt while the intermediate state #1 (new vector,
current CPU) was in effect.

This is possible because the affinity change is always happening on the
current target CPU. The code runs with interrupts disabled, so the
interrupt can be detected by checking the IRR of the local APIC. If the
vector is pending in the IRR then the interrupt is retriggered on the new
target CPU by sending an IPI for the associated vector on the target CPU.

This can cause spurious interrupts on both the local and the new target
CPU.

 1) If the new vector is not in use on the local CPU and the device
    affected by the affinity change raised an interrupt during the
    transitional state (step #1 above) then interrupt entry code will
    ignore that spurious interrupt. The vector is marked so that the
    'No irq handler for vector' warning is suppressed once.

 2) If the new vector is in use already on the local CPU then the IRR check
    might see a pending interrupt from the device which is using this
    vector. The IPI to the new target CPU will then invoke the handler of
    the device, which got the affinity change, even if that device did not
    issue an interrupt.

 3) If the new vector is in use already on the local CPU and the device
    affected by the affinity change raised an interrupt during the
    transitional state (step #1 above) then the handler of the device which
    uses that vector on the local CPU will be invoked.

#1 is uninteresting and has no unintended side effects. #2 and #3 might
expose issues in device driver interrupt handlers which are not prepared to
handle a spurious interrupt correctly. This is not a regression, it's just
exposing something which was already broken as spurious interrupts can
happen for a lot of reasons and all driver handlers need to be able to deal
with them.

Reported-by: Evan Green <evgreen@chromium.org>
Debugged-by: Evan Green <evgreen@chromium.org>                                                                                        Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rajat Jain <rajatja@google.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Cc: x86@kernel.org
Cc: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
---

V2: Add the missing bracket ....

@stable: The Fixes tag is missing intentionally, as this problem has
been there forever. The rework of the vector allocation logic just made
it more likely to be observable.

---
 arch/x86/include/asm/apic.h |    8 ++
 arch/x86/kernel/apic/msi.c  |  128 ++++++++++++++++++++++++++++++++++++++++++--
 include/linux/irq.h         |   18 ++++++
 include/linux/irqdomain.h   |    7 ++
 kernel/irq/debugfs.c        |    1 
 kernel/irq/msi.c            |    5 +
 6 files changed, 163 insertions(+), 4 deletions(-)

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -452,6 +452,14 @@ static inline void ack_APIC_irq(void)
 	apic_eoi();
 }
 
+
+static inline bool lapic_vector_set_in_irr(unsigned int vector)
+{
+	u32 irr = apic_read(APIC_IRR + (vector / 32 * 0x10));
+
+	return !!(irr & (1U << (vector % 32)));
+}
+
 static inline unsigned default_get_apic_id(unsigned long x)
 {
 	unsigned int ver = GET_APIC_VERSION(apic_read(APIC_LVR));
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -23,10 +23,8 @@
 
 static struct irq_domain *msi_default_domain;
 
-static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
 {
-	struct irq_cfg *cfg = irqd_cfg(data);
-
 	msg->address_hi = MSI_ADDR_BASE_HI;
 
 	if (x2apic_enabled())
@@ -47,6 +45,127 @@ static void irq_msi_compose_msg(struct i
 		MSI_DATA_VECTOR(cfg->vector);
 }
 
+static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	__irq_msi_compose_msg(irqd_cfg(data), msg);
+}
+
+static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
+{
+	struct msi_msg msg[2] = { [1] = { }, };
+
+	__irq_msi_compose_msg(cfg, msg);
+	irq_data_get_irq_chip(irqd)->irq_write_msi_msg(irqd, msg);
+}
+
+static int
+msi_set_affinity(struct irq_data *irqd, const struct cpumask *mask, bool force)
+{
+	struct irq_cfg old_cfg, *cfg = irqd_cfg(irqd);
+	struct irq_data *parent = irqd->parent_data;
+	unsigned int cpu;
+	int ret;
+
+	/* Save the current configuration */
+	cpu = cpumask_first(irq_data_get_effective_affinity_mask(irqd));
+	old_cfg = *cfg;
+
+	/* Allocate a new target vector */
+	ret = parent->chip->irq_set_affinity(parent, mask, force);
+	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
+		return ret;
+
+	/*
+	 * For non-maskable and non-remapped MSI interrupts the migration
+	 * to a different destination CPU and a different vector has to be
+	 * done carefully to handle the possible stray interrupt which can be
+	 * caused by the non-atomic update of the address/data pair.
+	 *
+	 * Direct update is possible when:
+	 * - The MSI is maskable (remapped MSI does not use this code path).
+	 *   The quirk bit is not set in this case.
+	 * - The new vector is the same as the old vector
+	 * - The old vector is MANAGED_IRQ_SHUTDOWN_VECTOR (interrupt starts up)
+	 * - The new destination CPU is the same as the old destination CPU
+	 */
+	if (!irqd_msi_nomask_quirk(irqd) ||
+	    cfg->vector == old_cfg.vector ||
+	    old_cfg.vector == MANAGED_IRQ_SHUTDOWN_VECTOR ||
+	    cfg->dest_apicid == old_cfg.dest_apicid) {
+		irq_msi_update_msg(irqd, cfg);
+		return ret;
+	}
+
+	/*
+	 * Paranoia: Validate that the interrupt target is the local
+	 * CPU.
+	 */
+	if (WARN_ON_ONCE(cpu != smp_processor_id())) {
+		irq_msi_update_msg(irqd, cfg);
+		return ret;
+	}
+
+	/*
+	 * Redirect the interrupt to the new vector on the current CPU
+	 * first. This might cause a spurious interrupt on this vector if
+	 * the device raises an interrupt right between this update and the
+	 * update to the final destination CPU.
+	 *
+	 * If the vector is in use then the installed device handler will
+	 * denote it as spurious which is no harm as this is a rare event
+	 * and interrupt handlers have to cope with spurious interrupts
+	 * anyway. If the vector is unused, then it is marked so it won't
+	 * trigger the 'No irq handler for vector' warning in do_IRQ().
+	 *
+	 * This requires to hold vector lock to prevent concurrent updates to
+	 * the affected vector.
+	 */
+	lock_vector_lock();
+
+	/*
+	 * Mark the new target vector on the local CPU if it is currently
+	 * unused. Reuse the VECTOR_RETRIGGERED state which is also used in
+	 * the CPU hotplug path for a similar purpose. This cannot be
+	 * undone here as the current CPU has interrupts disabled and
+	 * cannot handle the interrupt before the whole set_affinity()
+	 * section is done. In the CPU unplug case, the current CPU is
+	 * about to vanish and will not handle any interrupts anymore. The
+	 * vector is cleaned up when the CPU comes online again.
+	 */
+	if (IS_ERR_OR_NULL(this_cpu_read(vector_irq[cfg->vector])))
+		this_cpu_write(vector_irq[cfg->vector], VECTOR_RETRIGGERED);
+
+	/* Redirect it to the new vector on the local CPU temporarily */
+	old_cfg.vector = cfg->vector;
+	irq_msi_update_msg(irqd, &old_cfg);
+
+	/* Now transition it to the target CPU */
+	irq_msi_update_msg(irqd, cfg);
+
+	/*
+	 * All interrupts after this point are now targeted at the new
+	 * vector/CPU.
+	 *
+	 * Drop vector lock before testing whether the temporary assignment
+	 * to the local CPU was hit by an interrupt raised in the device,
+	 * because the retrigger function acquires vector lock again.
+	 */
+	unlock_vector_lock();
+
+	/*
+	 * Check whether the transition raced with a device interrupt and
+	 * is pending in the local APICs IRR. It is safe to do this outside
+	 * of vector lock as the irq_desc::lock of this interrupt is still
+	 * held and interrupts are disabled: The check is not accessing the
+	 * underlying vector store. It's just checking the local APIC's
+	 * IRR.
+	 */
+	if (lapic_vector_set_in_irr(cfg->vector))
+		irq_data_get_irq_chip(irqd)->irq_retrigger(irqd);
+
+	return ret;
+}
+
 /*
  * IRQ Chip for MSI PCI/PCI-X/PCI-Express Devices,
  * which implement the MSI or MSI-X Capability Structure.
@@ -58,6 +177,7 @@ static struct irq_chip pci_msi_controlle
 	.irq_ack		= irq_chip_ack_parent,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
 	.irq_compose_msi_msg	= irq_msi_compose_msg,
+	.irq_set_affinity	= msi_set_affinity,
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
 
@@ -146,6 +266,8 @@ void __init arch_init_msi_domain(struct
 	}
 	if (!msi_default_domain)
 		pr_warn("failed to initialize irqdomain for MSI/MSI-x.\n");
+	else
+		msi_default_domain->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK;
 }
 
 #ifdef CONFIG_IRQ_REMAP
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -209,6 +209,8 @@ struct irq_data {
  * IRQD_SINGLE_TARGET		- IRQ allows only a single affinity target
  * IRQD_DEFAULT_TRIGGER_SET	- Expected trigger already been set
  * IRQD_CAN_RESERVE		- Can use reservation mode
+ * IRQD_MSI_NOMASK_QUIRK	- Non-maskable MSI quirk for affinity change
+ *				  required
  */
 enum {
 	IRQD_TRIGGER_MASK		= 0xf,
@@ -231,6 +233,7 @@ enum {
 	IRQD_SINGLE_TARGET		= (1 << 24),
 	IRQD_DEFAULT_TRIGGER_SET	= (1 << 25),
 	IRQD_CAN_RESERVE		= (1 << 26),
+	IRQD_MSI_NOMASK_QUIRK		= (1 << 27),
 };
 
 #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)
@@ -390,6 +393,21 @@ static inline bool irqd_can_reserve(stru
 	return __irqd_to_state(d) & IRQD_CAN_RESERVE;
 }
 
+static inline void irqd_set_msi_nomask_quirk(struct irq_data *d)
+{
+	__irqd_to_state(d) |= IRQD_MSI_NOMASK_QUIRK;
+}
+
+static inline void irqd_clr_msi_nomask_quirk(struct irq_data *d)
+{
+	__irqd_to_state(d) &= ~IRQD_MSI_NOMASK_QUIRK;
+}
+
+static inline bool irqd_msi_nomask_quirk(struct irq_data *d)
+{
+	return __irqd_to_state(d) & IRQD_MSI_NOMASK_QUIRK;
+}
+
 #undef __irqd_to_state
 
 static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -207,6 +207,13 @@ enum {
 	IRQ_DOMAIN_FLAG_MSI_REMAP	= (1 << 5),
 
 	/*
+	 * Quirk to handle MSI implementations which do not provide
+	 * masking. Currently known to affect x86, but partially
+	 * handled in core code.
+	 */
+	IRQ_DOMAIN_MSI_NOMASK_QUIRK	= (1 << 6),
+
+	/*
 	 * Flags starting from IRQ_DOMAIN_FLAG_NONCORE are reserved
 	 * for implementation specific purposes and ignored by the
 	 * core code.
--- a/kernel/irq/debugfs.c
+++ b/kernel/irq/debugfs.c
@@ -114,6 +114,7 @@ static const struct irq_bit_descr irqdat
 	BIT_MASK_DESCR(IRQD_AFFINITY_MANAGED),
 	BIT_MASK_DESCR(IRQD_MANAGED_SHUTDOWN),
 	BIT_MASK_DESCR(IRQD_CAN_RESERVE),
+	BIT_MASK_DESCR(IRQD_MSI_NOMASK_QUIRK),
 
 	BIT_MASK_DESCR(IRQD_FORWARDED_TO_VCPU),
 
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -453,8 +453,11 @@ int msi_domain_alloc_irqs(struct irq_dom
 			continue;
 
 		irq_data = irq_domain_get_irq_data(domain, desc->irq);
-		if (!can_reserve)
+		if (!can_reserve) {
 			irqd_clr_can_reserve(irq_data);
+			if (domain->flags & IRQ_DOMAIN_MSI_NOMASK_QUIRK)
+				irqd_set_msi_nomask_quirk(irq_data);
+		}
 		ret = irq_domain_activate_irq(irq_data, can_reserve);
 		if (ret)
 			goto cleanup;

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V2] x86/apic/msi: Plug non-maskable MSI affinity race
  2020-01-31 14:26                                       ` [PATCH V2] " Thomas Gleixner
@ 2020-01-31 20:32                                         ` Evan Green
  2020-01-31 21:45                                           ` Thomas Gleixner
  2020-02-01  8:36                                         ` [tip: x86/urgent] " tip-bot2 for Thomas Gleixner
  1 sibling, 1 reply; 26+ messages in thread
From: Evan Green @ 2020-01-31 20:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rajat Jain, Bjorn Helgaas, linux-pci, Linux Kernel Mailing List,
	x86, Marc Zyngier

On Fri, Jan 31, 2020 at 6:27 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Thomas Gleixner <tglx@linutronix.de> writes:
>
> Evan tracked down a subtle race between the update of the MSI message and
> the device raising an interrupt internally on PCI devices which do not
> support MSI masking. The update of the MSI message is non-atomic and
> consists of either 2 or 3 sequential 32bit wide writes to the PCI config
> space.
>
>    - Write address low 32bits
>    - Write address high 32bits (If supported by device)
>    - Write data
>
> When an interrupt is migrated then both address and data might change, so
> the kernel attempts to mask the MSI interrupt first. But for MSI, masking is
> optional, so there exist devices which do not provide it. That means that
> if the device raises an interrupt internally between the writes, an MSI
> message built from half-updated state is sent.
>
> On x86 this can lead to spurious interrupts on the wrong interrupt
> vector when the affinity setting changes both address and data. As a
> consequence the device interrupt can be lost causing the device to
> become stuck or malfunctioning.
>
> Evan tried to handle that by disabling MSI across an MSI message
> update. That's not feasible because disabling MSI has issues on its own:
>
>  If MSI is disabled the PCI device is routing an interrupt to the legacy
>  INTx mechanism. The INTx delivery can be disabled, but the disablement is
>  not working on all devices.
>
>  Some devices lose interrupts when both MSI and INTx delivery are disabled.
>
> Another way to solve this would be to enforce the allocation of the same
> vector on all CPUs in the system for this kind of screwed devices. That
> could be done, but it would bring back the vector space exhaustion problems
> which got solved a few years ago.
>
> Fortunately the high address (if supported by the device) is only relevant
> when X2APIC is enabled which implies interrupt remapping. In the interrupt
> remapping case the affinity setting is happening at the interrupt remapping
> unit and the PCI MSI message is programmed only once when the PCI device is
> initialized.
>
> That makes it possible to solve it with a two step update:
>
>   1) Target the MSI msg to the new vector on the current target CPU
>
>   2) Target the MSI msg to the new vector on the new target CPU
>
> In both cases writing the MSI message is only changing a single 32bit word
> which prevents the issue of inconsistency.
>
> After writing the final destination it is necessary to check whether the
> device issued an interrupt while the intermediate state #1 (new vector,
> current CPU) was in effect.
>
> This is possible because the affinity change is always happening on the
> current target CPU. The code runs with interrupts disabled, so the
> interrupt can be detected by checking the IRR of the local APIC. If the
> vector is pending in the IRR then the interrupt is retriggered on the new
> target CPU by sending an IPI for the associated vector on the target CPU.
>
> This can cause spurious interrupts on both the local and the new target
> CPU.
>
>  1) If the new vector is not in use on the local CPU and the device
>     affected by the affinity change raised an interrupt during the
>     transitional state (step #1 above) then interrupt entry code will
>     ignore that spurious interrupt. The vector is marked so that the
>     'No irq handler for vector' warning is suppressed once.
>
>  2) If the new vector is in use already on the local CPU then the IRR check
>     might see a pending interrupt from the device which is using this
>     vector. The IPI to the new target CPU will then invoke the handler of
>     the device, which got the affinity change, even if that device did not
>     issue an interrupt.
>
>  3) If the new vector is in use already on the local CPU and the device
>     affected by the affinity change raised an interrupt during the
>     transitional state (step #1 above) then the handler of the device which
>     uses that vector on the local CPU will be invoked.
>
> #1 is uninteresting and has no unintended side effects. #2 and #3 might
> expose issues in device driver interrupt handlers which are not prepared to
> handle a spurious interrupt correctly. This is not a regression, it's just
> exposing something which was already broken as spurious interrupts can
> happen for a lot of reasons and all driver handlers need to be able to deal
> with them.
>
> Reported-by: Evan Green <evgreen@chromium.org>
> Debugged-by: Evan Green <evgreen@chromium.org>                                                                                        Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Heh, thanks for the credit. Something weird happened on this line with
your signoff, though.
I've been running this on my system for a few hours with no issues
(normal repro in <1 minute). So,

Tested-by: Evan Green <evgreen@chromium.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V2] x86/apic/msi: Plug non-maskable MSI affinity race
  2020-01-31 20:32                                         ` Evan Green
@ 2020-01-31 21:45                                           ` Thomas Gleixner
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2020-01-31 21:45 UTC (permalink / raw)
  To: Evan Green
  Cc: Rajat Jain, Bjorn Helgaas, linux-pci, Linux Kernel Mailing List,
	x86, Marc Zyngier

Evan Green <evgreen@chromium.org> writes:
> On Fri, Jan 31, 2020 at 6:27 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>> #1 is uninteresting and has no unintended side effects. #2 and #3 might
>> expose issues in device driver interrupt handlers which are not prepared to
>> handle a spurious interrupt correctly. This is not a regression, it's just
>> exposing something which was already broken as spurious interrupts can
>> happen for a lot of reasons and all driver handlers need to be able to deal
>> with them.
>>
>> Reported-by: Evan Green <evgreen@chromium.org>
>> Debugged-by: Evan Green <evgreen@chromium.org>                                                                                        Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>
> Heh, thanks for the credit.

Thanks for the detective work and the persistence to convince me!

> Something weird happened on this line with your signoff, though.

Yeah. No idea how I fatfingered that.

> I've been running this on my system for a few hours with no issues
> (normal repro in <1 minute). So,
>
> Tested-by: Evan Green <evgreen@chromium.org>

Thanks for confirmation!

       tglx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [tip: x86/urgent] x86/apic/msi: Plug non-maskable MSI affinity race
  2020-01-31 14:26                                       ` [PATCH V2] " Thomas Gleixner
  2020-01-31 20:32                                         ` Evan Green
@ 2020-02-01  8:36                                         ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 26+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-02-01  8:36 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Evan Green, Thomas Gleixner, stable, x86, LKML

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     6f1a4891a5928a5969c87fa5a584844c983ec823
Gitweb:        https://git.kernel.org/tip/6f1a4891a5928a5969c87fa5a584844c983ec823
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 31 Jan 2020 15:26:52 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Sat, 01 Feb 2020 09:31:47 +01:00

x86/apic/msi: Plug non-maskable MSI affinity race

Evan tracked down a subtle race between the update of the MSI message and
the device raising an interrupt internally on PCI devices which do not
support MSI masking. The update of the MSI message is non-atomic and
consists of either 2 or 3 sequential 32bit wide writes to the PCI config
space.

   - Write address low 32bits
   - Write address high 32bits (If supported by device)
   - Write data

When an interrupt is migrated then both address and data might change, so
the kernel attempts to mask the MSI interrupt first. But for MSI, masking is
optional, so there exist devices which do not provide it. That means that
if the device raises an interrupt internally between the writes, an MSI
message built from half-updated state is sent.

On x86 this can lead to spurious interrupts on the wrong interrupt
vector when the affinity setting changes both address and data. As a
consequence the device interrupt can be lost causing the device to
become stuck or malfunctioning.

Evan tried to handle that by disabling MSI across an MSI message
update. That's not feasible because disabling MSI has issues on its own:

 If MSI is disabled the PCI device is routing an interrupt to the legacy
 INTx mechanism. The INTx delivery can be disabled, but the disablement is
 not working on all devices.

 Some devices lose interrupts when both MSI and INTx delivery are disabled.

Another way to solve this would be to enforce the allocation of the same
vector on all CPUs in the system for this kind of screwed devices. That
could be done, but it would bring back the vector space exhaustion problems
which got solved a few years ago.

Fortunately the high address (if supported by the device) is only relevant
when X2APIC is enabled which implies interrupt remapping. In the interrupt
remapping case the affinity setting is happening at the interrupt remapping
unit and the PCI MSI message is programmed only once when the PCI device is
initialized.

That makes it possible to solve it with a two step update:

  1) Target the MSI msg to the new vector on the current target CPU

  2) Target the MSI msg to the new vector on the new target CPU

In both cases writing the MSI message is only changing a single 32bit word
which prevents the issue of inconsistency.

After writing the final destination it is necessary to check whether the
device issued an interrupt while the intermediate state #1 (new vector,
current CPU) was in effect.

This is possible because the affinity change is always happening on the
current target CPU. The code runs with interrupts disabled, so the
interrupt can be detected by checking the IRR of the local APIC. If the
vector is pending in the IRR then the interrupt is retriggered on the new
target CPU by sending an IPI for the associated vector on the target CPU.

This can cause spurious interrupts on both the local and the new target
CPU.

 1) If the new vector is not in use on the local CPU and the device
    affected by the affinity change raised an interrupt during the
    transitional state (step #1 above) then interrupt entry code will
    ignore that spurious interrupt. The vector is marked so that the
    'No irq handler for vector' warning is suppressed once.

 2) If the new vector is in use already on the local CPU then the IRR check
    might see a pending interrupt from the device which is using this
    vector. The IPI to the new target CPU will then invoke the handler of
    the device, which got the affinity change, even if that device did not
    issue an interrupt.

 3) If the new vector is in use already on the local CPU and the device
    affected by the affinity change raised an interrupt during the
    transitional state (step #1 above) then the handler of the device which
    uses that vector on the local CPU will be invoked.

#1 is uninteresting and has no unintended side effects. #2 and #3 might
expose issues in device driver interrupt handlers which are not prepared to
handle a spurious interrupt correctly. This is not a regression, it's just
exposing something which was already broken as spurious interrupts can
happen for a lot of reasons and all driver handlers need to be able to deal
with them.

Reported-by: Evan Green <evgreen@chromium.org>
Debugged-by: Evan Green <evgreen@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Evan Green <evgreen@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87imkr4s7n.fsf@nanos.tec.linutronix.de
---
 arch/x86/include/asm/apic.h |   8 ++-
 arch/x86/kernel/apic/msi.c  | 128 ++++++++++++++++++++++++++++++++++-
 include/linux/irq.h         |  18 +++++-
 include/linux/irqdomain.h   |   7 ++-
 kernel/irq/debugfs.c        |   1 +-
 kernel/irq/msi.c            |   5 +-
 6 files changed, 163 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index be0b9cf..19e94af 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -454,6 +454,14 @@ static inline void ack_APIC_irq(void)
 	apic_eoi();
 }
 
+
+static inline bool lapic_vector_set_in_irr(unsigned int vector)
+{
+	u32 irr = apic_read(APIC_IRR + (vector / 32 * 0x10));
+
+	return !!(irr & (1U << (vector % 32)));
+}
+
 static inline unsigned default_get_apic_id(unsigned long x)
 {
 	unsigned int ver = GET_APIC_VERSION(apic_read(APIC_LVR));
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 7f75334..159bd0c 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -23,10 +23,8 @@
 
 static struct irq_domain *msi_default_domain;
 
-static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
 {
-	struct irq_cfg *cfg = irqd_cfg(data);
-
 	msg->address_hi = MSI_ADDR_BASE_HI;
 
 	if (x2apic_enabled())
@@ -47,6 +45,127 @@ static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
 		MSI_DATA_VECTOR(cfg->vector);
 }
 
+static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	__irq_msi_compose_msg(irqd_cfg(data), msg);
+}
+
+static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
+{
+	struct msi_msg msg[2] = { [1] = { }, };
+
+	__irq_msi_compose_msg(cfg, msg);
+	irq_data_get_irq_chip(irqd)->irq_write_msi_msg(irqd, msg);
+}
+
+static int
+msi_set_affinity(struct irq_data *irqd, const struct cpumask *mask, bool force)
+{
+	struct irq_cfg old_cfg, *cfg = irqd_cfg(irqd);
+	struct irq_data *parent = irqd->parent_data;
+	unsigned int cpu;
+	int ret;
+
+	/* Save the current configuration */
+	cpu = cpumask_first(irq_data_get_effective_affinity_mask(irqd));
+	old_cfg = *cfg;
+
+	/* Allocate a new target vector */
+	ret = parent->chip->irq_set_affinity(parent, mask, force);
+	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
+		return ret;
+
+	/*
+	 * For non-maskable and non-remapped MSI interrupts the migration
+	 * to a different destination CPU and a different vector has to be
+	 * done carefully to handle the possible stray interrupt which can be
+	 * caused by the non-atomic update of the address/data pair.
+	 *
+	 * Direct update is possible when:
+	 * - The MSI is maskable (remapped MSI does not use this code path).
+	 *   The quirk bit is not set in this case.
+	 * - The new vector is the same as the old vector
+	 * - The old vector is MANAGED_IRQ_SHUTDOWN_VECTOR (interrupt starts up)
+	 * - The new destination CPU is the same as the old destination CPU
+	 */
+	if (!irqd_msi_nomask_quirk(irqd) ||
+	    cfg->vector == old_cfg.vector ||
+	    old_cfg.vector == MANAGED_IRQ_SHUTDOWN_VECTOR ||
+	    cfg->dest_apicid == old_cfg.dest_apicid) {
+		irq_msi_update_msg(irqd, cfg);
+		return ret;
+	}
+
+	/*
+	 * Paranoia: Validate that the interrupt target is the local
+	 * CPU.
+	 */
+	if (WARN_ON_ONCE(cpu != smp_processor_id())) {
+		irq_msi_update_msg(irqd, cfg);
+		return ret;
+	}
+
+	/*
+	 * Redirect the interrupt to the new vector on the current CPU
+	 * first. This might cause a spurious interrupt on this vector if
+	 * the device raises an interrupt right between this update and the
+	 * update to the final destination CPU.
+	 *
+	 * If the vector is in use then the installed device handler will
+	 * denote it as spurious, which does no harm as this is a rare event
+	 * and interrupt handlers have to cope with spurious interrupts
+	 * anyway. If the vector is unused, then it is marked so it won't
+	 * trigger the 'No irq handler for vector' warning in do_IRQ().
+	 *
+	 * This requires holding the vector lock to prevent concurrent
+	 * updates to the affected vector.
+	 */
+	lock_vector_lock();
+
+	/*
+	 * Mark the new target vector on the local CPU if it is currently
+	 * unused. Reuse the VECTOR_RETRIGGERED state which is also used in
+	 * the CPU hotplug path for a similar purpose. This cannot be
+	 * undone here as the current CPU has interrupts disabled and
+	 * cannot handle the interrupt before the whole set_affinity()
+	 * section is done. In the CPU unplug case, the current CPU is
+	 * about to vanish and will not handle any interrupts anymore. The
+	 * vector is cleaned up when the CPU comes online again.
+	 */
+	if (IS_ERR_OR_NULL(this_cpu_read(vector_irq[cfg->vector])))
+		this_cpu_write(vector_irq[cfg->vector], VECTOR_RETRIGGERED);
+
+	/* Redirect it to the new vector on the local CPU temporarily */
+	old_cfg.vector = cfg->vector;
+	irq_msi_update_msg(irqd, &old_cfg);
+
+	/* Now transition it to the target CPU */
+	irq_msi_update_msg(irqd, cfg);
+
+	/*
+	 * All interrupts after this point are now targeted at the new
+	 * vector/CPU.
+	 *
+	 * Drop vector lock before testing whether the temporary assignment
+	 * to the local CPU was hit by an interrupt raised in the device,
+	 * because the retrigger function acquires vector lock again.
+	 */
+	unlock_vector_lock();
+
+	/*
+	 * Check whether the transition raced with a device interrupt and
+	 * is pending in the local APIC's IRR. It is safe to do this outside
+	 * of vector lock as the irq_desc::lock of this interrupt is still
+	 * held and interrupts are disabled: The check is not accessing the
+	 * underlying vector store. It's just checking the local APIC's
+	 * IRR.
+	 */
+	if (lapic_vector_set_in_irr(cfg->vector))
+		irq_data_get_irq_chip(irqd)->irq_retrigger(irqd);
+
+	return ret;
+}
+
 /*
  * IRQ Chip for MSI PCI/PCI-X/PCI-Express Devices,
  * which implement the MSI or MSI-X Capability Structure.
@@ -58,6 +177,7 @@ static struct irq_chip pci_msi_controller = {
 	.irq_ack		= irq_chip_ack_parent,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
 	.irq_compose_msi_msg	= irq_msi_compose_msg,
+	.irq_set_affinity	= msi_set_affinity,
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
 
@@ -146,6 +266,8 @@ void __init arch_init_msi_domain(struct irq_domain *parent)
 	}
 	if (!msi_default_domain)
 		pr_warn("failed to initialize irqdomain for MSI/MSI-x.\n");
+	else
+		msi_default_domain->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK;
 }
 
 #ifdef CONFIG_IRQ_REMAP
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 7853eb9..3ed5a05 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -209,6 +209,8 @@ struct irq_data {
  * IRQD_SINGLE_TARGET		- IRQ allows only a single affinity target
  * IRQD_DEFAULT_TRIGGER_SET	- Expected trigger already been set
  * IRQD_CAN_RESERVE		- Can use reservation mode
+ * IRQD_MSI_NOMASK_QUIRK	- Non-maskable MSI quirk for affinity change
+ *				  required
  */
 enum {
 	IRQD_TRIGGER_MASK		= 0xf,
@@ -231,6 +233,7 @@ enum {
 	IRQD_SINGLE_TARGET		= (1 << 24),
 	IRQD_DEFAULT_TRIGGER_SET	= (1 << 25),
 	IRQD_CAN_RESERVE		= (1 << 26),
+	IRQD_MSI_NOMASK_QUIRK		= (1 << 27),
 };
 
 #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)
@@ -390,6 +393,21 @@ static inline bool irqd_can_reserve(struct irq_data *d)
 	return __irqd_to_state(d) & IRQD_CAN_RESERVE;
 }
 
+static inline void irqd_set_msi_nomask_quirk(struct irq_data *d)
+{
+	__irqd_to_state(d) |= IRQD_MSI_NOMASK_QUIRK;
+}
+
+static inline void irqd_clr_msi_nomask_quirk(struct irq_data *d)
+{
+	__irqd_to_state(d) &= ~IRQD_MSI_NOMASK_QUIRK;
+}
+
+static inline bool irqd_msi_nomask_quirk(struct irq_data *d)
+{
+	return __irqd_to_state(d) & IRQD_MSI_NOMASK_QUIRK;
+}
+
 #undef __irqd_to_state
 
 static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 3c340db..4da8df5 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -207,6 +207,13 @@ enum {
 	IRQ_DOMAIN_FLAG_MSI_REMAP	= (1 << 5),
 
 	/*
+	 * Quirk to handle MSI implementations which do not provide
+	 * masking. Currently known to affect x86, but partially
+	 * handled in core code.
+	 */
+	IRQ_DOMAIN_MSI_NOMASK_QUIRK	= (1 << 6),
+
+	/*
 	 * Flags starting from IRQ_DOMAIN_FLAG_NONCORE are reserved
 	 * for implementation specific purposes and ignored by the
 	 * core code.
diff --git a/kernel/irq/debugfs.c b/kernel/irq/debugfs.c
index c1eccd4..a949bd3 100644
--- a/kernel/irq/debugfs.c
+++ b/kernel/irq/debugfs.c
@@ -114,6 +114,7 @@ static const struct irq_bit_descr irqdata_states[] = {
 	BIT_MASK_DESCR(IRQD_AFFINITY_MANAGED),
 	BIT_MASK_DESCR(IRQD_MANAGED_SHUTDOWN),
 	BIT_MASK_DESCR(IRQD_CAN_RESERVE),
+	BIT_MASK_DESCR(IRQD_MSI_NOMASK_QUIRK),
 
 	BIT_MASK_DESCR(IRQD_FORWARDED_TO_VCPU),
 
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index ad26fbc..eb95f61 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -453,8 +453,11 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
 			continue;
 
 		irq_data = irq_domain_get_irq_data(domain, desc->irq);
-		if (!can_reserve)
+		if (!can_reserve) {
 			irqd_clr_can_reserve(irq_data);
+			if (domain->flags & IRQ_DOMAIN_MSI_NOMASK_QUIRK)
+				irqd_set_msi_nomask_quirk(irq_data);
+		}
 		ret = irq_domain_activate_irq(irq_data, can_reserve);
 		if (ret)
 			goto cleanup;
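
For reference, the IRR check added above relies on the local APIC
register layout: the 256-bit Interrupt Request Register is exposed as
eight 32-bit registers spaced 0x10 apart. A standalone sketch (plain C,
not kernel code) of the same index/bit arithmetic used by
lapic_vector_set_in_irr():

#include <stdio.h>

int main(void)
{
	unsigned int vector = 0x61;	/* example vector number */

	/* Same arithmetic as lapic_vector_set_in_irr() */
	unsigned int reg = (vector / 32) * 0x10;	/* register offset */
	unsigned int bit = vector % 32;			/* bit within it */

	printf("vector 0x%02x -> APIC_IRR + 0x%02x, bit %u\n",
	       vector, reg, bit);	/* prints: ... + 0x30, bit 1 */
	return 0;
}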

^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2020-02-01  8:36 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-18  0:25 [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs Evan Green
2020-01-22 11:25 ` Rajat Jain
2020-01-22 18:00   ` Evan Green
2020-01-23  8:49     ` Thomas Gleixner
2020-01-23 18:16       ` Thomas Gleixner
     [not found]         ` <CAE=gft6YiM5S1A7iJYJTd5zmaAa8=nhLE3B94JtWa+XW-qVSqQ@mail.gmail.com>
2020-01-23 22:59           ` Evan Green
2020-01-24  0:29             ` Evan Green
2020-01-24 14:34               ` Thomas Gleixner
2020-01-24 21:53                 ` Evan Green
2020-01-24 22:50                   ` Thomas Gleixner
2020-01-28 14:38                     ` Thomas Gleixner
2020-01-28 22:22                       ` Evan Green
2020-01-28 22:48                         ` Thomas Gleixner
2020-01-29 18:00                           ` Evan Green
2020-01-29 21:00                             ` Thomas Gleixner
2020-01-29 22:53                               ` Evan Green
2020-01-29 23:16                                 ` Thomas Gleixner
2020-01-29 23:48                                   ` Evan Green
2020-01-31 11:27                                     ` [PATCH] x86/apic/msi: Plug non-maskable MSI affinity race Thomas Gleixner
2020-01-31 14:26                                       ` [PATCH V2] " Thomas Gleixner
2020-01-31 20:32                                         ` Evan Green
2020-01-31 21:45                                           ` Thomas Gleixner
2020-02-01  8:36                                         ` [tip: x86/urgent] " tip-bot2 for Thomas Gleixner
2020-01-24  0:50             ` [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs Thomas Gleixner
2020-01-25 18:32 ` Jacob Pan
2020-01-26  8:09   ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).