All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device
@ 2011-06-01 10:05 Laszlo Ersek
  2011-06-01 11:02 ` Laszlo Ersek
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Laszlo Ersek @ 2011-06-01 10:05 UTC (permalink / raw)
  To: xen-devel

Hi,

this is more of an RFC than a patch now for linux-2.6.18-xen. Describing the situation captured in RHBZ#688673, in a nutshell:

- let's say we have an Intel 82576 card (igb),
- the card exports virtual functions (igbvf),
- one virtual function is passed through to a PV guest,
- the igbvf driver, co-operating with pcifront, pciback, and the hypervisor, sets up three MSI-X vectors for the virtual function PCI device,
- when the domU is shut down, the MSI-X vectors are "leaked" in dom0, because nobody ever reaches dom0's pci_disable_msix() / msi_remove_pci_irq_vectors() during shutdown,
- therefore configuration of the VF during the next boot of such a guest doesn't succeed (dom0 thinks, based on its stale list, that MSI-X vectors are already allocated/mapped for the device)

  dom0 (msix_capability_init(), output beautified a bit):
    msix entry 0 for dev 01:10:0 are not freed before acquire again.
    msix entry 1 for dev 01:10:0 are not freed before acquire again.
    msix entry 2 for dev 01:10:0 are not freed before acquire again.

  guest:
      Determining IP information for eth1...
      Failed to obtain physical IRQ 255
      Failed to obtain physical IRQ 254
      Failed to obtain physical IRQ 253

- if the device or the full igbvf module is removed before shutdown in the guest (in case the boot was successful to begin with!), then pci_disable_msix() is called and everything works fine; the next boot / ifup succeeds.

I backported c/s 1070 to the RHEL-5 kernel, but it was not enough -- this is not an abrupt termination (destroy), but a contolled shutdown. Here's what I suspect happens (in RHEL-5) when device_shutdown() walks the list of devices in the PV domU and calls the appropriate shutdown hook for igbvf:

domU: igbvf_shutdown()
        igbvf_suspend()
          pci_disable_device()
            pcibios_disable_device()
              pcibios_disable_resources()
                pci_write_config_word(
                    dev, PCI_COMMAND,
                    orig & ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY)
                )
                  ...
                    pcifront_bus_write()
                      do_pci_op(XEN_PCI_OP_conf_write)
dom0:                   pciback_do_op()
                          pciback_config_write()
                            conf_space_write()
                              command_write() [via PCI_COMMAND funcptr]
                                pci_disable_device()
                                  disable_msi_mode()
                                    dev->msix_enabled = 0;

The final assignment above precludes c/s 1070 from doing the job.

Of course igbvf_shutdown() could invoke igbvf_remove() instead (or another lower-level function that ends up in dom0's pci_disable_msix()), but it should not depend on the guest playing nice.

Below's my proposed patch for RHEL-5, ported to upstream. 

----v----
introduce msi_prune_pci_irq_vectors()

extend pciback_reset_device() with the following functionality (msi_prune_pci_irq_vectors()):
- remove all (MSI-X vector, entry number) pairs that might still belong, in dom0's mind, to the PCI device being reset,
- unmap some space "used for saving/restoring MSI-X tables" that might otherwise go leaked

The function, modeled after msi_remove_pci_irq_vectors(), doesn't touch the hypervisor, nor the PCI device; it only trims the list used by dom0 for filtering. Justification being: when this is called, either the owner domain is gone already (or very close to being gone), or the device was already correctly detached.
----^----

Now I think one comment in the patch below does not hold for upstream: "msi_dev_head is only maintained in dom0". According to c/s 680, this doesn't seem to be the case for upstream.

Is the case described above possible at all in upstream? Do you think the fix I propose is correct? It seemed to do the job with RHEL-5 in my testing.

  dom0:
    PCI: 0000:01:10.0: cleaning up MSI-X entry 0
    PCI: 0000:01:10.0: cleaning up MSI-X entry 1
    PCI: 0000:01:10.0: cleaning up MSI-X entry 2
    PCI: 0000:01:10.0: cleaning up mask_base

Just in case:

 drivers/pci/msi-xen.c             |   46 ++++++++++++++++++++++++++++++++++++++
 drivers/xen/pciback/pciback_ops.c |    5 ++++
 include/linux/pci.h               |    1 
 3 files changed, 52 insertions(+)

Signed-off-by: Laszlo Ersek<lersek@redhat.com>

Thank you very much!
lacos


diff -r 876a5aaac026 drivers/pci/msi-xen.c
--- a/drivers/pci/msi-xen.c	Thu May 26 12:33:41 2011 +0100
+++ b/drivers/pci/msi-xen.c	Tue May 31 19:17:26 2011 +0200
@@ -894,6 +894,51 @@ void msi_remove_pci_irq_vectors(struct p
 	dev->irq = msi_dev_entry->default_irq;
 }
 
+void msi_prune_pci_irq_vectors(struct pci_dev *dev)
+{
+	unsigned long flags;
+	struct msi_dev_list *msi_dev_scan, *msi_dev_entry = NULL;
+	struct msi_pirq_entry *pirq_entry, *tmp;
+
+	if (!pci_msi_enable || !dev)
+		return;
+
+	/* msi_dev_head is only maintained in dom0 */
+	BUG_ON(!is_initial_xendomain());
+
+	/* search for the set of MSI-X vectors, don't extend list */
+	spin_lock_irqsave(&msi_dev_lock, flags);
+	list_for_each_entry(msi_dev_scan, &msi_dev_head, list)
+		if (msi_dev_scan->dev == dev) {
+			msi_dev_entry = msi_dev_scan;
+			break;
+		}
+	spin_unlock_irqrestore(&msi_dev_lock, flags);
+	if (msi_dev_entry == NULL)
+		return;
+
+	/* Empty the set. No PHYSDEVOP_unmap_pirq hypercalls: the domU is
+	 * already gone, or the device is already unplugged.
+	 */
+	spin_lock_irqsave(&msi_dev_entry->pirq_list_lock, flags);
+	if (!list_empty(&msi_dev_entry->pirq_list_head))
+		list_for_each_entry_safe(pirq_entry, tmp,
+		                         &msi_dev_entry->pirq_list_head, list) {
+			printk(KERN_INFO "PCI: %s: cleaning up MSI-X entry "
+			       "%d\n", pci_name(dev), pirq_entry->entry_nr);
+			list_del(&pirq_entry->list);
+			kfree(pirq_entry);
+		}
+	spin_unlock_irqrestore(&msi_dev_entry->pirq_list_lock, flags);
+
+	if (msi_dev_entry->mask_base != NULL) {
+		printk(KERN_INFO "PCI: %s: cleaning up mask_base\n",
+		       pci_name(dev));
+		iounmap(msi_dev_entry->mask_base);
+		msi_dev_entry->mask_base = NULL;
+	}
+}
+
 void pci_no_msi(void)
 {
 	pci_msi_enable = 0;
@@ -906,5 +951,6 @@ EXPORT_SYMBOL(pci_disable_msix);
 #ifdef CONFIG_XEN
 EXPORT_SYMBOL(register_msi_get_owner);
 EXPORT_SYMBOL(unregister_msi_get_owner);
+EXPORT_SYMBOL(msi_prune_pci_irq_vectors);
 #endif
 
diff -r 876a5aaac026 drivers/xen/pciback/pciback_ops.c
--- a/drivers/xen/pciback/pciback_ops.c	Thu May 26 12:33:41 2011 +0100
+++ b/drivers/xen/pciback/pciback_ops.c	Tue May 31 19:17:26 2011 +0200
@@ -29,6 +29,11 @@ void pciback_reset_device(struct pci_dev
 			pci_disable_msix(dev);
 		if (dev->msi_enabled)
 			pci_disable_msi(dev);
+
+		/* After a controlled shutdown or the crash fixup above, prune
+		 * dom0's MSI-X vector list for the device.
+		 */
+		msi_prune_pci_irq_vectors(dev);
 #endif
 		pci_disable_device(dev);
 
diff -r 876a5aaac026 include/linux/pci.h
--- a/include/linux/pci.h	Thu May 26 12:33:41 2011 +0100
+++ b/include/linux/pci.h	Tue May 31 19:17:26 2011 +0200
@@ -640,6 +640,7 @@ extern void msi_remove_pci_irq_vectors(s
 #ifdef CONFIG_XEN
 extern int register_msi_get_owner(int (*func)(struct pci_dev *dev));
 extern int unregister_msi_get_owner(int (*func)(struct pci_dev *dev));
+extern void msi_prune_pci_irq_vectors(struct pci_dev *dev);
 #endif
 #endif

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2011-06-02 20:49 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-01 10:05 [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device Laszlo Ersek
2011-06-01 11:02 ` Laszlo Ersek
2011-06-01 14:31   ` Konrad Rzeszutek Wilk
2011-06-01 16:18     ` Paolo Bonzini
2011-06-01 17:32     ` Laszlo Ersek
2011-06-01 18:13       ` 2.6.38 (FC15) with PCI passthrough fails mysteriously with iommu=soft Konrad Rzeszutek Wilk
2011-06-02  7:42         ` Laszlo Ersek
2011-06-02 20:49         ` Pasi Kärkkäinen
2011-06-01 12:56 ` [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device Jan Beulich
2011-06-01 14:12   ` Laszlo Ersek
2011-06-01 14:59     ` Jan Beulich
2011-06-01 15:27       ` Laszlo Ersek
2011-06-01 16:07         ` Jan Beulich
2011-06-01 16:25           ` Andrew Jones
2011-06-01 16:27             ` Paolo Bonzini
2011-06-01 16:16       ` Paolo Bonzini
2011-06-01 14:51 ` Konrad Rzeszutek Wilk
2011-06-01 15:01   ` Konrad Rzeszutek Wilk

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.