* [PATCH v3 0/2] Make the pcidevs_lock a recursive one
@ 2016-03-09 13:17 Quan Xu
  2016-03-09 13:17 ` [PATCH v3 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization Quan Xu
  2016-03-09 13:17 ` [PATCH v3 2/2] IOMMU/spinlock: Make the pcidevs_lock a recursive one Quan Xu
  0 siblings, 2 replies; 20+ messages in thread
From: Quan Xu @ 2016-03-09 13:17 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Feng Wu, Jan Beulich, Andrew Cooper, Dario Faggioli,
	Suravee Suthikulpanit, Quan Xu, Keir Fraser

This patch set makes pcidevs_lock a recursive lock. It is a prerequisite
for the 'VT-d Device-TLB flush issue' series, as pcidevs_lock may be held
recursively when hiding an ATS device after a VT-d Device-TLB flush times
out.

In detail:
 1. Fix a bug found in AMD IOMMU initialization.
    pcidevs_lock should be held with interrupts enabled. However, there remains
    an exception in the AMD IOMMU code, where the lock is acquired with
    interrupts disabled. This inconsistency might lead to deadlock.

    The fix is straightforward: use spin_lock() instead. Also, interrupts have
    been enabled when this function is invoked, so we're sure consistency
    around pcidevs_lock can be guaranteed after this fix.

 2. Make the pcidevs_lock a recursive one (a rough illustration follows
    below).
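
    As a rough illustration of why recursion is needed (a hypothetical
    call flow, not code from this series), using the helpers patch 2
    introduces:

        pcidevs_lock();      /* e.g. taken around a device assignment */
        ...
        /* A VT-d Device-TLB flush times out, and Xen wants to hide the
         * misbehaving ATS device; that path takes the lock again: */
        pcidevs_lock();      /* second acquisition on the same CPU: fine
                              * for spin_lock_recursive(), a deadlock for
                              * a plain spin_lock() */
        ...
        pcidevs_unlock();
        ...
        pcidevs_unlock();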

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
CC: Feng Wu <feng.wu@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Dario Faggioli <dario.faggioli@citrix.com>

Changes in v3:
 * Enhance the patch 1/2 changelog.

Quan Xu (2):
  IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization
  IOMMU/spinlock: Make the pcidevs_lock a recursive one

 xen/arch/x86/domctl.c                       |  8 +--
 xen/arch/x86/hvm/vmsi.c                     |  4 +-
 xen/arch/x86/irq.c                          |  8 +--
 xen/arch/x86/msi.c                          | 16 ++---
 xen/arch/x86/pci.c                          |  4 +-
 xen/arch/x86/physdev.c                      | 16 ++---
 xen/common/sysctl.c                         |  4 +-
 xen/drivers/passthrough/amd/iommu_init.c    |  9 ++-
 xen/drivers/passthrough/amd/iommu_map.c     |  2 +-
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  4 +-
 xen/drivers/passthrough/pci.c               | 96 +++++++++++++++++------------
 xen/drivers/passthrough/vtd/intremap.c      |  2 +-
 xen/drivers/passthrough/vtd/iommu.c         | 14 ++---
 xen/drivers/video/vga.c                     |  4 +-
 xen/include/xen/pci.h                       |  5 +-
 15 files changed, 109 insertions(+), 87 deletions(-)

-- 
1.9.1



* [PATCH v3 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization
  2016-03-09 13:17 [PATCH v3 0/2] Make the pcidevs_lock a recursive one Quan Xu
@ 2016-03-09 13:17 ` Quan Xu
  2016-03-09 14:59   ` Dario Faggioli
  2016-03-11  3:24   ` Meng Xu
  2016-03-09 13:17 ` [PATCH v3 2/2] IOMMU/spinlock: Make the pcidevs_lock a recursive one Quan Xu
  1 sibling, 2 replies; 20+ messages in thread
From: Quan Xu @ 2016-03-09 13:17 UTC (permalink / raw)
  To: xen-devel; +Cc: Dario Faggioli, Jan Beulich, Suravee Suthikulpanit, Quan Xu

pcidevs_lock should be held with interrupts enabled. However, there remains
an exception in the AMD IOMMU code, where the lock is acquired with
interrupts disabled. This inconsistency might lead to deadlock.

The fix is straightforward: use spin_lock() instead. Also, interrupts have
been enabled when this function is invoked, so we're sure consistency around
pcidevs_lock can be guaranteed after this fix.

Signed-off-by: Quan Xu <quan.xu@intel.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
CC: Dario Faggioli <dario.faggioli@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 xen/drivers/passthrough/amd/iommu_init.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
index d90a2d2..a400497 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -778,7 +778,6 @@ static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
 {
     int irq, ret;
     hw_irq_controller *handler;
-    unsigned long flags;
     u16 control;
 
     irq = create_irq(NUMA_NO_NODE);
@@ -788,10 +787,10 @@ static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
         return 0;
     }
 
-    spin_lock_irqsave(&pcidevs_lock, flags);
+    spin_lock(&pcidevs_lock);
     iommu->msi.dev = pci_get_pdev(iommu->seg, PCI_BUS(iommu->bdf),
                                   PCI_DEVFN2(iommu->bdf));
-    spin_unlock_irqrestore(&pcidevs_lock, flags);
+    spin_unlock(&pcidevs_lock);
     if ( !iommu->msi.dev )
     {
         AMD_IOMMU_DEBUG("IOMMU: no pdev for %04x:%02x:%02x.%u\n",
-- 
1.9.1



* [PATCH v3 2/2] IOMMU/spinlock: Make the pcidevs_lock a recursive one
  2016-03-09 13:17 [PATCH v3 0/2] Make the pcidevs_lock a recursive one Quan Xu
  2016-03-09 13:17 ` [PATCH v3 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization Quan Xu
@ 2016-03-09 13:17 ` Quan Xu
  2016-03-09 17:45   ` Dario Faggioli
  2016-03-10  9:52   ` Jan Beulich
  1 sibling, 2 replies; 20+ messages in thread
From: Quan Xu @ 2016-03-09 13:17 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Feng Wu, Jan Beulich, Andrew Cooper, Dario Faggioli,
	Suravee Suthikulpanit, Quan Xu, Keir Fraser

Signed-off-by: Quan Xu <quan.xu@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
CC: Feng Wu <feng.wu@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Dario Faggioli <dario.faggioli@citrix.com>
---
 xen/arch/x86/domctl.c                       |  8 +--
 xen/arch/x86/hvm/vmsi.c                     |  4 +-
 xen/arch/x86/irq.c                          |  8 +--
 xen/arch/x86/msi.c                          | 16 ++---
 xen/arch/x86/pci.c                          |  4 +-
 xen/arch/x86/physdev.c                      | 16 ++---
 xen/common/sysctl.c                         |  4 +-
 xen/drivers/passthrough/amd/iommu_init.c    |  8 +--
 xen/drivers/passthrough/amd/iommu_map.c     |  2 +-
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  4 +-
 xen/drivers/passthrough/pci.c               | 96 +++++++++++++++++------------
 xen/drivers/passthrough/vtd/intremap.c      |  2 +-
 xen/drivers/passthrough/vtd/iommu.c         | 14 ++---
 xen/drivers/video/vga.c                     |  4 +-
 xen/include/xen/pci.h                       |  5 +-
 15 files changed, 109 insertions(+), 86 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 55aecdc..b34a295 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -472,9 +472,9 @@ long arch_do_domctl(
         ret = -ESRCH;
         if ( iommu_enabled )
         {
-            spin_lock(&pcidevs_lock);
+            pcidevs_lock();
             ret = pt_irq_create_bind(d, bind);
-            spin_unlock(&pcidevs_lock);
+            pcidevs_unlock();
         }
         if ( ret < 0 )
             printk(XENLOG_G_ERR "pt_irq_create_bind failed (%ld) for dom%d\n",
@@ -497,9 +497,9 @@ long arch_do_domctl(
 
         if ( iommu_enabled )
         {
-            spin_lock(&pcidevs_lock);
+            pcidevs_lock();
             ret = pt_irq_destroy_bind(d, bind);
-            spin_unlock(&pcidevs_lock);
+            pcidevs_unlock();
         }
         if ( ret < 0 )
             printk(XENLOG_G_ERR "pt_irq_destroy_bind failed (%ld) for dom%d\n",
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 499151e..b41aba5 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -388,7 +388,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq *pirq, uint64_t gtable)
     struct msixtbl_entry *entry, *new_entry;
     int r = -EINVAL;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     ASSERT(spin_is_locked(&d->event_lock));
 
     if ( !has_vlapic(d) )
@@ -446,7 +446,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
     struct pci_dev *pdev;
     struct msixtbl_entry *entry;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     ASSERT(spin_is_locked(&d->event_lock));
 
     if ( !has_vlapic(d) )
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 1228568..78e4263 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1960,7 +1960,7 @@ int map_domain_pirq(
         struct pci_dev *pdev;
         unsigned int nr = 0;
 
-        ASSERT(spin_is_locked(&pcidevs_lock));
+        ASSERT(pcidevs_is_locked());
 
         ret = -ENODEV;
         if ( !cpu_has_apic )
@@ -2105,7 +2105,7 @@ int unmap_domain_pirq(struct domain *d, int pirq)
     if ( (pirq < 0) || (pirq >= d->nr_pirqs) )
         return -EINVAL;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     ASSERT(spin_is_locked(&d->event_lock));
 
     info = pirq_info(d, pirq);
@@ -2231,7 +2231,7 @@ void free_domain_pirqs(struct domain *d)
 {
     int i;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     spin_lock(&d->event_lock);
 
     for ( i = 0; i < d->nr_pirqs; i++ )
@@ -2239,7 +2239,7 @@ void free_domain_pirqs(struct domain *d)
             unmap_domain_pirq(d, i);
 
     spin_unlock(&d->event_lock);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 }
 
 static void dump_irqs(unsigned char key)
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index 5a481f6..a6018a2 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -694,7 +694,7 @@ static int msi_capability_init(struct pci_dev *dev,
     u8 slot = PCI_SLOT(dev->devfn);
     u8 func = PCI_FUNC(dev->devfn);
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     pos = pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSI);
     if ( !pos )
         return -ENODEV;
@@ -852,7 +852,7 @@ static int msix_capability_init(struct pci_dev *dev,
     u8 func = PCI_FUNC(dev->devfn);
     bool_t maskall = msix->host_maskall;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     control = pci_conf_read16(seg, bus, slot, func, msix_control_reg(pos));
     /*
@@ -1042,7 +1042,7 @@ static int __pci_enable_msi(struct msi_info *msi, struct msi_desc **desc)
     struct pci_dev *pdev;
     struct msi_desc *old_desc;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     pdev = pci_get_pdev(msi->seg, msi->bus, msi->devfn);
     if ( !pdev )
         return -ENODEV;
@@ -1103,7 +1103,7 @@ static int __pci_enable_msix(struct msi_info *msi, struct msi_desc **desc)
     u8 func = PCI_FUNC(msi->devfn);
     struct msi_desc *old_desc;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     pdev = pci_get_pdev(msi->seg, msi->bus, msi->devfn);
     pos = pci_find_cap_offset(msi->seg, msi->bus, slot, func, PCI_CAP_ID_MSIX);
     if ( !pdev || !pos )
@@ -1205,7 +1205,7 @@ int pci_prepare_msix(u16 seg, u8 bus, u8 devfn, bool_t off)
     if ( !pos )
         return -ENODEV;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pdev = pci_get_pdev(seg, bus, devfn);
     if ( !pdev )
         rc = -ENODEV;
@@ -1224,7 +1224,7 @@ int pci_prepare_msix(u16 seg, u8 bus, u8 devfn, bool_t off)
         rc = msix_capability_init(pdev, pos, NULL, NULL,
                                   multi_msix_capable(control));
     }
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return rc;
 }
@@ -1235,7 +1235,7 @@ int pci_prepare_msix(u16 seg, u8 bus, u8 devfn, bool_t off)
  */
 int pci_enable_msi(struct msi_info *msi, struct msi_desc **desc)
 {
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     if ( !use_msi )
         return -EPERM;
@@ -1351,7 +1351,7 @@ int pci_restore_msi_state(struct pci_dev *pdev)
     unsigned int type = 0, pos = 0;
     u16 control = 0;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     if ( !use_msi )
         return -EOPNOTSUPP;
diff --git a/xen/arch/x86/pci.c b/xen/arch/x86/pci.c
index 4b87cab..a9decd4 100644
--- a/xen/arch/x86/pci.c
+++ b/xen/arch/x86/pci.c
@@ -88,13 +88,13 @@ int pci_conf_write_intercept(unsigned int seg, unsigned int bdf,
     if ( reg < 64 || reg >= 256 )
         return 0;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
 
     pdev = pci_get_pdev(seg, PCI_BUS(bdf), PCI_DEVFN2(bdf));
     if ( pdev )
         rc = pci_msi_conf_write_intercept(pdev, reg, size, data);
 
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return rc;
 }
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 57b7800..1cb9b58 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -167,7 +167,7 @@ int physdev_map_pirq(domid_t domid, int type, int *index, int *pirq_p,
         goto free_domain;
     }
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     /* Verify or get pirq. */
     spin_lock(&d->event_lock);
     pirq = domain_irq_to_pirq(d, irq);
@@ -237,7 +237,7 @@ int physdev_map_pirq(domid_t domid, int type, int *index, int *pirq_p,
 
  done:
     spin_unlock(&d->event_lock);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
     if ( ret != 0 )
         switch ( type )
         {
@@ -275,11 +275,11 @@ int physdev_unmap_pirq(domid_t domid, int pirq)
             goto free_domain;
     }
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     spin_lock(&d->event_lock);
     ret = unmap_domain_pirq(d, pirq);
     spin_unlock(&d->event_lock);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
  free_domain:
     rcu_unlock_domain(d);
@@ -689,10 +689,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&restore_msi, arg, 1) != 0 )
             break;
 
-        spin_lock(&pcidevs_lock);
+        pcidevs_lock();
         pdev = pci_get_pdev(0, restore_msi.bus, restore_msi.devfn);
         ret = pdev ? pci_restore_msi_state(pdev) : -ENODEV;
-        spin_unlock(&pcidevs_lock);
+        pcidevs_unlock();
         break;
     }
 
@@ -704,10 +704,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&dev, arg, 1) != 0 )
             break;
 
-        spin_lock(&pcidevs_lock);
+        pcidevs_lock();
         pdev = pci_get_pdev(dev.seg, dev.bus, dev.devfn);
         ret = pdev ? pci_restore_msi_state(pdev) : -ENODEV;
-        spin_unlock(&pcidevs_lock);
+        pcidevs_unlock();
         break;
     }
 
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 58162f5..253b7c8 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -426,7 +426,7 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
                 break;
             }
 
-            spin_lock(&pcidevs_lock);
+            pcidevs_lock();
             pdev = pci_get_pdev(dev.seg, dev.bus, dev.devfn);
             if ( !pdev )
                 node = XEN_INVALID_DEV;
@@ -434,7 +434,7 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
                 node = XEN_INVALID_NODE_ID;
             else
                 node = pdev->node;
-            spin_unlock(&pcidevs_lock);
+            pcidevs_unlock();
 
             if ( copy_to_guest_offset(ti->nodes, i, &node, 1) )
             {
diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
index a400497..4536106 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -673,9 +673,9 @@ void parse_ppr_log_entry(struct amd_iommu *iommu, u32 entry[])
     bus = PCI_BUS(device_id);
     devfn = PCI_DEVFN2(device_id);
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pdev = pci_get_real_pdev(iommu->seg, bus, devfn);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     if ( pdev )
         guest_iommu_add_ppr_log(pdev->domain, entry);
@@ -787,10 +787,10 @@ static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
         return 0;
     }
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     iommu->msi.dev = pci_get_pdev(iommu->seg, PCI_BUS(iommu->bdf),
                                   PCI_DEVFN2(iommu->bdf));
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
     if ( !iommu->msi.dev )
     {
         AMD_IOMMU_DEBUG("IOMMU: no pdev for %04x:%02x:%02x.%u\n",
diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index 78862c9..9d91dfc 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -593,7 +593,7 @@ static int update_paging_mode(struct domain *d, unsigned long gfn)
         hd->arch.paging_mode = level;
         hd->arch.root_table = new_root;
 
-        if ( !spin_is_locked(&pcidevs_lock) )
+        if ( !pcidevs_is_locked() )
             AMD_IOMMU_DEBUG("%s Try to access pdev_list "
                             "without aquiring pcidevs_lock.\n", __func__);
 
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index c1c0b6b..dc3bfab 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -158,7 +158,7 @@ static void amd_iommu_setup_domain_device(
 
     spin_unlock_irqrestore(&iommu->lock, flags);
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     if ( pci_ats_device(iommu->seg, bus, pdev->devfn) &&
          !pci_ats_enabled(iommu->seg, bus, pdev->devfn) )
@@ -345,7 +345,7 @@ void amd_iommu_disable_domain_device(struct domain *domain,
     }
     spin_unlock_irqrestore(&iommu->lock, flags);
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     if ( devfn == pdev->devfn &&
          pci_ats_device(iommu->seg, bus, devfn) &&
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 62b311b..e09f7d1 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -48,7 +48,7 @@ struct pci_seg {
     } bus2bridge[MAX_BUSES];
 };
 
-spinlock_t pcidevs_lock = SPIN_LOCK_UNLOCKED;
+static spinlock_t _pcidevs_lock = SPIN_LOCK_UNLOCKED;
 static struct radix_tree_root pci_segments;
 
 static inline struct pci_seg *get_pseg(u16 seg)
@@ -103,6 +103,26 @@ static int pci_segments_iterate(
     return rc;
 }
 
+void pcidevs_lock(void)
+{
+    spin_lock_recursive(&_pcidevs_lock);
+}
+
+void pcidevs_unlock(void)
+{
+    spin_unlock_recursive(&_pcidevs_lock);
+}
+
+int pcidevs_is_locked(void)
+{
+    return spin_is_locked(&_pcidevs_lock);
+}
+
+int pcidevs_trylock(void)
+{
+    return spin_trylock_recursive(&_pcidevs_lock);
+}
+
 void __init pt_pci_init(void)
 {
     radix_tree_init(&pci_segments);
@@ -412,14 +432,14 @@ int __init pci_hide_device(int bus, int devfn)
     struct pci_dev *pdev;
     int rc = -ENOMEM;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pdev = alloc_pdev(get_pseg(0), bus, devfn);
     if ( pdev )
     {
         _pci_hide_device(pdev);
         rc = 0;
     }
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return rc;
 }
@@ -456,7 +476,7 @@ struct pci_dev *pci_get_pdev(int seg, int bus, int devfn)
     struct pci_seg *pseg = get_pseg(seg);
     struct pci_dev *pdev = NULL;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     ASSERT(seg != -1 || bus == -1);
     ASSERT(bus != -1 || devfn == -1);
 
@@ -581,9 +601,9 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
         pdev_type = "extended function";
     else if (info->is_virtfn)
     {
-        spin_lock(&pcidevs_lock);
+        pcidevs_lock();
         pdev = pci_get_pdev(seg, info->physfn.bus, info->physfn.devfn);
-        spin_unlock(&pcidevs_lock);
+        pcidevs_unlock();
         if ( !pdev )
             pci_add_device(seg, info->physfn.bus, info->physfn.devfn,
                            NULL, node);
@@ -601,7 +621,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
 
     ret = -ENOMEM;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pseg = alloc_pseg(seg);
     if ( !pseg )
         goto out;
@@ -703,7 +723,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
     pci_enable_acs(pdev);
 
 out:
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
     if ( !ret )
     {
         printk(XENLOG_DEBUG "PCI add %s %04x:%02x:%02x.%u\n", pdev_type,
@@ -735,7 +755,7 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
     if ( !pseg )
         return -ENODEV;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
         if ( pdev->bus == bus && pdev->devfn == devfn )
         {
@@ -749,7 +769,7 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
             break;
         }
 
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
     return ret;
 }
 
@@ -807,11 +827,11 @@ int pci_release_devices(struct domain *d)
     u8 bus, devfn;
     int ret;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     ret = pci_clean_dpci_irqs(d);
     if ( ret )
     {
-        spin_unlock(&pcidevs_lock);
+        pcidevs_unlock();
         return ret;
     }
     while ( (pdev = pci_get_pdev_by_domain(d, -1, -1, -1)) )
@@ -823,7 +843,7 @@ int pci_release_devices(struct domain *d)
                    d->domain_id, pdev->seg, bus,
                    PCI_SLOT(devfn), PCI_FUNC(devfn));
     }
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return 0;
 }
@@ -920,7 +940,7 @@ void pci_check_disable_device(u16 seg, u8 bus, u8 devfn)
     s_time_t now = NOW();
     u16 cword;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pdev = pci_get_real_pdev(seg, bus, devfn);
     if ( pdev )
     {
@@ -931,7 +951,7 @@ void pci_check_disable_device(u16 seg, u8 bus, u8 devfn)
         if ( ++pdev->fault.count < PT_FAULT_THRESHOLD )
             pdev = NULL;
     }
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     if ( !pdev )
         return;
@@ -988,9 +1008,9 @@ int __init scan_pci_devices(void)
 {
     int ret;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     ret = pci_segments_iterate(_scan_pci_devices, NULL);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return ret;
 }
@@ -1054,17 +1074,17 @@ static int __hwdom_init _setup_hwdom_pci_devices(struct pci_seg *pseg, void *arg
 
             if ( iommu_verbose )
             {
-                spin_unlock(&pcidevs_lock);
+                pcidevs_unlock();
                 process_pending_softirqs();
-                spin_lock(&pcidevs_lock);
+                pcidevs_lock();
             }
         }
 
         if ( !iommu_verbose )
         {
-            spin_unlock(&pcidevs_lock);
+            pcidevs_unlock();
             process_pending_softirqs();
-            spin_lock(&pcidevs_lock);
+            pcidevs_lock();
         }
     }
 
@@ -1076,9 +1096,9 @@ void __hwdom_init setup_hwdom_pci_devices(
 {
     struct setup_hwdom ctxt = { .d = d, .handler = handler };
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pci_segments_iterate(_setup_hwdom_pci_devices, &ctxt);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 }
 
 #ifdef CONFIG_ACPI
@@ -1206,9 +1226,9 @@ static int _dump_pci_devices(struct pci_seg *pseg, void *arg)
 static void dump_pci_devices(unsigned char ch)
 {
     printk("==== PCI devices ====\n");
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pci_segments_iterate(_dump_pci_devices, NULL);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 }
 
 static int __init setup_dump_pcidevs(void)
@@ -1242,7 +1262,7 @@ int iommu_add_device(struct pci_dev *pdev)
     if ( !pdev->domain )
         return -EINVAL;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     hd = domain_hvm_iommu(pdev->domain);
     if ( !iommu_enabled || !hd->platform_ops )
@@ -1271,7 +1291,7 @@ int iommu_enable_device(struct pci_dev *pdev)
     if ( !pdev->domain )
         return -EINVAL;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     hd = domain_hvm_iommu(pdev->domain);
     if ( !iommu_enabled || !hd->platform_ops ||
@@ -1320,9 +1340,9 @@ static int device_assigned(u16 seg, u8 bus, u8 devfn)
 {
     struct pci_dev *pdev;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pdev = pci_get_pdev_by_domain(hardware_domain, seg, bus, devfn);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return pdev ? 0 : -EBUSY;
 }
@@ -1344,13 +1364,13 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
              p2m_get_hostp2m(d)->global_logdirty)) )
         return -EXDEV;
 
-    if ( !spin_trylock(&pcidevs_lock) )
+    if ( !pcidevs_trylock() )
         return -ERESTART;
 
     rc = iommu_construct(d);
     if ( rc )
     {
-        spin_unlock(&pcidevs_lock);
+        pcidevs_unlock();
         return rc;
     }
 
@@ -1381,7 +1401,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
  done:
     if ( !has_arch_pdevs(d) && need_iommu(d) )
         iommu_teardown(d);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return rc;
 }
@@ -1396,7 +1416,7 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
     if ( !iommu_enabled || !hd->platform_ops )
         return -EINVAL;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     pdev = pci_get_pdev_by_domain(d, seg, bus, devfn);
     if ( !pdev )
         return -ENODEV;
@@ -1451,7 +1471,7 @@ static int iommu_get_device_group(
 
     group_id = ops->get_device_group_id(seg, bus, devfn);
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     for_each_pdev( d, pdev )
     {
         if ( (pdev->seg != seg) ||
@@ -1470,14 +1490,14 @@ static int iommu_get_device_group(
 
             if ( unlikely(copy_to_guest_offset(buf, i, &bdf, 1)) )
             {
-                spin_unlock(&pcidevs_lock);
+                pcidevs_unlock();
                 return -1;
             }
             i++;
         }
     }
 
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return i;
 }
@@ -1605,9 +1625,9 @@ int iommu_do_pci_domctl(
         bus = PCI_BUS(machine_sbdf);
         devfn = PCI_DEVFN2(machine_sbdf);
 
-        spin_lock(&pcidevs_lock);
+        pcidevs_lock();
         ret = deassign_device(d, seg, bus, devfn);
-        spin_unlock(&pcidevs_lock);
+        pcidevs_unlock();
         if ( ret )
             printk(XENLOG_G_ERR
                    "deassign %04x:%02x:%02x.%u from dom%d failed (%d)\n",
diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
index 0ee3fb2..f5e3f49 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -984,7 +984,7 @@ int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
 
     spin_unlock_irq(&desc->lock);
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     /*
      * FIXME: For performance reasons we should store the 'iommu' pointer in
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 8022702..8409100 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1282,7 +1282,7 @@ int domain_context_mapping_one(
     u16 seg = iommu->intel->drhd->segment;
     int agaw;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     spin_lock(&iommu->lock);
     maddr = bus_to_context_maddr(iommu, bus);
     context_entries = (struct context_entry *)map_vtd_domain_page(maddr);
@@ -1424,7 +1424,7 @@ static int domain_context_mapping(
     if ( !drhd )
         return -ENODEV;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     switch ( pdev->type )
     {
@@ -1506,7 +1506,7 @@ int domain_context_unmap_one(
     u64 maddr;
     int iommu_domid;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     spin_lock(&iommu->lock);
 
     maddr = bus_to_context_maddr(iommu, bus);
@@ -1816,7 +1816,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
     struct mapped_rmrr *mrmrr;
     struct hvm_iommu *hd = domain_hvm_iommu(d);
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     ASSERT(rmrr->base_address < rmrr->end_address);
 
     /*
@@ -1881,7 +1881,7 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
     u16 bdf;
     int ret, i;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     if ( !pdev->domain )
         return -EINVAL;
@@ -2109,7 +2109,7 @@ static void __hwdom_init setup_hwdom_rmrr(struct domain *d)
     u16 bdf;
     int ret, i;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     for_each_rmrr_device ( rmrr, bdf, i )
     {
         /*
@@ -2123,7 +2123,7 @@ static void __hwdom_init setup_hwdom_rmrr(struct domain *d)
             dprintk(XENLOG_ERR VTDPREFIX,
                      "IOMMU: mapping reserved region failed\n");
     }
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 }
 
 int __init intel_vtd_setup(void)
diff --git a/xen/drivers/video/vga.c b/xen/drivers/video/vga.c
index 40e5963..61c6b13 100644
--- a/xen/drivers/video/vga.c
+++ b/xen/drivers/video/vga.c
@@ -117,9 +117,9 @@ void __init video_endboot(void)
                 const struct pci_dev *pdev;
                 u8 b = bus, df = devfn, sb;
 
-                spin_lock(&pcidevs_lock);
+                pcidevs_lock();
                 pdev = pci_get_pdev(0, bus, devfn);
-                spin_unlock(&pcidevs_lock);
+                pcidevs_unlock();
 
                 if ( !pdev ||
                      pci_conf_read16(0, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index a5aef55..b87571d 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -94,7 +94,10 @@ struct pci_dev {
  * interrupt handling related (the mask bit register).
  */
 
-extern spinlock_t pcidevs_lock;
+void pcidevs_lock(void);
+void pcidevs_unlock(void);
+int pcidevs_is_locked(void);
+int pcidevs_trylock(void);
 
 bool_t pci_known_segment(u16 seg);
 bool_t pci_device_detect(u16 seg, u8 bus, u8 dev, u8 func);
-- 
1.9.1



* Re: [PATCH v3 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization
  2016-03-09 13:17 ` [PATCH v3 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization Quan Xu
@ 2016-03-09 14:59   ` Dario Faggioli
  2016-03-10  6:12     ` Xu, Quan
  2016-03-11  3:24   ` Meng Xu
  1 sibling, 1 reply; 20+ messages in thread
From: Dario Faggioli @ 2016-03-09 14:59 UTC (permalink / raw)
  To: Quan Xu, xen-devel; +Cc: Jan Beulich, Suravee Suthikulpanit



On Wed, 2016-03-09 at 21:17 +0800, Quan Xu wrote:
> pcidevs_lock should be held with interrupts enabled. 
>
There's a message from Jan where he says:
<<Well, I'd say something like "pcidevs_lock doesn't require interrupts
to be disabled while being acquired".>>

:-O

> However, there remains
> an exception in the AMD IOMMU code, where the lock is acquired with
> interrupts disabled. This inconsistency might lead to deadlock.
> 
I appreciate that 'might' is weaker than 'can'. Personally, I still
don't find this correct, or at least clear enough, with reference to this
patch, but I won't be in the way because of this.

> The fix is straightforward: use spin_lock() instead. Also, interrupts
> have been enabled when this function is invoked, so we're sure
> consistency around pcidevs_lock can be guaranteed after this fix.
> 
And I also can't really get what "so we're sure consistency
around pcidevs_lock can be guaranteed after this fix" actually means,
but again, that's probably me, and it's fine.

However,

> Signed-off-by: Quan Xu <quan.xu@intel.com>
> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
>
This still stands **only** if the very first sentence "pcidevs_lock
should be held with interrupts enabled" is changed to "pcidevs_lock
doesn't require interrupts to be disabled while being acquired".

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH v3 2/2] IOMMU/spinlock: Make the pcidevs_lock a recursive one
  2016-03-09 13:17 ` [PATCH v3 2/2] IOMMU/spinlock: Make the pcidevs_lock a recursive one Quan Xu
@ 2016-03-09 17:45   ` Dario Faggioli
  2016-03-10  1:21     ` Xu, Quan
  2016-03-10  9:52   ` Jan Beulich
  1 sibling, 1 reply; 20+ messages in thread
From: Dario Faggioli @ 2016-03-09 17:45 UTC (permalink / raw)
  To: Quan Xu, xen-devel
  Cc: Kevin Tian, Feng Wu, Jan Beulich, Andrew Cooper,
	Suravee Suthikulpanit, Keir Fraser



On Wed, 2016-03-09 at 21:17 +0800, Quan Xu wrote:
> Signed-off-by: Quan Xu <quan.xu@intel.com>
> Acked-by: Kevin Tian <kevin.tian@intel.com>
>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

And I've applied and build tested it, against current staging, and this
time, it worked. :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH v3 2/2] IOMMU/spinlock: Make the pcidevs_lock a recursive one
  2016-03-09 17:45   ` Dario Faggioli
@ 2016-03-10  1:21     ` Xu, Quan
  0 siblings, 0 replies; 20+ messages in thread
From: Xu, Quan @ 2016-03-10  1:21 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: Tian, Kevin, Wu, Feng, Jan Beulich, Andrew Cooper,
	Suravee Suthikulpanit, Keir Fraser

On March 10, 2016 1:45am, <dario.faggioli@citrix.com> wrote:
> On Wed, 2016-03-09 at 21:17 +0800, Quan Xu wrote:
> > Signed-off-by: Quan Xu <quan.xu@intel.com>
> > Acked-by: Kevin Tian <kevin.tian@intel.com>
> >
> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
> 
> And I've applied and build tested it, against current staging, and this time, it
> worked. :-)
> 

Dario, thanks!! :):)
Quan


* Re: [PATCH v3 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization
  2016-03-09 14:59   ` Dario Faggioli
@ 2016-03-10  6:12     ` Xu, Quan
  0 siblings, 0 replies; 20+ messages in thread
From: Xu, Quan @ 2016-03-10  6:12 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Tian, Kevin, Jan Beulich, Suravee Suthikulpanit

CC Kevin, 

On March 09, 2016 11:00pm, <dario.faggioli@citrix.com> wrote:
> On Wed, 2016-03-09 at 21:17 +0800, Quan Xu wrote:
> > pcidevs_lock should be held with interrupts enabled.
> >
> There's a message from Jan where he says:
> <<Well, I'd say something like "pcidevs_lock doesn't require interrupts to be
> disabled while being acquired".>>
> 
> :-O
> 
> > However, there remains
> > an exception in the AMD IOMMU code, where the lock is acquired with
> > interrupts disabled. This inconsistency might lead to deadlock.
> >
> I appreciate that 'might' is weaker than 'can'. Personally, I still don't
> find this correct, or at least clear enough, with reference to this patch,
> but I won't be in the way because of this.
> 
> > The fix is straightforward: use spin_lock() instead. Also, interrupts
> > have been enabled when this function is invoked, so we're sure
> > consistency around pcidevs_lock can be guaranteed after this fix.
> >
> And I also can't really get what "so we're sure consistency around pcidevs_lock
> can be guaranteed after this fix" actually means, but again, that's probably me,
> and it's fine.
> 
> However,
> 
> > Signed-off-by: Quan Xu <quan.xu@intel.com>
> > Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
> >
> This still stands **only** if the very first sentence "pcidevs_lock should
> be held with interrupts enabled" is changed to "pcidevs_lock doesn't require
> interrupts to be disabled while being acquired".
> 

Dario,
I am open to this change. :)

When I read:
1. (look at "Lesson 3: spinlocks revisited.")
  https://www.kernel.org/doc/Documentation/locking/spinlocks.txt
2. comment inside check_lock(), in xen/common/spinlock.c, in Xen's codebase.
3. the "Linux Device Drivers, 3rd Edition" book , http://www.makelinux.net/ldd3/chp-5-sect-5 

I found that:
    - "We partition locks into IRQ-safe (__always__ held with IRQs disabled) and IRQ-unsafe (__always__ held with IRQs enabled) types".

It looks like Kevin's description is better. 

I also found that you mentioned in this thread:
"...  __except for very special situations__".
If that is the case, I think your description is better.

Thanks for your time. :):)


Quan

* Re: [PATCH v3 2/2] IOMMU/spinlock: Make the pcidevs_lock a recursive one
  2016-03-09 13:17 ` [PATCH v3 2/2] IOMMU/spinlock: Make the pcidevs_lock a recursive one Quan Xu
  2016-03-09 17:45   ` Dario Faggioli
@ 2016-03-10  9:52   ` Jan Beulich
  2016-03-10 11:27     ` Xu, Quan
  1 sibling, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2016-03-10  9:52 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, Andrew Cooper, Dario Faggioli, xen-devel,
	Suravee Suthikulpanit, Keir Fraser

>>> On 09.03.16 at 14:17, <quan.xu@intel.com> wrote:
> Signed-off-by: Quan Xu <quan.xu@intel.com>
> Acked-by: Kevin Tian <kevin.tian@intel.com>

The patch itself looks mostly fine now (see below for the minor remaining
issues), but the complete lack of a description (which should state
why this change is being done) makes this not ready to go in
anyway.

> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -48,7 +48,7 @@ struct pci_seg {
>      } bus2bridge[MAX_BUSES];
>  };
>  
> -spinlock_t pcidevs_lock = SPIN_LOCK_UNLOCKED;
> +static spinlock_t _pcidevs_lock = SPIN_LOCK_UNLOCKED;

Why the renaming?

> @@ -103,6 +103,26 @@ static int pci_segments_iterate(
>      return rc;
>  }
>  
> +void pcidevs_lock(void)
> +{
> +    spin_lock_recursive(&_pcidevs_lock);
> +}
> +
> +void pcidevs_unlock(void)
> +{
> +    spin_unlock_recursive(&_pcidevs_lock);
> +}
> +
> +int pcidevs_is_locked(void)

bool_t

> +{
> +    return spin_is_locked(&_pcidevs_lock);
> +}
> +
> +int pcidevs_trylock(void)

bool_t

(To avoid another round, please be aware that the underlying
spin lock primitives still [wrongly] use "int", so to be fully correct
you will need to use !! in both return statements, unless you
feel like adjusting those primitives too in another prereq patch.)

> +{
> +    return spin_trylock_recursive(&_pcidevs_lock);
> +}

I also think that it would be a good idea to place these helpers
and the lock definition next to each other.

Jan



* Re: [PATCH v3 2/2] IOMMU/spinlock: Make the pcidevs_lock a recursive one
  2016-03-10  9:52   ` Jan Beulich
@ 2016-03-10 11:27     ` Xu, Quan
  2016-03-10 13:06       ` Jan Beulich
  0 siblings, 1 reply; 20+ messages in thread
From: Xu, Quan @ 2016-03-10 11:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, Andrew Cooper, Dario Faggioli, xen-devel,
	Suravee Suthikulpanit, Keir Fraser

On March 10, 2016 5:53pm, Jan Beulich <jbeulich@suse.com> wrote:
> >>> On 09.03.16 at 14:17, <quan.xu@intel.com> wrote:
> > Signed-off-by: Quan Xu <quan.xu@intel.com>
> > Acked-by: Kevin Tian <kevin.tian@intel.com>
> 
> The patch itself looks mostly fine now (see below for the minor remaining
> issues), but the complete lack of a description (which should state why this
> change is being done) makes this not ready to go in anyway.
> 

What about the following description:
--
The pcidevs_lock may be recursively taken for hiding ATS device,
when VT-d Device-TLB flush timed out.



> > --- a/xen/drivers/passthrough/pci.c
> > +++ b/xen/drivers/passthrough/pci.c
> > @@ -48,7 +48,7 @@ struct pci_seg {
> >      } bus2bridge[MAX_BUSES];
> >  };
> >
> > -spinlock_t pcidevs_lock = SPIN_LOCK_UNLOCKED;
> > +static spinlock_t _pcidevs_lock = SPIN_LOCK_UNLOCKED;
> 
> Why the renaming?
> 

The original name would cause compiler errors, as I also introduce a new helper with the same name -- void pcidevs_lock(void).
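Roughly, the clash would be:

    spinlock_t pcidevs_lock;    /* the existing lock object */
    void pcidevs_lock(void);    /* the new helper: C forbids redeclaring
                                 * the same identifier as a different
                                 * kind of symbol */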


> > @@ -103,6 +103,26 @@ static int pci_segments_iterate(
> >      return rc;
> >  }
> >
> > +void pcidevs_lock(void)
> > +{
> > +    spin_lock_recursive(&_pcidevs_lock);
> > +}
> > +
> > +void pcidevs_unlock(void)
> > +{
> > +    spin_unlock_recursive(&_pcidevs_lock);
> > +}
> > +
> > +int pcidevs_is_locked(void)
> 
> bool_t
> 
> > +{
> > +    return spin_is_locked(&_pcidevs_lock); }
> > +
> > +int pcidevs_trylock(void)
> 
> bool_t
> 
> (To avoid another round, please be aware that the underlying spin lock
> primitives still [wrongly] use "int", so to be fully correct you will need
> to use !! in both return statements, unless you feel like adjusting those
> primitives too in another prereq patch.)
> 
Thanks for reminding.
I'd prefer to use !! in both return statements.
btw, could you help me check the code style?:

+bool_t pcidevs_is_locked(void)
+{
+    return !!(spin_is_locked(&_pcidevs_lock));
+}
+
+bool_t pcidevs_trylock(void)
+{
+    return !!(spin_trylock_recursive(&_pcidevs_lock));
+}

Is it right?

> > +{
> > +    return spin_trylock_recursive(&_pcidevs_lock);
> > +}
> 
> I also think that it would be a good idea to place these helpers and the lock
> definition next to each other.
> 
Agreed, makes sense.

Quan




* Re: [PATCH v3 2/2] IOMMU/spinlock: Make the pcidevs_lock a recursive one
  2016-03-10 11:27     ` Xu, Quan
@ 2016-03-10 13:06       ` Jan Beulich
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Beulich @ 2016-03-10 13:06 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, Andrew Cooper, Dario Faggioli, xen-devel,
	Suravee Suthikulpanit, Keir Fraser

>>> On 10.03.16 at 12:27, <quan.xu@intel.com> wrote:
> On March 10, 2016 5:53pm, <March 10, 2016 5:53> wrote:
>> >>> On 09.03.16 at 14:17, <quan.xu@intel.com> wrote:
>> > Signed-off-by: Quan Xu <quan.xu@intel.com>
>> > Acked-by: Kevin Tian <kevin.tian@intel.com>
>> 
>> The patch itself looks mostly fine now (see below for the minor remaining
>> issues), but the complete lack of a description (which should state why
>> this change is being done) makes this not ready to go in anyway.
>>
> 
> What about the following description:
> --
> The pcidevs_lock may be recursively taken for hiding ATS device,
> when VT-d Device-TLB flush timed out.

s/may/is going to/ (since right now that isn't the case yet)

>> > --- a/xen/drivers/passthrough/pci.c
>> > +++ b/xen/drivers/passthrough/pci.c
>> > @@ -48,7 +48,7 @@ struct pci_seg {
>> >      } bus2bridge[MAX_BUSES];
>> >  };
>> >
>> > -spinlock_t pcidevs_lock = SPIN_LOCK_UNLOCKED;
>> > +static spinlock_t _pcidevs_lock = SPIN_LOCK_UNLOCKED;
>> 
>> Why the renaming?
>> 
> 
> The original name would cause compiler errors, as I also introduce a new
> helper with the same name -- void pcidevs_lock(void).

Ah, of course.

>> > @@ -103,6 +103,26 @@ static int pci_segments_iterate(
>> >      return rc;
>> >  }
>> >
>> > +void pcidevs_lock(void)
>> > +{
>> > +    spin_lock_recursive(&_pcidevs_lock);
>> > +}
>> > +
>> > +void pcidevs_unlock(void)
>> > +{
>> > +    spin_unlock_recursive(&_pcidevs_lock);
>> > +}
>> > +
>> > +int pcidevs_is_locked(void)
>> 
>> bool_t
>> 
>> > +{
>> > +    return spin_is_locked(&_pcidevs_lock); }
>> > +
>> > +int pcidevs_trylock(void)
>> 
>> bool_t
>> 
>> (To avoid another round, please be aware that the underlying spin lock
>> primitives still [wrongly] use "int", so to be fully correct you will need
>> to use !! in both return statements, unless you feel like adjusting those
>> primitives too in another prereq patch.)
>>
> Thanks for reminding.
> I'd prefer to use !! in both return statements.
> btw, could you help me check the code style?:
> 
> +bool_t pcidevs_is_locked(void)
> +{
> +    return !!(spin_is_locked(&_pcidevs_lock));
> +}
> +
> +bool_t pcidevs_trylock(void)
> +{
> +    return !!(spin_trylock_recursive(&_pcidevs_lock));
> +}
> 
> Is it right?

The outermost parentheses aren't needed, and harm readability.
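I.e., simply:

bool_t pcidevs_is_locked(void)
{
    return !!spin_is_locked(&_pcidevs_lock);
}

bool_t pcidevs_trylock(void)
{
    return !!spin_trylock_recursive(&_pcidevs_lock);
}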

Jan



* Re: [PATCH v3 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization
  2016-03-09 13:17 ` [PATCH v3 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization Quan Xu
  2016-03-09 14:59   ` Dario Faggioli
@ 2016-03-11  3:24   ` Meng Xu
  2016-03-11  6:54     ` Xu, Quan
  1 sibling, 1 reply; 20+ messages in thread
From: Meng Xu @ 2016-03-11  3:24 UTC (permalink / raw)
  To: Quan Xu; +Cc: Dario Faggioli, Suravee Suthikulpanit, Jan Beulich, xen-devel

Hi Quan,

Can I ask a dumb question?

On Wed, Mar 9, 2016 at 8:17 AM, Quan Xu <quan.xu@intel.com> wrote:
> pcidevs_lock should be held with interrupts enabled. However, there remains
> an exception in the AMD IOMMU code, where the lock is acquired with
> interrupts disabled. This inconsistency might lead to deadlock.

Why will this inconsistency lead to deadlock?
I understand the difference between spin_lock_irqsave(), which disables
interrupts, and spin_lock(), which allows interrupts; however, I didn't
get why misusing spin_lock_irqsave() in place of spin_lock() could
potentially lead to a deadlock.
Would you mind pointing me to somewhere I can find the reason, or
enlighten me?

Thank you very much!

Best,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/


* Re: [PATCH v3 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization
  2016-03-11  3:24   ` Meng Xu
@ 2016-03-11  6:54     ` Xu, Quan
  2016-03-11 10:35       ` Dario Faggioli
  0 siblings, 1 reply; 20+ messages in thread
From: Xu, Quan @ 2016-03-11  6:54 UTC (permalink / raw)
  To: Meng Xu; +Cc: Dario Faggioli, Suravee Suthikulpanit, Jan Beulich, xen-devel

On March 11, 2016 11:25am, <mengxu@cis.upenn.edu> wrote:
> On Wed, Mar 9, 2016 at 8:17 AM, Quan Xu <quan.xu@intel.com> wrote:
> > pcidevs_lock should be held with interrupts enabled. However, there
> > remains an exception in the AMD IOMMU code, where the lock is acquired
> > with interrupts disabled. This inconsistency might lead to deadlock.
> 
> Why will this inconsistency lead to deadlock?
> I understand the difference between spin_lock_irqsave(), which disables
> interrupts, and spin_lock(), which allows interrupts; however, I didn't get
> why misusing spin_lock_irqsave() in place of spin_lock() could potentially
> lead to a deadlock.

For the above 2 questions:
Mixing is just one of the _possible_ scenarios (IMO, not 100% leading to a deadlock):

- where an interrupt tries to take an already-held lock. This is ok if
the other interrupt happens on another CPU, but it is _not_ ok if the
interrupt happens on the same CPU that already holds the lock.

I got it from the below 2 points:
 1). As Jan mentioned, the implication from disabling interrupts while acquiring a lock is that the lock is also being acquired by some interrupt handler.
   If you mix acquire types, the one not disabling interrupts is prone to being interrupted, with the interrupt trying to get hold of the lock the same CPU already owns.

 2). Comment inside check_lock(), 
we partition locks into IRQ-safe (always held with IRQs disabled) and
IRQ-unsafe (always held with IRQs enabled) types. The convention for
every lock must be consistently observed else we can deadlock in
IRQ-context rendezvous functions (__a rendezvous which gets every CPU
into IRQ context before any CPU is released from the rendezvous__).
If we can mix IRQ-disabled and IRQ-enabled callers, the following can
happen:
 * Lock is held by CPU A, with IRQs enabled
 * CPU B is spinning on same lock, with IRQs disabled
 * Rendezvous starts -- CPU A takes interrupt and enters rendezvous spin
 * DEADLOCK -- CPU B will never enter rendezvous, CPU A will never exit
               the rendezvous, and will hence never release the lock.

To guard against this subtle bug we latch the IRQ safety of every
spinlock in the system, on first use.
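
A simplified sketch of that latch, paraphrased from check_lock() in
xen/common/spinlock.c (field names approximate, details elided):

    static void check_lock(struct lock_debug *debug)
    {
        int irq_safe = !local_irq_is_enabled();  /* IRQs off right now? */

        if ( unlikely(debug->irq_safe != irq_safe) )
        {
            /* First use latches the type; -1 means "not seen yet". */
            int seen = cmpxchg(&debug->irq_safe, -1, irq_safe);

            if ( seen == !irq_safe )
                BUG();  /* same lock used both IRQ-safe and IRQ-unsafe */
        }
    }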

> Would you mind pointing me to somewhere I can find the reason, or
> enlighten me?
> 

Some reference material:

   * (look at "Lesson 3: spinlocks revisited.")
     https://www.kernel.org/doc/Documentation/locking/spinlocks.txt
   * comment inside check_lock(), in xen/common/spinlock.c, in Xen's codebase.
   * the "Linux Device Drivers, 3rd Edition" book , http://www.makelinux.net/ldd3/chp-5-sect-5

Quan

* Re: [PATCH v3 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization
  2016-03-11  6:54     ` Xu, Quan
@ 2016-03-11 10:35       ` Dario Faggioli
  2016-03-11 12:36         ` Xu, Quan
  2016-03-11 14:41         ` Meng Xu
  0 siblings, 2 replies; 20+ messages in thread
From: Dario Faggioli @ 2016-03-11 10:35 UTC (permalink / raw)
  To: Xu, Quan, Meng Xu; +Cc: Suravee Suthikulpanit, Jan Beulich, xen-devel



On Fri, 2016-03-11 at 06:54 +0000, Xu, Quan wrote:
> On March 11, 2016 11:25am, <mengxu@cis.upenn.edu> wrote:
> > 
> > On Wed, Mar 9, 2016 at 8:17 AM, Quan Xu <quan.xu@intel.com> wrote:
> > > 
> > > pcidevs_lock should be held with interrupts enabled. However, there
> > > remains an exception in the AMD IOMMU code, where the lock is
> > > acquired with interrupts disabled. This inconsistency might lead to
> > > deadlock.
> > Why will this inconsistency lead to deadlock?
> > I understand the difference between spin_lock_irqsave(), which disables
> > interrupts, and spin_lock(), which allows interrupts; however, I didn't
> > get why misusing spin_lock_irqsave() in place of spin_lock() could
> > potentially lead to a deadlock.
> 
>  1). As Jan mentioned, the implication from disabling interrupts while
> acquiring a lock is that the lock is also being acquired by some
> interrupt handler.
>   If you mix acquire types, the one not disabling interrupts is prone
> to being interrupted, with the interrupt trying to get hold of the lock
> the same CPU already owns.
> 
The key issue is whether or not a lock can be acquired from IRQ context
(i.e., in an interrupt handler, or a function called by that, as a
result of an interrupt occurring).

For locks that can, IRQ disabling variants must be used *everywhere* the
lock is taken (so, e.g., not only when it is actually taken from IRQ
context, just *always*!).

If that rule is not honored, we are ok if the lock is taken on CPUA,
and the interrupt handled on CPUB:

  CPUA              CPUB
  .                 .
  .                 .
  spin_lock(l)      .
  .                 .
  .                 . <-- Interrupt!
  .                        .
  .                       spin_lock(l); //spins on l
  .                       x             //spins on l
  .                       x             //spins on l
  spin_unlock(l)          .             //takes l
  .                       .
  .                       spin_unlock(l);
  .                 . <-- .
  .                 .

but the following can happen, if the interrupt is delivered to the CPU
that has already taken the lock:

     CPU
     .
     .
[1]  spin_lock(l);       //takes l
     .
     . <-- Interrupt!
           .
           spin_lock(l) // CPU deadlocks

If, in the latter case, spin_lock_irq(l) were used at [1], things would
have been fine, as the interrupt wouldn't have occurred until l was
released:

  CPU
  .
  .
  spin_lock_irq(l)                        //takes l
  .
  . <---------------- Interrupt!
  .                   -                   //IRQs are disabled
  .                   -                   //IRQs are disabled
  .                   -                   //IRQs are disabled
  spin_unlock_irq(l)  .                   //IRQs re-enabled
                      spin_lock_irq(l);   //takes l
                      .
                      .
                      spin_unlock_irq(l);
 . <----------------- .
 .

Note that whether spin_lock_irq() or spin_lock_irqsave() should be
used is completely independent from this, and it must be decided
according to whether IRQs are disabled already or not when taking the
lock. And (quite obviously, but since we're here) it is fine to mix
spin_lock_irq() and spin_lock_irqsave(), as they both disable
interrupts, which is what matters.
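
For instance (an illustrative snippet, not code from this series):

  unsigned long flags;

  spin_lock_irq(&l);                 /* fine only if IRQs are enabled here */
  ...
  spin_unlock_irq(&l);               /* unconditionally re-enables IRQs */

  spin_lock_irqsave(&l, flags);      /* fine whatever the current IRQ state */
  ...
  spin_unlock_irqrestore(&l, flags); /* restores the saved IRQ state */

Both variants keep IRQs disabled for as long as the lock is held, which
is what matters here.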

So, now, if we know for sure that a lock is _never_ever_ever_ taken
from interrupt context, can we mix spin_lock() and spin_lock_irq() on
it (for whatever reason)? Well, as far as the above reasoning is
concerned, yes.

In fact, the deadlock arises because IRQs interrupt asynchronously what
the CPU is doing, and that can happen when the CPU has taken the lock
already. But if the 'asynchronous' part goes away, we really don't care
whether a lock is taken at time t1 with IRQs enabled, and at time t2 with
IRQs disabled, don't you think?

Well, here is where what the comment inside check_lock() describes
comes into play. As quoted by Quan already:

>  2). Comment inside check_lock(), 
> we partition locks into IRQ-safe (always held with IRQs disabled) and
> IRQ-unsafe (always held with IRQs enabled) types. The convention for
> every lock must be consistently observed else we can deadlock in
> IRQ-context rendezvous functions (__a rendezvous which gets every CPU
> into IRQ context before any CPU is released from the rendezvous__).
> If we can mix IRQ-disabled and IRQ-enabled callers, the following can
> happen:
>  * Lock is held by CPU A, with IRQs enabled
>  * CPU B is spinning on same lock, with IRQs disabled
>  * Rendezvous starts -- CPU A takes interrupt and enters rendezvous
> spin
>  * DEADLOCK -- CPU B will never enter rendezvous, CPU A will never
> exit
>                the rendezvous, and will hence never release the lock.
> 
> To guard against this subtle bug we latch the IRQ safety of every
> spinlock in the system, on first use.
> 
This is a very good safety measure, but it can be quite a restriction
in some (luckily limited) cases. And that's why it is possible --
although absolutely discouraged -- to relax it. See, for an example of
this, the comment in start_secondary(), in xen/arch/x86/smpboot.c:

    ...
    /*
     * Just as during early bootstrap, it is convenient here to disable
     * spinlock checking while we have IRQs disabled. This allows us to
     * acquire IRQ-unsafe locks when it would otherwise be disallowed.
     * 
     * It is safe because the race we are usually trying to avoid involves
     * a group of CPUs rendezvousing in an IPI handler, where one cannot
     * join because it is spinning with IRQs disabled waiting to acquire a
     * lock held by another in the rendezvous group (the lock must be an
     * IRQ-unsafe lock since the CPU took the IPI after acquiring it, and
     * hence had IRQs enabled). This is a deadlock scenario.
     * 
     * However, no CPU can be involved in rendezvous until it is online,
     * hence no such group can be waiting for this CPU until it is
     * visible in cpu_online_map. Hence such a deadlock is not possible.
     */
    spin_debug_disable();
    ...

Just FTR, at least as far as I can remember, Linux does not enforce
anything like that.

Hope this clarifies things.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization
  2016-03-11 10:35       ` Dario Faggioli
@ 2016-03-11 12:36         ` Xu, Quan
  2016-03-11 13:58           ` Dario Faggioli
  2016-03-11 14:49           ` Meng Xu
  2016-03-11 14:41         ` Meng Xu
  1 sibling, 2 replies; 20+ messages in thread
From: Xu, Quan @ 2016-03-11 12:36 UTC (permalink / raw)
  To: Dario Faggioli, Meng Xu; +Cc: Jan Beulich, Suravee Suthikulpanit, xen-devel

On March 11, 2016 6:36pm, <Dario Faggioli> wrote:
> On Fri, 2016-03-11 at 06:54 +0000, Xu, Quan wrote:
> > On March 11, 2016 11:25am, <mengxu@cis.upenn.edu> wrote:
> > >
> > > On Wed, Mar 9, 2016 at 8:17 AM, Quan Xu <quan.xu@intel.com> wrote:
> > > >
> > > > pcidevs_lock should be held with interrupt enabled. However there
> > > > remains an exception in AMD IOMMU code, where the lock is acquired
> > > > with interrupt disabled. This inconsistency might lead to
> > > > deadlock.
> > > Why will this inconsistency lead to deadlock?
> > > I understand the difference between spin_lock_irqsave(), which
> > > disable interrupt, and spin_lock(), which allows interrupt, however,
> > > I didn't get why misuse the
> > > spin_lock_irqsave() at the place of spin_lock() could potentially
> > > lead to a deadlock?
> >
>  1). As Jan mentioned, the implication from disabling interrupts while
> > acquiring a lock is that the lock is also being acquired by some
> > interrupt handler.
> >   If you mix acquire types, the one not disabling interrupts is prone
> to be interrupted, with the interrupt then trying to get hold of the lock
> > the same CPU already owns.
> >
> The key issue is whether or not a lock can be acquired from IRQ context (i.e., in
> an interrupt handler, or a function called by that, as a result of an interrupt
> occurring).
> 
> For locks that can, IRQ-disabling variants must be used *everywhere* the lock is
> taken (so, e.g., not only when it is actually taken from IRQ context, just
> *always*!).
> 
> If that rule is not honored, we are ok if the lock is taken on CPUA, and the
> interrupt handled on CPUB:
> 
>   CPUA              CPUB
>   .                 .
>   .                 .
>   spin_lock(l)      .
>   .                 .
>   .                 . <-- Interrupt!
>   .                        .
>   .                       spin_lock(l); //spins on l
>   .                       x             //spins on l
>   .                       x             //spins on l
>   spin_unlock(l)          .             //takes l
>   .                       .
>   .                       spin_unlock(l);
>   .                 . <-- .
>   .                 .
> 
> but the following can happen, if the interrupt is delivered to the CPU that has
> already taken the lock:
> 
>      CPU
>      .
>      .
> [1]  spin_lock(l);       //takes l
>      .
>      . <-- Interrupt!
>            .
>            spin_lock(l) // CPU deadlocks
> 
> If, in the latter case, spin_lock_irq(l) were used at [1], things would have been
> fine, as the interrupt wouldn't have occurred until l was
> released:
> 
>   CPU
>   .
>   .
>   spin_lock_irq(l)                        //takes l
>   .
>   . <---------------- Interrupt!
>   .                   -                   //IRQs are disabled
>   .                   -                   //IRQs are disabled
>   .                   -                   //IRQs are disabled
>   spin_unlock_irq(l)  .                   //IRQs re-enabled
>                       spin_lock_irq(l);   //takes l
>                       .
>                       .
>                       spin_unlock_irq(l);
>  . <----------------- .
>  .
> 
> Note that whether spin_lock_irq() or spin_lock_irqsave() should be used, is
> completely independent from this, and it must be decided according to whether
> IRQs are disabled already or not when taking the lock. And (quite obviously, but
> since we're here) it is fine to mix
> spin_lock_irq() and spin_lock_irqsave(), as they both disable interrupts, which is
> what matters.
> 
> So, now, if we know for sure that a lock is _never_ever_ever_ taken from
> interrupt context, can we mix spin_lock() and spin_lock_irq() on it (for whatever
> reason)? Well, as far as the above reasoning is concerned, yes.
> 
I appreciate Dario's explanation.
btw, be careful of spin_lock_irq(), which includes 'ASSERT(local_irq_is_enabled())'..
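
Conceptually, that variant looks like the following sketch
(abbreviated, not the verbatim Xen code):

    void _spin_lock_irq(spinlock_t *lock)
    {
        /* Catch callers that already have IRQs disabled: the paired
         * _spin_unlock_irq() would wrongly re-enable them. */
        ASSERT(local_irq_is_enabled());
        local_irq_disable();
        _spin_lock(lock);
    }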

> In fact, the deadlock arises because IRQs interrupt asynchronously what the
> CPU is doing, and that can happen when the CPU has taken the lock already. But
> if the 'asynchronous' part goes away, we really don't care whether a lock is taken
> at time t1 with IRQ enabled, and at time t2 with IRQ disabled, don't you think?
> 

Yes.
Consistency may be helpful to avoid some easy-to-avoid lock errors.
Moreover, without my fix, I think it would not lead to deadlock, as the pcidevs_lock is not being taken
in IRQ context. Right?


For deadlock, I think the key problems are:
  - A lock can be acquired from IRQ context
  - The interrupt is delivered to the _same_ CPU that already holds the lock.

Quan

> Well, here it is where what the comment inside check_lock() describes comes
> into play. As quoted by Quan already:
> 
> >  2). Comment inside check_lock(),
> > we partition locks into IRQ-safe (always held with IRQs disabled) and
> > IRQ-unsafe (always held with IRQs enabled) types. The convention for
> > every lock must be consistently observed else we can deadlock in
> > IRQ-context rendezvous functions (__a rendezvous which gets every CPU
> > into IRQ context before any CPU is released from the rendezvous__).
> > If we can mix IRQ-disabled and IRQ-enabled callers, the following can
> > happen:
> >  * Lock is held by CPU A, with IRQs enabled
> >  * CPU B is spinning on same lock, with IRQs disabled
> >  * Rendezvous starts -- CPU A takes interrupt and enters rendezvous
> > spin
> >  * DEADLOCK -- CPU B will never enter rendezvous, CPU A will never
> > exit
> >                the rendezvous, and will hence never release the lock.
> >
> > To guard against this subtle bug we latch the IRQ safety of every
> > spinlock in the system, on first use.
> >
> This is a very good safety measure, but it can be quite a restriction in some
> (luckily limited) cases. And that's why it is possible -- although absolutely
> discouraged-- to relax it. See, for an example of this, the comment in
> start_secondary(), in xen/arch/x86/smpboot.c:
> 
>     ...
>     /*
>      * Just as during early bootstrap, it is convenient here to disable
>      * spinlock checking while we have IRQs disabled. This allows us to
>      * acquire IRQ-unsafe locks when it would otherwise be disallowed.
>      *
>      * It is safe because the race we are usually trying to avoid involves
>      * a group of CPUs rendezvousing in an IPI handler, where one cannot
>      * join because it is spinning with IRQs disabled waiting to acquire a
>      * lock held by another in the rendezvous group (the lock must be an
>      * IRQ-unsafe lock since the CPU took the IPI after acquiring it, and
>      * hence had IRQs enabled). This is a deadlock scenario.
>      *
>      * However, no CPU can be involved in rendezvous until it is online,
>      * hence no such group can be waiting for this CPU until it is
>      * visible in cpu_online_map. Hence such a deadlock is not possible.
>      */
>     spin_debug_disable();
>     ...
> 
> Just FTR, at least as far as I can remember, Linux does not enforce anything like
> that.
> 
> Hope this clarifies things.




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization
  2016-03-11 12:36         ` Xu, Quan
@ 2016-03-11 13:58           ` Dario Faggioli
  2016-03-11 14:49           ` Meng Xu
  1 sibling, 0 replies; 20+ messages in thread
From: Dario Faggioli @ 2016-03-11 13:58 UTC (permalink / raw)
  To: Xu, Quan, Meng Xu; +Cc: Jan Beulich, Suravee Suthikulpanit, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2376 bytes --]

On Fri, 2016-03-11 at 12:36 +0000, Xu, Quan wrote:
> On March 11, 2016 6:36pm, <Dario Faggioli> wrote:
> > On Fri, 2016-03-11 at 06:54 +0000, Xu, Quan wrote:
> > > 
> > So, now, if we know for sure that a lock is _never_ever_ever_ taken
> > from
> > interrupt context, can we mix spin_lock() and spin_lock_irq() on it
> > (for whatever
> > reason)? Well, as far as the above reasoning is concerned, yes.
> > 
> I appreciate Dario's explanation.
> btw, be careful of spin_lock_irq(), which includes
> 'ASSERT(local_irq_is_enabled()'..
> 
It does. What about it? That is exactly what marks the difference
between spin_lock_irq() and spin_lock_irqsave(). In fact, the former
should not be used if interrupts are already disabled because then,
when calling the corresponding spin_unlock_irq(), they'd be re-enabled
when they shouldn't be.
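
A short sketch of that hazard (hypothetical lock name):

    local_irq_disable();
    /* ... */
    spin_lock_irq(&l);    /* wrong here: trips the
                           * ASSERT(local_irq_is_enabled()) */
    /* ... */
    spin_unlock_irq(&l);  /* would unconditionally re-enable IRQs,
                           * even though the caller wanted them to
                           * stay disabled */

With spin_lock_irqsave()/spin_unlock_irqrestore() the unlock would
instead restore the saved (disabled) IRQ state.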

But as said, from the point of view of preventing deadlock, for locks
taken both from inside and outside IRQ context, they're equivalent.

> > 
> > In fact, the deadlock arises because IRQs interrupt asynchronously
> > what the
> > CPU is doing, and that can happen when the CPU has taken the lock
> > already. But
> > if the 'asynchronous' part goes away, we really don't care whether
> > a lock is taken
> > at time t1 with IRQ enabled, and at time t2 with IRQ disabled,
> > don't you think?
> > 
> Yes.
> Consistency may be helpful to avoid some easy-to-avoid lock errors.
> Moreover, without my fix, I think it would not lead to deadlock, as the
> pcidevs_lock is not being taken
> in IRQ context. Right?
> 
> 
> For deadlock, I think the key problems are:
>   - A lock can be acquired from IRQ context
>   -The interrupt is delivered to the _same_CPU_ that already holds
> the lock.
> 
In your case, pcidevs_lock is certainly not being taken from both
inside and outside IRQ context. If it were, using spin_lock() --as it
happens basically everywhere in the code-- would be wrong, and using
spin_lock_irq[save]() --as it happens in the only case you're
patching-- would be what's right!
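
For context, the shape of the fix in patch 1/2 is roughly the
following (a sketch, not the verbatim patch): the AMD IOMMU
initialization code runs during boot with IRQs enabled, so the plain
variant keeps pcidevs_lock consistently IRQ-unsafe:

    -    spin_lock_irqsave(&pcidevs_lock, flags);
    -    /* ... look up the IOMMU's PCI device ... */
    -    spin_unlock_irqrestore(&pcidevs_lock, flags);
    +    spin_lock(&pcidevs_lock);
    +    /* ... look up the IOMMU's PCI device ... */
    +    spin_unlock(&pcidevs_lock);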

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization
  2016-03-11 10:35       ` Dario Faggioli
  2016-03-11 12:36         ` Xu, Quan
@ 2016-03-11 14:41         ` Meng Xu
  2016-03-11 16:12           ` Dario Faggioli
  1 sibling, 1 reply; 20+ messages in thread
From: Meng Xu @ 2016-03-11 14:41 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Suravee Suthikulpanit, Jan Beulich, Xu, Quan, xen-devel

Hi Quan and Dario,

On Fri, Mar 11, 2016 at 5:35 AM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On Fri, 2016-03-11 at 06:54 +0000, Xu, Quan wrote:
>> On March 11, 2016 11:25am, <mengxu@cis.upenn.edu> wrote:
>> >
>> > On Wed, Mar 9, 2016 at 8:17 AM, Quan Xu <quan.xu@intel.com> wrote:
>> > >
>> > > pcidevs_lock should be held with interrupt enabled. However there
>> > > remains an exception in AMD IOMMU code, where the lock is
>> > > acquired
>> > > with interrupt disabled. This inconsistency might lead to
>> > > deadlock.
>> > Why will this inconsistency lead to deadlock?
>> > I understand the difference between spin_lock_irqsave(), which
>> > disable interrupt,
>> > and spin_lock(), which allows interrupt, however, I didn't get why
>> > misuse the
>> > spin_lock_irqsave() at the place of spin_lock() could potentially
>> > lead to a
>> > deadlock?
>>
>>  1). As Jan mentioned, the implication from disabling interrupts while
>> acquiring a lock is that the lock is also being acquired by some
>> interrupt handler.
>>   If you mix acquire types, the one not disabling interrupts is prone
>> to be interrupted, with the interrupt then trying to get hold of the lock
>> the same CPU already owns.
>>
> The key issue is whether or not a lock can be acquired from IRQ context
> (i.e., in an interrupt handler, or a function called by that, as a
> result of an interrupt occurring).
>
> For locks that can, IRQ-disabling variants must be used *everywhere* the
> lock is taken (so, e.g., not only when it is actually taken from IRQ
> context, just *always*!).
>
> If that rule is not honored, we are ok if the lock is taken on CPUA,
> and the interrupt handled on CPUB:
>
>   CPUA              CPUB
>   .                 .
>   .                 .
>   spin_lock(l)      .
>   .                 .
>   .                 . <-- Interrupt!
>   .                        .
>   .                       spin_lock(l); //spins on l
>   .                       x             //spins on l
>   .                       x             //spins on l
>   spin_unlock(l)          .             //takes l
>   .                       .
>   .                       spin_unlock(l);
>   .                 . <-- .
>   .                 .
>
> but the following can happen, if the interrupt is delivered to the CPU
> that has already taken the lock:
>
>      CPU
>      .
>      .
> [1]  spin_lock(l);       //takes l
>      .
>      . <-- Interrupt!
>            .
>            spin_lock(l) // CPU deadlocks
>
> If, in the latter case, spin_lock_irq(l) were used at [1], things would
> have been fine, as the interrupt wouldn't have occurred until l was
> released:
>
>   CPU
>   .
>   .
>   spin_lock_irq(l)                        //takes l
>   .
>   . <---------------- Interrupt!
>   .                   -                   //IRQs are disabled
>   .                   -                   //IRQs are disabled
>   .                   -                   //IRQs are disabled
>   spin_unlock_irq(l)  .                   //IRQs re-enabled
>                       spin_lock_irq(l);   //takes l
>                       .
>                       .
>                       spin_unlock_irq(l);
>  . <----------------- .
>  .
>
> Note that whether spin_lock_irq() or spin_lock_irqsave() should be
> used, is completely independent from this, and it must be decided
> according to whether IRQs are disabled already or not when taking the
> lock. And (quite obviously, but since we're here) it is fine to mix
> spin_lock_irq() and spin_lock_irqsave(), as they both disable
> interrupts, which is what matters.
>
> So, now, if we know for sure that a lock is _never_ever_ever_ taken
> from interrupt context, can we mix spin_lock() and spin_lock_irq() on
> it (for whatever reason)? Well, as far as the above reasoning is
> concerned, yes.
>
> In fact, the deadlock arises because IRQs interrupt asynchronously what
> the CPU is doing, and that can happen when the CPU has taken the lock
> already. But if the 'asynchronous' part goes away, we really don't care
> whether a lock is taken at time t1 with IRQ enabled, and at time t2 with
> IRQ disabled, don't you think?
>
> Well, here it is where what the comment inside check_lock() describes
> comes into play. As quoted by Quan already:
>
>>  2). Comment inside check_lock(),
>> we partition locks into IRQ-safe (always held with IRQs disabled) and
>> IRQ-unsafe (always held with IRQs enabled) types. The convention for
>> every lock must be consistently observed else we can deadlock in
>> IRQ-context rendezvous functions (__a rendezvous which gets every CPU
>> into IRQ context before any CPU is released from the rendezvous__).
>> If we can mix IRQ-disabled and IRQ-enabled callers, the following can
>> happen:
>>  * Lock is held by CPU A, with IRQs enabled
>>  * CPU B is spinning on same lock, with IRQs disabled
>>  * Rendezvous starts -- CPU A takes interrupt and enters rendezvous
>> spin
>>  * DEADLOCK -- CPU B will never enter rendezvous, CPU A will never
>> exit
>>                the rendezvous, and will hence never release the lock.
>>
>> To guard against this subtle bug we latch the IRQ safety of every
>> spinlock in the system, on first use.
>>
> This is a very good safety measure, but it can be quite a restriction
> in some (luckily limited) cases. And that's why it is possible --
> although absolutely discouraged -- to relax it. See, for an example of
> this, the comment in start_secondary(), in xen/arch/x86/smpboot.c:
>
>     ...
>     /*
>      * Just as during early bootstrap, it is convenient here to disable
>      * spinlock checking while we have IRQs disabled. This allows us to
>      * acquire IRQ-unsafe locks when it would otherwise be disallowed.
>      *
>      * It is safe because the race we are usually trying to avoid involves
>      * a group of CPUs rendezvousing in an IPI handler, where one cannot
>      * join because it is spinning with IRQs disabled waiting to acquire a
>      * lock held by another in the rendezvous group (the lock must be an
>      * IRQ-unsafe lock since the CPU took the IPI after acquiring it, and
>      * hence had IRQs enabled). This is a deadlock scenario.
>      *
>      * However, no CPU can be involved in rendezvous until it is online,
>      * hence no such group can be waiting for this CPU until it is
>      * visible in cpu_online_map. Hence such a deadlock is not possible.
>      */
>     spin_debug_disable();
>     ...
>
> Just FTR, at least as far as I can remember, Linux does not enforce
> anything like that.
>
> Hope this clarifies things.

Thank you very much for your explanation and education! They are
really helpful! :-D

Let me summarize: ;-)
|                                  | spin_lock | spin_lock_irq | spin_lock_irqsave |
| Can run in irq context?          | No        | Yes           | Yes               |
| Can run in irq-disabled context? | No        | No            | Yes               |

Why may deadlock occur if we mix spin_lock and spin_lock_irq(save)?
If we mix spin_lock and spin_lock_irq(save), and a group of CPUs is
rendezvousing in an IPI handler, we will have a deadlock. Because the
CPU A that takes spin_lock will wait for the rendezvousing condition
to be satisfied, while the CPU B that takes the spin_lock_irq(save)
will not enter into the rendezvousing condition (since it disables
interrupts). Then,
CPU A waits for CPU B to handle the IPI to get out of the
rendezvousing condition (a kind of synchronization point), which won't
happen until it gets the spin_lock.
CPU B waits for CPU A to release the spin_lock, which won't happen
until it gets out of the rendezvousing condition;

Are my understanding and summary correct?

Thanks and Best Regards,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization
  2016-03-11 12:36         ` Xu, Quan
  2016-03-11 13:58           ` Dario Faggioli
@ 2016-03-11 14:49           ` Meng Xu
  2016-03-11 15:55             ` Dario Faggioli
  1 sibling, 1 reply; 20+ messages in thread
From: Meng Xu @ 2016-03-11 14:49 UTC (permalink / raw)
  To: Xu, Quan; +Cc: Dario Faggioli, Jan Beulich, Suravee Suthikulpanit, xen-devel

>
>> In fact, the deadlock arises because IRQs interrupt asynchronously what the
>> CPU is doing, and that can happen when the CPU has taken the lock already. But
>> if the 'asynchronous' part goes away, we really don't care whether a lock is taken
>> at time t1 with IRQ enabled, and at time t2 with IRQ disabled, don't you think?
>>
>
> Yes.
> Consistency may be helpful to avoid some easy-to-avoid lock errors.
> Moreover, without my fix, I think it would not lead to deadlock, as the pcidevs_lock is not being taken
> in IRQ context. Right?

I think without your fix, the deadlock may still happen due to the
rendezvous condition.
       | CPU A                        | CPU B                | CPU C
Step 1 | spin_lock                    |                      |
Step 2 |                              | spin_lock_irq        |
Step 3 |                              | wait for A to unlock |
Step 4 |                              |                      | send rendezvous IPI to A and B
Step 5 | receive IPI                  | wait for A to unlock |
Step 6 | wait for B to handle the IPI | wait for A to unlock |
Step 7 | spin_unlock                  |                      |


Deadlock occurs at Step 6, IMO.

Unless we can prove that rendezvous won't happen while
spin_lock_irqsave is taken, we have the deadlock hazard.

Actually, I'm not quite sure when the rendezvous condition may happen
in this situation. So probably, in reality, we don't have the
rendezvous condition.
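
For concreteness, the rendezvous pattern I have in mind looks roughly
like this (a hypothetical sketch, not Xen's actual rendezvous code):

    static atomic_t entered;

    /* IPI handler: every online CPU must get here before any of them
     * is released. */
    static void rendezvous_ipi(void)
    {
        atomic_inc(&entered);
        while ( atomic_read(&entered) < num_online_cpus() )
            cpu_relax();
        /* ... synchronized work ... */
    }

A CPU spinning on a lock with IRQs disabled (CPU B above) never runs
this handler, so a lock holder that did take the IPI (CPU A) spins in
it forever.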

>
>
> For deadlock, I think the key problems are:
>   - A lock can be acquired from IRQ context
>   -The interrupt is delivered to the _same_CPU_ that already holds the lock.
>
>

This is one type of deadlock, not the one due to the rendezvous
condition, I think. :-)

Thanks and Best Regards,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization
  2016-03-11 14:49           ` Meng Xu
@ 2016-03-11 15:55             ` Dario Faggioli
  2016-03-11 17:17               ` Meng Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Dario Faggioli @ 2016-03-11 15:55 UTC (permalink / raw)
  To: Meng Xu, Xu, Quan; +Cc: Suravee Suthikulpanit, Jan Beulich, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 4599 bytes --]

On Fri, 2016-03-11 at 09:49 -0500, Meng Xu wrote:
> > Yes.
> > Consistency may be helpful to avoid some easy-to-avoid lock errors.
> > Moreover, without my fix, I think it would not lead to deadlock, as
> > the pcidevs_lock is not being taken
> > in IRQ context. Right?
> I think without your fix, the deadlock may still happen due to the
> rendezvous condition.
>        | CPU A                        | CPU B                | CPU C
> Step 1 | spin_lock                    |                      |
> Step 2 |                              | spin_lock_irq        |
> Step 3 |                              | wait for A to unlock |
> Step 4 |                              |                      | send rendezvous IPI to A and B
> Step 5 | receive IPI                  | wait for A to unlock |
> Step 6 | wait for B to handle the IPI | wait for A to unlock |
> Step 7 | spin_unlock                  |                      |
> 
> 
> Deadlock occurs at Step 6, IMO.
> 
> Unless we can prove that rendezvous won't happen while
> spin_lock_irqsave is taken, we have the deadlock hazard.
> 
Yes. But, in the case of Quan's patch (without it, I mean), have you
seen where in the code it is that we use spin_lock_irqsave()?

It's inside a function that is called during Xen boot, whose callchain
starts with iommu_setup(), from __start_xen(). Here's a (big, sorry)
code snippet of what is around iommu_setup():

    ...
    init_idle_domain();

    this_cpu(stubs.addr) = alloc_stub_page(smp_processor_id(),
                                           &this_cpu(stubs).mfn);
    BUG_ON(!this_cpu(stubs.addr));

    trap_init();
    rcu_init();

    early_time_init();

    arch_init_memory();

    alternative_instructions();

    local_irq_enable();

    pt_pci_init();

    vesa_mtrr_init();

    acpi_mmcfg_init();

    early_msi_init();

    iommu_setup();    /* setup iommu if available */

    smp_prepare_cpus(max_cpus);

    spin_debug_enable();

    /*
     * Initialise higher-level timer functions. We do this fairly late
     * (after interrupts got enabled) because the time bases and scale
     * factors need to be updated regularly.
     */
    init_xen_time();
    initialize_keytable();
    console_init_postirq();

    system_state = SYS_STATE_smp_boot;
    do_presmp_initcalls();
    for_each_present_cpu ( i )
    {
        /* Set up cpu_to_node[]. */
        srat_detect_node(i);
        /* Set up node_to_cpumask based on cpu_to_node[]. */
        numa_add_cpu(i);

        if ( (num_online_cpus() < max_cpus) && !cpu_online(i) )
        {
            int ret = cpu_up(i);
            if ( ret != 0 )
                printk("Failed to bring up CPU %u (error %d)\n", i, ret);
        }
    }
    printk("Brought up %ld CPUs\n", (long)num_online_cpus());
    ...

As you can see, it is only *after* iommu_setup() that we call functions
like smp_prepare_cpus(), do_presmp_initcalls(), and then the loop that
waits for all the present CPUs to come online.

What that means is that, at iommu_setup() time, there still is only one
CPU online, and there is not much chance that a single CPU deadlocks
in a rendezvous!

Honestly, the biggest issue that I think Quan's patch solves, is that
if we ever want/manage to move spin_debug_enable() up above it, then
the BUG_ON in check_lock() would trigger the first time that
pcidevs_lock would be taken with interrupts enabled.

Until then, the code is technically fine, and, as a matter of fact, I think
that removing the locking from that particular instance would be an
equally effective fix!

All that being said, consistency is indeed important, and for the sake
of it and for other reasons too, even if, strictly speaking, there
isn't any actual buggy behavior to be fixed here, it is worthwhile
conforming to a locking pattern that is consistent with the rules that
we set ourselves, unless there's a specific reason not to.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization
  2016-03-11 14:41         ` Meng Xu
@ 2016-03-11 16:12           ` Dario Faggioli
  0 siblings, 0 replies; 20+ messages in thread
From: Dario Faggioli @ 2016-03-11 16:12 UTC (permalink / raw)
  To: Meng Xu; +Cc: Xu, Quan, Jan Beulich, Suravee Suthikulpanit, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3214 bytes --]

On Fri, 2016-03-11 at 09:41 -0500, Meng Xu wrote:
> Thank you very much for your explanation and education! They are
> really helpful! :-D
> 
> Let me summarize: ;-)
> > 
> >                                                      | spin_lock |
> spin_lock_irq | spin_lock_irqsave
> > 
> > Can run in irq context?                  | No            |  Yes
>         | Yes
> > 
> > Can run in irq disabled context?   |
> > No            |  No                | Yes
>
The table came out mangled, at least in my mail reader. From what I
can see, I think I get the point you're trying to make, and it's hard
to say whether the table itself is right or wrong...

Probably, a table like this, is just not the best way with which to try
to represent things.

For instance, you seem to be saying that spin_lock() can't be used if
interrupts are disabled, which is not true.

For instance, it is perfectly fine to do the following:

    CPU:
    .
    spin_lock_irq(l1);
    .
    .
[1] spin_lock(l2);
    .
    .
[2] spin_unlock(l2);
    .
    .
    spin_unlock_irq(l1);
    .

Even if l2 is also taken in an interrupt handler. In fact, what we want
is to make sure that an interrupt handler that may take l2 does not
interrupt the CPU in the [1]--[2] time frame... But that can't happen
because interrupts are already disabled, so just using spin_lock() is
ok, and should be done, as that's cheaper than spin_lock_irqsave().

Fact is, what really matters is whether or not a lock is taken both
inside and outside of IRQ context. As a matter of fact, it is mostly
the case that, if a lock is ever taken inside an interrupt handler,
then spin_lock_irqsave() is what you want... but this does not justify
coming up with a table like the one above and claiming it to be
general.

> Why deadlock may occur if we mix the spin_lock and
> spin_lock_irq(save)?
> If we mix the spin_lock and spin_lock_irq(save), and a group of CPUs
> rendezvousing in an IPI handler, we will have deadlock. Because the
> CPU A that takes spin_lock will wait for the rendezvousing condition
> to be satisfied, while the CPU B that takes the spin_lock_irq(save)
> will not enter into the rendezvousing condition (since it disables
> interrupts). Then,
> CPU A waits for CPU B to handle the IPI to get out of the
> rendezvousing condition (a kind of synchronization point), which
> won't happen until it gets the spin_lock.
> CPU B waits for CPU A to release the spin_lock, which won't happen
> until it gets out of the rendezvousing condition;
> 
> Are my understanding and summary correct?
> 
Yes, this looks like a decent explanation of the rendezvous-related spinlock
problem... Although I prefer the wording that we do have already in
check_lock() and in start_secondary(). :-D

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization
  2016-03-11 15:55             ` Dario Faggioli
@ 2016-03-11 17:17               ` Meng Xu
  0 siblings, 0 replies; 20+ messages in thread
From: Meng Xu @ 2016-03-11 17:17 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Suravee Suthikulpanit, Jan Beulich, Xu, Quan, xen-devel

On Fri, Mar 11, 2016 at 10:55 AM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On Fri, 2016-03-11 at 09:49 -0500, Meng Xu wrote:
>> > Yes.
>> > Consistency may be helpful to avoid some easy-to-avoid lock errors.
>> > Moreover, without my fix, I think it would not lead to deadlock, as
>> > the pcidevs_lock is not being taken
>> > in IRQ context. Right?
>> I think without your fix, the deadlock may still happen due to the
>> rendezvous condition.
>>        | CPU A                        | CPU B                | CPU C
>> Step 1 | spin_lock                    |                      |
>> Step 2 |                              | spin_lock_irq        |
>> Step 3 |                              | wait for A to unlock |
>> Step 4 |                              |                      | send rendezvous IPI to A and B
>> Step 5 | receive IPI                  | wait for A to unlock |
>> Step 6 | wait for B to handle the IPI | wait for A to unlock |
>> Step 7 | spin_unlock                  |                      |
>>
>>
>> Deadlock occurs at Step 6, IMO.
>>
>> Unless we can prove that rendezvous won't happen while
>> spin_lock_irqsave is taken, we have the deadlock hazard.
>>
> Yes. But, in the case of Quan's patch (without it, I mean), have you
> seen where in the code it is that we use spin_lock_irqsave()?
>
> It's inside a function that is called during Xen boot, whose callchain
> starts with iommu_setup(), from __start_xen(). Here's a (big, sorry)
> code snippet of what is around iommu_setup():
>
>     ...
>     init_idle_domain();
>
>     this_cpu(stubs.addr) = alloc_stub_page(smp_processor_id(),
>                                            &this_cpu(stubs).mfn);
>     BUG_ON(!this_cpu(stubs.addr));
>
>     trap_init();
>     rcu_init();
>
>     early_time_init();
>
>     arch_init_memory();
>
>     alternative_instructions();
>
>     local_irq_enable();
>
>     pt_pci_init();
>
>     vesa_mtrr_init();
>
>     acpi_mmcfg_init();
>
>     early_msi_init();
>
>     iommu_setup();    /* setup iommu if available */
>
>     smp_prepare_cpus(max_cpus);
>
>     spin_debug_enable();
>
>     /*
>      * Initialise higher-level timer functions. We do this fairly late
>      * (after interrupts got enabled) because the time bases and scale
>      * factors need to be updated regularly.
>      */
>     init_xen_time();
>     initialize_keytable();
>     console_init_postirq();
>
>     system_state = SYS_STATE_smp_boot;
>     do_presmp_initcalls();
>     for_each_present_cpu ( i )
>     {
>         /* Set up cpu_to_node[]. */
>         srat_detect_node(i);
>         /* Set up node_to_cpumask based on cpu_to_node[]. */
>         numa_add_cpu(i);
>
>         if ( (num_online_cpus() < max_cpus) && !cpu_online(i) )
>         {
>             int ret = cpu_up(i);
>             if ( ret != 0 )
>                 printk("Failed to bring up CPU %u (error %d)\n", i, ret);
>         }
>     }
>     printk("Brought up %ld CPUs\n", (long)num_online_cpus());
>     ...
>
> As you can see, it is only *after* iommu_setup() that we call functions
> like smp_prepare_cpus(), do_presmp_initcalls(), and then the loop that
> waits for all the present CPUs to come online.
>
> What that means is that, at iommu_setup() time, there still is only one
> CPU online, and there is not much chance that a single CPU deadlocks
> in a rendezvous!
>
> Honestly, the biggest issue that I think Quan's patch solves, is that
> if we ever want/manage to move spin_debug_enable() up above it, then
> the BUG_ON in check_lock() would trigger the first time that
> pcidevs_lock would be taken with interrupts enabled.

Right! I got it now.. :-)
The lock is initialized as SPIN_LOCK_UNLOCKED.
check_lock() will trigger the BUG_ON when the lock is used in an
interrupt-disabled situation.
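
If I read the code right, the debug state starts out 'unlatched' and
the first check_lock() latches it -- a sketch, not the verbatim
definitions:

    struct lock_debug {
        int irq_safe; /* +1: IRQ-safe; 0: IRQ-unsafe; -1: unseen */
    };
    #define _LOCK_DEBUG { -1 }
    #define SPIN_LOCK_UNLOCKED { /* ..., */ _LOCK_DEBUG }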

>
> Until then, the code is technically fine, and, as a matter of fact, I think
> that removing the locking from that particular instance would be an
> equally effective fix!
>
> All that being said, consistency is indeed important, and for the sake
> of it and for other reasons too, even if, strictly speaking, there
> isn't any actual buggy behavior to be fixed here, it is worthwhile
> conforming to a locking pattern that is consistent with the rules that
> we set ourselves, unless there's a specific reason not to.

I see. Even though no deadlock may happen, consistency can be the
reason to modify. :-)
But it's good to know the real reason for making the change, which
could be avoiding the deadlock, consistency, or both.
It seems to me that this is more for consistency, and for avoiding the
potential deadlock (although it's not certain it would ever actually
happen, it could be a hazard).

Thank you, Dario and Quan, very much for your explanation! :-) They
are really helpful! :-D

BTW, I'm sorry for the messed format of the table. Let me reformat it here:

Condition 1 (C1): Can run in irq context?
Condition 2 (C2): Can run in irq-disabled context?

   | spin_lock | spin_lock_irq | spin_lock_irqsave
C1 | No        | Yes           | Yes
C2 | No        | No            | Yes

It may be too hasty (or incorrect) to draw a generalized conclusion,
but it should give an overview of the differences among these three
lock variants.
Just for reference later. ;-)

Thanks,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-03-11 17:17 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-09 13:17 [PATCH v3 0/2] Make the pcidevs_lock a recursive one Quan Xu
2016-03-09 13:17 ` [PATCH v3 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization Quan Xu
2016-03-09 14:59   ` Dario Faggioli
2016-03-10  6:12     ` Xu, Quan
2016-03-11  3:24   ` Meng Xu
2016-03-11  6:54     ` Xu, Quan
2016-03-11 10:35       ` Dario Faggioli
2016-03-11 12:36         ` Xu, Quan
2016-03-11 13:58           ` Dario Faggioli
2016-03-11 14:49           ` Meng Xu
2016-03-11 15:55             ` Dario Faggioli
2016-03-11 17:17               ` Meng Xu
2016-03-11 14:41         ` Meng Xu
2016-03-11 16:12           ` Dario Faggioli
2016-03-09 13:17 ` [PATCH v3 2/2] IOMMU/spinlock: Make the pcidevs_lock a recursive one Quan Xu
2016-03-09 17:45   ` Dario Faggioli
2016-03-10  1:21     ` Xu, Quan
2016-03-10  9:52   ` Jan Beulich
2016-03-10 11:27     ` Xu, Quan
2016-03-10 13:06       ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).