xen-devel.lists.xenproject.org archive mirror
* [PATCH v2 0/2] Make the pcidevs_lock a recursive one
@ 2016-03-09  3:08 Quan Xu
  2016-03-09  3:08 ` [PATCH v2 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization Quan Xu
  2016-03-09  3:08 ` [PATCH v2 2/2] IOMMU/spinlock: Make the pcidevs_lock a recursive one Quan Xu
  0 siblings, 2 replies; 16+ messages in thread
From: Quan Xu @ 2016-03-09  3:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Feng Wu, Jan Beulich, Andrew Cooper, Dario Faggioli,
	Suravee Suthikulpanit, Quan Xu, Keir Fraser

This patch set makes the pcidevs_lock a recursive one. It is a prerequisite
for the 'VT-d Device-TLB flush issue' series, as pcidevs_lock may need to be
taken recursively when hiding an ATS device after a VT-d Device-TLB flush
times out.

In detail:
1. Fix a bug found in AMD IOMMU initialization.

  When iommu_setup() is called in __start_xen(), interrupts have
  already been enabled, and nothing disables (without reenabling)
  them again in the path that leads to calling
  set_iommu_interrupt_handler(). There's therefore no risk of them
  being disabled and needing to remain so, and hence no need for
  saving and restoring the flags. This means that, even if interrupts
  did need to be disabled when taking pcidevs_lock, spin_lock_irq()
  and spin_unlock_irq() could be used.

  Inside set_iommu_interrupt_handler(), pcidevs_lock serializes calls
  to pci_get_pdev(), which does not require interrupts to be disabled
  during its execution. In fact, pcidevs_lock is always (except for
  this case) taken without disabling interrupts, and disabling them in
  this case would make the BUG_ON() in check_lock() trigger, if it
  wasn't for the fact that spinlock debugging checks are still
  disabled during initialization, when iommu_setup() (which then ends
  up calling set_iommu_interrupt_handler()) is called.

  In order to fix this, we can use just plain spin_lock() and spin_unlock(),
  instead of spin_lock_irqsave() and spin_unlock_irqrestore().

2.  Make the pcidevs_lock a recursive one.
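
For reference, here is a minimal sketch of the nested usage that patch 2/2
enables. The pcidevs_lock()/pcidevs_unlock() wrappers are the ones introduced
there; the calling functions and the flush-timeout path are purely
illustrative, not code from this series:

    /* Hypothetical sketch: hiding an ATS device needs pcidevs_lock, and may
     * be reached while the lock is already held on the flush path. */
    static void example_hide_ats_device(struct pci_dev *pdev)
    {
        pcidevs_lock();      /* nested acquisition: fine once recursive */
        /* ... hide the device ... */
        pcidevs_unlock();
    }

    static void example_device_tlb_flush(struct pci_dev *pdev)
    {
        pcidevs_lock();      /* first acquisition */
        /* ... issue the Device-TLB flush; if it times out ... */
        example_hide_ats_device(pdev);
        pcidevs_unlock();    /* lock is only dropped by the outermost unlock */
    }

With a plain spinlock the inner pcidevs_lock() would deadlock on the same CPU;
with spin_lock_recursive() the owner may re-acquire the lock, and it is
released only when the recursion count drops back to zero.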

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
CC: Feng Wu <feng.wu@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Dario Faggioli <dario.faggioli@citrix.com>

Changes in v2:
 * Enhance patch1/2 changelog.
 * Rebase against 1bd52e1fd66c4.


Quan Xu (2):
  IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization.
  IOMMU/spinlock: Make the pcidevs_lock a recursive one

 xen/arch/x86/domctl.c                       |  8 +--
 xen/arch/x86/hvm/vmsi.c                     |  4 +-
 xen/arch/x86/irq.c                          |  8 +--
 xen/arch/x86/msi.c                          | 16 ++---
 xen/arch/x86/pci.c                          |  4 +-
 xen/arch/x86/physdev.c                      | 16 ++---
 xen/common/sysctl.c                         |  4 +-
 xen/drivers/passthrough/amd/iommu_init.c    |  9 ++-
 xen/drivers/passthrough/amd/iommu_map.c     |  2 +-
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  4 +-
 xen/drivers/passthrough/pci.c               | 96 +++++++++++++++++------------
 xen/drivers/passthrough/vtd/intremap.c      |  2 +-
 xen/drivers/passthrough/vtd/iommu.c         | 14 ++---
 xen/drivers/video/vga.c                     |  4 +-
 xen/include/xen/pci.h                       |  5 +-
 15 files changed, 109 insertions(+), 87 deletions(-)

-- 
1.9.1



* [PATCH v2 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization.
  2016-03-09  3:08 [PATCH v2 0/2] Make the pcidevs_lock a recursive one Quan Xu
@ 2016-03-09  3:08 ` Quan Xu
  2016-03-09  5:19   ` Tian, Kevin
  2016-03-09  3:08 ` [PATCH v2 2/2] IOMMU/spinlock: Make the pcidevs_lock a recursive one Quan Xu
  1 sibling, 1 reply; 16+ messages in thread
From: Quan Xu @ 2016-03-09  3:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Dario Faggioli, Jan Beulich, Suravee Suthikulpanit, Quan Xu

When iommu_setup() is called in __start_xen(), interrupts have
already been enabled, and nothing disables (without reenabling)
them again in the path that leads to calling
set_iommu_interrupt_handler(). There's therefore no risk of them
being disabled and needing to remain so, and hence no need for
saving and restoring the flags. This means that, even if interrupts
did need to be disabled when taking pcidevs_lock, spin_lock_irq()
and spin_unlock_irq() could be used.

Inside set_iommu_interrupt_handler(), pcidevs_lock serializes calls
to pci_get_pdev(), which does not require interrupts to be disabled
during its execution. In fact, pcidevs_lock is always (except for
this case) taken without disabling interrupts, and disabling them in
this case would make the BUG_ON() in check_lock() trigger, if it
wasn't for the fact that spinlock debugging checks are still
disabled during initialization, when iommu_setup() (which then ends
up calling set_iommu_interrupt_handler()) is called.

In order to fix this, we can use just plain spin_lock() and spin_unlock(),
instead of spin_lock_irqsave() and spin_unlock_irqrestore().

Signed-off-by: Quan Xu <quan.xu@intel.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
CC: Dario Faggioli <dario.faggioli@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 xen/drivers/passthrough/amd/iommu_init.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
index d90a2d2..a400497 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -778,7 +778,6 @@ static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
 {
     int irq, ret;
     hw_irq_controller *handler;
-    unsigned long flags;
     u16 control;
 
     irq = create_irq(NUMA_NO_NODE);
@@ -788,10 +787,10 @@ static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
         return 0;
     }
 
-    spin_lock_irqsave(&pcidevs_lock, flags);
+    spin_lock(&pcidevs_lock);
     iommu->msi.dev = pci_get_pdev(iommu->seg, PCI_BUS(iommu->bdf),
                                   PCI_DEVFN2(iommu->bdf));
-    spin_unlock_irqrestore(&pcidevs_lock, flags);
+    spin_unlock(&pcidevs_lock);
     if ( !iommu->msi.dev )
     {
         AMD_IOMMU_DEBUG("IOMMU: no pdev for %04x:%02x:%02x.%u\n",
-- 
1.9.1



* [PATCH v2 2/2] IOMMU/spinlock: Make the pcidevs_lock a recursive one
  2016-03-09  3:08 [PATCH v2 0/2] Make the pcidevs_lock a recursive one Quan Xu
  2016-03-09  3:08 ` [PATCH v2 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization Quan Xu
@ 2016-03-09  3:08 ` Quan Xu
  2016-03-09  5:21   ` Tian, Kevin
  1 sibling, 1 reply; 16+ messages in thread
From: Quan Xu @ 2016-03-09  3:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Feng Wu, Jan Beulich, Andrew Cooper, Dario Faggioli,
	Suravee Suthikulpanit, Quan Xu, Keir Fraser

Signed-off-by: Quan Xu <quan.xu@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
CC: Feng Wu <feng.wu@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Dario Faggioli <dario.faggioli@citrix.com>
---
 xen/arch/x86/domctl.c                       |  8 +--
 xen/arch/x86/hvm/vmsi.c                     |  4 +-
 xen/arch/x86/irq.c                          |  8 +--
 xen/arch/x86/msi.c                          | 16 ++---
 xen/arch/x86/pci.c                          |  4 +-
 xen/arch/x86/physdev.c                      | 16 ++---
 xen/common/sysctl.c                         |  4 +-
 xen/drivers/passthrough/amd/iommu_init.c    |  8 +--
 xen/drivers/passthrough/amd/iommu_map.c     |  2 +-
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  4 +-
 xen/drivers/passthrough/pci.c               | 96 +++++++++++++++++------------
 xen/drivers/passthrough/vtd/intremap.c      |  2 +-
 xen/drivers/passthrough/vtd/iommu.c         | 14 ++---
 xen/drivers/video/vga.c                     |  4 +-
 xen/include/xen/pci.h                       |  5 +-
 15 files changed, 109 insertions(+), 86 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 55aecdc..b34a295 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -472,9 +472,9 @@ long arch_do_domctl(
         ret = -ESRCH;
         if ( iommu_enabled )
         {
-            spin_lock(&pcidevs_lock);
+            pcidevs_lock();
             ret = pt_irq_create_bind(d, bind);
-            spin_unlock(&pcidevs_lock);
+            pcidevs_unlock();
         }
         if ( ret < 0 )
             printk(XENLOG_G_ERR "pt_irq_create_bind failed (%ld) for dom%d\n",
@@ -497,9 +497,9 @@ long arch_do_domctl(
 
         if ( iommu_enabled )
         {
-            spin_lock(&pcidevs_lock);
+            pcidevs_lock();
             ret = pt_irq_destroy_bind(d, bind);
-            spin_unlock(&pcidevs_lock);
+            pcidevs_unlock();
         }
         if ( ret < 0 )
             printk(XENLOG_G_ERR "pt_irq_destroy_bind failed (%ld) for dom%d\n",
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 499151e..b41aba5 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -388,7 +388,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq *pirq, uint64_t gtable)
     struct msixtbl_entry *entry, *new_entry;
     int r = -EINVAL;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     ASSERT(spin_is_locked(&d->event_lock));
 
     if ( !has_vlapic(d) )
@@ -446,7 +446,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
     struct pci_dev *pdev;
     struct msixtbl_entry *entry;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     ASSERT(spin_is_locked(&d->event_lock));
 
     if ( !has_vlapic(d) )
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 1228568..78e4263 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1960,7 +1960,7 @@ int map_domain_pirq(
         struct pci_dev *pdev;
         unsigned int nr = 0;
 
-        ASSERT(spin_is_locked(&pcidevs_lock));
+        ASSERT(pcidevs_is_locked());
 
         ret = -ENODEV;
         if ( !cpu_has_apic )
@@ -2105,7 +2105,7 @@ int unmap_domain_pirq(struct domain *d, int pirq)
     if ( (pirq < 0) || (pirq >= d->nr_pirqs) )
         return -EINVAL;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     ASSERT(spin_is_locked(&d->event_lock));
 
     info = pirq_info(d, pirq);
@@ -2231,7 +2231,7 @@ void free_domain_pirqs(struct domain *d)
 {
     int i;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     spin_lock(&d->event_lock);
 
     for ( i = 0; i < d->nr_pirqs; i++ )
@@ -2239,7 +2239,7 @@ void free_domain_pirqs(struct domain *d)
             unmap_domain_pirq(d, i);
 
     spin_unlock(&d->event_lock);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 }
 
 static void dump_irqs(unsigned char key)
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index 5a481f6..a6018a2 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -694,7 +694,7 @@ static int msi_capability_init(struct pci_dev *dev,
     u8 slot = PCI_SLOT(dev->devfn);
     u8 func = PCI_FUNC(dev->devfn);
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     pos = pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSI);
     if ( !pos )
         return -ENODEV;
@@ -852,7 +852,7 @@ static int msix_capability_init(struct pci_dev *dev,
     u8 func = PCI_FUNC(dev->devfn);
     bool_t maskall = msix->host_maskall;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     control = pci_conf_read16(seg, bus, slot, func, msix_control_reg(pos));
     /*
@@ -1042,7 +1042,7 @@ static int __pci_enable_msi(struct msi_info *msi, struct msi_desc **desc)
     struct pci_dev *pdev;
     struct msi_desc *old_desc;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     pdev = pci_get_pdev(msi->seg, msi->bus, msi->devfn);
     if ( !pdev )
         return -ENODEV;
@@ -1103,7 +1103,7 @@ static int __pci_enable_msix(struct msi_info *msi, struct msi_desc **desc)
     u8 func = PCI_FUNC(msi->devfn);
     struct msi_desc *old_desc;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     pdev = pci_get_pdev(msi->seg, msi->bus, msi->devfn);
     pos = pci_find_cap_offset(msi->seg, msi->bus, slot, func, PCI_CAP_ID_MSIX);
     if ( !pdev || !pos )
@@ -1205,7 +1205,7 @@ int pci_prepare_msix(u16 seg, u8 bus, u8 devfn, bool_t off)
     if ( !pos )
         return -ENODEV;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pdev = pci_get_pdev(seg, bus, devfn);
     if ( !pdev )
         rc = -ENODEV;
@@ -1224,7 +1224,7 @@ int pci_prepare_msix(u16 seg, u8 bus, u8 devfn, bool_t off)
         rc = msix_capability_init(pdev, pos, NULL, NULL,
                                   multi_msix_capable(control));
     }
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return rc;
 }
@@ -1235,7 +1235,7 @@ int pci_prepare_msix(u16 seg, u8 bus, u8 devfn, bool_t off)
  */
 int pci_enable_msi(struct msi_info *msi, struct msi_desc **desc)
 {
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     if ( !use_msi )
         return -EPERM;
@@ -1351,7 +1351,7 @@ int pci_restore_msi_state(struct pci_dev *pdev)
     unsigned int type = 0, pos = 0;
     u16 control = 0;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     if ( !use_msi )
         return -EOPNOTSUPP;
diff --git a/xen/arch/x86/pci.c b/xen/arch/x86/pci.c
index 4b87cab..a9decd4 100644
--- a/xen/arch/x86/pci.c
+++ b/xen/arch/x86/pci.c
@@ -88,13 +88,13 @@ int pci_conf_write_intercept(unsigned int seg, unsigned int bdf,
     if ( reg < 64 || reg >= 256 )
         return 0;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
 
     pdev = pci_get_pdev(seg, PCI_BUS(bdf), PCI_DEVFN2(bdf));
     if ( pdev )
         rc = pci_msi_conf_write_intercept(pdev, reg, size, data);
 
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return rc;
 }
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 57b7800..1cb9b58 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -167,7 +167,7 @@ int physdev_map_pirq(domid_t domid, int type, int *index, int *pirq_p,
         goto free_domain;
     }
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     /* Verify or get pirq. */
     spin_lock(&d->event_lock);
     pirq = domain_irq_to_pirq(d, irq);
@@ -237,7 +237,7 @@ int physdev_map_pirq(domid_t domid, int type, int *index, int *pirq_p,
 
  done:
     spin_unlock(&d->event_lock);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
     if ( ret != 0 )
         switch ( type )
         {
@@ -275,11 +275,11 @@ int physdev_unmap_pirq(domid_t domid, int pirq)
             goto free_domain;
     }
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     spin_lock(&d->event_lock);
     ret = unmap_domain_pirq(d, pirq);
     spin_unlock(&d->event_lock);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
  free_domain:
     rcu_unlock_domain(d);
@@ -689,10 +689,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&restore_msi, arg, 1) != 0 )
             break;
 
-        spin_lock(&pcidevs_lock);
+        pcidevs_lock();
         pdev = pci_get_pdev(0, restore_msi.bus, restore_msi.devfn);
         ret = pdev ? pci_restore_msi_state(pdev) : -ENODEV;
-        spin_unlock(&pcidevs_lock);
+        pcidevs_unlock();
         break;
     }
 
@@ -704,10 +704,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&dev, arg, 1) != 0 )
             break;
 
-        spin_lock(&pcidevs_lock);
+        pcidevs_lock();
         pdev = pci_get_pdev(dev.seg, dev.bus, dev.devfn);
         ret = pdev ? pci_restore_msi_state(pdev) : -ENODEV;
-        spin_unlock(&pcidevs_lock);
+        pcidevs_unlock();
         break;
     }
 
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 58162f5..253b7c8 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -426,7 +426,7 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
                 break;
             }
 
-            spin_lock(&pcidevs_lock);
+            pcidevs_lock();
             pdev = pci_get_pdev(dev.seg, dev.bus, dev.devfn);
             if ( !pdev )
                 node = XEN_INVALID_DEV;
@@ -434,7 +434,7 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
                 node = XEN_INVALID_NODE_ID;
             else
                 node = pdev->node;
-            spin_unlock(&pcidevs_lock);
+            pcidevs_unlock();
 
             if ( copy_to_guest_offset(ti->nodes, i, &node, 1) )
             {
diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
index a400497..4536106 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -673,9 +673,9 @@ void parse_ppr_log_entry(struct amd_iommu *iommu, u32 entry[])
     bus = PCI_BUS(device_id);
     devfn = PCI_DEVFN2(device_id);
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pdev = pci_get_real_pdev(iommu->seg, bus, devfn);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     if ( pdev )
         guest_iommu_add_ppr_log(pdev->domain, entry);
@@ -787,10 +787,10 @@ static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
         return 0;
     }
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     iommu->msi.dev = pci_get_pdev(iommu->seg, PCI_BUS(iommu->bdf),
                                   PCI_DEVFN2(iommu->bdf));
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
     if ( !iommu->msi.dev )
     {
         AMD_IOMMU_DEBUG("IOMMU: no pdev for %04x:%02x:%02x.%u\n",
diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index 78862c9..9d91dfc 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -593,7 +593,7 @@ static int update_paging_mode(struct domain *d, unsigned long gfn)
         hd->arch.paging_mode = level;
         hd->arch.root_table = new_root;
 
-        if ( !spin_is_locked(&pcidevs_lock) )
+        if ( !pcidevs_is_locked() )
             AMD_IOMMU_DEBUG("%s Try to access pdev_list "
                             "without aquiring pcidevs_lock.\n", __func__);
 
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index c1c0b6b..dc3bfab 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -158,7 +158,7 @@ static void amd_iommu_setup_domain_device(
 
     spin_unlock_irqrestore(&iommu->lock, flags);
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     if ( pci_ats_device(iommu->seg, bus, pdev->devfn) &&
          !pci_ats_enabled(iommu->seg, bus, pdev->devfn) )
@@ -345,7 +345,7 @@ void amd_iommu_disable_domain_device(struct domain *domain,
     }
     spin_unlock_irqrestore(&iommu->lock, flags);
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     if ( devfn == pdev->devfn &&
          pci_ats_device(iommu->seg, bus, devfn) &&
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 62b311b..e09f7d1 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -48,7 +48,7 @@ struct pci_seg {
     } bus2bridge[MAX_BUSES];
 };
 
-spinlock_t pcidevs_lock = SPIN_LOCK_UNLOCKED;
+static spinlock_t _pcidevs_lock = SPIN_LOCK_UNLOCKED;
 static struct radix_tree_root pci_segments;
 
 static inline struct pci_seg *get_pseg(u16 seg)
@@ -103,6 +103,26 @@ static int pci_segments_iterate(
     return rc;
 }
 
+void pcidevs_lock(void)
+{
+    spin_lock_recursive(&_pcidevs_lock);
+}
+
+void pcidevs_unlock(void)
+{
+    spin_unlock_recursive(&_pcidevs_lock);
+}
+
+int pcidevs_is_locked(void)
+{
+    return spin_is_locked(&_pcidevs_lock);
+}
+
+int pcidevs_trylock(void)
+{
+    return spin_trylock_recursive(&_pcidevs_lock);
+}
+
 void __init pt_pci_init(void)
 {
     radix_tree_init(&pci_segments);
@@ -412,14 +432,14 @@ int __init pci_hide_device(int bus, int devfn)
     struct pci_dev *pdev;
     int rc = -ENOMEM;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pdev = alloc_pdev(get_pseg(0), bus, devfn);
     if ( pdev )
     {
         _pci_hide_device(pdev);
         rc = 0;
     }
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return rc;
 }
@@ -456,7 +476,7 @@ struct pci_dev *pci_get_pdev(int seg, int bus, int devfn)
     struct pci_seg *pseg = get_pseg(seg);
     struct pci_dev *pdev = NULL;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     ASSERT(seg != -1 || bus == -1);
     ASSERT(bus != -1 || devfn == -1);
 
@@ -581,9 +601,9 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
         pdev_type = "extended function";
     else if (info->is_virtfn)
     {
-        spin_lock(&pcidevs_lock);
+        pcidevs_lock();
         pdev = pci_get_pdev(seg, info->physfn.bus, info->physfn.devfn);
-        spin_unlock(&pcidevs_lock);
+        pcidevs_unlock();
         if ( !pdev )
             pci_add_device(seg, info->physfn.bus, info->physfn.devfn,
                            NULL, node);
@@ -601,7 +621,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
 
     ret = -ENOMEM;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pseg = alloc_pseg(seg);
     if ( !pseg )
         goto out;
@@ -703,7 +723,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
     pci_enable_acs(pdev);
 
 out:
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
     if ( !ret )
     {
         printk(XENLOG_DEBUG "PCI add %s %04x:%02x:%02x.%u\n", pdev_type,
@@ -735,7 +755,7 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
     if ( !pseg )
         return -ENODEV;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
         if ( pdev->bus == bus && pdev->devfn == devfn )
         {
@@ -749,7 +769,7 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
             break;
         }
 
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
     return ret;
 }
 
@@ -807,11 +827,11 @@ int pci_release_devices(struct domain *d)
     u8 bus, devfn;
     int ret;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     ret = pci_clean_dpci_irqs(d);
     if ( ret )
     {
-        spin_unlock(&pcidevs_lock);
+        pcidevs_unlock();
         return ret;
     }
     while ( (pdev = pci_get_pdev_by_domain(d, -1, -1, -1)) )
@@ -823,7 +843,7 @@ int pci_release_devices(struct domain *d)
                    d->domain_id, pdev->seg, bus,
                    PCI_SLOT(devfn), PCI_FUNC(devfn));
     }
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return 0;
 }
@@ -920,7 +940,7 @@ void pci_check_disable_device(u16 seg, u8 bus, u8 devfn)
     s_time_t now = NOW();
     u16 cword;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pdev = pci_get_real_pdev(seg, bus, devfn);
     if ( pdev )
     {
@@ -931,7 +951,7 @@ void pci_check_disable_device(u16 seg, u8 bus, u8 devfn)
         if ( ++pdev->fault.count < PT_FAULT_THRESHOLD )
             pdev = NULL;
     }
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     if ( !pdev )
         return;
@@ -988,9 +1008,9 @@ int __init scan_pci_devices(void)
 {
     int ret;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     ret = pci_segments_iterate(_scan_pci_devices, NULL);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return ret;
 }
@@ -1054,17 +1074,17 @@ static int __hwdom_init _setup_hwdom_pci_devices(struct pci_seg *pseg, void *arg
 
             if ( iommu_verbose )
             {
-                spin_unlock(&pcidevs_lock);
+                pcidevs_unlock();
                 process_pending_softirqs();
-                spin_lock(&pcidevs_lock);
+                pcidevs_lock();
             }
         }
 
         if ( !iommu_verbose )
         {
-            spin_unlock(&pcidevs_lock);
+            pcidevs_unlock();
             process_pending_softirqs();
-            spin_lock(&pcidevs_lock);
+            pcidevs_lock();
         }
     }
 
@@ -1076,9 +1096,9 @@ void __hwdom_init setup_hwdom_pci_devices(
 {
     struct setup_hwdom ctxt = { .d = d, .handler = handler };
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pci_segments_iterate(_setup_hwdom_pci_devices, &ctxt);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 }
 
 #ifdef CONFIG_ACPI
@@ -1206,9 +1226,9 @@ static int _dump_pci_devices(struct pci_seg *pseg, void *arg)
 static void dump_pci_devices(unsigned char ch)
 {
     printk("==== PCI devices ====\n");
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pci_segments_iterate(_dump_pci_devices, NULL);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 }
 
 static int __init setup_dump_pcidevs(void)
@@ -1242,7 +1262,7 @@ int iommu_add_device(struct pci_dev *pdev)
     if ( !pdev->domain )
         return -EINVAL;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     hd = domain_hvm_iommu(pdev->domain);
     if ( !iommu_enabled || !hd->platform_ops )
@@ -1271,7 +1291,7 @@ int iommu_enable_device(struct pci_dev *pdev)
     if ( !pdev->domain )
         return -EINVAL;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     hd = domain_hvm_iommu(pdev->domain);
     if ( !iommu_enabled || !hd->platform_ops ||
@@ -1320,9 +1340,9 @@ static int device_assigned(u16 seg, u8 bus, u8 devfn)
 {
     struct pci_dev *pdev;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     pdev = pci_get_pdev_by_domain(hardware_domain, seg, bus, devfn);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return pdev ? 0 : -EBUSY;
 }
@@ -1344,13 +1364,13 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
              p2m_get_hostp2m(d)->global_logdirty)) )
         return -EXDEV;
 
-    if ( !spin_trylock(&pcidevs_lock) )
+    if ( !pcidevs_trylock() )
         return -ERESTART;
 
     rc = iommu_construct(d);
     if ( rc )
     {
-        spin_unlock(&pcidevs_lock);
+        pcidevs_unlock();
         return rc;
     }
 
@@ -1381,7 +1401,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
  done:
     if ( !has_arch_pdevs(d) && need_iommu(d) )
         iommu_teardown(d);
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return rc;
 }
@@ -1396,7 +1416,7 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
     if ( !iommu_enabled || !hd->platform_ops )
         return -EINVAL;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     pdev = pci_get_pdev_by_domain(d, seg, bus, devfn);
     if ( !pdev )
         return -ENODEV;
@@ -1451,7 +1471,7 @@ static int iommu_get_device_group(
 
     group_id = ops->get_device_group_id(seg, bus, devfn);
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     for_each_pdev( d, pdev )
     {
         if ( (pdev->seg != seg) ||
@@ -1470,14 +1490,14 @@ static int iommu_get_device_group(
 
             if ( unlikely(copy_to_guest_offset(buf, i, &bdf, 1)) )
             {
-                spin_unlock(&pcidevs_lock);
+                pcidevs_unlock();
                 return -1;
             }
             i++;
         }
     }
 
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 
     return i;
 }
@@ -1605,9 +1625,9 @@ int iommu_do_pci_domctl(
         bus = PCI_BUS(machine_sbdf);
         devfn = PCI_DEVFN2(machine_sbdf);
 
-        spin_lock(&pcidevs_lock);
+        pcidevs_lock();
         ret = deassign_device(d, seg, bus, devfn);
-        spin_unlock(&pcidevs_lock);
+        pcidevs_unlock();
         if ( ret )
             printk(XENLOG_G_ERR
                    "deassign %04x:%02x:%02x.%u from dom%d failed (%d)\n",
diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
index 0ee3fb2..f5e3f49 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -984,7 +984,7 @@ int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
 
     spin_unlock_irq(&desc->lock);
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     /*
      * FIXME: For performance reasons we should store the 'iommu' pointer in
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 8022702..8409100 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1282,7 +1282,7 @@ int domain_context_mapping_one(
     u16 seg = iommu->intel->drhd->segment;
     int agaw;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     spin_lock(&iommu->lock);
     maddr = bus_to_context_maddr(iommu, bus);
     context_entries = (struct context_entry *)map_vtd_domain_page(maddr);
@@ -1424,7 +1424,7 @@ static int domain_context_mapping(
     if ( !drhd )
         return -ENODEV;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     switch ( pdev->type )
     {
@@ -1506,7 +1506,7 @@ int domain_context_unmap_one(
     u64 maddr;
     int iommu_domid;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     spin_lock(&iommu->lock);
 
     maddr = bus_to_context_maddr(iommu, bus);
@@ -1816,7 +1816,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
     struct mapped_rmrr *mrmrr;
     struct hvm_iommu *hd = domain_hvm_iommu(d);
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
     ASSERT(rmrr->base_address < rmrr->end_address);
 
     /*
@@ -1881,7 +1881,7 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
     u16 bdf;
     int ret, i;
 
-    ASSERT(spin_is_locked(&pcidevs_lock));
+    ASSERT(pcidevs_is_locked());
 
     if ( !pdev->domain )
         return -EINVAL;
@@ -2109,7 +2109,7 @@ static void __hwdom_init setup_hwdom_rmrr(struct domain *d)
     u16 bdf;
     int ret, i;
 
-    spin_lock(&pcidevs_lock);
+    pcidevs_lock();
     for_each_rmrr_device ( rmrr, bdf, i )
     {
         /*
@@ -2123,7 +2123,7 @@ static void __hwdom_init setup_hwdom_rmrr(struct domain *d)
             dprintk(XENLOG_ERR VTDPREFIX,
                      "IOMMU: mapping reserved region failed\n");
     }
-    spin_unlock(&pcidevs_lock);
+    pcidevs_unlock();
 }
 
 int __init intel_vtd_setup(void)
diff --git a/xen/drivers/video/vga.c b/xen/drivers/video/vga.c
index 40e5963..61c6b13 100644
--- a/xen/drivers/video/vga.c
+++ b/xen/drivers/video/vga.c
@@ -117,9 +117,9 @@ void __init video_endboot(void)
                 const struct pci_dev *pdev;
                 u8 b = bus, df = devfn, sb;
 
-                spin_lock(&pcidevs_lock);
+                pcidevs_lock();
                 pdev = pci_get_pdev(0, bus, devfn);
-                spin_unlock(&pcidevs_lock);
+                pcidevs_unlock();
 
                 if ( !pdev ||
                      pci_conf_read16(0, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index a5aef55..b87571d 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -94,7 +94,10 @@ struct pci_dev {
  * interrupt handling related (the mask bit register).
  */
 
-extern spinlock_t pcidevs_lock;
+void pcidevs_lock(void);
+void pcidevs_unlock(void);
+int pcidevs_is_locked(void);
+int pcidevs_trylock(void);
 
 bool_t pci_known_segment(u16 seg);
 bool_t pci_device_detect(u16 seg, u8 bus, u8 dev, u8 func);
-- 
1.9.1



* Re: [PATCH v2 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization.
  2016-03-09  3:08 ` [PATCH v2 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization Quan Xu
@ 2016-03-09  5:19   ` Tian, Kevin
  2016-03-09  7:31     ` Xu, Quan
  0 siblings, 1 reply; 16+ messages in thread
From: Tian, Kevin @ 2016-03-09  5:19 UTC (permalink / raw)
  To: xen-devel; +Cc: Xu, Quan, Dario Faggioli, Suravee Suthikulpanit, Jan Beulich

> From: Quan Xu
> Sent: Wednesday, March 09, 2016 11:08 AM

Hi, Quan, sorry that I still didn't quite get the description below.

> 
> When iommu_setup() is called in __start_xen(), interrupts have
> already been enabled, and nothing disables (without reenabling)
> them again in the path that leads to calling
> set_iommu_interrupt_handler(). There's therefore no risk of them
> being disabled and needing remaining so, and hence no need for

no risk of them being 'enabled' since no one disables them again?

> saving and restoring the flags. This means that, even if interrupt
> would need to be disabled when taking pcidevs_lock, spin_lock_irq()
> and spin_unlock_irq() could be used.

I didn't see how this explanation relates to the change below. You are
changing from spin_lock_irqsave to spin_lock, but here you explain
the reason for spin_lock_irq...

> 
> Inside set_iommu_interrupt_handler(), pcidevs_lock serializes calls
> to pci_get_pdev(), which does not require interrupts to be disabled
> during its execution. In fact, pcidevs_lock is always (except for
> this case) taken without disabling interrupts, and disabling them in
> this case would make the BUG_ON() in check_lock() trigger, if it
> wasn't for the fact that spinlock debugging checks are still
> disabled, during initialization, when iommu_setup() (which then end
> up calling set_iommu_interrupt_handler()) is called.

The key problem is that we need consistent lock usage in all places
(either all IRQ-safe or all IRQ-unsafe), regardless of whether
check_lock is activated or not (which is just a debug method to help
identify such inconsistency problems). I think it should be clear
enough to state that pci_get_pdev doesn't require holding the lock
with interrupts disabled (so we should use spin_lock in this AMD case),
and add the fact that when this function is invoked interrupts are
indeed enabled.
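
(To illustrate the consistency point: check_lock() is, roughly, a debug check
which records whether a lock has so far been taken with interrupts enabled or
disabled, and complains on any mismatch. A much simplified sketch of the idea
follows; it is not the actual Xen implementation, and the structure/constant
names below are made up:

    #define IRQ_SAFETY_UNSEEN (-1)

    struct lock_debug_sketch {
        int irq_safe;    /* IRQ_SAFETY_UNSEEN, 0 or 1 */
    };

    static void check_lock_sketch(struct lock_debug_sketch *debug)
    {
        int irq_safe = !local_irq_is_enabled();  /* taken with IRQs off? */

        if ( debug->irq_safe == IRQ_SAFETY_UNSEEN )
            debug->irq_safe = irq_safe;          /* first use sets the rule */

        BUG_ON(debug->irq_safe != irq_safe);     /* mixed usage -> bug */
    }

Taking pcidevs_lock with interrupts disabled in this one place, while every
other user takes it with interrupts enabled, is exactly the pattern such a
check exists to catch.)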


> 
> In order to fix this, we can use just plain spin_lock() and spin_unlock(),
> instead of spin_lock_irqsave() and spin_unlock_irqrestore().

What about revise like below?
--

pcidevs_lock should be held with interrupt enabled. However there
remains an exception in AMD IOMMU code, where the lock is
acquired with interrupt disabled. This inconsistency can lead to 
deadlock. 

The fix is straightforward to use spin_lock instead. Also interrupt
has been enabled when this function is invoked, so we're sure
consistency around pcidevs_lock can be guaranteed after this fix.
--

> 
> Signed-off-by: Quan Xu <quan.xu@intel.com>
> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
> CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> CC: Dario Faggioli <dario.faggioli@citrix.com>
> CC: Jan Beulich <jbeulich@suse.com>
> ---
>  xen/drivers/passthrough/amd/iommu_init.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/amd/iommu_init.c
> b/xen/drivers/passthrough/amd/iommu_init.c
> index d90a2d2..a400497 100644
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -778,7 +778,6 @@ static bool_t __init set_iommu_interrupt_handler(struct
> amd_iommu *iommu)
>  {
>      int irq, ret;
>      hw_irq_controller *handler;
> -    unsigned long flags;
>      u16 control;
> 
>      irq = create_irq(NUMA_NO_NODE);
> @@ -788,10 +787,10 @@ static bool_t __init set_iommu_interrupt_handler(struct
> amd_iommu *iommu)
>          return 0;
>      }
> 
> -    spin_lock_irqsave(&pcidevs_lock, flags);
> +    spin_lock(&pcidevs_lock);
>      iommu->msi.dev = pci_get_pdev(iommu->seg, PCI_BUS(iommu->bdf),
>                                    PCI_DEVFN2(iommu->bdf));
> -    spin_unlock_irqrestore(&pcidevs_lock, flags);
> +    spin_unlock(&pcidevs_lock);
>      if ( !iommu->msi.dev )
>      {
>          AMD_IOMMU_DEBUG("IOMMU: no pdev for %04x:%02x:%02x.%u\n",
> --
> 1.9.1
> 
> 

* Re: [PATCH v2 2/2] IOMMU/spinlock: Make the pcidevs_lock a recursive one
  2016-03-09  3:08 ` [PATCH v2 2/2] IOMMU/spinlock: Make the pcidevs_lock a recursive one Quan Xu
@ 2016-03-09  5:21   ` Tian, Kevin
  2016-03-09  5:50     ` Xu, Quan
  0 siblings, 1 reply; 16+ messages in thread
From: Tian, Kevin @ 2016-03-09  5:21 UTC (permalink / raw)
  To: Xu, Quan, xen-devel
  Cc: Wu, Feng, Jan Beulich, Andrew Cooper, Dario Faggioli,
	Suravee Suthikulpanit, Keir Fraser

> From: Xu, Quan
> Sent: Wednesday, March 09, 2016 11:08 AM
> 
> Signed-off-by: Quan Xu <quan.xu@intel.com>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> CC: Feng Wu <feng.wu@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: Kevin Tian <kevin.tian@intel.com> for VT-d part.


* Re: [PATCH v2 2/2] IOMMU/spinlock: Make the pcidevs_lock a recursive one
  2016-03-09  5:21   ` Tian, Kevin
@ 2016-03-09  5:50     ` Xu, Quan
  0 siblings, 0 replies; 16+ messages in thread
From: Xu, Quan @ 2016-03-09  5:50 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel
  Cc: Wu, Feng, Jan Beulich, Andrew Cooper, Dario Faggioli,
	Suravee Suthikulpanit, Keir Fraser

On March 09, 2016 1:21pm, <Tian, Kevin> wrote:
> > From: Xu, Quan
> > Sent: Wednesday, March 09, 2016 11:08 AM
> >
> > Signed-off-by: Quan Xu <quan.xu@intel.com>
> > CC: Keir Fraser <keir@xen.org>
> > CC: Jan Beulich <jbeulich@suse.com>
> > CC: Andrew Cooper <andrew.cooper3@citrix.com>
> > CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> > CC: Feng Wu <feng.wu@intel.com>
> > CC: Kevin Tian <kevin.tian@intel.com>
> > CC: Dario Faggioli <dario.faggioli@citrix.com>
> 
> Acked-by: Kevin Tian <kevin.tian@intel.com> for VT-d part.

Kevin, thanks!!
Quan


* Re: [PATCH v2 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization.
  2016-03-09  5:19   ` Tian, Kevin
@ 2016-03-09  7:31     ` Xu, Quan
  2016-03-09 10:09       ` Jan Beulich
  2016-03-09 10:24       ` Dario Faggioli
  0 siblings, 2 replies; 16+ messages in thread
From: Xu, Quan @ 2016-03-09  7:31 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel, Dario Faggioli, Jan Beulich; +Cc: Suravee Suthikulpanit

On March 09, 2016 1:19pm, <Tian, Kevin> wrote:
> > From: Quan Xu
> > Sent: Wednesday, March 09, 2016 11:08 AM
> > When iommu_setup() is called in __start_xen(), interrupts have already
> > been enabled, and nothing disables (without reenabling) them again in
> > the path that leads to calling set_iommu_interrupt_handler(). There's
> > therefore no risk of them being disabled and needing remaining so, and
> > hence no need for
> 
> no risk of them being 'enabled' since no one disables them again?
> 

Yes,

> > saving and restoring the flags. This means that, even if interrupt
> > would need to be disabled when taking pcidevs_lock, spin_lock_irq()
> > and spin_unlock_irq() could be used.
> 
> I didn't see how this explanation relates to below change. You are changing
> from spin_lock_irqsave to spin_lock. But here you explains the reason to
> spin_lock_irq...
> 
Yes, you are right. I think I'd better remove or add a '()' for:

   "This means that, even if interrupt
    would need to be disabled when taking pcidevs_lock, spin_lock_irq()
    and spin_unlock_irq() could be used."

> >
> > Inside set_iommu_interrupt_handler(), pcidevs_lock serializes calls to
> > pci_get_pdev(), which does not require interrupts to be disabled
> > during its execution. In fact, pcidevs_lock is always (except for this
> > case) taken without disabling interrupts, and disabling them in this
> > case would make the BUG_ON() in check_lock() trigger, if it wasn't for
> > the fact that spinlock debugging checks are still disabled, during
> > initialization, when iommu_setup() (which then end up calling
> > set_iommu_interrupt_handler()) is called.
> 
> The key problem is that we need consistent lock usage in all places (either all in
> IRQ-safe or all in IRQ-unsafe), regardless of whether check_lock is activated or
> not (which is just a debug method to help identify such inconsistency problem).


IMO, this is not the key problem,  __Wait for Dario's / Jan's opinions__.


> I think it should be clear enough to state that pci_get_pdev doesn't require
> holding lock with interrupt disabled (so we should use spin_lock in this AMD
> case), and add the fact that when this function is invoked the interrupt is indeed
> enabled.
> 
> 
> >
> > In order to fix this, we can use just plain spin_lock() and
> > spin_unlock(), instead of spin_lock_irqsave() and spin_unlock_irqrestore().
> 
> What about revise like below?
> --
> 
> pcidevs_lock should be held with interrupt enabled.

I am not sure for this point.


> However there remains an
> exception in AMD IOMMU code, where the lock is acquired with interrupt
> disabled. This inconsistency can lead to deadlock.
> 
> The fix is straightforward to use spin_lock instead. Also interrupt has been
> enabled when this function is invoked, so we're sure consistency around
> pcidevs_lock can be guaranteed after this fix.

I really appreciate your sending out a revised one, but I think it is not only for consistency.
 __Wait for Dario's / Jan's opinions__.

To be honest, to me, __my_changelog_ is complicated.


Quan

* Re: [PATCH v2 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization.
  2016-03-09  7:31     ` Xu, Quan
@ 2016-03-09 10:09       ` Jan Beulich
  2016-03-09 10:24       ` Dario Faggioli
  1 sibling, 0 replies; 16+ messages in thread
From: Jan Beulich @ 2016-03-09 10:09 UTC (permalink / raw)
  To: quan.xu; +Cc: dario.faggioli, kevin.tian, suravee.suthikulpanit, xen-devel

>>> "Xu, Quan" <quan.xu@intel.com> 03/09/16 8:32 AM >>>
>On March 09, 2016 1:19pm, <Tian, Kevin> wrote:
>> > Inside set_iommu_interrupt_handler(), pcidevs_lock serializes calls to
>> > pci_get_pdev(), which does not require interrupts to be disabled
>> > during its execution. In fact, pcidevs_lock is always (except for this
>> > case) taken without disabling interrupts, and disabling them in this
>> > case would make the BUG_ON() in check_lock() trigger, if it wasn't for
>> > the fact that spinlock debugging checks are still disabled, during
>> > initialization, when iommu_setup() (which then end up calling
>> > set_iommu_interrupt_handler()) is called.
>> 
>> The key problem is that we need consistent lock usage in all places (either all in
>> IRQ-safe or all in IRQ-unsafe), regardless of whether check_lock is activated or
>> not (which is just a debug method to help identify such inconsistency problem).
>
>IMO, this is not the key problem,  __Wait for Dario's / Jan's opinions__.

The inconsistency is one way of look at the problem, so with that it is
kind of "key".

>> What about revise like below?
>> --
>> 
>> pcidevs_lock should be held with interrupt enabled.
>
>I am not sure for this point.

Well, I'd say something like "pcidevs_lock doesn't require interrupts to be
disabled while being acquired".

>> However there remains an
>> exception in AMD IOMMU code, where the lock is acquired with interrupt
>> disabled. This inconsistency can lead to deadlock.
>> 
>> The fix is straightforward to use spin_lock instead. Also interrupt has been
>> enabled when this function is invoked, so we're sure consistency around
>> pcidevs_lock can be guaranteed after this fix.
>
>I really appreciate you send out a revised one, but I think it is not only for consistency.
>__Wait for Dario's / Jan's opinions__.
>
>To be honest, to me, __my_changlog_ is complicated.

I think Kevin's text above is okay. Perhaps weaken the "can lead to a
deadlock" slightly, because that's just a theoretical concern, not one that's
possible in practice on that code path.

Jan



* Re: [PATCH v2 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization.
  2016-03-09  7:31     ` Xu, Quan
  2016-03-09 10:09       ` Jan Beulich
@ 2016-03-09 10:24       ` Dario Faggioli
  2016-03-09 12:52         ` Xu, Quan
  1 sibling, 1 reply; 16+ messages in thread
From: Dario Faggioli @ 2016-03-09 10:24 UTC (permalink / raw)
  To: Xu, Quan, Tian, Kevin; +Cc: Jan Beulich, Suravee Suthikulpanit, xen-devel



On Wed, 2016-03-09 at 07:31 +0000, Xu, Quan wrote:
> On March 09, 2016 1:19pm, <Tian, Kevin> wrote:
> > 
> > > When iommu_setup() is called in __start_xen(), interrupts have
> > > already
> > > been enabled, and nothing disables (without reenabling) them
> > > again in
> > > the path that leads to calling set_iommu_interrupt_handler().
> > > There's
> > > therefore no risk of them being disabled and needing remaining
> > > so, and
> > > hence no need for
> > no risk of them being 'enabled' since no one disables them again?
> > 
> Yes,
> 
The reason why one uses _irqsave() when locking is that he doesn't know
whether interrupts are disabled or not, and wants that, whatever the
status is (enabled or disabled), it remains unchanged upon unlock
(which, therefore, does the _irqrestore()).

If we are absolutely sure that they are enabled, i.e., there is no
_risk_ that they are disabled, it would be ok to just use _irq() (if
disabling interrupt is necessary, which is not in this case, but that's
another thing) and avoid saving the flags.

That's how I read the original description (which, of course, does not
mean it can't be simplified).

> > > saving and restoring the flags. This means that, even if
> > > interrupt
> > > would need to be disabled when taking pcidevs_lock,
> > > spin_lock_irq()
> > > and spin_unlock_irq() could be used.
> > I didn't see how this explanation relates to below change. You are
> > changing
> > from spin_lock_irqsave to spin_lock. But here you explains the
> > reason to
> > spin_lock_irq...
> > 
>
Exactly. We have spin_lock_irqsave() that does two things:
 - disables interrupts,
 - saves and restores the flags.

We think that:
 - there's no need to save flags, even if disabling interrupts were 
   necessary,
 - there is no need to disable interrupts.

And therefore, we are changing to spin_lock() that:
 - doesn't disable interrupts,
 - doesn't save and restore the flags.

So, switching spinlock variant is really _2_ changes. I
do think that it is worth explaining in the changelog why _both_ are ok.
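
To make the two changes concrete, here is a minimal sketch of the three
variants around the pci_get_pdev() call (only the first is what the current
code does; the other two are just for illustration):

    unsigned long flags;

    /* 1) current code: disable IRQs *and* save/restore the flags */
    spin_lock_irqsave(&pcidevs_lock, flags);
    iommu->msi.dev = pci_get_pdev(iommu->seg, PCI_BUS(iommu->bdf),
                                  PCI_DEVFN2(iommu->bdf));
    spin_unlock_irqrestore(&pcidevs_lock, flags);

    /* 2) what would suffice _if_ IRQs had to be disabled: no flag saving,
     *    because we know they are enabled at this point */
    spin_lock_irq(&pcidevs_lock);
    /* ... same pci_get_pdev() call ... */
    spin_unlock_irq(&pcidevs_lock);

    /* 3) what the patch does, since IRQs do not need to be disabled at all */
    spin_lock(&pcidevs_lock);
    /* ... same pci_get_pdev() call ... */
    spin_unlock(&pcidevs_lock);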

> Yes, you are right. I think I'd better remove or add a '()' for:
> 
>    "This means that, even if interrupt
>     would need to be disabled when taking pcidevs_lock,
> spin_lock_irq()
>     and spin_unlock_irq() could be used."
> 
Yeah, that makes it more accurate, but even longer, while I think the
changelog could use some shortening and simplification.

> > > Inside set_iommu_interrupt_handler(), pcidevs_lock serializes
> > > calls to
> > > pci_get_pdev(), which does not require interrupts to be disabled
> > > during its execution. In fact, pcidevs_lock is always (except for
> > > this
> > > case) taken without disabling interrupts, and disabling them in
> > > this
> > > case would make the BUG_ON() in check_lock() trigger, if it
> > > wasn't for
> > > the fact that spinlock debugging checks are still disabled,
> > > during
> > > initialization, when iommu_setup() (which then end up calling
> > > set_iommu_interrupt_handler()) is called.
> > The key problem is that we need consistent lock usage in all places
> > (either all in
> > IRQ-safe or all in IRQ-unsafe), regardless of whether check_lock is
> > activated or
> > not (which is just a debug method to help identify such
> > inconsistency problem).
> 
It is, and, during boot, is disabled for a reason, which is that we
allow mixed usage, albeit only in super special circumstances (like, in
fact, early boot). So I agree that check_lock() is just a sanity
checking mechanism, but let's not state anything incorrect (as Jan
requested in the first place).

> IMO, this is not the key problem,  __Wait for Dario's / Jan's
> opinions__.
> 
Well, it is a way to describe at least some of the aspects of the key
problem, I guess, just not the one that I think I would stress the
most.

> > I think it should be clear enough to state that pci_get_pdev
> > doesn't require
> > holding lock with interrupt disabled (so we should use spin_lock in
> > this AMD
> > case), and add the fact that when this function is invoked the
> > interrupt is indeed
> > enabled.
> > 
I totally agree with this description! (Can we use that as
changelog? :-) )

> > > In order to fix this, we can use just plain spin_lock() and
> > > spin_unlock(), instead of spin_lock_irqsave() and
> > > spin_unlock_irqrestore().
> > What about revise like below?
> > --
> > 
> > pcidevs_lock should be held with interrupt enabled.
> I am not sure for this point.
> 
Well, it is true that it should. Although I think it's more accurate to
say that, as Kevin says above, it "doesn't require being held with
interrupt disabled".

> > However there remains an
> > exception in AMD IOMMU code, where the lock is acquired with
> > interrupt
> > disabled. This inconsistency can lead to deadlock.
> > 
Can it? In the case of the occurrence being changed by this patch, I
don't think it can.

> > The fix is straightforward to use spin_lock instead. Also interrupt
> > has been
> > enabled when this function is invoked, so we're sure consistency
> > around
> > pcidevs_lock can be guaranteed after this fix.
> I really appreciate you send out a revised one, but I think it is not
> only for consistency.
>  __Wait for Dario's / Jan's opinions__.
> 
> To be honest, to me, __my_changlog_ is complicated.
> 
It is complicated. I think it is a detailed, and IMO correct,
description of the reason why the patch is ok. That is indeed the
purpose of a changelog (especially for this kind of patch), but it
certainly could be summarized/simplified a bit.

I guess I'm also giving it a try, using what Kevin wrote in the middle
of his email as a basis (with a small addition about consistency at the
end).

"pci_get_pdev() doesn't require interrupts to be disabled when taking
the pcidevs_lock, which protects its execution. So, spin_lock() can be
used for that, and that is what is done almost everywhere.

Currently, there is an exception, in early boot IOMMU initialization on
AMD systems, where spin_lock_irqsave() and restore are used. However,
since, in that case too:
 - we don't need to disable interrupts (for same reasons stated above),
 - interrupts are enabled already, so there is no need to save and
   restore flags,
just change it into using spin_lock(), as everywhere.

Note that, mixing interrupt disabled and enabled spinlocks is something
we disallow, except for very special situations. And since this one
does not qualify as such, using IRQ disabling variants can be
considered a bug (which manages to not trigger the safety checking we
have in place only because they're not yet enabled)."

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH v2 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization.
  2016-03-09 10:24       ` Dario Faggioli
@ 2016-03-09 12:52         ` Xu, Quan
  2016-03-09 13:19           ` Dario Faggioli
  0 siblings, 1 reply; 16+ messages in thread
From: Xu, Quan @ 2016-03-09 12:52 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Tian, Kevin, Jan Beulich, Suravee Suthikulpanit, xen-devel

On March 09, 2016 6:25pm, <dario.faggioli@citrix.com> wrote:
> On Wed, 2016-03-09 at 07:31 +0000, Xu, Quan wrote:
> > On March 09, 2016 1:19pm, <Tian, Kevin> wrote:
> > >
> > > > When iommu_setup() is called in __start_xen(), interrupts have
> > > > already been enabled, and nothing disables (without reenabling)
> > > > them again in the path that leads to calling
> > > > set_iommu_interrupt_handler().
> > > > There's
> > > > therefore no risk of them being disabled and needing remaining so,
> > > > and hence no need for
> > > no risk of them being 'enabled' since no one disables them again?
> > >
> > Yes,
> >
> Reason why one use _irqsave() when locking is because he doesn't know
> whether interrupt are disabled or not, and wants that, whatever the status is
> (enabled or disabled), it remains unchanged upon unlock (which, therefore,
> does the _irqrestore()).
> 
> If we are absolutely sure that they are enabled, i.e., there is no _risk_ that they
> are disabled, it would be ok to just use _irq() (if disabling interrupt is necessary,
> which is not in this case, but that's another thing) and avoid saving the flags.
> 
> That's how I read the original description (which, of course, does not mean it
> can't be simplified).
> 
Dario, thanks for sharing.
I appreciate that you always share some knowledge. :):)
btw, the "Linux Device Drivers, 3rd Edition" book also describes it clearly: http://www.makelinux.net/ldd3/chp-5-sect-5

> > > I think it should be clear enough to state that pci_get_pdev doesn't
> > > require holding lock with interrupt disabled (so we should use
> > > spin_lock in this AMD case), and add the fact that when this
> > > function is invoked the interrupt is indeed enabled.
> > >
> I totally agree with this description! (Can we use that as changelog? :-) )
> 

Of course. :)

> > > > In order to fix this, we can use just plain spin_lock() and
> > > > spin_unlock(), instead of spin_lock_irqsave() and
> > > > spin_unlock_irqrestore().
> > > What about revise like below?
> > > --
> > >
> > > pcidevs_lock should be held with interrupt enabled.
> > I am not sure for this point.
> >
> Well, it is true that it should. Altough, I think it's more accurate to say that, as
> Kevin says above, it "doesn't require being held with interrupt disabled".
> 
> > > However there remains an
> > > exception in AMD IOMMU code, where the lock is acquired with
> > > interrupt disabled. This inconsistency can lead to deadlock.
> > >
> Can it? In the case of the occurrence being changed by this patch, I don't think it
> can.

Before this patch, it might. As Jan mentioned, that's just a theoretical concern:
 - spin_lock_irqsave() disables interrupts (on the local processor only) before taking the spinlock.
   Suppose, in an MP system (pCPU1, pCPU2, ...), that pCPU1 calls spin_lock_irqsave();
   it only disables interrupts for pCPU1, and _not_ for the other pCPUn.
 - Then, it is mixing interrupt-disabled and interrupt-enabled spinlocks.
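
(For completeness, the classic situation that the "don't mix" rule guards
against in general, i.e. when a lock is also taken from interrupt context,
looks roughly like the hypothetical sketch below. As noted, that is only a
theoretical concern here, because pcidevs_lock is never taken from interrupt
context on this path:

    /* Purely illustrative, hypothetical code, not a real Xen path. */
    static DEFINE_SPINLOCK(some_lock);

    void some_task_context(void)
    {
        spin_lock(&some_lock);     /* taken with IRQs still enabled ...      */
        /* ... an interrupt may fire right here, on this same pCPU ... */
        spin_unlock(&some_lock);
    }

    void some_irq_handler(void)
    {
        spin_lock(&some_lock);     /* ... and this then spins forever, since */
        /* the interrupted context on the same pCPU already holds the lock.  */
        spin_unlock(&some_lock);
    }

That is why, once a lock can be taken from IRQ context, every acquisition of
it has to disable interrupts, i.e. usage has to be consistent.)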

> 
> > > The fix is straightforward: use spin_lock() instead. Also, interrupts
> > > have already been enabled when this function is invoked, so we're sure
> > > consistency around pcidevs_lock can be guaranteed after this fix.
> > I really appreciate you sending out a revised one, but I think it is not
> > only about consistency.
> >  __Let's wait for Dario's / Jan's opinions__.
> >
> > To be honest, to me, __my changelog__ is complicated.
> >
> It is complicated. I think it is a detailed, and IMO correct, description of the
> reason why the patch is ok. That is indeed the purpose of a changelog
> (especially for this kind of patch), but it certainly could be
> summarized/simplified a bit.
> 
> I guess I'm also giving it a try, using what Kevin wrote in the middle of his email
> as a basis (with a small addition about consistency at the end).
> 
> "pci_get_pdev() doesn't require interrupts to be disabled when taking the
> pcidevs_lock, which protects its execution. So, spin_lock() can be used for that,
> and that is what is done almost everywhere.
> 
> Currently, there is an exception, in early boot IOMMU initialization on AMD
> systems, where spin_lock_irqsave() and restore are used. However, since, in
> that case too:
>  - we don't need to disable interrupts (for the same reasons stated above),
>  - interrupts are enabled already, so there is no need to save and
>    restore flags,
> just change it into using spin_lock(), as everywhere.
> 
> Note that mixing interrupt-disabled and enabled spinlocks is something we
> disallow,

Dario, could you share something in detail?
Quan

> except for very special situations. 
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH v2 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization.
  2016-03-09 12:52         ` Xu, Quan
@ 2016-03-09 13:19           ` Dario Faggioli
  2016-03-09 13:46             ` Xu, Quan
  0 siblings, 1 reply; 16+ messages in thread
From: Dario Faggioli @ 2016-03-09 13:19 UTC (permalink / raw)
  To: Xu, Quan; +Cc: Tian, Kevin, Jan Beulich, Suravee Suthikulpanit, xen-devel



On Wed, 2016-03-09 at 12:52 +0000, Xu, Quan wrote:
> On March 09, 2016 6:25pm, <dario.faggioli@citrix.com> wrote:
> > On Wed, 2016-03-09 at 07:31 +0000, Xu, Quan wrote:
> > > On March 09, 2016 1:19pm, <Tian, Kevin> wrote:
> > > > 
> > If we are absolutely sure that they are enabled, i.e., there is no
> > _risk_ that they
> > are disabled, it would be ok to just use _irq() (if disabling
> > interrupt is necessary,
> > which is not in this case, but that's another thing) and avoid
> > saving the flags.
> > 
> Dario, thanks for your share.
> I appreciate you always share some knowledge. :):)
> btw, the "Linux Device Drivers, 3rd Edition" book also describe it
> clearly, http://www.makelinux.net/ldd3/chp-5-sect-5 
> 
Yes, with respect to that, we and Linux are similar, and the concerns
on whether interrupts should be disabled or not when taking a spinlock
are generic, and can be applied to any OS/hypervisor.

AFAICR, Linux does not have any check in place similar to our
check_lock(), but that does not mean much.

> > > > However there remains an
> > > > exception in AMD IOMMU code, where the lock is acquired with
> > > > interrupt disabled. This inconsistency can lead to deadlock.
> > > > 
> > Can it? In the case of the occurrence being changed by this patch,
> > I don't think it
> > can.
> Before this patch, it might. As Jan mentioned, that's just a
> theoretical concern: 
>  -spin_lock_irqsave() disables interrupts (on the local processor
> only) before taking the spinlock. 
>   Supposed in MP system (pCPU1, pCPU2, ...), when the pCPU1 call the
> spin_lock_irqsave(),
>   It does only disable interrupts for pCPU1, and _not_ for other
> pCPUn.
>  -Then, it is mixing interrupt disabled and enabled spinlocks.
> 
I was commenting on the "can lead to deadlock" part, which is something
that, in general, we risk if we mix, but that can't possibly occur in
this specific case. This is also what Jan is saying, and the reason why
he is also asking to tone down this exact claim... So, I'm not sure what
your point is here...

> > Note that, mixing interrupt disabled and enabled spinlocks is
> > something we
> > disallow,
>
> Dario, could you share something in detail?
>
Sorry, I again don't understand... share something about what? I was
myself proposing a text to be used as the changelog, which I'm pasting
again here below, for completeness/clarity.

The sentence you seem to be asking about is what we've been
"debating" in this thread about the changelog since almost the
beginning: it's bad to mix, because it leads to inconsistencies that are
bad because, in general, they would trigger a BUG_ON in debug builds and
cause a deadlock in non-debug builds, even if in this case neither
happens. What is it that you need more insight about?

Here is (again) what I would use as the commit message:
---
pci_get_pdev() doesn't require interrupts to be disabled when taking
the pcidevs_lock, which protects its execution. So, spin_lock() can be
used for that, and that is what is done almost everywhere.

Currently, there is an exception, in early boot IOMMU initialization on
AMD systems, where spin_lock_irqsave() and restore are used. However,
since, in that case too:
 - we don't need to disable interrupts (for the same reasons stated above),
 - interrupts are enabled already, so there is no need to save and
   restore flags,
just change it into using spin_lock(), as everywhere.

Note that mixing interrupt-disabled and enabled spinlocks is something
we disallow, except for very special situations. And since this one
does not qualify as such, using the IRQ-disabling variants can be
considered a bug (which manages not to trigger the safety checking we
have in place only because those checks are not yet enabled at that point).
---
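
In code terms, the change the changelog describes boils down to something like this (just a sketch of the pattern; the exact hunk in xen/drivers/passthrough/amd/iommu_init.c, and the arguments actually passed to pci_get_pdev(), may differ):

    /* Before: flags are saved/restored, although interrupts are known to
     * be enabled here and don't need to be disabled at all. */
    unsigned long flags;
    spin_lock_irqsave(&pcidevs_lock, flags);
    pdev = pci_get_pdev(seg, bus, devfn);
    spin_unlock_irqrestore(&pcidevs_lock, flags);

    /* After: plain lock/unlock, as done everywhere else for pcidevs_lock. */
    spin_lock(&pcidevs_lock);
    pdev = pci_get_pdev(seg, bus, devfn);
    spin_unlock(&pcidevs_lock);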

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH v2 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization.
  2016-03-09 13:19           ` Dario Faggioli
@ 2016-03-09 13:46             ` Xu, Quan
  2016-03-09 13:55               ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Xu, Quan @ 2016-03-09 13:46 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Tian, Kevin, Jan Beulich, Suravee Suthikulpanit, xen-devel



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Wednesday, March 09, 2016 9:20 PM
> To: Xu, Quan
> Cc: Suravee Suthikulpanit; xen-devel@lists.xen.org; Jan Beulich; Tian, Kevin
> Subject: Re: [Xen-devel] [PATCH v2 1/2] IOMMU/spinlock: Fix a bug found in
> AMD IOMMU initialization.
> 
> On Wed, 2016-03-09 at 12:52 +0000, Xu, Quan wrote:
> > On March 09, 2016 6:25pm, <dario.faggioli@citrix.com> wrote:
> > > On Wed, 2016-03-09 at 07:31 +0000, Xu, Quan wrote:
> > > > On March 09, 2016 1:19pm, <Tian, Kevin> wrote:
> > > > >
> > > If we are absolutely sure that they are enabled, i.e., there is no
> > > _risk_ that they are disabled, it would be ok to just use _irq() (if
> > > disabling interrupt is necessary, which is not in this case, but
> > > that's another thing) and avoid saving the flags.
> > >
> > Dario, thanks for your share.
> > I appreciate you always share some knowledge. :):) btw, the "Linux
> > Device Drivers, 3rd Edition" book also describe it clearly,
> > http://www.makelinux.net/ldd3/chp-5-sect-5
> >
> Yes, with respect to that, us and Linux are similar, and the concerns on whether
> interrupts should be disabled or not when taking a spinlock are generic, and can
> be applied to any OS/hypervisor.
> 
> AFAICR, Linux does not have any check in place similar to our check_lock(), but
> that does not mean much.
> 
> > > > > However there remains an
> > > > > exception in AMD IOMMU code, where the lock is acquired with
> > > > > interrupt disabled. This inconsistency can lead to deadlock.
> > > > >
> > > Can it? In the case of the occurrence being changed by this patch, I
> > > don't think it can.
> > Before this patch, it might. As Jan mentioned, that's just a
> > theoretical concern:
> >  -spin_lock_irqsave() disables interrupts (on the local processor
> > only) before taking the spinlock.
> >   Supposed in MP system (pCPU1, pCPU2, ...), when the pCPU1 call the
> > spin_lock_irqsave(),
> >   It does only disable interrupts for pCPU1, and _not_ for other
> > pCPUn.
> >  -Then, it is mixing interrupt disabled and enabled spinlocks.
> >
> I was commenting on the "can lead to deadlock" part, which is something that,
> in general, we risk if we mix, but that can't possibly occur in this specific case.
> This is also what Jan is saying, and the reason why is also asking to mitigate this
> exact claim... So, I'm not sure what your point is here...
> 
> > > Note that, mixing interrupt disabled and enabled spinlocks is
> > > something we disallow,
> >
> > Dario, could you share something in detail?
> >
> Sorry, I again don't understand... share something about what? I was proposing
> myself a text to be used as changelog, which I'm pasting again here below, for
> completeness/clarity.
> 

Now I am still not clear on this point - "this inconsistency might lead to deadlock".
I think it is similar to 'mixing interrupt-disabled and enabled spinlocks is something we disallow'.
I hope you can give me an example of how this can lead to deadlock.

Quan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH v2 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization.
  2016-03-09 13:46             ` Xu, Quan
@ 2016-03-09 13:55               ` Jan Beulich
  2016-03-09 14:45                 ` Dario Faggioli
  2016-03-10  3:21                 ` Xu, Quan
  0 siblings, 2 replies; 16+ messages in thread
From: Jan Beulich @ 2016-03-09 13:55 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Kevin Tian, Quan Xu, Suravee Suthikulpanit, xen-devel

>>> On 09.03.16 at 14:46, <quan.xu@intel.com> wrote:
> Now I am still not clear for this point- "this inconsistency might lead to 
> deadlock".
> I think it is similar to 'mixing interrupt disabled and enabled spinlocks is 
> something we disallow'.
> I hope you can give me an example about how to lead to deadlock. 

The implication of disabling interrupts while acquiring a lock
is that the lock is also being acquired by some interrupt
handler. If you mix acquire types, the one not disabling
interrupts is prone to being interrupted, with the interrupt handler
then trying to get hold of the lock the same CPU already owns.
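
A minimal sketch of that scenario (the lock name and code paths are made up, purely for illustration):

    /* Normal (non-interrupt) context on CPU0: the lock is taken without
     * disabling interrupts. */
    spin_lock(&mixed_lock);
    /* <-- an interrupt arrives on CPU0 right here ... */

    /* ... and its handler, running on the very same CPU, does: */
    unsigned long flags;
    spin_lock_irqsave(&mixed_lock, flags); /* spins forever: the lock is
                                            * already held by the code it
                                            * just interrupted => deadlock */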

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH v2 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization.
  2016-03-09 13:55               ` Jan Beulich
@ 2016-03-09 14:45                 ` Dario Faggioli
  2016-03-10  5:36                   ` Xu, Quan
  2016-03-10  3:21                 ` Xu, Quan
  1 sibling, 1 reply; 16+ messages in thread
From: Dario Faggioli @ 2016-03-09 14:45 UTC (permalink / raw)
  To: Quan Xu; +Cc: Kevin Tian, Jan Beulich, Suravee Suthikulpanit, xen-devel



On Wed, 2016-03-09 at 06:55 -0700, Jan Beulich wrote:
> > 
> > > > On 09.03.16 at 14:46, <quan.xu@intel.com> wrote:
> > Now I am still not clear for this point- "this inconsistency might
> > lead to 
> > deadlock".
> > I think it is similar to 'mixing interrupt disabled and enabled
> > spinlocks is 
> > something we disallow'.
> > I hope you can give me an example about how to lead to deadlock. 
> The implication from disabling interrupts while acquiring a lock
> is that the lock is also being acquired by some interrupt
> handler. If you mix acquire types, the one not disabling
> interrupts is prone to be interrupted, and the interrupt trying
> to get hold of the lock the same CPU already owns.
> 
Exactly.

There are a few other nice writeups online as well.

The most famous one, I guess, is this one from Linus (look at "Lesson
3: spinlocks revisited.")
https://www.kernel.org/doc/Documentation/locking/spinlocks.txt

And, of course, there's the comment inside check_lock(), in
xen/common/spinlock.c, in Xen's codebase, which gives another example of
how it could be dangerous to mix, even when multiple CPUs are involved.
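
The idea behind that check is roughly the following (a simplified sketch, not the actual xen/common/spinlock.c code):

    /* Remember in which interrupt state a lock was first taken, and
     * complain if it is later taken in the opposite state. */
    void check_lock(struct lock_debug *debug)
    {
        int irq_safe = !local_irq_is_enabled();

        /* debug->irq_safe: -1 = not seen yet, 0 = taken with IRQs
         * enabled, 1 = taken with IRQs disabled. */
        if ( debug->irq_safe < 0 )
            debug->irq_safe = irq_safe;
        else
            BUG_ON(debug->irq_safe != irq_safe);
    }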

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH v2 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization.
  2016-03-09 13:55               ` Jan Beulich
  2016-03-09 14:45                 ` Dario Faggioli
@ 2016-03-10  3:21                 ` Xu, Quan
  1 sibling, 0 replies; 16+ messages in thread
From: Xu, Quan @ 2016-03-10  3:21 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tian, Kevin, Dario Faggioli, Suravee Suthikulpanit, xen-devel

On March 09, 2016 9:56pm, <JBeulich@suse.com> wrote:
> >>> On 09.03.16 at 14:46, <quan.xu@intel.com> wrote:
> > Now I am still not clear for this point- "this inconsistency might
> > lead to deadlock".
> > I think it is similar to 'mixing interrupt disabled and enabled
> > spinlocks is something we disallow'.
> > I hope you can give me an example about how to lead to deadlock.
> 
> The implication from disabling interrupts while acquiring a lock is that the lock is
> also being acquired by some interrupt handler. If you mix acquire types, the one
> not disabling interrupts is prone to be interrupted, and the interrupt trying to get
> hold of the lock the same CPU already owns.
> 

Jan, thanks.
Now I've got it. By the way, I also found a good explanation in the comment inside check_lock(), in xen/common/spinlock.c.
Quan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH v2 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization.
  2016-03-09 14:45                 ` Dario Faggioli
@ 2016-03-10  5:36                   ` Xu, Quan
  0 siblings, 0 replies; 16+ messages in thread
From: Xu, Quan @ 2016-03-10  5:36 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Tian, Kevin, Suravee Suthikulpanit, Jan Beulich, xen-devel

On March 09, 2016 10:45pm, Dario Faggioli wrote:
> On Wed, 2016-03-09 at 06:55 -0700, Jan Beulich wrote:
> > >
> > > > > On 09.03.16 at 14:46, <quan.xu@intel.com> wrote:
> > > Now I am still not clear for this point- "this inconsistency might
> > > lead to deadlock".
> > > I think it is similar to 'mixing interrupt disabled and enabled
> > > spinlocks is something we disallow'.
> > > I hope you can give me an example about how to lead to deadlock.
> > The implication from disabling interrupts while acquiring a lock is
> > that the lock is also being acquired by some interrupt handler. If you
> > mix acquire types, the one not disabling interrupts is prone to be
> > interrupted, and the interrupt trying to get hold of the lock the same
> > CPU already owns.
> >
> Exactly.
> 
> There are a few other nice writeup online as well.
> 
> The most famous one, I guess, is this one from Linus (look at "Lesson
> 3: spinlocks revisited.")
> https://www.kernel.org/doc/Documentation/locking/spinlocks.txt
> 
> And, of course, there's the comment inside check_lock(), in
> xen/common/spinlock.c, in Xen's codebase, where another example of how it
> could be dangerous to mix, even if multiple cpus are involved.
> 
Dario, thanks! That helped me a lot.

Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Thread overview: 16+ messages
2016-03-09  3:08 [PATCH v2 0/2] Make the pcidevs_lock a recursive one Quan Xu
2016-03-09  3:08 ` [PATCH v2 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization Quan Xu
2016-03-09  5:19   ` Tian, Kevin
2016-03-09  7:31     ` Xu, Quan
2016-03-09 10:09       ` Jan Beulich
2016-03-09 10:24       ` Dario Faggioli
2016-03-09 12:52         ` Xu, Quan
2016-03-09 13:19           ` Dario Faggioli
2016-03-09 13:46             ` Xu, Quan
2016-03-09 13:55               ` Jan Beulich
2016-03-09 14:45                 ` Dario Faggioli
2016-03-10  5:36                   ` Xu, Quan
2016-03-10  3:21                 ` Xu, Quan
2016-03-09  3:08 ` [PATCH v2 2/2] IOMMU/spinlock: Make the pcidevs_lock a recursive one Quan Xu
2016-03-09  5:21   ` Tian, Kevin
2016-03-09  5:50     ` Xu, Quan
