xen-devel.lists.xenproject.org archive mirror
* [PATCH] arm/acpi: Fix the deadlock in function vgic_lock_rank()
@ 2016-05-27  0:39 Shanker Donthineni
  2016-05-27 13:56 ` Julien Grall
  0 siblings, 1 reply; 11+ messages in thread
From: Shanker Donthineni @ 2016-05-27  0:39 UTC (permalink / raw)
  To: xen-devel
  Cc: Philip Elcan, Julien Grall, Stefano Stabellini,
	Shanker Donthineni, Vikram Sethi

Commit 9d77b3c01d1261c ("Configure SPI interrupt type and route to
Dom0 dynamically") causes an infinite loop inside the spinlock
function. Note that spinlocks in Xen are not recursive: re-acquiring
a spinlock that is already held by the calling CPU leads to a
deadlock. This happens whenever dom0 writes to the GICD
ISENABLER/ICENABLER registers.

The following call trace explains the problem.

DOM0 writes GICD_ISENABLER/GICD_ICENABLER
  vgic_v3_distr_common_mmio_write()
    vgic_lock_rank()  -->  acquired the first time
      vgic_enable_irqs()
        route_irq_to_guest()
          gic_route_irq_to_guest()
            vgic_get_target_vcpu()
              vgic_lock_rank()  -->  attempts to re-acquire the held lock
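
For illustration only, a toy sketch (standalone C, not Xen's spinlock
implementation) of why re-acquiring a non-recursive spinlock on the same
CPU spins forever:

/* Toy non-recursive spinlock, for illustration only; this is not Xen's
 * implementation. */
#include <stdatomic.h>

typedef struct { atomic_flag held; } toy_lock_t;

static void toy_lock(toy_lock_t *l)
{
    /* Busy-waits until the flag is clear. If this CPU already holds the
     * lock, the flag never clears and the loop spins forever. */
    while ( atomic_flag_test_and_set(&l->held) )
        ;
}

static void toy_unlock(toy_lock_t *l)
{
    atomic_flag_clear(&l->held);
}

/* The trace above is equivalent to calling toy_lock(&l) twice on the same
 * CPU without an intervening toy_unlock(&l): the second call never returns. */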

The simple fix is to release the spinlock before calling
vgic_enable_irqs() and vgic_disable_irqs().

Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
---
 xen/arch/arm/vgic-v2.c | 10 +++++++---
 xen/arch/arm/vgic-v3.c | 10 +++++++---
 xen/arch/arm/vgic.c    |  4 ++--
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
index 9adb4a9..44cd834 100644
--- a/xen/arch/arm/vgic-v2.c
+++ b/xen/arch/arm/vgic-v2.c
@@ -415,7 +415,7 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
     struct hsr_dabt dabt = info->dabt;
     struct vgic_irq_rank *rank;
     int gicd_reg = (int)(info->gpa - v->domain->arch.vgic.dbase);
-    uint32_t tr;
+    uint32_t tr, index;
     unsigned long flags;
 
     perfc_incr(vgicd_writes);
@@ -457,8 +457,10 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
         vgic_lock_rank(v, rank, flags);
         tr = rank->ienable;
         vgic_reg32_setbits(&rank->ienable, r, info);
-        vgic_enable_irqs(v, (rank->ienable) & (~tr), rank->index);
+        index = rank->index;
+        tr = rank->ienable & (~tr);
         vgic_unlock_rank(v, rank, flags);
+        vgic_enable_irqs(v, tr, index);
         return 1;
 
     case VRANGE32(GICD_ICENABLER, GICD_ICENABLERN):
@@ -468,8 +470,10 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
         vgic_lock_rank(v, rank, flags);
         tr = rank->ienable;
         vgic_reg32_clearbits(&rank->ienable, r, info);
-        vgic_disable_irqs(v, (~rank->ienable) & tr, rank->index);
+        index = rank->index;
+        tr = (~rank->ienable) & tr;
         vgic_unlock_rank(v, rank, flags);
+        vgic_disable_irqs(v, tr, index);
         return 1;
 
     case VRANGE32(GICD_ISPENDR, GICD_ISPENDRN):
diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index b37a7c0..e04e180 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -568,7 +568,7 @@ static int __vgic_v3_distr_common_mmio_write(const char *name, struct vcpu *v,
 {
     struct hsr_dabt dabt = info->dabt;
     struct vgic_irq_rank *rank;
-    uint32_t tr;
+    uint32_t tr, index;
     unsigned long flags;
 
     switch ( reg )
@@ -584,8 +584,10 @@ static int __vgic_v3_distr_common_mmio_write(const char *name, struct vcpu *v,
         vgic_lock_rank(v, rank, flags);
         tr = rank->ienable;
         vgic_reg32_setbits(&rank->ienable, r, info);
-        vgic_enable_irqs(v, (rank->ienable) & (~tr), rank->index);
+        index = rank->index;
+        tr = rank->ienable & (~tr);
         vgic_unlock_rank(v, rank, flags);
+        vgic_enable_irqs(v, tr, index);
         return 1;
 
     case VRANGE32(GICD_ICENABLER, GICD_ICENABLERN):
@@ -595,8 +597,10 @@ static int __vgic_v3_distr_common_mmio_write(const char *name, struct vcpu *v,
         vgic_lock_rank(v, rank, flags);
         tr = rank->ienable;
         vgic_reg32_clearbits(&rank->ienable, r, info);
-        vgic_disable_irqs(v, (~rank->ienable) & tr, rank->index);
+        index = rank->index;
+        tr = (~rank->ienable) & tr;
         vgic_unlock_rank(v, rank, flags);
+        vgic_disable_irqs(v, tr, index);
         return 1;
 
     case VRANGE32(GICD_ISPENDR, GICD_ISPENDRN):
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index aa420bb..82758d2 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -322,7 +322,7 @@ void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
 
     while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
         irq = i + (32 * n);
-        v_target = __vgic_get_target_vcpu(v, irq);
+        v_target = vgic_get_target_vcpu(v, irq);
         p = irq_to_pending(v_target, irq);
         clear_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
         gic_remove_from_queues(v_target, irq);
@@ -377,7 +377,7 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n)
                 gprintk(XENLOG_ERR, "Unable to route IRQ %u to domain %u\n",
                         irq, d->domain_id);
         }
-        v_target = __vgic_get_target_vcpu(v, irq);
+        v_target = vgic_get_target_vcpu(v, irq);
         p = irq_to_pending(v_target, irq);
         set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
         spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
-- 
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
a Linux Foundation Collaborative Project



* Re: [PATCH] arm/acpi: Fix the deadlock in function vgic_lock_rank()
  2016-05-27  0:39 [PATCH] arm/acpi: Fix the deadlock in function vgic_lock_rank() Shanker Donthineni
@ 2016-05-27 13:56 ` Julien Grall
  2016-05-30 13:16   ` Stefano Stabellini
  0 siblings, 1 reply; 11+ messages in thread
From: Julien Grall @ 2016-05-27 13:56 UTC (permalink / raw)
  To: Shanker Donthineni, xen-devel
  Cc: Philip Elcan, Vikram Sethi, Wei Chen, Steve Capper,
	Stefano Stabellini, Shannon Zhao

Hello Shanker,

On 27/05/16 01:39, Shanker Donthineni wrote:
> Commit 9d77b3c01d1261c (Configure SPI interrupt type and route to
> Dom0 dynamically) causing dead loop inside the spinlock function.
> Note that spinlocks in XEN are not recursive. Re-acquiring a spinlock
> that has already held by calling CPU leads to deadlock. This happens
> whenever dom0 does writes to GICD regs ISENABLER/ICENABLER.

Thank you for spotting it. I did not notice it while I was reviewing; I
only tested on a model without any SPIs.

> The following call trace explains the problem.
>
> DOM0 writes GICD_ISENABLER/GICD_ICENABLER
>    vgic_v3_distr_common_mmio_write()
>      vgic_lock_rank()  -->  acquiring first time
>        vgic_enable_irqs()
>          route_irq_to_guest()
>            gic_route_irq_to_guest()
>              vgic_get_target_vcpu()
>                vgic_lock_rank()  -->  attemping acquired lock
>
> The simple fix release spinlock before calling vgic_enable_irqs()
> and vgic_disable_irqs().

You should explain why you think it is valid to release the lock earlier.

In this case, I think the fix is not correct, because the lock protects
both the register value and the internal state in Xen (modified by
vgic_enable_irqs). By releasing the lock earlier, the two may become
inconsistent if another vCPU is disabling the IRQs at the same time.

I cannot find an easy fix that does not involve releasing the lock. When
I was reviewing this patch, I suggested splitting the IRQ configuration
from the routing.

The routing (the call to route_irq_to_guest) will be done before DOM0
boots. The IRQ configuration will be done via the ICFGR register.

This will also help for PCI passthrough, as the guest will have to
configure the SPIs itself (we can't expect DOM0 to do it on its behalf),
while the routing will have been done ahead of time.

This would resolve the locking issue; however, it is a big task. Feel
free to suggest a simpler approach.
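
A rough sketch of the split described above (hypothetical code; the helper
name and the exact hook point at domain construction are assumptions, not
existing Xen functions):

/* Hypothetical sketch: route every SPI to dom0 once, at domain construction
 * time, so vgic_enable_irqs() no longer has to call into the routing code
 * while holding the rank lock. Names are illustrative. */
static int dom0_route_all_spis(struct domain *d)
{
    unsigned int virq;
    int ret;

    for ( virq = 32; virq < vgic_num_irqs(d); virq++ )
    {
        /* Assumed helper doing what gic_route_irq_to_guest() does today;
         * it takes the rank lock itself, since no caller holds it yet. */
        ret = route_spi_to_dom0(d, virq);
        if ( ret )
            return ret;
    }

    /* The interrupt type (level/edge) would then be configured separately,
     * when the guest writes GICD_ICFGR. */
    return 0;
}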

Regards,

-- 
Julien Grall


* Re: [PATCH] arm/acpi: Fix the deadlock in function vgic_lock_rank()
  2016-05-27 13:56 ` Julien Grall
@ 2016-05-30 13:16   ` Stefano Stabellini
  2016-05-30 19:45     ` Julien Grall
  0 siblings, 1 reply; 11+ messages in thread
From: Stefano Stabellini @ 2016-05-30 13:16 UTC (permalink / raw)
  To: Julien Grall
  Cc: Philip Elcan, Wei Chen, Vikram Sethi, Steve Capper, xen-devel,
	Stefano Stabellini, Shanker Donthineni, Shannon Zhao

On Fri, 27 May 2016, Julien Grall wrote:
> Hello Shanker,
> 
> On 27/05/16 01:39, Shanker Donthineni wrote:
> > Commit 9d77b3c01d1261c (Configure SPI interrupt type and route to
> > Dom0 dynamically) causing dead loop inside the spinlock function.
> > Note that spinlocks in XEN are not recursive. Re-acquiring a spinlock
> > that has already held by calling CPU leads to deadlock. This happens
> > whenever dom0 does writes to GICD regs ISENABLER/ICENABLER.
> 
> Thank you for spotting it, I have not noticed it while I was  reviewing, only
> tested on a model without any SPIs.
> 
> > The following call trace explains the problem.
> > 
> > DOM0 writes GICD_ISENABLER/GICD_ICENABLER
> >    vgic_v3_distr_common_mmio_write()
> >      vgic_lock_rank()  -->  acquiring first time
> >        vgic_enable_irqs()
> >          route_irq_to_guest()
> >            gic_route_irq_to_guest()
> >              vgic_get_target_vcpu()
> >                vgic_lock_rank()  -->  attemping acquired lock
> > 
> > The simple fix release spinlock before calling vgic_enable_irqs()
> > and vgic_disable_irqs().
> 
> You should explain why you think it is valid to release the lock earlier.
> 
> In this case, I think the fix is not correct because the lock is protecting
> both the register value and the internal state in Xen (modified by
> vgic_enable_irqs). By releasing the lock earlier, they may become inconsistent
> if another vCPU is disabling the IRQs at the same time.

I agree, the vgic_enable_irqs call needs to stay within the
vgic_lock_rank/vgic_unlock_rank region.


> I cannot find an easy fix which does not involve release the lock. When I was
> reviewing this patch, I suggested to split the IRQ configuration from the
> routing.

Yes, the routing doesn't need to be done from vgic_enable_irqs, and doing
it there is not nice. Splitting it out would be the ideal fix, but it is
not trivial.

For 4.7 we could consider reverting 9d77b3c01d1261c. The only other
simple thing I can come up with would be teaching gic_route_irq_to_guest
to cope with callers that already hold the vgic rank lock (see below,
untested), but it's pretty ugly.



diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 2bfe4de..57f3f3f 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -127,15 +127,12 @@ void gic_route_irq_to_xen(struct irq_desc *desc, const cpumask_t *cpu_mask,
 
 /* Program the GIC to route an interrupt to a guest
  *   - desc.lock must be held
+ *   - rank lock must be held
  */
-int gic_route_irq_to_guest(struct domain *d, unsigned int virq,
+static int __gic_route_irq_to_guest(struct domain *d, unsigned int virq,
                            struct irq_desc *desc, unsigned int priority)
 {
-    unsigned long flags;
-    /* Use vcpu0 to retrieve the pending_irq struct. Given that we only
-     * route SPIs to guests, it doesn't make any difference. */
-    struct vcpu *v_target = vgic_get_target_vcpu(d->vcpu[0], virq);
-    struct vgic_irq_rank *rank = vgic_rank_irq(v_target, virq);
+    struct vcpu *v_target = __vgic_get_target_vcpu(d->vcpu[0], virq);
     struct pending_irq *p = irq_to_pending(v_target, virq);
     int res = -EBUSY;
 
@@ -144,12 +141,10 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq,
     ASSERT(virq >= 32);
     ASSERT(virq < vgic_num_irqs(d));
 
-    vgic_lock_rank(v_target, rank, flags);
-
     if ( p->desc ||
          /* The VIRQ should not be already enabled by the guest */
          test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
-        goto out;
+        return res;
 
     desc->handler = gic_hw_ops->gic_guest_irq_type;
     set_bit(_IRQ_GUEST, &desc->status);
@@ -159,12 +154,36 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq,
     p->desc = desc;
     res = 0;
 
-out:
-    vgic_unlock_rank(v_target, rank, flags);
-
     return res;
 }
 
+int gic_route_irq_to_guest(struct domain *d, unsigned int virq,
+                           struct irq_desc *desc, unsigned int priority)
+{
+    unsigned long flags;
+    int lock = 0, retval;
+    struct vgic_irq_rank *rank;
+
+    /* Use vcpu0 to retrieve the pending_irq struct. Given that we only
+     * route SPIs to guests, it doesn't make any difference. */
+    rank = vgic_rank_irq(d->vcpu[0], virq);
+
+    /* Take the rank spinlock unless it has already been taken by the
+     * caller. */
+    if ( !spin_is_locked(&rank->lock) ) {
+        vgic_lock_rank(d->vcpu[0], rank, flags);
+        lock = 1;
+    }
+
+    retval = __gic_route_irq_to_guest(d, virq, desc, GIC_PRI_IRQ);
+
+    if ( lock )
+        vgic_unlock_rank(d->vcpu[0], rank, flags);
+
+    return retval;
+
+}
+
 /* This function only works with SPIs for now */
 int gic_remove_irq_from_guest(struct domain *d, unsigned int virq,
                               struct irq_desc *desc)
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index aa420bb..e45669f 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -215,7 +215,7 @@ int vcpu_vgic_free(struct vcpu *v)
 }
 
 /* The function should be called by rank lock taken. */
-static struct vcpu *__vgic_get_target_vcpu(struct vcpu *v, unsigned int virq)
+struct vcpu *__vgic_get_target_vcpu(struct vcpu *v, unsigned int virq)
 {
     struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
 
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index a2fccc0..726e690 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -343,6 +343,7 @@ void vgic_v3_setup_hw(paddr_t dbase,
                       const struct rdist_region *regions,
                       uint32_t rdist_stride);
 #endif
+struct vcpu *__vgic_get_target_vcpu(struct vcpu *v, unsigned int virq);
 
 #endif /* __ASM_ARM_VGIC_H__ */
 


* Re: [PATCH] arm/acpi: Fix the deadlock in function vgic_lock_rank()
  2016-05-30 13:16   ` Stefano Stabellini
@ 2016-05-30 19:45     ` Julien Grall
  2016-05-31  0:55       ` Shannon Zhao
  2016-05-31  9:40       ` Stefano Stabellini
  0 siblings, 2 replies; 11+ messages in thread
From: Julien Grall @ 2016-05-30 19:45 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Philip Elcan, Wei Liu, Wei Chen, Vikram Sethi, Steve Capper,
	xen-devel, Shannon Zhao, Shanker Donthineni

(CC Wei Liu)

Hi Stefano,

On 30/05/2016 14:16, Stefano Stabellini wrote:
> On Fri, 27 May 2016, Julien Grall wrote:
>> Hello Shanker,
>>
>> On 27/05/16 01:39, Shanker Donthineni wrote:
>>> Commit 9d77b3c01d1261c (Configure SPI interrupt type and route to
>>> Dom0 dynamically) causing dead loop inside the spinlock function.
>>> Note that spinlocks in XEN are not recursive. Re-acquiring a spinlock
>>> that has already held by calling CPU leads to deadlock. This happens
>>> whenever dom0 does writes to GICD regs ISENABLER/ICENABLER.
>>
>> Thank you for spotting it, I have not noticed it while I was  reviewing, only
>> tested on a model without any SPIs.
>>
>>> The following call trace explains the problem.
>>>
>>> DOM0 writes GICD_ISENABLER/GICD_ICENABLER
>>>    vgic_v3_distr_common_mmio_write()
>>>      vgic_lock_rank()  -->  acquiring first time
>>>        vgic_enable_irqs()
>>>          route_irq_to_guest()
>>>            gic_route_irq_to_guest()
>>>              vgic_get_target_vcpu()
>>>                vgic_lock_rank()  -->  attemping acquired lock
>>>
>>> The simple fix release spinlock before calling vgic_enable_irqs()
>>> and vgic_disable_irqs().
>>
>> You should explain why you think it is valid to release the lock earlier.
>>
>> In this case, I think the fix is not correct because the lock is protecting
>> both the register value and the internal state in Xen (modified by
>> vgic_enable_irqs). By releasing the lock earlier, they may become inconsistent
>> if another vCPU is disabling the IRQs at the same time.
>
> I agree, the vgic_enable_irqs call need to stay within the
> vgic_lock_rank/vgic_unlock_rank region.
>
>
>> I cannot find an easy fix which does not involve release the lock. When I was
>> reviewing this patch, I suggested to split the IRQ configuration from the
>> routing.
>
> Yes, the routing doesn't need to be done from vgic_enable_irqs. It is
> not nice. That would be the ideal fix, but it is not trivial.
>
> For 4.7 we could consider reverting 9d77b3c01d1261c. The only other
> thing that I can come up with which is simple would be improving
> gic_route_irq_to_guest to cope with callers that have the vgic rank lock
> already held (see below, untested) but it's pretty ugly.

We are close to releasing Xen 4.7, so I think we should avoid touching the
common interrupt code (i.e. code not only used by ACPI).

ACPI can only be enabled in expert mode and will be a tech preview for
Xen 4.7, so I would revert the patch. SPIs will not be routed, but that
is better than a deadlock.

I would also replace the patch with a warning until the issue is fixed
in Xen 4.8.
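
A minimal sketch of such an interim warning (placement and wording are
assumptions, not an actual patch):

/* Illustrative only: emitted on the ACPI path in place of the reverted
 * dynamic routing, so the limitation is at least visible in the Xen log. */
static void warn_spi_not_routed(unsigned int irq)
{
    printk(XENLOG_WARNING
           "vgic: dynamic routing of SPI %u to dom0 is not yet supported "
           "with ACPI; the IRQ will not be routed\n", irq);
}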

Any opinions?

> +int gic_route_irq_to_guest(struct domain *d, unsigned int virq,
> +                           struct irq_desc *desc, unsigned int priority)
> +{
> +    unsigned long flags;
> +    int lock = 0, retval;
> +    struct vgic_irq_rank *rank;
> +
> +    /* Use vcpu0 to retrieve the pending_irq struct. Given that we only
> +     * route SPIs to guests, it doesn't make any difference. */
> +    rank = vgic_rank_irq(d->vcpu[0], virq);
> +
> +    /* Take the rank spinlock unless it has already been taken by the
> +     * caller. */
> +    if ( !spin_is_locked(&rank->lock) ) {

AFAICT, spin_is_locked only tells us that someone has locked the rank, so
this would be unsafe.

Regards,

-- 
Julien Grall


* Re: [PATCH] arm/acpi: Fix the deadlock in function vgic_lock_rank()
  2016-05-30 19:45     ` Julien Grall
@ 2016-05-31  0:55       ` Shannon Zhao
  2016-05-31  9:40       ` Stefano Stabellini
  1 sibling, 0 replies; 11+ messages in thread
From: Shannon Zhao @ 2016-05-31  0:55 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Wei Liu, Wei Chen, Vikram Sethi, Steve Capper,
	Philip Elcan, Shannon Zhao, Shanker Donthineni



On 2016/5/31 3:45, Julien Grall wrote:
> (CC Wei Liu)
> 
> Hi Stefano,
> 
> On 30/05/2016 14:16, Stefano Stabellini wrote:
>> On Fri, 27 May 2016, Julien Grall wrote:
>>> Hello Shanker,
>>>
>>> On 27/05/16 01:39, Shanker Donthineni wrote:
>>>> Commit 9d77b3c01d1261c (Configure SPI interrupt type and route to
>>>> Dom0 dynamically) causing dead loop inside the spinlock function.
>>>> Note that spinlocks in XEN are not recursive. Re-acquiring a spinlock
>>>> that has already held by calling CPU leads to deadlock. This happens
>>>> whenever dom0 does writes to GICD regs ISENABLER/ICENABLER.
>>>
>>> Thank you for spotting it, I have not noticed it while I was 
>>> reviewing, only
>>> tested on a model without any SPIs.
>>>
>>>> The following call trace explains the problem.
>>>>
>>>> DOM0 writes GICD_ISENABLER/GICD_ICENABLER
>>>>    vgic_v3_distr_common_mmio_write()
>>>>      vgic_lock_rank()  -->  acquiring first time
>>>>        vgic_enable_irqs()
>>>>          route_irq_to_guest()
>>>>            gic_route_irq_to_guest()
>>>>              vgic_get_target_vcpu()
>>>>                vgic_lock_rank()  -->  attemping acquired lock
>>>>
>>>> The simple fix release spinlock before calling vgic_enable_irqs()
>>>> and vgic_disable_irqs().
>>>
>>> You should explain why you think it is valid to release the lock
>>> earlier.
>>>
>>> In this case, I think the fix is not correct because the lock is
>>> protecting
>>> both the register value and the internal state in Xen (modified by
>>> vgic_enable_irqs). By releasing the lock earlier, they may become
>>> inconsistent
>>> if another vCPU is disabling the IRQs at the same time.
>>
>> I agree, the vgic_enable_irqs call need to stay within the
>> vgic_lock_rank/vgic_unlock_rank region.
>>
>>
>>> I cannot find an easy fix which does not involve release the lock.
>>> When I was
>>> reviewing this patch, I suggested to split the IRQ configuration from
>>> the
>>> routing.
>>
>> Yes, the routing doesn't need to be done from vgic_enable_irqs. It is
>> not nice. That would be the ideal fix, but it is not trivial.
>>
>> For 4.7 we could consider reverting 9d77b3c01d1261c. The only other
>> thing that I can come up with which is simple would be improving
>> gic_route_irq_to_guest to cope with callers that have the vgic rank lock
>> already held (see below, untested) but it's pretty ugly.
> 
> We are close to release Xen 4.7, so I think we should avoid to touch the
> common interrupt code (i.e not only used by ACPI).
> 
> ACPI can only be enabled in expert mode and will be a tech-preview for
> Xen 4.7. So I would revert the patch.  SPIs will not be routed, but it
> is better than a deadlock.
> 
> I would also replace the patch with a warning until the issue will be
> fixed in Xen 4.8.
> 
> Any opinions?

I agree and I'm so sorry for this problem.

Thanks,
-- 
Shannon



* Re: [PATCH] arm/acpi: Fix the deadlock in function vgic_lock_rank()
  2016-05-30 19:45     ` Julien Grall
  2016-05-31  0:55       ` Shannon Zhao
@ 2016-05-31  9:40       ` Stefano Stabellini
  2016-05-31 10:11         ` Julien Grall
  2016-05-31 11:37         ` Wei Liu
  1 sibling, 2 replies; 11+ messages in thread
From: Stefano Stabellini @ 2016-05-31  9:40 UTC (permalink / raw)
  To: Julien Grall
  Cc: Philip Elcan, Wei Liu, Wei Chen, Vikram Sethi, Steve Capper,
	Stefano Stabellini, Shannon Zhao, xen-devel, Shanker Donthineni

On Mon, 30 May 2016, Julien Grall wrote:
> (CC Wei Liu)
> 
> Hi Stefano,
> 
> On 30/05/2016 14:16, Stefano Stabellini wrote:
> > On Fri, 27 May 2016, Julien Grall wrote:
> > > Hello Shanker,
> > > 
> > > On 27/05/16 01:39, Shanker Donthineni wrote:
> > > > Commit 9d77b3c01d1261c (Configure SPI interrupt type and route to
> > > > Dom0 dynamically) causing dead loop inside the spinlock function.
> > > > Note that spinlocks in XEN are not recursive. Re-acquiring a spinlock
> > > > that has already held by calling CPU leads to deadlock. This happens
> > > > whenever dom0 does writes to GICD regs ISENABLER/ICENABLER.
> > > 
> > > Thank you for spotting it, I have not noticed it while I was  reviewing,
> > > only
> > > tested on a model without any SPIs.
> > > 
> > > > The following call trace explains the problem.
> > > > 
> > > > DOM0 writes GICD_ISENABLER/GICD_ICENABLER
> > > >    vgic_v3_distr_common_mmio_write()
> > > >      vgic_lock_rank()  -->  acquiring first time
> > > >        vgic_enable_irqs()
> > > >          route_irq_to_guest()
> > > >            gic_route_irq_to_guest()
> > > >              vgic_get_target_vcpu()
> > > >                vgic_lock_rank()  -->  attemping acquired lock
> > > > 
> > > > The simple fix release spinlock before calling vgic_enable_irqs()
> > > > and vgic_disable_irqs().
> > > 
> > > You should explain why you think it is valid to release the lock earlier.
> > > 
> > > In this case, I think the fix is not correct because the lock is
> > > protecting
> > > both the register value and the internal state in Xen (modified by
> > > vgic_enable_irqs). By releasing the lock earlier, they may become
> > > inconsistent
> > > if another vCPU is disabling the IRQs at the same time.
> > 
> > I agree, the vgic_enable_irqs call need to stay within the
> > vgic_lock_rank/vgic_unlock_rank region.
> > 
> > 
> > > I cannot find an easy fix which does not involve release the lock. When I
> > > was
> > > reviewing this patch, I suggested to split the IRQ configuration from the
> > > routing.
> > 
> > Yes, the routing doesn't need to be done from vgic_enable_irqs. It is
> > not nice. That would be the ideal fix, but it is not trivial.
> > 
> > For 4.7 we could consider reverting 9d77b3c01d1261c. The only other
> > thing that I can come up with which is simple would be improving
> > gic_route_irq_to_guest to cope with callers that have the vgic rank lock
> > already held (see below, untested) but it's pretty ugly.
> 
> We are close to release Xen 4.7, so I think we should avoid to touch the
> common interrupt code (i.e not only used by ACPI).

Agreed. Wei, are you OK with this?


> ACPI can only be enabled in expert mode and will be a tech-preview for Xen
> 4.7. So I would revert the patch.  SPIs will not be routed, but it is better
> than a deadlock.
> 
> I would also replace the patch with a warning until the issue will be fixed in
> Xen 4.8.
> 
> Any opinions?
> 
> > +int gic_route_irq_to_guest(struct domain *d, unsigned int virq,
> > +                           struct irq_desc *desc, unsigned int priority)
> > +{
> > +    unsigned long flags;
> > +    int lock = 0, retval;
> > +    struct vgic_irq_rank *rank;
> > +
> > +    /* Use vcpu0 to retrieve the pending_irq struct. Given that we only
> > +     * route SPIs to guests, it doesn't make any difference. */
> > +    rank = vgic_rank_irq(d->vcpu[0], virq);
> > +
> > +    /* Take the rank spinlock unless it has already been taken by the
> > +     * caller. */
> > +    if ( !spin_is_locked(&rank->lock) ) {
> 
> AFAICT, spin_is_locked only tell us that someone has locked the rank. So this
> would be unsafe.

The code checks whether the lock is already taken and, if it is not,
takes it. The purpose of this code is to allow gic_route_irq_to_guest to
be called both by functions that already hold the lock and by functions
that do not. The same goal could be achieved by duplicating
gic_route_irq_to_guest into two functions identical except for the lock
taking. That would admittedly be a more obvious fix, but also a
particularly ugly one.


* Re: [PATCH] arm/acpi: Fix the deadlock in function vgic_lock_rank()
  2016-05-31  9:40       ` Stefano Stabellini
@ 2016-05-31 10:11         ` Julien Grall
  2016-06-01  9:54           ` Stefano Stabellini
  2016-05-31 11:37         ` Wei Liu
  1 sibling, 1 reply; 11+ messages in thread
From: Julien Grall @ 2016-05-31 10:11 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Philip Elcan, Wei Liu, Wei Chen, Vikram Sethi, Steve Capper,
	xen-devel, Shannon Zhao, Shanker Donthineni

Hi Stefano,

On 31/05/16 10:40, Stefano Stabellini wrote:
> On Mon, 30 May 2016, Julien Grall wrote:
>> ACPI can only be enabled in expert mode and will be a tech-preview for Xen
>> 4.7. So I would revert the patch.  SPIs will not be routed, but it is better
>> than a deadlock.
>>
>> I would also replace the patch with a warning until the issue will be fixed in
>> Xen 4.8.
>>
>> Any opinions?
>>
>>> +int gic_route_irq_to_guest(struct domain *d, unsigned int virq,
>>> +                           struct irq_desc *desc, unsigned int priority)
>>> +{
>>> +    unsigned long flags;
>>> +    int lock = 0, retval;
>>> +    struct vgic_irq_rank *rank;
>>> +
>>> +    /* Use vcpu0 to retrieve the pending_irq struct. Given that we only
>>> +     * route SPIs to guests, it doesn't make any difference. */
>>> +    rank = vgic_rank_irq(d->vcpu[0], virq);
>>> +
>>> +    /* Take the rank spinlock unless it has already been taken by the
>>> +     * caller. */
>>> +    if ( !spin_is_locked(&rank->lock) ) {
>>
>> AFAICT, spin_is_locked only tell us that someone has locked the rank. So this
>> would be unsafe.
>
> The code is checking if the lock is already taken, and if it is not
> taken, it will take the lock. The purpose of this code is to
> allow gic_route_irq_to_guest to be called by both functions which
> already have the lock held and functions that do not. The same goal
> could be achieved by duplicating gic_route_irq_to_guest into two
> identical functions except for the lock taking. That would be
> admittedly a more obvious fix but also a particularly ugly one.

spin_is_locked does not work as you expect. The function does not tell
you whether the lock was taken by the current CPU, only whether it was
taken by *a* CPU.

It would be possible for CPU A to call this function while CPU B holds
the lock. The data structure would then be accessed by two CPUs
concurrently, which is unsafe.
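
One way to make such a check meaningful (illustrative only; this is not an
existing Xen primitive) is to record the owning CPU next to the lock:

/* Illustrative sketch, not an existing Xen API: track the owning CPU so a
 * path can tell whether the current CPU already holds the rank lock. */
struct owned_lock {
    spinlock_t lock;
    int owner;                       /* CPU id of the holder, or -1 */
};

static void owned_lock_acquire(struct owned_lock *l)
{
    spin_lock(&l->lock);
    l->owner = smp_processor_id();
}

static void owned_lock_release(struct owned_lock *l)
{
    l->owner = -1;
    spin_unlock(&l->lock);
}

static int held_by_current_cpu(const struct owned_lock *l)
{
    /* Only the holder can have set owner to this CPU, and it cannot change
     * while we are that holder, so this read is safe without the lock. */
    return l->owner == smp_processor_id();
}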

Regards,

-- 
Julien Grall


* Re: [PATCH] arm/acpi: Fix the deadlock in function vgic_lock_rank()
  2016-05-31  9:40       ` Stefano Stabellini
  2016-05-31 10:11         ` Julien Grall
@ 2016-05-31 11:37         ` Wei Liu
  1 sibling, 0 replies; 11+ messages in thread
From: Wei Liu @ 2016-05-31 11:37 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Philip Elcan, Wei Liu, Wei Chen, Vikram Sethi, Steve Capper,
	Julien Grall, xen-devel, Shannon Zhao, Shanker Donthineni

On Tue, May 31, 2016 at 10:40:13AM +0100, Stefano Stabellini wrote:
> On Mon, 30 May 2016, Julien Grall wrote:
> > (CC Wei Liu)
> > 
> > Hi Stefano,
> > 
> > On 30/05/2016 14:16, Stefano Stabellini wrote:
> > > On Fri, 27 May 2016, Julien Grall wrote:
> > > > Hello Shanker,
> > > > 
> > > > On 27/05/16 01:39, Shanker Donthineni wrote:
> > > > > Commit 9d77b3c01d1261c (Configure SPI interrupt type and route to
> > > > > Dom0 dynamically) causing dead loop inside the spinlock function.
> > > > > Note that spinlocks in XEN are not recursive. Re-acquiring a spinlock
> > > > > that has already held by calling CPU leads to deadlock. This happens
> > > > > whenever dom0 does writes to GICD regs ISENABLER/ICENABLER.
> > > > 
> > > > Thank you for spotting it, I have not noticed it while I was  reviewing,
> > > > only
> > > > tested on a model without any SPIs.
> > > > 
> > > > > The following call trace explains the problem.
> > > > > 
> > > > > DOM0 writes GICD_ISENABLER/GICD_ICENABLER
> > > > >    vgic_v3_distr_common_mmio_write()
> > > > >      vgic_lock_rank()  -->  acquiring first time
> > > > >        vgic_enable_irqs()
> > > > >          route_irq_to_guest()
> > > > >            gic_route_irq_to_guest()
> > > > >              vgic_get_target_vcpu()
> > > > >                vgic_lock_rank()  -->  attemping acquired lock
> > > > > 
> > > > > The simple fix release spinlock before calling vgic_enable_irqs()
> > > > > and vgic_disable_irqs().
> > > > 
> > > > You should explain why you think it is valid to release the lock earlier.
> > > > 
> > > > In this case, I think the fix is not correct because the lock is
> > > > protecting
> > > > both the register value and the internal state in Xen (modified by
> > > > vgic_enable_irqs). By releasing the lock earlier, they may become
> > > > inconsistent
> > > > if another vCPU is disabling the IRQs at the same time.
> > > 
> > > I agree, the vgic_enable_irqs call need to stay within the
> > > vgic_lock_rank/vgic_unlock_rank region.
> > > 
> > > 
> > > > I cannot find an easy fix which does not involve release the lock. When I
> > > > was
> > > > reviewing this patch, I suggested to split the IRQ configuration from the
> > > > routing.
> > > 
> > > Yes, the routing doesn't need to be done from vgic_enable_irqs. It is
> > > not nice. That would be the ideal fix, but it is not trivial.
> > > 
> > > For 4.7 we could consider reverting 9d77b3c01d1261c. The only other
> > > thing that I can come up with which is simple would be improving
> > > gic_route_irq_to_guest to cope with callers that have the vgic rank lock
> > > already held (see below, untested) but it's pretty ugly.
> > 
> > We are close to release Xen 4.7, so I think we should avoid to touch the
> > common interrupt code (i.e not only used by ACPI).
> 
> Agreed. Wei, are you OK with this?
> 

Bear in mind that I haven't looked into the issue in detail, but in
principle I agree we should avoid touching common code at this stage.

Wei.


* Re: [PATCH] arm/acpi: Fix the deadlock in function vgic_lock_rank()
  2016-05-31 10:11         ` Julien Grall
@ 2016-06-01  9:54           ` Stefano Stabellini
  2016-06-01 10:49             ` Julien Grall
  0 siblings, 1 reply; 11+ messages in thread
From: Stefano Stabellini @ 2016-06-01  9:54 UTC (permalink / raw)
  To: Julien Grall
  Cc: Philip Elcan, Wei Liu, Wei Chen, Vikram Sethi, Steve Capper,
	Stefano Stabellini, Shannon Zhao, xen-devel, Shanker Donthineni

On Tue, 31 May 2016, Julien Grall wrote:
> Hi Stefano,
> 
> On 31/05/16 10:40, Stefano Stabellini wrote:
> > On Mon, 30 May 2016, Julien Grall wrote:
> > > ACPI can only be enabled in expert mode and will be a tech-preview for Xen
> > > 4.7. So I would revert the patch.  SPIs will not be routed, but it is
> > > better
> > > than a deadlock.
> > > 
> > > I would also replace the patch with a warning until the issue will be
> > > fixed in
> > > Xen 4.8.
> > > 
> > > Any opinions?
> > > 
> > > > +int gic_route_irq_to_guest(struct domain *d, unsigned int virq,
> > > > +                           struct irq_desc *desc, unsigned int
> > > > priority)
> > > > +{
> > > > +    unsigned long flags;
> > > > +    int lock = 0, retval;
> > > > +    struct vgic_irq_rank *rank;
> > > > +
> > > > +    /* Use vcpu0 to retrieve the pending_irq struct. Given that we only
> > > > +     * route SPIs to guests, it doesn't make any difference. */
> > > > +    rank = vgic_rank_irq(d->vcpu[0], virq);
> > > > +
> > > > +    /* Take the rank spinlock unless it has already been taken by the
> > > > +     * caller. */
> > > > +    if ( !spin_is_locked(&rank->lock) ) {
> > > 
> > > AFAICT, spin_is_locked only tell us that someone has locked the rank. So
> > > this
> > > would be unsafe.
> > 
> > The code is checking if the lock is already taken, and if it is not
> > taken, it will take the lock. The purpose of this code is to
> > allow gic_route_irq_to_guest to be called by both functions which
> > already have the lock held and functions that do not. The same goal
> > could be achieved by duplicating gic_route_irq_to_guest into two
> > identical functions except for the lock taking. That would be
> > admittedly a more obvious fix but also a particularly ugly one.
> 
> spin_is_locked does not work as you expect. The function will not tell you if
> the lock was taken by the current CPU, but if the lock was taken by *a* CPU.
> 
> It would be possible to have CPU A calling this function and have CPU B with
> the lock taken. So the data structure would be accessed by 2 CPUs
> concurrently, which is unsafe.

Damn, you are right. I don't think we have a spinlock function that
tells us whether the lock was taken by the current CPU.

The only other option I see would be duplicating route_irq_to_guest and
gic_route_irq_to_guest, introducing second versions of those functions
that assume the rank lock is already held. Very, very ugly. I'll just
revert the commit and wait for better patches from Shannon.
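
For reference, the duplication mentioned above is the usual
locked/unlocked-variant idiom; a minimal sketch (hypothetical, not actual
Xen code):

/* Hypothetical sketch of the split; the _locked variant relies on the
 * caller already holding the rank lock. */
static int gic_route_irq_to_guest_locked(struct domain *d, unsigned int virq,
                                         struct irq_desc *desc,
                                         unsigned int priority)
{
    /* Body of today's gic_route_irq_to_guest(), minus the lock handling. */
    return 0;
}

/* Caller must NOT hold the rank lock. */
int gic_route_irq_to_guest(struct domain *d, unsigned int virq,
                           struct irq_desc *desc, unsigned int priority)
{
    struct vgic_irq_rank *rank = vgic_rank_irq(d->vcpu[0], virq);
    unsigned long flags;
    int ret;

    vgic_lock_rank(d->vcpu[0], rank, flags);
    ret = gic_route_irq_to_guest_locked(d, virq, desc, priority);
    vgic_unlock_rank(d->vcpu[0], rank, flags);

    return ret;
}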


* Re: [PATCH] arm/acpi: Fix the deadlock in function vgic_lock_rank()
  2016-06-01  9:54           ` Stefano Stabellini
@ 2016-06-01 10:49             ` Julien Grall
  2016-06-01 13:55               ` Shannon Zhao
  0 siblings, 1 reply; 11+ messages in thread
From: Julien Grall @ 2016-06-01 10:49 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Philip Elcan, Wei Liu, Wei Chen, Vikram Sethi, Steve Capper,
	xen-devel, Shannon Zhao, Shanker Donthineni

Hi Stefano,

On 01/06/16 10:54, Stefano Stabellini wrote:
>> spin_is_locked does not work as you expect. The function will not tell you if
>> the lock was taken by the current CPU, but if the lock was taken by *a* CPU.
>>
>> It would be possible to have CPU A calling this function and have CPU B with
>> the lock taken. So the data structure would be accessed by 2 CPUs
>> concurrently, which is unsafe.
>
> Damn, you are right. I don't think we have a spin_lock function which
> tells us if the spin_lock was taken by us.

Unfortunately not.

> The only other option I see would be duplicating route_irq_to_guest and
> gic_route_irq_to_guest, introducing a second version of those functions
> which assume that the rank lock was already taken. Very very ugly. I'll
> just revert the commit and wait for better patches from Shannon.

I am working on a patch series to decouple IRQ configuration and 
routing. It should be ready soon.

Regards,

-- 
Julien Grall


* Re: [PATCH] arm/acpi: Fix the deadlock in function vgic_lock_rank()
  2016-06-01 10:49             ` Julien Grall
@ 2016-06-01 13:55               ` Shannon Zhao
  0 siblings, 0 replies; 11+ messages in thread
From: Shannon Zhao @ 2016-06-01 13:55 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: Philip Elcan, Wei Liu, Vikram Sethi, Wei Chen, Steve Capper,
	xen-devel, Shanker Donthineni

On 2016/06/01 18:49, Julien Grall wrote:
> Hi Stefano,
> 
> On 01/06/16 10:54, Stefano Stabellini wrote:
>>> spin_is_locked does not work as you expect. The function will not
>>> tell you if
>>> the lock was taken by the current CPU, but if the lock was taken by
>>> *a* CPU.
>>>
>>> It would be possible to have CPU A calling this function and have CPU
>>> B with
>>> the lock taken. So the data structure would be accessed by 2 CPUs
>>> concurrently, which is unsafe.
>>
>> Damn, you are right. I don't think we have a spin_lock function which
>> tells us if the spin_lock was taken by us.
> 
> Unfortunately not.
> 
>> The only other option I see would be duplicating route_irq_to_guest and
>> gic_route_irq_to_guest, introducing a second version of those functions
>> which assume that the rank lock was already taken. Very very ugly. I'll
>> just revert the commit and wait for better patches from Shannon.
> 
> I am working on a patch series to decouple IRQ configuration and
> routing. It should be ready soon.
> 
Thanks, Julien :)

-- 
Shannon


Thread overview: 11 messages
2016-05-27  0:39 [PATCH] arm/acpi: Fix the deadlock in function vgic_lock_rank() Shanker Donthineni
2016-05-27 13:56 ` Julien Grall
2016-05-30 13:16   ` Stefano Stabellini
2016-05-30 19:45     ` Julien Grall
2016-05-31  0:55       ` Shannon Zhao
2016-05-31  9:40       ` Stefano Stabellini
2016-05-31 10:11         ` Julien Grall
2016-06-01  9:54           ` Stefano Stabellini
2016-06-01 10:49             ` Julien Grall
2016-06-01 13:55               ` Shannon Zhao
2016-05-31 11:37         ` Wei Liu
