* [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
@ 2014-01-27 17:33 Oleksandr Tyshchenko
  2014-01-27 17:33 ` [PATCH v1 1/2] xen/arm: Add return value to smp_call_function_interrupt function Oleksandr Tyshchenko
From: Oleksandr Tyshchenko @ 2014-01-27 17:33 UTC (permalink / raw)
  To: xen-devel

Hi all,

We are trying to bring up Xen on the DRA7XX (OMAP5) platform.

We sometimes see hangs in the hypervisor, and these hangs are related to SMP.
We found out that a deadlock takes place in the on_selected_cpus function
when cross-interrupts occur simultaneously.

The issue:

1. We receive IRQs on the first CPU (for example CPU0) and the second CPU (for example CPU1) in parallel.
2. In our case the maintenance_interrupt function for the maintenance IRQ from CPU0 is executed on CPU1, and
maintenance_interrupt for the IRQ from CPU1 is executed on CPU0.
3. According to the existing logic, the gic_irq_eoi function has to run on the CPU on which the IRQ was scheduled.
4. Because of this, in both cases we need to call the on_selected_cpus function to EOI the IRQs.
5. On CPU0, on_selected_cpus is called; we take the lock at the beginning of the function
and continue to execute it.
6. In parallel, the same function is called on CPU1, which stops while attempting to take the lock because it is already held.
7. On CPU0 we send the IPI and start waiting until CPU1 has executed the function and cleared the cpumask.
8. But the mask will never be cleared, because CPU1 is waiting too.

So we have the following situation: CPU0 cannot exit its busy loop because it is waiting for CPU1 to execute the function and clear the mask, while CPU1 is waiting for the lock to be released.
This results in a deadlock.
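
To make this concrete, here is a stripped-down sketch of the relevant
logic (paraphrased from xen/common/smp.c, not the exact source), with
the two fatal waits marked against the steps above:

static DEFINE_SPINLOCK(call_lock);

static struct {
    cpumask_t selected;
    void (*func)(void *info);
    void *info;
    int wait;
} call_data;

void on_selected_cpus(const cpumask_t *selected,
                      void (*func)(void *info), void *info, int wait)
{
    spin_lock(&call_lock);      /* step 6: CPU1 spins here while CPU0 holds
                                 * the lock; the SGI sent by CPU0 cannot
                                 * preempt the interrupt handler CPU1 is
                                 * spinning in, so it stays pending */

    cpumask_copy(&call_data.selected, selected);
    call_data.func = func;
    call_data.info = info;
    call_data.wait = wait;

    smp_send_call_function_mask(&call_data.selected);  /* step 7: send IPI */

    while ( !cpumask_empty(&call_data.selected) )
        cpu_relax();            /* steps 7-8: CPU0 spins here forever,
                                 * waiting for CPU1 to clear the mask */

    spin_unlock(&call_lock);
}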

Since we needed a solution to avoid the hangs, the attached patches were created. The solution is simply to
call the smp_call_function_interrupt function if the lock is already held. This causes the waiting CPU to exit its busy loop and release the lock.
But I am afraid this solution is not complete and maybe not enough for stable operation. I would appreciate it if you could explain how to solve the issue in the right way or give some advice.

P.S. We use the following software:
1. Hypervisor - Xen 4.4 unstable
2. Dom0 - Kernel 3.8
3. DomU - Kernel 3.8 

Oleksandr Tyshchenko (2):
  xen/arm: Add return value to smp_call_function_interrupt function
  xen/arm: Fix deadlock in on_selected_cpus function

 xen/common/smp.c      |   13 ++++++++++---
 xen/include/xen/smp.h |    2 +-
 2 files changed, 11 insertions(+), 4 deletions(-)

-- 
1.7.9.5


* [PATCH v1 1/2] xen/arm: Add return value to smp_call_function_interrupt function
  2014-01-27 17:33 [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix Oleksandr Tyshchenko
@ 2014-01-27 17:33 ` Oleksandr Tyshchenko
  2014-01-27 18:28   ` Stefano Stabellini
  2014-01-27 17:33 ` [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function Oleksandr Tyshchenko
From: Oleksandr Tyshchenko @ 2014-01-27 17:33 UTC (permalink / raw)
  To: xen-devel

Let the function return an error if the action cannot be executed.

Change-Id: Iace691850024656239326bf0e3c87b57cb1b8ab3
Signed-off-by: Oleksandr Tyshchenko <oleksandr.tyshchenko@globallogic.com>
Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
---
 xen/common/smp.c      |    7 +++++--
 xen/include/xen/smp.h |    2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/xen/common/smp.c b/xen/common/smp.c
index 482a203..2700bd7 100644
--- a/xen/common/smp.c
+++ b/xen/common/smp.c
@@ -20,6 +20,7 @@
 #include <asm/processor.h>
 #include <xen/spinlock.h>
 #include <xen/smp.h>
+#include <xen/errno.h>
 
 /*
  * Structure and data for smp_call_function()/on_selected_cpus().
@@ -75,14 +76,14 @@ out:
     spin_unlock(&call_lock);
 }
 
-void smp_call_function_interrupt(void)
+int smp_call_function_interrupt(void)
 {
     void (*func)(void *info) = call_data.func;
     void *info = call_data.info;
     unsigned int cpu = smp_processor_id();
 
     if ( !cpumask_test_cpu(cpu, &call_data.selected) )
-        return;
+        return -EPERM;
 
     irq_enter();
 
@@ -100,6 +101,8 @@ void smp_call_function_interrupt(void)
     }
 
     irq_exit();
+
+    return 0;
 }
 
 /*
diff --git a/xen/include/xen/smp.h b/xen/include/xen/smp.h
index 6febb56..6d05910 100644
--- a/xen/include/xen/smp.h
+++ b/xen/include/xen/smp.h
@@ -61,7 +61,7 @@ static inline void on_each_cpu(
 /*
  * Call a function on the current CPU
  */
-void smp_call_function_interrupt(void);
+int smp_call_function_interrupt(void);
 
 void smp_send_call_function_mask(const cpumask_t *mask);
 
-- 
1.7.9.5


* [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function
  2014-01-27 17:33 [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix Oleksandr Tyshchenko
  2014-01-27 17:33 ` [PATCH v1 1/2] xen/arm: Add return value to smp_call_function_interrupt function Oleksandr Tyshchenko
@ 2014-01-27 17:33 ` Oleksandr Tyshchenko
  2014-01-27 19:00   ` Stefano Stabellini
  2014-01-28 13:58   ` Stefano Stabellini
  2014-01-27 17:40 ` [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix Ian Campbell
  2014-01-27 17:51 ` Julien Grall
From: Oleksandr Tyshchenko @ 2014-01-27 17:33 UTC (permalink / raw)
  To: xen-devel

This patch is needed to avoid possible deadlocks when cross-interrupts
occur simultaneously.

Change-Id: I574b496442253a7b67a27e2edd793526c8131284
Signed-off-by: Oleksandr Tyshchenko <oleksandr.tyshchenko@globallogic.com>
Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
---
 xen/common/smp.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/xen/common/smp.c b/xen/common/smp.c
index 2700bd7..46d2fc6 100644
--- a/xen/common/smp.c
+++ b/xen/common/smp.c
@@ -55,7 +55,11 @@ void on_selected_cpus(
 
     ASSERT(local_irq_is_enabled());
 
-    spin_lock(&call_lock);
+    if (!spin_trylock(&call_lock)) {
+        if (smp_call_function_interrupt())
+            return;
+        spin_lock(&call_lock);
+    }
 
     cpumask_copy(&call_data.selected, selected);
 
-- 
1.7.9.5


* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-27 17:33 [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix Oleksandr Tyshchenko
  2014-01-27 17:33 ` [PATCH v1 1/2] xen/arm: Add return value to smp_call_function_interrupt function Oleksandr Tyshchenko
  2014-01-27 17:33 ` [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function Oleksandr Tyshchenko
@ 2014-01-27 17:40 ` Ian Campbell
  2014-01-27 17:51 ` Julien Grall
From: Ian Campbell @ 2014-01-27 17:40 UTC (permalink / raw)
  To: Oleksandr Tyshchenko; +Cc: xen-devel

On Mon, 2014-01-27 at 19:33 +0200, Oleksandr Tyshchenko wrote:
> But I am afraid this solution is not complete and maybe not enough for
> stable operation. I would appreciate it if you could explain how to
> solve the issue in the right way or give some advice.

I'll look into this properly tomorrow, but my initial reaction is that
this should be fixed by making those SGIs which are used as IPIs have
higher priority in the GIC than normal interrupts.
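
As a rough illustration (a hypothetical sketch, not actual Xen code:
GIC_PRI_IPI/GIC_PRI_IRQ are assumed constants, and GICD_IPRIORITYR
holds one priority byte per interrupt ID, with a lower value meaning a
higher priority):

#define GIC_PRI_IRQ  0xa0    /* assumed: normal interrupts           */
#define GIC_PRI_IPI  0x90    /* assumed: numerically lower == higher */

static void gic_raise_sgi_priority(void)
{
    int sgi;

    /* SGIs are interrupt IDs 0-15 and their priority registers are
     * banked per CPU, so this would run on each CPU at GIC setup time.
     * Priorities are packed four per 32-bit GICD_IPRIORITYR register. */
    for ( sgi = 0; sgi < 16; sgi += 4 )
        GICD[GICD_IPRIORITYR + sgi / 4] =
            GIC_PRI_IPI | (GIC_PRI_IPI << 8) |
            (GIC_PRI_IPI << 16) | (GIC_PRI_IPI << 24);
}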

Ian.


* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-27 17:33 [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix Oleksandr Tyshchenko
  2014-01-27 17:40 ` [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix Ian Campbell
@ 2014-01-27 17:51 ` Julien Grall
  2014-01-28 19:25   ` Oleksandr Tyshchenko
From: Julien Grall @ 2014-01-27 17:51 UTC (permalink / raw)
  To: Oleksandr Tyshchenko; +Cc: Stefano Stabellini, Ian Campbell, xen-devel

On 01/27/2014 05:33 PM, Oleksandr Tyshchenko wrote:
> Hi all,

Hello Oleksandr,

> We are trying to bring up Xen on the DRA7XX (OMAP5) platform.
> 
> We sometimes see hangs in the hypervisor, and these hangs are related to SMP.
> We found out that a deadlock takes place in the on_selected_cpus function
> when cross-interrupts occur simultaneously.
> 
> The issue:
> 
> 1. We receive IRQs on the first CPU (for example CPU0) and the second CPU (for example CPU1) in parallel.
> 2. In our case the maintenance_interrupt function for the maintenance IRQ from CPU0 is executed on CPU1, and
> maintenance_interrupt for the IRQ from CPU1 is executed on CPU0.
> 3. According to the existing logic, the gic_irq_eoi function has to run on the CPU on which the IRQ was scheduled.

gic_irq_eoi is only called for physical IRQs routed to the guest (e.g.
hard drive, network, ...). As far as I remember, these IRQs are only
routed to CPU0. Do you pass through PPIs to dom0?

-- 
Julien Grall


* Re: [PATCH v1 1/2] xen/arm: Add return value to smp_call_function_interrupt function
  2014-01-27 17:33 ` [PATCH v1 1/2] xen/arm: Add return value to smp_call_function_interrupt function Oleksandr Tyshchenko
@ 2014-01-27 18:28   ` Stefano Stabellini
From: Stefano Stabellini @ 2014-01-27 18:28 UTC (permalink / raw)
  To: Oleksandr Tyshchenko; +Cc: xen-devel

On Mon, 27 Jan 2014, Oleksandr Tyshchenko wrote:
> Let the function return an error if the action cannot be executed.

Unless you make the calling function (do_sgi) check for the return
value, this patch won't change anything.
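
For reference, the call site in question looks roughly like this
(paraphrased from xen/arch/arm/gic.c, not the exact source); as it
stands, the new return value would simply be dropped:

static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
{
    /* Lower the priority */
    GICC[GICC_EOIR] = sgi;

    switch ( sgi )
    {
    case GIC_SGI_CALL_FUNCTION:
        smp_call_function_interrupt();   /* return value ignored here */
        break;
    /* ... other SGI cases ... */
    }

    /* Deactivate */
    GICC[GICC_DIR] = sgi;
}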


> Change-Id: Iace691850024656239326bf0e3c87b57cb1b8ab3
> Signed-off-by: Oleksandr Tyshchenko <oleksandr.tyshchenko@globallogic.com>
> Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
> ---
>  xen/common/smp.c      |    7 +++++--
>  xen/include/xen/smp.h |    2 +-
>  2 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/common/smp.c b/xen/common/smp.c
> index 482a203..2700bd7 100644
> --- a/xen/common/smp.c
> +++ b/xen/common/smp.c
> @@ -20,6 +20,7 @@
>  #include <asm/processor.h>
>  #include <xen/spinlock.h>
>  #include <xen/smp.h>
> +#include <xen/errno.h>
>  
>  /*
>   * Structure and data for smp_call_function()/on_selected_cpus().
> @@ -75,14 +76,14 @@ out:
>      spin_unlock(&call_lock);
>  }
>  
> -void smp_call_function_interrupt(void)
> +int smp_call_function_interrupt(void)
>  {
>      void (*func)(void *info) = call_data.func;
>      void *info = call_data.info;
>      unsigned int cpu = smp_processor_id();
>  
>      if ( !cpumask_test_cpu(cpu, &call_data.selected) )
> -        return;
> +        return -EPERM;
>  
>      irq_enter();
>  
> @@ -100,6 +101,8 @@ void smp_call_function_interrupt(void)
>      }
>  
>      irq_exit();
> +
> +    return 0;
>  }
>  
>  /*
> diff --git a/xen/include/xen/smp.h b/xen/include/xen/smp.h
> index 6febb56..6d05910 100644
> --- a/xen/include/xen/smp.h
> +++ b/xen/include/xen/smp.h
> @@ -61,7 +61,7 @@ static inline void on_each_cpu(
>  /*
>   * Call a function on the current CPU
>   */
> -void smp_call_function_interrupt(void);
> +int smp_call_function_interrupt(void);
>  
>  void smp_send_call_function_mask(const cpumask_t *mask);
>  
> -- 
> 1.7.9.5
> 


* Re: [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function
  2014-01-27 17:33 ` [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function Oleksandr Tyshchenko
@ 2014-01-27 19:00   ` Stefano Stabellini
  2014-01-28 10:03     ` Ian Campbell
  2014-01-28 13:58   ` Stefano Stabellini
From: Stefano Stabellini @ 2014-01-27 19:00 UTC (permalink / raw)
  To: Oleksandr Tyshchenko; +Cc: xen-devel

On Mon, 27 Jan 2014, Oleksandr Tyshchenko wrote:
> This patch is needed to avoid possible deadlocks when cross-interrupts
> occur simultaneously.
> 
> Change-Id: I574b496442253a7b67a27e2edd793526c8131284
> Signed-off-by: Oleksandr Tyshchenko <oleksandr.tyshchenko@globallogic.com>
> Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
> ---
>  xen/common/smp.c |    6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/common/smp.c b/xen/common/smp.c
> index 2700bd7..46d2fc6 100644
> --- a/xen/common/smp.c
> +++ b/xen/common/smp.c
> @@ -55,7 +55,11 @@ void on_selected_cpus(
>  
>      ASSERT(local_irq_is_enabled());
>  
> -    spin_lock(&call_lock);
> +    if (!spin_trylock(&call_lock)) {
> +        if (smp_call_function_interrupt())
> +            return;
> +        spin_lock(&call_lock);
> +    }
>  
>      cpumask_copy(&call_data.selected, selected);

So this is where you check for the return value of
smp_call_function_interrupt.

I think it would be better to move the on_selected_cpus call out of
maintenance_interrupt, after the write to EOIR (caused by
desc->handler->end(desc) at the end of xen/arch/arm/irq.c:do_IRQ).
Maybe to a tasklet.

Alternatively, as Ian suggested, we could increase the priority of SGIs
but I am a bit wary of making that change at RC2.


* Re: [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function
  2014-01-27 19:00   ` Stefano Stabellini
@ 2014-01-28 10:03     ` Ian Campbell
  2014-01-28 14:00       ` Stefano Stabellini
From: Ian Campbell @ 2014-01-28 10:03 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Oleksandr Tyshchenko, xen-devel

On Mon, 2014-01-27 at 19:00 +0000, Stefano Stabellini wrote:
> Alternatively, as Ian suggested, we could increase the priority of SGIs
> but I am a bit wary of making that change at RC2.

I'm leaning the other way -- I'm wary of open coding magic locking
primitives to work around this issue on a case by case basis. It's just
too subtle IMHO.

The IPI and cross CPU calling primitives are basically predicated on
those IPIs interrupting normal interrupt handlers.

Ian.


* Re: [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function
  2014-01-27 17:33 ` [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function Oleksandr Tyshchenko
  2014-01-27 19:00   ` Stefano Stabellini
@ 2014-01-28 13:58   ` Stefano Stabellini
  2014-01-30 11:58     ` Oleksandr Tyshchenko
From: Stefano Stabellini @ 2014-01-28 13:58 UTC (permalink / raw)
  To: Oleksandr Tyshchenko; +Cc: xen-devel

On Mon, 27 Jan 2014, Oleksandr Tyshchenko wrote:
> This patch is needed to avoid possible deadlocks when cross-interrupts
> occur simultaneously.
> 
> Change-Id: I574b496442253a7b67a27e2edd793526c8131284
> Signed-off-by: Oleksandr Tyshchenko <oleksandr.tyshchenko@globallogic.com>
> Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
> ---
>  xen/common/smp.c |    6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/common/smp.c b/xen/common/smp.c
> index 2700bd7..46d2fc6 100644
> --- a/xen/common/smp.c
> +++ b/xen/common/smp.c
> @@ -55,7 +55,11 @@ void on_selected_cpus(
>  
>      ASSERT(local_irq_is_enabled());
>  
> -    spin_lock(&call_lock);
> +    if (!spin_trylock(&call_lock)) {
> +        if (smp_call_function_interrupt())
> +            return;

If smp_call_function_interrupt returns -EPERM, shouldn't we go back to
spinning on call_lock?
Also there is a race condition between spin_lock, cpumask_copy and
smp_call_function_interrupt: smp_call_function_interrupt could be called
on cpu1 after cpu0 acquired the lock, but before cpu0 set
call_data.selected.
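
Spelled out, one bad interleaving would be:

    CPU0                                    CPU1
    ----                                    ----
    spin_trylock(&call_lock)  /* succeeds */
                                            spin_trylock(&call_lock)  /* fails */
                                            smp_call_function_interrupt()
                                                /* tests the stale
                                                 * call_data.selected left over
                                                 * from the PREVIOUS call: it
                                                 * can spuriously return -EPERM,
                                                 * or run a stale func/info */
    cpumask_copy(&call_data.selected, ...)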

I think the correct implementation would be:


while ( unlikely(!spin_trylock(&call_lock)) )
    smp_call_function_interrupt();



> +        spin_lock(&call_lock);
> +    }
>  
>      cpumask_copy(&call_data.selected, selected);
>  
> -- 
> 1.7.9.5
> 


* Re: [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function
  2014-01-28 10:03     ` Ian Campbell
@ 2014-01-28 14:00       ` Stefano Stabellini
  2014-01-28 15:05         ` Ian Campbell
From: Stefano Stabellini @ 2014-01-28 14:00 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Oleksandr Tyshchenko, xen-devel, Stefano Stabellini

On Tue, 28 Jan 2014, Ian Campbell wrote:
> On Mon, 2014-01-27 at 19:00 +0000, Stefano Stabellini wrote:
> > Alternatively, as Ian suggested, we could increase the priority of SGIs
> > but I am a bit wary of making that change at RC2.
> 
> I'm leaning the other way -- I'm wary of open coding magic locking
> primitives to work around this issue on a case by case basis. It's just
> too subtle IMHO.
> 
> The IPI and cross CPU calling primitives are basically predicated on
> those IPIs interrupting normal interrupt handlers.

The problem is that we don't know if we can properly context switch
nested interrupts. Also I would need to think harder whether everything
would work correctly without hitches with multiple SGIs happening
simultaneously (with more than 2 cpus involved).

On the other hand we know that both Oleksandr's and my solution should
work OK with no surprises if implemented correctly.


* Re: [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function
  2014-01-28 14:00       ` Stefano Stabellini
@ 2014-01-28 15:05         ` Ian Campbell
  2014-01-28 16:02           ` Stefano Stabellini
From: Ian Campbell @ 2014-01-28 15:05 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Oleksandr Tyshchenko, xen-devel

On Tue, 2014-01-28 at 14:00 +0000, Stefano Stabellini wrote:
> On Tue, 28 Jan 2014, Ian Campbell wrote:
> > On Mon, 2014-01-27 at 19:00 +0000, Stefano Stabellini wrote:
> > > Alternatively, as Ian suggested, we could increase the priority of SGIs
> > > but I am a bit wary of making that change at RC2.
> > 
> > I'm leaning the other way -- I'm wary of open coding magic locking
> > primitives to work around this issue on a case by case basis. It's just
> > too subtle IMHO.
> > 
> > The IPI and cross CPU calling primitives are basically predicated on
> > those IPIs interrupting normal interrupt handlers.
> 
> The problem is that we don't know if we can properly context switch
> nested interrupts.

What do you mean? We don't have to context switch an IPI.

> Also I would need to think harder whether everything
> would work correctly without hitches with multiple SGIs happening
> simultaneously (with more than 2 cpus involved).

Since all IPIs would be at the same higher priority only one will be
active on each CPU at a time. If you are worried about multiple CPUs
then that is already an issue today, just at a lower priority.

I have hacked the IPI priority to be higher in the past and it worked
fine, I just never got round to cleaning it up for submission (I hadn't
thought of the locking thing and my use case was low priority).

The interrupt entry and exit paths were written with nested interrupt in
mind and they have to be so in order to handle interrupts which occur
from both guest and hypervisor context.

> On the other hand we know that both Oleksandr's and my solution should
> work OK with no surprises if implemented correctly.

That's a big if in my mind; any use of trylock is very subtle IMHO.

AIUI this issue only occurs with "proto device assignment" patches added
to 4.4, in which case I think the solution can wait until 4.5 and can be
done properly via the IPI priority fix.

Ian.


* Re: [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function
  2014-01-28 15:05         ` Ian Campbell
@ 2014-01-28 16:02           ` Stefano Stabellini
  2014-01-28 16:12             ` Ian Campbell
From: Stefano Stabellini @ 2014-01-28 16:02 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Oleksandr Tyshchenko, xen-devel, Stefano Stabellini

On Tue, 28 Jan 2014, Ian Campbell wrote:
> On Tue, 2014-01-28 at 14:00 +0000, Stefano Stabellini wrote:
> > On Tue, 28 Jan 2014, Ian Campbell wrote:
> > > On Mon, 2014-01-27 at 19:00 +0000, Stefano Stabellini wrote:
> > > > Alternatively, as Ian suggested, we could increase the priority of SGIs
> > > > but I am a bit wary of making that change at RC2.
> > > 
> > > I'm leaning the other way -- I'm wary of open coding magic locking
> > > primitives to work around this issue on a case by case basis. It's just
> > > too subtle IMHO.
> > > 
> > > The IPI and cross CPU calling primitives are basically predicated on
> > > those IPIs interrupting normal interrupt handlers.
> > 
> > The problem is that we don't know if we can properly context switch
> > nested interrupts.
> 
> What do you mean? We don't have to context switch an IPI.

Sorry, I meant save/restore registers, stack pointer, processor mode,
etc, for nested interrupts.


> > Also I would need to think harder whether everything
> > would work correctly without hitches with multiple SGIs happening
> > simultaneously (with more than 2 cpus involved).
> 
> Since all IPIs would be at the same higher priority only one will be
> active on each CPU at a time. If you are worried about multiple CPUs
> then that is already an issue today, just at a lower priority.

That is correct, but if we moved the on_selected_cpus call out of the
interrupt handler I don't think the problem could occur.


> I have hacked the IPI priority to be higher in the past and it worked
> fine, I just never got round to cleaning it up for submission (I hadn't
> thought of the locking thing and my use case was low priority).
> 
> The interrupt entry and exit paths were written with nested interrupt in
> mind and they have to be so in order to handle interrupts which occur
> from both guest and hypervisor context.
>
> > On the other hand we know that both Oleksandr's and my solution should
> > work OK with no surprises if implemented correctly.
> 
> That's a big if in my mind; any use of trylock is very subtle IMHO.

I wouldn't accept that patch for 4.4 either (or for 4.5 given that we
can certainly come up with something better by that time).


> AIUI this issue only occurs with "proto device assignment" patches added
> to 4.4, in which case I think the solution can wait until 4.5 and can be
> done properly via the IPI priority fix.
 
I think this is a pretty significant problem; even if we don't commit a
fix, we should post a proper patch that we deem acceptable and link it to
the wiki so that anybody that needs it can find it and be sure that it
works correctly.
In my opinion if we go to this length then we might as well commit it
(if it doesn't touch common code of course), but I'll leave the decision
up to you and George.

Given the constraints, the solution I would feel more comfortable with at
this time is something like the following patch (lightly tested):

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index e6257a7..b00ca73 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -58,6 +58,25 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
 
 static unsigned nr_lrs;
 
+static void gic_irq_eoi(void *info)
+{
+    int virq = (uintptr_t) info;
+    GICC[GICC_DIR] = virq;
+}
+
+static DEFINE_PER_CPU(struct pending_irq*, eoi_irq);
+static void eoi_action(unsigned long unused)
+{
+    struct pending_irq *p = this_cpu(eoi_irq);
+    ASSERT(p != NULL);
+
+    on_selected_cpus(cpumask_of(p->desc->arch.eoi_cpu),
+            gic_irq_eoi, (void*)(uintptr_t)p->desc->irq, 0);
+
+    this_cpu(eoi_irq) = NULL;
+}
+static DECLARE_TASKLET(eoi_tasklet, eoi_action, 0);
+
 /* The GIC mapping of CPU interfaces does not necessarily match the
  * logical CPU numbering. Let's use mapping as returned by the GIC
  * itself
@@ -897,12 +916,6 @@ int gicv_setup(struct domain *d)
 
 }
 
-static void gic_irq_eoi(void *info)
-{
-    int virq = (uintptr_t) info;
-    GICC[GICC_DIR] = virq;
-}
-
 static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
 {
     int i = 0, virq, pirq = -1;
@@ -962,8 +975,11 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
             if ( cpu == smp_processor_id() )
                 gic_irq_eoi((void*)(uintptr_t)pirq);
             else
-                on_selected_cpus(cpumask_of(cpu),
-                                 gic_irq_eoi, (void*)(uintptr_t)pirq, 0);
+            {
+                ASSERT(this_cpu(eoi_irq) == NULL);
+                this_cpu(eoi_irq) = p;
+                tasklet_schedule(&eoi_tasklet);
+            }
         }
 
         i++;


* Re: [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function
  2014-01-28 16:02           ` Stefano Stabellini
@ 2014-01-28 16:12             ` Ian Campbell
  2014-01-28 16:23               ` Stefano Stabellini
From: Ian Campbell @ 2014-01-28 16:12 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Oleksandr Tyshchenko, xen-devel

On Tue, 2014-01-28 at 16:02 +0000, Stefano Stabellini wrote:
> On Tue, 28 Jan 2014, Ian Campbell wrote:
> > On Tue, 2014-01-28 at 14:00 +0000, Stefano Stabellini wrote:
> > > On Tue, 28 Jan 2014, Ian Campbell wrote:
> > > > On Mon, 2014-01-27 at 19:00 +0000, Stefano Stabellini wrote:
> > > > > Alternatively, as Ian suggested, we could increase the priority of SGIs
> > > > > but I am a bit wary of making that change at RC2.
> > > > 
> > > > I'm leaning the other way -- I'm wary of open coding magic locking
> > > > primitives to work around this issue on a case by case basis. It's just
> > > > too subtle IMHO.
> > > > 
> > > > The IPI and cross CPU calling primitives are basically predicated on
> > > > those IPIs interrupting normal interrupt handlers.
> > > 
> > > The problem is that we don't know if we can properly context switch
> > > nested interrupts.
> > 
> > What do you mean? We don't have to context switch an IPI.
> 
> Sorry, I meant save/restore registers, stack pointer, processor mode,
> etc, for nested interrupts.

Right, we do handle that actually since it is the same code that handles
an IRQ during a hypercall (since a hypercall is just another type of
trap).

> > > Also I would need to think harder whether everything
> > > would work correctly without hitches with multiple SGIs happening
> > > simultaneously (with more than 2 cpus involved).
> > 
> > Since all IPIs would be at the same higher priority only one will be
> > active on each CPU at a time. If you are worried about multiple CPUs
> > then that is already an issue today, just at a lower priority.
> 
> That is correct, but if we moved the on_selected_cpus call out of the
> interrupt handler I don't think the problem could occur.

Sure, I'm not opposed to fixing this issue by making an architectural
improvement to the code which happens to not use on_selected_cpus.

> > AIUI this issue only occurs with "proto device assignment" patches added
> > to 4.4, in which case I think the solution can wait until 4.5 and can be
> > done properly via the IPI priority fix.
>  
> I think this is a pretty significant problem; even if we don't commit a
> fix, we should post a proper patch that we deem acceptable and link it to
> the wiki so that anybody that needs it can find it and be sure that it
> works correctly.

FWIW I have an almost-finished fix for the IPI priority thing which I intend
to land in 4.5 regardless of whether this issue needs it or not.

> In my opinion if we go to this length then we might as well commit it
> (if it doesn't touch common code of course), but I'll leave the decision
> up to you and George.
> 
> Given the constraints, the solution I would feel more comfortable with at
> this time is something like the following patch (lightly tested):

I don't think this buys you anything over just fixing on_selected_cpus
to work as it should, unless there is some reason why deferring this
work to a tasklet is the logically correct thing to do and/or a better
design?

(IOW if you are only doing this to avoid calling on_selected_cpus in
interrupt context then I think this is the wrong fix).

Ian.

> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index e6257a7..b00ca73 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -58,6 +58,25 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
>  
>  static unsigned nr_lrs;
>  
> +static void gic_irq_eoi(void *info)
> +{
> +    int virq = (uintptr_t) info;
> +    GICC[GICC_DIR] = virq;
> +}
> +
> +static DEFINE_PER_CPU(struct pending_irq*, eoi_irq);
> +static void eoi_action(unsigned long unused)
> +{
> +    struct pending_irq *p = this_cpu(eoi_irq);
> +    ASSERT(p != NULL);
> +
> +    on_selected_cpus(cpumask_of(p->desc->arch.eoi_cpu),
> +            gic_irq_eoi, (void*)(uintptr_t)p->desc->irq, 0);
> +
> +    this_cpu(eoi_irq) = NULL;
> +}
> +static DECLARE_TASKLET(eoi_tasklet, eoi_action, 0);
> +
>  /* The GIC mapping of CPU interfaces does not necessarily match the
>   * logical CPU numbering. Let's use mapping as returned by the GIC
>   * itself
> @@ -897,12 +916,6 @@ int gicv_setup(struct domain *d)
>  
>  }
>  
> -static void gic_irq_eoi(void *info)
> -{
> -    int virq = (uintptr_t) info;
> -    GICC[GICC_DIR] = virq;
> -}
> -
>  static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
>  {
>      int i = 0, virq, pirq = -1;
> @@ -962,8 +975,11 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
>              if ( cpu == smp_processor_id() )
>                  gic_irq_eoi((void*)(uintptr_t)pirq);
>              else
> -                on_selected_cpus(cpumask_of(cpu),
> -                                 gic_irq_eoi, (void*)(uintptr_t)pirq, 0);
> +            {
> +                ASSERT(this_cpu(eoi_irq) == NULL);
> +                this_cpu(eoi_irq) = p;
> +                tasklet_schedule(&eoi_tasklet);
> +            }
>          }
>  
>          i++;


* Re: [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function
  2014-01-28 16:12             ` Ian Campbell
@ 2014-01-28 16:23               ` Stefano Stabellini
From: Stefano Stabellini @ 2014-01-28 16:23 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Oleksandr Tyshchenko, xen-devel, Stefano Stabellini

On Tue, 28 Jan 2014, Ian Campbell wrote:
> On Tue, 2014-01-28 at 16:02 +0000, Stefano Stabellini wrote:
> > On Tue, 28 Jan 2014, Ian Campbell wrote:
> > > On Tue, 2014-01-28 at 14:00 +0000, Stefano Stabellini wrote:
> > > > On Tue, 28 Jan 2014, Ian Campbell wrote:
> > > > > On Mon, 2014-01-27 at 19:00 +0000, Stefano Stabellini wrote:
> > > > > > Alternatively, as Ian suggested, we could increase the priority of SGIs
> > > > > > but I am a bit wary of making that change at RC2.
> > > > > 
> > > > > I'm leaning the other way -- I'm wary of open coding magic locking
> > > > > primitives to work around this issue on a case by case basis. It's just
> > > > > too subtle IMHO.
> > > > > 
> > > > > The IPI and cross CPU calling primitives are basically predicated on
> > > > > those IPIs interrupting normal interrupt handlers.
> > > > 
> > > > The problem is that we don't know if we can properly context switch
> > > > nested interrupts.
> > > 
> > > What do you mean? We don't have to context switch an IPI.
> > 
> > Sorry, I meant save/restore registers, stack pointer, processor mode,
> > etc, for nested interrupts.
> 
> Right, we do handle that actually since it is the same code that handles
> an IRQ during a hypercall (since a hypercall is just another type of
> trap).

Ah right, good.


> > > > Also I would need to think harder whether everything
> > > > would work correctly without hitches with multiple SGIs happening
> > > > simultaneously (with more than 2 cpus involved).
> > > 
> > > Since all IPIs would be at the same higher priority only one will be
> > > active on each CPU at a time. If you are worried about multiple CPUs
> > > then that is already an issue today, just at a lower priority.
> > 
> > That is correct, but if we moved the on_selected_cpus call out of the
> > interrupt handler I don't think the problem could occur.
> 
> Sure, I'm not opposed to fixing this issue by making an architectural
> improvement to the code which happens to not use on_selected_cpus.
> 
> > > AIUI this issue only occurs with "proto device assignment" patches added
> > > to 4.4, in which case I think the solution can wait until 4.5 and can be
> > > done properly via the IPI priority fix.
> >  
> > I think this is a pretty significant problem; even if we don't commit a
> > fix, we should post a proper patch that we deem acceptable and link it to
> > the wiki so that anybody that needs it can find it and be sure that it
> > works correctly.
> 
> FWIW I have an almost-finished fix for the IPI priority thing which I intend
> to land in 4.5 regardless of whether this issue needs it or not.

Well, in that case if you can test the patch in the specific case where
an SGI interrupts a lower priority interrupt, it might be worth sending
out the patch now.


> > In my opinion if we go to this length then we might as well commit it
> > (if it doesn't touch common code of course), but I'll leave the decision
> > up to you and George.
> > 
> > Given the constraints, the solution I would feel more comfortable with at
> > this time is something like the following patch (lightly tested):
> 
> I don't think this buys you anything over just fixing on_selected_cpus
> to work as it should, unless there is some reason why deferring this
> work to a tasklet is the logically correct thing to do and/or a better
> design?
> 
> (IOW if you are only doing this to avoid calling on_selected_cpus in
> interrupt context then I think this is the wrong fix).

I feel that on_selected_cpus is the kind of work that belongs to the
"bottom half". I was never very happy to have the call where it is now
in the first place. This approach would have the benefit of allowing us
to receive regular interrupts while waiting for other cpus to handle the
SGI. Also I am sure that it would work even in cases where you have
more than one SGI targeting the same cpu simultaneously.


> Ian.
> 
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index e6257a7..b00ca73 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -58,6 +58,25 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
> >  
> >  static unsigned nr_lrs;
> >  
> > +static void gic_irq_eoi(void *info)
> > +{
> > +    int virq = (uintptr_t) info;
> > +    GICC[GICC_DIR] = virq;
> > +}
> > +
> > +static DEFINE_PER_CPU(struct pending_irq*, eoi_irq);
> > +static void eoi_action(unsigned long unused)
> > +{
> > +    struct pending_irq *p = this_cpu(eoi_irq);
> > +    ASSERT(p != NULL);
> > +
> > +    on_selected_cpus(cpumask_of(p->desc->arch.eoi_cpu),
> > +            gic_irq_eoi, (void*)(uintptr_t)p->desc->irq, 0);
> > +
> > +    this_cpu(eoi_irq) = NULL;
> > +}
> > +static DECLARE_TASKLET(eoi_tasklet, eoi_action, 0);
> > +
> >  /* The GIC mapping of CPU interfaces does not necessarily match the
> >   * logical CPU numbering. Let's use mapping as returned by the GIC
> >   * itself
> > @@ -897,12 +916,6 @@ int gicv_setup(struct domain *d)
> >  
> >  }
> >  
> > -static void gic_irq_eoi(void *info)
> > -{
> > -    int virq = (uintptr_t) info;
> > -    GICC[GICC_DIR] = virq;
> > -}
> > -
> >  static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
> >  {
> >      int i = 0, virq, pirq = -1;
> > @@ -962,8 +975,11 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
> >              if ( cpu == smp_processor_id() )
> >                  gic_irq_eoi((void*)(uintptr_t)pirq);
> >              else
> > -                on_selected_cpus(cpumask_of(cpu),
> > -                                 gic_irq_eoi, (void*)(uintptr_t)pirq, 0);
> > +            {
> > +                ASSERT(this_cpu(eoi_irq) == NULL);
> > +                this_cpu(eoi_irq) = p;
> > +                tasklet_schedule(&eoi_tasklet);
> > +            }
> >          }
> >  
> >          i++;
> 
> 


* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-27 17:51 ` Julien Grall
@ 2014-01-28 19:25   ` Oleksandr Tyshchenko
  2014-01-29 10:56     ` Oleksandr Tyshchenko
  2014-01-29 13:12     ` Julien Grall
  0 siblings, 2 replies; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2014-01-28 19:25 UTC (permalink / raw)
  To: Julien Grall; +Cc: Stefano Stabellini, Ian Campbell, xen-devel



Hello Julien,

Please see inline

> gic_irq_eoi is only called for physical IRQs routed to the guest (e.g.
> hard drive, network, ...). As far as I remember, these IRQs are only
> routed to CPU0.


I understand.

But I have created a debug patch to show the issue:

diff --git a/xen/common/smp.c b/xen/common/smp.c
index 46d2fc6..6123561 100644
--- a/xen/common/smp.c
+++ b/xen/common/smp.c
@@ -22,6 +22,8 @@
 #include <xen/smp.h>
 #include <xen/errno.h>

+int locked = 0;
+
 /*
  * Structure and data for smp_call_function()/on_selected_cpus().
  */
@@ -53,11 +55,19 @@ void on_selected_cpus(
 {
     unsigned int nr_cpus;

+    locked = 0;
+
     ASSERT(local_irq_is_enabled());

     if (!spin_trylock(&call_lock)) {
+
+    locked = 1;
+        printk("\n>>>>> %s: line: %d, cpu_mask_curr: %08lx, cpu_mask_sel: %08lx\n", __func__, __LINE__,
+                 cpumask_of(smp_processor_id())->bits[0], selected->bits[0]);
+
         if (smp_call_function_interrupt())
             return;
+
         spin_lock(&call_lock);
     }

@@ -78,6 +88,10 @@ void on_selected_cpus(

 out:
     spin_unlock(&call_lock);
+
+    if (locked)
+        printk("\n>>>>> %s: line: %d, cpu_mask_curr: %08lx, cpu_mask_sel: %08lx\n", __func__, __LINE__,
+            cpumask_of(smp_processor_id())->bits[0], selected->bits[0]);
 }

 int smp_call_function_interrupt(void)
@@ -86,6 +100,10 @@ int smp_call_function_interrupt(void)
     void *info = call_data.info;
     unsigned int cpu = smp_processor_id();

+     if (locked)
+        printk("\n>>>>> %s: line: %d, cpu_mask_curr: %08lx, cpu_mask_sel: %08lx\n", __func__, __LINE__,
+            cpumask_of(smp_processor_id())->bits[0], call_data.selected.bits[0]);
+
     if ( !cpumask_test_cpu(cpu, &call_data.selected) )
         return -EPERM;

Our issue (simultaneous cross-interrupts) occurred while booting domU:

[    7.507812] oom_adj 2 => oom_score_adj 117
[    7.507812] oom_adj 4 => oom_score_adj 235
[    7.507812] oom_adj 9 => oom_score_adj 529
[    7.507812] oom_adj 15 => oom_score_adj 1000
[    8.835937] PVR_K:(Error): PVRSRVOpenDCDeviceKM: no devnode matching index 0 [0, ]
(XEN)
(XEN) >>>>> on_selected_cpus: line: 65, cpu_mask_curr: 00000002, cpu_mask_sel: 00000001
(XEN)
(XEN) >>>>> smp_call_function_interrupt: line: 104, cpu_mask_curr: 00000002, cpu_mask_sel: 00000002
(XEN)
(XEN) >>>>> on_selected_cpus: line: 93, cpu_mask_curr: 00000001, cpu_mask_sel: 00000002
(XEN)
(XEN) >>>>> smp_call_function_interrupt: line: 104, cpu_mask_curr: 00000001, cpu_mask_sel: 00000001
(XEN)
(XEN) >>>>> on_selected_cpus: line: 93, cpu_mask_curr: 00000002, cpu_mask_sel: 00000001
(XEN)
(XEN) >>>>> smp_call_function_interrupt: line: 104, cpu_mask_curr: 00000002, cpu_mask_sel: 00000000
[   11.023437] usbcore: registered new interface driver usbfs
[   11.023437] usbcore: registered new interface driver hub
[   11.023437] usbcore: registered new device driver usb
[   11.039062] usbcore: registered new interface driver usbhid
[   11.039062] usbhid: USB HID core driver


> Do you pass through PPIs to dom0?
>

If I understand correctly, PPIs are IRQs 16 to 31.
So yes, I do. I see the timer IRQs and the maintenance IRQ, which are routed to both
CPUs.

And I have printed all IRQs that go through the gic_route_irq_to_guest and
gic_route_irq functions.
...
(XEN) GIC initialization:
(XEN)         gic_dist_addr=0000000048211000
(XEN)         gic_cpu_addr=0000000048212000
(XEN)         gic_hyp_addr=0000000048214000
(XEN)         gic_vcpu_addr=0000000048216000
(XEN)         gic_maintenance_irq=25
(XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b).
(XEN)
(XEN) >>>>> gic_route_irq: irq: 25, cpu_mask: 00000001
(XEN)
(XEN) >>>>> gic_route_irq: irq: 30, cpu_mask: 00000001
(XEN)
(XEN) >>>>> gic_route_irq: irq: 26, cpu_mask: 00000001
(XEN)
(XEN) >>>>> gic_route_irq: irq: 27, cpu_mask: 00000001
(XEN)
(XEN) >>>>> gic_route_irq: irq: 104, cpu_mask: 00000001
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Allocated console ring of 16 KiB.
(XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
(XEN) Bringing up CPU1
(XEN)
(XEN) >>>>> gic_route_irq: irq: 25, cpu_mask: 00000002
(XEN)
(XEN) >>>>> gic_route_irq: irq: 30, cpu_mask: 00000002
(XEN)
(XEN) >>>>> gic_route_irq: irq: 26, cpu_mask: 00000002
(XEN)
(XEN) >>>>> gic_route_irq: irq: 27, cpu_mask: 00000002
(XEN) CPU 1 booted.
(XEN) Brought up 2 CPUs
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0)
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 61, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 62, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 63, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 64, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 66, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 67, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 153, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 105, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 106, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 102, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 137, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 138, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 113, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 69, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 70, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 71, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 72, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 73, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 74, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 75, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 76, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 77, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 78, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 79, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 112, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 145, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 158, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 86, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 82, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 83, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 84, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 85, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 187, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 186, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 188, cpu: 0
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 189, cpu: 0
(XEN) Loading kernel from boot module 2
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 57, cpu: 0
(XEN) Loading zImage from 00000000c0000040 to 00000000c8008000-00000000c8304eb0
(XEN) Loading dom0 DTB to 0x00000000cfe00000-0x00000000cfe03978
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 252kB init memory.
[    0.000000] /cpus/cpu@0 missing clock-frequency property
[    0.000000] /cpus/cpu@1 missing clock-frequency property
[    0.093750] omap_l3_noc ocp.2: couldn't find resource 2
[    0.265625] ahci ahci.0.auto: can't get clock
[    0.867187] Freeing init memory: 224K
Parsing config from /xen/images/DomUAndroid.cfg
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 105, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 61, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 62, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 63, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 64, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 65, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 66, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 67, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 153, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 69, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 70, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 71, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 72, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 73, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 74, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 75, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 76, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 77, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 78, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 79, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 102, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 137, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 138, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 88, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 89, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 93, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 94, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 92, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 152, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 97, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 98, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 123, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 80, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 115, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 118, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 126, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 128, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 91, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 41, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 42, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 48, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 131, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 44, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 45, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 46, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 47, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 40, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 158, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 146, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 60, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 85, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 87, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 133, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 142, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 143, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 53, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 164, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 51, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 134, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 50, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 108, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 109, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 124, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 125, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 110, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 112, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 68, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 101, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 99, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 100, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 103, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 132, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 56, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 135, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 136, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 139, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 58, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 140, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 141, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 49, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 54, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 55, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 144, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 32, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 33, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 34, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 35, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 36, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 39, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 43, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 52, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 59, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 120, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 90, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 107, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 119, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 121, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 122, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 129, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 130, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 151, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 154, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 155, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 156, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 160, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 162, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 163, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 157, cpu: 1
(XEN)
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 173, cpu: 1
Daemon running with PID 569
...

>
> --
> Julien Grall
>






* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-28 19:25   ` Oleksandr Tyshchenko
@ 2014-01-29 10:56     ` Oleksandr Tyshchenko
  2014-01-29 11:42       ` Stefano Stabellini
  2014-01-29 13:07       ` [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix Julien Grall
  2014-01-29 13:12     ` Julien Grall
From: Oleksandr Tyshchenko @ 2014-01-29 10:56 UTC (permalink / raw)
  To: Julien Grall; +Cc: Stefano Stabellini, Ian Campbell, xen-devel

Hello all,

I just remembered one hack which we created
because we needed to route HW IRQs to domU.

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 9d793ba..d0227b9 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -989,8 +989,6 @@ static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *multidev,

         LOG(DEBUG, "dom%d irq %d", domid, irq);

-        ret = irq >= 0 ? xc_physdev_map_pirq(CTX->xch, domid, irq, &irq)
-                       : -EOVERFLOW;
         if (!ret)
             ret = xc_domain_irq_permission(CTX->xch, domid, irq, 1);
         if (ret < 0) {
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 2e4b11f..b54c08e 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -85,7 +85,7 @@ int domain_vgic_init(struct domain *d)
     if ( d->domain_id == 0 )
         d->arch.vgic.nr_lines = gic_number_lines() - 32;
     else
-        d->arch.vgic.nr_lines = 0; /* We don't need SPIs for the guest */
+        d->arch.vgic.nr_lines = gic_number_lines() - 32; /* We do need SPIs for the guest */

     d->arch.vgic.shared_irqs =
         xzalloc_array(struct vgic_irq_rank, DOMAIN_NR_RANKS(d));
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 75e2df3..ba88901 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -29,6 +29,7 @@
 #include <asm/page.h>
 #include <public/domctl.h>
 #include <xsm/xsm.h>
+#include <asm/gic.h>

 static DEFINE_SPINLOCK(domctl_lock);
 DEFINE_SPINLOCK(vcpu_alloc_lock);
@@ -782,8 +783,11 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
             ret = -EINVAL;
         else if ( xsm_irq_permission(XSM_HOOK, d, pirq, allow) )
             ret = -EPERM;
-        else if ( allow )
-            ret = pirq_permit_access(d, pirq);
+        else if ( allow ) {
+            struct dt_irq irq = {pirq + NR_LOCAL_IRQS,0};
+            ret = pirq_permit_access(d, irq.irq);
+            gic_route_irq_to_guest(d, &irq, "");
+        }
         else
             ret = pirq_deny_access(d, pirq);
     }

It seems the hack above can violate the logic of routing
physical IRQs only to CPU0.
In gic_route_irq_to_guest() we call gic_set_irq_properties(),
where one of the parameters is cpumask_of(smp_processor_id()).
But in this part of the code the function can be executed on CPU1, and as
a result the wrong value can end up in the target CPU mask.

Please confirm my assumption.
If I am right, we have to implement basic HW IRQ routing to DomU in the right way.
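
For reference, the path I am worried about effectively does the
following (paraphrased, not the exact source):

    /* In gic_route_irq_to_guest(): the target CPU mask is taken from
     * whichever physical CPU happens to execute this call.  For dom0
     * routing happens on the boot path, hence CPU0; but the DOMCTL hack
     * above runs on whatever CPU the toolstack hypercall lands on,
     * e.g. CPU1. */
    gic_set_irq_properties(irq->irq, level /* edge/level type */,
                           cpumask_of(smp_processor_id()), /* target! */
                           GIC_PRI_IRQ);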

On Tue, Jan 28, 2014 at 9:25 PM, Oleksandr Tyshchenko
<oleksandr.tyshchenko@globallogic.com> wrote:
> Hello Julien,
>
> Please see inline
>
>> gic_irq_eoi is only called for physical IRQs routed to the guest (e.g.
>> hard drive, network, ...). As far as I remember, these IRQs are only
>> routed to CPU0.
>
>
> I understand.
>
> But I have created a debug patch to show the issue:
>
> diff --git a/xen/common/smp.c b/xen/common/smp.c
> index 46d2fc6..6123561 100644
> --- a/xen/common/smp.c
> +++ b/xen/common/smp.c
> @@ -22,6 +22,8 @@
>  #include <xen/smp.h>
>  #include <xen/errno.h>
>
> +int locked = 0;
> +
>  /*
>   * Structure and data for smp_call_function()/on_selected_cpus().
>   */
> @@ -53,11 +55,19 @@ void on_selected_cpus(
>  {
>      unsigned int nr_cpus;
>
> +    locked = 0;
> +
>      ASSERT(local_irq_is_enabled());
>
>      if (!spin_trylock(&call_lock)) {
> +
> +    locked = 1;
> +        printk("\n>>>>> %s: line: %d, cpu_mask_curr: %08lx, cpu_mask_sel:
> %08lx\n", __func__, __LINE__,
> +                 cpumask_of(smp_processor_id())->bits[0],
> selected->bits[0]);
> +
>          if (smp_call_function_interrupt())
>              return;
> +
>          spin_lock(&call_lock);
>      }
>
> @@ -78,6 +88,10 @@ void on_selected_cpus(
>
>  out:
>      spin_unlock(&call_lock);
> +
> +    if (locked)
> +        printk("\n>>>>> %s: line: %d, cpu_mask_curr: %08lx, cpu_mask_sel:
> %08lx\n", __func__, __LINE__,
> +            cpumask_of(smp_processor_id())->bits[0], selected->bits[0]);
>  }
>
>  int smp_call_function_interrupt(void)
> @@ -86,6 +100,10 @@ int smp_call_function_interrupt(void)
>      void *info = call_data.info;
>      unsigned int cpu = smp_processor_id();
>
> +     if (locked)
> +        printk("\n>>>>> %s: line: %d, cpu_mask_curr: %08lx, cpu_mask_sel:
> %08lx\n", __func__, __LINE__,
> +            cpumask_of(smp_processor_id())->bits[0],
> call_data.selected.bits[0]);
> +
>      if ( !cpumask_test_cpu(cpu, &call_data.selected) )
>          return -EPERM;
>
> Our issue (simultaneous cross-interrupts) has occurred during boot domU:
>
> [    7.507812] oom_adj 2 => oom_score_adj 117
> [    7.507812] oom_adj 4 => oom_score_adj 235
> [    7.507812] oom_adj 9 => oom_score_adj 529
> [    7.507812] oom_adj 15 => oom_score_adj 1000
> [    8.835937] PVR_K:(Error): PVRSRVOpenDCDeviceKM: no devnode matching
> index 0 [0, ]
> (XEN)
> (XEN) >>>>> on_selected_cpus: line: 65, cpu_mask_curr: 00000002,
> cpu_mask_sel: 00000001
> (XEN)
> (XEN) >>>>> smp_call_function_interrupt: line: 104, cpu_mask_curr: 00000002,
> cpu_mask_sel: 00000002
> (XEN)
> (XEN) >>>>> on_selected_cpus: line: 93, cpu_mask_curr: 00000001,
> cpu_mask_sel: 00000002
> (XEN)
> (XEN) >>>>> smp_call_function_interrupt: line: 104, cpu_mask_curr: 00000001,
> cpu_mask_sel: 00000001
> (XEN)
> (XEN) >>>>> on_selected_cpus: line: 93, cpu_mask_curr: 00000002,
> cpu_mask_sel: 00000001
> (XEN)
> (XEN) >>>>> smp_call_function_interrupt: line: 104, cpu_mask_curr: 00000002,
> cpu_mask_sel: 00000000
> [   11.023437] usbcore: registered new interface driver usbfs
> [   11.023437] usbcore: registered new interface driver hub
> [   11.023437] usbcore: registered new device driver usb
> [   11.039062] usbcore: registered new interface driver usbhid
> [   11.039062] usbhid: USB HID core driver
>
>>
>> Do you pass-through PPIs to dom0?
>
>
> If I understand correctly that PPIs is irqs from 16 to 31.
> So yes, I do. I see timer's irqs and maintenance irq which routed to both
> CPUs.
>
> And I have printed all irqs which fall to gic_route_irq_to_guest and
> gic_route_irq functions.
> ...
> (XEN) GIC initialization:
> (XEN)         gic_dist_addr=0000000048211000
> (XEN)         gic_cpu_addr=0000000048212000
> (XEN)         gic_hyp_addr=0000000048214000
> (XEN)         gic_vcpu_addr=0000000048216000
> (XEN)         gic_maintenance_irq=25
> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b).
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 25, cpu_mask: 00000001
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 30, cpu_mask: 00000001
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 26, cpu_mask: 00000001
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 27, cpu_mask: 00000001
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 104, cpu_mask: 00000001
> (XEN) Using scheduler: SMP Credit Scheduler (credit)
> (XEN) Allocated console ring of 16 KiB.
> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
> (XEN) Bringing up CPU1
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 25, cpu_mask: 00000002
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 30, cpu_mask: 00000002
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 26, cpu_mask: 00000002
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 27, cpu_mask: 00000002
> (XEN) CPU 1 booted.
> (XEN) Brought up 2 CPUs
> (XEN) *** LOADING DOMAIN 0 ***
> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0)
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 61, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 62, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 63, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 64, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 66, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 67, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 153, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 105, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 106, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 102, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 137, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 138, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 113, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 69, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 70, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 71, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 72, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 73, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 74, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 75, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 76, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 77, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 78, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 79, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 112, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 145, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 158, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 86, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 82, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 83, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 84, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 85, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 187, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 186, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 188, cpu: 0
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 189, cpu: 0
> (XEN) Loading kernel from boot module 2
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 57, cpu: 0
> (XEN) Loading zImage from 00000000c0000040 to
> 00000000c8008000-00000000c8304eb0
> (XEN) Loading dom0 DTB to 0x00000000cfe00000-0x00000000cfe03978
> (XEN) Std. Loglevel: All
> (XEN) Guest Loglevel: All
> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to
> Xen)
> (XEN) Freed 252kB init memory.
> [    0.000000] /cpus/cpu@0 missing clock-frequency property
> [    0.000000] /cpus/cpu@1 missing clock-frequency property
> [    0.093750] omap_l3_noc ocp.2: couldn't find resource 2
> [    0.265625] ahci ahci.0.auto: can't get clock
> [    0.867187] Freeing init memory: 224K
> Parsing config from /xen/images/DomUAndroid.cfg
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 105, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 61, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 62, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 63, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 64, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 65, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 66, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 67, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 153, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 69, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 70, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 71, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 72, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 73, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 74, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 75, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 76, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 77, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 78, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 79, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 102, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 137, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 138, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 88, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 89, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 93, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 94, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 92, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 152, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 97, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 98, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 123, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 80, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 115, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 118, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 126, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 128, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 91, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 41, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 42, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 48, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 131, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 44, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 45, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 46, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 47, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 40, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 158, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 146, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 60, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 85, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 87, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 133, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 142, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 143, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 53, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 164, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 51, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 134, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 50, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 108, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 109, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 124, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 125, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 110, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 112, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 68, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 101, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 99, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 100, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 103, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 132, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 56, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 135, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 136, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 139, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 58, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 140, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 141, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 49, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 54, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 55, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 144, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 32, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 33, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 34, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 35, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 36, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 39, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 43, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 52, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 59, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 120, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 90, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 107, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 119, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 121, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 122, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 129, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 130, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 151, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 154, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 155, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 156, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 160, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 162, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 163, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 157, cpu: 1
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 173, cpu: 1
> Daemon running with PID 569
> ...
>>
>>
>> --
>> Julien Grall
>
>
>
>
> --
>
> Name | Title
> GlobalLogic
> P +x.xxx.xxx.xxxx  M +x.xxx.xxx.xxxx  S skype
> www.globallogic.com
>
> http://www.globallogic.com/email_disclaimer.txt



-- 

Name | Title
GlobalLogic
P +x.xxx.xxx.xxxx  M +x.xxx.xxx.xxxx  S skype
www.globallogic.com

http://www.globallogic.com/email_disclaimer.txt

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-29 10:56     ` Oleksandr Tyshchenko
@ 2014-01-29 11:42       ` Stefano Stabellini
  2014-01-29 11:46         ` Stefano Stabellini
  2014-02-04 16:20         ` [PATCH] xen/arm: route irqs to cpu0 Stefano Stabellini
  2014-01-29 13:07       ` [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix Julien Grall
  1 sibling, 2 replies; 48+ messages in thread
From: Stefano Stabellini @ 2014-01-29 11:42 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

On Wed, 29 Jan 2014, Oleksandr Tyshchenko wrote:
> Hello all,
> 
> I just recollected about one hack which we created
> as we needed to route HW IRQ in domU.
> 
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 9d793ba..d0227b9 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -989,8 +989,6 @@ static void domcreate_launch_dm(libxl__egc *egc,
> libxl__multidev *multidev,
> 
>          LOG(DEBUG, "dom%d irq %d", domid, irq);
> 
> -        ret = irq >= 0 ? xc_physdev_map_pirq(CTX->xch, domid, irq, &irq)
> -                       : -EOVERFLOW;
>          if (!ret)
>              ret = xc_domain_irq_permission(CTX->xch, domid, irq, 1);
>          if (ret < 0) {
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index 2e4b11f..b54c08e 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -85,7 +85,7 @@ int domain_vgic_init(struct domain *d)
>      if ( d->domain_id == 0 )
>          d->arch.vgic.nr_lines = gic_number_lines() - 32;
>      else
> -        d->arch.vgic.nr_lines = 0; /* We don't need SPIs for the guest */
> +        d->arch.vgic.nr_lines = gic_number_lines() - 32; /* We do
> need SPIs for the guest */
> 
>      d->arch.vgic.shared_irqs =
>          xzalloc_array(struct vgic_irq_rank, DOMAIN_NR_RANKS(d));
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index 75e2df3..ba88901 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -29,6 +29,7 @@
>  #include <asm/page.h>
>  #include <public/domctl.h>
>  #include <xsm/xsm.h>
> +#include <asm/gic.h>
> 
>  static DEFINE_SPINLOCK(domctl_lock);
>  DEFINE_SPINLOCK(vcpu_alloc_lock);
> @@ -782,8 +783,11 @@ long
> do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>              ret = -EINVAL;
>          else if ( xsm_irq_permission(XSM_HOOK, d, pirq, allow) )
>              ret = -EPERM;
> -        else if ( allow )
> -            ret = pirq_permit_access(d, pirq);
> +        else if ( allow ) {
> +            struct dt_irq irq = {pirq + NR_LOCAL_IRQS,0};
> +            ret = pirq_permit_access(d, irq.irq);
> +            gic_route_irq_to_guest(d, &irq, "");
> +        }
>          else
>              ret = pirq_deny_access(d, pirq);
>      }
> (END)
> 
> It seems, the following patch can violate the logic about routing
> physical IRQs only to CPU0.
> In gic_route_irq_to_guest() we need to call gic_set_irq_properties()
> where the one of the parameters is cpumask_of(smp_processor_id()).
> But in this part of code this function can be executed on CPU1. And as
> result this can cause to the fact that the wrong value would set to
> target CPU mask.
> 
> Please, confirm my assumption.

That is correct.


> If I am right we have to add a basic HW IRQ routing to DomU in a right way.

We could add a cpumask parameter to gic_route_irq_to_guest. Or maybe
for now we could just hardcode the cpumask of cpu0 in
gic_route_irq_to_guest.
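
Something along these lines, as a rough sketch (untested; the extra
parameter and the adjusted call are only to illustrate the idea):

    int gic_route_irq_to_guest(struct domain *d, const struct dt_irq *irq,
                               const char *devname,
                               const cpumask_t *cpu_mask)
    {
        ...
        level = dt_irq_is_level_triggered(irq);

        /* use the caller-supplied mask instead of the current pcpu */
        gic_set_irq_properties(irq->irq, level, cpu_mask, 0xa0);
        ...
    }

with the dom0 building code passing cpumask_of(0), so the current
behaviour is unchanged.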

However, keep in mind that if you plan on routing SPIs to guests other
than dom0, receiving all the interrupts on cpu0 might not be great for
performance.

It is impressive how small this patch is, if this is all that is
needed to get IRQ routing to guests working.



> On Tue, Jan 28, 2014 at 9:25 PM, Oleksandr Tyshchenko
> <oleksandr.tyshchenko@globallogic.com> wrote:
> > Hello Julien,
> >
> > Please see inline
> >
> >> gic_irq_eoi is only called for physical IRQ routed to the guest (eg:
> >> hard drive, network, ...). As far as I remember, these IRQs are only
> >> routed to CPU0.
> >
> >
> > I understand.
> >
> > But, I have created debug patch to show the issue:
> >
> > diff --git a/xen/common/smp.c b/xen/common/smp.c
> > index 46d2fc6..6123561 100644
> > --- a/xen/common/smp.c
> > +++ b/xen/common/smp.c
> > @@ -22,6 +22,8 @@
> >  #include <xen/smp.h>
> >  #include <xen/errno.h>
> >
> > +int locked = 0;
> > +
> >  /*
> >   * Structure and data for smp_call_function()/on_selected_cpus().
> >   */
> > @@ -53,11 +55,19 @@ void on_selected_cpus(
> >  {
> >      unsigned int nr_cpus;
> >
> > +    locked = 0;
> > +
> >      ASSERT(local_irq_is_enabled());
> >
> >      if (!spin_trylock(&call_lock)) {
> > +
> > +    locked = 1;
> > +        printk("\n>>>>> %s: line: %d, cpu_mask_curr: %08lx, cpu_mask_sel:
> > %08lx\n", __func__, __LINE__,
> > +                 cpumask_of(smp_processor_id())->bits[0],
> > selected->bits[0]);
> > +
> >          if (smp_call_function_interrupt())
> >              return;
> > +
> >          spin_lock(&call_lock);
> >      }
> >
> > @@ -78,6 +88,10 @@ void on_selected_cpus(
> >
> >  out:
> >      spin_unlock(&call_lock);
> > +
> > +    if (locked)
> > +        printk("\n>>>>> %s: line: %d, cpu_mask_curr: %08lx, cpu_mask_sel:
> > %08lx\n", __func__, __LINE__,
> > +            cpumask_of(smp_processor_id())->bits[0], selected->bits[0]);
> >  }
> >
> >  int smp_call_function_interrupt(void)
> > @@ -86,6 +100,10 @@ int smp_call_function_interrupt(void)
> >      void *info = call_data.info;
> >      unsigned int cpu = smp_processor_id();
> >
> > +     if (locked)
> > +        printk("\n>>>>> %s: line: %d, cpu_mask_curr: %08lx, cpu_mask_sel:
> > %08lx\n", __func__, __LINE__,
> > +            cpumask_of(smp_processor_id())->bits[0],
> > call_data.selected.bits[0]);
> > +
> >      if ( !cpumask_test_cpu(cpu, &call_data.selected) )
> >          return -EPERM;
> >
> > Our issue (simultaneous cross-interrupts) has occurred during boot domU:
> >
> > [    7.507812] oom_adj 2 => oom_score_adj 117
> > [    7.507812] oom_adj 4 => oom_score_adj 235
> > [    7.507812] oom_adj 9 => oom_score_adj 529
> > [    7.507812] oom_adj 15 => oom_score_adj 1000
> > [    8.835937] PVR_K:(Error): PVRSRVOpenDCDeviceKM: no devnode matching
> > index 0 [0, ]
> > (XEN)
> > (XEN) >>>>> on_selected_cpus: line: 65, cpu_mask_curr: 00000002,
> > cpu_mask_sel: 00000001
> > (XEN)
> > (XEN) >>>>> smp_call_function_interrupt: line: 104, cpu_mask_curr: 00000002,
> > cpu_mask_sel: 00000002
> > (XEN)
> > (XEN) >>>>> on_selected_cpus: line: 93, cpu_mask_curr: 00000001,
> > cpu_mask_sel: 00000002
> > (XEN)
> > (XEN) >>>>> smp_call_function_interrupt: line: 104, cpu_mask_curr: 00000001,
> > cpu_mask_sel: 00000001
> > (XEN)
> > (XEN) >>>>> on_selected_cpus: line: 93, cpu_mask_curr: 00000002,
> > cpu_mask_sel: 00000001
> > (XEN)
> > (XEN) >>>>> smp_call_function_interrupt: line: 104, cpu_mask_curr: 00000002,
> > cpu_mask_sel: 00000000
> > [   11.023437] usbcore: registered new interface driver usbfs
> > [   11.023437] usbcore: registered new interface driver hub
> > [   11.023437] usbcore: registered new device driver usb
> > [   11.039062] usbcore: registered new interface driver usbhid
> > [   11.039062] usbhid: USB HID core driver
> >
> >>
> >> Do you pass-through PPIs to dom0?
> >
> >
> > If I understand correctly that PPIs is irqs from 16 to 31.
> > So yes, I do. I see timer's irqs and maintenance irq which routed to both
> > CPUs.
> >
> > And I have printed all irqs which fall to gic_route_irq_to_guest and
> > gic_route_irq functions.
> > ...
> > (XEN) GIC initialization:
> > (XEN)         gic_dist_addr=0000000048211000
> > (XEN)         gic_cpu_addr=0000000048212000
> > (XEN)         gic_hyp_addr=0000000048214000
> > (XEN)         gic_vcpu_addr=0000000048216000
> > (XEN)         gic_maintenance_irq=25
> > (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b).
> > (XEN)
> > (XEN) >>>>> gic_route_irq: irq: 25, cpu_mask: 00000001
> > (XEN)
> > (XEN) >>>>> gic_route_irq: irq: 30, cpu_mask: 00000001
> > (XEN)
> > (XEN) >>>>> gic_route_irq: irq: 26, cpu_mask: 00000001
> > (XEN)
> > (XEN) >>>>> gic_route_irq: irq: 27, cpu_mask: 00000001
> > (XEN)
> > (XEN) >>>>> gic_route_irq: irq: 104, cpu_mask: 00000001
> > (XEN) Using scheduler: SMP Credit Scheduler (credit)
> > (XEN) Allocated console ring of 16 KiB.
> > (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
> > (XEN) Bringing up CPU1
> > (XEN)
> > (XEN) >>>>> gic_route_irq: irq: 25, cpu_mask: 00000002
> > (XEN)
> > (XEN) >>>>> gic_route_irq: irq: 30, cpu_mask: 00000002
> > (XEN)
> > (XEN) >>>>> gic_route_irq: irq: 26, cpu_mask: 00000002
> > (XEN)
> > (XEN) >>>>> gic_route_irq: irq: 27, cpu_mask: 00000002
> > (XEN) CPU 1 booted.
> > (XEN) Brought up 2 CPUs
> > (XEN) *** LOADING DOMAIN 0 ***
> > (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0)
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 61, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 62, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 63, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 64, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 66, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 67, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 153, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 105, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 106, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 102, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 137, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 138, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 113, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 69, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 70, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 71, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 72, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 73, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 74, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 75, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 76, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 77, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 78, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 79, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 112, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 145, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 158, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 86, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 82, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 83, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 84, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 85, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 187, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 186, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 188, cpu: 0
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 189, cpu: 0
> > (XEN) Loading kernel from boot module 2
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 57, cpu: 0
> > (XEN) Loading zImage from 00000000c0000040 to
> > 00000000c8008000-00000000c8304eb0
> > (XEN) Loading dom0 DTB to 0x00000000cfe00000-0x00000000cfe03978
> > (XEN) Std. Loglevel: All
> > (XEN) Guest Loglevel: All
> > (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to
> > Xen)
> > (XEN) Freed 252kB init memory.
> > [    0.000000] /cpus/cpu@0 missing clock-frequency property
> > [    0.000000] /cpus/cpu@1 missing clock-frequency property
> > [    0.093750] omap_l3_noc ocp.2: couldn't find resource 2
> > [    0.265625] ahci ahci.0.auto: can't get clock
> > [    0.867187] Freeing init memory: 224K
> > Parsing config from /xen/images/DomUAndroid.cfg
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 105, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 61, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 62, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 63, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 64, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 65, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 66, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 67, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 153, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 69, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 70, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 71, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 72, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 73, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 74, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 75, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 76, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 77, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 78, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 79, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 102, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 137, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 138, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 88, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 89, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 93, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 94, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 92, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 152, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 97, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 98, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 123, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 80, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 115, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 118, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 126, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 128, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 91, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 41, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 42, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 48, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 131, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 44, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 45, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 46, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 47, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 40, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 158, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 146, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 60, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 85, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 87, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 133, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 142, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 143, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 53, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 164, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 51, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 134, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 50, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 108, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 109, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 124, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 125, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 110, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 112, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 68, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 101, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 99, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 100, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 103, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 132, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 56, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 135, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 136, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 139, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 58, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 140, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 141, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 49, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 54, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 55, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 144, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 32, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 33, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 34, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 35, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 36, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 39, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 43, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 52, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 59, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 120, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 90, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 107, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 119, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 121, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 122, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 129, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 130, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 151, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 154, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 155, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 156, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 160, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 162, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 163, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 157, cpu: 1
> > (XEN)
> > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 173, cpu: 1
> > Daemon running with PID 569
> > ...
> >>
> >>
> >> --
> >> Julien Grall
> >
> >
> >
> >
> > --
> >
> > Name | Title
> > GlobalLogic
> > P +x.xxx.xxx.xxxx  M +x.xxx.xxx.xxxx  S skype
> > www.globallogic.com
> >
> > http://www.globallogic.com/email_disclaimer.txt
> 
> 
> 
> -- 
> 
> Name | Title
> GlobalLogic
> P +x.xxx.xxx.xxxx  M +x.xxx.xxx.xxxx  S skype
> www.globallogic.com
> 
> http://www.globallogic.com/email_disclaimer.txt
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-29 11:42       ` Stefano Stabellini
@ 2014-01-29 11:46         ` Stefano Stabellini
  2014-01-29 13:15           ` Julien Grall
  2014-02-04 15:32           ` Julien Grall
  2014-02-04 16:20         ` [PATCH] xen/arm: route irqs to cpu0 Stefano Stabellini
  1 sibling, 2 replies; 48+ messages in thread
From: Stefano Stabellini @ 2014-01-29 11:46 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Oleksandr Tyshchenko, Julien Grall, Ian Campbell, xen-devel

On Wed, 29 Jan 2014, Stefano Stabellini wrote:
> On Wed, 29 Jan 2014, Oleksandr Tyshchenko wrote:
> > Hello all,
> > 
> > I just recollected about one hack which we created
> > as we needed to route HW IRQ in domU.
> > 
> > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> > index 9d793ba..d0227b9 100644
> > --- a/tools/libxl/libxl_create.c
> > +++ b/tools/libxl/libxl_create.c
> > @@ -989,8 +989,6 @@ static void domcreate_launch_dm(libxl__egc *egc,
> > libxl__multidev *multidev,
> > 
> >          LOG(DEBUG, "dom%d irq %d", domid, irq);
> > 
> > -        ret = irq >= 0 ? xc_physdev_map_pirq(CTX->xch, domid, irq, &irq)
> > -                       : -EOVERFLOW;
> >          if (!ret)
> >              ret = xc_domain_irq_permission(CTX->xch, domid, irq, 1);
> >          if (ret < 0) {
> > diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> > index 2e4b11f..b54c08e 100644
> > --- a/xen/arch/arm/vgic.c
> > +++ b/xen/arch/arm/vgic.c
> > @@ -85,7 +85,7 @@ int domain_vgic_init(struct domain *d)
> >      if ( d->domain_id == 0 )
> >          d->arch.vgic.nr_lines = gic_number_lines() - 32;
> >      else
> > -        d->arch.vgic.nr_lines = 0; /* We don't need SPIs for the guest */
> > +        d->arch.vgic.nr_lines = gic_number_lines() - 32; /* We do
> > need SPIs for the guest */
> > 
> >      d->arch.vgic.shared_irqs =
> >          xzalloc_array(struct vgic_irq_rank, DOMAIN_NR_RANKS(d));
> > diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> > index 75e2df3..ba88901 100644
> > --- a/xen/common/domctl.c
> > +++ b/xen/common/domctl.c
> > @@ -29,6 +29,7 @@
> >  #include <asm/page.h>
> >  #include <public/domctl.h>
> >  #include <xsm/xsm.h>
> > +#include <asm/gic.h>
> > 
> >  static DEFINE_SPINLOCK(domctl_lock);
> >  DEFINE_SPINLOCK(vcpu_alloc_lock);
> > @@ -782,8 +783,11 @@ long
> > do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
> >              ret = -EINVAL;
> >          else if ( xsm_irq_permission(XSM_HOOK, d, pirq, allow) )
> >              ret = -EPERM;
> > -        else if ( allow )
> > -            ret = pirq_permit_access(d, pirq);
> > +        else if ( allow ) {
> > +            struct dt_irq irq = {pirq + NR_LOCAL_IRQS,0};
> > +            ret = pirq_permit_access(d, irq.irq);
> > +            gic_route_irq_to_guest(d, &irq, "");
> > +        }
> >          else
> >              ret = pirq_deny_access(d, pirq);
> >      }
> > (END)
> > 
> > It seems, the following patch can violate the logic about routing
> > physical IRQs only to CPU0.
> > In gic_route_irq_to_guest() we need to call gic_set_irq_properties()
> > where the one of the parameters is cpumask_of(smp_processor_id()).
> > But in this part of code this function can be executed on CPU1. And as
> > result this can cause to the fact that the wrong value would set to
> > target CPU mask.
> > 
> > Please, confirm my assumption.
> 
> That is correct.
> 
> 
> > If I am right we have to add a basic HW IRQ routing to DomU in a right way.
> 
> We could add the cpumask parameter to gic_route_irq_to_guest. Or maybe
> for now we could just hardcode the cpumask of cpu0
> gic_route_irq_to_guest.
> 
> However keep in mind that if you plan on routing SPIs to guests other
> than dom0, receiving all the interrupts on cpu0 might not be great for
> performances.

Thinking twice about it, it might be the only acceptable change for 4.4.


diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index e6257a7..af96a31 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -776,8 +795,7 @@ int gic_route_irq_to_guest(struct domain *d, const struct dt_irq *irq,
 
     level = dt_irq_is_level_triggered(irq);
 
-    gic_set_irq_properties(irq->irq, level, cpumask_of(smp_processor_id()),
-                           0xa0);
+    gic_set_irq_properties(irq->irq, level, cpumask_of(0), 0xa0);
 
     retval = __setup_irq(desc, irq->irq, action);
     if (retval) {

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-29 10:56     ` Oleksandr Tyshchenko
  2014-01-29 11:42       ` Stefano Stabellini
@ 2014-01-29 13:07       ` Julien Grall
  2014-01-29 13:22         ` Stefano Stabellini
  1 sibling, 1 reply; 48+ messages in thread
From: Julien Grall @ 2014-01-29 13:07 UTC (permalink / raw)
  To: Oleksandr Tyshchenko; +Cc: Stefano Stabellini, Ian Campbell, xen-devel



On 29/01/14 10:56, Oleksandr Tyshchenko wrote:
> Hello all,
>
> I just recollected about one hack which we created
> as we needed to route HW IRQ in domU.
>
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 9d793ba..d0227b9 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -989,8 +989,6 @@ static void domcreate_launch_dm(libxl__egc *egc,
> libxl__multidev *multidev,
>
>           LOG(DEBUG, "dom%d irq %d", domid, irq);
>
> -        ret = irq >= 0 ? xc_physdev_map_pirq(CTX->xch, domid, irq, &irq)
> -                       : -EOVERFLOW;
>           if (!ret)
>               ret = xc_domain_irq_permission(CTX->xch, domid, irq, 1);
>           if (ret < 0) {
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index 2e4b11f..b54c08e 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -85,7 +85,7 @@ int domain_vgic_init(struct domain *d)
>       if ( d->domain_id == 0 )
>           d->arch.vgic.nr_lines = gic_number_lines() - 32;
>       else
> -        d->arch.vgic.nr_lines = 0; /* We don't need SPIs for the guest */
> +        d->arch.vgic.nr_lines = gic_number_lines() - 32; /* We do
> need SPIs for the guest */
>
>       d->arch.vgic.shared_irqs =
>           xzalloc_array(struct vgic_irq_rank, DOMAIN_NR_RANKS(d));
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index 75e2df3..ba88901 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -29,6 +29,7 @@
>   #include <asm/page.h>
>   #include <public/domctl.h>
>   #include <xsm/xsm.h>
> +#include <asm/gic.h>
>
>   static DEFINE_SPINLOCK(domctl_lock);
>   DEFINE_SPINLOCK(vcpu_alloc_lock);
> @@ -782,8 +783,11 @@ long
> do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>               ret = -EINVAL;
>           else if ( xsm_irq_permission(XSM_HOOK, d, pirq, allow) )
>               ret = -EPERM;
> -        else if ( allow )
> -            ret = pirq_permit_access(d, pirq);
> +        else if ( allow ) {
> +            struct dt_irq irq = {pirq + NR_LOCAL_IRQS,0};
> +            ret = pirq_permit_access(d, irq.irq);
> +            gic_route_irq_to_guest(d, &irq, "");
> +        }
>           else
>               ret = pirq_deny_access(d, pirq);
>       }
> (END)
>
> It seems, the following patch can violate the logic about routing
> physical IRQs only to CPU0.

I forgot about the smp_processor_id() in gic_route_irq_to_guest(). As this
function is only called (in upstream) while dom0 is being built, only CPU0
is used.

> In gic_route_irq_to_guest() we need to call gic_set_irq_properties()
> where the one of the parameters is cpumask_of(smp_processor_id()).
> But in this part of code this function can be executed on CPU1. And as
> result this can cause to the fact that the wrong value would set to
> target CPU mask.
> Please, confirm my assumption.
> If I am right we have to add a basic HW IRQ routing to DomU in a right way.

With the current implementation, the IRQ will be routed to the CPU which
made the hypercall. This CPU could be different from the CPU where the
domU (assuming it has only 1 VCPU) is running.
I think for both dom0 and domU, in case the VCPU is pinned, we should
use the cpumask of that VCPU.
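
As a rough sketch of what I mean (untested; the helper is made up and
the cpu_affinity field name may differ in this tree):

    static const cpumask_t *route_target(struct domain *d)
    {
        struct vcpu *v = d->vcpu ? d->vcpu[0] : NULL;

        /* If the VCPU is pinned to exactly one pcpu, follow it... */
        if ( v && cpumask_weight(v->cpu_affinity) == 1 )
            return v->cpu_affinity;

        /* ... otherwise keep the current default, CPU0. */
        return cpumask_of(0);
    }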

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-28 19:25   ` Oleksandr Tyshchenko
  2014-01-29 10:56     ` Oleksandr Tyshchenko
@ 2014-01-29 13:12     ` Julien Grall
  2014-01-29 18:55       ` Oleksandr Tyshchenko
  1 sibling, 1 reply; 48+ messages in thread
From: Julien Grall @ 2014-01-29 13:12 UTC (permalink / raw)
  To: Oleksandr Tyshchenko; +Cc: Stefano Stabellini, Ian Campbell, xen-devel

Hello Oleksandr,

On 28/01/14 19:25, Oleksandr Tyshchenko wrote:

[..]

>
>     Do you pass-through PPIs to dom0?
>
> If I understand correctly that PPIs is irqs from 16 to 31.
> So yes, I do. I see timer's irqs and maintenance irq which routed to
> both CPUs.

These IRQs are used by Xen, therefore they are emulated for dom0 and
domU. Xen won't EOI these IRQs in maintenance_interrupt.

>
> And I have printed all irqs which fall to gic_route_irq_to_guest and
> gic_route_irq functions.
> ...
> (XEN) GIC initialization:
> (XEN)         gic_dist_addr=0000000048211000
> (XEN)         gic_cpu_addr=0000000048212000
> (XEN)         gic_hyp_addr=0000000048214000
> (XEN)         gic_vcpu_addr=0000000048216000
> (XEN)         gic_maintenance_irq=25
> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b).
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 25, cpu_mask: 00000001
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 30, cpu_mask: 00000001
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 26, cpu_mask: 00000001
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 27, cpu_mask: 00000001
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 104, cpu_mask: 00000001
> (XEN) Using scheduler: SMP Credit Scheduler (credit)
> (XEN) Allocated console ring of 16 KiB.
> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
> (XEN) Bringing up CPU1
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 25, cpu_mask: 00000002
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 30, cpu_mask: 00000002
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 26, cpu_mask: 00000002
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 27, cpu_mask: 00000002
> (XEN) CPU 1 booted.
> (XEN) Brought up 2 CPUs
> (XEN) *** LOADING DOMAIN 0 ***
> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0)
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 61, cpu: 0

[..]

> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 61, cpu: 1

Not related to this patch series, but is it normal that you pass
through the same interrupt to both dom0 and domU?

There are a few other cases like that.

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-29 11:46         ` Stefano Stabellini
@ 2014-01-29 13:15           ` Julien Grall
  2014-02-04 15:32           ` Julien Grall
  1 sibling, 0 replies; 48+ messages in thread
From: Julien Grall @ 2014-01-29 13:15 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Oleksandr Tyshchenko, Ian Campbell, xen-devel



On 29/01/14 11:46, Stefano Stabellini wrote:
> On Wed, 29 Jan 2014, Stefano Stabellini wrote:
>> On Wed, 29 Jan 2014, Oleksandr Tyshchenko wrote:
>>> Hello all,
>>>
>>> I just recollected about one hack which we created
>>> as we needed to route HW IRQ in domU.
>>>
>>> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
>>> index 9d793ba..d0227b9 100644
>>> --- a/tools/libxl/libxl_create.c
>>> +++ b/tools/libxl/libxl_create.c
>>> @@ -989,8 +989,6 @@ static void domcreate_launch_dm(libxl__egc *egc,
>>> libxl__multidev *multidev,
>>>
>>>           LOG(DEBUG, "dom%d irq %d", domid, irq);
>>>
>>> -        ret = irq >= 0 ? xc_physdev_map_pirq(CTX->xch, domid, irq, &irq)
>>> -                       : -EOVERFLOW;
>>>           if (!ret)
>>>               ret = xc_domain_irq_permission(CTX->xch, domid, irq, 1);
>>>           if (ret < 0) {
>>> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
>>> index 2e4b11f..b54c08e 100644
>>> --- a/xen/arch/arm/vgic.c
>>> +++ b/xen/arch/arm/vgic.c
>>> @@ -85,7 +85,7 @@ int domain_vgic_init(struct domain *d)
>>>       if ( d->domain_id == 0 )
>>>           d->arch.vgic.nr_lines = gic_number_lines() - 32;
>>>       else
>>> -        d->arch.vgic.nr_lines = 0; /* We don't need SPIs for the guest */
>>> +        d->arch.vgic.nr_lines = gic_number_lines() - 32; /* We do
>>> need SPIs for the guest */
>>>
>>>       d->arch.vgic.shared_irqs =
>>>           xzalloc_array(struct vgic_irq_rank, DOMAIN_NR_RANKS(d));
>>> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
>>> index 75e2df3..ba88901 100644
>>> --- a/xen/common/domctl.c
>>> +++ b/xen/common/domctl.c
>>> @@ -29,6 +29,7 @@
>>>   #include <asm/page.h>
>>>   #include <public/domctl.h>
>>>   #include <xsm/xsm.h>
>>> +#include <asm/gic.h>
>>>
>>>   static DEFINE_SPINLOCK(domctl_lock);
>>>   DEFINE_SPINLOCK(vcpu_alloc_lock);
>>> @@ -782,8 +783,11 @@ long
>>> do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>>>               ret = -EINVAL;
>>>           else if ( xsm_irq_permission(XSM_HOOK, d, pirq, allow) )
>>>               ret = -EPERM;
>>> -        else if ( allow )
>>> -            ret = pirq_permit_access(d, pirq);
>>> +        else if ( allow ) {
>>> +            struct dt_irq irq = {pirq + NR_LOCAL_IRQS,0};
>>> +            ret = pirq_permit_access(d, irq.irq);
>>> +            gic_route_irq_to_guest(d, &irq, "");
>>> +        }
>>>           else
>>>               ret = pirq_deny_access(d, pirq);
>>>       }
>>> (END)
>>>
>>> It seems, the following patch can violate the logic about routing
>>> physical IRQs only to CPU0.
>>> In gic_route_irq_to_guest() we need to call gic_set_irq_properties()
>>> where the one of the parameters is cpumask_of(smp_processor_id()).
>>> But in this part of code this function can be executed on CPU1. And as
>>> result this can cause to the fact that the wrong value would set to
>>> target CPU mask.
>>>
>>> Please, confirm my assumption.
>>
>> That is correct.
>>
>>
>>> If I am right we have to add a basic HW IRQ routing to DomU in a right way.
>>
>> We could add the cpumask parameter to gic_route_irq_to_guest. Or maybe
>> for now we could just hardcode the cpumask of cpu0
>> gic_route_irq_to_guest.
>>
>> However keep in mind that if you plan on routing SPIs to guests other
>> than dom0, receiving all the interrupts on cpu0 might not be great for
>> performances.
>
> Thinking twice about it, it might be the only acceptable change for 4.4.

In Xen upstream, gic_route_irq_to_guest is only called while dom0 is
being built (so on CPU0). I don't think we need this patch for Xen 4.4.

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-29 13:07       ` [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix Julien Grall
@ 2014-01-29 13:22         ` Stefano Stabellini
  2014-01-29 18:40           ` Oleksandr Tyshchenko
  0 siblings, 1 reply; 48+ messages in thread
From: Stefano Stabellini @ 2014-01-29 13:22 UTC (permalink / raw)
  To: Julien Grall
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Ian Campbell, xen-devel

On Wed, 29 Jan 2014, Julien Grall wrote:
> On 29/01/14 10:56, Oleksandr Tyshchenko wrote:
> > Hello all,
> > 
> > I just recollected about one hack which we created
> > as we needed to route HW IRQ in domU.
> > 
> > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> > index 9d793ba..d0227b9 100644
> > --- a/tools/libxl/libxl_create.c
> > +++ b/tools/libxl/libxl_create.c
> > @@ -989,8 +989,6 @@ static void domcreate_launch_dm(libxl__egc *egc,
> > libxl__multidev *multidev,
> > 
> >           LOG(DEBUG, "dom%d irq %d", domid, irq);
> > 
> > -        ret = irq >= 0 ? xc_physdev_map_pirq(CTX->xch, domid, irq, &irq)
> > -                       : -EOVERFLOW;
> >           if (!ret)
> >               ret = xc_domain_irq_permission(CTX->xch, domid, irq, 1);
> >           if (ret < 0) {
> > diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> > index 2e4b11f..b54c08e 100644
> > --- a/xen/arch/arm/vgic.c
> > +++ b/xen/arch/arm/vgic.c
> > @@ -85,7 +85,7 @@ int domain_vgic_init(struct domain *d)
> >       if ( d->domain_id == 0 )
> >           d->arch.vgic.nr_lines = gic_number_lines() - 32;
> >       else
> > -        d->arch.vgic.nr_lines = 0; /* We don't need SPIs for the guest */
> > +        d->arch.vgic.nr_lines = gic_number_lines() - 32; /* We do
> > need SPIs for the guest */
> > 
> >       d->arch.vgic.shared_irqs =
> >           xzalloc_array(struct vgic_irq_rank, DOMAIN_NR_RANKS(d));
> > diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> > index 75e2df3..ba88901 100644
> > --- a/xen/common/domctl.c
> > +++ b/xen/common/domctl.c
> > @@ -29,6 +29,7 @@
> >   #include <asm/page.h>
> >   #include <public/domctl.h>
> >   #include <xsm/xsm.h>
> > +#include <asm/gic.h>
> > 
> >   static DEFINE_SPINLOCK(domctl_lock);
> >   DEFINE_SPINLOCK(vcpu_alloc_lock);
> > @@ -782,8 +783,11 @@ long
> > do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
> >               ret = -EINVAL;
> >           else if ( xsm_irq_permission(XSM_HOOK, d, pirq, allow) )
> >               ret = -EPERM;
> > -        else if ( allow )
> > -            ret = pirq_permit_access(d, pirq);
> > +        else if ( allow ) {
> > +            struct dt_irq irq = {pirq + NR_LOCAL_IRQS,0};
> > +            ret = pirq_permit_access(d, irq.irq);
> > +            gic_route_irq_to_guest(d, &irq, "");
> > +        }
> >           else
> >               ret = pirq_deny_access(d, pirq);
> >       }
> > (END)
> > 
> > It seems, the following patch can violate the logic about routing
> > physical IRQs only to CPU0.
> 
> I forgot about the smp_processor_id() in gic_route_irq_to_guest(). As this
> function is only called (in upstream) while dom0 is being built, only CPU0
> is used.

Right, that's why changing it to cpumask_of(0) shouldn't make any
difference for xen-unstable (it should make things clearer, if nothing
else) but it should fix things for Oleksandr.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-29 13:22         ` Stefano Stabellini
@ 2014-01-29 18:40           ` Oleksandr Tyshchenko
  2014-01-29 18:43             ` Oleksandr Tyshchenko
  2014-01-29 18:49             ` Julien Grall
  0 siblings, 2 replies; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2014-01-29 18:40 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

> Right, that's why changing it to cpumask_of(0) shouldn't make any
> difference for xen-unstable (it should make things clearer, if nothing
> else) but it should fix things for Oleksandr.

Unfortunately, it is not enough for stable operation.

I tried to use cpumask_of(0) instead of cpumask_of(smp_processor_id()) in
gic_route_irq_to_guest(). As a result, I no longer see the situation
which caused the deadlock in on_selected_cpus() (as expected).
But the hypervisor sometimes hangs somewhere else (I have not yet
identified where this happens), or I sometimes see traps like the one
below (the "WARN_ON(p->desc != NULL)" in maintenance_interrupt() leads
to them):

(XEN) CPU1: Unexpected Trap: Undefined Instruction
(XEN) ----[ Xen-4.4-unstable  arm32  debug=y  Not tainted ]----
(XEN) CPU:    1
(XEN) PC:     00242c1c __warn+0x20/0x28
(XEN) CPSR:   200001da MODE:Hypervisor
(XEN)      R0: 0026770c R1: 00000001 R2: 3fd2fd00 R3: 00000fff
(XEN)      R4: 00406100 R5: 40020ee0 R6: 00000000 R7: 4bfdf000
(XEN)      R8: 00000001 R9: 4bfd7ed0 R10:00000001 R11:4bfd7ebc R12:00000002
(XEN) HYP: SP: 4bfd7eb4 LR: 00242c1c
(XEN)
(XEN)   VTCR_EL2: 80002558
(XEN)  VTTBR_EL2: 00020000dec6a000
(XEN)
(XEN)  SCTLR_EL2: 30cd187f
(XEN)    HCR_EL2: 00000000000028b5
(XEN)  TTBR0_EL2: 00000000d2014000
(XEN)
(XEN)    ESR_EL2: 00000000
(XEN)  HPFAR_EL2: 0000000000482110
(XEN)      HDFAR: fa211190
(XEN)      HIFAR: 00000000
(XEN)
(XEN) Xen stack trace from sp=4bfd7eb4:
(XEN)    0026431c 4bfd7efc 00247a54 00000024 002e6608 002e6608 00000097 00000001
(XEN)    00000000 4bfd7f54 40017000 40005f60 40017014 4bfd7f58 00000019 00000000
(XEN)    40005f60 4bfd7f24 00248e60 00000009 00000019 00404000 4bfd7f58 00000000
(XEN)    00405000 000045f0 002e7694 4bfd7f4c 00248978 c0079a90 00000097 00000097
(XEN)    00000000 fa212000 ea80c900 00000001 c05b8a60 4bfd7f54 0024f4b8 4bfd7f58
(XEN)    00251830 ea80c950 00000000 00000001 c0079a90 00000097 00000097 00000000
(XEN)    fa212000 ea80c900 00000001 c05b8a60 00000000 e9879e3c ffffffff b6efbca3
(XEN)    c03b29fc 60000193 9fffffe7 b6c0bbf0 c0607500 c03b3140 e9879eb8 c007680c
(XEN)    c060750c c03b32c0 c0607518 c03b3360 00000000 00000000 00000000 00000000
(XEN)    00000000 00000000 3ff6bebf a0000113 800b0193 800b0093 40000193 00000000
(XEN)    ffeffbfe fedeefff fffd5ffe
(XEN) Xen call trace:
(XEN)    [<00242c1c>] __warn+0x20/0x28 (PC)
(XEN)    [<00242c1c>] __warn+0x20/0x28 (LR)
(XEN)    [<00247a54>] maintenance_interrupt+0xfc/0x2f4
(XEN)    [<00248e60>] do_IRQ+0x138/0x198
(XEN)    [<00248978>] gic_interrupt+0x58/0xc0
(XEN)    [<0024f4b8>] do_trap_irq+0x10/0x14
(XEN)    [<00251830>] return_from_trap+0/0x4
(XEN)

I am also posting the maintenance_interrupt() from my tree:

static void maintenance_interrupt(int irq, void *dev_id, struct
cpu_user_regs *regs)
{
    int i = 0, virq, pirq;
    uint32_t lr;
    struct vcpu *v = current;
    uint64_t eisr = GICH[GICH_EISR0] | (((uint64_t) GICH[GICH_EISR1]) << 32);

    while ((i = find_next_bit((const long unsigned int *) &eisr,
                              64, i)) < 64) {
        struct pending_irq *p, *n;
        int cpu, eoi;

        cpu = -1;
        eoi = 0;

        spin_lock_irq(&gic.lock);
        lr = GICH[GICH_LR + i];
        virq = lr & GICH_LR_VIRTUAL_MASK;

        p = irq_to_pending(v, virq);
        if ( p->desc != NULL ) {
            p->desc->status &= ~IRQ_INPROGRESS;
            /* Assume only one pcpu needs to EOI the irq */
            cpu = p->desc->arch.eoi_cpu;
            eoi = 1;
            pirq = p->desc->irq;
        }
        if ( !atomic_dec_and_test(&p->inflight_cnt) )
        {
            /* Physical IRQ can't be reinject */
            WARN_ON(p->desc != NULL);
            gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority);
            spin_unlock_irq(&gic.lock);
            i++;
            continue;
        }

        GICH[GICH_LR + i] = 0;
        clear_bit(i, &this_cpu(lr_mask));

        if ( !list_empty(&v->arch.vgic.lr_pending) ) {
            n = list_entry(v->arch.vgic.lr_pending.next, typeof(*n), lr_queue);
            gic_set_lr(i, n->irq, GICH_LR_PENDING, n->priority);
            list_del_init(&n->lr_queue);
            set_bit(i, &this_cpu(lr_mask));
        } else {
            gic_inject_irq_stop();
        }
        spin_unlock_irq(&gic.lock);

        spin_lock_irq(&v->arch.vgic.lock);
        list_del_init(&p->inflight);
        spin_unlock_irq(&v->arch.vgic.lock);

        if ( eoi ) {
            /* this is not racy because we can't receive another irq of the
             * same type until we EOI it.  */
            if ( cpu == smp_processor_id() )
                gic_irq_eoi((void*)(uintptr_t)pirq);
            else
                on_selected_cpus(cpumask_of(cpu),
                                 gic_irq_eoi, (void*)(uintptr_t)pirq, 0);
        }

        i++;
    }
}
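
For completeness, gic_irq_eoi() on our side just deactivates the physical
interrupt, roughly:

    static void gic_irq_eoi(void *info)
    {
        int pirq = (uintptr_t) info;

        /* Deactivate the physical IRQ; the priority drop (GICC_EOIR)
         * was already done when Xen took the interrupt. */
        GICC[GICC_DIR] = pirq;
    }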


Oleksandr Tyshchenko | Embedded Developer
GlobalLogic

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-29 18:40           ` Oleksandr Tyshchenko
@ 2014-01-29 18:43             ` Oleksandr Tyshchenko
  2014-01-29 18:49             ` Julien Grall
  1 sibling, 0 replies; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2014-01-29 18:43 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

Sorry, just a small correction:

> I tried to use cpumask_of(smp_processor_id()) instead of cpumask_of(0) in

cpumask_of(0) instead of cpumask_of(smp_processor_id())

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-29 18:40           ` Oleksandr Tyshchenko
  2014-01-29 18:43             ` Oleksandr Tyshchenko
@ 2014-01-29 18:49             ` Julien Grall
  2014-01-29 19:54               ` Oleksandr Tyshchenko
  2014-01-30 13:24               ` Stefano Stabellini
  1 sibling, 2 replies; 48+ messages in thread
From: Julien Grall @ 2014-01-29 18:49 UTC (permalink / raw)
  To: Oleksandr Tyshchenko; +Cc: xen-devel, Ian Campbell, Stefano Stabellini


Hi,

It's weird, a physical IRQ should not be injected twice ...
Were you able to print the IRQ number?

In any case, you are using an old version of the interrupt patch series.
Your new error may come from a race condition in this code.

Can you try a newer version?
 On 29 Jan 2014 18:40, "Oleksandr Tyshchenko" <
oleksandr.tyshchenko@globallogic.com> wrote:

> > Right, that's why changing it to cpumask_of(0) shouldn't make any
> > difference for xen-unstable (it should make things clearer, if nothing
> > else) but it should fix things for Oleksandr.
>
> Unfortunately, it is not enough for stable work.
>
> I was tried to use cpumask_of(smp_processor_id()) instead of cpumask_of(0)
> in
> gic_route_irq_to_guest(). And as result, I don't see our situation
> which cause to deadlock in on_selected_cpus function (expected).
> But, hypervisor sometimes hangs somewhere else (I have not identified
> yet where this is happening) or I sometimes see traps, like that:
> ("WARN_ON(p->desc != NULL)" in maintenance_interrupt() leads to them)
>
> (XEN) CPU1: Unexpected Trap: Undefined Instruction
> (XEN) ----[ Xen-4.4-unstable  arm32  debug=y  Not tainted ]----
> (XEN) CPU:    1
> (XEN) PC:     00242c1c __warn+0x20/0x28
> (XEN) CPSR:   200001da MODE:Hypervisor
> (XEN)      R0: 0026770c R1: 00000001 R2: 3fd2fd00 R3: 00000fff
> (XEN)      R4: 00406100 R5: 40020ee0 R6: 00000000 R7: 4bfdf000
> (XEN)      R8: 00000001 R9: 4bfd7ed0 R10:00000001 R11:4bfd7ebc R12:00000002
> (XEN) HYP: SP: 4bfd7eb4 LR: 00242c1c
> (XEN)
> (XEN)   VTCR_EL2: 80002558
> (XEN)  VTTBR_EL2: 00020000dec6a000
> (XEN)
> (XEN)  SCTLR_EL2: 30cd187f
> (XEN)    HCR_EL2: 00000000000028b5
> (XEN)  TTBR0_EL2: 00000000d2014000
> (XEN)
> (XEN)    ESR_EL2: 00000000
> (XEN)  HPFAR_EL2: 0000000000482110
> (XEN)      HDFAR: fa211190
> (XEN)      HIFAR: 00000000
> (XEN)
> (XEN) Xen stack trace from sp=4bfd7eb4:
> (XEN)    0026431c 4bfd7efc 00247a54 00000024 002e6608 002e6608 00000097
> 00000001
> (XEN)    00000000 4bfd7f54 40017000 40005f60 40017014 4bfd7f58 00000019
> 00000000
> (XEN)    40005f60 4bfd7f24 00248e60 00000009 00000019 00404000 4bfd7f58
> 00000000
> (XEN)    00405000 000045f0 002e7694 4bfd7f4c 00248978 c0079a90 00000097
> 00000097
> (XEN)    00000000 fa212000 ea80c900 00000001 c05b8a60 4bfd7f54 0024f4b8
> 4bfd7f58
> (XEN)    00251830 ea80c950 00000000 00000001 c0079a90 00000097 00000097
> 00000000
> (XEN)    fa212000 ea80c900 00000001 c05b8a60 00000000 e9879e3c ffffffff
> b6efbca3
> (XEN)    c03b29fc 60000193 9fffffe7 b6c0bbf0 c0607500 c03b3140 e9879eb8
> c007680c
> (XEN)    c060750c c03b32c0 c0607518 c03b3360 00000000 00000000 00000000
> 00000000
> (XEN)    00000000 00000000 3ff6bebf a0000113 800b0193 800b0093 40000193
> 00000000
> (XEN)    ffeffbfe fedeefff fffd5ffe
> (XEN) Xen call trace:
> (XEN)    [<00242c1c>] __warn+0x20/0x28 (PC)
> (XEN)    [<00242c1c>] __warn+0x20/0x28 (LR)
> (XEN)    [<00247a54>] maintenance_interrupt+0xfc/0x2f4
> (XEN)    [<00248e60>] do_IRQ+0x138/0x198
> (XEN)    [<00248978>] gic_interrupt+0x58/0xc0
> (XEN)    [<0024f4b8>] do_trap_irq+0x10/0x14
> (XEN)    [<00251830>] return_from_trap+0/0x4
> (XEN)
>
> Also I am posting maintenance_interrupt() from my tree:
>
> static void maintenance_interrupt(int irq, void *dev_id, struct
> cpu_user_regs *regs)
> {
>     int i = 0, virq, pirq;
>     uint32_t lr;
>     struct vcpu *v = current;
>     uint64_t eisr = GICH[GICH_EISR0] | (((uint64_t) GICH[GICH_EISR1]) <<
> 32);
>
>     while ((i = find_next_bit((const long unsigned int *) &eisr,
>                               64, i)) < 64) {
>         struct pending_irq *p, *n;
>         int cpu, eoi;
>
>         cpu = -1;
>         eoi = 0;
>
>         spin_lock_irq(&gic.lock);
>         lr = GICH[GICH_LR + i];
>         virq = lr & GICH_LR_VIRTUAL_MASK;
>
>         p = irq_to_pending(v, virq);
>         if ( p->desc != NULL ) {
>             p->desc->status &= ~IRQ_INPROGRESS;
>             /* Assume only one pcpu needs to EOI the irq */
>             cpu = p->desc->arch.eoi_cpu;
>             eoi = 1;
>             pirq = p->desc->irq;
>         }
>         if ( !atomic_dec_and_test(&p->inflight_cnt) )
>         {
>             /* Physical IRQ can't be reinject */
>             WARN_ON(p->desc != NULL);
>             gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority);
>             spin_unlock_irq(&gic.lock);
>             i++;
>             continue;
>         }
>
>         GICH[GICH_LR + i] = 0;
>         clear_bit(i, &this_cpu(lr_mask));
>
>         if ( !list_empty(&v->arch.vgic.lr_pending) ) {
>             n = list_entry(v->arch.vgic.lr_pending.next, typeof(*n),
> lr_queue);
>             gic_set_lr(i, n->irq, GICH_LR_PENDING, n->priority);
>             list_del_init(&n->lr_queue);
>             set_bit(i, &this_cpu(lr_mask));
>         } else {
>             gic_inject_irq_stop();
>         }
>         spin_unlock_irq(&gic.lock);
>
>         spin_lock_irq(&v->arch.vgic.lock);
>         list_del_init(&p->inflight);
>         spin_unlock_irq(&v->arch.vgic.lock);
>
>         if ( eoi ) {
>             /* this is not racy because we can't receive another irq of the
>              * same type until we EOI it.  */
>             if ( cpu == smp_processor_id() )
>                 gic_irq_eoi((void*)(uintptr_t)pirq);
>             else
>                 on_selected_cpus(cpumask_of(cpu),
>                                  gic_irq_eoi, (void*)(uintptr_t)pirq, 0);
>         }
>
>         i++;
>     }
> }
>
>
> Oleksandr Tyshchenko | Embedded Developer
> GlobalLogic
>


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-29 13:12     ` Julien Grall
@ 2014-01-29 18:55       ` Oleksandr Tyshchenko
  0 siblings, 0 replies; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2014-01-29 18:55 UTC (permalink / raw)
  To: Julien Grall; +Cc: Stefano Stabellini, Ian Campbell, xen-devel

On Wed, Jan 29, 2014 at 3:12 PM, Julien Grall <julien.grall@linaro.org> wrote:
> Hello Oleksandr,
>
> On 28/01/14 19:25, Oleksandr Tyshchenko wrote:
>
> [..]
>
>
>>
>>     Do you pass-through PPIs to dom0?
>>
>> If I understand correctly, PPIs are IRQs 16 to 31.
>> So yes, I do. I see the timer IRQs and the maintenance IRQ, which are
>> routed to both CPUs.
>
>
> These IRQs are used by Xen, therefore they are emulated for dom0 and domU.
> Xen won't EOI these IRQs in maintenance_interrupt.
>
>
>>
>> And I have printed all irqs which fall to gic_route_irq_to_guest and
>> gic_route_irq functions.
>> ...
>> (XEN) GIC initialization:
>> (XEN)         gic_dist_addr=0000000048211000
>> (XEN)         gic_cpu_addr=0000000048212000
>> (XEN)         gic_hyp_addr=0000000048214000
>> (XEN)         gic_vcpu_addr=0000000048216000
>> (XEN)         gic_maintenance_irq=25
>> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b).
>> (XEN)
>> (XEN) >>>>> gic_route_irq: irq: 25, cpu_mask: 00000001
>> (XEN)
>> (XEN) >>>>> gic_route_irq: irq: 30, cpu_mask: 00000001
>> (XEN)
>> (XEN) >>>>> gic_route_irq: irq: 26, cpu_mask: 00000001
>> (XEN)
>> (XEN) >>>>> gic_route_irq: irq: 27, cpu_mask: 00000001
>> (XEN)
>> (XEN) >>>>> gic_route_irq: irq: 104, cpu_mask: 00000001
>> (XEN) Using scheduler: SMP Credit Scheduler (credit)
>> (XEN) Allocated console ring of 16 KiB.
>> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
>> (XEN) Bringing up CPU1
>> (XEN)
>> (XEN) >>>>> gic_route_irq: irq: 25, cpu_mask: 00000002
>> (XEN)
>> (XEN) >>>>> gic_route_irq: irq: 30, cpu_mask: 00000002
>> (XEN)
>> (XEN) >>>>> gic_route_irq: irq: 26, cpu_mask: 00000002
>> (XEN)
>> (XEN) >>>>> gic_route_irq: irq: 27, cpu_mask: 00000002
>> (XEN) CPU 1 booted.
>> (XEN) Brought up 2 CPUs
>> (XEN) *** LOADING DOMAIN 0 ***
>> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0)
>> (XEN)
>> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 61, cpu: 0
>
>
> [..]
>
>
>> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 61, cpu: 1
>
>
> Not related to this patch series, but is it normal that you pass through
> the same interrupt to both dom0 and domU?
>
No, it isn't. These interrupts are not actually used in either domain,
so we need to clean this up. Thank you for your attention.
> There are a few other cases like that.
>
> --
> Julien Grall

-- 
Oleksandr Tyshchenko | Embedded Developer
GlobalLogic

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-29 18:49             ` Julien Grall
@ 2014-01-29 19:54               ` Oleksandr Tyshchenko
  2014-01-30  0:42                 ` Julien Grall
  2014-01-30 13:24               ` Stefano Stabellini
  1 sibling, 1 reply; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2014-01-29 19:54 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, Ian Campbell, Stefano Stabellini

On Wed, Jan 29, 2014 at 8:49 PM, Julien Grall <julien.grall@linaro.org> wrote:
> Hi,
>
> It's weird, physical IRQ should not be injected twice ...
> Were you able to print the IRQ number?

p->irq: 151, p->desc->irq: 151 (it is touchscreen irq)

>
> In any case, you are using the old version of the interrupt patch series.
> Your new error may come of race condition in this code.
>
> Can you try to use a newest version?

Yes, I can. But not today or tomorrow: I need time to apply, on top of the
newest version, the local changes our system needs in order to work.
Also, do you mean using the newest version of Xen as a whole, or only the
parts of the code related to interrupt handling?

>
> On 29 Jan 2014 18:40, "Oleksandr Tyshchenko"
> <oleksandr.tyshchenko@globallogic.com> wrote:
>>
>> > Right, that's why changing it to cpumask_of(0) shouldn't make any
>> > difference for xen-unstable (it should make things clearer, if nothing
>> > else) but it should fix things for Oleksandr.
>>
>> Unfortunately, it is not enough for stable work.
>>
>> I was tried to use cpumask_of(smp_processor_id()) instead of cpumask_of(0)
>> in
>> gic_route_irq_to_guest(). And as result, I don't see our situation
>> which cause to deadlock in on_selected_cpus function (expected).
>> But, hypervisor sometimes hangs somewhere else (I have not identified
>> yet where this is happening) or I sometimes see traps, like that:
>> ("WARN_ON(p->desc != NULL)" in maintenance_interrupt() leads to them)
>>
>> (XEN) CPU1: Unexpected Trap: Undefined Instruction
>> (XEN) ----[ Xen-4.4-unstable  arm32  debug=y  Not tainted ]----
>> (XEN) CPU:    1
>> (XEN) PC:     00242c1c __warn+0x20/0x28
>> (XEN) CPSR:   200001da MODE:Hypervisor
>> (XEN)      R0: 0026770c R1: 00000001 R2: 3fd2fd00 R3: 00000fff
>> (XEN)      R4: 00406100 R5: 40020ee0 R6: 00000000 R7: 4bfdf000
>> (XEN)      R8: 00000001 R9: 4bfd7ed0 R10:00000001 R11:4bfd7ebc
>> R12:00000002
>> (XEN) HYP: SP: 4bfd7eb4 LR: 00242c1c
>> (XEN)
>> (XEN)   VTCR_EL2: 80002558
>> (XEN)  VTTBR_EL2: 00020000dec6a000
>> (XEN)
>> (XEN)  SCTLR_EL2: 30cd187f
>> (XEN)    HCR_EL2: 00000000000028b5
>> (XEN)  TTBR0_EL2: 00000000d2014000
>> (XEN)
>> (XEN)    ESR_EL2: 00000000
>> (XEN)  HPFAR_EL2: 0000000000482110
>> (XEN)      HDFAR: fa211190
>> (XEN)      HIFAR: 00000000
>> (XEN)
>> (XEN) Xen stack trace from sp=4bfd7eb4:
>> (XEN)    0026431c 4bfd7efc 00247a54 00000024 002e6608 002e6608 00000097
>> 00000001
>> (XEN)    00000000 4bfd7f54 40017000 40005f60 40017014 4bfd7f58 00000019
>> 00000000
>> (XEN)    40005f60 4bfd7f24 00248e60 00000009 00000019 00404000 4bfd7f58
>> 00000000
>> (XEN)    00405000 000045f0 002e7694 4bfd7f4c 00248978 c0079a90 00000097
>> 00000097
>> (XEN)    00000000 fa212000 ea80c900 00000001 c05b8a60 4bfd7f54 0024f4b8
>> 4bfd7f58
>> (XEN)    00251830 ea80c950 00000000 00000001 c0079a90 00000097 00000097
>> 00000000
>> (XEN)    fa212000 ea80c900 00000001 c05b8a60 00000000 e9879e3c ffffffff
>> b6efbca3
>> (XEN)    c03b29fc 60000193 9fffffe7 b6c0bbf0 c0607500 c03b3140 e9879eb8
>> c007680c
>> (XEN)    c060750c c03b32c0 c0607518 c03b3360 00000000 00000000 00000000
>> 00000000
>> (XEN)    00000000 00000000 3ff6bebf a0000113 800b0193 800b0093 40000193
>> 00000000
>> (XEN)    ffeffbfe fedeefff fffd5ffe
>> (XEN) Xen call trace:
>> (XEN)    [<00242c1c>] __warn+0x20/0x28 (PC)
>> (XEN)    [<00242c1c>] __warn+0x20/0x28 (LR)
>> (XEN)    [<00247a54>] maintenance_interrupt+0xfc/0x2f4
>> (XEN)    [<00248e60>] do_IRQ+0x138/0x198
>> (XEN)    [<00248978>] gic_interrupt+0x58/0xc0
>> (XEN)    [<0024f4b8>] do_trap_irq+0x10/0x14
>> (XEN)    [<00251830>] return_from_trap+0/0x4
>> (XEN)
>>
>> Also I am posting maintenance_interrupt() from my tree:
>>
>> static void maintenance_interrupt(int irq, void *dev_id, struct
>> cpu_user_regs *regs)
>> {
>>     int i = 0, virq, pirq;
>>     uint32_t lr;
>>     struct vcpu *v = current;
>>     uint64_t eisr = GICH[GICH_EISR0] | (((uint64_t) GICH[GICH_EISR1]) <<
>> 32);
>>
>>     while ((i = find_next_bit((const long unsigned int *) &eisr,
>>                               64, i)) < 64) {
>>         struct pending_irq *p, *n;
>>         int cpu, eoi;
>>
>>         cpu = -1;
>>         eoi = 0;
>>
>>         spin_lock_irq(&gic.lock);
>>         lr = GICH[GICH_LR + i];
>>         virq = lr & GICH_LR_VIRTUAL_MASK;
>>
>>         p = irq_to_pending(v, virq);
>>         if ( p->desc != NULL ) {
>>             p->desc->status &= ~IRQ_INPROGRESS;
>>             /* Assume only one pcpu needs to EOI the irq */
>>             cpu = p->desc->arch.eoi_cpu;
>>             eoi = 1;
>>             pirq = p->desc->irq;
>>         }
>>         if ( !atomic_dec_and_test(&p->inflight_cnt) )
>>         {
>>             /* Physical IRQ can't be reinject */
>>             WARN_ON(p->desc != NULL);
>>             gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority);
>>             spin_unlock_irq(&gic.lock);
>>             i++;
>>             continue;
>>         }
>>
>>         GICH[GICH_LR + i] = 0;
>>         clear_bit(i, &this_cpu(lr_mask));
>>
>>         if ( !list_empty(&v->arch.vgic.lr_pending) ) {
>>             n = list_entry(v->arch.vgic.lr_pending.next, typeof(*n),
>> lr_queue);
>>             gic_set_lr(i, n->irq, GICH_LR_PENDING, n->priority);
>>             list_del_init(&n->lr_queue);
>>             set_bit(i, &this_cpu(lr_mask));
>>         } else {
>>             gic_inject_irq_stop();
>>         }
>>         spin_unlock_irq(&gic.lock);
>>
>>         spin_lock_irq(&v->arch.vgic.lock);
>>         list_del_init(&p->inflight);
>>         spin_unlock_irq(&v->arch.vgic.lock);
>>
>>         if ( eoi ) {
>>             /* this is not racy because we can't receive another irq of
>> the
>>              * same type until we EOI it.  */
>>             if ( cpu == smp_processor_id() )
>>                 gic_irq_eoi((void*)(uintptr_t)pirq);
>>             else
>>                 on_selected_cpus(cpumask_of(cpu),
>>                                  gic_irq_eoi, (void*)(uintptr_t)pirq, 0);
>>         }
>>
>>         i++;
>>     }
>> }
>>
>>
>> Oleksandr Tyshchenko | Embedded Developer
>> GlobalLogic

-- 

Oleksandr Tyshchenko | Embedded Developer
GlobalLogic

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-29 19:54               ` Oleksandr Tyshchenko
@ 2014-01-30  0:42                 ` Julien Grall
  0 siblings, 0 replies; 48+ messages in thread
From: Julien Grall @ 2014-01-30  0:42 UTC (permalink / raw)
  To: Oleksandr Tyshchenko; +Cc: xen-devel, Ian Campbell, Stefano Stabellini



On 29/01/14 19:54, Oleksandr Tyshchenko wrote:
> On Wed, Jan 29, 2014 at 8:49 PM, Julien Grall <julien.grall@linaro.org> wrote:
>> Hi,
>>
>> It's weird, physical IRQ should not be injected twice ...
>> Were you able to print the IRQ number?
>
> p->irq: 151, p->desc->irq: 151 (it is touchscreen irq)
>
>>
>> In any case, you are using the old version of the interrupt patch series.
>> Your new error may come of race condition in this code.
>>
>> Can you try to use a newest version?
>
> Yes, I can. But not today or tomorrow, I need time to apply our local
> changes to newest version needed our system to work.
> And Do you mean try to use newest version of XEN at whole or only
> parts of code related to interrupts handling?
>

I don't think you need the newest version. Which Xen commit are you using?

This range of commits should be enough: 82064f0 - 88eb95e.
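(e.g. git cherry-pick 82064f0^..88eb95e should pick up the whole range.)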

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function
  2014-01-28 13:58   ` Stefano Stabellini
@ 2014-01-30 11:58     ` Oleksandr Tyshchenko
  0 siblings, 0 replies; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2014-01-30 11:58 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel

On Tue, Jan 28, 2014 at 3:58 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Mon, 27 Jan 2014, Oleksandr Tyshchenko wrote:
>> This patch is needed to avoid possible deadlocks in case of simultaneous
>> occurrence of cross-interrupts.
>>
>> Change-Id: I574b496442253a7b67a27e2edd793526c8131284
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr.tyshchenko@globallogic.com>
>> Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
>> ---
>>  xen/common/smp.c |    6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/common/smp.c b/xen/common/smp.c
>> index 2700bd7..46d2fc6 100644
>> --- a/xen/common/smp.c
>> +++ b/xen/common/smp.c
>> @@ -55,7 +55,11 @@ void on_selected_cpus(
>>
>>      ASSERT(local_irq_is_enabled());
>>
>> -    spin_lock(&call_lock);
>> +    if (!spin_trylock(&call_lock)) {
>> +        if (smp_call_function_interrupt())
>> +            return;
>
> If smp_call_function_interrupt returns -EPERM, shouldn't we go back to
> spinning on call_lock?
> Also there is a race condition between spin_lock, cpumask_copy and
> smp_call_function_interrupt: smp_call_function_interrupt could be called
> on cpu1 after cpu0 acquired the lock, but before cpu0 set
> call_data.selected.
>
> I think the correct implementation would be:
>
>
> while ( unlikely(!spin_trylock(&call_lock)) )
>     smp_call_function_interrupt();
I completely agree.
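
So the entry of on_selected_cpus() would become something like this
(untested sketch; the rest of the function stays unchanged):

    void on_selected_cpus(
        const cpumask_t *selected, void (*func) (void *info), void *info,
        int wait)
    {
        ASSERT(local_irq_is_enabled());

        /* If another CPU holds call_lock, it may be waiting for us to run
         * its queued function; service pending calls while spinning so
         * that two CPUs cannot block each other forever. */
        while ( unlikely(!spin_trylock(&call_lock)) )
            smp_call_function_interrupt();

        cpumask_copy(&call_data.selected, selected);
        ...
    }
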
>
>
>
>> +        spin_lock(&call_lock);
>> +    }
>>
>>      cpumask_copy(&call_data.selected, selected);
>>
>> --
>> 1.7.9.5
>>
>>
>>

-- 

Oleksandr Tyshchenko | Embedded Developer
GlobalLogic

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-29 18:49             ` Julien Grall
  2014-01-29 19:54               ` Oleksandr Tyshchenko
@ 2014-01-30 13:24               ` Stefano Stabellini
  2014-01-30 15:06                 ` Oleksandr Tyshchenko
  1 sibling, 1 reply; 48+ messages in thread
From: Stefano Stabellini @ 2014-01-30 13:24 UTC (permalink / raw)
  To: Julien Grall
  Cc: Oleksandr Tyshchenko, xen-devel, Ian Campbell, Stefano Stabellini

Is it a level or an edge irq?

On Wed, 29 Jan 2014, Julien Grall wrote:
> Hi,
> 
> It's weird, physical IRQ should not be injected twice ...
> Were you able to print the IRQ number?
> 
> In any case, you are using the old version of the interrupt patch series.
> Your new error may come of race condition in this code.
> 
> Can you try to use a newest version?
> 
> On 29 Jan 2014 18:40, "Oleksandr Tyshchenko" <oleksandr.tyshchenko@globallogic.com> wrote:
>       > Right, that's why changing it to cpumask_of(0) shouldn't make any
>       > difference for xen-unstable (it should make things clearer, if nothing
>       > else) but it should fix things for Oleksandr.
> 
>       Unfortunately, it is not enough for stable work.
> 
>       I was tried to use cpumask_of(smp_processor_id()) instead of cpumask_of(0) in
>       gic_route_irq_to_guest(). And as result, I don't see our situation
>       which cause to deadlock in on_selected_cpus function (expected).
>       But, hypervisor sometimes hangs somewhere else (I have not identified
>       yet where this is happening) or I sometimes see traps, like that:
>       ("WARN_ON(p->desc != NULL)" in maintenance_interrupt() leads to them)
> 
>       (XEN) CPU1: Unexpected Trap: Undefined Instruction
>       (XEN) ----[ Xen-4.4-unstable  arm32  debug=y  Not tainted ]----
>       (XEN) CPU:    1
>       (XEN) PC:     00242c1c __warn+0x20/0x28
>       (XEN) CPSR:   200001da MODE:Hypervisor
>       (XEN)      R0: 0026770c R1: 00000001 R2: 3fd2fd00 R3: 00000fff
>       (XEN)      R4: 00406100 R5: 40020ee0 R6: 00000000 R7: 4bfdf000
>       (XEN)      R8: 00000001 R9: 4bfd7ed0 R10:00000001 R11:4bfd7ebc R12:00000002
>       (XEN) HYP: SP: 4bfd7eb4 LR: 00242c1c
>       (XEN)
>       (XEN)   VTCR_EL2: 80002558
>       (XEN)  VTTBR_EL2: 00020000dec6a000
>       (XEN)
>       (XEN)  SCTLR_EL2: 30cd187f
>       (XEN)    HCR_EL2: 00000000000028b5
>       (XEN)  TTBR0_EL2: 00000000d2014000
>       (XEN)
>       (XEN)    ESR_EL2: 00000000
>       (XEN)  HPFAR_EL2: 0000000000482110
>       (XEN)      HDFAR: fa211190
>       (XEN)      HIFAR: 00000000
>       (XEN)
>       (XEN) Xen stack trace from sp=4bfd7eb4:
>       (XEN)    0026431c 4bfd7efc 00247a54 00000024 002e6608 002e6608 00000097 00000001
>       (XEN)    00000000 4bfd7f54 40017000 40005f60 40017014 4bfd7f58 00000019 00000000
>       (XEN)    40005f60 4bfd7f24 00248e60 00000009 00000019 00404000 4bfd7f58 00000000
>       (XEN)    00405000 000045f0 002e7694 4bfd7f4c 00248978 c0079a90 00000097 00000097
>       (XEN)    00000000 fa212000 ea80c900 00000001 c05b8a60 4bfd7f54 0024f4b8 4bfd7f58
>       (XEN)    00251830 ea80c950 00000000 00000001 c0079a90 00000097 00000097 00000000
>       (XEN)    fa212000 ea80c900 00000001 c05b8a60 00000000 e9879e3c ffffffff b6efbca3
>       (XEN)    c03b29fc 60000193 9fffffe7 b6c0bbf0 c0607500 c03b3140 e9879eb8 c007680c
>       (XEN)    c060750c c03b32c0 c0607518 c03b3360 00000000 00000000 00000000 00000000
>       (XEN)    00000000 00000000 3ff6bebf a0000113 800b0193 800b0093 40000193 00000000
>       (XEN)    ffeffbfe fedeefff fffd5ffe
>       (XEN) Xen call trace:
>       (XEN)    [<00242c1c>] __warn+0x20/0x28 (PC)
>       (XEN)    [<00242c1c>] __warn+0x20/0x28 (LR)
>       (XEN)    [<00247a54>] maintenance_interrupt+0xfc/0x2f4
>       (XEN)    [<00248e60>] do_IRQ+0x138/0x198
>       (XEN)    [<00248978>] gic_interrupt+0x58/0xc0
>       (XEN)    [<0024f4b8>] do_trap_irq+0x10/0x14
>       (XEN)    [<00251830>] return_from_trap+0/0x4
>       (XEN)
> 
>       Also I am posting maintenance_interrupt() from my tree:
> 
>       static void maintenance_interrupt(int irq, void *dev_id, struct
>       cpu_user_regs *regs)
>       {
>           int i = 0, virq, pirq;
>           uint32_t lr;
>           struct vcpu *v = current;
>           uint64_t eisr = GICH[GICH_EISR0] | (((uint64_t) GICH[GICH_EISR1]) << 32);
> 
>           while ((i = find_next_bit((const long unsigned int *) &eisr,
>                                     64, i)) < 64) {
>               struct pending_irq *p, *n;
>               int cpu, eoi;
> 
>               cpu = -1;
>               eoi = 0;
> 
>               spin_lock_irq(&gic.lock);
>               lr = GICH[GICH_LR + i];
>               virq = lr & GICH_LR_VIRTUAL_MASK;
> 
>               p = irq_to_pending(v, virq);
>               if ( p->desc != NULL ) {
>                   p->desc->status &= ~IRQ_INPROGRESS;
>                   /* Assume only one pcpu needs to EOI the irq */
>                   cpu = p->desc->arch.eoi_cpu;
>                   eoi = 1;
>                   pirq = p->desc->irq;
>               }
>               if ( !atomic_dec_and_test(&p->inflight_cnt) )
>               {
>                   /* Physical IRQ can't be reinject */
>                   WARN_ON(p->desc != NULL);
>                   gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority);
>                   spin_unlock_irq(&gic.lock);
>                   i++;
>                   continue;
>               }
> 
>               GICH[GICH_LR + i] = 0;
>               clear_bit(i, &this_cpu(lr_mask));
> 
>               if ( !list_empty(&v->arch.vgic.lr_pending) ) {
>                   n = list_entry(v->arch.vgic.lr_pending.next, typeof(*n), lr_queue);
>                   gic_set_lr(i, n->irq, GICH_LR_PENDING, n->priority);
>                   list_del_init(&n->lr_queue);
>                   set_bit(i, &this_cpu(lr_mask));
>               } else {
>                   gic_inject_irq_stop();
>               }
>               spin_unlock_irq(&gic.lock);
> 
>               spin_lock_irq(&v->arch.vgic.lock);
>               list_del_init(&p->inflight);
>               spin_unlock_irq(&v->arch.vgic.lock);
> 
>               if ( eoi ) {
>                   /* this is not racy because we can't receive another irq of the
>                    * same type until we EOI it.  */
>                   if ( cpu == smp_processor_id() )
>                       gic_irq_eoi((void*)(uintptr_t)pirq);
>                   else
>                       on_selected_cpus(cpumask_of(cpu),
>                                        gic_irq_eoi, (void*)(uintptr_t)pirq, 0);
>               }
> 
>               i++;
>           }
>       }
> 
> 
>       Oleksandr Tyshchenko | Embedded Developer
>       GlobalLogic
> 
> 
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-30 13:24               ` Stefano Stabellini
@ 2014-01-30 15:06                 ` Oleksandr Tyshchenko
  2014-01-30 15:35                   ` Stefano Stabellini
  0 siblings, 1 reply; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2014-01-30 15:06 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

According to the DT, it is a level IRQ (DT_IRQ_TYPE_LEVEL_HIGH).

On Thu, Jan 30, 2014 at 3:24 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> Is it a level or an edge irq?
>
> On Wed, 29 Jan 2014, Julien Grall wrote:
>> Hi,
>>
>> It's weird, physical IRQ should not be injected twice ...
>> Were you able to print the IRQ number?
>>
>> In any case, you are using the old version of the interrupt patch series.
>> Your new error may come of race condition in this code.
>>
>> Can you try to use a newest version?
>>
>> On 29 Jan 2014 18:40, "Oleksandr Tyshchenko" <oleksandr.tyshchenko@globallogic.com> wrote:
>>       > Right, that's why changing it to cpumask_of(0) shouldn't make any
>>       > difference for xen-unstable (it should make things clearer, if nothing
>>       > else) but it should fix things for Oleksandr.
>>
>>       Unfortunately, it is not enough for stable work.
>>
>>       I was tried to use cpumask_of(smp_processor_id()) instead of cpumask_of(0) in
>>       gic_route_irq_to_guest(). And as result, I don't see our situation
>>       which cause to deadlock in on_selected_cpus function (expected).
>>       But, hypervisor sometimes hangs somewhere else (I have not identified
>>       yet where this is happening) or I sometimes see traps, like that:
>>       ("WARN_ON(p->desc != NULL)" in maintenance_interrupt() leads to them)
>>
>>       (XEN) CPU1: Unexpected Trap: Undefined Instruction
>>       (XEN) ----[ Xen-4.4-unstable  arm32  debug=y  Not tainted ]----
>>       (XEN) CPU:    1
>>       (XEN) PC:     00242c1c __warn+0x20/0x28
>>       (XEN) CPSR:   200001da MODE:Hypervisor
>>       (XEN)      R0: 0026770c R1: 00000001 R2: 3fd2fd00 R3: 00000fff
>>       (XEN)      R4: 00406100 R5: 40020ee0 R6: 00000000 R7: 4bfdf000
>>       (XEN)      R8: 00000001 R9: 4bfd7ed0 R10:00000001 R11:4bfd7ebc R12:00000002
>>       (XEN) HYP: SP: 4bfd7eb4 LR: 00242c1c
>>       (XEN)
>>       (XEN)   VTCR_EL2: 80002558
>>       (XEN)  VTTBR_EL2: 00020000dec6a000
>>       (XEN)
>>       (XEN)  SCTLR_EL2: 30cd187f
>>       (XEN)    HCR_EL2: 00000000000028b5
>>       (XEN)  TTBR0_EL2: 00000000d2014000
>>       (XEN)
>>       (XEN)    ESR_EL2: 00000000
>>       (XEN)  HPFAR_EL2: 0000000000482110
>>       (XEN)      HDFAR: fa211190
>>       (XEN)      HIFAR: 00000000
>>       (XEN)
>>       (XEN) Xen stack trace from sp=4bfd7eb4:
>>       (XEN)    0026431c 4bfd7efc 00247a54 00000024 002e6608 002e6608 00000097 00000001
>>       (XEN)    00000000 4bfd7f54 40017000 40005f60 40017014 4bfd7f58 00000019 00000000
>>       (XEN)    40005f60 4bfd7f24 00248e60 00000009 00000019 00404000 4bfd7f58 00000000
>>       (XEN)    00405000 000045f0 002e7694 4bfd7f4c 00248978 c0079a90 00000097 00000097
>>       (XEN)    00000000 fa212000 ea80c900 00000001 c05b8a60 4bfd7f54 0024f4b8 4bfd7f58
>>       (XEN)    00251830 ea80c950 00000000 00000001 c0079a90 00000097 00000097 00000000
>>       (XEN)    fa212000 ea80c900 00000001 c05b8a60 00000000 e9879e3c ffffffff b6efbca3
>>       (XEN)    c03b29fc 60000193 9fffffe7 b6c0bbf0 c0607500 c03b3140 e9879eb8 c007680c
>>       (XEN)    c060750c c03b32c0 c0607518 c03b3360 00000000 00000000 00000000 00000000
>>       (XEN)    00000000 00000000 3ff6bebf a0000113 800b0193 800b0093 40000193 00000000
>>       (XEN)    ffeffbfe fedeefff fffd5ffe
>>       (XEN) Xen call trace:
>>       (XEN)    [<00242c1c>] __warn+0x20/0x28 (PC)
>>       (XEN)    [<00242c1c>] __warn+0x20/0x28 (LR)
>>       (XEN)    [<00247a54>] maintenance_interrupt+0xfc/0x2f4
>>       (XEN)    [<00248e60>] do_IRQ+0x138/0x198
>>       (XEN)    [<00248978>] gic_interrupt+0x58/0xc0
>>       (XEN)    [<0024f4b8>] do_trap_irq+0x10/0x14
>>       (XEN)    [<00251830>] return_from_trap+0/0x4
>>       (XEN)
>>
>>       Also I am posting maintenance_interrupt() from my tree:
>>
>>       static void maintenance_interrupt(int irq, void *dev_id, struct
>>       cpu_user_regs *regs)
>>       {
>>           int i = 0, virq, pirq;
>>           uint32_t lr;
>>           struct vcpu *v = current;
>>           uint64_t eisr = GICH[GICH_EISR0] | (((uint64_t) GICH[GICH_EISR1]) << 32);
>>
>>           while ((i = find_next_bit((const long unsigned int *) &eisr,
>>                                     64, i)) < 64) {
>>               struct pending_irq *p, *n;
>>               int cpu, eoi;
>>
>>               cpu = -1;
>>               eoi = 0;
>>
>>               spin_lock_irq(&gic.lock);
>>               lr = GICH[GICH_LR + i];
>>               virq = lr & GICH_LR_VIRTUAL_MASK;
>>
>>               p = irq_to_pending(v, virq);
>>               if ( p->desc != NULL ) {
>>                   p->desc->status &= ~IRQ_INPROGRESS;
>>                   /* Assume only one pcpu needs to EOI the irq */
>>                   cpu = p->desc->arch.eoi_cpu;
>>                   eoi = 1;
>>                   pirq = p->desc->irq;
>>               }
>>               if ( !atomic_dec_and_test(&p->inflight_cnt) )
>>               {
>>                   /* Physical IRQ can't be reinject */
>>                   WARN_ON(p->desc != NULL);
>>                   gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority);
>>                   spin_unlock_irq(&gic.lock);
>>                   i++;
>>                   continue;
>>               }
>>
>>               GICH[GICH_LR + i] = 0;
>>               clear_bit(i, &this_cpu(lr_mask));
>>
>>               if ( !list_empty(&v->arch.vgic.lr_pending) ) {
>>                   n = list_entry(v->arch.vgic.lr_pending.next, typeof(*n), lr_queue);
>>                   gic_set_lr(i, n->irq, GICH_LR_PENDING, n->priority);
>>                   list_del_init(&n->lr_queue);
>>                   set_bit(i, &this_cpu(lr_mask));
>>               } else {
>>                   gic_inject_irq_stop();
>>               }
>>               spin_unlock_irq(&gic.lock);
>>
>>               spin_lock_irq(&v->arch.vgic.lock);
>>               list_del_init(&p->inflight);
>>               spin_unlock_irq(&v->arch.vgic.lock);
>>
>>               if ( eoi ) {
>>                   /* this is not racy because we can't receive another irq of the
>>                    * same type until we EOI it.  */
>>                   if ( cpu == smp_processor_id() )
>>                       gic_irq_eoi((void*)(uintptr_t)pirq);
>>                   else
>>                       on_selected_cpus(cpumask_of(cpu),
>>                                        gic_irq_eoi, (void*)(uintptr_t)pirq, 0);
>>               }
>>
>>               i++;
>>           }
>>       }
>>
>>
>>       Oleksandr Tyshchenko | Embedded Developer
>>       GlobalLogic
>>
>>
>>



-- 

Oleksandr Tyshchenko | Embedded Developer
GlobalLogic

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-30 15:06                 ` Oleksandr Tyshchenko
@ 2014-01-30 15:35                   ` Stefano Stabellini
  2014-01-30 16:10                     ` Oleksandr Tyshchenko
  0 siblings, 1 reply; 48+ messages in thread
From: Stefano Stabellini @ 2014-01-30 15:35 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

Given that we don't deactivate the interrupt (writing to GICC_DIR) until
the guest EOIs it, I can't understand how you manage to get a second
interrupt notification before the guest EOIs the first one.

Do you set GICC_CTL_EOI in GICC_CTLR?
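
For reference, with the split priority-drop/deactivate mode enabled the
flow is roughly (sketch, register names as in xen/arch/arm/gic.c):

    /* At GIC CPU interface setup: enable split EOI mode */
    GICC[GICC_CTLR] = GICC_CTL_ENABLE | GICC_CTL_EOI;

    /* When Xen takes the IRQ: priority drop only */
    GICC[GICC_EOIR] = irq;

    /* Only once the guest EOIs the virtual IRQ: deactivate */
    GICC[GICC_DIR] = irq;

Without GICC_CTL_EOI, the GICC_EOIR write both drops priority and
deactivates, so a level IRQ could fire again before the guest EOIs it.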

On Thu, 30 Jan 2014, Oleksandr Tyshchenko wrote:
> According to DT it is a level irq (DT_IRQ_TYPE_LEVEL_HIGH)
> 
> On Thu, Jan 30, 2014 at 3:24 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > Is it a level or an edge irq?
> >
> > On Wed, 29 Jan 2014, Julien Grall wrote:
> >> Hi,
> >>
> >> It's weird, physical IRQ should not be injected twice ...
> >> Were you able to print the IRQ number?
> >>
> >> In any case, you are using the old version of the interrupt patch series.
> >> Your new error may come of race condition in this code.
> >>
> >> Can you try to use a newest version?
> >>
> >> On 29 Jan 2014 18:40, "Oleksandr Tyshchenko" <oleksandr.tyshchenko@globallogic.com> wrote:
> >>       > Right, that's why changing it to cpumask_of(0) shouldn't make any
> >>       > difference for xen-unstable (it should make things clearer, if nothing
> >>       > else) but it should fix things for Oleksandr.
> >>
> >>       Unfortunately, it is not enough for stable work.
> >>
> >>       I was tried to use cpumask_of(smp_processor_id()) instead of cpumask_of(0) in
> >>       gic_route_irq_to_guest(). And as result, I don't see our situation
> >>       which cause to deadlock in on_selected_cpus function (expected).
> >>       But, hypervisor sometimes hangs somewhere else (I have not identified
> >>       yet where this is happening) or I sometimes see traps, like that:
> >>       ("WARN_ON(p->desc != NULL)" in maintenance_interrupt() leads to them)
> >>
> >>       (XEN) CPU1: Unexpected Trap: Undefined Instruction
> >>       (XEN) ----[ Xen-4.4-unstable  arm32  debug=y  Not tainted ]----
> >>       (XEN) CPU:    1
> >>       (XEN) PC:     00242c1c __warn+0x20/0x28
> >>       (XEN) CPSR:   200001da MODE:Hypervisor
> >>       (XEN)      R0: 0026770c R1: 00000001 R2: 3fd2fd00 R3: 00000fff
> >>       (XEN)      R4: 00406100 R5: 40020ee0 R6: 00000000 R7: 4bfdf000
> >>       (XEN)      R8: 00000001 R9: 4bfd7ed0 R10:00000001 R11:4bfd7ebc R12:00000002
> >>       (XEN) HYP: SP: 4bfd7eb4 LR: 00242c1c
> >>       (XEN)
> >>       (XEN)   VTCR_EL2: 80002558
> >>       (XEN)  VTTBR_EL2: 00020000dec6a000
> >>       (XEN)
> >>       (XEN)  SCTLR_EL2: 30cd187f
> >>       (XEN)    HCR_EL2: 00000000000028b5
> >>       (XEN)  TTBR0_EL2: 00000000d2014000
> >>       (XEN)
> >>       (XEN)    ESR_EL2: 00000000
> >>       (XEN)  HPFAR_EL2: 0000000000482110
> >>       (XEN)      HDFAR: fa211190
> >>       (XEN)      HIFAR: 00000000
> >>       (XEN)
> >>       (XEN) Xen stack trace from sp=4bfd7eb4:
> >>       (XEN)    0026431c 4bfd7efc 00247a54 00000024 002e6608 002e6608 00000097 00000001
> >>       (XEN)    00000000 4bfd7f54 40017000 40005f60 40017014 4bfd7f58 00000019 00000000
> >>       (XEN)    40005f60 4bfd7f24 00248e60 00000009 00000019 00404000 4bfd7f58 00000000
> >>       (XEN)    00405000 000045f0 002e7694 4bfd7f4c 00248978 c0079a90 00000097 00000097
> >>       (XEN)    00000000 fa212000 ea80c900 00000001 c05b8a60 4bfd7f54 0024f4b8 4bfd7f58
> >>       (XEN)    00251830 ea80c950 00000000 00000001 c0079a90 00000097 00000097 00000000
> >>       (XEN)    fa212000 ea80c900 00000001 c05b8a60 00000000 e9879e3c ffffffff b6efbca3
> >>       (XEN)    c03b29fc 60000193 9fffffe7 b6c0bbf0 c0607500 c03b3140 e9879eb8 c007680c
> >>       (XEN)    c060750c c03b32c0 c0607518 c03b3360 00000000 00000000 00000000 00000000
> >>       (XEN)    00000000 00000000 3ff6bebf a0000113 800b0193 800b0093 40000193 00000000
> >>       (XEN)    ffeffbfe fedeefff fffd5ffe
> >>       (XEN) Xen call trace:
> >>       (XEN)    [<00242c1c>] __warn+0x20/0x28 (PC)
> >>       (XEN)    [<00242c1c>] __warn+0x20/0x28 (LR)
> >>       (XEN)    [<00247a54>] maintenance_interrupt+0xfc/0x2f4
> >>       (XEN)    [<00248e60>] do_IRQ+0x138/0x198
> >>       (XEN)    [<00248978>] gic_interrupt+0x58/0xc0
> >>       (XEN)    [<0024f4b8>] do_trap_irq+0x10/0x14
> >>       (XEN)    [<00251830>] return_from_trap+0/0x4
> >>       (XEN)
> >>
> >>       Also I am posting maintenance_interrupt() from my tree:
> >>
> >>       static void maintenance_interrupt(int irq, void *dev_id, struct
> >>       cpu_user_regs *regs)
> >>       {
> >>           int i = 0, virq, pirq;
> >>           uint32_t lr;
> >>           struct vcpu *v = current;
> >>           uint64_t eisr = GICH[GICH_EISR0] | (((uint64_t) GICH[GICH_EISR1]) << 32);
> >>
> >>           while ((i = find_next_bit((const long unsigned int *) &eisr,
> >>                                     64, i)) < 64) {
> >>               struct pending_irq *p, *n;
> >>               int cpu, eoi;
> >>
> >>               cpu = -1;
> >>               eoi = 0;
> >>
> >>               spin_lock_irq(&gic.lock);
> >>               lr = GICH[GICH_LR + i];
> >>               virq = lr & GICH_LR_VIRTUAL_MASK;
> >>
> >>               p = irq_to_pending(v, virq);
> >>               if ( p->desc != NULL ) {
> >>                   p->desc->status &= ~IRQ_INPROGRESS;
> >>                   /* Assume only one pcpu needs to EOI the irq */
> >>                   cpu = p->desc->arch.eoi_cpu;
> >>                   eoi = 1;
> >>                   pirq = p->desc->irq;
> >>               }
> >>               if ( !atomic_dec_and_test(&p->inflight_cnt) )
> >>               {
> >>                   /* Physical IRQ can't be reinject */
> >>                   WARN_ON(p->desc != NULL);
> >>                   gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority);
> >>                   spin_unlock_irq(&gic.lock);
> >>                   i++;
> >>                   continue;
> >>               }
> >>
> >>               GICH[GICH_LR + i] = 0;
> >>               clear_bit(i, &this_cpu(lr_mask));
> >>
> >>               if ( !list_empty(&v->arch.vgic.lr_pending) ) {
> >>                   n = list_entry(v->arch.vgic.lr_pending.next, typeof(*n), lr_queue);
> >>                   gic_set_lr(i, n->irq, GICH_LR_PENDING, n->priority);
> >>                   list_del_init(&n->lr_queue);
> >>                   set_bit(i, &this_cpu(lr_mask));
> >>               } else {
> >>                   gic_inject_irq_stop();
> >>               }
> >>               spin_unlock_irq(&gic.lock);
> >>
> >>               spin_lock_irq(&v->arch.vgic.lock);
> >>               list_del_init(&p->inflight);
> >>               spin_unlock_irq(&v->arch.vgic.lock);
> >>
> >>               if ( eoi ) {
> >>                   /* this is not racy because we can't receive another irq of the
> >>                    * same type until we EOI it.  */
> >>                   if ( cpu == smp_processor_id() )
> >>                       gic_irq_eoi((void*)(uintptr_t)pirq);
> >>                   else
> >>                       on_selected_cpus(cpumask_of(cpu),
> >>                                        gic_irq_eoi, (void*)(uintptr_t)pirq, 0);
> >>               }
> >>
> >>               i++;
> >>           }
> >>       }
> >>
> >>
> >>       Oleksandr Tyshchenko | Embedded Developer
> >>       GlobalLogic
> >>
> >>
> >>
> 
> 
> 
> -- 
> 
> Oleksandr Tyshchenko | Embedded Developer
> GlobalLogic
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-30 15:35                   ` Stefano Stabellini
@ 2014-01-30 16:10                     ` Oleksandr Tyshchenko
  2014-01-30 17:18                       ` Stefano Stabellini
  0 siblings, 1 reply; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2014-01-30 16:10 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

Hello, all.

1. The "simultaneous cross-interrupts" issue only occurs if
gic_route_irq_to_guest() was called on CPU1 during boot.
it is possible in our case, since gic_route_irq_to_guest() is called
when the both domains (dom0 and domU) are building unlike Xen
upstream.
So, the "impropriety" on our side.

The following solution fixes the side effect (the deadlock):

while ( unlikely(!spin_trylock(&call_lock)) )
     smp_call_function_interrupt();

1.1 I have checked the patch "xen: arm: increase priority of SGIs used as IPIs".
In general it works (I mean that this patch doesn't cause new issues),
but it doesn't fix the problem
(I still see "simultaneous cross-interrupts").

1.2 I have also checked the solution where the on_selected_cpus call was
moved out of the interrupt handler. Unfortunately, it doesn't work.

I almost immediately see the following error:
(XEN) Assertion 'this_cpu(eoi_irq) == NULL' failed, line 981, file gic.c
(XEN) Xen BUG at gic.c:981
(XEN) CPU1: Unexpected Trap: Undefined Instruction
(XEN) ----[ Xen-4.4-unstable  arm32  debug=y  Not tainted ]----
(XEN) CPU:    1
(XEN) PC:     00241ee0 __bug+0x2c/0x44
(XEN) CPSR:   2000015a MODE:Hypervisor
(XEN)      R0: 0026770c R1: 00000000 R2: 3fd2fd00 R3: 00000fff
(XEN)      R4: 00263248 R5: 00264384 R6: 000003d5 R7: 4003d000
(XEN)      R8: 00000001 R9: 00000091 R10:00000000 R11:40037ebc R12:00000001
(XEN) HYP: SP: 40037eb4 LR: 00241ee0
(XEN)
(XEN)   VTCR_EL2: 80002558
(XEN)  VTTBR_EL2: 00010000deffc000
(XEN)
(XEN)  SCTLR_EL2: 30cd187f
(XEN)    HCR_EL2: 0000000000002835
(XEN)  TTBR0_EL2: 00000000d2014000
(XEN)
(XEN)    ESR_EL2: 00000000
(XEN)  HPFAR_EL2: 0000000000482110
(XEN)      HDFAR: fa211f00
(XEN)      HIFAR: 00000000
(XEN)
(XEN) Xen stack trace from sp=40037eb4:
(XEN)    00000000 40037efc 00247e1c 002e6610 002e6610 002e6608 002e6608 00000001
(XEN)    00000000 40015000 40017000 40005f60 40017014 40037f58 00000019 00000000
(XEN)    40005f60 40037f24 00249068 00000009 00000019 00404000 40037f58 00000000
(XEN)    00405000 00004680 002e7694 40037f4c 00248b80 00000000 c5b72000 00000091
(XEN)    00000000 c700d4e0 c008477c 000000f1 00000001 40037f54 0024f6c0 40037f58
(XEN)    00251a30 c700d4e0 00000001 c008477c 00000000 c5b72000 00000091 00000000
(XEN)    c700d4e0 c008477c 000000f1 00000001 00000001 c5b72000 ffffffff 0000a923
(XEN)    c0077ac4 60000193 00000000 b6eadaa0 c0578f40 c00138c0 c5b73f58 c036ab90
(XEN)    c0578f4c c00136a0 c0578f58 c0013920 00000000 00000000 00000000 00000000
(XEN)    00000000 00000000 00000000 80000010 60000193 a0000093 80000193 00000000
(XEN)    00000000 0c41e00c 450c2880
(XEN) Xen call trace:
(XEN)    [<00241ee0>] __bug+0x2c/0x44 (PC)
(XEN)    [<00241ee0>] __bug+0x2c/0x44 (LR)
(XEN)    [<00247e1c>] maintenance_interrupt+0x2e8/0x328
(XEN)    [<00249068>] do_IRQ+0x138/0x198
(XEN)    [<00248b80>] gic_interrupt+0x58/0xc0
(XEN)    [<0024f6c0>] do_trap_irq+0x10/0x14
(XEN)    [<00251a30>] return_from_trap+0/0x4
(XEN)


2. The "simultaneous cross-interrupts" issue doesn't occur if I use
next solution:
So, as result I don't see deadlock in on_selected_cpus()

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index e6257a7..af96a31 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -776,8 +795,7 @@ int gic_route_irq_to_guest(struct domain *d, const struct dt_irq *irq,

     level = dt_irq_is_level_triggered(irq);

-    gic_set_irq_properties(irq->irq, level, cpumask_of(smp_processor_id()),
-                           0xa0);
+    gic_set_irq_properties(irq->irq, level, cpumask_of(0), 0xa0);

     retval = __setup_irq(desc, irq->irq, action);
     if (retval) {

As a result, I don't see the deadlock in on_selected_cpus().
But, rarely, I still see deadlocks in other parts related to interrupt
handling. As Julien noted, I am using an old version of the interrupt
patch series; I completely agree.

We are based on the following Xen commit:
48249a1 libxl: Avoid realloc(,0) when libxl__xs_directory returns empty list

We also have some patches that we cherry-picked when we urgently needed them:
6bba1a3 xen/arm: Keep count of inflight interrupts
33a8aa9 xen/arm: Only enable physical IRQs when the guest asks
b6a4e65 xen/arm: Rename gic_irq_{startup, shutdown} to gic_irq_{mask, unmask}
5dbe455 xen/arm: Don't reinject the IRQ if it's already in LRs
1438f03 xen/arm: Physical IRQ is not always equal to virtual IRQ

I still have to apply the following patches and re-check:
88eb95e xen/arm: disable a physical IRQ when the guest disables the
corresponding IRQ
a660ee3 xen/arm: implement gic_irq_enable and gic_irq_disable
1dc9556 xen/arm: do not add a second irq to the LRs if one is already present
d16d511 xen/arm: track the state of guest IRQs

I'll report the results; I hope to do it today.

Many thanks to all.

On Thu, Jan 30, 2014 at 5:35 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> Given that we don't deactivate the interrupt (writing to GICC_DIR) until
> the guest EOIs it, I can't understand how you manage to get a second
> interrupt notifications before the guest EOIs the first one.
>
> Do you set GICC_CTL_EOI in GICC_CTLR?
>
> On Thu, 30 Jan 2014, Oleksandr Tyshchenko wrote:
>> According to DT it is a level irq (DT_IRQ_TYPE_LEVEL_HIGH)
>>
>> On Thu, Jan 30, 2014 at 3:24 PM, Stefano Stabellini
>> <stefano.stabellini@eu.citrix.com> wrote:
>> > Is it a level or an edge irq?
>> >
>> > On Wed, 29 Jan 2014, Julien Grall wrote:
>> >> Hi,
>> >>
>> >> It's weird, physical IRQ should not be injected twice ...
>> >> Were you able to print the IRQ number?
>> >>
>> >> In any case, you are using the old version of the interrupt patch series.
>> >> Your new error may come of race condition in this code.
>> >>
>> >> Can you try to use a newest version?
>> >>
>> >> On 29 Jan 2014 18:40, "Oleksandr Tyshchenko" <oleksandr.tyshchenko@globallogic.com> wrote:
>> >>       > Right, that's why changing it to cpumask_of(0) shouldn't make any
>> >>       > difference for xen-unstable (it should make things clearer, if nothing
>> >>       > else) but it should fix things for Oleksandr.
>> >>
>> >>       Unfortunately, it is not enough for stable work.
>> >>
>> >>       I was tried to use cpumask_of(smp_processor_id()) instead of cpumask_of(0) in
>> >>       gic_route_irq_to_guest(). And as result, I don't see our situation
>> >>       which cause to deadlock in on_selected_cpus function (expected).
>> >>       But, hypervisor sometimes hangs somewhere else (I have not identified
>> >>       yet where this is happening) or I sometimes see traps, like that:
>> >>       ("WARN_ON(p->desc != NULL)" in maintenance_interrupt() leads to them)
>> >>
>> >>       (XEN) CPU1: Unexpected Trap: Undefined Instruction
>> >>       (XEN) ----[ Xen-4.4-unstable  arm32  debug=y  Not tainted ]----
>> >>       (XEN) CPU:    1
>> >>       (XEN) PC:     00242c1c __warn+0x20/0x28
>> >>       (XEN) CPSR:   200001da MODE:Hypervisor
>> >>       (XEN)      R0: 0026770c R1: 00000001 R2: 3fd2fd00 R3: 00000fff
>> >>       (XEN)      R4: 00406100 R5: 40020ee0 R6: 00000000 R7: 4bfdf000
>> >>       (XEN)      R8: 00000001 R9: 4bfd7ed0 R10:00000001 R11:4bfd7ebc R12:00000002
>> >>       (XEN) HYP: SP: 4bfd7eb4 LR: 00242c1c
>> >>       (XEN)
>> >>       (XEN)   VTCR_EL2: 80002558
>> >>       (XEN)  VTTBR_EL2: 00020000dec6a000
>> >>       (XEN)
>> >>       (XEN)  SCTLR_EL2: 30cd187f
>> >>       (XEN)    HCR_EL2: 00000000000028b5
>> >>       (XEN)  TTBR0_EL2: 00000000d2014000
>> >>       (XEN)
>> >>       (XEN)    ESR_EL2: 00000000
>> >>       (XEN)  HPFAR_EL2: 0000000000482110
>> >>       (XEN)      HDFAR: fa211190
>> >>       (XEN)      HIFAR: 00000000
>> >>       (XEN)
>> >>       (XEN) Xen stack trace from sp=4bfd7eb4:
>> >>       (XEN)    0026431c 4bfd7efc 00247a54 00000024 002e6608 002e6608 00000097 00000001
>> >>       (XEN)    00000000 4bfd7f54 40017000 40005f60 40017014 4bfd7f58 00000019 00000000
>> >>       (XEN)    40005f60 4bfd7f24 00248e60 00000009 00000019 00404000 4bfd7f58 00000000
>> >>       (XEN)    00405000 000045f0 002e7694 4bfd7f4c 00248978 c0079a90 00000097 00000097
>> >>       (XEN)    00000000 fa212000 ea80c900 00000001 c05b8a60 4bfd7f54 0024f4b8 4bfd7f58
>> >>       (XEN)    00251830 ea80c950 00000000 00000001 c0079a90 00000097 00000097 00000000
>> >>       (XEN)    fa212000 ea80c900 00000001 c05b8a60 00000000 e9879e3c ffffffff b6efbca3
>> >>       (XEN)    c03b29fc 60000193 9fffffe7 b6c0bbf0 c0607500 c03b3140 e9879eb8 c007680c
>> >>       (XEN)    c060750c c03b32c0 c0607518 c03b3360 00000000 00000000 00000000 00000000
>> >>       (XEN)    00000000 00000000 3ff6bebf a0000113 800b0193 800b0093 40000193 00000000
>> >>       (XEN)    ffeffbfe fedeefff fffd5ffe
>> >>       (XEN) Xen call trace:
>> >>       (XEN)    [<00242c1c>] __warn+0x20/0x28 (PC)
>> >>       (XEN)    [<00242c1c>] __warn+0x20/0x28 (LR)
>> >>       (XEN)    [<00247a54>] maintenance_interrupt+0xfc/0x2f4
>> >>       (XEN)    [<00248e60>] do_IRQ+0x138/0x198
>> >>       (XEN)    [<00248978>] gic_interrupt+0x58/0xc0
>> >>       (XEN)    [<0024f4b8>] do_trap_irq+0x10/0x14
>> >>       (XEN)    [<00251830>] return_from_trap+0/0x4
>> >>       (XEN)
>> >>
>> >>       Also I am posting maintenance_interrupt() from my tree:
>> >>
>> >>       static void maintenance_interrupt(int irq, void *dev_id, struct
>> >>       cpu_user_regs *regs)
>> >>       {
>> >>           int i = 0, virq, pirq;
>> >>           uint32_t lr;
>> >>           struct vcpu *v = current;
>> >>           uint64_t eisr = GICH[GICH_EISR0] | (((uint64_t) GICH[GICH_EISR1]) << 32);
>> >>
>> >>           while ((i = find_next_bit((const long unsigned int *) &eisr,
>> >>                                     64, i)) < 64) {
>> >>               struct pending_irq *p, *n;
>> >>               int cpu, eoi;
>> >>
>> >>               cpu = -1;
>> >>               eoi = 0;
>> >>
>> >>               spin_lock_irq(&gic.lock);
>> >>               lr = GICH[GICH_LR + i];
>> >>               virq = lr & GICH_LR_VIRTUAL_MASK;
>> >>
>> >>               p = irq_to_pending(v, virq);
>> >>               if ( p->desc != NULL ) {
>> >>                   p->desc->status &= ~IRQ_INPROGRESS;
>> >>                   /* Assume only one pcpu needs to EOI the irq */
>> >>                   cpu = p->desc->arch.eoi_cpu;
>> >>                   eoi = 1;
>> >>                   pirq = p->desc->irq;
>> >>               }
>> >>               if ( !atomic_dec_and_test(&p->inflight_cnt) )
>> >>               {
>> >>                   /* Physical IRQ can't be reinject */
>> >>                   WARN_ON(p->desc != NULL);
>> >>                   gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority);
>> >>                   spin_unlock_irq(&gic.lock);
>> >>                   i++;
>> >>                   continue;
>> >>               }
>> >>
>> >>               GICH[GICH_LR + i] = 0;
>> >>               clear_bit(i, &this_cpu(lr_mask));
>> >>
>> >>               if ( !list_empty(&v->arch.vgic.lr_pending) ) {
>> >>                   n = list_entry(v->arch.vgic.lr_pending.next, typeof(*n), lr_queue);
>> >>                   gic_set_lr(i, n->irq, GICH_LR_PENDING, n->priority);
>> >>                   list_del_init(&n->lr_queue);
>> >>                   set_bit(i, &this_cpu(lr_mask));
>> >>               } else {
>> >>                   gic_inject_irq_stop();
>> >>               }
>> >>               spin_unlock_irq(&gic.lock);
>> >>
>> >>               spin_lock_irq(&v->arch.vgic.lock);
>> >>               list_del_init(&p->inflight);
>> >>               spin_unlock_irq(&v->arch.vgic.lock);
>> >>
>> >>               if ( eoi ) {
>> >>                   /* this is not racy because we can't receive another irq of the
>> >>                    * same type until we EOI it.  */
>> >>                   if ( cpu == smp_processor_id() )
>> >>                       gic_irq_eoi((void*)(uintptr_t)pirq);
>> >>                   else
>> >>                       on_selected_cpus(cpumask_of(cpu),
>> >>                                        gic_irq_eoi, (void*)(uintptr_t)pirq, 0);
>> >>               }
>> >>
>> >>               i++;
>> >>           }
>> >>       }
>> >>
>> >>
>> >>       Oleksandr Tyshchenko | Embedded Developer
>> >>       GlobalLogic
>> >>
>> >>
>> >>
>>
>>
>>




^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-30 16:10                     ` Oleksandr Tyshchenko
@ 2014-01-30 17:18                       ` Stefano Stabellini
  2014-01-30 19:54                         ` Oleksandr Tyshchenko
  0 siblings, 1 reply; 48+ messages in thread
From: Stefano Stabellini @ 2014-01-30 17:18 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

On Thu, 30 Jan 2014, Oleksandr Tyshchenko wrote:
> 1.2 I have also checked a solution where the on_selected_cpus call was
> moved out of the interrupt handler. Unfortunately, it doesn't work.
> 
> I almost immediately see the following error:
> (XEN) Assertion 'this_cpu(eoi_irq) == NULL' failed, line 981, file gic.c
> (XEN) Xen BUG at gic.c:981
> (XEN) CPU1: Unexpected Trap: Undefined Instruction
> (XEN) ----[ Xen-4.4-unstable  arm32  debug=y  Not tainted ]----
> (XEN) CPU:    1
> (XEN) PC:     00241ee0 __bug+0x2c/0x44
> (XEN) CPSR:   2000015a MODE:Hypervisor
> (XEN)      R0: 0026770c R1: 00000000 R2: 3fd2fd00 R3: 00000fff
> (XEN)      R4: 00263248 R5: 00264384 R6: 000003d5 R7: 4003d000
> (XEN)      R8: 00000001 R9: 00000091 R10:00000000 R11:40037ebc R12:00000001
> (XEN) HYP: SP: 40037eb4 LR: 00241ee0
> (XEN)
> (XEN)   VTCR_EL2: 80002558
> (XEN)  VTTBR_EL2: 00010000deffc000
> (XEN)
> (XEN)  SCTLR_EL2: 30cd187f
> (XEN)    HCR_EL2: 0000000000002835
> (XEN)  TTBR0_EL2: 00000000d2014000
> (XEN)
> (XEN)    ESR_EL2: 00000000
> (XEN)  HPFAR_EL2: 0000000000482110
> (XEN)      HDFAR: fa211f00
> (XEN)      HIFAR: 00000000
> (XEN)
> (XEN) Xen stack trace from sp=40037eb4:
> (XEN)    00000000 40037efc 00247e1c 002e6610 002e6610 002e6608 002e6608 00000001
> (XEN)    00000000 40015000 40017000 40005f60 40017014 40037f58 00000019 00000000
> (XEN)    40005f60 40037f24 00249068 00000009 00000019 00404000 40037f58 00000000
> (XEN)    00405000 00004680 002e7694 40037f4c 00248b80 00000000 c5b72000 00000091
> (XEN)    00000000 c700d4e0 c008477c 000000f1 00000001 40037f54 0024f6c0 40037f58
> (XEN)    00251a30 c700d4e0 00000001 c008477c 00000000 c5b72000 00000091 00000000
> (XEN)    c700d4e0 c008477c 000000f1 00000001 00000001 c5b72000 ffffffff 0000a923
> (XEN)    c0077ac4 60000193 00000000 b6eadaa0 c0578f40 c00138c0 c5b73f58 c036ab90
> (XEN)    c0578f4c c00136a0 c0578f58 c0013920 00000000 00000000 00000000 00000000
> (XEN)    00000000 00000000 00000000 80000010 60000193 a0000093 80000193 00000000
> (XEN)    00000000 0c41e00c 450c2880
> (XEN) Xen call trace:
> (XEN)    [<00241ee0>] __bug+0x2c/0x44 (PC)
> (XEN)    [<00241ee0>] __bug+0x2c/0x44 (LR)
> (XEN)    [<00247e1c>] maintenance_interrupt+0x2e8/0x328
> (XEN)    [<00249068>] do_IRQ+0x138/0x198
> (XEN)    [<00248b80>] gic_interrupt+0x58/0xc0
> (XEN)    [<0024f6c0>] do_trap_irq+0x10/0x14
> (XEN)    [<00251a30>] return_from_trap+0/0x4
> (XEN)

Are you seeing more than one interrupt being EOI'ed with a single
maintenance interrupt?
I didn't think it could happen in practice.
If so, we might have to turn eoi_irq into a list or an array.
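
A minimal sketch of what that could look like: a small fixed-size
per-CPU ring instead of a single slot, so that several IRQs can be
deferred for EOI from one maintenance interrupt. This is only an
illustration, not actual Xen code; NR_DEFERRED_EOI and eoi_queue are
made-up names, and a real implementation would still need the per-CPU
plumbing and the locking around it.

#include <stdbool.h>

#define NR_DEFERRED_EOI 8   /* bounded by the number of LRs in practice */

struct eoi_queue {
    unsigned int irq[NR_DEFERRED_EOI];
    unsigned int head, tail;            /* head == tail means empty */
};

/* Queue one IRQ for a later EOI; returns false if the ring is full,
 * in which case the caller would have to EOI synchronously instead. */
static bool eoi_queue_push(struct eoi_queue *q, unsigned int irq)
{
    unsigned int next = (q->tail + 1) % NR_DEFERRED_EOI;

    if ( next == q->head )
        return false;
    q->irq[q->tail] = irq;
    q->tail = next;
    return true;
}

/* Drain one deferred IRQ; returns false once the ring is empty. */
static bool eoi_queue_pop(struct eoi_queue *q, unsigned int *irq)
{
    if ( q->head == q->tail )
        return false;
    *irq = q->irq[q->head];
    q->head = (q->head + 1) % NR_DEFERRED_EOI;
    return true;
}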


> 2. The "simultaneous cross-interrupts" issue doesn't occur if I use
> the following solution:
> 
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index e6257a7..af96a31 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -776,8 +795,7 @@ int gic_route_irq_to_guest(struct domain *d, const struct dt_irq *irq,
> 
>      level = dt_irq_is_level_triggered(irq);
> 
> -    gic_set_irq_properties(irq->irq, level, cpumask_of(smp_processor_id()),
> -                           0xa0);
> +    gic_set_irq_properties(irq->irq, level, cpumask_of(0), 0xa0);
> 
>      retval = __setup_irq(desc, irq->irq, action);
>      if (retval) {
> As a result, I don't see the deadlock in on_selected_cpus().

As I stated before I think this is a good change to have in 4.4.
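
For anyone following along: cpumask_of(cpu) denotes the mask containing
exactly that one CPU, so the hunk above simply pins the IRQ affinity to
CPU0. A toy model of the semantics, with a plain 64-bit bitmap standing
in for the real cpumask_t (which is an array of unsigned longs with a
much richer API):

#include <stdint.h>
#include <stdio.h>

typedef uint64_t toy_cpumask_t;     /* one bit per CPU, up to 64 CPUs */

static toy_cpumask_t toy_cpumask_of(unsigned int cpu)
{
    return (toy_cpumask_t)1 << cpu;
}

int main(void)
{
    /* Routing an IRQ to cpumask_of(0) targets exactly CPU0. */
    printf("cpumask_of(0) = %#llx\n", (unsigned long long)toy_cpumask_of(0));
    printf("cpumask_of(3) = %#llx\n", (unsigned long long)toy_cpumask_of(3));
    return 0;
}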


> But, rarely, I see deadlocks in other parts related to interrupt handling.
> As noted by Julien, I am using the old version of the interrupt patch series.
> I completely agree.
> 
> We are based on the following Xen commit:
> 48249a1 libxl: Avoid realloc(,0) when libxl__xs_directory returns empty list
> 
> We also have some patches that we cherry-picked when we urgently needed them:
> 6bba1a3 xen/arm: Keep count of inflight interrupts
> 33a8aa9 xen/arm: Only enable physical IRQs when the guest asks
> b6a4e65 xen/arm: Rename gic_irq_{startup, shutdown} to gic_irq_{mask, unmask}
> 5dbe455 xen/arm: Don't reinject the IRQ if it's already in LRs
> 1438f03 xen/arm: Physical IRQ is not always equal to virtual IRQ
> 
> I have to apply the following patches and check with them:
> 88eb95e xen/arm: disable a physical IRQ when the guest disables the
> corresponding IRQ
> a660ee3 xen/arm: implement gic_irq_enable and gic_irq_disable
> 1dc9556 xen/arm: do not add a second irq to the LRs if one is already present
> d16d511 xen/arm: track the state of guest IRQs
> 
> I'll report the results; I hope to do so today.

I am looking forward to reading your report.
Cheers,

Stefano

> A lot of thanks to all.
> 
> On Thu, Jan 30, 2014 at 5:35 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > Given that we don't deactivate the interrupt (writing to GICC_DIR) until
> > the guest EOIs it, I can't understand how you manage to get a second
> > interrupt notification before the guest EOIs the first one.
> >
> > Do you set GICC_CTL_EOI in GICC_CTLR?
> >
> > On Thu, 30 Jan 2014, Oleksandr Tyshchenko wrote:
> >> According to DT it is a level irq (DT_IRQ_TYPE_LEVEL_HIGH)
> >>
> >> On Thu, Jan 30, 2014 at 3:24 PM, Stefano Stabellini
> >> <stefano.stabellini@eu.citrix.com> wrote:
> >> > Is it a level or an edge irq?
> >> >
> >> > On Wed, 29 Jan 2014, Julien Grall wrote:
> >> >> Hi,
> >> >>
> >> >> It's weird; a physical IRQ should not be injected twice...
> >> >> Were you able to print the IRQ number?
> >> >>
> >> >> In any case, you are using the old version of the interrupt patch series.
> >> >> Your new error may come from a race condition in this code.
> >> >>
> >> >> Can you try to use the newest version?
> >> >>
> >> >> On 29 Jan 2014 18:40, "Oleksandr Tyshchenko" <oleksandr.tyshchenko@globallogic.com> wrote:
> >> >>       > Right, that's why changing it to cpumask_of(0) shouldn't make any
> >> >>       > difference for xen-unstable (it should make things clearer, if nothing
> >> >>       > else) but it should fix things for Oleksandr.
> >> >>
> >> >>       Unfortunately, it is not enough for stable work.
> >> >>
> >> >>       I tried to use cpumask_of(smp_processor_id()) instead of cpumask_of(0) in
> >> >>       gic_route_irq_to_guest(). As a result, I don't see the situation
> >> >>       which causes the deadlock in the on_selected_cpus function (expected).
> >> >>       But the hypervisor sometimes hangs somewhere else (I have not yet
> >> >>       identified where this happens), or I sometimes see traps like the one
> >> >>       below ("WARN_ON(p->desc != NULL)" in maintenance_interrupt() leads to them):
> >> >>
> >> >>       (XEN) CPU1: Unexpected Trap: Undefined Instruction
> >> >>       (XEN) ----[ Xen-4.4-unstable  arm32  debug=y  Not tainted ]----
> >> >>       (XEN) CPU:    1
> >> >>       (XEN) PC:     00242c1c __warn+0x20/0x28
> >> >>       (XEN) CPSR:   200001da MODE:Hypervisor
> >> >>       (XEN)      R0: 0026770c R1: 00000001 R2: 3fd2fd00 R3: 00000fff
> >> >>       (XEN)      R4: 00406100 R5: 40020ee0 R6: 00000000 R7: 4bfdf000
> >> >>       (XEN)      R8: 00000001 R9: 4bfd7ed0 R10:00000001 R11:4bfd7ebc R12:00000002
> >> >>       (XEN) HYP: SP: 4bfd7eb4 LR: 00242c1c
> >> >>       (XEN)
> >> >>       (XEN)   VTCR_EL2: 80002558
> >> >>       (XEN)  VTTBR_EL2: 00020000dec6a000
> >> >>       (XEN)
> >> >>       (XEN)  SCTLR_EL2: 30cd187f
> >> >>       (XEN)    HCR_EL2: 00000000000028b5
> >> >>       (XEN)  TTBR0_EL2: 00000000d2014000
> >> >>       (XEN)
> >> >>       (XEN)    ESR_EL2: 00000000
> >> >>       (XEN)  HPFAR_EL2: 0000000000482110
> >> >>       (XEN)      HDFAR: fa211190
> >> >>       (XEN)      HIFAR: 00000000
> >> >>       (XEN)
> >> >>       (XEN) Xen stack trace from sp=4bfd7eb4:
> >> >>       (XEN)    0026431c 4bfd7efc 00247a54 00000024 002e6608 002e6608 00000097 00000001
> >> >>       (XEN)    00000000 4bfd7f54 40017000 40005f60 40017014 4bfd7f58 00000019 00000000
> >> >>       (XEN)    40005f60 4bfd7f24 00248e60 00000009 00000019 00404000 4bfd7f58 00000000
> >> >>       (XEN)    00405000 000045f0 002e7694 4bfd7f4c 00248978 c0079a90 00000097 00000097
> >> >>       (XEN)    00000000 fa212000 ea80c900 00000001 c05b8a60 4bfd7f54 0024f4b8 4bfd7f58
> >> >>       (XEN)    00251830 ea80c950 00000000 00000001 c0079a90 00000097 00000097 00000000
> >> >>       (XEN)    fa212000 ea80c900 00000001 c05b8a60 00000000 e9879e3c ffffffff b6efbca3
> >> >>       (XEN)    c03b29fc 60000193 9fffffe7 b6c0bbf0 c0607500 c03b3140 e9879eb8 c007680c
> >> >>       (XEN)    c060750c c03b32c0 c0607518 c03b3360 00000000 00000000 00000000 00000000
> >> >>       (XEN)    00000000 00000000 3ff6bebf a0000113 800b0193 800b0093 40000193 00000000
> >> >>       (XEN)    ffeffbfe fedeefff fffd5ffe
> >> >>       (XEN) Xen call trace:
> >> >>       (XEN)    [<00242c1c>] __warn+0x20/0x28 (PC)
> >> >>       (XEN)    [<00242c1c>] __warn+0x20/0x28 (LR)
> >> >>       (XEN)    [<00247a54>] maintenance_interrupt+0xfc/0x2f4
> >> >>       (XEN)    [<00248e60>] do_IRQ+0x138/0x198
> >> >>       (XEN)    [<00248978>] gic_interrupt+0x58/0xc0
> >> >>       (XEN)    [<0024f4b8>] do_trap_irq+0x10/0x14
> >> >>       (XEN)    [<00251830>] return_from_trap+0/0x4
> >> >>       (XEN)
> >> >>
> >> >>       I am also posting maintenance_interrupt() from my tree:
> >> >>
> >> >>       static void maintenance_interrupt(int irq, void *dev_id, struct
> >> >>       cpu_user_regs *regs)
> >> >>       {
> >> >>           int i = 0, virq, pirq;
> >> >>           uint32_t lr;
> >> >>           struct vcpu *v = current;
> >> >>           uint64_t eisr = GICH[GICH_EISR0] | (((uint64_t) GICH[GICH_EISR1]) << 32);
> >> >>
> >> >>           while ((i = find_next_bit((const long unsigned int *) &eisr,
> >> >>                                     64, i)) < 64) {
> >> >>               struct pending_irq *p, *n;
> >> >>               int cpu, eoi;
> >> >>
> >> >>               cpu = -1;
> >> >>               eoi = 0;
> >> >>
> >> >>               spin_lock_irq(&gic.lock);
> >> >>               lr = GICH[GICH_LR + i];
> >> >>               virq = lr & GICH_LR_VIRTUAL_MASK;
> >> >>
> >> >>               p = irq_to_pending(v, virq);
> >> >>               if ( p->desc != NULL ) {
> >> >>                   p->desc->status &= ~IRQ_INPROGRESS;
> >> >>                   /* Assume only one pcpu needs to EOI the irq */
> >> >>                   cpu = p->desc->arch.eoi_cpu;
> >> >>                   eoi = 1;
> >> >>                   pirq = p->desc->irq;
> >> >>               }
> >> >>               if ( !atomic_dec_and_test(&p->inflight_cnt) )
> >> >>               {
> >> >>                   /* Physical IRQ can't be reinjected */
> >> >>                   WARN_ON(p->desc != NULL);
> >> >>                   gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority);
> >> >>                   spin_unlock_irq(&gic.lock);
> >> >>                   i++;
> >> >>                   continue;
> >> >>               }
> >> >>
> >> >>               GICH[GICH_LR + i] = 0;
> >> >>               clear_bit(i, &this_cpu(lr_mask));
> >> >>
> >> >>               if ( !list_empty(&v->arch.vgic.lr_pending) ) {
> >> >>                   n = list_entry(v->arch.vgic.lr_pending.next, typeof(*n), lr_queue);
> >> >>                   gic_set_lr(i, n->irq, GICH_LR_PENDING, n->priority);
> >> >>                   list_del_init(&n->lr_queue);
> >> >>                   set_bit(i, &this_cpu(lr_mask));
> >> >>               } else {
> >> >>                   gic_inject_irq_stop();
> >> >>               }
> >> >>               spin_unlock_irq(&gic.lock);
> >> >>
> >> >>               spin_lock_irq(&v->arch.vgic.lock);
> >> >>               list_del_init(&p->inflight);
> >> >>               spin_unlock_irq(&v->arch.vgic.lock);
> >> >>
> >> >>               if ( eoi ) {
> >> >>                   /* this is not racy because we can't receive another irq of the
> >> >>                    * same type until we EOI it.  */
> >> >>                   if ( cpu == smp_processor_id() )
> >> >>                       gic_irq_eoi((void*)(uintptr_t)pirq);
> >> >>                   else
> >> >>                       on_selected_cpus(cpumask_of(cpu),
> >> >>                                        gic_irq_eoi, (void*)(uintptr_t)pirq, 0);
> >> >>               }
> >> >>
> >> >>               i++;
> >> >>           }
> >> >>       }
> >> >>
> >> >>
> >> >>       Oleksandr Tyshchenko | Embedded Developer
> >> >>       GlobalLogic
> >> >>
> >> >>
> >> >>
> >>
> >>
> >>
> 
> 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-30 17:18                       ` Stefano Stabellini
@ 2014-01-30 19:54                         ` Oleksandr Tyshchenko
  2014-01-30 21:47                           ` Julien Grall
  0 siblings, 1 reply; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2014-01-30 19:54 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

I moved to 4.4.0-rc1, which already has the necessary irq patches.

And applied only one patch, "cpumask_of(0) in gic_route_irq_to_guest".
I see that the hypervisor hangs very often. Unfortunately, I currently
don't have a debugger to localize the failing code.
So I have to use prints, and it may take some time :(

On Thu, Jan 30, 2014 at 7:18 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Thu, 30 Jan 2014, Oleksandr Tyshchenko wrote:
>> 1.2 I have also checked a solution where the on_selected_cpus call was
>> moved out of the interrupt handler. Unfortunately, it doesn't work.
>>
>> I almost immediately see the following error:
>> (XEN) Assertion 'this_cpu(eoi_irq) == NULL' failed, line 981, file gic.c
>> (XEN) Xen BUG at gic.c:981
>> (XEN) CPU1: Unexpected Trap: Undefined Instruction
>> (XEN) ----[ Xen-4.4-unstable  arm32  debug=y  Not tainted ]----
>> (XEN) CPU:    1
>> (XEN) PC:     00241ee0 __bug+0x2c/0x44
>> (XEN) CPSR:   2000015a MODE:Hypervisor
>> (XEN)      R0: 0026770c R1: 00000000 R2: 3fd2fd00 R3: 00000fff
>> (XEN)      R4: 00263248 R5: 00264384 R6: 000003d5 R7: 4003d000
>> (XEN)      R8: 00000001 R9: 00000091 R10:00000000 R11:40037ebc R12:00000001
>> (XEN) HYP: SP: 40037eb4 LR: 00241ee0
>> (XEN)
>> (XEN)   VTCR_EL2: 80002558
>> (XEN)  VTTBR_EL2: 00010000deffc000
>> (XEN)
>> (XEN)  SCTLR_EL2: 30cd187f
>> (XEN)    HCR_EL2: 0000000000002835
>> (XEN)  TTBR0_EL2: 00000000d2014000
>> (XEN)
>> (XEN)    ESR_EL2: 00000000
>> (XEN)  HPFAR_EL2: 0000000000482110
>> (XEN)      HDFAR: fa211f00
>> (XEN)      HIFAR: 00000000
>> (XEN)
>> (XEN) Xen stack trace from sp=40037eb4:
>> (XEN)    00000000 40037efc 00247e1c 002e6610 002e6610 002e6608 002e6608 00000001
>> (XEN)    00000000 40015000 40017000 40005f60 40017014 40037f58 00000019 00000000
>> (XEN)    40005f60 40037f24 00249068 00000009 00000019 00404000 40037f58 00000000
>> (XEN)    00405000 00004680 002e7694 40037f4c 00248b80 00000000 c5b72000 00000091
>> (XEN)    00000000 c700d4e0 c008477c 000000f1 00000001 40037f54 0024f6c0 40037f58
>> (XEN)    00251a30 c700d4e0 00000001 c008477c 00000000 c5b72000 00000091 00000000
>> (XEN)    c700d4e0 c008477c 000000f1 00000001 00000001 c5b72000 ffffffff 0000a923
>> (XEN)    c0077ac4 60000193 00000000 b6eadaa0 c0578f40 c00138c0 c5b73f58 c036ab90
>> (XEN)    c0578f4c c00136a0 c0578f58 c0013920 00000000 00000000 00000000 00000000
>> (XEN)    00000000 00000000 00000000 80000010 60000193 a0000093 80000193 00000000
>> (XEN)    00000000 0c41e00c 450c2880
>> (XEN) Xen call trace:
>> (XEN)    [<00241ee0>] __bug+0x2c/0x44 (PC)
>> (XEN)    [<00241ee0>] __bug+0x2c/0x44 (LR)
>> (XEN)    [<00247e1c>] maintenance_interrupt+0x2e8/0x328
>> (XEN)    [<00249068>] do_IRQ+0x138/0x198
>> (XEN)    [<00248b80>] gic_interrupt+0x58/0xc0
>> (XEN)    [<0024f6c0>] do_trap_irq+0x10/0x14
>> (XEN)    [<00251a30>] return_from_trap+0/0x4
>> (XEN)
>
> Are you seeing more than one interrupt being EOI'ed with a single
> maintenance interrupt?
> I didn't think it could happen in practice.
> If so, we might have to turn eoi_irq into a list or an array.
>
>
>> 2. The "simultaneous cross-interrupts" issue doesn't occur if I use
>> the following solution:
>>
>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> index e6257a7..af96a31 100644
>> --- a/xen/arch/arm/gic.c
>> +++ b/xen/arch/arm/gic.c
>> @@ -776,8 +795,7 @@ int gic_route_irq_to_guest(struct domain *d, const struct dt_irq *irq,
>>
>>      level = dt_irq_is_level_triggered(irq);
>>
>> -    gic_set_irq_properties(irq->irq, level, cpumask_of(smp_processor_id()),
>> -                           0xa0);
>> +    gic_set_irq_properties(irq->irq, level, cpumask_of(0), 0xa0);
>>
>>      retval = __setup_irq(desc, irq->irq, action);
>>      if (retval) {
>> As a result, I don't see the deadlock in on_selected_cpus().
>
> As I stated before I think this is a good change to have in 4.4.
>
>
>> But, rarely, I see deadlocks in other parts related to interrupt handling.
>> As noted by Julien, I am using the old version of the interrupt patch series.
>> I completely agree.
>>
>> We are based on the following Xen commit:
>> 48249a1 libxl: Avoid realloc(,0) when libxl__xs_directory returns empty list
>>
>> We also have some patches that we cherry-picked when we urgently needed them:
>> 6bba1a3 xen/arm: Keep count of inflight interrupts
>> 33a8aa9 xen/arm: Only enable physical IRQs when the guest asks
>> b6a4e65 xen/arm: Rename gic_irq_{startup, shutdown} to gic_irq_{mask, unmask}
>> 5dbe455 xen/arm: Don't reinject the IRQ if it's already in LRs
>> 1438f03 xen/arm: Physical IRQ is not always equal to virtual IRQ
>>
>> I have to apply the following patches and check with them:
>> 88eb95e xen/arm: disable a physical IRQ when the guest disables the
>> corresponding IRQ
>> a660ee3 xen/arm: implement gic_irq_enable and gic_irq_disable
>> 1dc9556 xen/arm: do not add a second irq to the LRs if one is already present
>> d16d511 xen/arm: track the state of guest IRQs
>>
>> I'll report the results; I hope to do so today.
>
> I am looking forward to reading your report.
> Cheers,
>
> Stefano
>
>> A lot of thanks to all.
>>
>> On Thu, Jan 30, 2014 at 5:35 PM, Stefano Stabellini
>> <stefano.stabellini@eu.citrix.com> wrote:
>> > Given that we don't deactivate the interrupt (writing to GICC_DIR) until
>> > the guest EOIs it, I can't understand how you manage to get a second
>> > interrupt notification before the guest EOIs the first one.
>> >
>> > Do you set GICC_CTL_EOI in GICC_CTLR?
>> >
>> > On Thu, 30 Jan 2014, Oleksandr Tyshchenko wrote:
>> >> According to DT it is a level irq (DT_IRQ_TYPE_LEVEL_HIGH)
>> >>
>> >> On Thu, Jan 30, 2014 at 3:24 PM, Stefano Stabellini
>> >> <stefano.stabellini@eu.citrix.com> wrote:
>> >> > Is it a level or an edge irq?
>> >> >
>> >> > On Wed, 29 Jan 2014, Julien Grall wrote:
>> >> >> Hi,
>> >> >>
>> >> >> It's weird; a physical IRQ should not be injected twice...
>> >> >> Were you able to print the IRQ number?
>> >> >>
>> >> >> In any case, you are using the old version of the interrupt patch series.
>> >> >> Your new error may come from a race condition in this code.
>> >> >>
>> >> >> Can you try to use the newest version?
>> >> >>
>> >> >> On 29 Jan 2014 18:40, "Oleksandr Tyshchenko" <oleksandr.tyshchenko@globallogic.com> wrote:
>> >> >>       > Right, that's why changing it to cpumask_of(0) shouldn't make any
>> >> >>       > difference for xen-unstable (it should make things clearer, if nothing
>> >> >>       > else) but it should fix things for Oleksandr.
>> >> >>
>> >> >>       Unfortunately, it is not enough for stable work.
>> >> >>
>> >> >>       I tried to use cpumask_of(smp_processor_id()) instead of cpumask_of(0) in
>> >> >>       gic_route_irq_to_guest(). As a result, I don't see the situation
>> >> >>       which causes the deadlock in the on_selected_cpus function (expected).
>> >> >>       But the hypervisor sometimes hangs somewhere else (I have not yet
>> >> >>       identified where this happens), or I sometimes see traps like the one
>> >> >>       below ("WARN_ON(p->desc != NULL)" in maintenance_interrupt() leads to them):
>> >> >>
>> >> >>       (XEN) CPU1: Unexpected Trap: Undefined Instruction
>> >> >>       (XEN) ----[ Xen-4.4-unstable  arm32  debug=y  Not tainted ]----
>> >> >>       (XEN) CPU:    1
>> >> >>       (XEN) PC:     00242c1c __warn+0x20/0x28
>> >> >>       (XEN) CPSR:   200001da MODE:Hypervisor
>> >> >>       (XEN)      R0: 0026770c R1: 00000001 R2: 3fd2fd00 R3: 00000fff
>> >> >>       (XEN)      R4: 00406100 R5: 40020ee0 R6: 00000000 R7: 4bfdf000
>> >> >>       (XEN)      R8: 00000001 R9: 4bfd7ed0 R10:00000001 R11:4bfd7ebc R12:00000002
>> >> >>       (XEN) HYP: SP: 4bfd7eb4 LR: 00242c1c
>> >> >>       (XEN)
>> >> >>       (XEN)   VTCR_EL2: 80002558
>> >> >>       (XEN)  VTTBR_EL2: 00020000dec6a000
>> >> >>       (XEN)
>> >> >>       (XEN)  SCTLR_EL2: 30cd187f
>> >> >>       (XEN)    HCR_EL2: 00000000000028b5
>> >> >>       (XEN)  TTBR0_EL2: 00000000d2014000
>> >> >>       (XEN)
>> >> >>       (XEN)    ESR_EL2: 00000000
>> >> >>       (XEN)  HPFAR_EL2: 0000000000482110
>> >> >>       (XEN)      HDFAR: fa211190
>> >> >>       (XEN)      HIFAR: 00000000
>> >> >>       (XEN)
>> >> >>       (XEN) Xen stack trace from sp=4bfd7eb4:
>> >> >>       (XEN)    0026431c 4bfd7efc 00247a54 00000024 002e6608 002e6608 00000097 00000001
>> >> >>       (XEN)    00000000 4bfd7f54 40017000 40005f60 40017014 4bfd7f58 00000019 00000000
>> >> >>       (XEN)    40005f60 4bfd7f24 00248e60 00000009 00000019 00404000 4bfd7f58 00000000
>> >> >>       (XEN)    00405000 000045f0 002e7694 4bfd7f4c 00248978 c0079a90 00000097 00000097
>> >> >>       (XEN)    00000000 fa212000 ea80c900 00000001 c05b8a60 4bfd7f54 0024f4b8 4bfd7f58
>> >> >>       (XEN)    00251830 ea80c950 00000000 00000001 c0079a90 00000097 00000097 00000000
>> >> >>       (XEN)    fa212000 ea80c900 00000001 c05b8a60 00000000 e9879e3c ffffffff b6efbca3
>> >> >>       (XEN)    c03b29fc 60000193 9fffffe7 b6c0bbf0 c0607500 c03b3140 e9879eb8 c007680c
>> >> >>       (XEN)    c060750c c03b32c0 c0607518 c03b3360 00000000 00000000 00000000 00000000
>> >> >>       (XEN)    00000000 00000000 3ff6bebf a0000113 800b0193 800b0093 40000193 00000000
>> >> >>       (XEN)    ffeffbfe fedeefff fffd5ffe
>> >> >>       (XEN) Xen call trace:
>> >> >>       (XEN)    [<00242c1c>] __warn+0x20/0x28 (PC)
>> >> >>       (XEN)    [<00242c1c>] __warn+0x20/0x28 (LR)
>> >> >>       (XEN)    [<00247a54>] maintenance_interrupt+0xfc/0x2f4
>> >> >>       (XEN)    [<00248e60>] do_IRQ+0x138/0x198
>> >> >>       (XEN)    [<00248978>] gic_interrupt+0x58/0xc0
>> >> >>       (XEN)    [<0024f4b8>] do_trap_irq+0x10/0x14
>> >> >>       (XEN)    [<00251830>] return_from_trap+0/0x4
>> >> >>       (XEN)
>> >> >>
>> >> >>       I am also posting maintenance_interrupt() from my tree:
>> >> >>
>> >> >>       static void maintenance_interrupt(int irq, void *dev_id, struct
>> >> >>       cpu_user_regs *regs)
>> >> >>       {
>> >> >>           int i = 0, virq, pirq;
>> >> >>           uint32_t lr;
>> >> >>           struct vcpu *v = current;
>> >> >>           uint64_t eisr = GICH[GICH_EISR0] | (((uint64_t) GICH[GICH_EISR1]) << 32);
>> >> >>
>> >> >>           while ((i = find_next_bit((const long unsigned int *) &eisr,
>> >> >>                                     64, i)) < 64) {
>> >> >>               struct pending_irq *p, *n;
>> >> >>               int cpu, eoi;
>> >> >>
>> >> >>               cpu = -1;
>> >> >>               eoi = 0;
>> >> >>
>> >> >>               spin_lock_irq(&gic.lock);
>> >> >>               lr = GICH[GICH_LR + i];
>> >> >>               virq = lr & GICH_LR_VIRTUAL_MASK;
>> >> >>
>> >> >>               p = irq_to_pending(v, virq);
>> >> >>               if ( p->desc != NULL ) {
>> >> >>                   p->desc->status &= ~IRQ_INPROGRESS;
>> >> >>                   /* Assume only one pcpu needs to EOI the irq */
>> >> >>                   cpu = p->desc->arch.eoi_cpu;
>> >> >>                   eoi = 1;
>> >> >>                   pirq = p->desc->irq;
>> >> >>               }
>> >> >>               if ( !atomic_dec_and_test(&p->inflight_cnt) )
>> >> >>               {
>> >> >>                   /* Physical IRQ can't be reinjected */
>> >> >>                   WARN_ON(p->desc != NULL);
>> >> >>                   gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority);
>> >> >>                   spin_unlock_irq(&gic.lock);
>> >> >>                   i++;
>> >> >>                   continue;
>> >> >>               }
>> >> >>
>> >> >>               GICH[GICH_LR + i] = 0;
>> >> >>               clear_bit(i, &this_cpu(lr_mask));
>> >> >>
>> >> >>               if ( !list_empty(&v->arch.vgic.lr_pending) ) {
>> >> >>                   n = list_entry(v->arch.vgic.lr_pending.next, typeof(*n), lr_queue);
>> >> >>                   gic_set_lr(i, n->irq, GICH_LR_PENDING, n->priority);
>> >> >>                   list_del_init(&n->lr_queue);
>> >> >>                   set_bit(i, &this_cpu(lr_mask));
>> >> >>               } else {
>> >> >>                   gic_inject_irq_stop();
>> >> >>               }
>> >> >>               spin_unlock_irq(&gic.lock);
>> >> >>
>> >> >>               spin_lock_irq(&v->arch.vgic.lock);
>> >> >>               list_del_init(&p->inflight);
>> >> >>               spin_unlock_irq(&v->arch.vgic.lock);
>> >> >>
>> >> >>               if ( eoi ) {
>> >> >>                   /* this is not racy because we can't receive another irq of the
>> >> >>                    * same type until we EOI it.  */
>> >> >>                   if ( cpu == smp_processor_id() )
>> >> >>                       gic_irq_eoi((void*)(uintptr_t)pirq);
>> >> >>                   else
>> >> >>                       on_selected_cpus(cpumask_of(cpu),
>> >> >>                                        gic_irq_eoi, (void*)(uintptr_t)pirq, 0);
>> >> >>               }
>> >> >>
>> >> >>               i++;
>> >> >>           }
>> >> >>       }
>> >> >>
>> >> >>
>> >> >>       Oleksandr Tyshchenko | Embedded Developer
>> >> >>       GlobalLogic
>> >> >>
>> >> >>
>> >> >>
>> >>
>> >>
>> >>
>>
>>
>>
>>




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-30 19:54                         ` Oleksandr Tyshchenko
@ 2014-01-30 21:47                           ` Julien Grall
  2014-01-31  1:57                             ` Oleksandr Tyshchenko
  0 siblings, 1 reply; 48+ messages in thread
From: Julien Grall @ 2014-01-30 21:47 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, Stefano Stabellini; +Cc: Ian Campbell, xen-devel

Hello,

On 30/01/14 19:54, Oleksandr Tyshchenko wrote:
> I moved to 4.4.0-rc1, which already has the necessary irq patches.

Any specific reason to use 4.4.0-rc1 instead of 4.4.0-rc2? There is a
bunch of fixes (which should not be related to your current bug), such
as the TLB issue, foreign mappings, and a first attempt to fix the
guest cache issue.

> And applied only one patch, "cpumask_of(0) in gic_route_irq_to_guest".
> I see that the hypervisor hangs very often. Unfortunately, I currently
> don't have a debugger to localize the failing code.
> So I have to use prints, and it may take some time :(

What do you mean by hang? Do you have any output from Xen? What do you 
run? Dom0 and a DomU?

Sincerely yours,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-30 21:47                           ` Julien Grall
@ 2014-01-31  1:57                             ` Oleksandr Tyshchenko
  0 siblings, 0 replies; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2014-01-31  1:57 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, Ian Campbell, Stefano Stabellini

There is no specific reason :) But, as I found out, we already have our
local tree based on 4.4.0-rc1.
All the work of porting our local patches on top of 4.4.0-rc1 (resolving
conflicts, etc.) was done by my colleagues.
I saw that the range of commits you pointed to is present there.
And I just moved.

I mean that the hypervisor gets stuck somewhere in interrupt handling
(it gets into an infinite loop trying to acquire a lock, or waiting for
an event). As a result nothing works. Of course we don't have any
output from it (the console is not working).
For example, this is what happened in on_selected_cpus().

I run a domU. We have an operating system with a UI in domU. After
moving to 4.4.0-rc1, the hypervisor began to hang very often. I have
not yet identified where this is happening. These hangs occur when I
use the touchscreen (while domU is running). It somehow depends on the
touchscreen irq; I would even say on the "touchscreen interrupt rate".
I tried changing the interrupt priority and other things, but I don't
have any positive results.
First I need to localize where the deadlock happens.
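
To narrow it down without a debugger, one thing I can try is a spin
with a bounded iteration count that prints the suspected lock site
instead of hanging silently. A standalone sketch of the idea using C11
atomics; this is not the actual Xen spinlock code, and the threshold is
arbitrary:

#include <stdatomic.h>
#include <stdio.h>

typedef atomic_flag debug_lock_t;       /* clear == unlocked */

static void debug_lock(debug_lock_t *l, const char *where)
{
    unsigned long spins = 0;

    while ( atomic_flag_test_and_set_explicit(l, memory_order_acquire) )
    {
        if ( ++spins == 100000000UL )   /* arbitrary "too long" threshold */
        {
            printf("possible deadlock spinning at %s\n", where);
            spins = 0;                  /* keep spinning, keep nagging */
        }
    }
}

static void debug_unlock(debug_lock_t *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

int main(void)
{
    debug_lock_t l = ATOMIC_FLAG_INIT;

    debug_lock(&l, "example critical section");
    /* the guarded critical section would go here */
    debug_unlock(&l);
    return 0;
}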

On Thu, Jan 30, 2014 at 11:47 PM, Julien Grall <julien.grall@linaro.org> wrote:
> Hello,
>
>
> On 30/01/14 19:54, Oleksandr Tyshchenko wrote:
>>
>> I moved to 4.4.0-rc1, which already has the necessary irq patches.
>
>
> Any specific reason to use 4.4.0-rc1 instead of 4.4.0-rc2? There is a bunch
> of fixes (which should not be related to your current bug), such as the TLB
> issue, foreign mappings, and a first attempt to fix the guest cache issue.
>
>
>> And applied only one patch, "cpumask_of(0) in gic_route_irq_to_guest".
>> I see that the hypervisor hangs very often. Unfortunately, I currently
>> don't have a debugger to localize the failing code.
>> So I have to use prints, and it may take some time :(
>
>
> What do you mean by hang? Do you have any output from Xen? What do you run?
> Dom0 and a DomU?
>
> Sincerely yours,
>
> --
> Julien Grall




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-29 11:46         ` Stefano Stabellini
  2014-01-29 13:15           ` Julien Grall
@ 2014-02-04 15:32           ` Julien Grall
  1 sibling, 0 replies; 48+ messages in thread
From: Julien Grall @ 2014-02-04 15:32 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Oleksandr Tyshchenko, Ian Campbell, xen-devel

Hi Stefano,

On 01/29/2014 11:46 AM, Stefano Stabellini wrote:
> Thinking twice about it, it might be the only acceptable change for 4.4.

On further thought, if Ian's patch to prioritize the IPI
(http://www.gossamer-threads.com/lists/xen/devel/315342?do=post_view_threaded)
is not pushed for Xen 4.4, this patch might be useful for Oleksandr.

Can you send it with a commit message and a Signed-off-by?

> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index e6257a7..af96a31 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -776,8 +795,7 @@ int gic_route_irq_to_guest(struct domain *d, const struct dt_irq *irq,
>  
>      level = dt_irq_is_level_triggered(irq);
>  
> -    gic_set_irq_properties(irq->irq, level, cpumask_of(smp_processor_id()),
> -                           0xa0);
> +    gic_set_irq_properties(irq->irq, level, cpumask_of(0), 0xa0);

I would add a TODO before the function and perhaps explain why...

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH] xen/arm: route irqs to cpu0
  2014-01-29 11:42       ` Stefano Stabellini
  2014-01-29 11:46         ` Stefano Stabellini
@ 2014-02-04 16:20         ` Stefano Stabellini
  2014-02-04 16:32           ` Julien Grall
  2014-02-19 13:43           ` Julien Grall
  1 sibling, 2 replies; 48+ messages in thread
From: Stefano Stabellini @ 2014-02-04 16:20 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Oleksandr Tyshchenko, George Dunlap, Julien Grall, Ian Campbell,
	xen-devel

gic_route_irq_to_guest routes all IRQs to
cpumask_of(smp_processor_id()), but actually it is always called on cpu0.
To avoid confusion and possible issues in case someone modifies the code
and reassigns a particular irq to a cpu other than cpu0, hardcode
cpumask_of(0).

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index e6257a7..8854800 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -776,8 +776,8 @@ int gic_route_irq_to_guest(struct domain *d, const struct dt_irq *irq,
 
     level = dt_irq_is_level_triggered(irq);
 
-    gic_set_irq_properties(irq->irq, level, cpumask_of(smp_processor_id()),
-                           0xa0);
+    /* TODO: handle routing irqs to cpus != cpu0 */
+    gic_set_irq_properties(irq->irq, level, cpumask_of(0), 0xa0);
 
     retval = __setup_irq(desc, irq->irq, action);
     if (retval) {

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH] xen/arm: route irqs to cpu0
  2014-02-04 16:20         ` [PATCH] xen/arm: route irqs to cpu0 Stefano Stabellini
@ 2014-02-04 16:32           ` Julien Grall
  2014-02-04 16:56             ` Oleksandr Tyshchenko
  2014-02-19 13:43           ` Julien Grall
  1 sibling, 1 reply; 48+ messages in thread
From: Julien Grall @ 2014-02-04 16:32 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Oleksandr Tyshchenko, George Dunlap, Ian Campbell, xen-devel

On 02/04/2014 04:20 PM, Stefano Stabellini wrote:
> gic_route_irq_to_guest routes all IRQs to
> cpumask_of(smp_processor_id()), but actually it is always called on cpu0.
> To avoid confusion and possible issues in case someone modifies the code
> and reassigns a particular irq to a cpu other than cpu0, hardcode
> cpumask_of(0).
> 
> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>

> 
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index e6257a7..8854800 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -776,8 +776,8 @@ int gic_route_irq_to_guest(struct domain *d, const struct dt_irq *irq,
>  
>      level = dt_irq_is_level_triggered(irq);
>  
> -    gic_set_irq_properties(irq->irq, level, cpumask_of(smp_processor_id()),
> -                           0xa0);
> +    /* TODO: handle routing irqs to cpus != cpu0 */
> +    gic_set_irq_properties(irq->irq, level, cpumask_of(0), 0xa0);
>  
>      retval = __setup_irq(desc, irq->irq, action);
>      if (retval) {
> 


-- 
Julien Grall

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] xen/arm: route irqs to cpu0
  2014-02-04 16:32           ` Julien Grall
@ 2014-02-04 16:56             ` Oleksandr Tyshchenko
  0 siblings, 0 replies; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2014-02-04 16:56 UTC (permalink / raw)
  To: Julien Grall; +Cc: George Dunlap, xen-devel, Ian Campbell, Stefano Stabellini

Thanks. I need this patch.

On Tue, Feb 4, 2014 at 6:32 PM, Julien Grall <julien.grall@linaro.org> wrote:
> On 02/04/2014 04:20 PM, Stefano Stabellini wrote:
>> gic_route_irq_to_guest routes all IRQs to
>> cpumask_of(smp_processor_id()), but actually it is always called on cpu0.
>> To avoid confusion and possible issues in case someone modifies the code
>> and reassigns a particular irq to a cpu other than cpu0, hardcode
>> cpumask_of(0).
>>
>> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Acked-by: Julien Grall <julien.grall@linaro.org>
>
>>
>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> index e6257a7..8854800 100644
>> --- a/xen/arch/arm/gic.c
>> +++ b/xen/arch/arm/gic.c
>> @@ -776,8 +776,8 @@ int gic_route_irq_to_guest(struct domain *d, const struct dt_irq *irq,
>>
>>      level = dt_irq_is_level_triggered(irq);
>>
>> -    gic_set_irq_properties(irq->irq, level, cpumask_of(smp_processor_id()),
>> -                           0xa0);
>> +    /* TODO: handle routing irqs to cpus != cpu0 */
>> +    gic_set_irq_properties(irq->irq, level, cpumask_of(0), 0xa0);
>>
>>      retval = __setup_irq(desc, irq->irq, action);
>>      if (retval) {
>>
>
>
> --
> Julien Grall




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] xen/arm: route irqs to cpu0
  2014-02-04 16:20         ` [PATCH] xen/arm: route irqs to cpu0 Stefano Stabellini
  2014-02-04 16:32           ` Julien Grall
@ 2014-02-19 13:43           ` Julien Grall
  2014-02-19 13:53             ` Ian Campbell
  1 sibling, 1 reply; 48+ messages in thread
From: Julien Grall @ 2014-02-19 13:43 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Oleksandr Tyshchenko, George Dunlap, Ian Campbell, xen-devel

Hi all,

Ping? It would be nice to have this patch for Xen 4.4, as the IPI
priority patch won't be pushed before the release.

The patch is a minor change and won't impact normal use. When dom0 is
built, Xen always does it on CPU 0.

Regards,

On 02/04/2014 04:20 PM, Stefano Stabellini wrote:
> gic_route_irq_to_guest routes all IRQs to
> cpumask_of(smp_processor_id()), but actually it is always called on cpu0.
> To avoid confusion and possible issues in case someone modifies the code
> and reassigns a particular irq to a cpu other than cpu0, hardcode
> cpumask_of(0).
> 
> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> 
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index e6257a7..8854800 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -776,8 +776,8 @@ int gic_route_irq_to_guest(struct domain *d, const struct dt_irq *irq,
>  
>      level = dt_irq_is_level_triggered(irq);
>  
> -    gic_set_irq_properties(irq->irq, level, cpumask_of(smp_processor_id()),
> -                           0xa0);
> +    /* TODO: handle routing irqs to cpus != cpu0 */
> +    gic_set_irq_properties(irq->irq, level, cpumask_of(0), 0xa0);
>  
>      retval = __setup_irq(desc, irq->irq, action);
>      if (retval) {
> 




-- 
Julien Grall

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] xen/arm: route irqs to cpu0
  2014-02-19 13:43           ` Julien Grall
@ 2014-02-19 13:53             ` Ian Campbell
  2014-02-19 14:15               ` George Dunlap
  0 siblings, 1 reply; 48+ messages in thread
From: Ian Campbell @ 2014-02-19 13:53 UTC (permalink / raw)
  To: Julien Grall
  Cc: Oleksandr Tyshchenko, xen-devel, George Dunlap, Stefano Stabellini

On Wed, 2014-02-19 at 13:43 +0000, Julien Grall wrote:
> Hi all,
> 
> Ping?

No one made a case for a release exception so I put it in my 4.5 pile.

>  It would be nice to have this patch for Xen 4.4, as the IPI priority
> patch won't be pushed before the release.
> 
> The patch is a minor change and won't impact normal use. When dom0 is
> built, Xen always does it on CPU 0.

Right, so whoever is doing otherwise already has a big pile of patches I
presume?

It's rather late to be making such changes IMHO, but I'll defer to
George.

> 
> Regards,
> 
> On 02/04/2014 04:20 PM, Stefano Stabellini wrote:
> > gic_route_irq_to_guest routes all IRQs to
> > cpumask_of(smp_processor_id()), but actually it is always called on cpu0.
> > To avoid confusion and possible issues in case someone modifies the code
> > and reassigns a particular irq to a cpu other than cpu0, hardcode
> > cpumask_of(0).
> > 
> > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > 
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index e6257a7..8854800 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -776,8 +776,8 @@ int gic_route_irq_to_guest(struct domain *d, const struct dt_irq *irq,
> >  
> >      level = dt_irq_is_level_triggered(irq);
> >  
> > -    gic_set_irq_properties(irq->irq, level, cpumask_of(smp_processor_id()),
> > -                           0xa0);
> > +    /* TODO: handle routing irqs to cpus != cpu0 */
> > +    gic_set_irq_properties(irq->irq, level, cpumask_of(0), 0xa0);
> >  
> >      retval = __setup_irq(desc, irq->irq, action);
> >      if (retval) {
> > 
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] xen/arm: route irqs to cpu0
  2014-02-19 13:53             ` Ian Campbell
@ 2014-02-19 14:15               ` George Dunlap
  2014-02-20 14:52                 ` Stefano Stabellini
  0 siblings, 1 reply; 48+ messages in thread
From: George Dunlap @ 2014-02-19 14:15 UTC (permalink / raw)
  To: Ian Campbell, Julien Grall
  Cc: Oleksandr Tyshchenko, xen-devel, George Dunlap, Stefano Stabellini

On 02/19/2014 01:53 PM, Ian Campbell wrote:
> On Wed, 2014-02-19 at 13:43 +0000, Julien Grall wrote:
>> Hi all,
>>
>> Ping?
> No one made a case for a release exception so I put it in my 4.5 pile.
>
>>   It would be nice to have this patch for Xen 4.4, as the IPI priority
>> patch won't be pushed before the release.
>>
>> The patch is a minor change and won't impact normal use. When dom0 is
>> built, Xen always does it on CPU 0.
> Right, so whoever is doing otherwise already has a big pile of patches I
> presume?
>
> It's rather late to be making such changes IMHO, but I'll defer to
> George.

I can't figure out from the description what's the advantage of having 
it in 4.4.

  -George

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] xen/arm: route irqs to cpu0
  2014-02-19 14:15               ` George Dunlap
@ 2014-02-20 14:52                 ` Stefano Stabellini
  2014-02-21 11:12                   ` George Dunlap
  0 siblings, 1 reply; 48+ messages in thread
From: Stefano Stabellini @ 2014-02-20 14:52 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Stefano Stabellini, Julien Grall, George Dunlap,
	xen-devel, Oleksandr Tyshchenko

On Wed, 19 Feb 2014, George Dunlap wrote:
> On 02/19/2014 01:53 PM, Ian Campbell wrote:
> > On Wed, 2014-02-19 at 13:43 +0000, Julien Grall wrote:
> > > Hi all,
> > > 
> > > Ping?
> > No one made a case for a release exception so I put it in my 4.5 pile.
> > 
> > >   It would be nice to have this patch for Xen 4.4, as the IPI priority
> > > patch won't be pushed before the release.
> > > 
> > > The patch is a minor change and won't impact normal use. When dom0 is
> > > built, Xen always does it on CPU 0.
> > Right, so whoever is doing otherwise already has a big pile of patches I
> > presume?
> > 
> > It's rather late to be making such changes IMHO, but I'll defer to
> > George.
> 
> I can't figure out from the description what's the advantage of having it in
> 4.4.

People that use the default configuration won't see any differences but
people that manually modify Xen to start a second domain and assign a
device to it would.
To give you a concrete example, it fixes a deadlock reported by
Oleksandr Tyshchenko:

http://marc.info/?l=xen-devel&m=139099606402232

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] xen/arm: route irqs to cpu0
  2014-02-20 14:52                 ` Stefano Stabellini
@ 2014-02-21 11:12                   ` George Dunlap
  2014-02-21 11:59                     ` Julien Grall
  0 siblings, 1 reply; 48+ messages in thread
From: George Dunlap @ 2014-02-21 11:12 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Oleksandr Tyshchenko, Julien Grall, xen-devel, Ian Campbell,
	George Dunlap

On Thu, Feb 20, 2014 at 2:52 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Wed, 19 Feb 2014, George Dunlap wrote:
>> On 02/19/2014 01:53 PM, Ian Campbell wrote:
>> > On Wed, 2014-02-19 at 13:43 +0000, Julien Grall wrote:
>> > > Hi all,
>> > >
>> > > Ping?
>> > No one made a case for a release exception so I put it in my 4.5 pile.
>> >
>> > >   It would be nice to have this patch for Xen 4.4, as the IPI priority
>> > > patch won't be pushed before the release.
>> > >
>> > > The patch is a minor change and won't impact normal use. When dom0 is
>> > > built, Xen always does it on CPU 0.
>> > Right, so whoever is doing otherwise already has a big pile of patches I
>> > presume?
>> >
>> > It's rather late to be making such changes IMHO, but I'll defer to
>> > George.
>>
>> I can't figure out from the description what's the advantage of having it in
>> 4.4.
>
> People that use the default configuration won't see any differences but
> people that manually modify Xen to start a second domain and assign a
> device to it would.
> To give you a concrete example, it fixes a deadlock reported by
> Oleksandr Tyshchenko:
>
> http://marc.info/?l=xen-devel&m=139099606402232

Right -- I think if I had been cc'd when the patch was submitted, I
would have said yes for sure; but at this point I think we just want
to get 4.4.0 out without any more delays if possible.

 -George

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] xen/arm: route irqs to cpu0
  2014-02-21 11:12                   ` George Dunlap
@ 2014-02-21 11:59                     ` Julien Grall
  2014-02-21 12:07                       ` George Dunlap
  0 siblings, 1 reply; 48+ messages in thread
From: Julien Grall @ 2014-02-21 11:59 UTC (permalink / raw)
  To: George Dunlap
  Cc: Oleksandr Tyshchenko, George Dunlap, xen-devel, Ian Campbell,
	Stefano Stabellini

On 02/21/2014 11:12 AM, George Dunlap wrote:
> On Thu, Feb 20, 2014 at 2:52 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
>> On Wed, 19 Feb 2014, George Dunlap wrote:
>>> On 02/19/2014 01:53 PM, Ian Campbell wrote:
>>>> On Wed, 2014-02-19 at 13:43 +0000, Julien Grall wrote:
>>>>> Hi all,
>>>>>
>>>>> Ping?
>>>> No one made a case for a release exception so I put it in my 4.5 pile.
>>>>
>>>>>   It would be nice to have this patch for Xen 4.4, as the IPI priority
>>>>> patch won't be pushed before the release.
>>>>>
>>>>> The patch is a minor change and won't impact normal use. When dom0 is
>>>>> built, Xen always does it on CPU 0.
>>>> Right, so whoever is doing otherwise already has a big pile of patches I
>>>> presume?
>>>>
>>>> It's rather late to be making such changes IMHO, but I'll defer to
>>>> George.
>>>
>>> I can't figure out from the description what's the advantage of having it in
>>> 4.4.
>>
>> People that use the default configuration won't see any differences but
>> people that manually modify Xen to start a second domain and assign a
>> device to it would.
>> To give you a concrete example, it fixes a deadlock reported by
>> Oleksandr Tyshchenko:
>>
>> http://marc.info/?l=xen-devel&m=139099606402232
> 
> Right -- I think if I had been cc'd when the patch was submitted, I
> would have said yes for sure; but at this point I think we just want
> to get 4.4.0 out without any more delays if possible.

You were already CCed from the beginning :).

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] xen/arm: route irqs to cpu0
  2014-02-21 11:59                     ` Julien Grall
@ 2014-02-21 12:07                       ` George Dunlap
  0 siblings, 0 replies; 48+ messages in thread
From: George Dunlap @ 2014-02-21 12:07 UTC (permalink / raw)
  To: Julien Grall
  Cc: Oleksandr Tyshchenko, George Dunlap, xen-devel, Ian Campbell,
	Stefano Stabellini

On 02/21/2014 11:59 AM, Julien Grall wrote:
> On 02/21/2014 11:12 AM, George Dunlap wrote:
>> On Thu, Feb 20, 2014 at 2:52 PM, Stefano Stabellini
>> <stefano.stabellini@eu.citrix.com> wrote:
>>> On Wed, 19 Feb 2014, George Dunlap wrote:
>>>> On 02/19/2014 01:53 PM, Ian Campbell wrote:
>>>>> On Wed, 2014-02-19 at 13:43 +0000, Julien Grall wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> Ping?
>>>>> No one made a case for a release exception so I put it in my 4.5 pile.
>>>>>
>>>>>>    It would be nice to have this patch for Xen 4.4, as the IPI priority
>>>>>> patch won't be pushed before the release.
>>>>>>
>>>>>> The patch is a minor change and won't impact normal use. When dom0 is
>>>>>> built, Xen always does it on CPU 0.
>>>>> Right, so whoever is doing otherwise already has a big pile of patches I
>>>>> presume?
>>>>>
>>>>> It's rather late to be making such changes IMHO, but I'll defer to
>>>>> George.
>>>> I can't figure out from the description what's the advantage of having it in
>>>> 4.4.
>>> People that use the default configuration won't see any differences but
>>> people that manually modify Xen to start a second domain and assign a
>>> device to it would.
>>> To give you a concrete example, it fixes a deadlock reported by
>>> Oleksandr Tyshchenko:
>>>
>>> http://marc.info/?l=xen-devel&m=139099606402232
>> Right -- I think if I had been cc'd when the patch was submitted, I
>> would have said yes for sure; but at this point I think we just want
>> to get 4.4.0 out without any more delays if possible.
> You were already CCed from the beginning :).

Right. :-)  But as Ian said, no one made a case for a release exception, 
and I got a bit tired of having to ask every time. :-)

  -George

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2014-02-21 12:07 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-27 17:33 [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix Oleksandr Tyshchenko
2014-01-27 17:33 ` [PATCH v1 1/2] xen/arm: Add return value to smp_call_function_interrupt function Oleksandr Tyshchenko
2014-01-27 18:28   ` Stefano Stabellini
2014-01-27 17:33 ` [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function Oleksandr Tyshchenko
2014-01-27 19:00   ` Stefano Stabellini
2014-01-28 10:03     ` Ian Campbell
2014-01-28 14:00       ` Stefano Stabellini
2014-01-28 15:05         ` Ian Campbell
2014-01-28 16:02           ` Stefano Stabellini
2014-01-28 16:12             ` Ian Campbell
2014-01-28 16:23               ` Stefano Stabellini
2014-01-28 13:58   ` Stefano Stabellini
2014-01-30 11:58     ` Oleksandr Tyshchenko
2014-01-27 17:40 ` [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix Ian Campbell
2014-01-27 17:51 ` Julien Grall
2014-01-28 19:25   ` Oleksandr Tyshchenko
2014-01-29 10:56     ` Oleksandr Tyshchenko
2014-01-29 11:42       ` Stefano Stabellini
2014-01-29 11:46         ` Stefano Stabellini
2014-01-29 13:15           ` Julien Grall
2014-02-04 15:32           ` Julien Grall
2014-02-04 16:20         ` [PATCH] xen/arm: route irqs to cpu0 Stefano Stabellini
2014-02-04 16:32           ` Julien Grall
2014-02-04 16:56             ` Oleksandr Tyshchenko
2014-02-19 13:43           ` Julien Grall
2014-02-19 13:53             ` Ian Campbell
2014-02-19 14:15               ` George Dunlap
2014-02-20 14:52                 ` Stefano Stabellini
2014-02-21 11:12                   ` George Dunlap
2014-02-21 11:59                     ` Julien Grall
2014-02-21 12:07                       ` George Dunlap
2014-01-29 13:07       ` [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix Julien Grall
2014-01-29 13:22         ` Stefano Stabellini
2014-01-29 18:40           ` Oleksandr Tyshchenko
2014-01-29 18:43             ` Oleksandr Tyshchenko
2014-01-29 18:49             ` Julien Grall
2014-01-29 19:54               ` Oleksandr Tyshchenko
2014-01-30  0:42                 ` Julien Grall
2014-01-30 13:24               ` Stefano Stabellini
2014-01-30 15:06                 ` Oleksandr Tyshchenko
2014-01-30 15:35                   ` Stefano Stabellini
2014-01-30 16:10                     ` Oleksandr Tyshchenko
2014-01-30 17:18                       ` Stefano Stabellini
2014-01-30 19:54                         ` Oleksandr Tyshchenko
2014-01-30 21:47                           ` Julien Grall
2014-01-31  1:57                             ` Oleksandr Tyshchenko
2014-01-29 13:12     ` Julien Grall
2014-01-29 18:55       ` Oleksandr Tyshchenko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.