On Fri, 2016-03-11 at 06:54 +0000, Xu, Quan wrote:
> On March 11, 2016 11:25am, <mengxu@cis.upenn.edu> wrote:
> > 
> > On Wed, Mar 9, 2016 at 8:17 AM, Quan Xu <quan.xu@intel.com> wrote:
> > > 
> > > pcidevs_lock should be held with interrupt enabled. However there
> > > remains an exception in AMD IOMMU code, where the lock is
> > > acquired
> > > with interrupt disabled. This inconsistency might lead to
> > > deadlock.
> > Why will this inconsistency lead to deadlock?
> > I understand the difference between spin_lock_irqsave(), which
> > disable interrupt,
> > and spin_lock(), which allows interrupt, however, I didn't get why
> > misuse the
> > spin_lock_irqsave() at the place of spin_lock() could potentially
> > lead to a
> > deadlock?
> 
>  1).As Jan mentioned, The implication from disabling interrupts while
> acquiring a lock is that the lock is also being acquired by some
> interrupt handler.
>   If you mix acquire types, the one not disabling interrupts is prone
> to be interrupted, and the interrupt trying to get hold of the lock
> the same CPU already owns.
> 
The key issue is whether or not a lock can be acquired from IRQ context
(i.e., in an interrupt handler, or a function called by that, as a
result of an interrupt occurring).

For lock that can, IRQ disabling variants must be used *everywhere* the
lock is taken (so, e.g., not only when it is actually taken from IRQ
context, just *always*!).

If that rule is not honored, we are ok if the lock is taken on CPUA,
and the interrupt handled on CPUB:

  CPUA              CPUB
  .                 .
  .                 .
  spin_lock(l)      .
  .                 .
  .                 . <-- Interrupt!
  .                        .
  .                       spin_lock(l); //spins on l
  .                       x             //spins on l
  .                       x             //spins on l
  spin_unlock(l)          .             //takes l
  .                       .
  .                       spin_unlock(l);
  .                 . <-- .
  .                 .

but the following can happen, if the interrupt is delivered to the CPU
that has already taken the lock:

     CPU
     .
     .
[1]  spin_lock(l);       //takes l
     .
     . <-- Interrupt!
           .
           spin_lock(l) // CPU deadlocks

If, in the latter case, spin_lock_irq(l) were used at [1], things would
have been fine, as the interrupt wouldn't have occurred until l weren't
released:

  CPU
  .
  .
  spin_lock_irq(l)                        //takes l
  .
  . <---------------- Interrupt!
  .                   -                   //IRQs are disabled
  .                   -                   //IRQs are disabled
  .                   -                   //IRQs are disabled
  spin_unlock_irq(l)  .                   //IRQs re-hanbled
                      spin_lock_irq(l);   //takes l
                      .
                      .
                      spin_unlock_irq(l);
 . <----------------- .
 .

Note that whether spin_lock_irq() or spin_lock_irqsave() should be
used, is completely independent from this, and it must be decided
according to whether IRQs are disabled already or not when taking the
lock. And (quite obviously, but since wre're here) it is fine to mix
spin_lock_irq() and spin_lock_irqsave(), as they both disable
interrupts, which is what matters.

So, now, if we know for sure that a lock is _never_ever_ever_ taken
from interrupt context, can we mix spin_lock() and spin_lock_irq() on
it (for whatever reason)? Well, as far as the above reasoning is
concerned, yes.

In fact, the deadlock arises because IRQs interrupt asynchronously what
the CPU is doing, and that can happen when the CPU has taken the lock
already. But if the 'asynchronous' part goes away, we really don't care
whether a lock is take at time t1 with IRQ enabled, and at time t2 with
IRQ disabled, don't you think?

Well, here it is where what the comment inside check_lock() describes
comes into play. As quoted by Qaun already:

>  2). Comment inside check_lock(), 
> we partition locks into IRQ-safe (always held with IRQs disabled) and
> IRQ-unsafe (always held with IRQs enabled) types. The convention for
> every lock must be consistently observed else we can deadlock in
> IRQ-context rendezvous functions (__a rendezvous which gets every CPU
> into IRQ context before any CPU is released from the rendezvous__).
> If we can mix IRQ-disabled and IRQ-enabled callers, the following can
> happen:
>  * Lock is held by CPU A, with IRQs enabled
>  * CPU B is spinning on same lock, with IRQs disabled
>  * Rendezvous starts -- CPU A takes interrupt and enters rendezbous
> spin
>  * DEADLOCK -- CPU B will never enter rendezvous, CPU A will never
> exit
>                the rendezvous, and will hence never release the lock.
> 
> To guard against this subtle bug we latch the IRQ safety of every
> spinlock in the system, on first use.
> 
This is a very good safety measure, but it can be quite a restriction
in some (luckily limited) cases. And that's why it is possible --
although absolutely discouraged-- to relax it. See, for an example of
this, the comment in start_secondary(), in xen/arch/x86/smpboot.c:

    ...
    /*
     * Just as during early bootstrap, it is convenient here to disable
     * spinlock checking while we have IRQs disabled. This allows us to
     * acquire IRQ-unsafe locks when it would otherwise be disallowed.
     * 
     * It is safe because the race we are usually trying to avoid involves
     * a group of CPUs rendezvousing in an IPI handler, where one cannot
     * join because it is spinning with IRQs disabled waiting to acquire a
     * lock held by another in the rendezvous group (the lock must be an
     * IRQ-unsafe lock since the CPU took the IPI after acquiring it, and
     * hence had IRQs enabled). This is a deadlock scenario.
     * 
     * However, no CPU can be involved in rendezvous until it is online,
     * hence no such group can be waiting for this CPU until it is
     * visible in cpu_online_map. Hence such a deadlock is not possible.
     */
    spin_debug_disable();
    ...

Just FTR, at least as far as I can remember, Linux does not enforce
anything like that.

Hope this clarifies things.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)