On Fri, 2016-03-11 at 09:49 -0500, Meng Xu wrote:
> > Yes.
> > Consistency may be helpful to avoid some easy-to-avoid lock errors.
> > Moreover, without my fix, I think it would not lead to deadlock, as
> > the pcidevs_lock is not being taken in IRQ context. Right?
>
> I think without your fix, the deadlock may still happen due to the
> rendezvous condition.
>
>        |            CPU A             |         CPU B         |      CPU C
> Step 1 | spin_lock                    |                       |
> Step 2 |                              | spin_lock_irq         |
> Step 3 |                              | wait for A to unlock  |
> Step 4 |                              |                       | send rendezvous IPI to A and B
> Step 5 | receive IPI                  | wait for A to unlock  |
> Step 6 | wait for B to handle the IPI | wait for A to unlock  |
> Step 7 | spin_unlock                  |                       |
>
> Deadlock occurs at Step 6, IMO.
>
> Unless we can prove that rendezvous won't happen while
> spin_lock_irqsave is taken, we have the deadlock hazard.
>
Yes. But, in the case of Quan's patch (without it, I mean), have you
seen where in the code it is that we use spin_lock_irqsave()?

It's inside a function that is called during Xen boot, whose callchain
starts with iommu_setup(), from __start_xen(). Here's a (big, sorry)
code snippet of what is around iommu_setup():

    ...
    init_idle_domain();

    this_cpu(stubs.addr) = alloc_stub_page(smp_processor_id(),
                                           &this_cpu(stubs).mfn);
    BUG_ON(!this_cpu(stubs.addr));

    trap_init();
    rcu_init();
    early_time_init();

    arch_init_memory();

    alternative_instructions();

    local_irq_enable();

    pt_pci_init();
    vesa_mtrr_init();
    acpi_mmcfg_init();
    early_msi_init();

    iommu_setup();    /* setup iommu if available */

    smp_prepare_cpus(max_cpus);

    spin_debug_enable();

    /*
     * Initialise higher-level timer functions. We do this fairly late
     * (after interrupts got enabled) because the time bases and scale
     * factors need to be updated regularly.
     */
    init_xen_time();

    initialize_keytable();

    console_init_postirq();

    system_state = SYS_STATE_smp_boot;

    do_presmp_initcalls();

    for_each_present_cpu ( i )
    {
        /* Set up cpu_to_node[]. */
        srat_detect_node(i);
        /* Set up node_to_cpumask based on cpu_to_node[]. */
        numa_add_cpu(i);

        if ( (num_online_cpus() < max_cpus) && !cpu_online(i) )
        {
            int ret = cpu_up(i);
            if ( ret != 0 )
                printk("Failed to bring up CPU %u (error %d)\n", i, ret);
        }
    }
    printk("Brought up %ld CPUs\n", (long)num_online_cpus());
    ...

As you can see, it is only *after* iommu_setup() that we call functions
like smp_prepare_cpus(), do_presmp_initcalls(), and then the loop that
waits for all the present CPUs to come online.

What that means is that, at iommu_setup() time, there still is only one
CPU online, and there is not much chance that one single CPU deadlocks
in a rendezvous!

Honestly, the biggest issue that I think Quan's patch solves is that, if
we ever want/manage to move spin_debug_enable() up above iommu_setup(),
the BUG_ON() in check_lock() would trigger the first time that
pcidevs_lock is taken with interrupts enabled. Until then, the code is
technically fine and, as a matter of fact, I think that removing the
locking from that particular instance would be an equally effective fix!
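Just to illustrate the point about check_lock(), here is a minimal,
self-contained sketch of the consistency rule it enforces once
spin_debug_enable() has run: a given lock must always be acquired with
interrupts in the same state, and mixing IRQs-on and IRQs-off
acquisitions is exactly what the BUG_ON() flags. This is a toy model,
not the actual Xen code; toy_lock, toy_check_lock() and debug_enabled
are made-up names standing in for the real lock debug structure,
check_lock() and spin_debug_enable().

    #include <stdbool.h>
    #include <stdio.h>

    struct toy_lock {
        int irq_safe;          /* -1 = not seen yet, 0 = IRQs on, 1 = IRQs off */
    };

    static bool debug_enabled; /* stand-in for spin_debug_enable() having run */

    /*
     * Returns false when the acquisition context contradicts an earlier
     * one; this is the situation where Xen's check_lock() would BUG().
     */
    static bool toy_check_lock(struct toy_lock *l, bool irqs_disabled)
    {
        int irq_safe = irqs_disabled ? 1 : 0;

        if ( !debug_enabled )
            return true;            /* checks inactive, as before spin_debug_enable() */

        if ( l->irq_safe == -1 )
            l->irq_safe = irq_safe; /* remember the first context we see */

        return l->irq_safe == irq_safe;
    }

    int main(void)
    {
        struct toy_lock pcidevs = { .irq_safe = -1 };

        /* Boot-time pattern: lock taken with IRQs enabled, debugging still off. */
        printf("%d\n", toy_check_lock(&pcidevs, false));  /* 1: no checking yet */

        /* If spin_debug_enable() were moved up above iommu_setup()... */
        debug_enabled = true;
        printf("%d\n", toy_check_lock(&pcidevs, true));   /* 1: records IRQs-off */
        printf("%d\n", toy_check_lock(&pcidevs, false));  /* 0: mixed contexts */

        return 0;
    }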
All that being said, consistency is indeed important. For the sake of
it, and for other reasons too, it is worthwhile conforming to a locking
pattern that is consistent with the rules we set ourselves, unless
there are specific reasons not to, even if, strictly speaking, there
isn't any actual buggy behavior to be fixed here.

Regards,
Dario
--
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)