On Fri, 2016-03-11 at 09:49 -0500, Meng Xu wrote:
> > Yes.
> > Consistency may be helpful to avoid some easy-to-avoid lock errors.
> > Moreover, without my fix, I think it would not lead to deadlock, as
> > the pcidevs_lock is not being taken in IRQ context. Right?
>
> I think without your fix, the deadlock may still happen due to the
> rendezvous condition.
>
>        |            CPU A             |         CPU B         |      CPU C
> Step 1 | spin_lock                    |                       |
> Step 2 |                              | spin_lock_irq         |
> Step 3 |                              | wait for A to unlock  |
> Step 4 |                              |                       | send rendezvous IPI to A and B
> Step 5 | receive IPI                  | wait for A to unlock  |
> Step 6 | wait for B to handle the IPI | wait for A to unlock  |
> Step 7 | spin_unlock                  |                       |
>
> Deadlock occurs at Step 6, IMO.
>
> Unless we can prove that rendezvous won't happen while
> spin_lock_irqsave is taken, we have the deadlock hazard.
>
Yes. But, in the case of Quan's patch (without it, I mean), have you
seen where in the code it is that we use spin_lock_irqsave()?

It's inside a function that is called during Xen boot, whose callchain
starts with iommu_setup(), from __start_xen(). Here's a (big, sorry)
code snippet of what is around iommu_setup():

    ...
    init_idle_domain();

    this_cpu(stubs.addr) = alloc_stub_page(smp_processor_id(),
                                           &this_cpu(stubs).mfn);
    BUG_ON(!this_cpu(stubs.addr));

    trap_init();
    rcu_init();
    early_time_init();

    arch_init_memory();

    alternative_instructions();

    local_irq_enable();

    pt_pci_init();
    vesa_mtrr_init();
    acpi_mmcfg_init();
    early_msi_init();

    iommu_setup();    /* setup iommu if available */

    smp_prepare_cpus(max_cpus);

    spin_debug_enable();

    /*
     * Initialise higher-level timer functions. We do this fairly late
     * (after interrupts got enabled) because the time bases and scale
     * factors need to be updated regularly.
     */
    init_xen_time();

    initialize_keytable();

    console_init_postirq();

    system_state = SYS_STATE_smp_boot;

    do_presmp_initcalls();

    for_each_present_cpu ( i )
    {
        /* Set up cpu_to_node[]. */
        srat_detect_node(i);
        /* Set up node_to_cpumask based on cpu_to_node[]. */
        numa_add_cpu(i);

        if ( (num_online_cpus() < max_cpus) && !cpu_online(i) )
        {
            int ret = cpu_up(i);
            if ( ret != 0 )
                printk("Failed to bring up CPU %u (error %d)\n", i, ret);
        }
    }
    printk("Brought up %ld CPUs\n", (long)num_online_cpus());
    ...

As you can see, it is only *after* iommu_setup() that we call functions
like smp_prepare_cpus(), do_presmp_initcalls(), and then the loop that
waits for all the present CPUs to come online.

What that means is that, at iommu_setup() time, there still is only one
CPU online, and there is not much chance that one single CPU deadlocks
in a rendezvous!

Honestly, the biggest issue that I think Quan's patch solves is that, if
we ever want/manage to move spin_debug_enable() up above iommu_setup(),
the BUG_ON() in check_lock() would trigger the first time that
pcidevs_lock is taken with interrupts enabled. Until then, the code is
technically fine and, as a matter of fact, I think that removing the
locking from that particular instance would be an equally effective fix!
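Just to illustrate the point about check_lock(), here is a minimal,
self-contained sketch of the consistency rule it enforces once
spin_debug_enable() has run: a given lock must always be acquired with
interrupts in the same state, and mixing IRQs-on and IRQs-off
acquisitions is exactly what the BUG_ON() flags. This is a toy model,
not the actual Xen code; toy_lock, toy_check_lock() and debug_enabled
are made-up names standing in for the real lock debug structure,
check_lock() and spin_debug_enable().

    #include <stdbool.h>
    #include <stdio.h>

    struct toy_lock {
        int irq_safe;          /* -1 = not seen yet, 0 = IRQs on, 1 = IRQs off */
    };

    static bool debug_enabled; /* stand-in for spin_debug_enable() having run */

    /*
     * Returns false when the acquisition context contradicts an earlier
     * one; this is the situation where Xen's check_lock() would BUG().
     */
    static bool toy_check_lock(struct toy_lock *l, bool irqs_disabled)
    {
        int irq_safe = irqs_disabled ? 1 : 0;

        if ( !debug_enabled )
            return true;            /* checks inactive, as before spin_debug_enable() */

        if ( l->irq_safe == -1 )
            l->irq_safe = irq_safe; /* remember the first context we see */

        return l->irq_safe == irq_safe;
    }

    int main(void)
    {
        struct toy_lock pcidevs = { .irq_safe = -1 };

        /* Boot-time pattern: lock taken with IRQs enabled, debugging still off. */
        printf("%d\n", toy_check_lock(&pcidevs, false));  /* 1: no checking yet */

        /* If spin_debug_enable() were moved up above iommu_setup()... */
        debug_enabled = true;
        printf("%d\n", toy_check_lock(&pcidevs, true));   /* 1: records IRQs-off */
        printf("%d\n", toy_check_lock(&pcidevs, false));  /* 0: mixed contexts */

        return 0;
    }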
All that being said, consistency is indeed important. For the sake of
it, and for other reasons too, it is worthwhile conforming to a locking
pattern that is consistent with the rules we set ourselves, unless
there are specific reasons not to, even if, strictly speaking, there
isn't any actual buggy behavior to be fixed here.

Regards,
Dario
--
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)