> From: Jeremy Fitzhardinge [mailto:jeremy@goop.org]
> Sent: Friday, April 29, 2011 7:29 AM
>
> On 04/25/2011 10:52 PM, Tian, Kevin wrote:
> >> From: MaoXiaoyun
> >> Sent: Monday, April 25, 2011 11:15 AM
> >>> Date: Fri, 15 Apr 2011 14:22:29 -0700
> >>> From: jeremy@goop.org
> >>> To: tinnycloud@hotmail.com
> >>> CC: giamteckchoon@gmail.com; xen-devel@lists.xensource.com; konrad.wilk@oracle.com
> >>> Subject: Re: Kernel BUG at arch/x86/mm/tlb.c:61
> >>>
> >>> On 04/15/2011 05:23 AM, MaoXiaoyun wrote:
> >>>> Hi:
> >>>>
> >>>> Could the crash be related to this patch?
> >>>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commitdiff;h=45bfd7bfc6cf32f8e60bb91b32349f0b5090eea3
> >>>>
> >>>> Since now the TLB state change to TLBSTATE_OK (mmu_context.h:40) happens
> >>>> before cpumask_clear_cpu (line 49), could it be that right after executing
> >>>> line 40 of mmu_context.h the CPU receives an IPI from another CPU to flush
> >>>> the mm, and the interrupt handler then finds the TLB state already set to
> >>>> TLBSTATE_OK, which conflicts?
> >>> Does reverting it help?
> >>>
> >>> J
> >>
> >> Hi Jeremy:
> >>
> >> The latest test result shows that reverting it didn't help.
> >> The kernel panics at exactly the same place in tlb.c.
> >>
> >> I have a question about the TLB state. From the stack,
> >> xen_do_hypervisor_callback -> xen_evtchn_do_upcall -> ...
> >> -> drop_other_mm_ref
> >>
> >> What should cpu_tlbstate.state be here: could both TLBSTATE_OK and
> >> TLBSTATE_LAZY be possible? That is, after a hypercall from user space the
> >> state will be TLBSTATE_OK, and from kernel space it will be TLBSTATE_LAZY?
> >>
> >> thanks.
> >
> > It looks like a bug in the drop_other_mm_ref implementation: the current TLB
> > state should be checked before invoking leave_mm(). There's a window between
> > the lines of code below:
> >
> > 	/* Get the "official" set of cpus referring to our pagetable. */
> > 	if (!alloc_cpumask_var(&mask, GFP_ATOMIC)) {
> > 		for_each_online_cpu(cpu) {
> > 			if (!cpumask_test_cpu(cpu, mm_cpumask(mm))
> > 			    && per_cpu(xen_current_cr3, cpu) != __pa(mm->pgd))
> > 				continue;
> > 			smp_call_function_single(cpu, drop_other_mm_ref, mm, 1);
> > 		}
> > 		return;
> > 	}
> >
> > There's a chance that by the time smp_call_function_single is invoked, the
> > actual TLB state has already been updated on the other cpu. The upstream
> > kernel patch you referred to earlier just makes this bug easier to expose,
> > but even without that patch you may still hit the issue, which is why
> > reverting the patch doesn't help.
> >
> > Could you try adding a check in drop_other_mm_ref?
> >
> > 	if (active_mm == mm && percpu_read(cpu_tlbstate.state) != TLBSTATE_OK)
> > 		leave_mm(smp_processor_id());
> >
> > Once the interrupted context has TLBSTATE_OK, it implies that it will handle
> > the TLB flush itself later, so there is no need for leave_mm from the
> > interrupt handler; that is the assumption behind doing leave_mm.
>
> That seems reasonable. MaoXiaoyun, does it fix the bug for you?
>
> Kevin, could you submit this as a proper patch?

I'm waiting for Xiaoyun's test result before submitting a proper patch, since
this part of the logic is tricky and his test can make sure we don't overlook
some corner cases. :-)

Thanks
Kevin
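
[Editor's note] For readers following the thread, below is a minimal sketch of how the check Kevin suggests could slot into drop_other_mm_ref() in arch/x86/xen/mmu.c. The function body surrounding the added condition is assumed from pvops kernels of that era and from the discussion above; it is not quoted from a posted patch.

	/*
	 * Sketch only: drop_other_mm_ref() with the suggested TLB-state
	 * check.  Runs as an IPI handler (smp_call_function_single) on a
	 * CPU that may still hold a lazy reference to mm.
	 */
	static void drop_other_mm_ref(void *info)
	{
		struct mm_struct *mm = info;
		struct mm_struct *active_mm;

		active_mm = percpu_read(cpu_tlbstate.active_mm);

		/*
		 * Only drop the reference if the interrupted context is
		 * genuinely lazy.  If cpu_tlbstate.state is already
		 * TLBSTATE_OK, that CPU is (or is in the middle of becoming)
		 * an active user of this mm and will handle its own TLB
		 * flushes, so calling leave_mm() here would hit the BUG_ON
		 * in arch/x86/mm/tlb.c.
		 */
		if (active_mm == mm &&
		    percpu_read(cpu_tlbstate.state) != TLBSTATE_OK)
			leave_mm(smp_processor_id());

		/* If this CPU still has a stale cr3 reference, flush it out. */
		if (percpu_read(xen_current_cr3) == __pa(mm->pgd))
			load_cr3(swapper_pg_dir);
	}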