From mboxrd@z Thu Jan 1 00:00:00 1970 From: MaoXiaoyun Subject: RE: RE: Kernel BUG at arch/x86/mm/tlb.c:61 Date: Tue, 26 Apr 2011 15:04:31 +0800 Message-ID: References: , , , , , , , , , , , , , <4DA3438A.6070503@goop.org>, , , , , , <20110412100000.GA15647@dumpdata.com>, , , , , , , , , , <4DA8B715.9080508@goop.org>, , <625BA99ED14B2D499DC4E29D8138F1505C7F2C5185@shsmsx502.ccr.corp.intel.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2001338816==" Return-path: In-Reply-To: <625BA99ED14B2D499DC4E29D8138F1505C7F2C5185@shsmsx502.ccr.corp.intel.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: kevin.tian@intel.com, jeremy@goop.org Cc: xen devel , giamteckchoon@gmail.com, konrad.wilk@oracle.com List-Id: xen-devel@lists.xenproject.org --===============2001338816== Content-Type: multipart/alternative; boundary="_b56b87da-8107-429c-92ce-5ec99c537358_" --_b56b87da-8107-429c-92ce-5ec99c537358_ Content-Type: text/plain; charset="gb2312" Content-Transfer-Encoding: quoted-printable Many thanks, Kevin. =20 I agree on the race window. One thing more, In my understaning, the CPU who send out IPI message, wi= ll unpin the pagetable after=20 receive all ACKS from other cpu, if the CPU who received IPI message, = enter drop_other_mm_ref, and=20 has TLBSTATE_OK, does nothing, will it possible it possible confronts wit= h stale pagetable (that is unpinned by sender CPU)? =20 So do we need flush tlb when its state is TBLSTATE_OK? =20 if (active_mm =3D=3D mm){ if (percpu_read(cpu_tlbstate.state) =3D=3D TLBSTATE_OK) load_cr3(mm->pgd) else leave_mm(smp_processor_id()); } =20 > From: kevin.tian@intel.com > To: tinnycloud@hotmail.com; jeremy@goop.org > CC: xen-devel@lists.xensource.com; giamteckchoon@gmail.com; konrad.wilk= @oracle.com > Date: Tue, 26 Apr 2011 13:52:11 +0800 > Subject: RE: [Xen-devel] RE: Kernel BUG at arch/x86/mm/tlb.c:61 >=20 > >From: MaoXiaoyun > >Sent: Monday, April 25, 2011 11:15 AM > >> Date: Fri, 15 Apr 2011 14:22:29 -0700 > >> From: jeremy@goop.org > >> To: tinnycloud@hotmail.com > >> CC: giamteckchoon@gmail.com; xen-devel@lists.xensource.com; konrad.w= ilk@oracle.com > >> Subject: Re: Kernel BUG at arch/x86/mm/tlb.c:61 > >>=20 > >> On 04/15/2011 05:23 AM, MaoXiaoyun wrote: > >> > Hi=A3=BA > >> > > >> > Could the crash related to this patch ? > >> > http://git.kernel.org/?p=3Dlinux/kernel/git/jeremy/xen.git;a=3Dcom= mitdiff;h=3D45bfd7bfc6cf32f8e60bb91b32349f0b5090eea3 > >> > > >> > Since now TLB state change to TLBSTATE_OK(mmu_context.h:40) is bef= ore > >> > cpumask_clear_cpu(line 49). > >> > Could it possible that right after execute line 40 of mmu_context.= h, > >> > CPU revice IPI from other CPU to > >> > flush the mm, and when in interrupt, find the TLB state happened t= o be > >> > TLBSTATE_OK. Which conflicts. > >>=20 > >> Does reverting it help? > >>=20 > >> J > >=20 > >Hi Jeremy: > >=20 > > The lastest test result shows the reverting didn't help. > > Kernel panic exactly at the same place in tlb.c. > >=20 > > I have question about TLB state, from the stack,=20 > > xen_do_hypervisor_callback-> xen_evtchn_do_upcall->... ->drop_othe= r_mm_ref > >=20 > > What cpu_tlbstate.state should be, could TLBSTATE_OK or TLBSTAT= E_LAZY all be possible?=20 > > That is after a hypercall from userspace, state will be TLBSTATE_O= K, and > > if from kernel space, state will be TLBSTATE_LAZE ?=20 > >=20 > > thanks. >=20 > it looks a bug in drop_other_mm_ref implementation, that current TLB st= ate should be checked > before invoking leave_mm(). There's a window between below lines of cod= e: >=20 > > /* Get the "official" set of cpus referring to our pagetable. */ > if (!alloc_cpumask_var(&mask, GFP_ATOMIC)) { > for_each_online_cpu(cpu) { > if (!cpumask_test_cpu(cpu, mm_cpumask(mm)) > && per_cpu(xen_current_cr3, cpu) !=3D __pa(mm->pgd)) > continue; > smp_call_function_single(cpu, drop_other_mm_ref, mm, 1); > } > return; > } >=20 > there's chance that when smp_call_function_single is invoked, actual TL= B state has been > updated in the other cpu. The upstream kernel patch you referred to ear= lier just makes > this bug exposed more easily. But even without this patch, you may stil= l suffer such issue > which is why reverting the patch doesn't help. >=20 > Could you try adding a check in drop_other_mm_ref? >=20 > if (active_mm =3D=3D mm && percpu_read(cpu_tlbstate.state) !=3D TLBSTAT= E_OK) > leave_mm(smp_processor_id()); >=20 > once the interrupted context has TLBSTATE_OK, it implicates that later = it will handle=20 > the TLB flush and thus no need for leave_mm from interrupt handler, and= that's the > assumption of doing leave_mm. >=20 > Thanks > Kevin =20 --_b56b87da-8107-429c-92ce-5ec99c537358_ Content-Type: text/html; charset="gb2312" Content-Transfer-Encoding: quoted-printable Many thanks, Kevin.
 
I agree on the race window.
One thing more,  In my understaning, the CPU who send out IPI messag= e, will unpin the pagetable after
receive all ACKS  from other cpu,  if the CPU who received=  IPI message, enter drop_other_mm_ref, and
has TLBSTATE_OK, does nothing, will it possible it possible confronts wit= h stale pagetable
(that is unpinned by sender CPU)?
 
So do we need flush tlb when its state is TBLSTATE_OK?
 

if (active_mm =3D=3D mm){

     if (percpu_read(cpu_tlbstate= .state) =3D=3D TLBSTATE_OK)

        load_cr3(m= m->pgd)

     else

           = ;     leave_mm(smp_processor_id());=

 }

 

> From: kevin.tian@intel.com
> To: tinnycloud@hotmail.com; jerem= y@goop.org
> CC: xen-devel@lists.xensource.com; giamteckchoon@gmail= .com; konrad.wilk@oracle.com
> Date: Tue, 26 Apr 2011 13:52:11 +080= 0
> Subject: RE: [Xen-devel] RE: Kernel BUG at arch/x86/mm/tlb.c:61=
>
> >From: MaoXiaoyun
> >Sent: Monday, April 25= , 2011 11:15 AM
> >> Date: Fri, 15 Apr 2011 14:22:29 -0700> >> From: jeremy@goop.org
> >> To: tinnycloud@hotm= ail.com
> >> CC: giamteckchoon@gmail.com; xen-devel@lists.xen= source.com; konrad.wilk@oracle.com
> >> Subject: Re: Kernel B= UG at arch/x86/mm/tlb.c:61
> >>
> >> On 04/15/20= 11 05:23 AM, MaoXiaoyun wrote:
> >> > Hi=A3=BA
> >= ;> >
> >> > Could the crash related to this patch ?<= BR>> >> > http://git.kernel.org/?p=3Dlinux/kernel/git/jeremy/= xen.git;a=3Dcommitdiff;h=3D45bfd7bfc6cf32f8e60bb91b 32349f0b5090eea3
> >> >
> >> > Since now T= LB state change to TLBSTATE_OK(mmu_context.h:40) is before
> >&g= t; > cpumask_clear_cpu(line 49).
> >> > Could it possib= le that right after execute line 40 of mmu_context.h,
> >> &g= t; CPU revice IPI from other CPU to
> >> > flush the mm, a= nd when in interrupt, find the TLB state happened to be
> >> = > TLBSTATE_OK. Which conflicts.
> >>
> >> Doe= s reverting it help?
> >>
> >> J
> >&nb= sp;
> >Hi Jeremy:
> > 
> >  &nb= sp; The lastest test result shows the reverting didn't help.
> >=     Kernel panic exactly at the same place in tlb.c.<= BR>> > 
> >    I have question about T= LB state, from the stack,
> >    xen_do_hypervis= or_callback-> xen_evtchn_do_upcall-> ... ->drop_other_mm_ref
> > 
> >  &n= bsp; What  cpu_tlbstate.state should be,  could =  TLBSTATE_OK or TLBSTATE_LAZY all be possible?
> > &n= bsp;  That is after a hypercall from userspace, state will be T= LBSTATE_OK, and
> >      if from kernel= space, state will be TLBSTATE_LAZE ?
> > 
> >&nb= sp;      thanks.
>
> it looks = a bug in drop_other_mm_ref implementation, that current TLB state should = be checked
> before invoking leave_mm(). There's a window between b= elow lines of code:
>
> <xen_drop_mm_ref>
> /* G= et the "official" set of cpus referring to our pagetable. */
> if (= !alloc_cpumask_var(&mask, GFP_ATOMIC)) {
> for_each_online_cpu(= cpu) {
> if (!cpumask_test_cpu(cpu, mm_cpumask(mm))
> &&a= mp; per_cpu(xen_current_cr3, cpu) !=3D __pa( mm->pgd))
> continue;
> smp_call_function_single(cpu, dro= p_other_mm_ref, mm, 1);
> }
> return;
> }
>
&= gt; there's chance that when smp_call_function_single is invoked, actual = TLB state has been
> updated in the other cpu. The upstream kernel = patch you referred to earlier just makes
> this bug exposed more ea= sily. But even without this patch, you may still suffer such issue
>= ; which is why reverting the patch doesn't help.
>
> Could y= ou try adding a check in drop_other_mm_ref?
>
> if (active_m= m =3D=3D mm && percpu_read(cpu_tlbstate.state) !=3D TLBSTATE_OK)<= BR>> leave_mm(smp_processor_id());
>
> once the interrupt= ed context has TLBSTATE_OK, it implicates that later it will handle
&= gt; the TLB flush and thus no need for leave_mm from interrupt handler, a= nd that's the
> assumption of doing leave_mm.
>
> Than= ks
> Kevin
--_b56b87da-8107-429c-92ce-5ec99c537358_-- --===============2001338816== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel --===============2001338816==--