From mboxrd@z Thu Jan  1 00:00:00 1970
From: MaoXiaoyun <tinnycloud@hotmail.com>
Subject: RE: RE: Kernel BUG at arch/x86/mm/tlb.c:61
Date: Tue, 26 Apr 2011 15:04:31 +0800
Message-ID: <BLU157-w5697DD78D0AA06E69BA116DA990@phx.gbl>
References: <COL0-MC1-F14hmBzxHs00230882@col0-mc1-f14.Col0.hotmail.com>, ,
	<BLU157-w488E5FEBD5E2DBC0666EF1DAA70@phx.gbl>, ,
	<BLU157-w5025BFBB4B1CDFA7AA0966DAA90@phx.gbl>, ,
	<BLU157-w540B39FBA137B4D96278D2DAA90@phx.gbl>, ,
	<BANLkTimgh_iip27zkDPNV9r7miwbxHmdVg@mail.gmail.com>, ,
	<BANLkTimkMgYNyANcKiZu5tJTL4==zdP3xg@mail.gmail.com>, ,
	<BLU157-w116F1BB57ABFDE535C7851DAA80@phx.gbl>,
	<4DA3438A.6070503@goop.org>, ,
	<BLU157-w2C6CD57CEA345B8D115E8DAAB0@phx.gbl>, ,
	<BLU157-w36F4E0A7503A357C9DE6A3DAAB0@phx.gbl>, ,
	<20110412100000.GA15647@dumpdata.com>, ,
	<BLU157-w14B84A51C80B41AB72B6CBDAAD0@phx.gbl>, ,
	<BANLkTinNxLnJxtZD68ODLSJqafq0tDRPfw@mail.gmail.com>, ,
	<BLU157-w30A1A208238A9031F0D18EDAAD0@phx.gbl>, ,
	<BLU157-w383D1A2536480BCD4C0E0EDAAD0@phx.gbl>,
	<BLU157-w42DAD248C94153635E9749DAAC0@phx.gbl>,
	<4DA8B715.9080508@goop.org>,
	<BLU157-w51A8A73D5A656542F9AB13DA960@phx.gbl>,
	<625BA99ED14B2D499DC4E29D8138F1505C7F2C5185@shsmsx502.ccr.corp.intel.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============2001338816=="
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <625BA99ED14B2D499DC4E29D8138F1505C7F2C5185@shsmsx502.ccr.corp.intel.com>
List-Unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: kevin.tian@intel.com, jeremy@goop.org
Cc: xen devel <xen-devel@lists.xensource.com>, giamteckchoon@gmail.com, konrad.wilk@oracle.com
List-Id: xen-devel@lists.xenproject.org

--===============2001338816==
Content-Type: multipart/alternative;
	boundary="_b56b87da-8107-429c-92ce-5ec99c537358_"

--_b56b87da-8107-429c-92ce-5ec99c537358_
Content-Type: text/plain; charset="gb2312"
Content-Transfer-Encoding: quoted-printable


Many thanks, Kevin.
=20
I agree on the race window.
One more thing: as I understand it, the CPU that sends the IPI will unpin
the pagetable after it has received ACKs from all the other CPUs. If a CPU
that receives the IPI enters drop_other_mm_ref with TLBSTATE_OK and does
nothing, could it end up running on a stale pagetable (one that the sender
CPU has already unpinned)?
=20
So do we need to flush the TLB when the state is TLBSTATE_OK? For example:
=20
if (active_mm =3D=3D mm) {
    if (percpu_read(cpu_tlbstate.state) =3D=3D TLBSTATE_OK)
        load_cr3(mm->pgd);
    else
        leave_mm(smp_processor_id());
}
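
To make the question concrete, here is a minimal userspace sketch of the
decision I have in mind. It is only a model: struct mm_model, struct
cpu_model, enum ipi_action and drop_ref_action() are hypothetical names
made up for illustration, not kernel code, and the real handler would
call load_cr3() or leave_mm() instead of returning a value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these types are the kernel's. */
enum tlb_state { TLBSTATE_OK = 1, TLBSTATE_LAZY = 2 };
enum ipi_action { ACT_NOTHING, ACT_RELOAD_CR3, ACT_LEAVE_MM };

struct mm_model {
    int id;
};

struct cpu_model {
    const struct mm_model *active_mm; /* mm this CPU last switched to */
    enum tlb_state state;             /* models cpu_tlbstate.state */
};

/* Decide what the IPI handler would do for the mm being torn down. */
static enum ipi_action drop_ref_action(const struct cpu_model *cpu,
                                       const struct mm_model *mm)
{
    if (cpu->active_mm != mm)
        return ACT_NOTHING;     /* this CPU is not using mm */
    if (cpu->state == TLBSTATE_OK)
        return ACT_RELOAD_CR3;  /* proposed: reload cr3, flushing the TLB */
    return ACT_LEAVE_MM;        /* lazy mode: drop the mm */
}
```

With a CPU whose active_mm is the dying mm, the sketch picks the cr3
reload when the state is TLBSTATE_OK and leave_mm when it is lazy. A CPU
running a different mm is left alone.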

=20
> From: kevin.tian@intel.com
> To: tinnycloud@hotmail.com; jeremy@goop.org
> CC: xen-devel@lists.xensource.com; giamteckchoon@gmail.com; konrad.wilk=
@oracle.com
> Date: Tue, 26 Apr 2011 13:52:11 +0800
> Subject: RE: [Xen-devel] RE: Kernel BUG at arch/x86/mm/tlb.c:61
>=20
> >From: MaoXiaoyun
> >Sent: Monday, April 25, 2011 11:15 AM
> >> Date: Fri, 15 Apr 2011 14:22:29 -0700
> >> From: jeremy@goop.org
> >> To: tinnycloud@hotmail.com
> >> CC: giamteckchoon@gmail.com; xen-devel@lists.xensource.com; konrad.w=
ilk@oracle.com
> >> Subject: Re: Kernel BUG at arch/x86/mm/tlb.c:61
> >>=20
> >> On 04/15/2011 05:23 AM, MaoXiaoyun wrote:
> >> > Hi=A3=BA
> >> >
> >> > Could the crash be related to this patch?
> >> > http://git.kernel.org/?p=3Dlinux/kernel/git/jeremy/xen.git;a=3Dcom=
mitdiff;h=3D45bfd7bfc6cf32f8e60bb91b32349f0b5090eea3
> >> >
> >> > With that patch, the TLB state changes to TLBSTATE_OK
> >> > (mmu_context.h:40) before cpumask_clear_cpu (line 49).
> >> > Could it be that right after executing line 40 of mmu_context.h,
> >> > the CPU receives an IPI from another CPU to flush the mm, and the
> >> > interrupt handler finds the TLB state to be TLBSTATE_OK, which
> >> > conflicts?
> >>=20
> >> Does reverting it help?
> >>=20
> >> J
> >=20
> >Hi Jeremy:
> >=20
> >    The latest test result shows that reverting didn't help.
> >    The kernel panics at exactly the same place in tlb.c.
> >=20
> >    I have a question about the TLB state. The stack shows:
> >    xen_do_hypervisor_callback -> xen_evtchn_do_upcall -> ... ->
> >    drop_other_mm_ref
> >=20
> >    What should cpu_tlbstate.state be at that point? Are both
> >    TLBSTATE_OK and TLBSTATE_LAZY possible? That is, after a hypercall
> >    from userspace the state would be TLBSTATE_OK, and from kernel
> >    space it would be TLBSTATE_LAZY?
> >=20
> >       thanks.
>=20
> It looks like a bug in the drop_other_mm_ref implementation: the current
> TLB state should be checked before invoking leave_mm(). There's a race
> window in the lines of code below:
>=20
> <xen_drop_mm_ref>
>     /* Get the "official" set of cpus referring to our pagetable. */
>     if (!alloc_cpumask_var(&mask, GFP_ATOMIC)) {
>         for_each_online_cpu(cpu) {
>             if (!cpumask_test_cpu(cpu, mm_cpumask(mm))
>                 && per_cpu(xen_current_cr3, cpu) !=3D __pa(mm->pgd))
>                 continue;
>             smp_call_function_single(cpu, drop_other_mm_ref, mm, 1);
>         }
>         return;
>     }
>=20
> There's a chance that by the time smp_call_function_single is invoked,
> the actual TLB state has already been updated on the other CPU. The
> upstream kernel patch you referred to earlier just exposes this bug more
> easily. Even without that patch you may still hit the issue, which is
> why reverting it doesn't help.
>=20
> Could you try adding a check in drop_other_mm_ref?
>=20
> if (active_mm =3D=3D mm &&
>     percpu_read(cpu_tlbstate.state) !=3D TLBSTATE_OK)
>         leave_mm(smp_processor_id());
>=20
> If the interrupted context has TLBSTATE_OK, that implies it will handle
> the TLB flush itself later, so there is no need to call leave_mm from
> the interrupt handler. leave_mm assumes the state is not TLBSTATE_OK.
>=20
> Thanks
> Kevin
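
The race window and the suggested check quoted above can be sketched as
a small userspace model. Everything here is hypothetical and simplified
for illustration (struct cpu_model and the helper functions are not
kernel code). It also assumes that the BUG in the subject line fires
when leave_mm() is entered with TLBSTATE_OK.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one remote CPU as seen during mm teardown. */
enum tlb_state { TLBSTATE_OK = 1, TLBSTATE_LAZY = 2 };

struct cpu_model {
    int active_mm;          /* id of the mm this CPU last switched to */
    enum tlb_state state;   /* models cpu_tlbstate.state */
    bool in_mm_cpumask;     /* CPU is set in mm_cpumask(mm) */
    bool cr3_is_mm;         /* xen_current_cr3 points at mm->pgd */
};

/* Step 1: the sender decides to IPI from its view of the target. */
static bool sender_wants_ipi(const struct cpu_model *cpu)
{
    return cpu->in_mm_cpumask || cpu->cr3_is_mm;
}

/* Step 2: before the IPI lands, the target leaves lazy TLB mode. */
static void target_becomes_active(struct cpu_model *cpu)
{
    cpu->state = TLBSTATE_OK;
}

/* Step 3: the IPI handler; 'guarded' enables the suggested state check. */
static bool handler_calls_leave_mm(const struct cpu_model *cpu, int mm,
                                   bool guarded)
{
    if (cpu->active_mm != mm)
        return false;
    if (guarded && cpu->state == TLBSTATE_OK)
        return false;       /* active context flushes on its own later */
    return true;
}

/* Assumed failure condition: leave_mm() entered with TLBSTATE_OK. */
static bool would_hit_bug(const struct cpu_model *cpu, int mm, bool guarded)
{
    return handler_calls_leave_mm(cpu, mm, guarded) &&
           cpu->state == TLBSTATE_OK;
}
```

In this model a lazy CPU that becomes active between steps 1 and 3 trips
the assumed BUG condition with the unguarded handler, while the guarded
handler skips leave_mm. The sketch does not model the stale pagetable
concern raised at the top of this mail.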
--_b56b87da-8107-429c-92ce-5ec99c537358_--


--===============2001338816==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

--===============2001338816==--