From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jeremy Fitzhardinge <jeremy@goop.org>
Subject: Re: RE: Kernel BUG at arch/x86/mm/tlb.c:61
Date: Thu, 28 Apr 2011 16:29:09 -0700
Message-ID: <4DB9F845.6020204@goop.org>
References: <COL0-MC1-F14hmBzxHs00230882@col0-mc1-f14.Col0.hotmail.com>,
	<BLU157-w488E5FEBD5E2DBC0666EF1DAA70@phx.gbl>,
	<BLU157-w5025BFBB4B1CDFA7AA0966DAA90@phx.gbl>,
	<BLU157-w540B39FBA137B4D96278D2DAA90@phx.gbl>,
	<BANLkTimgh_iip27zkDPNV9r7miwbxHmdVg@mail.gmail.com>,
	<BANLkTimkMgYNyANcKiZu5tJTL4==zdP3xg@mail.gmail.com>,
	<BLU157-w116F1BB57ABFDE535C7851DAA80@phx.gbl>,
	<4DA3438A.6070503@goop.org>,
	<BLU157-w2C6CD57CEA345B8D115E8DAAB0@phx.gbl>,
	<BLU157-w36F4E0A7503A357C9DE6A3DAAB0@phx.gbl>,
	<20110412100000.GA15647@dumpdata.com>,
	<BLU157-w14B84A51C80B41AB72B6CBDAAD0@phx.gbl>,
	<BANLkTinNxLnJxtZD68ODLSJqafq0tDRPfw@mail.gmail.com>,
	<BLU157-w30A1A208238A9031F0D18EDAAD0@phx.gbl>,
	<BLU157-w383D1A2536480BCD4C0E0EDAAD0@phx.gbl>	<BLU157-w42DAD248C94153635E9749DAAC0@phx.gbl>,
	<4DA8B715.9080508@goop.org>
	<BLU157-w51A8A73D5A656542F9AB13DA960@phx.gbl>
	<625BA99ED14B2D499DC4E29D8138F1505C7F2C5185@shsmsx502.ccr.corp.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <625BA99ED14B2D499DC4E29D8138F1505C7F2C5185@shsmsx502.ccr.corp.intel.com>
List-Unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: MaoXiaoyun <tinnycloud@hotmail.com>, xen devel <xen-devel@lists.xensource.com>, "giamteckchoon@gmail.com" <giamteckchoon@gmail.com>, "konrad.wilk@oracle.com" <konrad.wilk@oracle.com>
List-Id: xen-devel@lists.xenproject.org

On 04/25/2011 10:52 PM, Tian, Kevin wrote:
>> From: MaoXiaoyun
>> Sent: Monday, April 25, 2011 11:15 AM
>>> Date: Fri, 15 Apr 2011 14:22:29 -0700
>>> From: jeremy@goop.org
>>> To: tinnycloud@hotmail.com
>>> CC: giamteckchoon@gmail.com; xen-devel@lists.xensource.com; konrad.wi=
lk@oracle.com
>>> Subject: Re: Kernel BUG at arch/x86/mm/tlb.c:61
>>>
>>> On 04/15/2011 05:23 AM, MaoXiaoyun wrote:
>>>> Hi=EF=BC=9A
>>>>
>>>> Could the crash related to this patch ?
>>>> http://git.kernel.org/?p=3Dlinux/kernel/git/jeremy/xen.git;a=3Dcommi=
tdiff;h=3D45bfd7bfc6cf32f8e60bb91b32349f0b5090eea3
>>>>
>>>> Since now TLB state change to TLBSTATE_OK(mmu_context.h:40) is befor=
e
>>>> cpumask_clear_cpu(line 49).
>>>> Could it possible that right after execute line 40 of mmu_context.h,
>>>> CPU revice IPI from other CPU to
>>>> flush the mm, and when in interrupt, find the TLB state happened to =
be
>>>> TLBSTATE_OK. Which conflicts.
>>> Does reverting it help?
>>>
>>> J
>> =20
>> Hi Jeremy:
>> =20
>>     The lastest test result shows the reverting didn't help.
>>     Kernel panic exactly at the same place in tlb.c.
>> =20
>>     I have question about TLB state, from the stack,=20
>>     xen_do_hypervisor_callback-> xen_evtchn_do_upcall->... ->drop_othe=
r_mm_ref
>> =20
>>     What  cpu_tlbstate.state should be,  could  TLBSTATE_OK or TLBSTAT=
E_LAZY all be possible?=20
>>     That is after a hypercall from userspace, state will be TLBSTATE_O=
K, and
>>       if from kernel space, state will be TLBSTATE_LAZE ?=20
>> =20
>>        thanks.
> it looks a bug in drop_other_mm_ref implementation, that current TLB st=
ate should be checked
> before invoking leave_mm(). There's a window between below lines of cod=
e:
>
> <xen_drop_mm_ref>
>        /* Get the "official" set of cpus referring to our pagetable. */
>         if (!alloc_cpumask_var(&mask, GFP_ATOMIC)) {
>                 for_each_online_cpu(cpu) {
>                         if (!cpumask_test_cpu(cpu, mm_cpumask(mm))
>                             && per_cpu(xen_current_cr3, cpu) !=3D __pa(=
mm->pgd))
>                                 continue;
>                         smp_call_function_single(cpu, drop_other_mm_ref=
, mm, 1);
>                 }
>                 return;
>         }
>
> there's chance that when smp_call_function_single is invoked, actual TL=
B state has been
> updated in the other cpu. The upstream kernel patch you referred to ear=
lier just makes
> this bug exposed more easily. But even without this patch, you may stil=
l suffer such issue
> which is why reverting the patch doesn't help.
>
> Could you try adding a check in drop_other_mm_ref?
>
>         if (active_mm =3D=3D mm && percpu_read(cpu_tlbstate.state) !=3D=
 TLBSTATE_OK)
>                 leave_mm(smp_processor_id());
>
> once the interrupted context has TLBSTATE_OK, it implicates that later =
it will handle=20
> the TLB flush and thus no need for leave_mm from interrupt handler, and=
 that's the
> assumption of doing leave_mm.

That seems reasonable.  MaoXiaoyun, does it fix the bug for you?

Kevin, could you submit this as a proper patch?

Thanks,
    J