From: "Tian, Kevin" <kevin.tian@intel.com>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: MaoXiaoyun <tinnycloud@hotmail.com>,
	xen devel <xen-devel@lists.xensource.com>,
	"giamteckchoon@gmail.com" <giamteckchoon@gmail.com>,
	"konrad.wilk@oracle.com" <konrad.wilk@oracle.com>
Subject: RE: RE: Kernel BUG at arch/x86/mm/tlb.c:61
Date: Fri, 29 Apr 2011 08:19:44 +0800
Message-ID: <625BA99ED14B2D499DC4E29D8138F1505C843BB27A@shsmsx502.ccr.corp.intel.com>
In-Reply-To: <4DB9F845.6020204@goop.org>


> From: Jeremy Fitzhardinge [mailto:jeremy@goop.org]
> Sent: Friday, April 29, 2011 7:29 AM
> 
> On 04/25/2011 10:52 PM, Tian, Kevin wrote:
> >> From: MaoXiaoyun
> >> Sent: Monday, April 25, 2011 11:15 AM
> >>> Date: Fri, 15 Apr 2011 14:22:29 -0700
> >>> From: jeremy@goop.org
> >>> To: tinnycloud@hotmail.com
> >>> CC: giamteckchoon@gmail.com; xen-devel@lists.xensource.com;
> >>> konrad.wilk@oracle.com
> >>> Subject: Re: Kernel BUG at arch/x86/mm/tlb.c:61
> >>>
> >>> On 04/15/2011 05:23 AM, MaoXiaoyun wrote:
> >>>> Hi:
> >>>>
> >>>> Could the crash be related to this patch?
> >>>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commitdiff;h=45bfd7bfc6cf32f8e60bb91b32349f0b5090eea3
> >>>>
> >>>> With it, the TLB state change to TLBSTATE_OK (mmu_context.h:40) now
> >>>> happens before cpumask_clear_cpu() (line 49).
> >>>> Is it possible that, right after executing line 40 of mmu_context.h,
> >>>> the CPU receives an IPI from another CPU to flush the mm, and the
> >>>> interrupt handler then finds the TLB state already set to
> >>>> TLBSTATE_OK, which is a conflict?
> >>> Does reverting it help?
> >>>
> >>> J
> >>
> >> Hi Jeremy:
> >>
> >>     The latest test result shows that reverting it didn't help.
> >>     The kernel panics at exactly the same place in tlb.c.
> >>
> >>     I have a question about the TLB state. From the stack,
> >>     xen_do_hypervisor_callback -> xen_evtchn_do_upcall -> ... -> drop_other_mm_ref
> >>
> >>     What should cpu_tlbstate.state be here? Are both TLBSTATE_OK and
> >>     TLBSTATE_LAZY possible? That is, after a hypercall from user space
> >>     the state would be TLBSTATE_OK, and from kernel space it would be
> >>     TLBSTATE_LAZY?
> >>
> >>     Thanks.
> > It looks like a bug in the drop_other_mm_ref implementation: the current
> > TLB state should be checked before invoking leave_mm(). There's a window
> > between the lines of code below:
> >
> > <xen_drop_mm_ref>
> >        /* Get the "official" set of cpus referring to our pagetable. */
> >         if (!alloc_cpumask_var(&mask, GFP_ATOMIC)) {
> >                 for_each_online_cpu(cpu) {
> >                         if (!cpumask_test_cpu(cpu, mm_cpumask(mm))
> >                             && per_cpu(xen_current_cr3, cpu) != __pa(mm->pgd))
> >                                 continue;
> >                         smp_call_function_single(cpu, drop_other_mm_ref, mm, 1);
> >                 }
> >                 return;
> >         }
> >
> > There's a chance that, by the time smp_call_function_single() is invoked,
> > the actual TLB state on the other CPU has already been updated. The
> > upstream kernel patch you referred to earlier only makes this bug easier
> > to expose; even without that patch you may still hit the issue, which is
> > why reverting the patch doesn't help.
> >
> > Could you try adding a check in drop_other_mm_ref?
> >
> >         if (active_mm == mm && percpu_read(cpu_tlbstate.state) != TLBSTATE_OK)
> >                 leave_mm(smp_processor_id());
> >
> > Once the interrupted context has TLBSTATE_OK, that implies it will handle
> > the TLB flush later by itself, so there is no need for leave_mm() from the
> > interrupt handler. This also matches leave_mm()'s own assumption that the
> > state is not TLBSTATE_OK.
> 
> That seems reasonable.  MaoXiaoyun, does it fix the bug for you?
> 
> Kevin, could you submit this as a proper patch?
> 

I'm waiting for Xiaoyun's test result before submitting a proper patch, since
this part of the logic is tricky and his testing can help make sure we don't
overlook any corner cases. :-)

Thanks
Kevin


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
