Hi:

I have just kicked off tests with cpuidle=0 cpufreq=none.

What is your Xen version? Do you use the backend driver of 2.6.32.36?

Besides the "TLB BUG", I've hit at least two other issues:

1) Xen 4.0.1 + 2.6.32.36 kernel + backend driver from 2.6.31 ==> causes "Bad grant reference" messages in the serial output.
2) Xen 4.0.1 + 2.6.32.36 kernel with its own backend driver ==> causes disk errors like the ones below:

sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
end_request: I/O error, dev tdb, sector 28699593
end_request: I/O error, dev tdb, sector 28699673
end_request: I/O error, dev tdb, sector 28699753
end_request: I/O error, dev tdb, sector 28699833
end_request: I/O error, dev tdb, sector 28699913
end_request: I/O error, dev tdb, sector 28699993
end_request: I/O error, dev tdb, sector 28700073

thanks.

> Date: Mon, 11 Apr 2011 23:25:19 +0800
> Subject: Re: kernel BUG at arch/x86/xen/mmu.c:1872
> From: giamteckchoon@gmail.com
> To: tinnycloud@hotmail.com
> CC: xen-devel@lists.xensource.com; dave@ivt.com.au; ian.campbell@citrix.com; konrad.wilk@oracle.com; jeremy@goop.org; keir@xen.org
>
> 2011/4/11 MaoXiaoyun :
> > Hi:
> >
> > I believe this fixes the problem to a large extent.
> > I have my own test cases, and with this patch they succeed in all
> > 30 rounds I ran. Every round takes 8 hours. Without this patch,
> > the tests fail every round within 15 minutes.
> >
> > So this really does fix most of the problems.
> >
> > But during the runs I hit another crash. From the log, it looks
> > related to this BUG, since the crash log shows it is TLB related,
> > and this BUG is also TLB related.
>
> Are you able to run another test with cpuidle=0 cpufreq=none as boot
> options? Just curious whether you can reproduce the TLB bug when
> you boot with cpuidle=0 cpufreq=none... ...
>
> >
> > Well, I also have poor knowledge of the kernel.
> > Hope someone from Xen Devel can offer some help.
> >
> > Many thanks.
> >
> >> Date: Mon, 11 Apr 2011 20:16:53 +0800
> >> Subject: Re: kernel BUG at arch/x86/xen/mmu.c:1872
> >> From: giamteckchoon@gmail.com
> >> To: tinnycloud@hotmail.com
> >> CC: xen-devel@lists.xensource.com; dave@ivt.com.au;
> >> ian.campbell@citrix.com; konrad.wilk@oracle.com; jeremy@goop.org;
> >> keir@xen.org
> >>
> >> >
> >> > Hi,
> >> >
> >> > Sorry, this mmu-related BUG has troubled me for a very long
> >> > time... I really want to "kill" this BUG, but my knowledge of kernel
> >> > hacking and/or Xen is very limited.
> >> >
> >> > While waiting for Jeremy or Konrad or others ...
> >> >
> >> > Many thanks for spending time tracking down this mmu-related BUG. I
> >> > have backported the commit from
> >> >
> >> > http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=64141da587241301ce8638cc945f8b67853156ec
> >> >
> >> > to the 2.6.32.36 PVOPS kernel; the patch is attached. I don't know
> >> > whether I backported it correctly or whether it affects anything. I am
> >> > currently testing the 2.6.32.36 PVOPS kernel with this patch applied
> >> > and with CONFIG_DEBUG_PAGEALLOC unset. I am running testcrash.sh
> >> > loop 1000, as I was unable to reproduce this mmu BUG 1872 in
> >> > testcrash.sh loop 100. Please note that when CONFIG_DEBUG_PAGEALLOC
> >> > is unset, I can easily reproduce this mmu BUG 1872 within fewer than
> >> > 50 testcrash.sh loop cycles with PVOPS kernels 2.6.32.24 to 2.6.32.36.
> >> > Now testing with this backport patch to see whether I can
> >> > reproduce this mmu BUG... ...
> >> >
> >> > Kindest regards,
> >> > Giam Teck Choon
> >> >
> >>
> >> I have tested with my backport patch and it is working fine: I am
> >> unable to reproduce the mmu.c 1872 or 1860 bug with
> >> CONFIG_DEBUG_PAGEALLOC not set. I tested with testcrash.sh loop 100
> >> and 1000. Now running testcrash.sh loop 10000.
> >>
> >> Xiaoyun, is it possible for you to test my patch and see whether you
> >> can reproduce the mmu.c 1872/1860 bug?
> >>
> >> Can any of you review my patch?
> >>
> >> In my next reply I will post the patch formatted according to
> >> Documentation/SubmittingPatches, so that it can hopefully be
> >> reviewed.
> >>
> >> Thanks.
> >>
> >> Kindest regards,
> >> Giam Teck Choon
> >