From: MaoXiaoyun
Subject: RE: kernel BUG at arch/x86/xen/mmu.c:1872
Date: Tue, 12 Apr 2011 11:30:32 +0800
To: giamteckchoon@gmail.com
Cc: jeremy@goop.org, xen devel, keir@xen.org, ian.campbell@citrix.com,
 konrad.wilk@oracle.com, dave@ivt.com.au
List-Id: xen-devel@lists.xenproject.org

Hi,

I have just kicked off tests with cpuidle=0 and cpufreq=none.

What is your Xen version? Do you use the backend driver from 2.6.32.36?

Besides the TLB bug, I have hit at least two other issues:

1) Xen 4.0.1 + 2.6.32.36 kernel + backend driver from 2.6.31
   ==> causes "Bad grant reference" messages in the serial output.
2) Xen 4.0.1 + 2.6.32.36 kernel with its own backend driver
   ==> causes disk errors like the ones below.

sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
end_request: I/O error, dev tdb, sector 28699593
end_request: I/O error, dev tdb, sector 28699673
end_request: I/O error, dev tdb, sector 28699753
end_request: I/O error, dev tdb, sector 28699833
end_request: I/O error, dev tdb, sector 28699913
end_request: I/O error, dev tdb, sector 28699993
end_request: I/O error, dev tdb, sector 28700073

Thanks.

> Date: Mon, 11 Apr 2011 23:25:19 +0800
> Subject: Re: kernel BUG at arch/x86/xen/mmu.c:1872
> From: giamteckchoon@gmail.com
> To: tinnycloud@hotmail.com
> CC: xen-devel@lists.xensource.com; dave@ivt.com.au;
> ian.campbell@citrix.com; konrad.wilk@oracle.com; jeremy@goop.org;
> keir@xen.org
>
> 2011/4/11 MaoXiaoyun <tinnycloud@hotmail.com>:
> > Hi:
> >
> > I believe this fixes the problem to a large extent.
> > With this patch, my own test cases pass in a run of 30 rounds,
> > each round taking 8 hours. Without this patch, the tests fail in
> > every round within 15 minutes.
> >
> > So this really does fix most of the problem.
> >
> > But during the run I hit another crash. From the log, it looks
> > related to this BUG, since the crash log shows it is TLB related
> > and this BUG is also TLB related.
>
> Are you able to run another test with cpuidle=0 cpufreq=none in the
> kernel boot options? I am just curious whether you can reproduce the
> TLB bug when you boot with cpuidle=0 cpufreq=none...
>
> >
> > Well, I also have poor knowledge of the kernel.
> > I hope someone from Xen Devel can offer some help.
> >
> > Many thanks.
> >
> >> Date: Mon, 11 Apr 2011 20:16:53 +0800
> >> Subject: Re: kernel BUG at arch/x86/xen/mmu.c:1872
> >> From: giamteckchoon@gmail.com
> >> To: tinnycloud@hotmail.com
> >> CC: xen-devel@lists.xensource.com; dave@ivt.com.au;
> >> ian.campbell@citrix.com; konrad.wilk@oracle.com; jeremy@goop.org;
> >> keir@xen.org
> >>
> >> >
> >> > Hi,
> >> >
> >> > Sorry, this mmu related BUG has been troubling me for a very
> >> > long time... I really want to "kill" this BUG, but my knowledge
> >> > of kernel hacking and/or Xen is very limited.
> >> >
> >> > While waiting for Jeremy or Konrad or others ...
> >> >
> >> > Many thanks for spending time to track down this mmu related
> >> > BUG. I have backported the commit from
> >> >
> >> > http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=64141da587241301ce8638cc945f8b67853156ec
> >> > to the 2.6.32.36 PVOPS kernel; the patch is attached. I do not
> >> > know whether I backported it correctly, or whether it affects
> >> > anything else. I am currently testing the 2.6.32.36 PVOPS
> >> > kernel with this patch applied and with CONFIG_DEBUG_PAGEALLOC
> >> > unset. I am currently running testcrash.sh loop 1000, as I was
> >> > unable to reproduce this mmu BUG 1872 in testcrash.sh loop 100.
> >> > Please note that when CONFIG_DEBUG_PAGEALLOC is unset, I can
> >> > reproduce this mmu BUG 1872 easily within <50 testcrash.sh loop
> >> > cycles with PVOPS kernel versions 2.6.32.24 to 2.6.32.36. I am
> >> > now testing with this backport patch to see whether I can
> >> > reproduce this mmu BUG...
> >> >
> >> > Kindest regards,
> >> > Giam Teck Choon
> >> >
> >>
> >> I have tested with my backport patch and it is working fine: I am
> >> unable to reproduce the mmu.c 1872 or 1860 bug with
> >> CONFIG_DEBUG_PAGEALLOC not set. I tested with testcrash.sh loop
> >> 100 and 1000, and am now running testcrash.sh loop 10000.
> >>
> >> Xiaoyun, is it possible for you to test my patch and see whether
> >> you can reproduce the mmu.c 1872/1860 bug?
> >>
> >> Can any of you review my patch?
> >>
> >> I will post a properly formatted patch according to
> >> Documentation/SubmittingPatches in my next reply, and hopefully
> >> it can be reviewed.
> >>
> >> Thanks.
> >>
> >> Kindest regards,
> >> Giam Teck Choon
> >

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel