From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniel Stodden
Subject: Re: Re: [PATCH] blkfront: Move blkif_interrupt into a tasklet.
Date: Mon, 27 Sep 2010 02:46:29 -0700
Message-ID: <1285580789.4365.620.camel@ramone.somacoma.net>
References: <1282546470-5547-1-git-send-email-daniel.stodden@citrix.com>
 <1282546470-5547-2-git-send-email-daniel.stodden@citrix.com>
 <4C802934.2000305@goop.org> <4C9B7B69.7080705@redhat.com>
 <4C9B7F1A.2040302@goop.org> <4C9B826B.10302@redhat.com>
 <4C9B9E1D.2040501@goop.org> <4C9C4FDA.1070907@redhat.com>
 <4C9CF2F6.2070806@goop.org> <4CA04AC1.4060902@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
In-Reply-To: <4CA04AC1.4060902@redhat.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Andrew Jones
Cc: Paolo Bonzini, Jeremy Fitzhardinge, Xen, Tom Kopec
List-Id: xen-devel@lists.xenproject.org

On Mon, 2010-09-27 at 03:41 -0400, Andrew Jones wrote:
> On 09/24/2010 08:50 PM, Jeremy Fitzhardinge wrote:
> > On 09/24/2010 12:14 AM, Andrew Jones wrote:
> >> On 09/23/2010 08:36 PM, Jeremy Fitzhardinge wrote:
> >>> On 09/23/2010 09:38 AM, Paolo Bonzini wrote:
> >>>> On 09/23/2010 06:23 PM, Jeremy Fitzhardinge wrote:
> >>>>>> Any developments with this? I've got a report of the exact same
> >>>>>> warnings on RHEL6 guest. See
> >>>>>>
> >>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=632802
> >>>>>>
> >>>>>> RHEL6 doesn't have the 'Move blkif_interrupt into a tasklet' patch, so
> >>>>>> that can be ruled out. Unfortunately I don't have this reproducing on a
> >>>>>> test machine, so it's difficult to debug. The report I have showed that
> >>>>>> in at least one case it occurred on boot up, right after initting the
> >>>>>> block device. I'm trying to get confirmation if that's always the case.
> >>>>>>
> >>>>>> Thanks in advance for any pointers you might have.
> >>>>> Yes, I see it even after reverting that change as well. However I only
> >>>>> see it on my domain with an XFS filesystem, but I haven't dug any deeper
> >>>>> to see if that's relevant.
> >>>>>
> >>>>> Do you know when this appeared? Is it recent? What changes are in the
> >>>>> rhel6 kernel in question?
> >>>> It's got pretty much everything in stable-2.6.32.x, up to the 16 patch
> >>>> blkfront series you posted last July. There are some RHEL-specific
> >>>> workarounds for PV-on-HVM, but for PV domains everything matches
> >>>> upstream.
> >>> Have you tried bisecting to see when this particular problem appeared?
> >>> It looks to me like something is accidentally re-enabling interrupts -
> >>> perhaps a stack overrun is corrupting the "flags" argument between a
> >>> spin_lock_irqsave()/restore pair.
> >>>
> >> Unfortunately I don't have a test machine where I can do a bisection
> >> (yet). I'm looking for one. I only have this one report so far, and it's
> >> on a production machine.
> >
> > The report says that its repeatedly killing the machine though? In my
> > testing, it seems to hit the warning once at boot, but is OK after that
> > (not that I'm doing anything very stressful on the domain).
> >
>
> It looks like the crash is from failing to read swap due to a bad page
> map. It's possibly another issue, but I wanted to try and clean this
> issue up first to see what happens.

Uh oh. Sure this was a frontend crash?
If you see it again, a stack trace to look at would be great.

Thanks,
Daniel