From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefano Stabellini
Subject: Re: Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
Date: Fri, 26 Aug 2011 13:32:00 +0100
Message-ID:
References: <29902981.10.1311837224851.JavaMail.root@zimbra.overnetdata.com>
 <24093349.14.1311837878822.JavaMail.root@zimbra.overnetdata.com>
 <4E31820C.5030200@overnetdata.com>
 <1311870512.24408.153.camel@cthulhu.hellion.org.uk>
 <4E3266DE.9000606@overnetdata.com>
 <20110803152841.GA2860@dumpdata.com>
 <4E4E3957.1040007@overnetdata.com>
 <20110819125615.GA26558@dumpdata.com>
 <4E56BA90.3050907@overnetdata.com>
 <4E578E5D.2010701@overnetdata.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Return-path:
In-Reply-To: <4E578E5D.2010701@overnetdata.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Anthony Wright
Cc: "xen-devel@lists.xensource.com" , Wilk , Stefano Stabellini , Ian Campbell , Konrad, Todd Deshane
List-Id: xen-devel@lists.xenproject.org

On Fri, 26 Aug 2011, Anthony Wright wrote:
> On 26/08/2011 13:16, Stefano Stabellini wrote:
> > On Thu, 25 Aug 2011, Anthony Wright wrote:
> >> On 19/08/2011 13:56, Konrad Rzeszutek Wilk wrote:
> >>> On Fri, Aug 19, 2011 at 11:22:15AM +0100, Anthony Wright wrote:
> >>>> On 03/08/2011 16:28, Konrad Rzeszutek Wilk wrote:
> >>>>> On Fri, Jul 29, 2011 at 08:53:02AM +0100, Anthony Wright wrote:
> >>>>>> I've just upgraded to xen 4.1.1 with a stock 3.0 kernel on dom0 (with
> >>>>>> the vga-support patch backported). I can't get my DomU's to work due to
> >>>>>> the phy disks and vifs timing out in DomU and looking through my logs
> >>>>>> this morning I'm getting a consistent kernel bug report with xen
> >>>>>> mentioned at the top of the stack trace and vifdisconnect mentioned on
> >>>>> Yikes! Ian any ideas what to try?
> >>>>>
> >>>>> Anthony, can you compile the kernel with debug=y and when this happens
> >>>>> see what 'xl dmesg' gives? Also there is also the 'xl debug-keys g' which
> >>>>> should dump the grants in use.. that might help a bit.
> >>>> I've compiled a 3.0.1 kernel with CONFIG_DEBUG=Y (a number of other
> >>>> config values appeared at this point, and I took defaults for them).
> >>>>
> >>>> The output from /var/log/messages & 'xl dmesg' is attached. There was no
> >>>> output from 'xl debug-keys g'.
> >>> Ok, so I am hitting this too - I was hoping that the patch from Stefano
> >>> would have fixed the issue, but sadly it did not.
> >>>
> >>> Let me (I am traveling right now) see if I can come up with an internim
> >>> solution until Ian comes with the right fix.
> >>>
> >> On different hardware with the same software I'm also getting problems
> >> starting DomUs, but this time the error is different. I've attached a
> >> copy of the xl console output, but basically the server hang at
> >> "Mount-cache hash table entries: 512". Again the VM is paravirtualised,
> >> and again I get a qemu-dm process for it.
> >>
> >> The references to this message are normally related to memory issues,
> >> but the server has only 1000M of ram, so can't see it causing too much
> >> of a problem.
> >>
> >> Is this related to the other problems I'm seeing or completely separate?
> > Could you please post your VM config file?
> Attached are two VM config files. The file xen-config-A is the xen
> server that fails at the Mount-cache line. The file xen-config-B is the
> xen server that timeout attaching to some of the xvds and the vif.
>

Can you try using losetup to set up a loop device for each of the tap:aio
files you have, and then specify phy:/dev/loopN in the config file rather
than tap:aio?

For example:

    losetup /dev/loop0 /workspace/agent/appliances/XenFileServer-3.20/rootfs

then in the config file:

    phy:/dev/loop0,xvda1,r
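
If there are several tap:aio disks, the same substitution can be generated
rather than typed by hand. The helper below is only a rough sketch:
tap_to_phy is a made-up name, and it assumes each disk entry has the form
tap:aio:<path>,<vdev>,<mode>, that the image paths contain no commas, and
that /dev/loop0, /dev/loop1, ... are free (check with `losetup -a` first).
It only prints the losetup commands and the rewritten disk entries so they
can be reviewed; it does not run anything itself.

```shell
#!/bin/sh
# Sketch only: print the losetup command and the replacement phy: entry
# for every tap:aio disk spec given as an argument. Nothing is executed.
# Assumptions: specs look like tap:aio:<path>,<vdev>,<mode>, paths contain
# no commas, and loop devices are handed out in order starting at loop0.
tap_to_phy() {
    n=0
    for spec in "$@"; do
        case "$spec" in
            tap:aio:*)
                rest=${spec#tap:aio:}   # <path>,<vdev>,<mode>
                path=${rest%%,*}        # image file path
                tail=${rest#*,}         # <vdev>,<mode>
                echo "losetup /dev/loop$n $path"
                echo "phy:/dev/loop$n,$tail"
                n=$((n + 1))
                ;;
            *)
                # Entries that are not tap:aio are passed through unchanged.
                echo "$spec"
                ;;
        esac
    done
}
```

Called as

    tap_to_phy 'tap:aio:/workspace/agent/appliances/XenFileServer-3.20/rootfs,xvda1,r'

it prints the losetup line and the phy:/dev/loop0,xvda1,r entry from the
example above. The losetup commands need to be run as root in dom0 before
the guest is started.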