From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
Date: Fri, 26 Aug 2011 10:44:38 -0400
Message-ID: <20110826144438.GA24836@dumpdata.com>
References: <24093349.14.1311837878822.JavaMail.root@zimbra.overnetdata.com>
 <4E31820C.5030200@overnetdata.com>
 <1311870512.24408.153.camel@cthulhu.hellion.org.uk>
 <4E3266DE.9000606@overnetdata.com>
 <20110803152841.GA2860@dumpdata.com>
 <4E4E3957.1040007@overnetdata.com>
 <20110819125615.GA26558@dumpdata.com>
 <4E56B132.9050708@overnetdata.com>
 <20110826142606.GA25511@dumpdata.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110826142606.GA25511@dumpdata.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Anthony Wright
Cc: Ian Campbell, Todd Deshane, "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

On Fri, Aug 26, 2011 at 10:26:06AM -0400, Konrad Rzeszutek Wilk wrote:
> On Thu, Aug 25, 2011 at 09:31:46PM +0100, Anthony Wright wrote:
> > On 19/08/2011 13:56, Konrad Rzeszutek Wilk wrote:
> > > On Fri, Aug 19, 2011 at 11:22:15AM +0100, Anthony Wright wrote:
> > >> On 03/08/2011 16:28, Konrad Rzeszutek Wilk wrote:
> > >>> On Fri, Jul 29, 2011 at 08:53:02AM +0100, Anthony Wright wrote:
> > >>>> I've just upgraded to xen 4.1.1 with a stock 3.0 kernel on dom0 (with
> > >>>> the vga-support patch backported). I can't get my DomU's to work due to
> > >>>> the phy disks and vifs timing out in DomU and looking through my logs
> > >>>> this morning I'm getting a consistent kernel bug report with xen
> > >>>> mentioned at the top of the stack trace and vifdisconnect mentioned on
> > >>> Yikes! Ian any ideas what to try?
> > >>>
> > >>> Anthony, can you compile the kernel with debug=y and when this happens
> > >>> see what 'xl dmesg' gives?
Also there is the 'xl debug-keys g' which
> > >>> should dump the grants in use.. that might help a bit.
> > >> I've compiled a 3.0.1 kernel with CONFIG_DEBUG=Y (a number of other
> > >> config values appeared at this point, and I took defaults for them).
> > >>
> > >> The output from /var/log/messages & 'xl dmesg' is attached. There was no
> > >> output from 'xl debug-keys g'.
> > > Ok, so I am hitting this too - I was hoping that the patch from Stefano
> > > would have fixed the issue, but sadly it did not.
> > >
> > > Let me (I am traveling right now) see if I can come up with an interim
> > > solution until Ian comes with the right fix.
> > >
> > Hi Konrad - any progress on this - it's a bit of a show stopper for me.
>
> What is interesting is that it happens only with 32-bit guests and with
> not-so-fast hardware: Atom D510 for me and in your case MSI MS-7309 motherboard
> (with what kind of processor?). I've a 64-bit hypervisor - not sure if you
> are using a 32-bit or 64-bit.
>
> I hadn't tried to reproduce this on the Atom D510 with a 64-bit Dom0.
> But I was wondering if you had this setup before - with a 64-bit dom0?
> Or is that really not an option with your CPU?

So while I am still looking at the hypervisor code to figure out
why it would give me:

(XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000

I've cobbled together this patch^H^H^Hhack to retry the transaction to see
if this is a temporary issue (race) or if that L1 PTE is somehow really gone.

If you could, can you try it out and see if the errors that are spit out
are repeated - mainly the "Could not find L1 PTE". You will need to run
the hypervisor with "loglvl=all" to get that information.
You will also need to compile the hypervisor with debug=y to get that output.

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index fd00f25..7bee981 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1607,7 +1607,7 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
 	struct gnttab_map_grant_ref op;
 	struct xen_netif_tx_sring *txs;
 	struct xen_netif_rx_sring *rxs;
-
+	int retry = 3;
 	int err = -ENOMEM;
 	vif->tx_comms_area = alloc_vm_area(PAGE_SIZE);
@@ -1620,7 +1620,8 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
 	gnttab_set_map_op(&op, (unsigned long)vif->tx_comms_area->addr,
 			  GNTMAP_host_map, tx_ring_ref, vif->domid);
-
+	op.status = 0;
+retry_tx:
 	if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
 		BUG();
@@ -1628,6 +1629,8 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
 		netdev_warn(vif->dev,
 			    "failed to map tx ring. err=%d status=%d\n",
 			    err, op.status);
+		if (retry-- > 0)
+			goto retry_tx;
 		err = op.status;
 		goto err;
 	}
@@ -1641,6 +1644,9 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
 	gnttab_set_map_op(&op, (unsigned long)vif->rx_comms_area->addr,
 			  GNTMAP_host_map, rx_ring_ref, vif->domid);
+	retry = 3;
+	op.status = 0;
+retry_rx:
 	if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
 		BUG();
@@ -1648,6 +1654,8 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
 		netdev_warn(vif->dev,
 			    "failed to map rx ring. err=%d status=%d\n",
 			    err, op.status);
+		if (retry-- > 0)
+			goto retry_rx;
 		err = op.status;
 		goto err;
 	}
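
For anyone who wants to see the retry logic outside the kernel, here is a minimal user-space sketch in C of the same bounded-retry idea. It is only an illustration and not part of the patch: map_with_retry(), struct retry_result and fail_n_times() are hypothetical names, and the callback merely stands in for the HYPERVISOR_grant_table_op() call plus the op.status check. With max_retries = 3 it makes up to four attempts, which is what the goto-based hack does.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for one map attempt: returns 0 on success,
 * a negative status on failure. This is NOT a Xen or kernel API. */
typedef int (*map_attempt_fn)(void *ctx);

struct retry_result {
	int status;	/* status reported by the last attempt */
	int attempts;	/* total number of attempts made */
};

/* Try once, then retry up to max_retries more times while the attempt fails. */
static struct retry_result map_with_retry(map_attempt_fn attempt, void *ctx,
					  int max_retries)
{
	struct retry_result res = { 0, 0 };

	assert(attempt != NULL && max_retries >= 0);

	do {
		res.status = attempt(ctx);
		res.attempts++;
		if (res.status == 0)
			break;
		/* Log every failure so repeated errors are visible. */
		fprintf(stderr, "failed to map ring. status=%d attempt=%d\n",
			res.status, res.attempts);
	} while (res.attempts <= max_retries);

	return res;
}

/* Fake attempt for illustration: fails *ctx times, then succeeds.
 * A small count models a transient race; a large one a persistent failure. */
static int fail_n_times(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -1;
	}
	return 0;
}
```

If the fake attempt fails twice and then succeeds, map_with_retry() reports status 0 after three attempts, which corresponds to the "temporary issue (race)" case. If it keeps failing, the last status comes back after four attempts, which corresponds to the case where the L1 PTE is really gone.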