From: Konrad Rzeszutek Wilk
Subject: Re: Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
Date: Thu, 1 Sep 2011 10:23:56 -0400
Message-ID: <20110901142356.GD23971@dumpdata.com>
References: <4E3266DE.9000606@overnetdata.com>
 <20110803152841.GA2860@dumpdata.com>
 <4E4E3957.1040007@overnetdata.com>
 <20110819125615.GA26558@dumpdata.com>
 <4E56B132.9050708@overnetdata.com>
 <20110826142606.GA25511@dumpdata.com>
 <20110826144438.GA24836@dumpdata.com>
 <4E5E6843.7050206@citrix.com>
 <20110831170711.GB13642@dumpdata.com>
 <1314862972.28989.74.camel@zakaz.uk.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1314862972.28989.74.camel@zakaz.uk.xensource.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Ian Campbell
Cc: Todd Deshane, "xen-devel@lists.xensource.com", Jeremy Fitzhardinge, David Vrabel, Anthony Wright
List-Id: xen-devel@lists.xenproject.org

On Thu, Sep 01, 2011 at 08:42:52AM +0100, Ian Campbell wrote:
> On Wed, 2011-08-31 at 18:07 +0100, Konrad Rzeszutek Wilk wrote:
> > On Wed, Aug 31, 2011 at 05:58:43PM +0100, David Vrabel wrote:
> > > On 26/08/11 15:44, Konrad Rzeszutek Wilk wrote:
> > > >
> > > > So while I am still looking at the hypervisor code to figure out why
> > > > it would give me [when trying to map a grant page]:
> > > >
> > > > (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
> > >
> > > It is failing in guest_map_l1e() because the page for the vmalloc'd
> > > virtual address PTEs is not present.
> > >
> > > The test that fails is:
> > >
> > > (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT
> > >
> > > I think this is because the GNTTABOP_map_grant_ref hypercall is done
> > > when task->active_mm != &init_mm and alloc_vm_area() only adds PTEs into
> > > init_mm so when Xen looks in the page tables it doesn't find the entries
> > > because they're not there yet.
> > >
> > > Putting a call to vmalloc_sync_all() after create_vm_area() and before
> > > the hypercall makes it work for me. Classic Xen kernels used to have
> > > such a call.
> >
> > That sounds quite reasonable.
>
> I was wondering why upstream was missing the vmalloc_sync_all() in
> alloc_vm_area() since the out-of-tree kernels did have it and the
> function was added by us. I found this:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=ef691947d8a3d479e67652312783aedcf629320a
>
> commit ef691947d8a3d479e67652312783aedcf629320a
> Author: Jeremy Fitzhardinge
> Date: Wed Dec 1 15:45:48 2010 -0800
>
>     vmalloc: remove vmalloc_sync_all() from alloc_vm_area()
>
>     There's no need for it: it will get faulted into the current pagetable
>     as needed.
>
>     Signed-off-by: Jeremy Fitzhardinge
>
> The flaw in the reasoning here is that you cannot take a kernel fault
> while processing a hypercall, so hypercall arguments must have been
> faulted in beforehand and that is what the sync_all was for.
>
> It's probably fair to say that the Xen specific caller should take care
> of that Xen-specific requirement rather than pushing it into common
> code. On the other hand Xen is the only user and creating a Xen specific
> helper/wrapper seems a bit pointless.

Perhaps, then, the vmalloc_sync_all() (or a more precise one:
vmalloc_sync_one()) should be done in the netback code? And obviously
guarded by the CONFIG_HIGHMEM case?
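The ordering problem discussed in this thread can be illustrated with a toy user-space model in C. This is a sketch under stated assumptions, not kernel or Xen code. Page tables are reduced to one array of L2-entry flags per address space. The names struct toy_mm, toy_alloc_vm_area(), toy_vmalloc_sync(), toy_map_grant_ref() and toy_scenario() are hypothetical stand-ins for init_mm / current->active_mm, alloc_vm_area(), vmalloc_sync_all() and the GNTTABOP_map_grant_ref hypercall. Only the flag test is taken from the thread. The _PAGE_PRESENT (bit 0) and _PAGE_PSE (bit 7) values are the x86 ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-table entry flags: present (bit 0) and page-size/superpage (bit 7). */
#define _PAGE_PRESENT 0x001u
#define _PAGE_PSE     0x080u

/* Hypothetical toy address space: only the flags of each L2 entry are kept.
 * The number of entries is arbitrary. */
#define TOY_NR_L2 8

struct toy_mm {
    uint32_t l2e_flags[TOY_NR_L2];
};

/* The check quoted in the thread: the L2 entry must be present and must not
 * map a superpage, otherwise there is no L1 table to find the PTE in. */
bool l2e_points_to_l1(uint32_t flags)
{
    return (flags & (_PAGE_PRESENT | _PAGE_PSE)) == _PAGE_PRESENT;
}

/* Stand-in for alloc_vm_area(): the new entry is created in init_mm only. */
void toy_alloc_vm_area(struct toy_mm *init_mm, unsigned int slot)
{
    assert(slot < TOY_NR_L2);
    init_mm->l2e_flags[slot] = _PAGE_PRESENT;
}

/* Stand-in for vmalloc_sync_all(): copy every entry that is present in
 * init_mm but missing from the other address space. */
void toy_vmalloc_sync(const struct toy_mm *init_mm, struct toy_mm *mm)
{
    for (unsigned int i = 0; i < TOY_NR_L2; i++) {
        if (!(mm->l2e_flags[i] & _PAGE_PRESENT) &&
            (init_mm->l2e_flags[i] & _PAGE_PRESENT))
            mm->l2e_flags[i] = init_mm->l2e_flags[i];
    }
}

/* Stand-in for the grant-map hypercall: the lookup walks the *active*
 * address space, and a missing entry cannot be faulted in meanwhile. */
bool toy_map_grant_ref(const struct toy_mm *active_mm, unsigned int slot)
{
    assert(slot < TOY_NR_L2);
    return l2e_points_to_l1(active_mm->l2e_flags[slot]);
}

/* Allocate in init_mm, optionally sync, then "hypercall" from a task whose
 * active_mm is not init_mm. Returns whether the lookup succeeded. */
bool toy_scenario(bool sync_before_hypercall)
{
    struct toy_mm init_mm = {{0}};
    struct toy_mm task_mm = {{0}};  /* plays the role of current->active_mm */
    const unsigned int slot = 5;    /* arbitrary vmalloc slot */

    toy_alloc_vm_area(&init_mm, slot);
    if (sync_before_hypercall)
        toy_vmalloc_sync(&init_mm, &task_mm);
    return toy_map_grant_ref(&task_mm, slot);
}
```

In this model, toy_scenario(false) returns false, which mirrors the "Could not find L1 PTE" failure. toy_scenario(true) returns true. The model ignores architecture details, such as which paging configurations actually need the sync. It only shows why the entry must already be in the active page tables before the hypercall is issued.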