From: Stephen Donnelly
Subject: Re: R/W HG memory mappings with kvm?
Date: Tue, 1 Sep 2009 09:13:03 +1200
Message-ID: <5f370d430908311413m6cb5951agf8f24e00b75b1eb1@mail.gmail.com>
References: <5f370d430907051541o752d3dbag80d5cb251e5e4d00@mail.gmail.com>
 <5f370d430907281606j77f0c1a6j5feb081daca187ff@mail.gmail.com>
 <5f370d430908122107j15acd2c7i96d476e69032fadd@mail.gmail.com>
 <4A8BEC92.6070105@redhat.com>
 <5f370d430908231459q4c8cfe3j62c49e33a160ab71@mail.gmail.com>
 <4A921D3C.6020809@redhat.com>
 <5f370d430908261934m15f39ab9mf54a19bdee1f278f@mail.gmail.com>
 <4A9606C5.4060607@redhat.com>
 <5f370d430908301533l1068692j1ed902a268f0ae41@mail.gmail.com>
 <4A9B8D5D.2070209@redhat.com>
In-Reply-To: <4A9B8D5D.2070209@redhat.com>
To: Avi Kivity
Cc: Cam Macdonell, "kvm@vger.kernel.org list"

On Mon, Aug 31, 2009 at 8:44 PM, Avi Kivity wrote:
> On 08/31/2009 01:33 AM, Stephen Donnelly wrote:
>>
>>> We can't duplicate mm/ in kvm.  However, mm/memory.c says:
>>>
>>>  * The way we recognize COWed pages within VM_PFNMAP mappings is through the
>>>  * rules set up by "remap_pfn_range()": the vma will have the VM_PFNMAP bit
>>>  * set, and the vm_pgoff will point to the first PFN mapped: thus every special
>>>  * mapping will always honor the rule
>>>  *
>>>  *      pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT)
>>>  *
>>>  * And for normal mappings this is false.
>>>
>>> So it seems the kvm calculation is right and you should set vm_pgoff in
>>> your driver.
>>>
>>
>> That may be true for COW pages, which are main memory, but I don't
>> think it is true for device drivers.
>>
>
> No, COW pages have no linear pfn mapping.  It's only true for
> remap_pfn_range().
>
>> In a device driver the mmap function receives the vma from the OS. The
>> vm_pgoff field contains the offset area in the file. For drivers this
>> is used to determine where to start the map compared to the io base
>> address.
>>
>> If the driver is mapping io memory to user space it calls
>> io_remap_pfn_range with the pfn for the io memory. The remap_pfn_range
>> call sets the VM_IO and VM_PFNMAP bits in vm_flags. It does not alter
>> the vm_pgoff value.
>>
>> A simple example is hpet_mmap() in drivers/char/hpet.c, or
>> mbcs_gscr_mmap() in drivers/char/mbcs.c.
>>
>
> io_remap_pfn_range() is remap_pfn_range(), which has this:
>
>        if (addr == vma->vm_start && end == vma->vm_end) {
>                vma->vm_pgoff = pfn;
>                vma->vm_flags |= VM_PFN_AT_MMAP;
>        }
>
> So remap_pfn_range() will alter the pgoff.

Aha! We are looking at different kernels. I should have mentioned I
was looking at 2.6.28. In mm/memory.c, remap_pfn_range() has this:

         * There's a horrible special case to handle copy-on-write
         * behaviour that some programs depend on. We mark the "original"
         * un-COW'ed pages by matching them up with "vma->vm_pgoff".
         */
        if (is_cow_mapping(vma->vm_flags)) {
                if (addr != vma->vm_start || end != vma->vm_end)
                        return -EINVAL;
                vma->vm_pgoff = pfn;
        }

The helper is:

static inline int is_cow_mapping(unsigned int flags)
{
        return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

Because my vma is marked shared, this clause does not operate and
vm_pgoff is not modified (it is still 0).

> I'm totally confused now.

Sorry about that.
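
To make the flag test concrete, here is a minimal user-space sketch of
the 2.6.28 behaviour described above. It is only an illustration, not
kernel code: struct toy_vma and toy_remap_pgoff() are hypothetical
names, the flag values are copied by hand from include/linux/mm.h, and
only the vm_pgoff handling is modelled (no page tables are touched).

```c
#include <stdbool.h>

/* Flag values as I read them in include/linux/mm.h (assumption: unchanged
 * in the kernel you are looking at). */
#define VM_SHARED   0x00000008UL
#define VM_MAYWRITE 0x00000020UL

/* Hypothetical stand-in for the few vm_area_struct fields used here. */
struct toy_vma {
        unsigned long vm_start;
        unsigned long vm_end;
        unsigned long vm_pgoff;
        unsigned long vm_flags;
};

/* Same test as the kernel helper: private (not shared) but may be written. */
static bool is_cow_mapping(unsigned long flags)
{
        return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/* Models only the vm_pgoff branch of the 2.6.28 remap_pfn_range().
 * Returns 0 on success, -1 where the kernel would return -EINVAL. */
static int toy_remap_pgoff(struct toy_vma *vma, unsigned long addr,
                           unsigned long size, unsigned long pfn)
{
        unsigned long end = addr + size;

        if (is_cow_mapping(vma->vm_flags)) {
                if (addr != vma->vm_start || end != vma->vm_end)
                        return -1;
                vma->vm_pgoff = pfn;
        }
        return 0;
}
```

With vm_flags = VM_SHARED | VM_MAYWRITE the call succeeds but leaves
vm_pgoff at 0; with VM_MAYWRITE alone (a private mapping) vm_pgoff is
set to the pfn passed in.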
The issue is the BUG in gfn_to_pfn where the pfn is not calculated
correctly after looking up the vma. I still don't see how to get the
physical address from the vma, since vm_pgoff is zero, and the vm_ops
are not filled. The vma does not seem to store the physical base
address.

Regards,
Stephen.
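
P.S. A small sketch of the calculation in question, assuming the
linear rule quoted from mm/memory.c and 4 KiB pages. The names
toy_linear_pfn() and TOY_PAGE_SHIFT are hypothetical, and the addresses
and pfn in the note below are made up for illustration.

```c
/* Assumption: 4 KiB pages, as on x86. */
#define TOY_PAGE_SHIFT 12

/* pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT) */
static unsigned long toy_linear_pfn(unsigned long vm_pgoff,
                                    unsigned long vm_start,
                                    unsigned long addr)
{
        return vm_pgoff + ((addr - vm_start) >> TOY_PAGE_SHIFT);
}
```

For an address two pages into a vma starting at 0x40000000, this gives
pfn 2 when vm_pgoff is 0, which is just the page index within the
mapping. It only yields the device pfn (for example 0xfeb02 for a
region starting at pfn 0xfeb00) if vm_pgoff holds that base pfn.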