From: Stephen Donnelly
Subject: Re: R/W HG memory mappings with kvm?
Date: Mon, 31 Aug 2009 10:33:21 +1200
To: Avi Kivity
Cc: Cam Macdonell, "kvm@vger.kernel.org list"

On Thu, Aug 27, 2009 at 4:08 PM, Avi Kivity wrote:
> On 08/27/2009 05:34 AM, Stephen Donnelly wrote:
>>
>> On Mon, Aug 24, 2009 at 4:55 PM, Avi Kivity wrote:
>>>
>>> On 08/24/2009 12:59 AM, Stephen Donnelly wrote:
>>>>
>>>> On Thu, Aug 20, 2009 at 12:14 AM, Avi Kivity wrote:
>>>>>
>>>>> On 08/13/2009 07:07 AM, Stephen Donnelly wrote:
>>>>>>
>>>>>> npages = get_user_pages_fast(addr, 1, 1, page); returns -EFAULT,
>>>>>> presumably because (vma->vm_flags & (VM_IO | VM_PFNMAP)).
>>>>>>
>>>>>> It takes the unlikely branch, and checks the vma, but I don't
>>>>>> understand what it is doing here:
>>>>>> pfn = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
>>>>>>
>>>>>
>>>>> It's calculating the pfn according to pfnmap rules.
>>>>>
>>>>
>>>> From what I understand this will only work when remapping 'main
>>>> memory', e.g. where the pgoff is equal to the physical page offset?
>>>> VMAs that remap IO memory will usually set pgoff to 0 for the start of
>>>> the mapping.
>>>>
>>>
>>> If so, how do they calculate the pfn when mapping pages?  kvm needs to be
>>> able to do the same thing.
>>>
>>
>> If the vma->vm_file is /dev/mem, then the pg_off will map to physical
>> addresses directly (at least on x86), and the calculation works. If
>> the vma is remapping io memory from a driver, then vma->vm_file will
>> point to the device node for that driver. Perhaps we can do a check
>> for this at least?
>>
>
> We can't duplicate mm/ in kvm.  However, mm/memory.c says:
>
>
>  * The way we recognize COWed pages within VM_PFNMAP mappings is through the
>  * rules set up by "remap_pfn_range()": the vma will have the VM_PFNMAP bit
>  * set, and the vm_pgoff will point to the first PFN mapped: thus every special
>  * mapping will always honor the rule
>  *
>  *      pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT)
>  *
>  * And for normal mappings this is false.
>
> So it seems the kvm calculation is right and you should set vm_pgoff in your
> driver.

That may be true for COW pages, which are main memory, but I don't
think it is true for device drivers.

In a device driver the mmap function receives the vma from the OS. The
vm_pgoff field contains the offset into the file, in pages. Drivers use
it to decide where the mapping starts relative to the io base address.
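To make the mismatch concrete, here is a minimal user-space sketch (not
kernel code). It assumes 4 KiB pages and a driver that remaps starting at
io_base plus the file offset; struct mock_vma, the helper names and the
addresses used with them are hypothetical, made up for illustration only.

```c
#include <assert.h>
#include <stdio.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages, as on x86 */

/* Hypothetical stand-in for the two vm_area_struct fields the rule uses. */
struct mock_vma {
    unsigned long vm_start; /* first user virtual address of the mapping */
    unsigned long vm_pgoff; /* offset into the mapped file, in pages */
};

/* The pfnmap rule quoted from mm/memory.c (the calculation kvm performs). */
static unsigned long pfnmap_rule_pfn(const struct mock_vma *vma,
                                     unsigned long addr)
{
    assert(addr >= vma->vm_start);
    return vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
}

/*
 * The pfn a driver would really have placed behind addr, assuming it
 * treats vm_pgoff as an offset from its io base address (io_base).
 * For /dev/mem the "io base" is simply 0.
 */
static unsigned long driver_real_pfn(const struct mock_vma *vma,
                                     unsigned long io_base,
                                     unsigned long addr)
{
    assert(addr >= vma->vm_start);
    return (io_base >> PAGE_SHIFT) + vma->vm_pgoff +
           ((addr - vma->vm_start) >> PAGE_SHIFT);
}

/* Prints both results for one address; returns 1 if they agree. */
static int pfnmap_rule_matches(const struct mock_vma *vma,
                               unsigned long io_base, unsigned long addr)
{
    unsigned long guess = pfnmap_rule_pfn(vma, addr);
    unsigned long real = driver_real_pfn(vma, io_base, addr);

    printf("rule pfn 0x%lx, real pfn 0x%lx\n", guess, real);
    return guess == real;
}
```

With a /dev/mem style vma (io_base 0, vm_pgoff equal to the physical page
number) the two values agree. With a driver vma mapped at file offset 0 and
a non-zero io_base they differ, which is the case I am worried about.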
If the driver is mapping io memory to user space it calls
io_remap_pfn_range with the pfn for the io memory. The remap_pfn_range
call sets the VM_IO and VM_PFNMAP bits in vm_flags. It does not alter
the vm_pgoff value.

A simple example is hpet_mmap() in drivers/char/hpet.c, or
mbcs_gscr_mmap() in drivers/char/mbcs.c.

>>>> I'm still not sure how genuine IO memory (mapped from a driver to
>>>> userspace with remap_pfn_range or io_remap_page_range) could be mapped
>>>> into kvm though.
>>>>
>>>
>>> If it can be mapped to userspace, it can be mapped to kvm.  We just need
>>> to synchronize the rules.
>>>
>>
>> We can definitely map it into userspace. The problem seems to be how
>> the kvm kernel module translates the guest pfn back to a host physical
>> address.
>>
>> Is there a kernel equivalent of mmap?
>
> do_mmap(), but don't use it.  Use mmap() from userspace like everyone else.

Of course you are right, gfn_to_pfn is in user space. There is already
a mapping of the memory to the process (from qemu_ram_mmap), the
question is how to look it up.

Regards,
Stephen.