kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: R/W HG memory mappings with kvm?
@ 2009-09-28 18:27 Tsuyoshi Ozawa
  0 siblings, 0 replies; 31+ messages in thread
From: Tsuyoshi Ozawa @ 2009-09-28 18:27 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

Hello,

>>  Sorry about that. The issue is the BUG in gfn_to_pgn where the pfn is
>>  not calculated correctly after looking up the vma.

>>  I still don't see how to get the physical address from the vma, since
>>  vm_pgoff is zero, and the vm_ops are not filled. The vma does not seem
>>  to store the physical base address.

> So it seems the only place the pfns are stored are in the ptes themselves. Is there an API to recover the ptes from a virtual address? We could use that instead.

I'm also trying to share H/G memory with another solution -
by overwriting shadow page table.

It seems that gfn_to_pfn is the key function which associate
guest memoy with host memory. So I changed gfn_to_pfn
as follows:

pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
{
    ...
    } else
        if ( shared_gfn && shared_gfn == gfn ){
            return shared_pfn;  // return pfn which is wanted to share
        }else {
            pfn = page_to_pfn(page[0]);
        }
    }
    ...
}

Here, shared_gfn is registered by walking soft mmu with gva.
And shared_pfn is the page frame number which is hostside.
By rewriting adobe, kvm is foxed and make up new shadow
page table with new mapping after zap all pages.

But I failed to share the memory. Do I have any misunderstanding?

Regards,
Tsuyoshi Ozawa

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-08-31 21:13                                       ` Stephen Donnelly
@ 2009-09-09 12:50                                         ` Avi Kivity
  0 siblings, 0 replies; 31+ messages in thread
From: Avi Kivity @ 2009-09-09 12:50 UTC (permalink / raw)
  To: Stephen Donnelly; +Cc: Cam Macdonell, kvm@vger.kernel.org list

On 09/01/2009 12:13 AM, Stephen Donnelly wrote:
>
>> I'm totally confused now.
>>      
> Sorry about that. The issue is the BUG in gfn_to_pgn where the pfn is
> not calculated correctly after looking up the vma.
>
> I still don't see how to get the physical address from the vma, since
> vm_pgoff is zero, and the vm_ops are not filled. The vma does not seem
> to store the physical base address.
>    

So it seems the only place the pfns are stored are in the ptes 
themselves.  Is there an API to recover the ptes from a virtual 
address?  We could use that instead.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-08-31  8:44                                     ` Avi Kivity
@ 2009-08-31 21:13                                       ` Stephen Donnelly
  2009-09-09 12:50                                         ` Avi Kivity
  0 siblings, 1 reply; 31+ messages in thread
From: Stephen Donnelly @ 2009-08-31 21:13 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Cam Macdonell, kvm@vger.kernel.org list

On Mon, Aug 31, 2009 at 8:44 PM, Avi Kivity<avi@redhat.com> wrote:
> On 08/31/2009 01:33 AM, Stephen Donnelly wrote:
>>
>>> We can't duplicate mm/ in kvm.  However, mm/memory.c says:
>>>
>>>  * The way we recognize COWed pages within VM_PFNMAP mappings is through
>>> the
>>>  * rules set up by "remap_pfn_range()": the vma will have the VM_PFNMAP
>>> bit
>>>  * set, and the vm_pgoff will point to the first PFN mapped: thus every
>>> special
>>>  * mapping will always honor the rule
>>>  *
>>>  *      pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start)>>
>>> PAGE_SHIFT)
>>>  *
>>>  * And for normal mappings this is false.
>>>
>>> So it seems the kvm calculation is right and you should set vm_pgoff in
>>> your
>>> driver.
>>>
>>
>> That may be true for COW pages, which are main memory, but I don't
>> think it is true for device drivers.
>>
>
> No, COW pages have no linear pfn mapping.  It's only true for
> remap_pfn_range).
>
>> In a device driver the mmap function receives the vma from the OS. The
>> vm_pgoff field contains the offset area in the file. For drivers this
>> is used to determine where to start the map compared to the io base
>> address.
>>
>> If the driver is mapping io memory to user space it calls
>> io_remap_pfn_range with the pfn for the io memory. The remap_pfn_range
>> call sets the VM_IO and VM_PFNMAP bits in vm_flags. It does not alter
>> the vm_pgoff value.
>>
>> A simple example is hpet_mmap() in drivers/char/hpet.c, or
>> mbcs_gscr_mmap() in drivers/char/mbcs.c.
>>
>
> io_remap_pfn_range() is remap_pfn_range(), which has this:
>
>        if (addr == vma->vm_start && end == vma->vm_end) {
>                vma->vm_pgoff = pfn;
>                vma->vm_flags |= VM_PFN_AT_MMAP;
>        }
>
> So remap_pfn_range() will alter the pgoff.

Aha! We are looking at different kernels. I should have mentioned I
was looking at 2.6.28. In mm/memory.c remap_pfn_range() this has:

	 * There's a horrible special case to handle copy-on-write
	 * behaviour that some programs depend on. We mark the "original"
	 * un-COW'ed pages by matching them up with "vma->vm_pgoff".
	 */
	if (is_cow_mapping(vma->vm_flags)) {
		if (addr != vma->vm_start || end != vma->vm_end)
			return -EINVAL;
		vma->vm_pgoff = pfn;
	}

The macro is:

static inline int is_cow_mapping(unsigned int flags)
{
	return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

Because my vma is marked shared, this clause does not operate and
vm_pgoff is not modified (it is still 0).

> I'm totally confused now.

Sorry about that. The issue is the BUG in gfn_to_pgn where the pfn is
not calculated correctly after looking up the vma.

I still don't see how to get the physical address from the vma, since
vm_pgoff is zero, and the vm_ops are not filled. The vma does not seem
to store the physical base address.

Regards,
Stephen.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-08-30 22:33                                   ` Stephen Donnelly
@ 2009-08-31  8:44                                     ` Avi Kivity
  2009-08-31 21:13                                       ` Stephen Donnelly
  0 siblings, 1 reply; 31+ messages in thread
From: Avi Kivity @ 2009-08-31  8:44 UTC (permalink / raw)
  To: Stephen Donnelly; +Cc: Cam Macdonell, kvm@vger.kernel.org list

On 08/31/2009 01:33 AM, Stephen Donnelly wrote:
>
>> We can't duplicate mm/ in kvm.  However, mm/memory.c says:
>>
>>
>>   * The way we recognize COWed pages within VM_PFNMAP mappings is through the
>>   * rules set up by "remap_pfn_range()": the vma will have the VM_PFNMAP bit
>>   * set, and the vm_pgoff will point to the first PFN mapped: thus every
>> special
>>   * mapping will always honor the rule
>>   *
>>   *      pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start)>>
>> PAGE_SHIFT)
>>   *
>>   * And for normal mappings this is false.
>>
>> So it seems the kvm calculation is right and you should set vm_pgoff in your
>> driver.
>>      
> That may be true for COW pages, which are main memory, but I don't
> think it is true for device drivers.
>    

No, COW pages have no linear pfn mapping.  It's only true for 
remap_pfn_range).

> In a device driver the mmap function receives the vma from the OS. The
> vm_pgoff field contains the offset area in the file. For drivers this
> is used to determine where to start the map compared to the io base
> address.
>
> If the driver is mapping io memory to user space it calls
> io_remap_pfn_range with the pfn for the io memory. The remap_pfn_range
> call sets the VM_IO and VM_PFNMAP bits in vm_flags. It does not alter
> the vm_pgoff value.
>
> A simple example is hpet_mmap() in drivers/char/hpet.c, or
> mbcs_gscr_mmap() in drivers/char/mbcs.c.
>    

io_remap_pfn_range() is remap_pfn_range(), which has this:

         if (addr == vma->vm_start && end == vma->vm_end) {
                 vma->vm_pgoff = pfn;
                 vma->vm_flags |= VM_PFN_AT_MMAP;
         }

So remap_pfn_range() will alter the pgoff.

>> do_mmap(), but don't use it.  Use mmap() from userspace like everyone else.
>>      
> Of course you are right, gfn_to_pfn is in user space. There is already
> a mapping of the memory to the process (from qemu_ram_mmap), the
> question is how to look it up.
>    

I'm totally confused now.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-08-27  4:08                                 ` Avi Kivity
@ 2009-08-30 22:33                                   ` Stephen Donnelly
  2009-08-31  8:44                                     ` Avi Kivity
  0 siblings, 1 reply; 31+ messages in thread
From: Stephen Donnelly @ 2009-08-30 22:33 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Cam Macdonell, kvm@vger.kernel.org list

On Thu, Aug 27, 2009 at 4:08 PM, Avi Kivity<avi@redhat.com> wrote:
> On 08/27/2009 05:34 AM, Stephen Donnelly wrote:
>>
>> On Mon, Aug 24, 2009 at 4:55 PM, Avi Kivity<avi@redhat.com>  wrote:
>>
>>>
>>> On 08/24/2009 12:59 AM, Stephen Donnelly wrote:
>>>
>>>>
>>>> On Thu, Aug 20, 2009 at 12:14 AM, Avi Kivity<avi@redhat.com>    wrote:
>>>>
>>>>>
>>>>> On 08/13/2009 07:07 AM, Stephen Donnelly wrote:
>>>>>
>>>>>>
>>>>>> npages = get_user_pages_fast(addr, 1, 1, page); returns -EFAULT,
>>>>>> presumably because (vma->vm_flags&      (VM_IO | VM_PFNMAP)).
>>>>>>
>>>>>> It takes then unlikely branch, and checks the vma, but I don't
>>>>>> understand what it is doing here: pfn = ((addr - vma->vm_start)>>
>>>>>> PAGE_SHIFT) + vma->vm_pgoff;
>>>>>>
>>>>>
>>>>> It's calculating the pfn according to pfnmap rules.
>>>>>
>>>>
>>>>  From what I understand this will only work when remapping 'main
>>>> memory', e.g. where the pgoff is equal to the physical page offset?
>>>> VMAs that remap IO memory will usually set pgoff to 0 for the start of
>>>> the mapping.
>>>>
>>>
>>> If so, how do they calculate the pfn when mapping pages?  kvm needs to be
>>> able to do the same thing.
>>>
>>
>> If the vma->vm_file is /dev/mem, then the pg_off will map to physical
>> addresses directly (at least on x86), and the calculation works. If
>> the vma is remapping io memory from a driver, then vma->vm_file will
>> point to the device node for that driver. Perhaps we can do a check
>> for this at least?
>>
>
> We can't duplicate mm/ in kvm.  However, mm/memory.c says:
>
>
>  * The way we recognize COWed pages within VM_PFNMAP mappings is through the
>  * rules set up by "remap_pfn_range()": the vma will have the VM_PFNMAP bit
>  * set, and the vm_pgoff will point to the first PFN mapped: thus every
> special
>  * mapping will always honor the rule
>  *
>  *      pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >>
> PAGE_SHIFT)
>  *
>  * And for normal mappings this is false.
>
> So it seems the kvm calculation is right and you should set vm_pgoff in your
> driver.

That may be true for COW pages, which are main memory, but I don't
think it is true for device drivers.

In a device driver the mmap function receives the vma from the OS. The
vm_pgoff field contains the offset area in the file. For drivers this
is used to determine where to start the map compared to the io base
address.

If the driver is mapping io memory to user space it calls
io_remap_pfn_range with the pfn for the io memory. The remap_pfn_range
call sets the VM_IO and VM_PFNMAP bits in vm_flags. It does not alter
the vm_pgoff value.

A simple example is hpet_mmap() in drivers/char/hpet.c, or
mbcs_gscr_mmap() in drivers/char/mbcs.c.

>>>> I'm still not sure how genuine IO memory (mapped from a driver to
>>>> userspace with remap_pfn_range or io_remap_page_range) could be mapped
>>>> into kvm though.
>>>>
>>>
>>> If it can be mapped to userspace, it can be mapped to kvm.  We just need
>>> to
>>> synchronize the rules.
>>>
>>
>> We can definitely map it into userspace. The problem seems to be how
>> the kvm kernel module translates the guest pfn back to a host physical
>> address.
>>
>> Is there a kernel equivalent of mmap?
>
> do_mmap(), but don't use it.  Use mmap() from userspace like everyone else.

Of course you are right, gfn_to_pfn is in user space. There is already
a mapping of the memory to the process (from qemu_ram_mmap), the
question is how to look it up.

Regards,
Stephen.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-08-27  2:34                               ` Stephen Donnelly
@ 2009-08-27  4:08                                 ` Avi Kivity
  2009-08-30 22:33                                   ` Stephen Donnelly
  0 siblings, 1 reply; 31+ messages in thread
From: Avi Kivity @ 2009-08-27  4:08 UTC (permalink / raw)
  To: Stephen Donnelly; +Cc: Cam Macdonell, kvm@vger.kernel.org list

On 08/27/2009 05:34 AM, Stephen Donnelly wrote:
> On Mon, Aug 24, 2009 at 4:55 PM, Avi Kivity<avi@redhat.com>  wrote:
>    
>> On 08/24/2009 12:59 AM, Stephen Donnelly wrote:
>>      
>>> On Thu, Aug 20, 2009 at 12:14 AM, Avi Kivity<avi@redhat.com>    wrote:
>>>        
>>>> On 08/13/2009 07:07 AM, Stephen Donnelly wrote:
>>>>          
>>>>> npages = get_user_pages_fast(addr, 1, 1, page); returns -EFAULT,
>>>>> presumably because (vma->vm_flags&      (VM_IO | VM_PFNMAP)).
>>>>>
>>>>> It takes then unlikely branch, and checks the vma, but I don't
>>>>> understand what it is doing here: pfn = ((addr - vma->vm_start)>>
>>>>> PAGE_SHIFT) + vma->vm_pgoff;
>>>>>            
>>>> It's calculating the pfn according to pfnmap rules.
>>>>          
>>>   From what I understand this will only work when remapping 'main
>>> memory', e.g. where the pgoff is equal to the physical page offset?
>>> VMAs that remap IO memory will usually set pgoff to 0 for the start of
>>> the mapping.
>>>        
>> If so, how do they calculate the pfn when mapping pages?  kvm needs to be
>> able to do the same thing.
>>      
> If the vma->vm_file is /dev/mem, then the pg_off will map to physical
> addresses directly (at least on x86), and the calculation works. If
> the vma is remapping io memory from a driver, then vma->vm_file will
> point to the device node for that driver. Perhaps we can do a check
> for this at least?
>    

We can't duplicate mm/ in kvm.  However, mm/memory.c says:


  * The way we recognize COWed pages within VM_PFNMAP mappings is 
through the
  * rules set up by "remap_pfn_range()": the vma will have the VM_PFNMAP bit
  * set, and the vm_pgoff will point to the first PFN mapped: thus every 
special
  * mapping will always honor the rule
  *
  *      pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >> 
PAGE_SHIFT)
  *
  * And for normal mappings this is false.

So it seems the kvm calculation is right and you should set vm_pgoff in 
your driver.

>
>
>>> I'm still not sure how genuine IO memory (mapped from a driver to
>>> userspace with remap_pfn_range or io_remap_page_range) could be mapped
>>> into kvm though.
>>>        
>> If it can be mapped to userspace, it can be mapped to kvm.  We just need to
>> synchronize the rules.
>>      
> We can definitely map it into userspace. The problem seems to be how
> the kvm kernel module translates the guest pfn back to a host physical
> address.
>
> Is there a kernel equivalent of mmap?
>    

do_mmap(), but don't use it.  Use mmap() from userspace like everyone else.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-08-26 10:22                               ` Avi Kivity
@ 2009-08-27  2:39                                 ` Stephen Donnelly
  0 siblings, 0 replies; 31+ messages in thread
From: Stephen Donnelly @ 2009-08-27  2:39 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Cam Macdonell, kvm@vger.kernel.org list, Marcelo Tosatti, Chris Wright

On Wed, Aug 26, 2009 at 10:22 PM, Avi Kivity<avi@redhat.com> wrote:
> On 08/24/2009 07:55 AM, Avi Kivity wrote:
>>
>> On 08/24/2009 12:59 AM, Stephen Donnelly wrote:
>>>
>>> On Thu, Aug 20, 2009 at 12:14 AM, Avi Kivity<avi@redhat.com>  wrote:
>>>>
>>>> On 08/13/2009 07:07 AM, Stephen Donnelly wrote:
>>>>>
>>>>> npages = get_user_pages_fast(addr, 1, 1, page); returns -EFAULT,
>>>>> presumably because (vma->vm_flags&    (VM_IO | VM_PFNMAP)).
>>>>>
>>>>> It takes then unlikely branch, and checks the vma, but I don't
>>>>> understand what it is doing here: pfn = ((addr - vma->vm_start)>>
>>>>> PAGE_SHIFT) + vma->vm_pgoff;
>>>>
>>>> It's calculating the pfn according to pfnmap rules.
>>>
>>>  From what I understand this will only work when remapping 'main
>>> memory', e.g. where the pgoff is equal to the physical page offset?
>>> VMAs that remap IO memory will usually set pgoff to 0 for the start of
>>> the mapping.
>>
>> If so, how do they calculate the pfn when mapping pages?  kvm needs to be
>> able to do the same thing.
>
> Maybe the simplest thing is to call vma->vm_ops->fault here.  Marcelo/Chris?
>  Context is improving gfn_to_pfn() on the mmio path.

If the mapping is made using remap_pfn_range (or io_remap_pfn_range)
then there is are vm_ops attached by default.

gfn_to_pfn: vma 0xffff88022c50d498 start 0x7f4b0de9f000 pgoff 0x0
flags 0x844fb vm_ops 0x0000000000000000 fault 0x0000000000000000 file
0xffff88022e408000 major 250 minor 32

From linux/mm.h:

#define VM_PFNMAP	0x00000400	/* Page-ranges managed without "struct
page", just pure PFN */

Stephen.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-08-24  4:55                             ` Avi Kivity
  2009-08-26 10:22                               ` Avi Kivity
@ 2009-08-27  2:34                               ` Stephen Donnelly
  2009-08-27  4:08                                 ` Avi Kivity
  1 sibling, 1 reply; 31+ messages in thread
From: Stephen Donnelly @ 2009-08-27  2:34 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Cam Macdonell, kvm@vger.kernel.org list

On Mon, Aug 24, 2009 at 4:55 PM, Avi Kivity<avi@redhat.com> wrote:
> On 08/24/2009 12:59 AM, Stephen Donnelly wrote:
>>
>> On Thu, Aug 20, 2009 at 12:14 AM, Avi Kivity<avi@redhat.com>  wrote:
>>> On 08/13/2009 07:07 AM, Stephen Donnelly wrote:
>>>>
>>>> npages = get_user_pages_fast(addr, 1, 1, page); returns -EFAULT,
>>>> presumably because (vma->vm_flags&    (VM_IO | VM_PFNMAP)).
>>>>
>>>> It takes then unlikely branch, and checks the vma, but I don't
>>>> understand what it is doing here: pfn = ((addr - vma->vm_start)>>
>>>> PAGE_SHIFT) + vma->vm_pgoff;
>>>
>>> It's calculating the pfn according to pfnmap rules.
>>
>>  From what I understand this will only work when remapping 'main
>> memory', e.g. where the pgoff is equal to the physical page offset?
>> VMAs that remap IO memory will usually set pgoff to 0 for the start of
>> the mapping.
>
> If so, how do they calculate the pfn when mapping pages?  kvm needs to be
> able to do the same thing.

If the vma->vm_file is /dev/mem, then the pg_off will map to physical
addresses directly (at least on x86), and the calculation works. If
the vma is remapping io memory from a driver, then vma->vm_file will
point to the device node for that driver. Perhaps we can do a check
for this at least?

>>>> In my case addr == vma->vm_start, and vma->vm_pgoff == 0, so pfn ==0.
>>>
>>> How did you set up that vma?  It should point to the first pfn of your
>>> special memory area.
>>
>> The vma was created with a remap_pfn_range call from another driver.
>> Because this call sets VM_PFNMAP and VM_IO any get_user_pages(_fast)
>> calls will fail.
>>
>> In this case the host driver was actually just remapping host memory,
>> so I replaced the remap_pfn_range call with a nopage/fault vm_op. This
>> allows the get_user_pages_fast call to succeed, and the mapping now
>> works as expected. This is sufficient for my work at the moment.
>
> Well if the fix is correct we need it too.

The change is to the external (host) driver. If I submit my device for
inclusion upstream then the changes for that driver will be needed as
well but would not be part of the qemu-kvm tree.

>> I'm still not sure how genuine IO memory (mapped from a driver to
>> userspace with remap_pfn_range or io_remap_page_range) could be mapped
>> into kvm though.
>
> If it can be mapped to userspace, it can be mapped to kvm.  We just need to
> synchronize the rules.

We can definitely map it into userspace. The problem seems to be how
the kvm kernel module translates the guest pfn back to a host physical
address.

Is there a kernel equivalent of mmap?

Stephen.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-08-24  4:55                             ` Avi Kivity
@ 2009-08-26 10:22                               ` Avi Kivity
  2009-08-27  2:39                                 ` Stephen Donnelly
  2009-08-27  2:34                               ` Stephen Donnelly
  1 sibling, 1 reply; 31+ messages in thread
From: Avi Kivity @ 2009-08-26 10:22 UTC (permalink / raw)
  To: Stephen Donnelly
  Cc: Cam Macdonell, kvm@vger.kernel.org list, Marcelo Tosatti, Chris Wright

On 08/24/2009 07:55 AM, Avi Kivity wrote:
> On 08/24/2009 12:59 AM, Stephen Donnelly wrote:
>> On Thu, Aug 20, 2009 at 12:14 AM, Avi Kivity<avi@redhat.com>  wrote:
>>> On 08/13/2009 07:07 AM, Stephen Donnelly wrote:
>>>> npages = get_user_pages_fast(addr, 1, 1, page); returns -EFAULT,
>>>> presumably because (vma->vm_flags&    (VM_IO | VM_PFNMAP)).
>>>>
>>>> It takes then unlikely branch, and checks the vma, but I don't
>>>> understand what it is doing here: pfn = ((addr - vma->vm_start)>>
>>>> PAGE_SHIFT) + vma->vm_pgoff;
>>> It's calculating the pfn according to pfnmap rules.
>>  From what I understand this will only work when remapping 'main
>> memory', e.g. where the pgoff is equal to the physical page offset?
>> VMAs that remap IO memory will usually set pgoff to 0 for the start of
>> the mapping.
>
> If so, how do they calculate the pfn when mapping pages?  kvm needs to 
> be able to do the same thing.

Maybe the simplest thing is to call vma->vm_ops->fault here.  
Marcelo/Chris?  Context is improving gfn_to_pfn() on the mmio path.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-08-23 21:59                           ` Stephen Donnelly
@ 2009-08-24  4:55                             ` Avi Kivity
  2009-08-26 10:22                               ` Avi Kivity
  2009-08-27  2:34                               ` Stephen Donnelly
  0 siblings, 2 replies; 31+ messages in thread
From: Avi Kivity @ 2009-08-24  4:55 UTC (permalink / raw)
  To: Stephen Donnelly; +Cc: Cam Macdonell, kvm@vger.kernel.org list

On 08/24/2009 12:59 AM, Stephen Donnelly wrote:
> On Thu, Aug 20, 2009 at 12:14 AM, Avi Kivity<avi@redhat.com>  wrote:
>    
>> On 08/13/2009 07:07 AM, Stephen Donnelly wrote:
>>      
>>> npages = get_user_pages_fast(addr, 1, 1, page); returns -EFAULT,
>>> presumably because (vma->vm_flags&    (VM_IO | VM_PFNMAP)).
>>>
>>> It takes then unlikely branch, and checks the vma, but I don't
>>> understand what it is doing here: pfn = ((addr - vma->vm_start)>>
>>> PAGE_SHIFT) + vma->vm_pgoff;
>>>        
>> It's calculating the pfn according to pfnmap rules.
>>      
>  From what I understand this will only work when remapping 'main
> memory', e.g. where the pgoff is equal to the physical page offset?
> VMAs that remap IO memory will usually set pgoff to 0 for the start of
> the mapping.
>    

If so, how do they calculate the pfn when mapping pages?  kvm needs to 
be able to do the same thing.

>>> In my case addr == vma->vm_start, and vma->vm_pgoff == 0, so pfn ==0.
>>>        
>> How did you set up that vma?  It should point to the first pfn of your
>> special memory area.
>>      
> The vma was created with a remap_pfn_range call from another driver.
> Because this call sets VM_PFNMAP and VM_IO any get_user_pages(_fast)
> calls will fail.
>
> In this case the host driver was actually just remapping host memory,
> so I replaced the remap_pfn_range call with a nopage/fault vm_op. This
> allows the get_user_pages_fast call to succeed, and the mapping now
> works as expected. This is sufficient for my work at the moment.
>
>    

Well if the fix is correct we need it too.

> I'm still not sure how genuine IO memory (mapped from a driver to
> userspace with remap_pfn_range or io_remap_page_range) could be mapped
> into kvm though.
>    

If it can be mapped to userspace, it can be mapped to kvm.  We just need 
to synchronize the rules.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-08-19 12:14                         ` Avi Kivity
@ 2009-08-23 21:59                           ` Stephen Donnelly
  2009-08-24  4:55                             ` Avi Kivity
  0 siblings, 1 reply; 31+ messages in thread
From: Stephen Donnelly @ 2009-08-23 21:59 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Cam Macdonell, kvm@vger.kernel.org list

On Thu, Aug 20, 2009 at 12:14 AM, Avi Kivity<avi@redhat.com> wrote:
> On 08/13/2009 07:07 AM, Stephen Donnelly wrote:
>>
>> npages = get_user_pages_fast(addr, 1, 1, page); returns -EFAULT,
>> presumably because (vma->vm_flags&  (VM_IO | VM_PFNMAP)).
>>
>> It takes then unlikely branch, and checks the vma, but I don't
>> understand what it is doing here: pfn = ((addr - vma->vm_start)>>
>> PAGE_SHIFT) + vma->vm_pgoff;
>
> It's calculating the pfn according to pfnmap rules.

From what I understand this will only work when remapping 'main
memory', e.g. where the pgoff is equal to the physical page offset?
VMAs that remap IO memory will usually set pgoff to 0 for the start of
the mapping.

>> In my case addr == vma->vm_start, and vma->vm_pgoff == 0, so pfn ==0.
>
> How did you set up that vma?  It should point to the first pfn of your
> special memory area.

The vma was created with a remap_pfn_range call from another driver.
Because this call sets VM_PFNMAP and VM_IO any get_user_pages(_fast)
calls will fail.

In this case the host driver was actually just remapping host memory,
so I replaced the remap_pfn_range call with a nopage/fault vm_op. This
allows the get_user_pages_fast call to succeed, and the mapping now
works as expected. This is sufficient for my work at the moment.

I'm still not sure how genuine IO memory (mapped from a driver to
userspace with remap_pfn_range or io_remap_page_range) could be mapped
into kvm though.

Regards,
Stephen.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-08-13  4:07                       ` Stephen Donnelly
@ 2009-08-19 12:14                         ` Avi Kivity
  2009-08-23 21:59                           ` Stephen Donnelly
  0 siblings, 1 reply; 31+ messages in thread
From: Avi Kivity @ 2009-08-19 12:14 UTC (permalink / raw)
  To: Stephen Donnelly; +Cc: Cam Macdonell, kvm@vger.kernel.org list

On 08/13/2009 07:07 AM, Stephen Donnelly wrote:
>>> A less intrusive, but uglier, alternative is to call
>>> qemu_ram_alloc() and them mmap(MAP_FIXED) on top of that.
>>>        
>> I did try this, but ended up with a BUG on the host in
>> /var/lib/dkms/kvm/84/build/x86/kvm_main.c:1266 gfn_to_pfn() on the
>> line "BUG_ON(!kvm_is_mmio_pfn(pfn));" when the guest accesses the bar.
>>      
> It looks to me from the call trace like the guest is writing to the
> memory, gfn_to_pfn() from mmu_guess_page_from_pte_write() gets
> confused because of the mapping.
>
> Inside gfn_to_pfn:
>
> addr = gfn_to_hva(kvm, gfn); correctly returns the host virtual
> address of the external memory mapping.
>
> npages = get_user_pages_fast(addr, 1, 1, page); returns -EFAULT,
> presumably because (vma->vm_flags&  (VM_IO | VM_PFNMAP)).
>
> It takes then unlikely branch, and checks the vma, but I don't
> understand what it is doing here: pfn = ((addr - vma->vm_start)>>
> PAGE_SHIFT) + vma->vm_pgoff;
>    

It's calculating the pfn according to pfnmap rules.

> In my case addr == vma->vm_start, and vma->vm_pgoff == 0, so pfn ==0.
>    

How did you set up that vma?  It should point to the first pfn of your 
special memory area.

> BUG_ON(!kvm_is_mmio_pfn(pfn)) then triggers.
>    

That's correct behaviour.  We expect a page that is not controlled by 
the kernel here.

> Instrumenting inside gfn_to_pfn I see:
> gfn_to_pfn: gfn f2010 gpte f2010000 hva 7f3eac2b0000 pfn 0 npages -14
> gfn_to_pfn: vma ffff88022142af18 start 7f3eac2b0000 pgoff 0
>
> Any suggestions what should be happening here?
>    

Well, we need to understand how that vma came into being and why pgoff == 0.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-07-28 23:06                     ` Stephen Donnelly
@ 2009-08-13  4:07                       ` Stephen Donnelly
  2009-08-19 12:14                         ` Avi Kivity
  0 siblings, 1 reply; 31+ messages in thread
From: Stephen Donnelly @ 2009-08-13  4:07 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Cam Macdonell, kvm@vger.kernel.org list

On Wed, Jul 29, 2009 at 11:06 AM, Stephen Donnelly<sfdonnelly@gmail.com> wrote:
> On Tue, Jul 28, 2009 at 8:54 PM, Avi Kivity<avi@redhat.com> wrote:
>> On 07/28/2009 12:32 AM, Stephen Donnelly wrote:

>> You need a variant of qemu_ram_alloc() that accepts an fd and offset and
>> mmaps that.

I had a go at this, creating qemu_ram_mmap() using qemu_ram_alloc() as
a template, but I'm still seeing the same BUG.

>> A less intrusive, but uglier, alternative is to call
>> qemu_ram_alloc() and them mmap(MAP_FIXED) on top of that.
>
> I did try this, but ended up with a BUG on the host in
> /var/lib/dkms/kvm/84/build/x86/kvm_main.c:1266 gfn_to_pfn() on the
> line "BUG_ON(!kvm_is_mmio_pfn(pfn));" when the guest accesses the bar.

It looks to me from the call trace like the guest is writing to the
memory, gfn_to_pfn() from mmu_guess_page_from_pte_write() gets
confused because of the mapping.

Inside gfn_to_pfn:

addr = gfn_to_hva(kvm, gfn); correctly returns the host virtual
address of the external memory mapping.

npages = get_user_pages_fast(addr, 1, 1, page); returns -EFAULT,
presumably because (vma->vm_flags & (VM_IO | VM_PFNMAP)).

It takes then unlikely branch, and checks the vma, but I don't
understand what it is doing here: pfn = ((addr - vma->vm_start) >>
PAGE_SHIFT) + vma->vm_pgoff;

In my case addr == vma->vm_start, and vma->vm_pgoff == 0, so pfn ==0.
BUG_ON(!kvm_is_mmio_pfn(pfn)) then triggers.

Instrumenting inside gfn_to_pfn I see:
gfn_to_pfn: gfn f2010 gpte f2010000 hva 7f3eac2b0000 pfn 0 npages -14
gfn_to_pfn: vma ffff88022142af18 start 7f3eac2b0000 pgoff 0

Any suggestions what should be happening here?

[ 1826.807846] ------------[ cut here ]------------
[ 1826.807907] kernel BUG at
/build/buildd/linux-2.6.28/arch/x86/kvm/../../../virt/kvm/kvm_main.c:1001!
[ 1826.807985] invalid opcode: 0000 [#1] SMP
[ 1826.808102] last sysfs file: /sys/module/nf_nat/initstate
[ 1826.808159] Dumping ftrace buffer:
[ 1826.808213]    (ftrace buffer empty)
[ 1826.808266] CPU 3
[ 1826.808347] Modules linked in: tun softcard_driver(P)
ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ip
v4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables
x_tables kvm_intel kvm input_polldev video output
 bridge stp lp parport iTCO_wdt iTCO_vendor_support psmouse pcspkr
serio_raw joydev i5000_edac edac_core shpchp e1000 us
bhid usb_storage e1000e floppy raid10 raid456 async_xor async_memcpy
async_tx xor raid1 raid0 multipath linear fbcon til
eblit font bitblit softcursor
[ 1826.810269] Pid: 9353, comm: qemu-system-x86 Tainted: P
2.6.28-13-server #45-Ubuntu
[ 1826.810344] RIP: 0010:[<ffffffffa01da853>]  [<ffffffffa01da853>]
gfn_to_pfn+0x153/0x160 [kvm]
[ 1826.810463] RSP: 0018:ffff88022d857958  EFLAGS: 00010246
[ 1826.810518] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88022d4d32a0
[ 1826.810577] RDX: 0000000000000000 RSI: 0000000000000282 RDI: 0000000000000000
[ 1826.810636] RBP: ffff88022d857978 R08: 0000000000000001 R09: ffff88022d857958
[ 1826.810694] R10: 0000000000000003 R11: 0000000000000001 R12: 00000000000f2010
[ 1826.810753] R13: ffff880212cb0000 R14: ffff880212cb0000 R15: ffff880212cb0000
[ 1826.810812] FS:  00007f5253bfd950(0000) GS:ffff88022f1fa380(0000)
knlGS:0000000000000000
[ 1826.810887] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
[ 1826.810943] CR2: 00000000b7eb2044 CR3: 0000000212cac000 CR4: 00000000000026a0
[ 1826.811002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1826.811061] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1826.811120] Process qemu-system-x86 (pid: 9353, threadinfo
ffff88022d856000, task ffff88022e0cd980)
[ 1826.811196] Stack:
[ 1826.811246]  ffff88022d857968 0000000000000004 0000000000000004
0000000000000000
[ 1826.811401]  ffff88022d8579b8 ffffffffa01e7ccf ffff88022d8579b8
00000000f2010073
[ 1826.811634]  0000000000000004 ffff880212cb15b0 000000001f402b00
ffff880212cb0000
[ 1826.811913] Call Trace:
[ 1826.811964]  [<ffffffffa01e7ccf>]
mmu_guess_page_from_pte_write+0xaf/0x190 [kvm]
[ 1826.812076]  [<ffffffffa01e820f>] kvm_mmu_pte_write+0x3f/0x4f0 [kvm]
[ 1826.812172]  [<ffffffffa01da9f1>] ? mark_page_dirty+0x11/0x90 [kvm]
[ 1826.812268]  [<ffffffffa01dabe8>] ? kvm_write_guest+0x48/0x90 [kvm]
[ 1826.812364]  [<ffffffffa01de427>] emulator_write_phys+0x47/0x70 [kvm]
[ 1826.812460]  [<ffffffffa01e0e26>]
emulator_write_emulated_onepage+0x66/0x120 [kvm]
[ 1826.812571]  [<ffffffffa01e0f50>] emulator_write_emulated+0x70/0x90 [kvm]
[ 1826.812668]  [<ffffffffa01eb36f>] x86_emulate_insn+0x4ef/0x32e0 [kvm]
[ 1826.812764]  [<ffffffffa01e950e>] ? do_insn_fetch+0x8e/0x100 [kvm]
[ 1826.812860]  [<ffffffffa01e9454>] ? seg_override_base+0x24/0x50 [kvm]
[ 1826.812955]  [<ffffffffa01eacb0>] ? x86_decode_insn+0x7a0/0x970 [kvm]
[ 1826.813051]  [<ffffffffa01e221f>] emulate_instruction+0x15f/0x2f0 [kvm]
[ 1826.813148]  [<ffffffffa01e7bd5>] kvm_mmu_page_fault+0x65/0xb0 [kvm]
[ 1826.813243]  [<ffffffffa020ac5f>] handle_exception+0x2ef/0x360 [kvm_intel]
[ 1826.813338]  [<ffffffffa01eb0a3>] ? x86_emulate_insn+0x223/0x32e0 [kvm]
[ 1826.813434]  [<ffffffffa0209c25>] kvm_handle_exit+0xb5/0x1d0 [kvm_intel]
[ 1826.813526]  [<ffffffff80699643>] ? __down_read+0xc3/0xce
[ 1826.813618]  [<ffffffffa01dd958>] vcpu_enter_guest+0x1f8/0x400 [kvm]
[ 1826.813714]  [<ffffffffa01dfc29>] __vcpu_run+0x69/0x2d0 [kvm]
[ 1826.813751]  [<ffffffffa01e38ea>] kvm_arch_vcpu_ioctl_run+0x8a/0x1f0 [kvm]
[ 1826.813751]  [<ffffffffa01d8582>] kvm_vcpu_ioctl+0x2e2/0x5a0 [kvm]
[ 1826.813751]  [<ffffffff802f6091>] vfs_ioctl+0x31/0xa0
[ 1826.813751]  [<ffffffff802f6445>] do_vfs_ioctl+0x75/0x230
[ 1826.813751]  [<ffffffff802e8216>] ? generic_file_llseek+0x56/0x70
[ 1826.813751]  [<ffffffff802f6699>] sys_ioctl+0x99/0xa0
[ 1826.813751]  [<ffffffff802e70d2>] ? sys_lseek+0x52/0x90
[ 1826.813751]  [<ffffffff8021253a>] system_call_fastpath+0x16/0x1b
[ 1826.813751] Code: 00 00 65 48 8b 04 25 00 00 00 00 48 8b b8 38 02
00 00 48 83 c7 60 e8 dd 23 09 e0 48 89 df e8 45 fe ff ff 85 c0 0f 85
08 ff ff ff <0f> 0b eb fe 66 0f 1f 84 00 00 00 00 00 55 65 8b 14 25 24
00 00
[ 1826.813751] RIP  [<ffffffffa01da853>] gfn_to_pfn+0x153/0x160 [kvm]
[ 1826.813751]  RSP <ffff88022d857958>
[ 1826.816899] ---[ end trace 2437a1197b66fb45 ]---

Stephen.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-07-29 23:52                     ` Cam Macdonell
@ 2009-07-30  9:31                       ` Avi Kivity
  0 siblings, 0 replies; 31+ messages in thread
From: Avi Kivity @ 2009-07-30  9:31 UTC (permalink / raw)
  To: Cam Macdonell; +Cc: Stephen Donnelly, kvm@vger.kernel.org list

On 07/30/2009 02:52 AM, Cam Macdonell wrote:
>> You need a variant of qemu_ram_alloc() that accepts an fd and offset 
>> and mmaps that.  A less intrusive, but uglier, alternative is to call 
>> qemu_ram_alloc() and them mmap(MAP_FIXED) on top of that.
>
>
> Hi Avi,
>
> I noticed that the region of memory being allocated for shared memory 
> using qemu_ram_alloc gets added to the total RAM of the system 
> (according to /proc/meminfo).  I'm wondering if this is normal/OK 
> since memory for the shared memory device (and similarly VGA RAM) is 
> not intended to be used as regular RAM.

qemu_ram_alloc() and the guets /proc/meminfo are totally disconnected.  
I don't understand how that happened.

>
> Should memory of devices be reported as part of MemTotal or is 
> something wrong in my use of qemu_ram_alloc()?

You can call qemu_ram_alloc() all you like.  Guest memory is determined 
by the e820 map, which is in turn determined by the -m parameter.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-07-28  8:54                   ` Avi Kivity
  2009-07-28 23:06                     ` Stephen Donnelly
@ 2009-07-29 23:52                     ` Cam Macdonell
  2009-07-30  9:31                       ` Avi Kivity
  1 sibling, 1 reply; 31+ messages in thread
From: Cam Macdonell @ 2009-07-29 23:52 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Stephen Donnelly, kvm@vger.kernel.org list

Avi Kivity wrote:
> On 07/28/2009 12:32 AM, Stephen Donnelly wrote:
>>>> What I don't understand is how to turn the host address returned from
>>>> mmap into a ram_addr_t to pass to pci_register_bar.
>>>>        
>>> Memory must be allocated using the qemu RAM functions.
>>>      
>>
>> That seems to be the problem. The memory cannot be allocated by
>> qemu_ram_alloc, because it is coming from the mmap call. The memory is
>> already allocated outside the qemu process. mmap can indicate where in
>> the qemu process address space the local mapping should be, but
>> mapping it 'on top' of memory allocated with qemu_ram_alloc doesn't
>> seem to work (I get a BUG in gfn_to_pfn).
>>    
> 
> You need a variant of qemu_ram_alloc() that accepts an fd and offset and 
> mmaps that.  A less intrusive, but uglier, alternative is to call 
> qemu_ram_alloc() and them mmap(MAP_FIXED) on top of that.

Hi Avi,

I noticed that the region of memory being allocated for shared memory 
using qemu_ram_alloc gets added to the total RAM of the system 
(according to /proc/meminfo).  I'm wondering if this is normal/OK since 
memory for the shared memory device (and similarly VGA RAM) is not 
intended to be used as regular RAM.

Should memory of devices be reported as part of MemTotal or is something 
wrong in my use of qemu_ram_alloc()?

Thanks,
Cam

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-07-28  8:54                   ` Avi Kivity
@ 2009-07-28 23:06                     ` Stephen Donnelly
  2009-08-13  4:07                       ` Stephen Donnelly
  2009-07-29 23:52                     ` Cam Macdonell
  1 sibling, 1 reply; 31+ messages in thread
From: Stephen Donnelly @ 2009-07-28 23:06 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Cam Macdonell, kvm@vger.kernel.org list

On Tue, Jul 28, 2009 at 8:54 PM, Avi Kivity<avi@redhat.com> wrote:
> On 07/28/2009 12:32 AM, Stephen Donnelly wrote:
>>>>
>>>> What I don't understand is how to turn the host address returned from
>>>> mmap into a ram_addr_t to pass to pci_register_bar.
>>>
>>> Memory must be allocated using the qemu RAM functions.
>>
>> That seems to be the problem. The memory cannot be allocated by
>> qemu_ram_alloc, because it is coming from the mmap call. The memory is
>> already allocated outside the qemu process. mmap can indicate where in
>> the qemu process address space the local mapping should be, but
>> mapping it 'on top' of memory allocated with qemu_ram_alloc doesn't
>> seem to work (I get a BUG in gfn_to_pfn).
>
> You need a variant of qemu_ram_alloc() that accepts an fd and offset and
> mmaps that.

Okay, it sounds like a function to do this is not currently available.
That confirms my understanding at least. I will take a look but I
don't think I understand the memory management well enough to write
this myself.

> A less intrusive, but uglier, alternative is to call
> qemu_ram_alloc() and them mmap(MAP_FIXED) on top of that.

I did try this, but ended up with a BUG on the host in
/var/lib/dkms/kvm/84/build/x86/kvm_main.c:1266 gfn_to_pfn() on the
line "BUG_ON(!kvm_is_mmio_pfn(pfn));" when the guest accesses the bar.

[1847926.363458] ------------[ cut here ]------------
[1847926.363464] kernel BUG at /var/lib/dkms/kvm/84/build/x86/kvm_main.c:1266!
[1847926.363466] invalid opcode: 0000 [#1] SMP
[1847926.363470] last sysfs file:
/sys/devices/pci0000:00/0000:00:1c.5/0000:02:00.0/net/eth0/statistics/collisions
[1847926.363473] Dumping ftrace buffer:
[1847926.363476]    (ftrace buffer empty)
[1847926.363478] Modules linked in: softcard_driver(P) nls_iso8859_1
vfat fat usb_storage tun nls_utf8 nls_cp437 cifs nfs lockd nfs_acl
sunrpc binfmt_misc ppdev bnep ipt_MASQUERADE iptable_nat nf_nat
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT
xt_tcpudp iptable_filter ip_tables x_tables bridge stp kvm_intel kvm
video output input_polldev dm_crypt sbp2 lp parport snd_usb_audio
snd_pcm_oss snd_hda_intel snd_mixer_oss snd_pcm snd_seq_dummy
snd_usb_lib snd_seq_oss snd_seq_midi snd_seq_midi_event uvcvideo
compat_ioctl32 snd_rawmidi snd_seq iTCO_wdt videodev snd_timer
snd_seq_device iTCO_vendor_support ftdi_sio usbhid v4l1_compat
snd_hwdep intel_agp nvidia(P) usbserial snd soundcore snd_page_alloc
agpgart pcspkr ohci1394 ieee1394 atl1 mii floppy fbcon tileblit font
bitblit softcursor [last unloaded: softcard_driver]
[1847926.363539]
[1847926.363542] Pid: 31516, comm: qemu-system-x86 Tainted: P
 (2.6.28-13-generic #44-Ubuntu) P5K
[1847926.363544] EIP: 0060:[<f7f5961f>] EFLAGS: 00010246 CPU: 1
[1847926.363556] EIP is at gfn_to_pfn+0xff/0x110 [kvm]
[1847926.363558] EAX: 00000000 EBX: 00000000 ECX: f40d30c8 EDX: 00000000
[1847926.363560] ESI: d0baa000 EDI: 00000001 EBP: f2cddbbc ESP: f2cddbac
[1847926.363562]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[1847926.363564] Process qemu-system-x86 (pid: 31516, ti=f2cdc000
task=f163d7f0 task.ti=f2cdc000)
[1847926.363566] Stack:
[1847926.363567]  f2cddbb0 f2cddbc8 00000000 000f2010 f2cddc7c
f7f65f00 00000004 f2cddbd4
[1847926.363573]  f7f5829f 00000004 f2cddbf4 f7f582ec 00000df4
00000004 d0baa000 f185a370
[1847926.363579]  df402c00 0001f719 f2cddc4c f7f66858 f2cddc40
00000004 0001f95f 00000000
[1847926.363585] Call Trace:
[1847926.363588]  [<f7f65f00>] ? kvm_mmu_pte_write+0x160/0x9a0 [kvm]
[1847926.363598]  [<f7f5829f>] ? kvm_read_guest_page+0x2f/0x40 [kvm]
[1847926.363607]  [<f7f582ec>] ? kvm_read_guest+0x3c/0x70 [kvm]
[1847926.363616]  [<f7f66858>] ? paging32_walk_addr+0x118/0x2d0 [kvm]
[1847926.363625]  [<f7f59360>] ? mark_page_dirty+0x10/0x70 [kvm]
[1847926.363634]  [<f7f59412>] ? kvm_write_guest_page+0x52/0x60 [kvm]
[1847926.363643]  [<f7f5becf>] ? emulator_write_phys+0x4f/0x70 [kvm]
[1847926.363652]  [<f7f5dcc8>] ?
emulator_write_emulated_onepage+0x58/0x130 [kvm]
[1847926.363661]  [<f7f5ddf9>] ? emulator_write_emulated+0x59/0x70 [kvm]
[1847926.363674]  [<f7f69d84>] ? x86_emulate_insn+0x414/0x2650 [kvm]
[1847926.363684]  [<c011f714>] ? handle_vm86_fault+0x4c4/0x740
[1847926.363690]  [<c011f714>] ? handle_vm86_fault+0x4c4/0x740
[1847926.363699]  [<f7f681e6>] ? do_insn_fetch+0x76/0xd0 [kvm]
[1847926.363712]  [<c011f716>] ? handle_vm86_fault+0x4c6/0x740
[1847926.363715]  [<c011f716>] ? handle_vm86_fault+0x4c6/0x740
[1847926.363719]  [<f7f6909a>] ? x86_decode_insn+0x54a/0xe20 [kvm]
[1847926.363732]  [<f7f5ecfc>] ? emulate_instruction+0x12c/0x2a0 [kvm]
[1847926.363741]  [<f7f65988>] ? kvm_mmu_page_fault+0x58/0xa0 [kvm]
[1847926.363750]  [<f7e8797a>] ? handle_exception+0x35a/0x400 [kvm_intel]
[1847926.363755]  [<f7e83e97>] ? handle_interrupt_window+0x27/0xc0 [kvm_intel]
[1847926.363760]  [<c011f714>] ? handle_vm86_fault+0x4c4/0x740
[1847926.363763]  [<f7e864e9>] ? kvm_handle_exit+0xd9/0x270 [kvm_intel]
[1847926.363768]  [<f7e87c87>] ? vmx_vcpu_run+0x137/0xa4a [kvm_intel]
[1847926.363772]  [<f7f6d767>] ? kvm_apic_has_interrupt+0x37/0xb0 [kvm]
[1847926.363781]  [<f7f6c0b7>] ? kvm_cpu_has_interrupt+0x27/0x40 [kvm]
[1847926.363790]  [<f7f61306>] ? kvm_arch_vcpu_ioctl_run+0x626/0xb20 [kvm]
[1847926.363799]  [<c015da68>] ? futex_wait+0x358/0x440
[1847926.363804]  [<f7f576e5>] ? kvm_vcpu_ioctl+0x395/0x490 [kvm]
[1847926.363812]  [<c04fec68>] ? _spin_lock+0x8/0x10
[1847926.363815]  [<c015d508>] ? futex_wake+0xc8/0xf0
[1847926.363819]  [<f7f57350>] ? kvm_vcpu_ioctl+0x0/0x490 [kvm]
[1847926.363827]  [<c01ca1d8>] ? vfs_ioctl+0x28/0x90
[1847926.363831]  [<c01ca6be>] ? do_vfs_ioctl+0x5e/0x200
[1847926.363834]  [<c01ca8c3>] ? sys_ioctl+0x63/0x70
[1847926.363836]  [<c0103f6b>] ? sysenter_do_call+0x12/0x2f
[1847926.363840] Code: 29 d3 c1 eb 0c 03 58 44 64 a1 00 e0 7a c0 8b 80
cc 01 00 00 83 c0 34 e8 b0 9b 1f c8 89 d8 e8 89 fc ff ff 85 c0 0f 85
50 ff ff ff <0f> 0b eb fe 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 89
e5 e8
[1847926.363873] EIP: [<f7f5961f>] gfn_to_pfn+0xff/0x110 [kvm] SS:ESP
0068:f2cddbac
[1847926.363885] ---[ end trace 314ce851a956cf3c ]---

pseudo code in my pci init function is:
{
offset = qemu_ram_alloc(64*1024);
ptr = qemu_get_ram_ptr(offset);

fd = open(charfile, O_RDWR);

mmap(ptr, 64*1024, PROT_READ | PROT_WRITE, MAP_SHARED|MAP_FIXED, fd, 0))

pci_register_bar((PCIDevice *)d, 0, 1024*64, PCI_ADDRESS_SPACE_MEM, mmio_map);
}

mmio_map() {
    cpu_register_physical_memory(addr + 0, 64*1024, offset);
}

Regards,
Stephen.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-07-27 21:32                 ` Stephen Donnelly
@ 2009-07-28  8:54                   ` Avi Kivity
  2009-07-28 23:06                     ` Stephen Donnelly
  2009-07-29 23:52                     ` Cam Macdonell
  0 siblings, 2 replies; 31+ messages in thread
From: Avi Kivity @ 2009-07-28  8:54 UTC (permalink / raw)
  To: Stephen Donnelly; +Cc: Cam Macdonell, kvm@vger.kernel.org list

On 07/28/2009 12:32 AM, Stephen Donnelly wrote:
>>> What I don't understand is how to turn the host address returned from
>>> mmap into a ram_addr_t to pass to pci_register_bar.
>>>        
>> Memory must be allocated using the qemu RAM functions.
>>      
>
> That seems to be the problem. The memory cannot be allocated by
> qemu_ram_alloc, because it is coming from the mmap call. The memory is
> already allocated outside the qemu process. mmap can indicate where in
> the qemu process address space the local mapping should be, but
> mapping it 'on top' of memory allocated with qemu_ram_alloc doesn't
> seem to work (I get a BUG in gfn_to_pfn).
>    

You need a variant of qemu_ram_alloc() that accepts an fd and offset and 
mmaps that.  A less intrusive, but uglier, alternative is to call 
qemu_ram_alloc() and them mmap(MAP_FIXED) on top of that.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-07-27 14:48               ` Cam Macdonell
@ 2009-07-27 21:32                 ` Stephen Donnelly
  2009-07-28  8:54                   ` Avi Kivity
  0 siblings, 1 reply; 31+ messages in thread
From: Stephen Donnelly @ 2009-07-27 21:32 UTC (permalink / raw)
  To: Cam Macdonell; +Cc: Avi Kivity, kvm@vger.kernel.org list

Hi Cam,

> Sorry I haven't answered your email from last Thursday.  I'll answer it
> shortly.

Thanks, I'm still chipping away at it slowly.

>> On Thu, Jul 9, 2009 at 6:01 PM, Cam Macdonell<cam@cs.ualberta.ca> wrote:
>>
>>> The memory for the device allocated as a POSIX shared memory object and
>>> then
>>> mmapped on to the allocated BAR region in Qemu's allocated memory.
>>>  That's
>>> actually one spot that needs a bit of fixing by passing the already
>>> allocated memory object to qemu instead of mmapping on to it.
>>
>> If you work out how to use pre-existing host memory rather than
>> allocating it inside qemu I would be interested.
>
> How is the host memory pre-existing?

It comes from outside qemu, it is mapped in via mmap.

>> I would like to have qemu mmap memory from a host char driver, and
>> then in turn register that mapping as a PCI BAR for the guest device.
>> (I know this sounds like pci pass-through, but it isn't.)
>
> In my setup, qemu just calls mmap on the shared memory object that was
> opened.  So I *think* that switching the shm_open(...) to
> open("/dev/chardev"), might be all that's necessary as long as your char
> device handles mmapping.

It does, but it maps memory into the user program rather than out.

>> What I don't understand is how to turn the host address returned from
>> mmap into a ram_addr_t to pass to pci_register_bar.
>
> Memory must be allocated using the qemu RAM functions.

That seems to be the problem. The memory cannot be allocated by
qemu_ram_alloc, because it is coming from the mmap call. The memory is
already allocated outside the qemu process. mmap can indicate where in
the qemu process address space the local mapping should be, but
mapping it 'on top' of memory allocated with qemu_ram_alloc doesn't
seem to work (I get a BUG in gfn_to_pfn).

>  Look at
> qemu_ram_alloc() and qemu_get_ram_ptr() which are a two step process that
> allocate the memory.  Then notice that the ivshmem_ptr is mmapped on to the
> memory that is returned from the qemu_get_ram_ptr.
>
> pci_register_bar calls a function (the last parameter passed to it) that in
> turn calls cpu_register_physical_memory which registers the allocated memory
> (accessed a s->ivshmem_ptr) as the BAR.

Right, that seems to make sense for your application where you
allocate the memory in qemu and then share it externally via shm.

Have you thought about how to use a shm file that has already been
allocated by another application? I think you mentioned this as a
feature you were going to look at in one of your list posts.

Regards,
Stephen.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
       [not found]             ` <5f370d430907262256rd7f9fdalfbbec1f9492ce86@mail.gmail.com>
@ 2009-07-27 14:48               ` Cam Macdonell
  2009-07-27 21:32                 ` Stephen Donnelly
  0 siblings, 1 reply; 31+ messages in thread
From: Cam Macdonell @ 2009-07-27 14:48 UTC (permalink / raw)
  To: Stephen Donnelly; +Cc: Avi Kivity, kvm@vger.kernel.org list

Stephen Donnelly wrote:
> Hi Cam,

Hi Steve,

Sorry I haven't answered your email from last Thursday.  I'll answer it 
shortly.

> 
> On Thu, Jul 9, 2009 at 6:01 PM, Cam Macdonell<cam@cs.ualberta.ca> wrote:
> 
>> The memory for the device allocated as a POSIX shared memory object and then
>> mmapped on to the allocated BAR region in Qemu's allocated memory.  That's
>> actually one spot that needs a bit of fixing by passing the already
>> allocated memory object to qemu instead of mmapping on to it.
> 
> If you work out how to use pre-existing host memory rather than
> allocating it inside qemu I would be interested.

How is the host memory pre-existing?

> 
> I would like to have qemu mmap memory from a host char driver, and
> then in turn register that mapping as a PCI BAR for the guest device.
> (I know this sounds like pci pass-through, but it isn't.)

In my setup, qemu just calls mmap on the shared memory object that was 
opened.  So I *think* that switching the shm_open(...) to 
open("/dev/chardev"), might be all that's necessary as long as your char 
device handles mmapping.

> What I don't understand is how to turn the host address returned from
> mmap into a ram_addr_t to pass to pci_register_bar.

Memory must be allocated using the qemu RAM functions.  Look at 
qemu_ram_alloc() and qemu_get_ram_ptr() which are a two step process 
that allocate the memory.  Then notice that the ivshmem_ptr is mmapped 
on to the memory that is returned from the qemu_get_ram_ptr.

pci_register_bar calls a function (the last parameter passed to it) that 
in turn calls cpu_register_physical_memory which registers the allocated 
memory (accessed a s->ivshmem_ptr) as the BAR.

Let me know if you have any more questions,
Cam


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-07-10 17:03               ` Cam Macdonell
@ 2009-07-12 21:28                 ` Stephen Donnelly
  0 siblings, 0 replies; 31+ messages in thread
From: Stephen Donnelly @ 2009-07-12 21:28 UTC (permalink / raw)
  To: Cam Macdonell; +Cc: kvm

On Sat, Jul 11, 2009 at 5:03 AM, Cam Macdonell<cam@cs.ualberta.ca> wrote:
> Oops, I realize now that I passed the driver patch both times.  Here is the
> old patch.
>
> http://patchwork.kernel.org/patch/22363/
>
> What are you compiling against?  the git tree or a particular version? The
> above patch won't compile against the latest git tree due to changes to how
> BARs are setup in Qemu.  I can send you a patch for the latest tree if you
> need it.

Thanks Cam, I will take a look at this code.

At the moment I have cloned the tree so am intending to work at the
tip. If you have a patch for the latest tree that would be great.

Regards,
Stephen.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-07-09 22:38             ` Stephen Donnelly
@ 2009-07-10 17:03               ` Cam Macdonell
  2009-07-12 21:28                 ` Stephen Donnelly
  0 siblings, 1 reply; 31+ messages in thread
From: Cam Macdonell @ 2009-07-10 17:03 UTC (permalink / raw)
  To: Stephen Donnelly; +Cc: kvm

Stephen Donnelly wrote:
> On Thu, Jul 9, 2009 at 6:01 PM, Cam Macdonell<cam@cs.ualberta.ca> wrote:
> 
>>> Is there a corresponding qemu patch for the backend to the guest pci
>>> driver?
>> Oops right.   For some reason I can't my driver patch in patchwork.
>>
>> http://kerneltrap.org/mailarchive/linux-kvm/2009/5/7/5665734
> 
> Thanks for the link, I have read through the thread now. It seems very
> relevant to what I am doing. Have you found a link to your qemu-kvm
> backend patches? Or are you running your own git tree? I don't really
> know where to look.

Oops, I realize now that I passed the driver patch both times.  Here is 
the old patch.

http://patchwork.kernel.org/patch/22363/

What are you compiling against?  the git tree or a particular version? 
The above patch won't compile against the latest git tree due to changes 
to how BARs are setup in Qemu.  I can send you a patch for the latest 
tree if you need it.

Cam

> 
>>> I'm curious how the buffer memory is allocated and how BAR
>>> accesses are handled from the host side.
>> The memory for the device allocated as a POSIX shared memory object and then
>> mmapped on to the allocated BAR region in Qemu's allocated memory.  That's
>> actually one spot that needs a bit of fixing by passing the already
>> allocated memory object to qemu instead of mmapping on to it.
> 
> Right, I would be passing the memory in pre-allocated as well, but
> should be a relatively simple change.
> 
> Regards,
> Stephen.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-07-09  6:01           ` Cam Macdonell
@ 2009-07-09 22:38             ` Stephen Donnelly
  2009-07-10 17:03               ` Cam Macdonell
       [not found]             ` <5f370d430907262256rd7f9fdalfbbec1f9492ce86@mail.gmail.com>
  1 sibling, 1 reply; 31+ messages in thread
From: Stephen Donnelly @ 2009-07-09 22:38 UTC (permalink / raw)
  To: Cam Macdonell; +Cc: Avi Kivity, kvm

On Thu, Jul 9, 2009 at 6:01 PM, Cam Macdonell<cam@cs.ualberta.ca> wrote:

>> Is there a corresponding qemu patch for the backend to the guest pci
>> driver?
>
> Oops right.   For some reason I can't my driver patch in patchwork.
>
> http://kerneltrap.org/mailarchive/linux-kvm/2009/5/7/5665734

Thanks for the link, I have read through the thread now. It seems very
relevant to what I am doing. Have you found a link to your qemu-kvm
backend patches? Or are you running your own git tree? I don't really
know where to look.

>> I'm curious how the buffer memory is allocated and how BAR
>> accesses are handled from the host side.
>
> The memory for the device allocated as a POSIX shared memory object and then
> mmapped on to the allocated BAR region in Qemu's allocated memory.  That's
> actually one spot that needs a bit of fixing by passing the already
> allocated memory object to qemu instead of mmapping on to it.

Right, I would be passing the memory in pre-allocated as well, but
should be a relatively simple change.

Regards,
Stephen.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-07-08 21:33       ` Stephen Donnelly
@ 2009-07-09  8:10         ` Avi Kivity
  0 siblings, 0 replies; 31+ messages in thread
From: Avi Kivity @ 2009-07-09  8:10 UTC (permalink / raw)
  To: Stephen Donnelly; +Cc: kvm

On 07/09/2009 12:33 AM, Stephen Donnelly wrote:
>> Shared memory is fully coherent.  You can use the ordinary x86 bus lock
>> operations for concurrent read-modify-write access, and the memory barrier
>> instructions to prevent reordering.  Just like ordinary shared memory.
>>      
>
> Okay, I think I was confused by the 'dirty' code. Is that just to do
> with migration?
>    

Migration and reducing vga updates.

>> static void cirrus_pci_lfb_map(PCIDevice *d, int region_num,
>>                                uint32_t addr, uint32_t size, int type)
>> {
>>
>> ...
>>
>>     /* XXX: add byte swapping apertures */
>>     cpu_register_physical_memory(addr, s->vga.vram_size,
>>                                  s->cirrus_linear_io_addr);
>>
>> This function is called whenever the guest updates the BAR.
>>      
>
> So guest accesses to the LFB PCI_BAR trigger the cirrus_linear
> functions, which set dirty on writes and allow 'side effect' handling
> for reads if required? In my case there should be no side effects, so
> it could be quite simple. I wonder about the cost of the callbacks on
> each access though, am I still missing something?
>    

Sorry, I quoted the wrong part.  vga is complicated because some modes 
do need callbacks on reads and writes, and others can use normal RAM 
(with dirty tracking).

The real direct mapping code is:

     static void map_linear_vram(CirrusVGAState *s)
     {
         vga_dirty_log_stop(&s->vga);
         if (!s->vga.map_addr && s->vga.lfb_addr && s->vga.lfb_end) {
             s->vga.map_addr = s->vga.lfb_addr;
             s->vga.map_end = s->vga.lfb_end;
             cpu_register_physical_memory(s->vga.map_addr, 
s->vga.map_end - s->vga.map_addr, s->vga.vram_offset);
         }

s->vga.vram_offset is a ram_addr_t describing the the vga framebuffer.  
You're much better off reading Cam's code as it's much simpler and 
closer to what you want to do (possibly you can use it as is).

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-07-08 22:01         ` Stephen Donnelly
@ 2009-07-09  6:01           ` Cam Macdonell
  2009-07-09 22:38             ` Stephen Donnelly
       [not found]             ` <5f370d430907262256rd7f9fdalfbbec1f9492ce86@mail.gmail.com>
  0 siblings, 2 replies; 31+ messages in thread
From: Cam Macdonell @ 2009-07-09  6:01 UTC (permalink / raw)
  To: Stephen Donnelly; +Cc: Avi Kivity, kvm


On 8-Jul-09, at 4:01 PM, Stephen Donnelly wrote:

> On Thu, Jul 9, 2009 at 9:45 AM, Cam Macdonell<cam@cs.ualberta.ca>  
> wrote:
>> Hi Stephen,
>>
>> Here is the latest patch that supports interrupts.  I am currently  
>> working
>> on a broadcast mechanism that should be ready fairly soon.
>>
>> http://patchwork.kernel.org/patch/22368/
>>
>> I have some test scripts that can demonstrate how to use the memory  
>> between
>> guest/host and guest/guest.  Let me know if you would like me to  
>> send them
>> to you.
>
> Hi Cam,
>
> Thanks for the pointer. That makes perfect sense, I'm familiar with
> PCI drivers so that's fine.
>
> Is there a corresponding qemu patch for the backend to the guest pci
> driver?

Oops right.   For some reason I can't my driver patch in patchwork.

http://kerneltrap.org/mailarchive/linux-kvm/2009/5/7/5665734

> I'm curious how the buffer memory is allocated and how BAR
> accesses are handled from the host side.

The memory for the device allocated as a POSIX shared memory object  
and then mmapped on to the allocated BAR region in Qemu's allocated  
memory.  That's actually one spot that needs a bit of fixing by  
passing the already allocated memory object to qemu instead of  
mmapping on to it.

Cam



-----------------------------------------------
A. Cameron Macdonell
Ph.D. Student
Department of Computing Science
University of Alberta
cam@cs.ualberta.ca




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-07-08 21:45       ` Cam Macdonell
@ 2009-07-08 22:01         ` Stephen Donnelly
  2009-07-09  6:01           ` Cam Macdonell
  0 siblings, 1 reply; 31+ messages in thread
From: Stephen Donnelly @ 2009-07-08 22:01 UTC (permalink / raw)
  To: Cam Macdonell; +Cc: Avi Kivity, kvm

On Thu, Jul 9, 2009 at 9:45 AM, Cam Macdonell<cam@cs.ualberta.ca> wrote:
> Hi Stephen,
>
> Here is the latest patch that supports interrupts.  I am currently working
> on a broadcast mechanism that should be ready fairly soon.
>
> http://patchwork.kernel.org/patch/22368/
>
> I have some test scripts that can demonstrate how to use the memory between
> guest/host and guest/guest.  Let me know if you would like me to send them
> to you.

Hi Cam,

Thanks for the pointer. That makes perfect sense, I'm familiar with
PCI drivers so that's fine.

Is there a corresponding qemu patch for the backend to the guest pci
driver? I'm curious how the buffer memory is allocated and how BAR
accesses are handled from the host side.

Regards,
Stephen.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-07-08  4:36     ` Avi Kivity
  2009-07-08 21:33       ` Stephen Donnelly
@ 2009-07-08 21:45       ` Cam Macdonell
  2009-07-08 22:01         ` Stephen Donnelly
  1 sibling, 1 reply; 31+ messages in thread
From: Cam Macdonell @ 2009-07-08 21:45 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Stephen Donnelly, kvm

Avi Kivity wrote:
> On 07/08/2009 01:23 AM, Stephen Donnelly wrote:
>>
>>>> Also it appears that PCI IO memory (cpu_register_io_memory) is
>>>> provided via access functions, like the pci config space?
>>>>        
>>> It can also use ordinary RAM (for example, vga maps its framebuffer 
>>> as a PCI
>>> BAR).
>>>      
>>
>> So host memory is exported as a PCI_BAR to the guest via
>> cpu_register_physical_memory(). It looks like the code has to
>> explicitly manage marking pages dirty and synchronising at appropriate
>> times. Is the coherency problem bidirectional, e.g. writes from either
>> host or guest to the shared memory need to mark pages dirty, and
>> ensure sync is called before the other side reads those areas?
>>    
> 
> Shared memory is fully coherent.  You can use the ordinary x86 bus lock 
> operations for concurrent read-modify-write access, and the memory 
> barrier instructions to prevent reordering.  Just like ordinary shared 
> memory.
> 
>>>> Does this
>>>> cause a page fault/vm_exit on each read or write, or is it more
>>>> efficient than that?
>>>>        
>>> It depends on how you configure it.  Look at the vga code (hw/vga.c,
>>> hw/cirrus_vga.c).  Also Cam (copied) wrote a PCI card that provides 
>>> shared
>>> memory across guests, you may want to look at that.
>>>      
>>
>> I will look into the vga code and see if I get inspired. The 'copied'
>> driver sounds interesting, the code is not in kvm git?
>>    
> 
> (copied) means Cam was copied (cc'ed) on the email, not the name of the 
> driver.  It hasn't been merged but copies (of the driver, not Cam) are 
> floating around on the Internet.

Hi Stephen,

Here is the latest patch that supports interrupts.  I am currently 
working on a broadcast mechanism that should be ready fairly soon.

http://patchwork.kernel.org/patch/22368/

I have some test scripts that can demonstrate how to use the memory 
between guest/host and guest/guest.  Let me know if you would like me to 
send them to you.

Cheers,
Cam

> 
> The relevant parts of cirrus_vga.c are:
> 
> static void cirrus_pci_lfb_map(PCIDevice *d, int region_num,
>                                uint32_t addr, uint32_t size, int type)
> {
> 
> ...
> 
>     /* XXX: add byte swapping apertures */
>     cpu_register_physical_memory(addr, s->vga.vram_size,
>                                  s->cirrus_linear_io_addr);
> 
> This function is called whenever the guest updates the BAR.
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-07-08  4:36     ` Avi Kivity
@ 2009-07-08 21:33       ` Stephen Donnelly
  2009-07-09  8:10         ` Avi Kivity
  2009-07-08 21:45       ` Cam Macdonell
  1 sibling, 1 reply; 31+ messages in thread
From: Stephen Donnelly @ 2009-07-08 21:33 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm

> Shared memory is fully coherent.  You can use the ordinary x86 bus lock
> operations for concurrent read-modify-write access, and the memory barrier
> instructions to prevent reordering.  Just like ordinary shared memory.

Okay, I think I was confused by the 'dirty' code. Is that just to do
with migration?

> (copied) means Cam was copied (cc'ed) on the email, not the name of the
> driver.  It hasn't been merged but copies (of the driver, not Cam) are
> floating around on the Internet.

Thanks, I'll ask him for a pointer.

> The relevant parts of cirrus_vga.c are:
>
> static void cirrus_pci_lfb_map(PCIDevice *d, int region_num,
>                               uint32_t addr, uint32_t size, int type)
> {
>
> ...
>
>    /* XXX: add byte swapping apertures */
>    cpu_register_physical_memory(addr, s->vga.vram_size,
>                                 s->cirrus_linear_io_addr);
>
> This function is called whenever the guest updates the BAR.

So guest accesses to the LFB PCI_BAR trigger the cirrus_linear
functions, which set dirty on writes and allow 'side effect' handling
for reads if required? In my case there should be no side effects, so
it could be quite simple. I wonder about the cost of the callbacks on
each access though, am I still missing something?

Thank you for your patience, I really appreciate the assistance and
look forward to using kvm more widely.

Stephen.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-07-07 22:23   ` Stephen Donnelly
@ 2009-07-08  4:36     ` Avi Kivity
  2009-07-08 21:33       ` Stephen Donnelly
  2009-07-08 21:45       ` Cam Macdonell
  0 siblings, 2 replies; 31+ messages in thread
From: Avi Kivity @ 2009-07-08  4:36 UTC (permalink / raw)
  To: Stephen Donnelly; +Cc: kvm

On 07/08/2009 01:23 AM, Stephen Donnelly wrote:
>
>>> Also it appears that PCI IO memory (cpu_register_io_memory) is
>>> provided via access functions, like the pci config space?
>>>        
>> It can also use ordinary RAM (for example, vga maps its framebuffer as a PCI
>> BAR).
>>      
>
> So host memory is exported as a PCI_BAR to the guest via
> cpu_register_physical_memory(). It looks like the code has to
> explicitly manage marking pages dirty and synchronising at appropriate
> times. Is the coherency problem bidirectional, e.g. writes from either
> host or guest to the shared memory need to mark pages dirty, and
> ensure sync is called before the other side reads those areas?
>    

Shared memory is fully coherent.  You can use the ordinary x86 bus lock 
operations for concurrent read-modify-write access, and the memory 
barrier instructions to prevent reordering.  Just like ordinary shared 
memory.

>>> Does this
>>> cause a page fault/vm_exit on each read or write, or is it more
>>> efficient than that?
>>>        
>> It depends on how you configure it.  Look at the vga code (hw/vga.c,
>> hw/cirrus_vga.c).  Also Cam (copied) wrote a PCI card that provides shared
>> memory across guests, you may want to look at that.
>>      
>
> I will look into the vga code and see if I get inspired. The 'copied'
> driver sounds interesting, the code is not in kvm git?
>    

(copied) means Cam was copied (cc'ed) on the email, not the name of the 
driver.  It hasn't been merged but copies (of the driver, not Cam) are 
floating around on the Internet.

The relevant parts of cirrus_vga.c are:

static void cirrus_pci_lfb_map(PCIDevice *d, int region_num,
                                uint32_t addr, uint32_t size, int type)
{

...

     /* XXX: add byte swapping apertures */
     cpu_register_physical_memory(addr, s->vga.vram_size,
                                  s->cirrus_linear_io_addr);

This function is called whenever the guest updates the BAR.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-07-06  7:38 ` Avi Kivity
@ 2009-07-07 22:23   ` Stephen Donnelly
  2009-07-08  4:36     ` Avi Kivity
  0 siblings, 1 reply; 31+ messages in thread
From: Stephen Donnelly @ 2009-07-07 22:23 UTC (permalink / raw)
  To: Avi Kivity, kvm

On Mon, Jul 6, 2009 at 7:38 PM, Avi Kivity<avi@redhat.com> wrote:

>> I see virtio_pci uses cpu_physical_memory_map() which provides either
>> read or write mappings and notes "Use only for reads OR writes - not
>> for read-modify-write operations."
>
> Right, these are for unidirectional transient DMA.

Okay, as I thought. I would rather have 'relatively' persistent
mappings, multi-use, and preferably bi-directional.

>> Is there an alternative method that allows large (Several MB)
>> persistent hg memory mappings that are r/w? I would only be using this
>> under kvm, not kqemu or plain qemu.
>
> All of guest memory is permanently mapped in the host.  You can use
> accessors like cpu_physical_memory_rw() or cpu_physical_memory_map() to
> access it.  What exactly do you need that is not provided by these
> accessors?

I have an existing software system that provides high speed
communication between processes on a single host using shared memory.
I would like to extend the system to provide communication between
processes on the host and guest. Unfortunately the transport is
optimised for speed and is not highly abstracted so I cannot easily
substitute a virtio-ring for example.

The system uses two memory spaces, one is a control area which is
register-like and contains R/W values at various offsets. The second
area is for data transport and is divided into rings. Each ring is
unidirectional so I could map these separately with
cpu_physical_memory_map(), but there seems to be no simple solution
for the control area. Target ring performance is perhaps 1-2
gigabytes/second with rings approx 32-512MB in size.

>> Also it appears that PCI IO memory (cpu_register_io_memory) is
>> provided via access functions, like the pci config space?
>
> It can also use ordinary RAM (for example, vga maps its framebuffer as a PCI
> BAR).

So host memory is exported as a PCI_BAR to the guest via
cpu_register_physical_memory(). It looks like the code has to
explicitly manage marking pages dirty and synchronising at appropriate
times. Is the coherency problem bidirectional, e.g. writes from either
host or guest to the shared memory need to mark pages dirty, and
ensure sync is called before the other side reads those areas?

>> Does this
>> cause a page fault/vm_exit on each read or write, or is it more
>> efficient than that?
>
> It depends on how you configure it.  Look at the vga code (hw/vga.c,
> hw/cirrus_vga.c).  Also Cam (copied) wrote a PCI card that provides shared
> memory across guests, you may want to look at that.

I will look into the vga code and see if I get inspired. The 'copied'
driver sounds interesting, the code is not in kvm git?

Thanks for the response!

Stephen.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: R/W HG memory mappings with kvm?
  2009-07-05 22:41 Stephen Donnelly
@ 2009-07-06  7:38 ` Avi Kivity
  2009-07-07 22:23   ` Stephen Donnelly
  0 siblings, 1 reply; 31+ messages in thread
From: Avi Kivity @ 2009-07-06  7:38 UTC (permalink / raw)
  To: Stephen Donnelly; +Cc: kvm, Cam Macdonell

On 07/06/2009 01:41 AM, Stephen Donnelly wrote:
> I am looking at how to do memory mapped IO between host and guests
> under kvm. I expect to use the PCI emulation layer to present a PCI
> device to the guest.
>
> I see virtio_pci uses cpu_physical_memory_map() which provides either
> read or write mappings and notes "Use only for reads OR writes - not
> for read-modify-write operations."
>    

Right, these are for unidirectional transient DMA.

> Is there an alternative method that allows large (Several MB)
> persistent hg memory mappings that are r/w? I would only be using this
> under kvm, not kqemu or plain qemu.
>    

All of guest memory is permanently mapped in the host.  You can use 
accessors like cpu_physical_memory_rw() or cpu_physical_memory_map() to 
access it.  What exactly do you need that is not provided by these 
accessors?

> Also it appears that PCI IO memory (cpu_register_io_memory) is
> provided via access functions, like the pci config space?

It can also use ordinary RAM (for example, vga maps its framebuffer as a 
PCI BAR).

> Does this
> cause a page fault/vm_exit on each read or write, or is it more
> efficient than that?
>    

It depends on how you configure it.  Look at the vga code (hw/vga.c, 
hw/cirrus_vga.c).  Also Cam (copied) wrote a PCI card that provides 
shared memory across guests, you may want to look at that.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 31+ messages in thread

* R/W HG memory mappings with kvm?
@ 2009-07-05 22:41 Stephen Donnelly
  2009-07-06  7:38 ` Avi Kivity
  0 siblings, 1 reply; 31+ messages in thread
From: Stephen Donnelly @ 2009-07-05 22:41 UTC (permalink / raw)
  To: kvm

I am looking at how to do memory mapped IO between host and guests
under kvm. I expect to use the PCI emulation layer to present a PCI
device to the guest.

I see virtio_pci uses cpu_physical_memory_map() which provides either
read or write mappings and notes "Use only for reads OR writes - not
for read-modify-write operations."

Is there an alternative method that allows large (Several MB)
persistent hg memory mappings that are r/w? I would only be using this
under kvm, not kqemu or plain qemu.

Also it appears that PCI IO memory (cpu_register_io_memory) is
provided via access functions, like the pci config space? Does this
cause a page fault/vm_exit on each read or write, or is it more
efficient than that?

Thanks,
Stephen.

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2009-09-28 18:27 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-28 18:27 R/W HG memory mappings with kvm? Tsuyoshi Ozawa
  -- strict thread matches above, loose matches on Subject: below --
2009-07-05 22:41 Stephen Donnelly
2009-07-06  7:38 ` Avi Kivity
2009-07-07 22:23   ` Stephen Donnelly
2009-07-08  4:36     ` Avi Kivity
2009-07-08 21:33       ` Stephen Donnelly
2009-07-09  8:10         ` Avi Kivity
2009-07-08 21:45       ` Cam Macdonell
2009-07-08 22:01         ` Stephen Donnelly
2009-07-09  6:01           ` Cam Macdonell
2009-07-09 22:38             ` Stephen Donnelly
2009-07-10 17:03               ` Cam Macdonell
2009-07-12 21:28                 ` Stephen Donnelly
     [not found]             ` <5f370d430907262256rd7f9fdalfbbec1f9492ce86@mail.gmail.com>
2009-07-27 14:48               ` Cam Macdonell
2009-07-27 21:32                 ` Stephen Donnelly
2009-07-28  8:54                   ` Avi Kivity
2009-07-28 23:06                     ` Stephen Donnelly
2009-08-13  4:07                       ` Stephen Donnelly
2009-08-19 12:14                         ` Avi Kivity
2009-08-23 21:59                           ` Stephen Donnelly
2009-08-24  4:55                             ` Avi Kivity
2009-08-26 10:22                               ` Avi Kivity
2009-08-27  2:39                                 ` Stephen Donnelly
2009-08-27  2:34                               ` Stephen Donnelly
2009-08-27  4:08                                 ` Avi Kivity
2009-08-30 22:33                                   ` Stephen Donnelly
2009-08-31  8:44                                     ` Avi Kivity
2009-08-31 21:13                                       ` Stephen Donnelly
2009-09-09 12:50                                         ` Avi Kivity
2009-07-29 23:52                     ` Cam Macdonell
2009-07-30  9:31                       ` Avi Kivity

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).