From mboxrd@z Thu Jan  1 00:00:00 1970
From: Cameron Macdonell
Subject: Re: [PATCH] Add shared memory PCI device that shares a memory
	object between VMs
Date: Sat, 18 Apr 2009 23:22:57 -0600
Message-ID:
References: <1238600608-9120-1-git-send-email-cam@cs.ualberta.ca>
	<49D3965C.1030503@codemonkey.ws> <49D3AD79.7080708@redhat.com>
	<49D3B7ED.4030303@codemonkey.ws>
Mime-Version: 1.0 (Apple Message framework v930.3)
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit
Cc: Avi Kivity, kvm@vger.kernel.org
To: Anthony Liguori
Return-path:
Received: from fleet.cs.ualberta.ca ([129.128.22.22]:55100 "EHLO
	fleet.cs.ualberta.ca" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751671AbZDSFW7 (ORCPT );
	Sun, 19 Apr 2009 01:22:59 -0400
In-Reply-To: <49D3B7ED.4030303@codemonkey.ws>
Sender: kvm-owner@vger.kernel.org
List-ID:

Hi Avi and Anthony,

Sorry for the top-reply, but we haven't discussed this aspect here
before.

I've been thinking about how to implement interrupts.  As far as I can
tell, unix domain sockets in Qemu/KVM are used point-to-point, with one
VM acting as the server by specifying "server" along with the unix:
option.  This works simply enough for two VMs, but I'm unsure how it
can extend to multiple VMs.  How would a server VM know how many
clients to wait for?  How can messages then be multicast or broadcast?
Is a separate "interrupt server" necessary?
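To make the question concrete, here is a rough, untested sketch of what
I imagine a separate interrupt server could look like.  The socket
path, the one-byte "event" message, and the peer limit are all made up
for illustration; nothing here is taken from the patch.  The idea is a
small host-side process that listens on a unix domain socket, accepts
any number of VM connections, and rebroadcasts each notification it
receives to every other connected peer:

/* interrupt-server.c -- illustrative sketch only.
 * Build: gcc -o interrupt-server interrupt-server.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <sys/un.h>

#define MAX_PEERS 64
#define SOCK_PATH "/tmp/shmem_intr"     /* hypothetical path */

int main(void)
{
    int listener, peers[MAX_PEERS], npeers = 0, i, j;
    struct sockaddr_un addr;
    fd_set fds;

    listener = socket(AF_UNIX, SOCK_STREAM, 0);
    if (listener < 0) { perror("socket"); exit(1); }

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, SOCK_PATH, sizeof(addr.sun_path) - 1);
    unlink(SOCK_PATH);
    if (bind(listener, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listener, MAX_PEERS) < 0) {
        perror("bind/listen"); exit(1);
    }

    for (;;) {
        int maxfd = listener;
        FD_ZERO(&fds);
        FD_SET(listener, &fds);
        for (i = 0; i < npeers; i++) {
            FD_SET(peers[i], &fds);
            if (peers[i] > maxfd)
                maxfd = peers[i];
        }
        if (select(maxfd + 1, &fds, NULL, NULL, NULL) < 0) {
            perror("select"); exit(1);
        }

        /* A new VM connecting: no need to know the count in advance. */
        if (FD_ISSET(listener, &fds) && npeers < MAX_PEERS) {
            int fd = accept(listener, NULL, NULL);
            if (fd >= 0)
                peers[npeers++] = fd;
        }

        /* A peer raised an "interrupt": rebroadcast to everyone else. */
        for (i = 0; i < npeers; i++) {
            char event;
            if (!FD_ISSET(peers[i], &fds))
                continue;
            if (read(peers[i], &event, 1) <= 0) {   /* peer went away */
                close(peers[i]);
                peers[i--] = peers[--npeers];
                continue;
            }
            for (j = 0; j < npeers; j++)
                if (j != i && write(peers[j], &event, 1) < 0)
                    perror("write");
        }
    }
}

With something like this, the server never needs to know the number of
VMs up front, and a single write from any one VM is fanned out to all
of the others without a point-to-point socket per pair.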
Thanks,
Cam


On 1-Apr-09, at 12:52 PM, Anthony Liguori wrote:

> Avi Kivity wrote:
>> Anthony Liguori wrote:
>>> Hi Cam,
>>>
>>> I would suggest two design changes to make here.  The first is that
>>> I think you should use virtio.
>>
>> I disagree with this.  While virtio is excellent at exporting guest
>> memory, it isn't so good at importing another guest's memory.
>
> First we need to separate static memory sharing and dynamic memory
> sharing.  Static memory sharing has to be configured on start up.  I
> think in practice, static memory sharing is not terribly interesting
> except for maybe embedded environments.
>
> Dynamic memory sharing requires bidirectional communication in order
> to establish mappings and tear down mappings.  You'll eventually
> recreate virtio once you've implemented this communication mechanism.
>
>>> The second is that I think instead of relying on mapping in device
>>> memory to the guest, you should have the guest allocate its own
>>> memory to dedicate to sharing.
>>
>> That's not what you describe below.  You're having the guest
>> allocate parts of its address space that happen to be used by RAM,
>> and overlaying those parts with the shared memory.
>
> But from the guest's perspective, its RAM is being used for memory
> sharing.
>
> If you're clever, you could start a guest with -mem-path and then use
> this mechanism to map a portion of one guest's memory into another
> guest without either guest ever knowing who "owns" the memory and
> with exactly the same driver on both.
>
>>> Right now, you've got a bit of a hole in your implementation
>>> because you only support files that are powers-of-two in size even
>>> though that's not documented/enforced.  This is a limitation of PCI
>>> resource regions.
>>
>> While the BAR needs to be a power of two, I don't think the RAM
>> backing it needs to be.
>
> Then you need a side channel to communicate the information to the
> guest.
>
>>> Also, the PCI memory hole is limited in size today which is going
>>> to put an upper bound on the amount of memory you could ever map
>>> into a guest.
>>
>> Today.  We could easily lift this restriction by supporting 64-bit
>> BARs.  It would probably take only a few lines of code.
>>
>>> Since you're using qemu_ram_alloc() also, it makes hotplug
>>> unworkable too since qemu_ram_alloc() is a static allocation from a
>>> contiguous heap.
>>
>> We need to fix this anyway, for memory hotplug.
>
> It's going to be hard to "fix" with TCG.
>
>>> If you used virtio, what you could do is provide a ring queue that
>>> was used to communicate a series of requests/responses.  The
>>> exchange might look like this:
>>>
>>> guest: REQ discover memory region
>>> host:  RSP memory region id: 4 size: 8k
>>> guest: REQ map region id: 4 size: 8k sgl: {(addr=43000, size=4k),
>>>        (addr=944000, size=4k)}
>>> host:  RSP mapped region id: 4
>>> guest: REQ notify region id: 4
>>> host:  RSP notify region id: 4
>>> guest: REQ poll region id: 4
>>> host:  RSP poll region id: 4
>>
>> That looks significantly more complex.
>
> It's also supporting dynamic shared memory.  If you do use BARs, then
> perhaps you'd just do PCI hotplug to make things dynamic.
>
>>> And the REQ/RSP order does not have to be in series like this.  In
>>> general, you need one entry on the queue to poll for new memory
>>> regions, one entry for each mapped region to poll for incoming
>>> notification, and then the remaining entries can be used to send
>>> short-lived requests/responses.
>>>
>>> It's important that the REQ map takes a scatter/gather list of
>>> physical addresses because after running for a while, it's unlikely
>>> that you'll be able to allocate any significant size of contiguous
>>> memory.
>>>
>>> From a QEMU perspective, you would do memory sharing by waiting for
>>> a map REQ from the guest and then you would complete the request by
>>> doing an mmap(MAP_FIXED) with the appropriate parameters into
>>> phys_ram_base.
>>
>> That will fragment the vma list.  And what do you do when you unmap
>> the region?
>>
>> How does a 256M guest map 1G of shared memory?
>
> It doesn't, but it couldn't today either because of the 32-bit BARs.
>
> Regards,
>
> Anthony Liguori
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

-----------------------------------------------
A. Cameron Macdonell
Ph.D. Student
Department of Computing Science
University of Alberta
cam@cs.ualberta.ca
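
P.S. To check my understanding of the virtio exchange Anthony describes
above, here is a rough guess at what the queue messages might look
like.  Every struct and field name below is invented for illustration;
none of it comes from the patch or from an existing virtio device:

/* Guessed message layout for the REQ/RSP exchange described above.
 * All names and sizes are illustrative only. */
#include <stdint.h>

enum shmem_req_type {
    SHMEM_REQ_DISCOVER = 1,     /* "discover memory region"            */
    SHMEM_REQ_MAP      = 2,     /* "map region id: N" with an sg list  */
    SHMEM_REQ_NOTIFY   = 3,     /* "notify region id: N"               */
    SHMEM_REQ_POLL     = 4,     /* "poll region id: N"                 */
};

struct shmem_sg_entry {
    uint64_t addr;              /* guest physical address of one chunk */
    uint32_t len;               /* chunk length in bytes               */
};

struct shmem_req {
    uint32_t type;              /* one of shmem_req_type               */
    uint32_t region_id;         /* unused for DISCOVER                 */
    uint64_t size;              /* total size being mapped (MAP only)  */
    uint32_t nr_sg;             /* number of sg entries that follow    */
    struct shmem_sg_entry sg[]; /* scatter/gather list, since a large
                                 * contiguous guest allocation is
                                 * unlikely once the guest has been
                                 * running for a while                 */
};

struct shmem_rsp {
    uint32_t type;              /* echoes the request type             */
    uint32_t region_id;         /* e.g. "memory region id: 4"          */
    uint64_t size;              /* e.g. "size: 8k" for DISCOVER        */
    int32_t  status;            /* 0 on success                        */
};

If I've read the proposal correctly, QEMU would then complete a MAP
request by mmap()ing the shared object with MAP_FIXED over each of
those sg chunks within phys_ram_base.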