From: Avi Kivity
Subject: Re: [PATCH] Add shared memory PCI device that shares a memory object between VMs
Date: Wed, 01 Apr 2009 21:07:53 +0300
Message-ID: <49D3AD79.7080708@redhat.com>
References: <1238600608-9120-1-git-send-email-cam@cs.ualberta.ca> <49D3965C.1030503@codemonkey.ws>
In-Reply-To: <49D3965C.1030503@codemonkey.ws>
To: Anthony Liguori
Cc: Cam Macdonell, kvm@vger.kernel.org

Anthony Liguori wrote:
> Hi Cam,
>
> Cam Macdonell wrote:
>> This patch supports sharing memory between VMs and between the
>> host and a VM. It's a first cut and comments are encouraged. The
>> goal is to support simple inter-VM communication with zero-copy
>> access to shared memory.
>
> Nice work!
>
> I would suggest two design changes here. The first is that I think
> you should use virtio.

I disagree with this. While virtio is excellent at exporting guest
memory, it isn't so good at importing another guest's memory.

> The second is that I think, instead of relying on mapping device
> memory into the guest, you should have the guest allocate its own
> memory to dedicate to sharing.

That's not what you describe below. You're having the guest allocate
parts of its address space that happen to be backed by RAM, and
overlaying those parts with the shared memory.

> Right now, you've got a bit of a hole in your implementation,
> because you only support files that are powers of two in size, even
> though that's not documented/enforced. This is a limitation of PCI
> resource regions.

While the BAR needs to be a power of two, I don't think the RAM
backing it needs to be.

> Also, the PCI memory hole is limited in size today, which is going
> to put an upper bound on the amount of memory you could ever map
> into a guest.

Today. We could easily lift this restriction by supporting 64-bit
BARs. It would probably take only a few lines of code.

> Since you're also using qemu_ram_alloc(), it makes hotplug
> unworkable too, since qemu_ram_alloc() is a static allocation from
> a contiguous heap.

We need to fix this anyway, for memory hotplug.

> If you used virtio, what you could do is provide a ring queue that
> was used to communicate a series of requests/responses. The
> exchange might look like this:
>
> guest: REQ discover memory region
> host:  RSP memory region id: 4 size: 8k
> guest: REQ map region id: 4 size: 8k sgl: {(addr=43000, size=4k),
>        (addr=944000, size=4k)}
> host:  RSP mapped region id: 4
> guest: REQ notify region id: 4
> host:  RSP notify region id: 4
> guest: REQ poll region id: 4
> host:  RSP poll region id: 4

That looks significantly more complex.

> And the REQ/RSP order does not have to be in series like this. In
> general, you need one entry on the queue to poll for new memory
> regions, one entry for each mapped region to poll for incoming
> notifications, and then the remaining entries can be used to send
> short-lived requests/responses.
>
> It's important that the REQ map takes a scatter/gather list of
> physical addresses, because after running for a while it's unlikely
> that you'll be able to allocate any significant amount of
> contiguous memory.
>
> From a QEMU perspective, you would do memory sharing by waiting for
> a map REQ from the guest and then completing the request by doing
> an mmap(MAP_FIXED) with the appropriate parameters into
> phys_ram_base.
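Concretely, completing that map REQ on the qemu side would amount to
something like this (a sketch only; shm_fd and guest_paddr are
illustrative names, and error handling is omitted):

#include <sys/mman.h>
#include <stdint.h>

/* Overlay a shared memory object on the pages that already back a
 * guest-physical range.  phys_ram_base is qemu's pointer to the
 * contiguous guest RAM allocation; MAP_FIXED replaces the existing
 * anonymous mapping for this range in place. */
void *map_shared_region(void *phys_ram_base, uint64_t guest_paddr,
                        int shm_fd, size_t size)
{
    void *host_addr = (char *)phys_ram_base + guest_paddr;

    return mmap(host_addr, size, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_FIXED, shm_fd, 0);
}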
That will fragment the vma list. And what do you do when you unmap
the region? How does a 256M guest map 1G of shared memory?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.