From mboxrd@z Thu Jan  1 00:00:00 1970
From: Cameron Macdonell
Subject: Re: [PATCH] Add shared memory PCI device that shares a memory
	object between VMs
Date: Sat, 18 Apr 2009 23:22:57 -0600
Message-ID:
References: <1238600608-9120-1-git-send-email-cam@cs.ualberta.ca>
	<49D3965C.1030503@codemonkey.ws> <49D3AD79.7080708@redhat.com>
	<49D3B7ED.4030303@codemonkey.ws>
Mime-Version: 1.0 (Apple Message framework v930.3)
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit
Cc: Avi Kivity, kvm@vger.kernel.org
To: Anthony Liguori
Return-path:
Received: from fleet.cs.ualberta.ca ([129.128.22.22]:55100 "EHLO
	fleet.cs.ualberta.ca" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751671AbZDSFW7 (ORCPT );
	Sun, 19 Apr 2009 01:22:59 -0400
In-Reply-To: <49D3B7ED.4030303@codemonkey.ws>
Sender: kvm-owner@vger.kernel.org
List-ID:

Hi Avi and Anthony,

Sorry for the top-reply, but we haven't discussed this aspect here
before.

I've been thinking about how to implement interrupts.  As far as I can
tell, unix domain sockets in Qemu/KVM are used point-to-point, with one
VM acting as the server by specifying "server" along with the unix:
option.  This works simply enough for two VMs, but I'm unsure how it
can extend to multiple VMs.  How would a server VM know how many
clients to wait for?  How can messages then be multicast or broadcast?
Is a separate "interrupt server" necessary?
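To make the question concrete, here is a rough, untested sketch of what
I imagine a separate interrupt server could look like.  The socket
path, the one-byte "event" message, and the peer limit are all made up
for illustration; nothing here is taken from the patch.  The idea is a
small host-side process that listens on a unix domain socket, accepts
any number of VM connections, and rebroadcasts each notification it
receives to every other connected peer:

/* interrupt-server.c -- illustrative sketch only.
 * Build: gcc -o interrupt-server interrupt-server.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <sys/un.h>

#define MAX_PEERS 64
#define SOCK_PATH "/tmp/shmem_intr"     /* hypothetical path */

int main(void)
{
    int listener, peers[MAX_PEERS], npeers = 0, i, j;
    struct sockaddr_un addr;
    fd_set fds;

    listener = socket(AF_UNIX, SOCK_STREAM, 0);
    if (listener < 0) { perror("socket"); exit(1); }

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, SOCK_PATH, sizeof(addr.sun_path) - 1);
    unlink(SOCK_PATH);
    if (bind(listener, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listener, MAX_PEERS) < 0) {
        perror("bind/listen"); exit(1);
    }

    for (;;) {
        int maxfd = listener;
        FD_ZERO(&fds);
        FD_SET(listener, &fds);
        for (i = 0; i < npeers; i++) {
            FD_SET(peers[i], &fds);
            if (peers[i] > maxfd)
                maxfd = peers[i];
        }
        if (select(maxfd + 1, &fds, NULL, NULL, NULL) < 0) {
            perror("select"); exit(1);
        }

        /* A new VM connecting: no need to know the count in advance. */
        if (FD_ISSET(listener, &fds) && npeers < MAX_PEERS) {
            int fd = accept(listener, NULL, NULL);
            if (fd >= 0)
                peers[npeers++] = fd;
        }

        /* A peer raised an "interrupt": rebroadcast to everyone else. */
        for (i = 0; i < npeers; i++) {
            char event;
            if (!FD_ISSET(peers[i], &fds))
                continue;
            if (read(peers[i], &event, 1) <= 0) {   /* peer went away */
                close(peers[i]);
                peers[i--] = peers[--npeers];
                continue;
            }
            for (j = 0; j < npeers; j++)
                if (j != i && write(peers[j], &event, 1) < 0)
                    perror("write");
        }
    }
}

With something like this, the server never needs to know the number of
VMs up front, and a single write from any one VM is fanned out to all
of the others without a point-to-point socket per pair.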
Thanks,
Cam


On 1-Apr-09, at 12:52 PM, Anthony Liguori wrote:

> Avi Kivity wrote:
>> Anthony Liguori wrote:
>>> Hi Cam,
>>>
>>> I would suggest two design changes to make here.  The first is that
>>> I think you should use virtio.
>>
>> I disagree with this.  While virtio is excellent at exporting guest
>> memory, it isn't so good at importing another guest's memory.
>
> First we need to separate static memory sharing and dynamic memory
> sharing.  Static memory sharing has to be configured on start up.  I
> think in practice, static memory sharing is not terribly interesting
> except for maybe embedded environments.
>
> Dynamic memory sharing requires bidirectional communication in order
> to establish mappings and tear down mappings.  You'll eventually
> recreate virtio once you've implemented this communication mechanism.
>
>>> The second is that I think instead of relying on mapping in device
>>> memory to the guest, you should have the guest allocate its own
>>> memory to dedicate to sharing.
>>
>> That's not what you describe below.  You're having the guest
>> allocate parts of its address space that happen to be used by RAM,
>> and overlaying those parts with the shared memory.
>
> But from the guest's perspective, its RAM is being used for memory
> sharing.
>
> If you're clever, you could start a guest with -mem-path and then use
> this mechanism to map a portion of one guest's memory into another
> guest without either guest ever knowing who "owns" the memory and
> with exactly the same driver on both.
>
>>> Right now, you've got a bit of a hole in your implementation
>>> because you only support files that are powers-of-two in size even
>>> though that's not documented/enforced.  This is a limitation of PCI
>>> resource regions.
>>
>> While the BAR needs to be a power of two, I don't think the RAM
>> backing it needs to be.
>
> Then you need a side channel to communicate the information to the
> guest.
>
>>> Also, the PCI memory hole is limited in size today which is going
>>> to put an upper bound on the amount of memory you could ever map
>>> into a guest.
>>
>> Today.  We could easily lift this restriction by supporting 64-bit
>> BARs.  It would probably take only a few lines of code.
>>
>>> Since you're using qemu_ram_alloc() also, it makes hotplug
>>> unworkable too since qemu_ram_alloc() is a static allocation from a
>>> contiguous heap.
>>
>> We need to fix this anyway, for memory hotplug.
>
> It's going to be hard to "fix" with TCG.
>
>>> If you used virtio, what you could do is provide a ring queue that
>>> was used to communicate a series of requests/responses.  The
>>> exchange might look like this:
>>>
>>> guest: REQ discover memory region
>>> host:  RSP memory region id: 4 size: 8k
>>> guest: REQ map region id: 4 size: 8k sgl: {(addr=43000, size=4k),
>>>        (addr=944000, size=4k)}
>>> host:  RSP mapped region id: 4
>>> guest: REQ notify region id: 4
>>> host:  RSP notify region id: 4
>>> guest: REQ poll region id: 4
>>> host:  RSP poll region id: 4
>>
>> That looks significantly more complex.
>
> It's also supporting dynamic shared memory.  If you do use BARs, then
> perhaps you'd just do PCI hotplug to make things dynamic.
>
>>> And the REQ/RSP order does not have to be in series like this.  In
>>> general, you need one entry on the queue to poll for new memory
>>> regions, one entry for each mapped region to poll for incoming
>>> notification, and then the remaining entries can be used to send
>>> short-lived requests/responses.
>>>
>>> It's important that the REQ map takes a scatter/gather list of
>>> physical addresses because after running for a while, it's unlikely
>>> that you'll be able to allocate any significant size of contiguous
>>> memory.
>>>
>>> From a QEMU perspective, you would do memory sharing by waiting for
>>> a map REQ from the guest and then you would complete the request by
>>> doing an mmap(MAP_FIXED) with the appropriate parameters into
>>> phys_ram_base.
>>
>> That will fragment the vma list.  And what do you do when you unmap
>> the region?
>>
>> How does a 256M guest map 1G of shared memory?
>
> It doesn't, but it couldn't today either because of the 32-bit BARs.
>
> Regards,
>
> Anthony Liguori
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

-----------------------------------------------
A. Cameron Macdonell
Ph.D. Student
Department of Computing Science
University of Alberta
cam@cs.ualberta.ca
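
P.S. To check my understanding of the virtio exchange Anthony describes
above, here is a rough guess at what the queue messages might look
like.  Every struct and field name below is invented for illustration;
none of it comes from the patch or from an existing virtio device:

/* Guessed message layout for the REQ/RSP exchange described above.
 * All names and sizes are illustrative only. */
#include <stdint.h>

enum shmem_req_type {
    SHMEM_REQ_DISCOVER = 1,     /* "discover memory region"            */
    SHMEM_REQ_MAP      = 2,     /* "map region id: N" with an sg list  */
    SHMEM_REQ_NOTIFY   = 3,     /* "notify region id: N"               */
    SHMEM_REQ_POLL     = 4,     /* "poll region id: N"                 */
};

struct shmem_sg_entry {
    uint64_t addr;              /* guest physical address of one chunk */
    uint32_t len;               /* chunk length in bytes               */
};

struct shmem_req {
    uint32_t type;              /* one of shmem_req_type               */
    uint32_t region_id;         /* unused for DISCOVER                 */
    uint64_t size;              /* total size being mapped (MAP only)  */
    uint32_t nr_sg;             /* number of sg entries that follow    */
    struct shmem_sg_entry sg[]; /* scatter/gather list, since a large
                                 * contiguous guest allocation is
                                 * unlikely once the guest has been
                                 * running for a while                 */
};

struct shmem_rsp {
    uint32_t type;              /* echoes the request type             */
    uint32_t region_id;         /* e.g. "memory region id: 4"          */
    uint64_t size;              /* e.g. "size: 8k" for DISCOVER        */
    int32_t  status;            /* 0 on success                        */
};

If I've read the proposal correctly, QEMU would then complete a MAP
request by mmap()ing the shared object with MAP_FIXED over each of
those sg chunks within phys_ram_base.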