From: "George Dunlap"
Subject: Re: [RFC][PATCH] 0/9 Populate-on-demand memory
Date: Wed, 24 Dec 2008 13:55:20 +0000
References: <42c6b1aa-198a-412b-ae07-f25a2649914c@default>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
In-Reply-To: <42c6b1aa-198a-412b-ae07-f25a2649914c@default>
Content-Disposition: inline
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Dan Magenheimer
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On Tue, Dec 23, 2008 at 7:06 PM, Dan Magenheimer wrote:
> Very nice! Thanks!
> One thing that might be worth adding to the requirements list or
> README is that this approach (or any which depends on ballooning)
> will now almost certainly require any participating hvm domain
> to have an adequately-sized, properly-configured swap disk.
> Ballooning is insufficiently responsive to grow memory fast
> enough to handle the rapidly growing memory needs of an active
> domain.  The consequence for a no-swap-disk domain is application
> failures, and the consequence even if a swap disk IS configured is
> temporarily very poor performance.

I don't think this is particular to the PoD patches, or even to
ballooning per se.  A swap disk would be required any time you boot
with a small amount of memory, whether it could be increased later or
not.  But you're right that this differs from a typical operating
system's "demand-paging" mechanism, where the goal is to give a
process only the memory it actually needs, so the rest can be used
for other processes.  You're still allocating a fixed amount of
memory to a guest at start-up: the un-populated memory is not
available for use by other VMs, and allocating more memory is a
(relatively) slow process.  I guess a brief note pointing out the
difference between "populate on demand" and "allocate on demand"
would be useful.

> So this won't work for any domain that does start-of-day
> scrubbing with a non-zero value?  I suppose that's OK.

Not if the scrubber might win the race against the balloon driver.
:-)  If this really becomes an issue, it should be straightforward to
add functionality to handle it.  It just requires having a simple way
of specifying what "scrubbed" pages look like, an extra p2m type for
"PoD scrubbed" (rather than PoD zero, the default), and a way to
change from scrubbed <-> zero.  Did you have a particular system in
mind?

 -George

>> -----Original Message-----
>> From: George Dunlap [mailto:dunlapg@umich.edu]
>> Sent: Tuesday, December 23, 2008 5:55 AM
>> To: xen-devel@lists.xensource.com
>> Subject: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory
>>
>>
>> This set of patches introduces a set of mechanisms and interfaces to
>> implement populate-on-demand memory.  The purpose of
>> populate-on-demand memory is to allow non-paravirtualized guests
>> (such as Windows or Linux HVM) to boot in a ballooned state.
>>
>> BACKGROUND
>>
>> When non-PV domains boot, they typically read the e820 map to
>> determine how much memory they have, and then assume that much
>> memory thereafter.  Memory usage can be reduced using a balloon
>> driver, but it cannot be increased past this initial value.
>> Currently, this means that a non-PV domain must be booted with the
>> maximum amount of memory you want that VM ever to be able to use.
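
To make the mechanism described below a little more concrete, here is
the fault path in rough, C-like pseudocode.  The names are invented
for the sketch and do not match the actual code; patch 03 has the real
thing:

    /* Pseudocode only: illustrative names, not the actual Xen functions. */
    static int pod_demand_populate(struct domain *d, unsigned long gfn)
    {
        struct page_info *page = pod_cache_get(d);    /* hypothetical helper */

        if ( page == NULL )
        {
            /* Cache is empty.  Emergency sweep: look for gfn ranges that
             * are still entirely zero, turn them back into PoD entries,
             * and reclaim their memory for the cache. */
            pod_emergency_sweep(d);                    /* hypothetical helper */
            page = pod_cache_get(d);
        }

        if ( page == NULL )
        {
            /* Nothing left to back the entry: the guest has dirtied more
             * memory than the target and the balloon driver has not
             * returned any, so the domain cannot continue. */
            domain_crash(d);
            return -1;
        }

        /* Replace the populate_on_demand p2m entry with ordinary RAM. */
        set_p2m_entry(d, gfn, page_to_mfn(page), p2m_ram_rw);
        return 0;
    }

The two interesting cases are the cache running dry (handled by the
emergency sweep described below) and the balloon driver eventually
handing (maxmem - target) back to Xen, after which no PoD entries
should need populating.
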
>>
>> Populate-on-demand allows us to "boot ballooned", in the
>> following manner:
>> * Mark the entire range of memory (memory_static_max aka maxmem) with
>> a new p2m type, populate_on_demand, reporting memory_static_max in the
>> e820 map.  No memory is allocated at this stage.
>> * Allocate the "memory_dynamic_max" (aka "target") amount of memory
>> for a "PoD cache".  This memory is kept on a separate list in the
>> domain struct.
>> * Boot the guest.
>> * Populate the p2m table on demand, as it is accessed, with pages from
>> the PoD cache.
>> * When the balloon driver loads, it inflates the balloon size to
>> (maxmem - target), giving the memory back to Xen.  When this is
>> accomplished, the "populate-on-demand" portion of boot is effectively
>> finished.
>>
>> One complication is that many operating systems have start-of-day page
>> scrubbers, which touch all of memory to zero it.  This scrubber may
>> run before the balloon driver can return memory to Xen.  These zeroed
>> pages, however, don't contain any information; we can safely replace
>> them with PoD entries again.  So when we run out of PoD cache, we do
>> an "emergency sweep" to look for zero pages we can reclaim for the
>> populate-on-demand cache.  When we find a page range which is entirely
>> zero, we mark the gfn range PoD again, and put the memory back into
>> the PoD cache.
>>
>> NB that this code is designed to work only in conjunction with a
>> balloon driver.  If the balloon driver is not loaded, eventually all
>> pages will be dirtied (non-zero), the emergency sweep will fail, and
>> there will be no memory to back outstanding PoD pages.  When this
>> happens, the domain will crash.
>>
>> The code works for both shadow mode and HAP mode; it has been tested
>> with NPT/RVI and shadow, but not yet with EPT.  It also attempts to
>> avoid splintering superpages, to allow HAP to function more
>> effectively.
>>
>> To use:
>> * Ensure that you have a functioning balloon driver in the guest
>> (e.g., xen_balloon.ko for Linux HVM guests).
>> * Set maxmem/memory_static_max to one value, and
>> memory/memory_dynamic_max to another when creating the domain; e.g.:
>> # xm create debian-hvm maxmem=512 memory=256
>>
>> The patches are as follows:
>> 01 - Add a p2m_query_type to the core gfn_to_mfn*() functions.
>>
>> 02 - Change some gfn_to_mfn() calls to gfn_to_mfn_query(), which will
>> not populate PoD entries.  Specifically, since gfn_to_mfn() may grab
>> the p2m lock, it must not be called while the shadow lock is held.
>>
>> 03 - Populate-on-demand core.  Introduce the new p2m type, PoD cache
>> structures, and core functionality.  Add PoD checking to audit_p2m().
>> Add PoD information to the 'q' debug key.
>>
>> 04 - Implement p2m_decrease_reservation.  As the balloon driver
>> returns gfns to Xen, this handles PoD entries properly; it also
>> "steals" memory being returned for the PoD cache instead of freeing
>> it, if necessary.
>>
>> 05 - Emergency sweep: implement an emergency sweep for zero memory if
>> the cache is low.  If it finds pages (or page ranges) that are
>> entirely zero, it will replace the entry with a PoD entry again,
>> reclaiming the memory for the PoD cache.
>>
>> 06 - Deal with splintering both PoD pages (to back singleton PoD
>> entries) and PoD ranges.
>>
>> 07 - Xen interface for populate-on-demand functionality: a PoD flag
>> for populate_physmap, and {get,set}_pod_target for interacting with
>> the PoD cache.  set_pod_target() should be called for any domain that
>> may have PoD entries.
>> It will increase the size of the cache if necessary, but will never
>> decrease the size of the cache.  (This will be done as the balloon
>> driver balloons down.)
>>
>> 08 - libxc interface.  Add new libxc functions:
>> + xc_hvm_build_target_mem(), which accepts memsize and target.  If
>> these are equal, PoD functionality is not invoked.  Otherwise, memsize
>> is marked PoD, and the target amount (in MiB) is allocated to the PoD
>> cache.
>> + xc_[sg]et_pod_target(): get / set the PoD target.  set_pod_target()
>> should be called whenever you change the guest memory target on a
>> domain which may have outstanding PoD entries.  This may increase the
>> size of the PoD cache up to the number of outstanding PoD entries, but
>> will not reduce the size of the cache.  (The cache may be reduced as
>> the balloon driver returns gfn space to Xen.)
>>
>> 09 - xend integration.
>> + Always calls xc_hvm_build_target_mem() with memsize=maxmem and
>> target=memory.  If these are the same, the internal function will not
>> use PoD.
>> + Calls xc_set_target_mem() whenever a domain's target is changed.
>> Also calls balloon.free(), causing dom0 to balloon itself down if
>> there's not enough memory otherwise.
>>
>> Things still to do:
>> * When reduce_reservation() is called with a superpage, keep the
>> superpage intact.
>> * Create a hypercall continuation for set_pod_target.
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel
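
P.S. For anyone driving this from a toolstack other than xend, the
intended libxc call sequence looks roughly like the sketch below.  The
argument lists are approximated from the description above rather than
copied from the patched headers, so treat this as illustrative rather
than copy-paste ready:

    #include <stdint.h>
    #include <xenctrl.h>
    #include <xenguest.h>

    /* Illustrative sketch only: argument lists approximated from the
     * patch description, not from the patched xenctrl.h/xenguest.h. */

    /* Build an HVM domain with maxmem_mib reported in the e820 map but
     * only target_mib actually allocated (into the PoD cache).  If the
     * two values are equal, PoD is not used at all. */
    static int build_pod_hvm(uint32_t domid, const char *image,
                             int maxmem_mib, int target_mib)
    {
        int xc_handle = xc_interface_open();
        int rc;

        if ( xc_handle < 0 )
            return -1;

        rc = xc_hvm_build_target_mem(xc_handle, domid,
                                     maxmem_mib,  /* memsize: what the guest sees */
                                     target_mib,  /* target: what is really backed */
                                     image);

        xc_interface_close(xc_handle);
        return rc;
    }

    /* Whenever the guest's memory target changes, update the PoD target
     * too, so the cache can still cover any outstanding PoD entries.
     * This may grow the cache but never shrinks it; shrinking happens as
     * the balloon driver hands gfns back to Xen. */
    static int update_pod_target(uint32_t domid, uint64_t target_pages)
    {
        int xc_handle = xc_interface_open();
        int rc;

        if ( xc_handle < 0 )
            return -1;

        rc = xc_set_pod_target(xc_handle, domid, target_pages);

        xc_interface_close(xc_handle);
        return rc;
    }

xend already does the equivalent of both calls (patch 09), so this only
matters for out-of-tree toolstacks.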
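
One more note on the scrubber question above: the emergency sweep's
"is this page still untouched?" test is just a scan of the page
contents.  For a zeroing scrubber it checks for all zeroes; supporting
a non-zero scrub value, as discussed above, would mostly mean
parameterising that check (plus the extra "PoD scrubbed" p2m type).  A
minimal, self-contained illustration of the check itself follows; it
is not the actual Xen code:

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    #define PAGE_SIZE 4096

    /* Return 1 if the page consists entirely of `pattern' bytes
     * (pattern == 0 for the plain zero check), 0 otherwise. */
    static int page_matches_pattern(const uint8_t *page, uint8_t pattern)
    {
        for ( size_t i = 0; i < PAGE_SIZE; i++ )
            if ( page[i] != pattern )
                return 0;
        return 1;
    }

    int main(void)
    {
        static uint8_t page[PAGE_SIZE];   /* zero-initialised */

        printf("all-zero page reclaimable: %d\n",
               page_matches_pattern(page, 0x00));

        page[123] = 0xAA;                 /* simulate a dirtied page */
        printf("dirtied page reclaimable:  %d\n",
               page_matches_pattern(page, 0x00));

        return 0;
    }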