From: "George Dunlap"
Subject: Re: [RFC][PATCH] 0/9 Populate-on-demand memory
Date: Wed, 24 Dec 2008 13:55:20 +0000
References: <42c6b1aa-198a-412b-ae07-f25a2649914c@default>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
In-Reply-To: <42c6b1aa-198a-412b-ae07-f25a2649914c@default>
Content-Disposition: inline
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Dan Magenheimer
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On Tue, Dec 23, 2008 at 7:06 PM, Dan Magenheimer wrote:
> Very nice! Thanks!
> One thing that might be worth adding to the requirements list or
> README is that this approach (or any which depends on ballooning)
> will now almost certainly require any participating hvm domain
> to have an adequately-sized, properly-configured swap disk.
> Ballooning is insufficiently responsive to grow memory fast
> enough to handle the rapidly growing memory needs of an active
> domain.  The consequence for a no-swap-disk domain is application
> failures, and the consequence even if a swap disk IS configured is
> temporarily very poor performance.

I don't think this is particular to the PoD patches, or even to
ballooning per se.  A swap disk would be required any time you boot
with a small amount of memory, whether it could be increased later or
not.  But you're right that this differs from a typical operating
system's "demand-paging" mechanism, where the goal is to give a
process only the memory it actually needs, so the rest can be used
for other processes.  You're still allocating a fixed amount of
memory to a guest at start-up: the un-populated memory is not
available for use by other VMs, and allocating more memory is a
(relatively) slow process.  I guess a brief note pointing out the
difference between "populate on demand" and "allocate on demand"
would be useful.

> So this won't work for any domain that does start-of-day
> scrubbing with a non-zero value?  I suppose that's OK.

Not if the scrubber might win the race against the balloon driver.
:-)  If this really becomes an issue, it should be straightforward to
add functionality to handle it.  It just requires having a simple way
of specifying what "scrubbed" pages look like, an extra p2m type for
"PoD scrubbed" (rather than PoD zero, the default), and a way to
change from scrubbed <-> zero.  Did you have a particular system in
mind?

 -George

>> -----Original Message-----
>> From: George Dunlap [mailto:dunlapg@umich.edu]
>> Sent: Tuesday, December 23, 2008 5:55 AM
>> To: xen-devel@lists.xensource.com
>> Subject: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory
>>
>>
>> This set of patches introduces a set of mechanisms and interfaces to
>> implement populate-on-demand memory.  The purpose of
>> populate-on-demand memory is to allow non-paravirtualized guests
>> (such as Windows or Linux HVM) to boot in a ballooned state.
>>
>> BACKGROUND
>>
>> When non-PV domains boot, they typically read the e820 map to
>> determine how much memory they have, and then assume that much
>> memory thereafter.  Memory usage can be reduced using a balloon
>> driver, but it cannot be increased past this initial value.
>> Currently, this means that a non-PV domain must be booted with the
>> maximum amount of memory you want that VM ever to be able to use.
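
To make the mechanism described below a little more concrete, here is
the fault path in rough, C-like pseudocode.  The names are invented
for the sketch and do not match the actual code; patch 03 has the real
thing:

    /* Pseudocode only: illustrative names, not the actual Xen functions. */
    static int pod_demand_populate(struct domain *d, unsigned long gfn)
    {
        struct page_info *page = pod_cache_get(d);    /* hypothetical helper */

        if ( page == NULL )
        {
            /* Cache is empty.  Emergency sweep: look for gfn ranges that
             * are still entirely zero, turn them back into PoD entries,
             * and reclaim their memory for the cache. */
            pod_emergency_sweep(d);                    /* hypothetical helper */
            page = pod_cache_get(d);
        }

        if ( page == NULL )
        {
            /* Nothing left to back the entry: the guest has dirtied more
             * memory than the target and the balloon driver has not
             * returned any, so the domain cannot continue. */
            domain_crash(d);
            return -1;
        }

        /* Replace the populate_on_demand p2m entry with ordinary RAM. */
        set_p2m_entry(d, gfn, page_to_mfn(page), p2m_ram_rw);
        return 0;
    }

The two interesting cases are the cache running dry (handled by the
emergency sweep described below) and the balloon driver eventually
handing (maxmem - target) back to Xen, after which no PoD entries
should need populating.
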
>>
>> Populate-on-demand allows us to "boot ballooned", in the
>> following manner:
>> * Mark the entire range of memory (memory_static_max aka maxmem) with
>> a new p2m type, populate_on_demand, reporting memory_static_max in the
>> e820 map.  No memory is allocated at this stage.
>> * Allocate the "memory_dynamic_max" (aka "target") amount of memory
>> for a "PoD cache".  This memory is kept on a separate list in the
>> domain struct.
>> * Boot the guest.
>> * Populate the p2m table on demand, as it is accessed, with pages from
>> the PoD cache.
>> * When the balloon driver loads, it inflates the balloon size to
>> (maxmem - target), giving the memory back to Xen.  When this is
>> accomplished, the "populate-on-demand" portion of boot is effectively
>> finished.
>>
>> One complication is that many operating systems have start-of-day page
>> scrubbers, which touch all of memory to zero it.  This scrubber may
>> run before the balloon driver can return memory to Xen.  These zeroed
>> pages, however, don't contain any information; we can safely replace
>> them with PoD entries again.  So when we run out of PoD cache, we do
>> an "emergency sweep" to look for zero pages we can reclaim for the
>> populate-on-demand cache.  When we find a page range which is entirely
>> zero, we mark the gfn range PoD again, and put the memory back into
>> the PoD cache.
>>
>> NB that this code is designed to work only in conjunction with a
>> balloon driver.  If the balloon driver is not loaded, eventually all
>> pages will be dirtied (non-zero), the emergency sweep will fail, and
>> there will be no memory to back outstanding PoD pages.  When this
>> happens, the domain will crash.
>>
>> The code works for both shadow mode and HAP mode; it has been tested
>> with NPT/RVI and shadow, but not yet with EPT.  It also attempts to
>> avoid splintering superpages, to allow HAP to function more
>> effectively.
>>
>> To use:
>> * Ensure that you have a functioning balloon driver in the guest
>> (e.g., xen_balloon.ko for Linux HVM guests).
>> * Set maxmem/memory_static_max to one value, and
>> memory/memory_dynamic_max to another when creating the domain; e.g.:
>> # xm create debian-hvm maxmem=512 memory=256
>>
>> The patches are as follows:
>> 01 - Add a p2m_query_type to the core gfn_to_mfn*() functions.
>>
>> 02 - Change some gfn_to_mfn() calls to gfn_to_mfn_query(), which will
>> not populate PoD entries.  Specifically, since gfn_to_mfn() may grab
>> the p2m lock, it must not be called while the shadow lock is held.
>>
>> 03 - Populate-on-demand core.  Introduce the new p2m type, PoD cache
>> structures, and core functionality.  Add PoD checking to audit_p2m().
>> Add PoD information to the 'q' debug key.
>>
>> 04 - Implement p2m_decrease_reservation.  As the balloon driver
>> returns gfns to Xen, this handles PoD entries properly; it also
>> "steals" memory being returned for the PoD cache instead of freeing
>> it, if necessary.
>>
>> 05 - Emergency sweep: implement an emergency sweep for zero memory if
>> the cache is low.  If it finds pages (or page ranges) that are
>> entirely zero, it will replace the entry with a PoD entry again,
>> reclaiming the memory for the PoD cache.
>>
>> 06 - Deal with splintering both PoD pages (to back singleton PoD
>> entries) and PoD ranges.
>>
>> 07 - Xen interface for populate-on-demand functionality: a PoD flag
>> for populate_physmap, and {get,set}_pod_target for interacting with
>> the PoD cache.  set_pod_target() should be called for any domain that
>> may have PoD entries.
>> It will increase the size of the cache if necessary, but will never
>> decrease the size of the cache.  (This will be done as the balloon
>> driver balloons down.)
>>
>> 08 - libxc interface.  Add new libxc functions:
>> + xc_hvm_build_target_mem(), which accepts memsize and target.  If
>> these are equal, PoD functionality is not invoked.  Otherwise, memsize
>> is marked PoD, and the target amount (in MiB) is allocated to the PoD
>> cache.
>> + xc_[sg]et_pod_target(): get / set the PoD target.  set_pod_target()
>> should be called whenever you change the guest memory target on a
>> domain which may have outstanding PoD entries.  This may increase the
>> size of the PoD cache up to the number of outstanding PoD entries, but
>> will not reduce the size of the cache.  (The cache may be reduced as
>> the balloon driver returns gfn space to Xen.)
>>
>> 09 - xend integration.
>> + Always calls xc_hvm_build_target_mem() with memsize=maxmem and
>> target=memory.  If these are the same, the internal function will not
>> use PoD.
>> + Calls xc_set_target_mem() whenever a domain's target is changed.
>> Also calls balloon.free(), causing dom0 to balloon itself down if
>> there's not enough memory otherwise.
>>
>> Things still to do:
>> * When reduce_reservation() is called with a superpage, keep the
>> superpage intact.
>> * Create a hypercall continuation for set_pod_target.
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel
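
P.S. For anyone driving this from a toolstack other than xend, the
intended libxc call sequence looks roughly like the sketch below.  The
argument lists are approximated from the description above rather than
copied from the patched headers, so treat this as illustrative rather
than copy-paste ready:

    #include <stdint.h>
    #include <xenctrl.h>
    #include <xenguest.h>

    /* Illustrative sketch only: argument lists approximated from the
     * patch description, not from the patched xenctrl.h/xenguest.h. */

    /* Build an HVM domain with maxmem_mib reported in the e820 map but
     * only target_mib actually allocated (into the PoD cache).  If the
     * two values are equal, PoD is not used at all. */
    static int build_pod_hvm(uint32_t domid, const char *image,
                             int maxmem_mib, int target_mib)
    {
        int xc_handle = xc_interface_open();
        int rc;

        if ( xc_handle < 0 )
            return -1;

        rc = xc_hvm_build_target_mem(xc_handle, domid,
                                     maxmem_mib,  /* memsize: what the guest sees */
                                     target_mib,  /* target: what is really backed */
                                     image);

        xc_interface_close(xc_handle);
        return rc;
    }

    /* Whenever the guest's memory target changes, update the PoD target
     * too, so the cache can still cover any outstanding PoD entries.
     * This may grow the cache but never shrinks it; shrinking happens as
     * the balloon driver hands gfns back to Xen. */
    static int update_pod_target(uint32_t domid, uint64_t target_pages)
    {
        int xc_handle = xc_interface_open();
        int rc;

        if ( xc_handle < 0 )
            return -1;

        rc = xc_set_pod_target(xc_handle, domid, target_pages);

        xc_interface_close(xc_handle);
        return rc;
    }

xend already does the equivalent of both calls (patch 09), so this only
matters for out-of-tree toolstacks.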
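
One more note on the scrubber question above: the emergency sweep's
"is this page still untouched?" test is just a scan of the page
contents.  For a zeroing scrubber it checks for all zeroes; supporting
a non-zero scrub value, as discussed above, would mostly mean
parameterising that check (plus the extra "PoD scrubbed" p2m type).  A
minimal, self-contained illustration of the check itself follows; it
is not the actual Xen code:

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    #define PAGE_SIZE 4096

    /* Return 1 if the page consists entirely of `pattern' bytes
     * (pattern == 0 for the plain zero check), 0 otherwise. */
    static int page_matches_pattern(const uint8_t *page, uint8_t pattern)
    {
        for ( size_t i = 0; i < PAGE_SIZE; i++ )
            if ( page[i] != pattern )
                return 0;
        return 1;
    }

    int main(void)
    {
        static uint8_t page[PAGE_SIZE];   /* zero-initialised */

        printf("all-zero page reclaimable: %d\n",
               page_matches_pattern(page, 0x00));

        page[123] = 0xAA;                 /* simulate a dirtied page */
        printf("dirtied page reclaimable:  %d\n",
               page_matches_pattern(page, 0x00));

        return 0;
    }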