* Issue with PV superpage handling
@ 2012-06-25 14:38 Dave McCracken
  2012-06-25 15:08 ` George Dunlap
  2012-07-09  6:02 ` Juergen Gross
  0 siblings, 2 replies; 7+ messages in thread
From: Dave McCracken @ 2012-06-25 14:38 UTC (permalink / raw)
  To: Xen Developers; +Cc: George Dunlap, Ian Campbell


A while back I added the domain config flag "superpages" to support Linux
hugepages in PV domains.  When the flag is set, the PV domain is populated 
entirely with superpages.  If not enough superpage-sized chunks can be found, 
the domain creation fails.

At some time after my patch was accepted, the code I added to domain restore
was removed because I broke page allocation batching.  I put reimplementing it
on my TODO list, but then it got lost, for which I apologize.

Now that I have gotten back to reimplementing PV superpage support in restore,
I find that other code was recently added to restore that, while triggered by
the superpages flag, only allocates superpages opportunistically and falls back
to small pages if that fails.  This breaks the original semantics of the flag
and could cause any OS that depends on those semantics to fail catastrophically.
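
A minimal sketch, not taken from Xen's code, contrasting the two behaviours
described above: the original all-or-nothing meaning of "superpages" for PV,
and the opportunistic fallback applied by the newer restore code. `ToyHeap`,
`populate` and `AllocationError` are hypothetical names; the heap is a toy
model with a count of free 2MB chunks and a separate pool of loose 4KB pages.

```python
from dataclasses import dataclass
from typing import Tuple

SUPERPAGE_ORDER = 9                        # 2 MiB / 4 KiB = 2**9 pages on x86
PAGES_PER_SUPERPAGE = 1 << SUPERPAGE_ORDER


class AllocationError(Exception):
    """Raised when a domain cannot be populated under the chosen policy."""


@dataclass
class ToyHeap:
    """Hypothetical host-memory model: free 2 MiB chunks plus loose 4 KiB pages."""
    free_superpages: int
    free_small_pages: int

    def alloc_superpage(self) -> bool:
        if self.free_superpages == 0:
            return False
        self.free_superpages -= 1
        return True

    def alloc_small_pages(self, count: int) -> bool:
        if self.free_small_pages < count:
            return False
        self.free_small_pages -= count
        return True


def populate(heap: ToyHeap, nr_superpages: int, strict: bool) -> Tuple[int, int]:
    """Populate a domain of nr_superpages * 512 small pages.

    strict=True:  all-or-nothing, fail as soon as a 2 MiB chunk is missing.
    strict=False: opportunistic, fall back to 4 KiB pages for that chunk.
    Returns (superpages_used, small_pages_used). A real implementation would
    also release what it had already allocated before failing.
    """
    used_super = used_small = 0
    for _ in range(nr_superpages):
        if heap.alloc_superpage():
            used_super += 1
        elif strict:
            raise AllocationError("not enough superpage-sized chunks")
        elif heap.alloc_small_pages(PAGES_PER_SUPERPAGE):
            used_small += PAGES_PER_SUPERPAGE
        else:
            raise AllocationError("out of memory")
    return used_super, used_small
```

With `ToyHeap(free_superpages=2, free_small_pages=1024)` and three superpages
requested, `strict=False` returns `(2, 512)` while `strict=True` raises
`AllocationError`, which is exactly the difference between "try superpages"
and "superpages or abort".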

I have a patch that implements the original semantics of the superpage flag 
while preserving the batch allocation behavior.  I can remove the competing 
code and submit mine, but I have a question.  What value is there in 
implementing opportunistic allocation of superpages for a PV (or an HVM) 
domain in restore?  It clearly can't be based on the superpages flag.  
Opportunistic superpage allocation is already the default behavior for HVM 
domain creation.  Should it also be a default on HVM restore?  What about for 
PV domains?  Is there any real benefit?
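
One way restore could keep batching while honouring the strict semantics is
to map each batch of incoming pfns onto the 2MB-aligned superpages it touches
and populate only those not yet allocated, so a superpage whose pages arrive
spread over several batches is still allocated exactly once. This is a
hypothetical sketch, not the patch mentioned above; the returned indices would
then go into one populate-physmap request with an extent order of 9 (in libxc,
presumably via `xc_domain_populate_physmap()`).

```python
from typing import Iterable, List, Set

SUPERPAGE_ORDER = 9  # 512 x 4 KiB pages per 2 MiB superpage on x86


def superpages_to_allocate(batch_pfns: Iterable[int],
                           allocated: Set[int]) -> List[int]:
    """Return the superpage indices (pfn >> SUPERPAGE_ORDER) touched by this
    batch that have not been populated yet, in first-seen order.

    `allocated` holds the superpage indices populated by earlier batches and
    is updated in place, so each superpage is requested only once.
    """
    needed = []
    for pfn in batch_pfns:
        sp = pfn >> SUPERPAGE_ORDER
        if sp not in allocated:
            allocated.add(sp)
            needed.append(sp)
    return needed


allocated: Set[int] = set()
print(superpages_to_allocate([0, 1, 511, 512, 1030], allocated))  # [0, 1, 2]
print(superpages_to_allocate([513, 1536], allocated))             # [3]
```

Under strict semantics a failed extent request aborts the restore; an
opportunistic variant would instead retry that extent's 512 pfns as order-0
pages.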

Thanks,
Dave McCracken
Oracle Corp.

* Re: Issue with PV superpage handling
  2012-06-25 14:38 Issue with PV superpage handling Dave McCracken
@ 2012-06-25 15:08 ` George Dunlap
  2012-06-25 15:48   ` Jan Beulich
  2012-07-09  6:02 ` Juergen Gross
  1 sibling, 1 reply; 7+ messages in thread
From: George Dunlap @ 2012-06-25 15:08 UTC (permalink / raw)
  To: Dave McCracken; +Cc: Ian Campbell, Xen Developers

On 25/06/12 15:38, Dave McCracken wrote:
> I have a patch that implements the original semantics of the superpage flag
> while preserving the batch allocation behavior.  I can remove the competing
> code and submit mine, but I have a question.  What value is there in
> implementing opportunistic allocation of superpages for a PV (or an HVM)
> domain in restore?  It clearly can't be based on the superpages flag.
> Opportunistic superpage allocation is already the default behavior for HVM
> domain creation.  Should it also be a default on HVM restore?  What about for
> PV domains?  Is there any real benefit?
Well the value of having superpages for HVM guests is pretty obvious.  
When using hardware assisted pagetables (HAP), the number of memory 
reads on a TLB lookup is guest_levels * p2m_level -- so on a 64-bit 
guest, the one extra level of p2m could cause up to 4 extra memory reads 
for every TLB miss.  The reason to do it opportunistically instead of 
all-or-nothing is that there's no reason not to -- every little helps. :-)
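
The arithmetic can be checked with a small sketch. `simple_walk_reads` is the
guest_levels * p2m_levels approximation from the paragraph above;
`full_2d_walk_reads` is the worst case commonly cited for nested paging,
(n+1)(m+1) - 1, which also counts reading each guest page-table entry and the
p2m walk for the final guest-physical address. Both ignore paging-structure
caches, which hide much of this cost in practice. A 2MB p2m superpage ends the
p2m walk one level early (4 levels down to 3).

```python
def simple_walk_reads(guest_levels: int, p2m_levels: int) -> int:
    """Approximation used in the thread: guest_levels * p2m_levels."""
    return guest_levels * p2m_levels


def full_2d_walk_reads(guest_levels: int, p2m_levels: int) -> int:
    """Worst case for a nested walk with no paging-structure caches: each guest
    page-table entry needs a p2m walk to locate it plus one read of the entry
    itself, and the final guest-physical address needs one more p2m walk."""
    n, m = guest_levels, p2m_levels
    return n * (m + 1) + m  # == (n + 1) * (m + 1) - 1


# 64-bit guest: 4 guest levels; a 2 MiB p2m superpage cuts the p2m walk from 4 levels to 3.
for model in (simple_walk_reads, full_2d_walk_reads):
    small, large = model(4, 4), model(4, 3)
    print(f"{model.__name__}: {small} vs {large} reads "
          f"({small - large} extra per TLB miss without superpages)")
```

The simple model gives 16 vs 12 reads (4 extra); the fuller count gives 24 vs
19 (5 extra), so the "up to 4" figure is, if anything, conservative.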

My question is, what is the value of enforcing all-or-nothing for PV 
guests?  Is it the case that PV guests have to be entirely in either one 
mode or the other?

I'm not particularly fussed about having a way to disable the 
opportunistic superpage allocation for HVM guests, and just turning that 
on all the time.  I only really used the flag because I saw it was being 
passed but wasn't being used; I didn't realize it was meant to have the 
"use superpages or abort" semantics.  My only non-negotiable is that we 
have *a way* to get opportunistic superpages for HVM guests.

  -George

* Re: Issue with PV superpage handling
  2012-06-25 15:08 ` George Dunlap
@ 2012-06-25 15:48   ` Jan Beulich
  2012-06-25 16:07     ` Dave McCracken
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2012-06-25 15:48 UTC (permalink / raw)
  To: George Dunlap, Dave McCracken; +Cc: Ian Campbell, Xen Developers

>>> On 25.06.12 at 17:08, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> Well the value of having superpages for HVM guests is pretty obvious.  
> When using hardware assisted pagetables (HAP), the number of memory 
> reads on a TLB lookup is guest_levels * p2m_level -- so on a 64-bit 
> guest, the one extra level of p2m could cause up to 4 extra memory reads 
> for every TLB miss.  The reason to do it opportunistically instead of 
> all-or-nothing is that there's no reason not to -- every little helps. :-)
> 
> My question is, what is the value of enforcing all-or-nothing for PV 
> guests?  Is it the case that PV guests have to be entirely in either one 
> mode or the other?

As I understand it, a PV guest's balloon driver must take this into
account, so I do think the two modes have to be kept strictly separate.

> I'm not particularly fussed about having a way to disable the 
> opportunistic superpage allocation for HVM guests, and just turning that 
> on all the time.  I only really used the flag because I saw it was being 
> passed but wasn't being used; I didn't realize it was meant to have the 
> "use superpages or abort" semantics.  My only non-negotiable is that we 
> have *a way* to get opportunistic superpages for HVM guests.

Couldn't we have the setting be an override for the HVM
allocation behavior (defaulting to enabled there), and have
the originally intended meaning for PV (disabled by default)?
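
This proposal amounts to a small resolution table. A hypothetical sketch: the
`Policy` names and the use of `None` for "setting not given" are assumptions
for illustration, not libxc or xl interfaces.

```python
from enum import Enum
from typing import Optional


class Policy(Enum):
    SMALL_PAGES_ONLY = "4 KiB pages only"
    OPPORTUNISTIC = "superpages where possible, 4 KiB pages otherwise"
    STRICT = "superpages only; fail creation/restore otherwise"


def allocation_policy(is_hvm: bool, superpages: Optional[bool]) -> Policy:
    """Resolve the 'superpages' setting as proposed: for HVM it overrides the
    opportunistic default (enabled unless turned off); for PV it keeps the
    original all-or-nothing meaning (disabled unless turned on).
    superpages=None means the setting was not specified."""
    if is_hvm:
        enabled = True if superpages is None else superpages
        return Policy.OPPORTUNISTIC if enabled else Policy.SMALL_PAGES_ONLY
    enabled = False if superpages is None else superpages
    return Policy.STRICT if enabled else Policy.SMALL_PAGES_ONLY


for is_hvm in (True, False):
    for setting in (None, True, False):
        kind = "HVM" if is_hvm else "PV"
        print(f"{kind:3} superpages={setting!s:5} -> {allocation_policy(is_hvm, setting).name}")
```

Keeping the HVM "off" case costs nothing and leaves room for the debugging use
case raised later in the thread.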

Jan

* Re: Issue with PV superpage handling
  2012-06-25 15:48   ` Jan Beulich
@ 2012-06-25 16:07     ` Dave McCracken
  2012-06-25 17:07       ` George Dunlap
  2012-06-26  6:52       ` Jan Beulich
  0 siblings, 2 replies; 7+ messages in thread
From: Dave McCracken @ 2012-06-25 16:07 UTC (permalink / raw)
  To: xen-devel, Jan Beulich, George Dunlap; +Cc: Ian Campbell

On Monday, June 25, 2012, Jan Beulich wrote:
> > My question is, what is the value of enforcing all-or-nothing for PV 
> > guests?  Is it the case that PV guests have to be entirely in either one 
> > mode or the other?
> 
> As I understand it, a PV guest's balloon driver must take this into
> account, so I do think the two modes have to be kept strictly separate.

I specifically need to be able to guarantee superpage-backed memory in PV
guests so that it can be mapped as superpages (hugepages in Linux).  I'm trying
to come up with some benefit for opportunistic superpages in PV guests, but
nothing comes to mind.
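
Why partial superpage backing is of little use to a PV guest: its page tables
hold machine frame numbers, so a single 2MB page-table entry is only valid if
the 512 pages behind it are machine-contiguous and start on a 2MB-aligned
machine frame. A sketch of that check over a hypothetical pfn-to-mfn list (the
function name and list representation are assumptions; requiring the
guest-physical range itself to be 2MB-aligned reflects how a Linux hugepage is
laid out):

```python
from typing import List

PAGES_PER_SUPERPAGE = 512  # 2 MiB / 4 KiB on x86


def can_map_as_hugepage(p2m: List[int], first_pfn: int) -> bool:
    """Return True if guest pfns [first_pfn, first_pfn + 512) could back one
    2 MiB mapping: the pfn range is 2 MiB-aligned, and the machine frames
    behind it are contiguous and start on a 2 MiB-aligned mfn.

    p2m: hypothetical list where p2m[pfn] is the mfn backing that pfn.
    """
    if first_pfn % PAGES_PER_SUPERPAGE != 0:
        return False
    if first_pfn + PAGES_PER_SUPERPAGE > len(p2m):
        return False
    base_mfn = p2m[first_pfn]
    if base_mfn % PAGES_PER_SUPERPAGE != 0:
        return False
    return all(p2m[first_pfn + i] == base_mfn + i
               for i in range(PAGES_PER_SUPERPAGE))


p2m = list(range(1024, 2048))          # pfn i -> mfn 1024 + i: contiguous, aligned
print(can_map_as_hugepage(p2m, 0))     # True
p2m[3], p2m[4] = p2m[4], p2m[3]        # break contiguity in the first 2 MiB
print(can_map_as_hugepage(p2m, 0))     # False
print(can_map_as_hugepage(p2m, 512))   # True
```

A domain populated entirely with superpages passes this check for every 2MB
range; after an opportunistic fallback some ranges fail it, and a guest that
relies on it always passing has no safe way to map those ranges as hugepages.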

> > I'm not particularly fussed about having a way to disable the 
> > opportunistic superpage allocation for HVM guests, and just turning that 
> > on all the time.  I only really used the flag because I saw it was being 
> > passed but wasn't being used; I didn't realize it was meant to have the 
> > "use superpages or abort" semantics.  My only non-negotiable is that we 
> > have a way to get opportunistic superpages for HVM guests.
> 
> Couldn't we have the setting be an override for the HVM
> allocation behavior (defaulting to enabled there), and have
> the originally intended meaning for PV (disabled by default)?

I like this idea.  It would be simple enough.

Is there any reason to allow disabling HVM superpage allocation?  HVM domain 
creation always allocates as many superpages as it can, then falls back to 
small pages for the rest.  Wouldn't it be reasonable to make restore always do 
this too?

Dave McCracken
Oracle Corp.

* Re: Issue with PV superpage handling
  2012-06-25 16:07     ` Dave McCracken
@ 2012-06-25 17:07       ` George Dunlap
  2012-06-26  6:52       ` Jan Beulich
  1 sibling, 0 replies; 7+ messages in thread
From: George Dunlap @ 2012-06-25 17:07 UTC (permalink / raw)
  To: Dave McCracken; +Cc: Ian Campbell, Jan Beulich, xen-devel

On Mon, Jun 25, 2012 at 5:07 PM, Dave McCracken <dcm@mccr.org> wrote:
>> Couldn't we have the setting be an override for the HVM
>> allocation behavior (defaulting to enabled there), and have
>> the originally intended meaning for PV (disabled by default)?
>
> I like this idea.  It would be simple enough.
>
> Is there any reason to allow disabling HVM superpage allocation?  HVM domain
> creation always allocates as many superpages as it can, then falls back to
> small pages for the rest.  Wouldn't it be reasonable to make restore always do
> this too?

At this point, probably not.  Every toolstack that I know of (xend,
xl, xapi) always sets it to '1' for HVM guests.

Is there any reason not to just have the "superpages" flag mean "try to
use superpages", and then have the allocation routine fail (rather than
fall back to small pages) if superpages && pv==true?  It seems like that
might be the simplest option.

Alternatively, we could change the argument to "pv_superpages" or
something, and ignore it for HVM guests (always trying to allocate
superpages if available).  But I'm not sure that really buys us
anything.  The only challenge would be helping people in the future
avoid the same mistake I made in interpreting what the
"superpages" flag means.

 -George

* Re: Issue with PV superpage handling
  2012-06-25 16:07     ` Dave McCracken
  2012-06-25 17:07       ` George Dunlap
@ 2012-06-26  6:52       ` Jan Beulich
  1 sibling, 0 replies; 7+ messages in thread
From: Jan Beulich @ 2012-06-26  6:52 UTC (permalink / raw)
  To: Dave McCracken; +Cc: George Dunlap, Ian Campbell, xen-devel

>>> On 25.06.12 at 18:07, Dave McCracken <dcm@mccr.org> wrote:
> Is there any reason to allow disabling HVM superpage allocation?

Debugging of certain code paths? Or discriminating against certain
(unimportant) VMs?

>  HVM domain 
> creation always allocates as many superpages as it can, then falls back to 
> small pages for the rest.  Wouldn't it be reasonable to make restore always 
> do this too?

Absolutely, in my opinion; not having done so from the beginning was
likely just an oversight (but that would need confirmation by
someone more familiar with that code than I am).

Jan

* Re: Issue with PV superpage handling
  2012-06-25 14:38 Issue with PV superpage handling Dave McCracken
  2012-06-25 15:08 ` George Dunlap
@ 2012-07-09  6:02 ` Juergen Gross
  1 sibling, 0 replies; 7+ messages in thread
From: Juergen Gross @ 2012-07-09  6:02 UTC (permalink / raw)
  To: Dave McCracken; +Cc: George Dunlap, Ian Campbell, Xen Developers

On 25.06.2012 16:38, Dave McCracken wrote:
> I have a patch that implements the original semantics of the superpage flag
> while preserving the batch allocation behavior.  I can remove the competing
> code and submit mine, but I have a question.  What value is there in
> implementing opportunistic allocation of superpages for a PV (or an HVM)
> domain in restore?  It clearly can't be based on the superpages flag.
> Opportunistic superpage allocation is already the default behavior for HVM
> domain creation.  Should it also be a default on HVM restore?  What about for
> PV domains?  Is there any real benefit?

There is a real benefit.

We are seeing severe performance penalties after migrating an HVM domain.
Performance drops by 10% or more! Our OS (BS2000) tries to use
superpages where possible. Before live migration I can see that all of the
domain's memory is allocated in chunks of at least 2MB; after the
migration, not a single superpage is left.

With EPT, this not only makes each TLB miss more expensive but also causes
many more TLB misses, since no 2MB TLB entries are possible at all!
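
The second effect is about TLB reach. With EPT, a cached translation can cover
at most the smaller of the guest's page size and the EPT (p2m) page size, so
once the p2m is built from 4KB pages, even 2MB guest mappings occupy 4KB TLB
entries. A back-of-the-envelope sketch; the entry count is an arbitrary
illustrative value, and real CPUs keep separate TLB arrays for different page
sizes.

```python
KIB = 1024
MIB = 1024 * KIB

TLB_ENTRIES = 32  # illustrative only; real TLB sizes vary by CPU and page size


def effective_page_size(guest_page: int, ept_page: int) -> int:
    """Largest region one TLB entry can cover for a guest mapping under EPT."""
    return min(guest_page, ept_page)


def tlb_reach(entries: int, page_size: int) -> int:
    """Amount of memory the TLB can map without missing."""
    return entries * page_size


before = tlb_reach(TLB_ENTRIES, effective_page_size(2 * MIB, 2 * MIB))
after = tlb_reach(TLB_ENTRIES, effective_page_size(2 * MIB, 4 * KIB))
print(f"before migration: {before // MIB} MiB, after: {after // KIB} KiB "
      f"(reach shrinks {before // after}x)")
```

Whatever the real entry count, the same number of entries covers 512 times
less memory once every translation is limited to 4KB.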


Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
PDG ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html
