* RFC: vNUMA project
@ 2014-11-11 17:36 Wei Liu
  2014-11-11 18:03 ` David Vrabel
  2014-11-19 11:18 ` George Dunlap
  0 siblings, 2 replies; 13+ messages in thread
From: Wei Liu @ 2014-11-11 17:36 UTC (permalink / raw)
  To: xen-devel; +Cc: Dario Faggioli, wei.liu2, David Vrabel, Jan Beulich

# What's already implemented?

PV vNUMA support in libxl/xl and Linux kernel.

# What's planned but not yet implemented?

NUMA-aware ballooning, HVM vNUMA

# How is vNUMA used in toolstack and Xen?

At the libxl level, the user (xl or another higher-level toolstack)
can specify the number of vnodes, the size of each vnode, the vnode to
pnode mapping, the vcpu to vnode mapping, and the distances for local
and remote nodes.
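
For illustration, such a specification might look like this in an xl
guest config (the key names and syntax here are only a sketch, not a
committed interface):

```
vnuma = [
    [ "pnode=0", "size=2048", "vcpus=0-1", "vdistances=10,20" ],
    [ "pnode=1", "size=2048", "vcpus=2-3", "vdistances=20,10" ],
]
```

This would describe two 2048MiB vnodes, each mapped to a pnode, with
local distance 10 and remote distance 20.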

Libxl will then generate one or more vmemranges for each vnode. More
than one vmemrange may be needed to accommodate memory holes. One
example is a PV guest with e820_host=1 in its config file and more
than 4G of RAM allocated to it.

The generated information will also be stored in Xen, where it is
used in two scenarios: retrieval by the PV guest, and NUMA-aware
ballooning.

# How is vNUMA used in guest?

When a PV guest boots, it issues a hypercall to retrieve its vNUMA
information: the number of vnodes, the size of each vnode, the vcpu to
vnode mapping, and finally an array of vmemranges. The guest can then
massage these pieces of information for its own use.

An HVM guest will still use ACPI to initialise NUMA; the ACPI tables
are arranged by hvmloader.

# NUMA-aware ballooning

It's agreed that NUMA-aware ballooning should be achieved solely in
the hypervisor. Everything should happen under the hood, without the
guest knowing the vnode to pnode mapping.

As far as I can tell, existing guests (Linux and FreeBSD) use
XENMEM_populate_physmap to balloon up. There's also a hypercall
called XENMEM_increase_reservation, but it's not used by Linux or
FreeBSD.

I can think of two options to implement NUMA-aware ballooning:

1. Modify XENMEM_populate_physmap to take into account vNUMA hint
   when it tries to allocate a page for guest.
2. Introduce a new hypercall dedicated to vNUMA ballooning. Its
   functionality is similar to XENMEM_populate_physmap but it's only
   used in ballooning so that we don't break XENMEM_populate_physmap.

Option #1 requires less modification to the guest, because the guest
won't need to switch to a new hypercall. It's unclear at this point
what Xen should do if a guest asks to populate a gpfn that doesn't
belong to any vnode. Should it be permissive or strict?

If Xen is strict (say, it refuses to populate a gpfn that doesn't
belong to a vnode), it imposes difficulty in implementing HVM vNUMA:
hvmloader may try to populate firmware pages which are in a memory
hole, and a memory hole doesn't belong to any vnode.

With option #2, the question is whether Xen should be permissive or
strict towards a guest that uses vNUMA but doesn't use the new
hypercall to balloon up.

# HVM vNUMA

HVM vNUMA is implemented as follows:

1. Libxl generates vNUMA information and passes it to hvmloader.
2. Hvmloader builds the SRAT table.

Note that hvmloader is capable of relocating memory. This means
toolstack and guest can have different ideas of the memory layout.

This makes NUMA-aware ballooning for HVM guests tricky to implement,
because toolstack to hvmloader communication is one-way, and the
hypervisor shares the toolstack's view of the guest memory layout.
Hvmloader should not be allowed to adjust the memory layout; otherwise
Xen will use the wrong hinting information and the end result will
certainly be wrong.

For basic HVM vNUMA support, we should disallow memory relocation and
discourage ballooning when vNUMA is enabled in an HVM guest. We also
need to disable populate-on-demand, as the PoD pool in Xen is not
NUMA-aware.

We can then gradually lift these limits once we decide what to do
about them.

# Planning

There are many moving parts that don't fit well together. I think a
valid strategy is to impose some limitations on vNUMA and other
features, either through toolstack restrictions or documentation, and
then lift these limitations in stages.

First stage:

           Basic     PoD   Ballooning  Mem_relocation
PV/PVH       Y       na       X         na
HVM          Y       X        X         X

Implement basic functionality of vNUMA. That is, to boot a guest
(PV/HVM) with vNUMA support.

Second stage:

           Basic     PoD   Ballooning  Mem_relocation
PV/PVH       Y       na       Y         na
HVM          Y       X        Y         X

Implement NUMA-aware ballooning.

Third stage:

           Basic     PoD   Ballooning  Mem_relocation
PV/PVH       Y       na       Y         na
HVM          Y       Y        Y         X

NUMA-aware PoD?

Fourth stage:

           Basic     PoD   Ballooning  Mem_relocation
PV/PVH       Y       na       Y         na
HVM          Y       Y        Y         Y

Implement a bi-directional communication mechanism so that we can
allow memory relocation in hvmloader?

The third stage onward is less concrete at this point.

Thoughts?

Wei.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: RFC: vNUMA project
  2014-11-11 17:36 RFC: vNUMA project Wei Liu
@ 2014-11-11 18:03 ` David Vrabel
  2014-11-12  9:35   ` Jan Beulich
  2014-11-12 12:14   ` Wei Liu
  2014-11-19 11:18 ` George Dunlap
  1 sibling, 2 replies; 13+ messages in thread
From: David Vrabel @ 2014-11-11 18:03 UTC (permalink / raw)
  To: Wei Liu, xen-devel; +Cc: Dario Faggioli, David Vrabel, Jan Beulich

On 11/11/14 17:36, Wei Liu wrote:
> # What's already implemented?
> 
> PV vNUMA support in libxl/xl and Linux kernel.

Linux doesn't have vnuma yet, although the last set of patches I saw
looked fine and were waiting for acks from x86 maintainers I think.

> # NUMA-aware ballooning
> 
> It's agreed that NUMA-aware ballooning should be achieved solely in
> hypervisor. Everything should happen under the hood without guest
> knowing vnode to pnode mapping.
> 
> As far as I can tell, existing guests (Linux and FreeBSD) use
> XENMEM_populate_physmap to balloon up. There's a hypercall
> called XENMEM_increase_reservation but it's not used
> by Linux and FreeBSD.
> 
> I can think of two options to implement NUMA-aware ballooning:
> 
> 1. Modify XENMEM_populate_physmap to take into account vNUMA hint
>    when it tries to allocate a page for guest.
[...]
> Option #1 requires less modification to guest, because guest won't
> need to switch to new hypercall. It's unclear at this point if a guest
> asks to populate a gpfn that doesn't belong to any vnode, what Xen
> should do about it. Should it be permissive or strict? 

There are XENMEMF flags to request exact node or not  -- leave it up to
the balloon driver.  The Linux balloon driver could try exact on all
nodes before falling back to permissive or just always try inexact.

Perhaps a XENMEMF_vnode bit to indicate the node is virtual?

> 
> # HVM vNUMA
> 
> HVM vNUMA is implemented as followed:
> 
> 1. Libxl generates vNUMA information and passes it to hvmloader.
> 2. Hvmloader build SRAT table.
> 
> Note that hvmloader is capable of relocating memory. This means
> toolstack and guest can have different ideas of the memory layout.

Why can't hvmloader update the vnuma tables after it has relocated memory?

David


* Re: RFC: vNUMA project
  2014-11-11 18:03 ` David Vrabel
@ 2014-11-12  9:35   ` Jan Beulich
  2014-11-12 13:45     ` Wei Liu
  2014-11-12 12:14   ` Wei Liu
  1 sibling, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2014-11-12  9:35 UTC (permalink / raw)
  To: David Vrabel, Wei Liu, xen-devel; +Cc: Dario Faggioli

>>> On 11.11.14 at 19:03, <david.vrabel@citrix.com> wrote:
> On 11/11/14 17:36, Wei Liu wrote:
>> Option #1 requires less modification to guest, because guest won't
>> need to switch to new hypercall. It's unclear at this point if a guest
>> asks to populate a gpfn that doesn't belong to any vnode, what Xen
>> should do about it. Should it be permissive or strict? 
> 
> There are XENMEMF flags to request exact node or not  -- leave it up to
> the balloon driver.  The Linux balloon driver could try exact on all
> nodes before falling back to permissive or just always try inexact.
> 
> Perhaps a XENMEMF_vnode bit to indicate the node is virtual?

Yes. The only bad thing here is that we don't currently check in the
hypervisor that unknown bits are zero, i.e. code using the new flag
will need to have a separate means to find out whether the bit is
supported. Not a big deal of course.

Jan


* Re: RFC: vNUMA project
  2014-11-11 18:03 ` David Vrabel
  2014-11-12  9:35   ` Jan Beulich
@ 2014-11-12 12:14   ` Wei Liu
  2014-11-12 14:23     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 13+ messages in thread
From: Wei Liu @ 2014-11-12 12:14 UTC (permalink / raw)
  To: David Vrabel; +Cc: Dario Faggioli, Wei Liu, Jan Beulich, xen-devel

On Tue, Nov 11, 2014 at 06:03:22PM +0000, David Vrabel wrote:
> On 11/11/14 17:36, Wei Liu wrote:
> > # What's already implemented?
> > 
> > PV vNUMA support in libxl/xl and Linux kernel.
> 
> Linux doesn't have vnuma yet, although the last set of patches I saw
> looked fine and were waiting for acks from x86 maintainers I think.
> 

What I meant was I have those implemented but not yet posted. ;-)

> > # NUMA-aware ballooning
> > 
> > It's agreed that NUMA-aware ballooning should be achieved solely in
> > hypervisor. Everything should happen under the hood without guest
> > knowing vnode to pnode mapping.
> > 
> > As far as I can tell, existing guests (Linux and FreeBSD) use
> > XENMEM_populate_physmap to balloon up. There's a hypercall
> > called XENMEM_increase_reservation but it's not used
> > by Linux and FreeBSD.
> > 
> > I can think of two options to implement NUMA-aware ballooning:
> > 
> > 1. Modify XENMEM_populate_physmap to take into account vNUMA hint
> >    when it tries to allocate a page for guest.
> [...]
> > Option #1 requires less modification to guest, because guest won't
> > need to switch to new hypercall. It's unclear at this point if a guest
> > asks to populate a gpfn that doesn't belong to any vnode, what Xen
> > should do about it. Should it be permissive or strict? 
> 
> There are XENMEMF flags to request exact node or not  -- leave it up to
> the balloon driver.  The Linux balloon driver could try exact on all
> nodes before falling back to permissive or just always try inexact.
> 
> Perhaps a XENMEMF_vnode bit to indicate the node is virtual?
> 

Good idea. It should be easy to make it work.

> > 
> > # HVM vNUMA
> > 
> > HVM vNUMA is implemented as followed:
> > 
> > 1. Libxl generates vNUMA information and passes it to hvmloader.
> > 2. Hvmloader build SRAT table.
> > 
> > Note that hvmloader is capable of relocating memory. This means
> > toolstack and guest can have different ideas of the memory layout.
> 
> Why can't hvmloader update the vnuma tables after it has relocated memory?
> 

Because setvnuma is a domctl which cannot be issued by hvmloader.

Wei.

> David


* Re: RFC: vNUMA project
  2014-11-12  9:35   ` Jan Beulich
@ 2014-11-12 13:45     ` Wei Liu
  2014-11-12 14:13       ` Jan Beulich
  0 siblings, 1 reply; 13+ messages in thread
From: Wei Liu @ 2014-11-12 13:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Dario Faggioli, Wei Liu, David Vrabel, xen-devel

On Wed, Nov 12, 2014 at 09:35:01AM +0000, Jan Beulich wrote:
> >>> On 11.11.14 at 19:03, <david.vrabel@citrix.com> wrote:
> > On 11/11/14 17:36, Wei Liu wrote:
> >> Option #1 requires less modification to guest, because guest won't
> >> need to switch to new hypercall. It's unclear at this point if a guest
> >> asks to populate a gpfn that doesn't belong to any vnode, what Xen
> >> should do about it. Should it be permissive or strict? 
> > 
> > There are XENMEMF flags to request exact node or not  -- leave it up to
> > the balloon driver.  The Linux balloon driver could try exact on all
> > nodes before falling back to permissive or just always try inexact.
> > 
> > Perhaps a XENMEMF_vnode bit to indicate the node is virtual?
> 
> Yes. The only bad thing here is that we don't currently check in the
> hypervisor that unknown bits are zero, i.e. code using the new flag
> will need to have a separate means to find out whether the bit is
> supported. Not a big deal of course.
> 

If this new bit is set and the domain has vNUMA, then it's valid
(supported); otherwise it's not.

To avoid breaking existing guests, we can fall back to non-vNUMA
hinted allocation when the new bit is set and vNUMA is not available.

Wei.

> Jan


* Re: RFC: vNUMA project
  2014-11-12 13:45     ` Wei Liu
@ 2014-11-12 14:13       ` Jan Beulich
  2014-11-12 14:27         ` Wei Liu
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2014-11-12 14:13 UTC (permalink / raw)
  To: Wei Liu; +Cc: Dario Faggioli, David Vrabel, xen-devel

>>> On 12.11.14 at 14:45, <wei.liu2@citrix.com> wrote:
> On Wed, Nov 12, 2014 at 09:35:01AM +0000, Jan Beulich wrote:
>> >>> On 11.11.14 at 19:03, <david.vrabel@citrix.com> wrote:
>> > On 11/11/14 17:36, Wei Liu wrote:
>> >> Option #1 requires less modification to guest, because guest won't
>> >> need to switch to new hypercall. It's unclear at this point if a guest
>> >> asks to populate a gpfn that doesn't belong to any vnode, what Xen
>> >> should do about it. Should it be permissive or strict? 
>> > 
>> > There are XENMEMF flags to request exact node or not  -- leave it up to
>> > the balloon driver.  The Linux balloon driver could try exact on all
>> > nodes before falling back to permissive or just always try inexact.
>> > 
>> > Perhaps a XENMEMF_vnode bit to indicate the node is virtual?
>> 
>> Yes. The only bad thing here is that we don't currently check in the
>> hypervisor that unknown bits are zero, i.e. code using the new flag
>> will need to have a separate means to find out whether the bit is
>> supported. Not a big deal of course.
>> 
> 
> If this new bit is set and domain has vnuma, then it's valid
> (supported); otherwise it's not.
> 
> To not break existing guests, we can fall back to non-vnuma hinted
> allocation when the new bit is set and vnuma is not available.

While this is valid, none of this was my point - I was talking about a
new guest running on an older hypervisor.

Jan


* Re: RFC: vNUMA project
  2014-11-12 12:14   ` Wei Liu
@ 2014-11-12 14:23     ` Konrad Rzeszutek Wilk
  2014-11-12 17:10       ` Wei Liu
  0 siblings, 1 reply; 13+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-11-12 14:23 UTC (permalink / raw)
  To: Wei Liu; +Cc: Dario Faggioli, David Vrabel, Jan Beulich, xen-devel

On Wed, Nov 12, 2014 at 12:14:48PM +0000, Wei Liu wrote:
> On Tue, Nov 11, 2014 at 06:03:22PM +0000, David Vrabel wrote:
> > On 11/11/14 17:36, Wei Liu wrote:
> > > # What's already implemented?
> > > 
> > > PV vNUMA support in libxl/xl and Linux kernel.
> > 
> > Linux doesn't have vnuma yet, although the last set of patches I saw
> > looked fine and were waiting for acks from x86 maintainers I think.
> > 
> 
> What I meant was I have those implemented but not yet posted. ;-)

Are you refering to the patches that Elena posted and that
were Acked? Or is this another set that does things differently?


* Re: RFC: vNUMA project
  2014-11-12 14:13       ` Jan Beulich
@ 2014-11-12 14:27         ` Wei Liu
  2014-11-12 14:29           ` David Vrabel
  0 siblings, 1 reply; 13+ messages in thread
From: Wei Liu @ 2014-11-12 14:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Dario Faggioli, Wei Liu, David Vrabel, xen-devel

On Wed, Nov 12, 2014 at 02:13:09PM +0000, Jan Beulich wrote:
> >>> On 12.11.14 at 14:45, <wei.liu2@citrix.com> wrote:
> > On Wed, Nov 12, 2014 at 09:35:01AM +0000, Jan Beulich wrote:
> >> >>> On 11.11.14 at 19:03, <david.vrabel@citrix.com> wrote:
> >> > On 11/11/14 17:36, Wei Liu wrote:
> >> >> Option #1 requires less modification to guest, because guest won't
> >> >> need to switch to new hypercall. It's unclear at this point if a guest
> >> >> asks to populate a gpfn that doesn't belong to any vnode, what Xen
> >> >> should do about it. Should it be permissive or strict? 
> >> > 
> >> > There are XENMEMF flags to request exact node or not  -- leave it up to
> >> > the balloon driver.  The Linux balloon driver could try exact on all
> >> > nodes before falling back to permissive or just always try inexact.
> >> > 
> >> > Perhaps a XENMEMF_vnode bit to indicate the node is virtual?
> >> 
> >> Yes. The only bad thing here is that we don't currently check in the
> >> hypervisor that unknown bits are zero, i.e. code using the new flag
> >> will need to have a separate means to find out whether the bit is
> >> supported. Not a big deal of course.
> >> 
> > 
> > If this new bit is set and domain has vnuma, then it's valid
> > (supported); otherwise it's not.
> > 
> > To not break existing guests, we can fall back to non-vnuma hinted
> > allocation when the new bit is set and vnuma is not available.
> 
> While this is valid, none of this was my point - I was talking about a
> new guest running on an older hypervisor.
> 

That would not cause breakage. Even if the guest sets this new bit,
it's ignored by Xen. The guest can still balloon up, though the end
result is sub-optimal.

The question is, do we want to avoid such a sub-optimal result?

Wei.

> Jan


* Re: RFC: vNUMA project
  2014-11-12 14:27         ` Wei Liu
@ 2014-11-12 14:29           ` David Vrabel
  2014-11-12 14:40             ` Wei Liu
  0 siblings, 1 reply; 13+ messages in thread
From: David Vrabel @ 2014-11-12 14:29 UTC (permalink / raw)
  To: Wei Liu, Jan Beulich; +Cc: Dario Faggioli, xen-devel

On 12/11/14 14:27, Wei Liu wrote:
> On Wed, Nov 12, 2014 at 02:13:09PM +0000, Jan Beulich wrote:
>>>>> On 12.11.14 at 14:45, <wei.liu2@citrix.com> wrote:
>>> On Wed, Nov 12, 2014 at 09:35:01AM +0000, Jan Beulich wrote:
>>>>>>> On 11.11.14 at 19:03, <david.vrabel@citrix.com> wrote:
>>>>> On 11/11/14 17:36, Wei Liu wrote:
>>>>>> Option #1 requires less modification to guest, because guest won't
>>>>>> need to switch to new hypercall. It's unclear at this point if a guest
>>>>>> asks to populate a gpfn that doesn't belong to any vnode, what Xen
>>>>>> should do about it. Should it be permissive or strict? 
>>>>>
>>>>> There are XENMEMF flags to request exact node or not  -- leave it up to
>>>>> the balloon driver.  The Linux balloon driver could try exact on all
>>>>> nodes before falling back to permissive or just always try inexact.
>>>>>
>>>>> Perhaps a XENMEMF_vnode bit to indicate the node is virtual?
>>>>
>>>> Yes. The only bad thing here is that we don't currently check in the
>>>> hypervisor that unknown bits are zero, i.e. code using the new flag
>>>> will need to have a separate means to find out whether the bit is
>>>> supported. Not a big deal of course.
>>>>
>>>
>>> If this new bit is set and domain has vnuma, then it's valid
>>> (supported); otherwise it's not.
>>>
>>> To not break existing guests, we can fall back to non-vnuma hinted
>>> allocation when the new bit is set and vnuma is not available.
>>
>> While this is valid, none of this was my point - I was talking about a
>> new guest running on an older hypervisor.
>>
> 
> That would not cause breakage. Even if the guest sets this new bit it's
> ignored by Xen. Guest can still balloon up, though the end result is
> sub-optimal.

No. Because it will get memory allocated from the specified /physical/
node which would be quite wrong.

David


* Re: RFC: vNUMA project
  2014-11-12 14:29           ` David Vrabel
@ 2014-11-12 14:40             ` Wei Liu
  2014-11-12 14:54               ` Jan Beulich
  0 siblings, 1 reply; 13+ messages in thread
From: Wei Liu @ 2014-11-12 14:40 UTC (permalink / raw)
  To: David Vrabel; +Cc: Dario Faggioli, Wei Liu, Jan Beulich, xen-devel

On Wed, Nov 12, 2014 at 02:29:56PM +0000, David Vrabel wrote:
> On 12/11/14 14:27, Wei Liu wrote:
> > On Wed, Nov 12, 2014 at 02:13:09PM +0000, Jan Beulich wrote:
> >>>>> On 12.11.14 at 14:45, <wei.liu2@citrix.com> wrote:
> >>> On Wed, Nov 12, 2014 at 09:35:01AM +0000, Jan Beulich wrote:
> >>>>>>> On 11.11.14 at 19:03, <david.vrabel@citrix.com> wrote:
> >>>>> On 11/11/14 17:36, Wei Liu wrote:
> >>>>>> Option #1 requires less modification to guest, because guest won't
> >>>>>> need to switch to new hypercall. It's unclear at this point if a guest
> >>>>>> asks to populate a gpfn that doesn't belong to any vnode, what Xen
> >>>>>> should do about it. Should it be permissive or strict? 
> >>>>>
> >>>>> There are XENMEMF flags to request exact node or not  -- leave it up to
> >>>>> the balloon driver.  The Linux balloon driver could try exact on all
> >>>>> nodes before falling back to permissive or just always try inexact.
> >>>>>
> >>>>> Perhaps a XENMEMF_vnode bit to indicate the node is virtual?
> >>>>
> >>>> Yes. The only bad thing here is that we don't currently check in the
> >>>> hypervisor that unknown bits are zero, i.e. code using the new flag
> >>>> will need to have a separate means to find out whether the bit is
> >>>> supported. Not a big deal of course.
> >>>>
> >>>
> >>> If this new bit is set and domain has vnuma, then it's valid
> >>> (supported); otherwise it's not.
> >>>
> >>> To not break existing guests, we can fall back to non-vnuma hinted
> >>> allocation when the new bit is set and vnuma is not available.
> >>
> >> While this is valid, none of this was my point - I was talking about a
> >> new guest running on an older hypervisor.
> >>
> > 
> > That would not cause breakage. Even if the guest sets this new bit it's
> > ignored by Xen. Guest can still balloon up, though the end result is
> > sub-optimal.
> 
> No. Because it will get memory allocated from the specified /physical/
> node which would be quite wrong.
> 

Fair enough.

So what's the "usual technique" in Linux to check whether a specific
Xen feature is present?

Jan, is it suitable to use a XENFEAT_* bit for this?

Wei.

> David


* Re: RFC: vNUMA project
  2014-11-12 14:40             ` Wei Liu
@ 2014-11-12 14:54               ` Jan Beulich
  0 siblings, 0 replies; 13+ messages in thread
From: Jan Beulich @ 2014-11-12 14:54 UTC (permalink / raw)
  To: Wei Liu; +Cc: Dario Faggioli, David Vrabel, xen-devel

>>> On 12.11.14 at 15:40, <wei.liu2@citrix.com> wrote:
> So what's the "usual technique" in Linux to make sure if a specific
> Xen feature is present?
> 
> Jan, is it suitable to use a XENFEAT_* bit for this?

Yes, that would be the canonical way.

Jan


* Re: RFC: vNUMA project
  2014-11-12 14:23     ` Konrad Rzeszutek Wilk
@ 2014-11-12 17:10       ` Wei Liu
  0 siblings, 0 replies; 13+ messages in thread
From: Wei Liu @ 2014-11-12 17:10 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Dario Faggioli, Wei Liu, David Vrabel, Jan Beulich, xen-devel

On Wed, Nov 12, 2014 at 09:23:58AM -0500, Konrad Rzeszutek Wilk wrote:
> On Wed, Nov 12, 2014 at 12:14:48PM +0000, Wei Liu wrote:
> > On Tue, Nov 11, 2014 at 06:03:22PM +0000, David Vrabel wrote:
> > > On 11/11/14 17:36, Wei Liu wrote:
> > > > # What's already implemented?
> > > > 
> > > > PV vNUMA support in libxl/xl and Linux kernel.
> > > 
> > > Linux doesn't have vnuma yet, although the last set of patches I saw
> > > looked fine and were waiting for acks from x86 maintainers I think.
> > > 
> > 
> > What I meant was I have those implemented but not yet posted. ;-)
> 
> Are you refering to the patches that Elena posted and that
> were Acked? Or is this another set that does things differently?

The interface changed at the last minute before being applied to Xen,
so I rewrote both the toolstack and kernel patches. They are still
patches for PV vNUMA but do things differently. Most changes are in
the toolstack part; the kernel patch mainly adapts to the new
assumptions I make.

Wei.


* Re: RFC: vNUMA project
  2014-11-11 17:36 RFC: vNUMA project Wei Liu
  2014-11-11 18:03 ` David Vrabel
@ 2014-11-19 11:18 ` George Dunlap
  1 sibling, 0 replies; 13+ messages in thread
From: George Dunlap @ 2014-11-19 11:18 UTC (permalink / raw)
  To: Wei Liu; +Cc: Dario Faggioli, David Vrabel, Jan Beulich, xen-devel

On Tue, Nov 11, 2014 at 5:36 PM, Wei Liu <wei.liu2@citrix.com> wrote:
> Third stage:
>
>            Basic     PoD   Ballooning  Mem_relocation
> PV/PVH       Y       na       Y         na
> HVM          Y       Y        Y         X
>
> NUMA-aware PoD?

Hmm, that will certainly be interesting. :-)

The point of PoD is to allocate a chunk of memory at guest creation
time and have the VM balloon down to fit that amount of memory.

If we assume that vnodes correspond to some set of pnodes, then the
initial allocation will (ideally) have to come from *some* subset of
those pnodes; but depending on the situation, it may be any
combination.  So for example, a guest with 2 vnodes of 2GiB each
might end up with 1G on each pnode, or 2G on one pnode and none on
the other.

In this case, the only way to get an ideal memory layout is to
communicate back to the balloon driver how much memory to free on each
virtual node.  If the split is 1G / 1G, then the balloon driver will
need to allocate 1G for each vnode.  If the split was 0.5G / 1.5G,
then it would have to allocate 1.5G / 0.5G, &c.

 -George

