* vNUMA for PV guest: kernel and toolstack interaction regarding e820_host=1
From: Wei Liu @ 2015-02-06 19:32 UTC
  To: Konrad Rzeszutek Wilk, David Vrabel, Boris Ostrovsky; +Cc: wei.liu2, xen-devel

Hi all

I've encountered a problem that I would like some advice on. It's
PV-specific because the P2M manipulation is only required for PV.

Current memory allocation scheme:

1. Libxc populates a contiguous chunk of pages and fills in the initial
   P2M. The holes in the e820 map are in fact filled with pages.

2. The guest kernel reads the e820 map from Xen and, if there are holes,
   remaps the pages in them and updates the P2M as it sees fit. (This
   normally happens when e820_host=1 is set.)

This is not ideal for PV vNUMA, because the remapped pages may
end up in the wrong vnode.

What I have in mind is:

1. Libxc populates pages but skips e820 holes, so the initial P2M is
   the final P2M the guest sees.
2. The guest kernel skips remapping, but Linux still needs to set up
   1-1 mappings for the holes.

To avoid misconfiguration, we would need to introduce a new feature
flag to indicate that the guest is able to skip remapping. Libxc will
check that flag when building the domain.

Does the above scheme make sense?

Wei.

* Re: vNUMA for PV guest: kernel and toolstack interaction regarding e820_host=1
From: David Vrabel @ 2015-02-06 19:42 UTC
  To: Wei Liu, Konrad Rzeszutek Wilk, David Vrabel, Boris Ostrovsky; +Cc: xen-devel

On 06/02/15 19:32, Wei Liu wrote:
> Hi all
> 
> I've encountered a problem that I would like some advice on. It's
> PV-specific because the P2M manipulation is only required for PV.
> 
> Current memory allocation scheme:
> 
> 1. Libxc populates a contiguous chunk of pages and fills in the initial
>    P2M. The holes in the e820 map are in fact filled with pages.
> 
> 2. The guest kernel reads the e820 map from Xen and, if there are holes,
>    remaps the pages in them and updates the P2M as it sees fit. (This
>    normally happens when e820_host=1 is set.)
> 
> This is not ideal for PV vNUMA, because the remapped pages may
> end up in the wrong vnode.
> 
> What I have in mind is:
> 
> 1. Libxc populates pages but skips e820 holes, so the initial P2M is
>    the final P2M the guest sees.
> 2. The guest kernel skips remapping, but Linux still needs to set up
>    1-1 mappings for the holes.
> 
> To avoid misconfiguration, we would need to introduce a new feature
> flag to indicate that the guest is able to skip remapping. Libxc will
> check that flag when building the domain.
> 
> Does the above scheme make sense?

I'm really not keen on any additional complexity in PV guest memory setup.
Particularly as I don't see a long-term future for x86 PV guests.  I
also don't think we should bake into the Xen ABI the behaviour of one
particular guest.

Consider how you can fix this purely in the guest.

David

* Re: vNUMA for PV guest: kernel and toolstack interaction regarding e820_host=1
From: Wei Liu @ 2015-02-06 20:26 UTC
  To: David Vrabel; +Cc: Boris Ostrovsky, xen-devel, Wei Liu

On Fri, Feb 06, 2015 at 07:42:15PM +0000, David Vrabel wrote:
> On 06/02/15 19:32, Wei Liu wrote:
> > Hi all
> > 
> > I've encountered a problem that I would like some advice on. It's
> > PV-specific because the P2M manipulation is only required for PV.
> > 
> > Current memory allocation scheme:
> > 
> > 1. Libxc populates a contiguous chunk of pages and fills in the initial
> >    P2M. The holes in the e820 map are in fact filled with pages.
> > 
> > 2. The guest kernel reads the e820 map from Xen and, if there are holes,
> >    remaps the pages in them and updates the P2M as it sees fit. (This
> >    normally happens when e820_host=1 is set.)
> > 
> > This is not ideal for PV vNUMA, because the remapped pages may
> > end up in the wrong vnode.
> > 
> > What I have in mind is:
> > 
> > 1. Libxc populates pages but skips e820 holes, so the initial P2M is
> >    the final P2M the guest sees.
> > 2. The guest kernel skips remapping, but Linux still needs to set up
> >    1-1 mappings for the holes.
> > 
> > To avoid misconfiguration, we would need to introduce a new feature
> > flag to indicate that the guest is able to skip remapping. Libxc will
> > check that flag when building the domain.
> > 
> > Does the above scheme make sense?
> 
> I'm really not keen on any additional complexity in PV guest memory setup.

I agree. I would like to avoid as much complexity as possible. That's
why I'm asking before implementing anything on the guest side.

FWIW the toolstack side already makes sense to me (sans the new feature
flag). It's the Linux side that I'm not sure what to do about.

> Particularly as I don't see a long-term future for x86 PV guests.  I
> also don't think we should bake into the Xen ABI the behaviour of one
> particular guest.
> 
> Consider how you can fix this purely in the guest.
> 

Yeah, I guess I can make the memory movement inside Linux more sensible,
that is, take vNUMA information into consideration. There might be other
complex interactions; I will need to get my hands dirty first.

So my conclusion is that I should proceed with my toolstack-side patches
first and fix Linux later.

Wei.

> David
