From: Dario Faggioli <dario.faggioli@citrix.com>
To: Wei Liu <wei.liu2@citrix.com>
Cc: "JBeulich@suse.com" <JBeulich@suse.com>,
	Andrew Cooper <Andrew.Cooper3@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	"ufimtseva@gmail.com" <ufimtseva@gmail.com>,
	Ian Jackson <Ian.Jackson@citrix.com>,
	Ian Campbell <Ian.Campbell@citrix.com>
Subject: Re: [PATCH v5 06/24] libxl: introduce vNUMA types
Date: Mon, 16 Feb 2015 16:51:43 +0000	[thread overview]
Message-ID: <1424105501.2591.53.camel@citrix.com> (raw)
In-Reply-To: <20150216161155.GF20572@zion.uk.xensource.com>

On Mon, 2015-02-16 at 16:11 +0000, Wei Liu wrote:
> On Mon, Feb 16, 2015 at 03:56:21PM +0000, Dario Faggioli wrote:
> > On Mon, 2015-02-16 at 15:17 +0000, Wei Liu wrote:

> > > And there is no way to
> > > specify priority among the group of nodes you specify with a single
> > > bitmap.
> > > 
> > Why do we need such a thing as a 'priority'? What I'm talking about is
> > making it possible, for each vnode, to specify the vnode-to-pnode mapping
> > as a bitmap of pnodes. What we'd do, in the presence of a bitmap, would
> > be to allocate the memory by striping it across _all_ the pnodes present
> > in the bitmap.
> > 
> 
> Should we enforce memory being equally striped across all nodes? If so,
> this should be stated explicitly in the comment of the interface.
>
I don't think we should enforce anything... I was rather describing
what happens *right* *now* in that scenario, whether it is documented or not.

> I can't see
> that in your original description. I asked about "priority" because I
> interpreted it as something else (which is one of many possible ways to
> interpret it, I think).
> 
So, if you're saying that, should we use a bitmap, we should write down
somewhere how libxl would use it, I certainly agree. I'm not sure what
level of detail we should go into at that point. As a user, I think I'd
be fine with finding it written that "the memory of the vnode will be
allocated out of the pnodes specified in the bitmap", without much
further detail, especially considering the use case for the feature.

> If it's up to libxl to make a dynamic choice, we should also say that. But
> this is not very useful to the user, because libxl's algorithm can change,
> can't it? How are users expected to know that across versions of Xen?
> 
Why would they need to? This would enable a bit more flexibility, if one
wants it, or slightly less degraded performance in some specific
situations, and all this pretty much independently of the algorithm used
inside libxl, I think.

As I said, if there is only 1GB free on each pnode, the user would be
allowed to specify a set of pnodes for each vnode, instead of not being
able to use vNUMA at all, no matter how libxl (or whoever else) actually
splits the memory in this, a previous, or a future version of Xen...
This is the scenario I'm talking about, and in such a scenario, knowing
how the split happens does not really help much; it is the
_possibility_ of splitting that helps...

> > If we allow the user (or the automatic placement algorithm) to specify a
> > bitmap of pnodes for each vnode, he could put, say, vnode #1 on pnodes #0
> > and #2, which maybe are really close (in terms of NUMA distance) to
> > each other, and vnode #2 on pnodes #5 and #6 (close to each other too).
> > This would give worse performance than having each vnode on just one
> > pnode, but, most likely, better performance than the scenario described
> > right above.
> > 
> 
> I get what you mean. So by writing the above paragraphs, you sort of
> confirm that there still are too many implications in the algorithms,
> right? A user cannot just tell from the interface what the behaviour is
> going to be.  
>
A user can tell that, if they want a 2GB vnode, and no single pnode has
2GB free, but the sum of the free memory on pnodes #4 and #6 is >= 2GB,
they can still use vNUMA, by paying the price (small or high, depending
on other factors) of having that vnode split in two (or more!).

I think there would be room for some increased user satisfaction in
this, even without knowing much about, or being in control of, how
exactly the split happens, as there is a chance for performance to be
(if the feature is used properly) better than in the no-vNUMA case,
which is what we're after.

> You can of course say the algorithm is fixed but I don't
> think we want to do that?
> 
I don't want to, but I don't think it's needed.

Anyway, I'm more than ok with deferring the discussion to after this
series is in. It would require a further change to the interface, but I
don't think that would be a terrible price to pay, if we decide the
feature is worth it.

Or, and that was the other thing I was suggesting, we can have the
bitmap in vnode_info from now on, but only accept ints in xl config
parsing, and enforce the weight of the bitmap to be 1 (perhaps printing
a warning) for now. This would not require changing the API in the
future; it would just be a matter of changing the xl config file
parsing. The "problem" would still stand for libxl callers other than
xl, though, I know.

Regards,
Dario

> Wei.
> 
> > Hope I made myself clear enough :-)
> > 
> > Regards,
> > Dario
> 
> 



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

