From: Elena Ufimtseva
Subject: Re: [PATCH v10 5/9] libxl: vnuma types declararion
Date: Tue, 16 Sep 2014 02:17:13 -0400
To: Ian Campbell
Cc: Keir Fraser, Stefano Stabellini, George Dunlap, Matt Wilson, Dario Faggioli, Li Yechen, Ian Jackson, xen-devel@lists.xen.org, Jan Beulich


On Thu, Sep 4, 2014 at 6:35 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Wed, 2014-09-03 at 23:48 -0400, Elena Ufimtseva wrote:
> > > What happens if a guest has 33 vcpus?
> > >
> > > Or maybe the output of this is a single vcpu number?
> > >
> > >> +    ("vdistance",        Array(uint32, "num_vdistance")),
> > >
> > > nodes to distances? Can num_vdistance ever differ from vnodes?
> >
> > num_vdistance is the square of vnodes.
> > >
> > > I thought distances were between two nodes, in which case doesn't this
> > > need to be multidimensional?
> >
> > Yes, but the parser just fills it in row-by-row order, treating it as if
> > it were a multidimensional array.
>
> I suppose this is a workaround for the lack of multidimensional arrays
> in the IDL?
>
> If this is to be done then it needs a comment, I think. In particular the
> stride length needs to be clearly defined.
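
To make the intended layout explicit: the array is filled row by row, so the
stride is the number of vnodes and the distance from vnode i to vnode j lives
at index i * vnodes + j. A minimal sketch of a consumer, assuming the fields
end up in libxl_domain_build_info as in the listing below (the helper itself
is only illustrative):

    /* Distance from vnode i to vnode j, assuming the row-by-row layout
     * described above (stride == number of vnodes).  Sketch only. */
    static uint32_t vnuma_distance(const libxl_domain_build_info *b_info,
                                   unsigned int i, unsigned int j)
    {
        return b_info->vdistance[i * b_info->vnodes + j];
    }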

> > > You appear to unconditionally overwrite whatever the users might have
> > > put here.
> > >
> > >> +    ("vnuma_autoplacement",  libxl_defbool),
> > >
> > > Why/how does this differ from the existing numa placement option?
> >
> >
> > Basically, it means: try to build the vnode to pnode mapping if automatic
> > vnuma placement is enabled, otherwise try to use the user-defined one.
> > But I am thinking maybe you are right, and there is no need for
> > vnuma_autoplacement at all.
> > Just remove it, and if numa placement is in automatic mode, use that.
> > If not, try to use the vnode to pnode mapping.
> >
> > Now I think that is what Dario was talking about for some time )
> >
> >
> > Ok, now going back to these types.
> >
> >   ("vnodes",          uint32),
> >   ("vmemranges",      uint32),
> >   ("vnuma_mem",       Array(uint64, "num_vnuma_mem")),
> >   ("vnuma_vcpumap",   Array(uint32, "num_vnuma_vcpumap")),
> >   ("vdistance",       Array(uint32, "num_vdistance")),
> >   ("vnuma_vnodemap",  Array(uint32, "num_vnuma_vnondemap")),
> >
> >
> > vnodes - number of vnuma nodes;
> > vmemranges - number of vmemranges (not used, but support was added in
> > Xen); Not sure how to deal with this. Should I just remove it, as it's
> > not used by the tool stack yet?
>
> I think that would be best, otherwise we risk tying ourselves to an API
> which might not actually work when we come to try and use it.
>
> If you need the concept internally while building you could add such
> things to libxl__domain_build_state instead.
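
For reference, keeping the concept internal could look roughly like this in
libxl_internal.h (just a sketch; the struct and member names are made up and
not part of the posted series):

    /* Sketch only: possible internal bookkeeping if vmemranges stay out
     * of the public IDL.  Names are illustrative. */
    typedef struct {
        uint64_t start;     /* start of the range, in bytes */
        uint64_t end;       /* end of the range, in bytes */
        uint32_t vnode;     /* vnode this range belongs to */
    } libxl__vmemrange_sketch;

    /* ... and in struct libxl__domain_build_state:
     *     uint32_t num_vmemranges;
     *     libxl__vmemrange_sketch *vmemranges;
     */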

> > vnuma_mem - vnode memory sizes (conflicts now with vmemranges);
> > vnuma_vcpumap - map of vnuma nodes and vcpus (length is the number of
> > vcpus, each element is a vnuma node number);
> > vdistance - distances, num_vdistance = vnodes * vnodes;
> > vnuma_vnodemap - mapping of vnuma nodes to physical NUMA nodes;
> >
> > And vnuma_autoplacement is probably not needed here at all.
> >
> > About vmemranges. They will be needed for vnuma nodes that have
> > multiple memory ranges.
> > I added it in the type definition to specify support when calling the
> > hypercall. As of now, a PV domain has one memory range per node.
> >
> > There is a libxl__build_vnuma_ranges function which for a PV domain
> > basically converts the vnuma node memory sizes into ranges.
> > But for a PV domain there is only one range per vnode.
> >
> > Maybe there is a better way of doing this, maybe even remove the
> > vmemranges, which are confusing at this point?
>
> That would help, I think.
>
> Then I would take the rest and wrap them in a libxl_vnode_info struct:
>
> libxl_vnode_info = Struct("libxl_vnode_info", [
>     ("mem", MemKB), # Size of this node's memory
>     ("distances", Array(...)), # Distances from this node to the others
>                                # (single dimensional array)
>     ("pnode", TYPE), # which pnode this vnode is associated with
> ])
>
> Then in the (b|c)_config
>     ("vnuma_nodes", Array(libxl_vnode_info...))

Hi Ian

Is there a better mechanism to parse vnuma_nodes first, and within each node
to parse the distances array?
So let's say the config will look like this:

vnuma_nodes = [ a b c, d e f, g e k, ... ]

but c, f, k cannot be arrays. The parser does not expect to have one more
string within each element of the topmost string. So this does not work:

vnuma_nodes = [ a b "distance array1", d e "distance array2", ... ]

and this does not work either:

vnuma_nodes = [ [], [], [], ... ]

It looks like what I was trying to avoid by emulating a multidimensional
array for the distances still persists here.

Maybe you know some tricky way to get around this?
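
One workaround I can think of is to keep vnuma_nodes a flat list of strings
and encode each node's distance row inside its string with a secondary
separator, splitting it again in xl. Purely as a sketch (the key names and
the ";" separator are made up, not something the parser understands today):

    vnuma_nodes = [ "pnode=0,size=2048,vdistances=10;20",
                    "pnode=1,size=2048,vdistances=20;10" ]

But maybe there is a cleaner way.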

Thank you!


> This avoids the issue of having to have various num_* which are expected
> to always be identical. Except for vnuma_nodes[N].num_distances, but I
> think that is unavoidable.
>
> The only thing I wasn't sure of was how to fit the vcpumap into this.
> AIUI this is a map from vcpu number to vnode number (i.e. for each vcpu
> it tells you which vnuma node it is part of). I presume it isn't
> possible/desirable to have a vcpu be part of multiple vnuma nodes.
>
> One possibility would be to have a libxl_bitmap/cpumap as part of the
> libxl_vnode_info (containing the set of vcpus which are part of this
> node) but that would be prone to allowing a vcpu to be in multiple
> nodes.
>
> Another possibility would be an array in (b|c)_config as you have now,
> which would be annoying because its num_foo really ought to be the
> ->vcpus count and the IDL doesn't really allow that.
>
> I'm not sure what is the lesser of the two evils here.

> Ian.
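
Regarding the vcpumap: if we go with the per-node cpumap, the "vcpu in two
nodes" problem could at least be caught with a cross-check at build time.
A rough sketch, assuming libxl_vnode_info grows a ("vcpus", libxl_bitmap)
field as in your first option (the function name is made up):

    /* Sketch only: reject configurations where a vcpu is claimed by zero
     * or by more than one vnuma node.  Uses libxl_bitmap_test() from
     * libxl_utils.h; the "vcpus" member is the assumed per-node cpumap. */
    static int vnuma_vcpumap_is_sane(const libxl_vnode_info *nodes,
                                     unsigned int num_nodes,
                                     unsigned int max_vcpus)
    {
        unsigned int vcpu, node;

        for (vcpu = 0; vcpu < max_vcpus; vcpu++) {
            unsigned int owners = 0;

            for (node = 0; node < num_nodes; node++)
                if (libxl_bitmap_test(&nodes[node].vcpus, vcpu))
                    owners++;

            if (owners != 1)
                return 0;   /* vcpu in zero or multiple vnodes */
        }
        return 1;
    }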




--
Elena