From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dario Faggioli
Subject: Re: [PATCH v5 06/24] libxl: introduce vNUMA types
Date: Mon, 16 Feb 2015 16:51:43 +0000
Message-ID: <1424105501.2591.53.camel@citrix.com>
References: <1423770294-9779-1-git-send-email-wei.liu2@citrix.com>
 <1423770294-9779-7-git-send-email-wei.liu2@citrix.com>
 <1424098710.6968.33.camel@citrix.com>
 <20150216151715.GB20572@zion.uk.xensource.com>
 <1424102178.2591.12.camel@citrix.com>
 <20150216161155.GF20572@zion.uk.xensource.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1087802824622405706=="
Return-path:
In-Reply-To: <20150216161155.GF20572@zion.uk.xensource.com>
Content-Language: en-US
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Wei Liu
Cc: "JBeulich@suse.com", Andrew Cooper, "xen-devel@lists.xen.org", "ufimtseva@gmail.com", Ian Jackson, Ian Campbell
List-Id: xen-devel@lists.xenproject.org

--===============1087802824622405706==
Content-Language: en-US
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=-/Jes4P0VOtRGmCeKc0SD"

--=-/Jes4P0VOtRGmCeKc0SD
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Mon, 2015-02-16 at 16:11 +0000, Wei Liu wrote:
> On Mon, Feb 16, 2015 at 03:56:21PM +0000, Dario Faggioli wrote:
> > On Mon, 2015-02-16 at 15:17 +0000, Wei Liu wrote:
> > > And there is no way to
> > > specify priority among the group of nodes you specify with a single
> > > bitmap.
> > >
> > Why do we need such a thing as a 'priority'? What I'm talking about is
> > making it possible, for each vnode, to specify vnode-to-pnode mapping as
> > a bitmap of pnode. What we'd do, in presence of a bitmap, would be
> > allocating the memory by striping it across _all_ the pnodes present in
> > the bitmap.
> >
>
> Should we enforce memory equally stripped across all nodes?
> If so this should be stated explicitly in the comment of interface.
>
I don't think we should enforce anything... I was rather describing
what happens *right* *now* in that scenario, whether it is documented
or not.

> I can't see
> that in your original description. I ask "priority" because I
> interpreted as something else (which is one of many ways to interpret
> I think).
>
So, if you're saying that, if we use a bitmap, we should write somewhere
how libxl would use it, I certainly agree. To what level of detail we
should do that, at that point, I'm not sure. I think I'd be fine, as a
user, with finding it written that "the memory of the vnode will be
allocated out of the pnodes specified in the bitmap", without much
further detail, especially considering the use case for the feature.

> If it's up to libxl to make dynamic choice, we should also say that. But
> this is not very useful to user because libxl's algorithm can change
> isn't it? How do users expect to know that across versions of Xen?
>
Why does he need to? This would be something enabling a bit more
flexibility, if one wants it, or a bit less bad performance, in some
specific situations, and all this pretty much independently of the
algorithm used inside libxl, I think.

As I said, if there is only 1GB free on all pnodes, the user will be
allowed to specify a set of pnodes for the vnodes, instead of not being
able to use vNUMA at all, no matter how libxl (or whoever else) will
actually split the memory, in this, previous or future versions of Xen...
This is the scenario I'm talking about, and in such a scenario, knowing
how the split happens does not really help much; it is just the
_possibility_ of splitting that helps...
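To make that scenario concrete, here is a minimal sketch in Python (not
libxl code). The helper names, the integer-as-bitmap representation and
the 8-pnode host are assumptions made only for illustration. It checks
just whether the pnodes named in a bitmap have, together, enough free
memory for a vnode, and deliberately says nothing about how the memory
would be split among them.

```python
def pnodes_in(bitmap):
    """Return the pnode indices whose bit is set in an integer bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


def vnode_fits(vnode_mb, bitmap, free_mb):
    """True if the pnodes in `bitmap` have enough free memory in total.

    Hypothetical helper: it only answers "is a split possible?", not
    "how is the memory split?", which is left to the toolstack.
    """
    return sum(free_mb.get(p, 0) for p in pnodes_in(bitmap)) >= vnode_mb


# Hypothetical host: 8 pnodes, each with only 1GB (1024MB) free.
free_mb = {p: 1024 for p in range(8)}

one_pnode = 1 << 0                # vnode mapped to pnode #0 only
two_pnodes = (1 << 0) | (1 << 1)  # vnode mapped to pnodes #0 and #1

print(vnode_fits(2048, one_pnode, free_mb))   # False: no single pnode has 2GB
print(vnode_fits(2048, two_pnodes, free_mb))  # True: 1GB + 1GB is enough
```

With a single pnode per vnode the 2GB vnode cannot be placed at all;
with a set of pnodes it can, whatever the split turns out to be.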
> > If we allow the user (or the automatic placement algorithm) to specify a
> > bitmap of pnode for each vnode, he could put, say, vnode #1 on pnode #0
> > and #2, which maybe are really close (in terms of NUMA distances) to
> > each other, and vnode #2 to pnode #5 and #6 (close to each others too).
> > This would give worst performance than having each vnode on just one
> > pnode, but, most likely, better performance than the scenario described
> > right above.
> >
>
> I get what you mean. So by writing the above paragraphs, you sort of
> confirm that there still are too many implications in the algorithms,
> right? A user cannot just tell from the interface what the behaviour is
> going to be.
>
A user can tell that, if he wants a vnode 2GB wide, and there is no
pnode with 2GB free, but the sum of free memory in pnode #4 and #6 is
>= 2GB, he can still use vNUMA, by paying the price (small or high,
depending on more factors) of having that vnode split in two (or more!).

I think there would be room for some increased user satisfaction in
this, even without knowing much about, and/or being in control of, how
exactly the split happens, as there are chances for performance to be
(if the thing is used properly) better than in the no-vNUMA case, which
is what we're after.

> You can of course say the algorithm is fixed but I don't
> think we want to do that?
>
I don't want to, but I don't think it's needed.

Anyway, I'm more than ok if we want to defer the discussion to after
this series is in. It will require a further change in the interface,
but I don't think it would be a terrible price to pay, if we decide the
feature is worth it.

Or, and that was the other thing I was suggesting, we can have the
bitmap in vnode_info right away, but then only accept ints in xl config
parsing, and enforce the weight of the bitmap to be 1 (and perhaps print
a warning) for now.
This would not require changing the API in future, it'd just be a matter
of changing the xl config file parsing. The "problem" would still stand
for libxl callers other than xl, though, I know.

Regards,
Dario

> Wei.
>
> > Hope I made myself clear enough :-)
> >
> > Regards,
> > Dario
>
>

--=-/Jes4P0VOtRGmCeKc0SD
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iEYEABECAAYFAlTiIB0ACgkQk4XaBE3IOsRjawCdEu6GsYQbGHO4xitsAd/2F/6X
4WoAoJdu3/SVaseR0NswFixSeac+0ZPN
=Efga
-----END PGP SIGNATURE-----

--=-/Jes4P0VOtRGmCeKc0SD--

--===============1087802824622405706==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

--===============1087802824622405706==--
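As an illustration of the last suggestion in the message above (keep a
bitmap in the interface, accept only a single int when parsing the
config, and require a bitmap weight of 1 for now), here is a minimal
sketch in Python. It is not xl or libxl code: the function names, the
integer-as-bitmap representation and the warning text are all
hypothetical.

```python
import warnings


def bitmap_weight(bitmap):
    """Number of bits set in an integer bitmap (its 'weight')."""
    return bin(bitmap).count("1")


def pnode_to_bitmap(value):
    """Hypothetical config-side parsing: accept one pnode index only."""
    if isinstance(value, bool) or not isinstance(value, int) or value < 0:
        raise ValueError("pnode must be a non-negative integer")
    return 1 << value


def bitmap_is_acceptable(bitmap):
    """Hypothetical library-side check: exactly one pnode per vnode for now."""
    weight = bitmap_weight(bitmap)
    if weight != 1:
        warnings.warn("vnode maps to %d pnodes; only 1 is accepted for now"
                      % weight)
        return False
    return True
```

A config value of 5 would become the bitmap 0b100000 and pass the check.
A caller that builds the bitmap itself could still hand over something
like 0b1100000, which is why the weight check sits on the library side
rather than only in the parser.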