From: George Dunlap
Subject: Re: [PATCH 0 of 8] NUMA Awareness for the Credit Scheduler
Date: Wed, 10 Oct 2012 12:00:03 +0100
To: Dario Faggioli
Cc: Marcus Granado, Andre Przywara, Ian Campbell, Anil Madhavapeddy,
    Andrew Cooper, Juergen Gross, Ian Jackson, xen-devel@lists.xen.org,
    Jan Beulich, Daniel De Graaf, Matt Wilson
List-Id: xen-devel@lists.xenproject.org

On Fri, Oct 5, 2012 at 3:08 PM, Dario Faggioli wrote:
> Hi Everyone,
>
> Here comes a patch series instilling some NUMA awareness into the
> Credit scheduler.

Hey Dario -- I've looked through everything and acked everything I felt
I understood well enough / had the authority to ack.

Thanks for the good work!

 -George

>
> What the patches do is teach Xen's scheduler how to try to maximize
> performance on a NUMA host, taking advantage of the information coming
> from the automatic NUMA placement we have in libxl. Right now, the
> placement algorithm runs and selects a node (or a set of nodes) on
> which it is best to put a new domain. Then, all the memory for the new
> domain is allocated from those node(s) and all the vCPUs of the new
> domain are pinned to the pCPUs of those node(s). What we do here is,
> instead of statically pinning the domain's vCPUs to the nodes' pCPUs,
> have the (Credit) scheduler _prefer_ running them there. That enables
> most of the performance benefits of "real" pinning, but without its
> intrinsic lack of flexibility.
>
> The above happens by extending to the scheduler the knowledge of a
> domain's node-affinity. We then ask it to first try to run the
> domain's vCPUs on one of the nodes the domain has affinity with. Of
> course, if that turns out to be impossible, it falls back on the old
> behaviour (i.e., considering vcpu-affinity only).
>
> Allow me to mention that NUMA-aware scheduling is not only one of the
> items on the NUMA roadmap I'm trying to maintain here
> http://wiki.xen.org/wiki/Xen_NUMA_Roadmap. It is also one of the
> features we decided we want for Xen 4.3 (and thus it is part of the
> list of such features that George is maintaining).
>
> Up to now, I've been able to thoroughly test this only on my 2-node
> NUMA test box, by running the SpecJBB2005 benchmark concurrently on
> multiple VMs, and the results look really nice. A full set of what I
> got can be found in my presentation from the last XenSummit, which is
> available here:
>
> http://www.slideshare.net/xen_com_mgr/numa-and-virtualization-the-case-of-xen?ref=http://www.xen.org/xensummit/xs12na_talks/T9.html
>
> However, I reran some of the tests over the last few days (since I
> changed some bits of the implementation), and here's what I got:
>
>  -------------------------------------------------------
>   SpecJBB2005 Total Aggregate Throughput
>  -------------------------------------------------------
>  #VMs   No NUMA affinity   NUMA affinity &     +/- %
>                            scheduling
>  -------------------------------------------------------
>    2       34653.273          40243.015       +16.13%
>    4       29883.057          35526.807       +18.88%
>    6       23512.926          27015.786       +14.89%
>    8       19120.243          21825.818       +14.15%
>   10       15676.675          17701.472       +12.91%
>
> Basically, the results are consistent with what is shown in the
> super-nice graphs I have in the slides above! :-) As said, this looks
> nice to me, especially considering that my test machine is quite
> small, i.e., its 2 nodes are very close to each other from a latency
> point of view. I really expect more improvement on bigger hardware,
> where a much greater NUMA effect is to be expected. Of course, I will
> continue benchmarking myself (hopefully, on systems with more than 2
> nodes too), but should anyone want to run their own tests, that would
> be great, so feel free to do so and report the results to me and/or to
> the list!
>
> A little bit more about the series:
>
>  1/8 xen, libxc: rename xenctl_cpumap to xenctl_bitmap
>  2/8 xen, libxc: introduce node maps and masks
>
> are preparation work.
>
>  3/8 xen: let the (credit) scheduler know about `node affinity`
>
> is where the vcpu load balancing logic of the credit scheduler is
> modified to support node-affinity.
>
>  4/8 xen: allow for explicitly specifying node-affinity
>  5/8 libxc: allow for explicitly specifying node-affinity
>  6/8 libxl: allow for explicitly specifying node-affinity
>  7/8 libxl: automatic placement deals with node-affinity
>
> wire the in-scheduler node-affinity support up to the external world.
> Please note that patch 4 touches XSM and Flask, which is the area
> where I have the least experience and the least ability to test
> properly. So, if Daniel and/or anyone else interested in that could
> take a look and comment, that would be awesome.
>
>  8/8 xl: report node-affinity for domains
>
> is just a small output enhancement.
>
> Thanks and Regards,
> Dario
>
> --
> <> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
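
For readers who want to see the idea in code form, here is a minimal,
self-contained C sketch of the two-step pCPU selection described in the
cover letter. It is illustrative only: the names and the flat 64-bit CPU
masks are invented for this example and are not the actual code from
patch 3/8. The point is simply that the scheduler first restricts the
vCPU's cpu-affinity to the pCPUs of the node(s) the domain has
node-affinity with, and only if that intersection is empty does it fall
back to plain cpu-affinity.

/*
 * Standalone illustration (not Xen code): pick a pCPU for a vCPU by
 * preferring pCPUs that are both in the vCPU's cpu-affinity and in the
 * domain's node-affinity, falling back to cpu-affinity alone when the
 * intersection is empty. CPU sets are modelled as 64-bit masks.
 */
#include <stdint.h>
#include <stdio.h>

typedef uint64_t cpuset_t;              /* bit i set => pCPU i usable */

static int pick_first_cpu(cpuset_t set)
{
    for (int cpu = 0; cpu < 64; cpu++)
        if (set & ((cpuset_t)1 << cpu))
            return cpu;
    return -1;                          /* empty set */
}

/* Step 1: cpu-affinity AND node-affinity; step 2: cpu-affinity only. */
static int pick_cpu(cpuset_t cpu_affinity, cpuset_t node_affinity_cpus)
{
    cpuset_t preferred = cpu_affinity & node_affinity_cpus;

    if (preferred)
        return pick_first_cpu(preferred);    /* NUMA-friendly pCPU found */

    return pick_first_cpu(cpu_affinity);     /* old behaviour */
}

int main(void)
{
    cpuset_t cpu_affinity = 0xFFull;    /* vCPU may run on pCPUs 0-7 */
    cpuset_t node0_cpus   = 0xF0ull;    /* pCPUs 4-7 are on the domain's node */

    printf("picked pCPU %d\n", pick_cpu(cpu_affinity, node0_cpus));
    return 0;
}

Built with any C99 compiler, this prints "picked pCPU 4", the first pCPU
that satisfies both affinities. If node0_cpus were, say, 0xF00 (pCPUs
outside the vCPU's cpu-affinity), the selection would fall back to
pCPU 0, mirroring the "considering vcpu-affinity only" fallback
described above.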