[PATCH 00/15] Implement per-vcpu NUMA node-affinity for credit1

From: Dario Faggioli @ 2013-10-03 17:45 UTC
  To: xen-devel
  Cc: Marcus Granado, Keir Fraser, Ian Campbell, Li Yechen,
	George Dunlap, Andrew Cooper, Juergen Gross, Ian Jackson,
	Jan Beulich, Justin Weaver, Daniel De Graaf, Matt Wilson,
	Elena Ufimtseva

Hi everyone,

So, this series introduces the concept of per-vcpu NUMA node-affinity. In fact,
up to now, node-affinity has only been "per-domain". That means it was the
domain that had a node-affinity and:
 - that node-affinity was used to decide where to allocate the memory for the
   domain;
 - that node-affinity was used to decide on which nodes _all_ the vcpus of the
   domain prefer to be scheduled.

After this series, this changes as follows:
 - each vcpu of a domain has (well, may have) its own node-affinity, and that
   is what is used to determine (if the credit1 scheduler is used) where each
   specific vcpu prefers to run;
 - the node-affinity of the whole domain is the _union_ of all the
   node-affinities of the domain's vcpus;
 - the memory is still allocated according to the node-affinity of the whole
   domain (so, the union of vcpu node-affinities, as said above).
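To make the "union" rule above concrete, here is a minimal sketch in plain C.
It is not the code from the patches: `nodemask_sketch_t`, `struct vcpu_sketch`
and `domain_node_affinity()` are hypothetical names I made up for illustration,
and a 64-bit integer stands in for a real node mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a node mask: bit n set means "NUMA node n". */
typedef uint64_t nodemask_sketch_t;

/* Hypothetical per-vcpu state: only the field relevant to this sketch. */
struct vcpu_sketch {
    nodemask_sketch_t node_affinity;  /* nodes this vcpu prefers to run on */
};

/*
 * Domain node-affinity computed as the union (bitwise OR) of the
 * node-affinities of all the domain's vcpus. Memory allocation would
 * then follow the returned mask.
 */
static nodemask_sketch_t domain_node_affinity(const struct vcpu_sketch *vcpus,
                                              size_t nr_vcpus)
{
    nodemask_sketch_t mask = 0;
    size_t i;

    /* A NULL array is only acceptable for a domain with no vcpus. */
    assert(vcpus != NULL || nr_vcpus == 0);

    for (i = 0; i < nr_vcpus; i++)
        mask |= vcpus[i].node_affinity;

    return mask;
}
```

With three vcpus preferring node 0, node 1 and node 0 respectively, the
domain's node-affinity in this sketch would cover nodes 0 and 1.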

In practice, it's not such a big change: I'm just extending to the per-vcpu
level what we already had at the domain level. This also makes node-affinity a
lot more similar to vcpu-pinning, both in terms of functioning and user
interface. As a side effect, that simplifies the scheduling code (at least the
NUMA-aware part) by quite a bit. Finally, and most important, this is something
that will become really important when we start to support virtual NUMA
topologies, as, at that point, having the same node-affinity for all the vcpus
in a domain won't be enough any longer (we'll want the vcpus from a particular
vnode to have node-affinity with a particular pnode).
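Just to illustrate the vnode/pnode point, below is a rough sketch of what such
a mapping could look like once virtual NUMA topologies exist. Nothing like this
is part of the series: `vnuma_set_vcpu_affinity()`, its parameters and the two
lookup arrays are all assumptions of mine, and a 64-bit integer is again used
in place of a real node mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: give each vcpu node-affinity with the physical
 * node (pnode) that backs the virtual node (vnode) the vcpu belongs to.
 *
 *  vcpu_to_vnode[i]      - vnode that vcpu i belongs to
 *  vnode_to_pnode[v]     - pnode chosen to back vnode v
 *  vcpu_node_affinity[i] - output: one bit set, for the chosen pnode
 */
static void vnuma_set_vcpu_affinity(const unsigned int *vcpu_to_vnode,
                                    const unsigned int *vnode_to_pnode,
                                    size_t nr_vcpus, size_t nr_vnodes,
                                    uint64_t *vcpu_node_affinity)
{
    size_t i;

    for (i = 0; i < nr_vcpus; i++) {
        unsigned int vnode = vcpu_to_vnode[i];
        unsigned int pnode;

        assert(vnode < nr_vnodes);
        pnode = vnode_to_pnode[vnode];
        assert(pnode < 64);  /* the sketch mask only has 64 bits */

        vcpu_node_affinity[i] = UINT64_C(1) << pnode;
    }
}
```

In this sketch, vcpus sharing a vnode end up with the same node-affinity,
while vcpus from different vnodes can prefer different pnodes, which is
exactly what a single per-domain node-affinity cannot express.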

More detailed descriptions of the mechanism and of the implementation choices
are provided in the changelogs and in the documentation (docs/misc and
manpages).

One last thing is that this series relies on some other patches and series that
I already sent to xen-devel, but that have not been applied yet.  I'm
re-sending them here, as a part of this series, so feel free to pick them up
from here, if you want to apply them, or comment on them in this thread, if you
want me to change them.  In particular, I already sent patches 01 and 03 as
single patches, and patches 04-07 as a series. Sorry if that is a bit clumsy,
but I couldn't find a better way to do it. :-)

In the detailed list of patches below, 'x' means previously submitted, '*'
means already acked/reviewed-by.

Finally, Elena, this is not super important, but perhaps, in the next release
of your vNUMA series, you could try to integrate it with this (and of course,
ask if you need anything while trying to do that).

Matt, if/when you eventually get to release your HVM vNUMA series, even as RFC
or something like that, we can try to figure out how to integrate it with
this, so as to use node-affinity instead of pinning.

The series is also available at the following git coordinates:

 git://xenbits.xen.org/people/dariof/xen.git numa/per-vcpu-affinity-v1
 http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/numa/per-vcpu-affinity-v1

Let me know what you think about all this.

Regards,
Dario

PS. Some of you probably received part of this series as a direct message
(i.e., with your address in 'To', rather than in 'Cc'). I'm sincerely sorry for
that, I messed up with `stg mail'... Won't happen again, I promise! :-P

---

Dario Faggioli (15):
 x *  xl: update the manpage about "cpus=" and NUMA node-affinity
      xl: fix a typo in main_vcpulist()
 x *  xen: numa-sched: leave node-affinity alone if not in "auto" mode
 x *  libxl: introduce libxl_node_to_cpumap
 x    xl: allow for node-wise specification of vcpu pinning
 x *  xl: implement and enable dryrun mode for `xl vcpu-pin'
 x    xl: test script for the cpumap parser (for vCPU pinning)
      xen: numa-sched: make space for per-vcpu node-affinity
      xen: numa-sched: domain node-affinity always comes from vcpu node-affinity
      xen: numa-sched: use per-vcpu node-affinity for actual scheduling
      xen: numa-sched: enable getting/specifying per-vcpu node-affinity
      libxc: numa-sched: enable getting/specifying per-vcpu node-affinity
      libxl: numa-sched: enable getting/specifying per-vcpu node-affinity
      xl: numa-sched: enable getting/specifying per-vcpu node-affinity
      xl: numa-sched: enable specifying node-affinity in VM config file

 docs/man/xl.cfg.pod.5                           |   88 ++++
 docs/man/xl.pod.1                               |   25 +
 docs/misc/xl-numa-placement.markdown            |  124 ++++--
 tools/libxc/xc_domain.c                         |   90 ++++-
 tools/libxc/xenctrl.h                           |   19 +
 tools/libxl/check-xl-vcpupin-parse              |  294 +++++++++++++++
 tools/libxl/check-xl-vcpupin-parse.data-example |   53 +++
 tools/libxl/libxl.c                             |   28 +
 tools/libxl/libxl.h                             |   11 +
 tools/libxl/libxl_dom.c                         |   18 +
 tools/libxl/libxl_numa.c                        |   14 -
 tools/libxl/libxl_types.idl                     |    1 
 tools/libxl/libxl_utils.c                       |   22 +
 tools/libxl/libxl_utils.h                       |   15 +
 tools/libxl/xl.h                                |    1 
 tools/libxl/xl_cmdimpl.c                        |  458 +++++++++++++++++++----
 tools/libxl/xl_cmdtable.c                       |   11 -
 xen/common/domain.c                             |   97 ++---
 xen/common/domctl.c                             |   47 ++
 xen/common/keyhandler.c                         |    6 
 xen/common/sched_credit.c                       |   63 ---
 xen/common/schedule.c                           |   55 +++
 xen/include/public/domctl.h                     |    8 
 xen/include/xen/sched-if.h                      |    2 
 xen/include/xen/sched.h                         |   13 +
 xen/xsm/flask/hooks.c                           |    2 
 26 files changed, 1282 insertions(+), 283 deletions(-)
 create mode 100755 tools/libxl/check-xl-vcpupin-parse
 create mode 100644 tools/libxl/check-xl-vcpupin-parse.data-example

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
