[PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2!

From: Dario Faggioli @ 2016-08-17 17:17 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Andrew Cooper, Anshul Makkar, Ian Jackson,
	George Dunlap, David Vrabel, Jan Beulich

Hi everyone,

Here's another rather big scheduler-related series. Most of the content (as
usual, lately) is about Credit2, but there are other things, about tracing and
also about Credit1. In fact, this patch series introduces soft-affinity support
for Credit2.

The first 3 patches are indeed bugfixes and performance enhancements for
Credit1. I discovered them while comparing performance and behavior of the two
schedulers, Credit1 and Credit2. In particular, running a Xen build (first
column, lower==>better) and iperf from the VMs to the host (second column,
higher==>better) concurrently inside two 8-vCPU VMs, on a 16 vCPUs host,
without (first block) or with (second block) some other load in the system
(i.e., 12 dom0 vCPUs kept artificially busy), produced the following results:

 CREDIT1    MAKEXEN IPERF
---------------------------
 baseline : 28.772  11.354 | (no dom0 load)
 patched  : 28.602+ 11.416+|
---------------------------|
 baseline : 52.852  10.995+| (with dom0 load)
 patched  : 43.788+ 10.405 |
---------------------------

 + marks the best results

So, the patch series improves the situation quite a bit, at least for CPU-bound
workloads (the Xen build) when running under overload. I suspect the
soft-affinity related bug in __runq_tickle() (fixed by patch 2) to be mainly
responsible for this.
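
To put the table above in relative terms, here is a small worked computation.
It only reuses the numbers reported in the table; the function and variable
names are mine, purely for illustration. Under dom0 load the Xen build time
drops by roughly 17%, while the iperf figure under load is about 5% lower.

```python
def relative_change(baseline, patched):
    """Percentage change from baseline to patched (negative means smaller)."""
    return (patched - baseline) / baseline * 100.0

# Credit1 numbers copied from the table above, as (baseline, patched).
credit1 = {
    ("makexen", "no dom0 load"):   (28.772, 28.602),
    ("iperf",   "no dom0 load"):   (11.354, 11.416),
    ("makexen", "with dom0 load"): (52.852, 43.788),
    ("iperf",   "with dom0 load"): (10.995, 10.405),
}

changes = {key: relative_change(*vals) for key, vals in credit1.items()}

for (bench, load), pct in changes.items():
    # For makexen a negative change is an improvement; for iperf a positive one is.
    print(f"{bench:8s} {load:15s} {pct:+6.2f}%")
```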

Patches 4 to 6 are improvements to Credit2, and patch 7 to both Credit1 and
Credit2.

Then, we find fixes for a few other random things, mostly about tracing, in
patches 8 - 11 (see individual descriptions), and some context switch ratelimit
enhancements (again for both Credit1 and Credit2, but mostly for Credit2), in
patches 12 - 15.

Afterwards comes the most important contribution: the introduction of
soft-affinity support in Credit2. This happens in steps, i.e., in patches
16 to 20. This approach of introducing the feature with such a breakdown
follows what was discussed a while ago here. There are a lot of moving parts
and, while working on implementing this, and revisiting the discussion, I found
the suggestion from George toward this approach to still be a very good one.

Among the soft-affinity patches, 1 is just refactoring, 2 of them are rather
easy, as they sort of follow the same approach always used for implementing
soft-affinity (i.e., in Credit1), which is the two-step load balancing loop.
The 4th patch, the one that touches Credit2's load balancer, is the one that
likely deserves the most attention. The basic idea in there is to integrate the
soft-affinity logic inside the Credit2 load balancing framework. I think I've
put enough info in the changelog, and don't want to clutter this space with
that... But do feel free to ask.
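
For readers not familiar with the two-step loop mentioned above, here is a toy
model of the idea. It is a sketch, not the Xen code: pick_cpu() and its
arguments are hypothetical names, CPU masks are modelled as Python sets, and
the "lowest idle CPU wins" rule is only there to keep the example
deterministic. The first step looks only at the CPUs in the vCPU's
soft-affinity, the second one falls back to the whole hard-affinity.

```python
def pick_cpu(hard, soft, idle):
    """Return an idle CPU for a vCPU, preferring its soft-affinity.

    hard, soft and idle are sets of CPU ids. Returns None if no CPU the
    vCPU is allowed to run on is idle.
    """
    # The soft-affinity step is only useful if it narrows the choice:
    # it must intersect the hard-affinity without covering all of it.
    preferred = hard & soft
    steps = []
    if preferred and preferred != hard:
        steps.append(preferred)  # step 1: soft-affinity
    steps.append(hard)           # step 2: hard-affinity (always tried)

    for mask in steps:
        candidates = mask & idle
        if candidates:
            return min(candidates)  # arbitrary but deterministic choice
    return None


# CPU 3 is idle and within the soft-affinity, so it is preferred over CPU 1.
print(pick_cpu(hard={0, 1, 2, 3}, soft={2, 3}, idle={1, 3}))
# No idle CPU in the soft-affinity: fall back to the hard-affinity.
print(pick_cpu(hard={0, 1, 2, 3}, soft={2, 3}, idle={1}))
```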

The last 4 patches, still for Credit2, are optimizations, either wrt existing
code, or wrt new code introduced in this series. I've chosen to keep them
separate to make reviewing/understanding new code easier. In fact, although
they look pretty simple, the soft-affinity code was pretty complex already, and
even these simple optimizations, if done all at once, would have made the
reviewer's life (unnecessarily) tougher.

Numbers are quite good. Actually, they show a really nice picture, IMO. I want
to run more benchmarks, of course, but it looks like we're on the right path.
The benchmarks are the same as above. I'm using credit2_runqueue=socket as it
has proven to be (in quite a few other benchmarks that I'm not showing for
brevity) the best configuration, at least with my latest series applied (it's
in staging already).

 CREDIT2    MAKEXEN IPERF
---------------------------
 baseline : 31.990  11.689 | (no dom0 load)
 patched  : 27.834+ 12.180+|
---------------------------|
 baseline : 44.628  10.329 | (with dom0 load)
 patched  : 40.272+ 10.904+|
---------------------------

 + marks the best results

So, the patches are really effective in this case. Now, what if we compare the
unpatched and patched versions of Credit1 and Credit2? Here we are:

 UNPATCHED  MAKEXEN IPERF
---------------------------
 Credit1  : 28.772+ 11.354 | (no dom0 load)
 Credit2  : 31.990  11.689+|
---------------------------|
 Credit1  : 52.852  10.995+| (with dom0 load)
 Credit2  : 44.628+ 10.329 |
---------------------------

In this use case, each of the two VMs would fit in one node, and hence
soft-affinity can "work its magic" in Credit1, while in Credit2, without this
patch series, there's no such thing, and hence Credit1, overall, wins the
match. Yes, Credit2 has an edge on IPERF in the 'no dom0 load' case, but the
result is very tight anyway. And Credit1 also does badly in the Xen build with
load, but that's only because of the bug.

 PATCHED    MAKEXEN IPERF
---------------------------
 Credit1  : 28.602  11.416 | (no dom0 load)
 Credit2  : 27.834+ 12.180+|
---------------------------|
 Credit1  : 43.788  10.405 | (with dom0 load)
 Credit2  : 40.272+ 10.904+|
---------------------------

OTOH, with this patch series in, i.e., with Credit2 also able to take advantage
of soft-affinity, the game changes. The Iperf results are still very tight and
--although I don't have the std-dev still available-- I've observed them to be
not necessarily always consistent (although, with clearly visible trends, which
are the ones subsumed by these numbers I'm reporting). But on CPU-bound
workloads, and especially in overload situations, Credit2 does rather well! :-)
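
As a rough way of reading the PATCHED table, the snippet below computes
Credit2's edge over Credit1 in percent, taking into account that lower is
better for the Xen build and higher is better for iperf. It only uses the
numbers from the table; credit2_edge() is a name made up for this sketch. With
these figures, Credit2 comes out ahead by about 8% on the Xen build under dom0
load.

```python
# (MAKEXEN, IPERF) pairs copied from the PATCHED table above.
patched = {
    "no dom0 load":   {"Credit1": (28.602, 11.416), "Credit2": (27.834, 12.180)},
    "with dom0 load": {"Credit1": (43.788, 10.405), "Credit2": (40.272, 10.904)},
}

def credit2_edge(results):
    """Return Credit2's advantage over Credit1, in percent, per benchmark.

    Positive means Credit2 is better: a shorter build, or a higher iperf figure.
    """
    build1, iperf1 = results["Credit1"]
    build2, iperf2 = results["Credit2"]
    return {
        "makexen": (build1 - build2) / build1 * 100.0,  # lower is better
        "iperf":   (iperf2 - iperf1) / iperf1 * 100.0,  # higher is better
    }

edges = {load: credit2_edge(res) for load, res in patched.items()}

for load, edge in edges.items():
    print(f"{load:15s} makexen {edge['makexen']:+.2f}%  iperf {edge['iperf']:+.2f}%")
```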

So, this is still a limited set of use cases (and we're working, inside Citrix,
on producing more), but that's why I'm saying that we're on the right path for
making Credit2 usable in production and the new default.

Thanks and Regards,
Dario
---
Dario Faggioli (24):
      xen: credit1: small optimization in Credit1's tickling logic.
      xen: credit1: fix mask to be used for tickling in Credit1
      xen: credit1: return the 'time remaining to the limit' as next timeslice.
      xen: credit2: properly schedule migration of a running vcpu.
      xen: credit2: make tickling more deterministic
      xen: credit2: implement yield()
      xen: sched: don't rate limit context switches in case of yields
      xen: tracing: add trace records for schedule and rate-limiting.
      xen/tools: tracing: improve tracing of context switches.
      xen: tracing: improve Credit2's tickle_check and burn_credits records
      tools: tracing: handle more scheduling related events.
      xen: libxc: allow to set the ratelimit value online
      libxc: improve error handling of xc Credit1 and Credit2 helpers
      libxl: allow to set the ratelimit value online for Credit2
      xl: allow to set the ratelimit value online for Credit2
      xen: sched: factor affinity helpers out of sched_credit.c
      xen: credit2: soft-affinity awareness in runq_tickle()
      xen: credit2: soft-affinity awareness fallback_cpu() and cpu_pick()
      xen: credit2: soft-affinity awareness in load balancing
      xen: credit2: kick away vcpus not running within their soft-affinity
      xen: credit2: optimize runq_candidate() a little bit
      xen: credit2: "relax" CSCHED2_MAX_TIMER
      xen: credit2: optimize runq_tickle() a little bit
      xen: credit2: try to avoid tickling cpus subject to ratelimiting

 docs/man/xl.pod.1.in                |    9 
 docs/misc/xen-command-line.markdown |   10 
 tools/libxc/include/xenctrl.h       |   32 +
 tools/libxc/xc_csched.c             |   27 -
 tools/libxc/xc_csched2.c            |   59 ++
 tools/libxl/libxl.c                 |  111 +++-
 tools/libxl/libxl.h                 |    4 
 tools/libxl/libxl_types.idl         |    4 
 tools/libxl/xl_cmdimpl.c            |   91 ++-
 tools/libxl/xl_cmdtable.c           |    2 
 tools/xentrace/formats              |   16 -
 tools/xentrace/xenalyze.c           |  133 ++++
 xen/common/sched_credit.c           |  156 ++---
 xen/common/sched_credit2.c          | 1059 +++++++++++++++++++++++++++++------
 xen/common/sched_rt.c               |   15 
 xen/common/schedule.c               |   10 
 xen/include/public/sysctl.h         |   17 -
 xen/include/xen/perfc_defn.h        |    4 
 xen/include/xen/sched-if.h          |   65 ++
 19 files changed, 1444 insertions(+), 380 deletions(-)
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

