All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2!
@ 2016-08-17 17:17 Dario Faggioli
  2016-08-17 17:17 ` [PATCH 01/24] xen: credit1: small optimization in Credit1's tickling logic Dario Faggioli
                   ` (26 more replies)
  0 siblings, 27 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:17 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Andrew Cooper, Anshul Makkar, Ian Jackson,
	George Dunlap, David Vrabel, Jan Beulich

Hi everyone,

Here's another rather big scheduler-related series. The most of the content (as
usual, lately) is about Credit2, but there are other things, about tracing and
also about Credit1. In fact, this patch series introduces soft-affinity support
for Credit2.

The first 3 patches are indeed bugfixes and performance enhancements for
Credit1. I discovered them while comparing performance and behavior of the two
schedulers, Credit1 and Credit2. In particular, running a Xen build (first
column, lower==>better) and iperf from the VMs to the host (second column,
higher==>better) within 2 8 vCPUs VMs, concurrently, on a 16 vCPUs host,
without (first line) or with (second line) some other load in the system (i.e.,
12 dom0 vCPUs kept artificially busy), produced the following results:

 CREDIT1    MAKEXEN IPERF
---------------------------
 baseline : 28.772  11.354 | (no dom0 load)
 patched  : 28.602+ 11.416+|
---------------------------|
 baseline : 52.852  10.995 | (with dom0 load)
 patched  : 43.788+ 10.405+|
---------------------------

 + marks the best results

So, the patch series improves the situation quite a bit, at least in CPU bound
workload (the Xen build) when running under overload. I suspect the
soft-affinity related bug in __runq_tickle() (fixed by patch 2) to be the
main responsible for this.

Patches 4 to 6 are improvements to Credit1, and patch 7 to both Credit1 and
Credit2.

Then, we find fixes for a few other random things, mostly about tracing, in
patches 8 - 11 (see individual descriptions), and some context switch ratelimit
enhancements (again for both Credit1 and Credit2, but mostly for Credit2), in
patches 12 - 15.

Afterwords, it comes the most important contribution, is the introduction of
the soft-affinity support in Credit2. This happens in steps --i.e., in patches
16 to 20. This approach of introducing the feature with such breakdown follows
what was discussed a while ago here. There's a lot of moving parts and, while
working on implementing this, and revisiting the discussion, I found the
suggestion from George toward this approach to still be a very good one.

Among the soft-affinity patches, 1 is just refactoring, 2 of them are rather
easy, as they sort of follow the same approach always used for implementing
soft-affinity (i.e., in Credit1), which is the two steps load balancing loop.
The 4th patch, the one that touches Credit2's load balancer, is the one that
likely deserves more attention. The basic idea in there is to integrate the
soft-affinity logic inside the Credit2 load balancing framework. I think I've
put enough info in the changelog, and don't want to clobber this spase with
that... But do feel free to ask.

The last 4 patches, still for Credit2, are optimizations, either wrt existing
code, or wrt new code introduced in this series. I've chosen to keep them
separate to make reviewing/understanding new code easier. In fact, although
they look pretty simple, the soft-affinity code was pretty complex already, and
even these simple optimization, if done all at once, would have made the
reviewer's life (unnecessary) tougher.

Numbers are quite good. Actually, they show a really nice picture, IMO. I want
to run more benchmarks, of course, but it looks like we're on the right path.
The benchmarks are the same as above. I'm using credit2_runqueue=socket as it's
proven (in quite a few other benchmarks that I'm not showing for brevity) the
best configuration, at least with my latest series applied (it's in staging
already).

 CREDIT2    MAKEXEN IPERF
---------------------------
 baseline : 31.990  11.689 | (no dom0 load)
 patched  : 27.834+ 12.180+|
---------------------------|
 baseline : 44.628  10.329 | (with dom0 load)
 patched  : 40.272+ 10.904+|
---------------------------

 + marks the best results

So, patches are really really effective in this case. Now, what if we compare
unpatched and patched version of Credit1 and Credit2?  Here we are:

 UNPATCHED  MAKEXEN IPERF
---------------------------
 Credit1  : 28.772+ 11.354 | (no dom0 load)
 Credit2  : 31.990  11.689+|
---------------------------|
 Credit1  : 52.852  10.995+| (with dom0 load)
 Credit2  : 44.628+ 10.329 |
---------------------------

In this use case, the two VMs would fit each one in one node, and hence
soft-affinity can "make his magic", in Credit1 while in Credit2, without this
patch, there's no such thing, and hence Credit1, overall, wins the match. Yes,
Credit2 has an edge on IPERF in the 'no dom0 load' case, but result is very
tight anyway. And Credit1 also does bad in Xen build with load, but that's only
because of the bug.

 PATCHED    MAKEXEN IPERF
---------------------------
 Credit1  : 28.602  11.416 | (no dom0 load)
 Credit2  : 27.834+ 12.180+|
---------------------------|
 Credit1  : 43.788  10.405 | (with dom0 load)
 Credit2  : 40.272+ 10.904+|
---------------------------

OTOH, with this patch series in, i.e., with Credit2 also able to take advantage
of soft-affinity, the game changes. The Iperf results are still very tight and
--although I don't have the std-dev still available-- I've observed them to be
not necessarily always consistent (although, with clearly visible trends, which
are the ones subsumedXX by these numbers I'm reporting). But on CPU workloads,
and especially in overload situations, Credit2 does rather good! :-)

So, this is still a limited set of use cases (and we're working, inside Citrix,
on producing more), but that's why I'm saying that we're on the right path for
making Credit2 usable in production and the new default.

Thanks and Regards,
Dario
---
Dario Faggioli (24):
      xen: credit1: small optimization in Credit1's tickling logic.
      xen: credit1: fix mask to be used for tickling in Credit1
      xen: credit1: return the 'time remaining to the limit' as next timeslice.
      xen: credit2: properly schedule migration of a running vcpu.
      xen: credit2: make tickling more deterministic
      xen: credit2: implement yield()
      xen: sched: don't rate limit context switches in case of yields
      xen: tracing: add trace records for schedule and rate-limiting.
      xen/tools: tracing: improve tracing of context switches.
      xen: tracing: improve Credit2's tickle_check and burn_credits records
      tools: tracing: handle more scheduling related events.
      xen: libxc: allow to set the ratelimit value online
      libxc: improve error handling of xc Credit1 and Credit2 helpers
      libxl: allow to set the ratelimit value online for Credit2
      xl: allow to set the ratelimit value online for Credit2
      xen: sched: factor affinity helpers out of sched_credit.c
      xen: credit2: soft-affinity awareness in runq_tickle()
      xen: credit2: soft-affinity awareness fallback_cpu() and cpu_pick()
      xen: credit2: soft-affinity awareness in load balancing
      xen: credit2: kick away vcpus not running within their soft-affinity
      xen: credit2: optimize runq_candidate() a little bit
      xen: credit2: "relax" CSCHED2_MAX_TIMER
      xen: credit2: optimize runq_tickle() a little bit
      xen: credit2: try to avoid tickling cpus subject to ratelimiting

 docs/man/xl.pod.1.in                |    9 
 docs/misc/xen-command-line.markdown |   10 
 tools/libxc/include/xenctrl.h       |   32 +
 tools/libxc/xc_csched.c             |   27 -
 tools/libxc/xc_csched2.c            |   59 ++
 tools/libxl/libxl.c                 |  111 +++-
 tools/libxl/libxl.h                 |    4 
 tools/libxl/libxl_types.idl         |    4 
 tools/libxl/xl_cmdimpl.c            |   91 ++-
 tools/libxl/xl_cmdtable.c           |    2 
 tools/xentrace/formats              |   16 -
 tools/xentrace/xenalyze.c           |  133 ++++
 xen/common/sched_credit.c           |  156 ++---
 xen/common/sched_credit2.c          | 1059 +++++++++++++++++++++++++++++------
 xen/common/sched_rt.c               |   15 
 xen/common/schedule.c               |   10 
 xen/include/public/sysctl.h         |   17 -
 xen/include/xen/perfc_defn.h        |    4 
 xen/include/xen/sched-if.h          |   65 ++
 19 files changed, 1444 insertions(+), 380 deletions(-)
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* [PATCH 01/24] xen: credit1: small optimization in Credit1's tickling logic.
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
@ 2016-08-17 17:17 ` Dario Faggioli
  2016-09-12 15:01   ` George Dunlap
  2016-08-17 17:17 ` [PATCH 02/24] xen: credit1: fix mask to be used for tickling in Credit1 Dario Faggioli
                   ` (25 subsequent siblings)
  26 siblings, 1 reply; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:17 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Anshul Makkar, David Vrabel

If, when vcpu x wakes up, there are no idle pcpus in x's
soft-affinity, we just go ahead and look at its hard
affinity. This basically means that, if, in __runq_tickle(),
new_idlers_empty is true, balance_step is equal to
CSCHED_BALANCE_HARD_AFFINITY, and that calling
csched_balance_cpumask() for whatever vcpu, would just
return the vcpu's cpu_hard_affinity.

Therefore, don't bother calling it (it's just pure
overhead) and use cpu_hard_affinity directly.

For this very reason, this patch should only be
a (slight) optimization, and entail no functional
change.

As a side note, it would make sense to do what the
patch does, even if we could be inside the
[[ new_idlers_empty && new->pri > cur->pri ]] if
with balance_step equal to CSCHED_BALANCE_SOFT_AFFINITY.
In fact, what is actually happening is:
 - vcpu x is waking up, and (since there aren't suitable
   idlers, and it's entitled for it) it is preempting
   vcpu y;
 - vcpu y's hard-affinity is a superset of its
   soft-affinity mask.

Therefore, it makes sense to use wider possible mask,
as by doing that, we maximize the probability of
finding an idle pcpu in there, to which we can send
vcpu y, which then will be able to run.

While there, also fix the comment, which included
an awkward parenthesis nesting.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
---
 xen/common/sched_credit.c |    8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 220ff0d..6eccf09 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -424,9 +424,9 @@ static inline void __runq_tickle(struct csched_vcpu *new)
             /*
              * If there are no suitable idlers for new, and it's higher
              * priority than cur, check whether we can migrate cur away.
-             * (We have to do it indirectly, via _VPF_migrating, instead
+             * We have to do it indirectly, via _VPF_migrating (instead
              * of just tickling any idler suitable for cur) because cur
-             * is running.)
+             * is running.
              *
              * If there are suitable idlers for new, no matter priorities,
              * leave cur alone (as it is running and is, likely, cache-hot)
@@ -435,9 +435,7 @@ static inline void __runq_tickle(struct csched_vcpu *new)
              */
             if ( new_idlers_empty && new->pri > cur->pri )
             {
-                csched_balance_cpumask(cur->vcpu, balance_step,
-                                       cpumask_scratch_cpu(cpu));
-                if ( cpumask_intersects(cpumask_scratch_cpu(cpu),
+                if ( cpumask_intersects(cur->vcpu->cpu_hard_affinity,
                                         &idle_mask) )
                 {
                     SCHED_VCPU_STAT_CRANK(cur, kicked_away);


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 02/24] xen: credit1: fix mask to be used for tickling in Credit1
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
  2016-08-17 17:17 ` [PATCH 01/24] xen: credit1: small optimization in Credit1's tickling logic Dario Faggioli
@ 2016-08-17 17:17 ` Dario Faggioli
  2016-08-17 23:42   ` Dario Faggioli
  2016-08-17 17:17 ` [PATCH 03/24] xen: credit1: return the 'time remaining to the limit' as next timeslice Dario Faggioli
                   ` (24 subsequent siblings)
  26 siblings, 1 reply; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:17 UTC (permalink / raw)
  To: xen-devel; +Cc: Anshul Makkar, David Vrabel, George Dunlap

If there are idle pcpus inside the waking vcpu's
soft-affinity mask, we should really tickle one
of them (this is one of the purposes of the
__runq_tickle() function itself!), not just
any idle pcpu.

The issue has been introduced in 02ea5031825d
("credit1: properly deal with pCPUs not in any cpupool"),
where the usage of idle_mask is changed, without
updating the bottom of the function, where it
is also referenced.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@citrix.eu.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
---
David, a while ago you asked what could have been that was causing awful
results for Credit1, for CPU bound workloads, in the overloaded scenario of one
of my benchmarks. I think the bug fixed either here or in next patch (but I'd
be rather sure it's this one) is where the problem was. :-)
---
 xen/common/sched_credit.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 6eccf09..3d4f223 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -454,11 +454,12 @@ static inline void __runq_tickle(struct csched_vcpu *new)
                 if ( opt_tickle_one_idle )
                 {
                     this_cpu(last_tickle_cpu) =
-                        cpumask_cycle(this_cpu(last_tickle_cpu), &idle_mask);
+                        cpumask_cycle(this_cpu(last_tickle_cpu),
+                                      cpumask_scratch_cpu(cpu));
                     __cpumask_set_cpu(this_cpu(last_tickle_cpu), &mask);
                 }
                 else
-                    cpumask_or(&mask, &mask, &idle_mask);
+                    cpumask_or(&mask, &mask, cpumask_scratch_cpu(cpu));
             }
 
             /* Did we find anyone? */


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 03/24] xen: credit1: return the 'time remaining to the limit' as next timeslice.
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
  2016-08-17 17:17 ` [PATCH 01/24] xen: credit1: small optimization in Credit1's tickling logic Dario Faggioli
  2016-08-17 17:17 ` [PATCH 02/24] xen: credit1: fix mask to be used for tickling in Credit1 Dario Faggioli
@ 2016-08-17 17:17 ` Dario Faggioli
  2016-09-12 15:14   ` George Dunlap
  2016-08-17 17:18 ` [PATCH 04/24] xen: credit2: properly schedule migration of a running vcpu Dario Faggioli
                   ` (23 subsequent siblings)
  26 siblings, 1 reply; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:17 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap

If vcpu x has run for 200us, and sched_ratelimit_us is
1000us, continue running x _but_ return 1000us-200us as
the next time slice. This way, next scheduling point will
happen in 800us, i.e., exactly at the point when x crosses
the threshold, and can be descheduled (if appropriate).

Right now (without this patch), we're always returning
sched_ratelimit_us (1000us, in the example above), which
means we're (potentially) allowing x to run more than
it should have been able to (even when considering rate
limiting into account).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
---
 xen/common/sched_credit.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 3d4f223..3f439a0 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1782,7 +1782,7 @@ csched_schedule(
         snext = scurr;
         snext->start_time += now;
         perfc_incr(delay_ms);
-        tslice = MICROSECS(prv->ratelimit_us);
+        tslice = MICROSECS(prv->ratelimit_us) - runtime;
         ret.migrated = 0;
         goto out;
     }


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 04/24] xen: credit2: properly schedule migration of a running vcpu.
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (2 preceding siblings ...)
  2016-08-17 17:17 ` [PATCH 03/24] xen: credit1: return the 'time remaining to the limit' as next timeslice Dario Faggioli
@ 2016-08-17 17:18 ` Dario Faggioli
  2016-09-12 17:11   ` George Dunlap
  2016-08-17 17:18 ` [PATCH 05/24] xen: credit2: make tickling more deterministic Dario Faggioli
                   ` (22 subsequent siblings)
  26 siblings, 1 reply; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:18 UTC (permalink / raw)
  To: xen-devel; +Cc: Anshul Makkar, George Dunlap

If wanting to migrate a vcpu that is actually running,
we need to ask the scheduler to chime in as soon as
possible, to have the vcpu itself stopped and actually
moved.

Make sure this happens by, after setting all the relevant
flags, raising the scheduler softirq.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
---
 xen/common/sched_credit2.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index a5a744f..12dfd20 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1667,6 +1667,7 @@ static void migrate(const struct scheduler *ops,
         svc->migrate_rqd = trqd;
         __set_bit(_VPF_migrating, &svc->vcpu->pause_flags);
         __set_bit(__CSFLAG_runq_migrate_request, &svc->flags);
+        cpu_raise_softirq(svc->vcpu->processor, SCHEDULE_SOFTIRQ);
         SCHED_STAT_CRANK(migrate_requested);
     }
     else


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 05/24] xen: credit2: make tickling more deterministic
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (3 preceding siblings ...)
  2016-08-17 17:18 ` [PATCH 04/24] xen: credit2: properly schedule migration of a running vcpu Dario Faggioli
@ 2016-08-17 17:18 ` Dario Faggioli
  2016-08-31 17:10   ` anshul makkar
  2016-09-13 11:28   ` George Dunlap
  2016-08-17 17:18 ` [PATCH 06/24] xen: credit2: implement yield() Dario Faggioli
                   ` (21 subsequent siblings)
  26 siblings, 2 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:18 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Anshul Makkar, George Dunlap, Jan Beulich

Right now, the following scenario can occurr:
 - upon vcpu v wakeup, v itself is put in the runqueue,
   and pcpu X is tickled;
 - pcpu Y schedules (for whatever reason), sees v in
   the runqueue and picks it up.

This may seem ok (or even a good thing), but it's not.
In fact, if runq_tickle() decided X is where v should
run, it did it for a reason (load distribution, SMT
support, cache hotness, affinity, etc), and we really
should try as hard as possible to stick to that.

Of course, we can't be too strict, or we risk leaving
vcpus in the runqueue while there is available CPU
capacity. So, we only leave v in runqueue --for X to
pick it up-- if we see that X has been tickled and
has not scheduled yet, i.e., it will have a real chance
of actually select and schedule v.

If that is not the case, we schedule it on Y (or, at
least, we consider that), as running somewhere non-ideal
is better than not running at all.

The commit also adds performance counters for each of
the possible situations.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/common/sched_credit2.c   |   65 +++++++++++++++++++++++++++++++++++++++---
 xen/include/xen/perfc_defn.h |    3 ++
 2 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 12dfd20..a3d7beb 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -54,6 +54,7 @@
 #define TRC_CSCHED2_LOAD_CHECK       TRC_SCHED_CLASS_EVT(CSCHED2, 16)
 #define TRC_CSCHED2_LOAD_BALANCE     TRC_SCHED_CLASS_EVT(CSCHED2, 17)
 #define TRC_CSCHED2_PICKED_CPU       TRC_SCHED_CLASS_EVT(CSCHED2, 19)
+#define TRC_CSCHED2_RUNQ_CANDIDATE   TRC_SCHED_CLASS_EVT(CSCHED2, 20)
 
 /*
  * WARNING: This is still in an experimental phase.  Status and work can be found at the
@@ -398,6 +399,7 @@ struct csched2_vcpu {
     int credit;
     s_time_t start_time; /* When we were scheduled (used for credit) */
     unsigned flags;      /* 16 bits doesn't seem to play well with clear_bit() */
+    int tickled_cpu;     /* cpu tickled for picking us up (-1 if none) */
 
     /* Individual contribution to load */
     s_time_t load_last_update;  /* Last time average was updated */
@@ -1049,6 +1051,10 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
     __cpumask_set_cpu(ipid, &rqd->tickled);
     smt_idle_mask_clear(ipid, &rqd->smt_idle);
     cpu_raise_softirq(ipid, SCHEDULE_SOFTIRQ);
+
+    if ( unlikely(new->tickled_cpu != -1) )
+        SCHED_STAT_CRANK(tickled_cpu_overwritten);
+    new->tickled_cpu = ipid;
 }
 
 /*
@@ -1266,6 +1272,7 @@ csched2_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
         ASSERT(svc->sdom != NULL);
         svc->credit = CSCHED2_CREDIT_INIT;
         svc->weight = svc->sdom->weight;
+        svc->tickled_cpu = -1;
         /* Starting load of 50% */
         svc->avgload = 1ULL << (CSCHED2_PRIV(ops)->load_precision_shift - 1);
         svc->load_last_update = NOW() >> LOADAVG_GRANULARITY_SHIFT;
@@ -1273,6 +1280,7 @@ csched2_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
     else
     {
         ASSERT(svc->sdom == NULL);
+        svc->tickled_cpu = svc->vcpu->vcpu_id;
         svc->credit = CSCHED2_IDLE_CREDIT;
         svc->weight = 0;
     }
@@ -2233,7 +2241,8 @@ void __dump_execstate(void *unused);
 static struct csched2_vcpu *
 runq_candidate(struct csched2_runqueue_data *rqd,
                struct csched2_vcpu *scurr,
-               int cpu, s_time_t now)
+               int cpu, s_time_t now,
+               unsigned int *pos)
 {
     struct list_head *iter;
     struct csched2_vcpu *snext = NULL;
@@ -2262,13 +2271,29 @@ runq_candidate(struct csched2_runqueue_data *rqd,
 
         /* Only consider vcpus that are allowed to run on this processor. */
         if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
+        {
+            (*pos)++;
             continue;
+        }
+
+        /*
+         * If a vcpu is meant to be picked up by another processor, and such
+         * processor has not scheduled yet, leave it in the runqueue for him.
+         */
+        if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
+             cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
+        {
+            (*pos)++;
+            SCHED_STAT_CRANK(deferred_to_tickled_cpu);
+            continue;
+        }
 
         /* If this is on a different processor, don't pull it unless
          * its credit is at least CSCHED2_MIGRATE_RESIST higher. */
         if ( svc->vcpu->processor != cpu
              && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
         {
+            (*pos)++;
             SCHED_STAT_CRANK(migrate_resisted);
             continue;
         }
@@ -2280,9 +2305,26 @@ runq_candidate(struct csched2_runqueue_data *rqd,
 
         /* In any case, if we got this far, break. */
         break;
+    }
 
+    if ( unlikely(tb_init_done) )
+    {
+        struct {
+            unsigned vcpu:16, dom:16;
+            unsigned tickled_cpu, position;
+        } d;
+        d.dom = snext->vcpu->domain->domain_id;
+        d.vcpu = snext->vcpu->vcpu_id;
+        d.tickled_cpu = snext->tickled_cpu;
+        d.position = *pos;
+        __trace_var(TRC_CSCHED2_RUNQ_CANDIDATE, 1,
+                    sizeof(d),
+                    (unsigned char *)&d);
     }
 
+    if ( unlikely(snext->tickled_cpu != -1 && snext->tickled_cpu != cpu) )
+        SCHED_STAT_CRANK(tickled_cpu_overridden);
+
     return snext;
 }
 
@@ -2298,6 +2340,7 @@ csched2_schedule(
     struct csched2_runqueue_data *rqd;
     struct csched2_vcpu * const scurr = CSCHED2_VCPU(current);
     struct csched2_vcpu *snext = NULL;
+    unsigned int snext_pos = 0;
     struct task_slice ret;
 
     SCHED_STAT_CRANK(schedule);
@@ -2347,7 +2390,7 @@ csched2_schedule(
         snext = CSCHED2_VCPU(idle_vcpu[cpu]);
     }
     else
-        snext=runq_candidate(rqd, scurr, cpu, now);
+        snext = runq_candidate(rqd, scurr, cpu, now, &snext_pos);
 
     /* If switching from a non-idle runnable vcpu, put it
      * back on the runqueue. */
@@ -2371,8 +2414,21 @@ csched2_schedule(
             __set_bit(__CSFLAG_scheduled, &snext->flags);
         }
 
-        /* Check for the reset condition */
-        if ( snext->credit <= CSCHED2_CREDIT_RESET )
+        /*
+         * The reset condition is "has a scheduler epoch come to an end?".
+         * The way this is enforced is checking whether the vcpu at the top
+         * of the runqueue has negative credits. This means the epochs have
+         * variable lenght, as in one epoch expores when:
+         *  1) the vcpu at the top of the runqueue has executed for
+         *     around 10 ms (with default parameters);
+         *  2) no other vcpu with higher credits wants to run.
+         *
+         * Here, where we want to check for reset, we need to make sure the
+         * proper vcpu is being used. In fact, runqueue_candidate() may have
+         * not returned the first vcpu in the runqueue, for various reasons
+         * (e.g., affinity). Only trigger a reset when it does.
+         */
+        if ( snext_pos == 0 && snext->credit <= CSCHED2_CREDIT_RESET )
         {
             reset_credit(ops, cpu, now, snext);
             balance_load(ops, cpu, now);
@@ -2386,6 +2442,7 @@ csched2_schedule(
         }
 
         snext->start_time = now;
+        snext->tickled_cpu = -1;
 
         /* Safe because lock for old processor is held */
         if ( snext->vcpu->processor != cpu )
diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
index a336c71..4a835b8 100644
--- a/xen/include/xen/perfc_defn.h
+++ b/xen/include/xen/perfc_defn.h
@@ -66,6 +66,9 @@ PERFCOUNTER(runtime_max_timer,      "csched2: runtime_max_timer")
 PERFCOUNTER(migrated,               "csched2: migrated")
 PERFCOUNTER(migrate_resisted,       "csched2: migrate_resisted")
 PERFCOUNTER(credit_reset,           "csched2: credit_reset")
+PERFCOUNTER(deferred_to_tickled_cpu,"csched2: deferred_to_tickled_cpu")
+PERFCOUNTER(tickled_cpu_overwritten,"csched2: tickled_cpu_overwritten")
+PERFCOUNTER(tickled_cpu_overridden, "csched2: tickled_cpu_overridden")
 
 PERFCOUNTER(need_flush_tlb_flush,   "PG_need_flush tlb flushes")
 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 06/24] xen: credit2: implement yield()
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (4 preceding siblings ...)
  2016-08-17 17:18 ` [PATCH 05/24] xen: credit2: make tickling more deterministic Dario Faggioli
@ 2016-08-17 17:18 ` Dario Faggioli
  2016-09-13 13:33   ` George Dunlap
  2016-09-20 13:25   ` George Dunlap
  2016-08-17 17:18 ` [PATCH 07/24] xen: sched: don't rate limit context switches in case of yields Dario Faggioli
                   ` (20 subsequent siblings)
  26 siblings, 2 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:18 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Andrew Cooper, Anshul Makkar, Jan Beulich

When a vcpu explicitly yields it is usually giving
us an advice of "let someone else run and come back
to me in a bit."

Credit2 isn't, so far, doing anything when a vcpu
yields, which means an yield is basically a NOP (well,
actually, it's pure overhead, as it causes the scheduler
kick in, but the result is --at least 99% of the time--
that the very same vcpu that yielded continues to run).

Implement a "preempt bias", to be applied to yielding
vcpus. Basically when evaluating what vcpu to run next,
if a vcpu that has just yielded is encountered, we give
it a credit penalty, and check whether there is anyone
else that would better take over the cpu (of course,
if there isn't the yielding vcpu will continue).

The value of this bias can be configured with a boot
time parameter, and the default is set to 1 ms.

Also, add an yield performance counter, and fix the
style of a couple of comments.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Note that this *only* consider the bias during the very scheduling decision
that retults from the vcpu calling yield. After that, the __CSFLAG_vcpu_yield
flag is reset, and during all furute scheduling decisions, the vcpu will
compete with the other ones with its own amount of credits.

Alternatively, we can actually _subtract_ some credits to a yielding vcpu.
That will sort of make the effect of a call to yield last in time.

I'm not sure which path is best. Personally, I like the subtract approach
(perhaps, with a smaller bias than 1ms), but I think the "one shot" behavior
implemented here is a good starting point. It is _something_, which is better
than nothing, which is what we have without this patch! :-) It's lightweight
(in its impact on the crediting algorithm, I mean), and benchmarks looks nice,
so I propose we go for this one, and explore the "permanent" --subtraction
based-- solution a bit more.
---
 docs/misc/xen-command-line.markdown |   10 ++++++
 xen/common/sched_credit2.c          |   62 +++++++++++++++++++++++++++++++----
 xen/common/schedule.c               |    2 +
 xen/include/xen/perfc_defn.h        |    1 +
 4 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 3a250cb..5f469b1 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1389,6 +1389,16 @@ Choose the default scheduler.
 ### sched\_credit2\_migrate\_resist
 > `= <integer>`
 
+### sched\_credit2\_yield\_bias
+> `= <integer>`
+
+> Default: `1000`
+
+Set how much a yielding vcpu will be penalized, in order to actually
+give a chance to run to some other vcpu. This is basically a bias, in
+favour of the non-yielding vcpus, expressed in microseconds (default
+is 1ms).
+
 ### sched\_credit\_tslice\_ms
 > `= <integer>`
 
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index a3d7beb..569174b 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -144,6 +144,9 @@
 #define CSCHED2_MIGRATE_RESIST       ((opt_migrate_resist)*MICROSECS(1))
 /* How much to "compensate" a vcpu for L2 migration */
 #define CSCHED2_MIGRATE_COMPENSATION MICROSECS(50)
+/* How big of a bias we should have against a yielding vcpu */
+#define CSCHED2_YIELD_BIAS           ((opt_yield_bias)*MICROSECS(1))
+#define CSCHED2_YIELD_BIAS_MIN       CSCHED2_MIN_TIMER
 /* Reset: Value below which credit will be reset. */
 #define CSCHED2_CREDIT_RESET         0
 /* Max timer: Maximum time a guest can be run for. */
@@ -181,11 +184,20 @@
  */
 #define __CSFLAG_runq_migrate_request 3
 #define CSFLAG_runq_migrate_request (1<<__CSFLAG_runq_migrate_request)
-
+/*
+ * CSFLAG_vcpu_yield: this vcpu was running, and has called vcpu_yield(). The
+ * scheduler is invoked to see if we can give the cpu to someone else, and
+ * get back to the yielding vcpu in a while.
+ */
+#define __CSFLAG_vcpu_yield 4
+#define CSFLAG_vcpu_yield (1<<__CSFLAG_vcpu_yield)
 
 static unsigned int __read_mostly opt_migrate_resist = 500;
 integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
 
+static unsigned int __read_mostly opt_yield_bias = 1000;
+integer_param("sched_credit2_yield_bias", opt_yield_bias);
+
 /*
  * Useful macros
  */
@@ -1432,6 +1444,14 @@ out:
 }
 
 static void
+csched2_vcpu_yield(const struct scheduler *ops, struct vcpu *v)
+{
+    struct csched2_vcpu * const svc = CSCHED2_VCPU(v);
+
+    __set_bit(__CSFLAG_vcpu_yield, &svc->flags);
+}
+
+static void
 csched2_context_saved(const struct scheduler *ops, struct vcpu *vc)
 {
     struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
@@ -2247,10 +2267,22 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     struct list_head *iter;
     struct csched2_vcpu *snext = NULL;
     struct csched2_private *prv = CSCHED2_PRIV(per_cpu(scheduler, cpu));
+    int yield_bias = 0;
 
     /* Default to current if runnable, idle otherwise */
     if ( vcpu_runnable(scurr->vcpu) )
+    {
+        /*
+         * The way we actually take yields into account is like this:
+         * if scurr is yielding, when comparing its credits with other
+         * vcpus in the runqueue, act like those other vcpus had yield_bias
+         * more credits.
+         */
+        if ( unlikely(scurr->flags & CSFLAG_vcpu_yield) )
+            yield_bias = CSCHED2_YIELD_BIAS;
+
         snext = scurr;
+    }
     else
         snext = CSCHED2_VCPU(idle_vcpu[cpu]);
 
@@ -2268,6 +2300,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     list_for_each( iter, &rqd->runq )
     {
         struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
+        int svc_credit = svc->credit + yield_bias;
 
         /* Only consider vcpus that are allowed to run on this processor. */
         if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
@@ -2288,19 +2321,23 @@ runq_candidate(struct csched2_runqueue_data *rqd,
             continue;
         }
 
-        /* If this is on a different processor, don't pull it unless
-         * its credit is at least CSCHED2_MIGRATE_RESIST higher. */
+        /*
+         * If this is on a different processor, don't pull it unless
+         * its credit is at least CSCHED2_MIGRATE_RESIST higher.
+         */
         if ( svc->vcpu->processor != cpu
-             && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
+             && snext->credit + CSCHED2_MIGRATE_RESIST > svc_credit )
         {
             (*pos)++;
             SCHED_STAT_CRANK(migrate_resisted);
             continue;
         }
 
-        /* If the next one on the list has more credit than current
-         * (or idle, if current is not runnable), choose it. */
-        if ( svc->credit > snext->credit )
+        /*
+         * If the next one on the list has more credit than current
+         * (or idle, if current is not runnable), choose it.
+         */
+        if ( svc_credit > snext->credit )
             snext = svc;
 
         /* In any case, if we got this far, break. */
@@ -2399,6 +2436,8 @@ csched2_schedule(
          && vcpu_runnable(current) )
         __set_bit(__CSFLAG_delayed_runq_add, &scurr->flags);
 
+    __clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
+
     ret.migrated = 0;
 
     /* Accounting for non-idle tasks */
@@ -2918,6 +2957,14 @@ csched2_init(struct scheduler *ops)
     printk(XENLOG_INFO "load tracking window lenght %llu ns\n",
            1ULL << opt_load_window_shift);
 
+    if ( opt_yield_bias < CSCHED2_YIELD_BIAS_MIN )
+    {
+        printk("WARNING: %s: opt_yield_bias %d too small, resetting\n",
+               __func__, opt_yield_bias);
+        opt_yield_bias = 1000; /* 1 ms */
+    }
+    printk(XENLOG_INFO "yield bias value %d us\n", opt_yield_bias);
+
     /* Basically no CPU information is available at this point; just
      * set up basic structures, and a callback when the CPU info is
      * available. */
@@ -2970,6 +3017,7 @@ static const struct scheduler sched_credit2_def = {
 
     .sleep          = csched2_vcpu_sleep,
     .wake           = csched2_vcpu_wake,
+    .yield          = csched2_vcpu_yield,
 
     .adjust         = csched2_dom_cntl,
     .adjust_global  = csched2_sys_cntl,
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 32a300f..abe063d 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -947,6 +947,8 @@ long vcpu_yield(void)
     SCHED_OP(VCPU2OP(v), yield, v);
     vcpu_schedule_unlock_irq(lock, v);
 
+    SCHED_STAT_CRANK(vcpu_yield);
+
     TRACE_2D(TRC_SCHED_YIELD, current->domain->domain_id, current->vcpu_id);
     raise_softirq(SCHEDULE_SOFTIRQ);
     return 0;
diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
index 4a835b8..900fddd 100644
--- a/xen/include/xen/perfc_defn.h
+++ b/xen/include/xen/perfc_defn.h
@@ -23,6 +23,7 @@ PERFCOUNTER(vcpu_alloc,             "sched: vcpu_alloc")
 PERFCOUNTER(vcpu_insert,            "sched: vcpu_insert")
 PERFCOUNTER(vcpu_remove,            "sched: vcpu_remove")
 PERFCOUNTER(vcpu_sleep,             "sched: vcpu_sleep")
+PERFCOUNTER(vcpu_yield,             "sched: vcpu_yield")
 PERFCOUNTER(vcpu_wake_running,      "sched: vcpu_wake_running")
 PERFCOUNTER(vcpu_wake_onrunq,       "sched: vcpu_wake_onrunq")
 PERFCOUNTER(vcpu_wake_runnable,     "sched: vcpu_wake_runnable")


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 07/24] xen: sched: don't rate limit context switches in case of yields
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (5 preceding siblings ...)
  2016-08-17 17:18 ` [PATCH 06/24] xen: credit2: implement yield() Dario Faggioli
@ 2016-08-17 17:18 ` Dario Faggioli
  2016-09-20 13:32   ` George Dunlap
  2016-08-17 17:18 ` [PATCH 08/24] xen: tracing: add trace records for schedule and rate-limiting Dario Faggioli
                   ` (19 subsequent siblings)
  26 siblings, 1 reply; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:18 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Anshul Makkar

In both Credit1 and Credit2, if a vcpu yields, let it...
well... yield!

In fact, context switch rate limiting has been primarily
introduced to avoid too heavy context switch rate due to
interrupts, and, in general, asynchronous events.

In a vcpu "voluntarily" yields, we really should let it
give up the cpu for a while. For instance, the reason may
be that it's about to start spinning, and there's few
point in forcing a vcpu to spin for (potentially) the
entire rate-limiting period.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
---
 xen/common/sched_credit.c  |   20 +++++++++++--------
 xen/common/sched_credit2.c |   47 +++++++++++++++++++++++---------------------
 2 files changed, 37 insertions(+), 30 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 3f439a0..ca04732 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1771,9 +1771,18 @@ csched_schedule(
      *   cpu and steal it.
      */
 
-    /* If we have schedule rate limiting enabled, check to see
-     * how long we've run for. */
-    if ( !tasklet_work_scheduled
+    /*
+     * If we have schedule rate limiting enabled, check to see
+     * how long we've run for.
+     *
+     * If scurr is yielding, however, we don't let rate limiting kick in.
+     * In fact, it may be the case that scurr is about to spin, and there's
+     * no point forcing it to do so until rate limiting expires.
+     *
+     * While there, take the chance for clearing the yield flag at once.
+     */
+    if ( !test_and_clear_bit(CSCHED_FLAG_VCPU_YIELD, &scurr->flags)
+         && !tasklet_work_scheduled
          && prv->ratelimit_us
          && vcpu_runnable(current)
          && !is_idle_vcpu(current)
@@ -1808,11 +1817,6 @@ csched_schedule(
     }
 
     /*
-     * Clear YIELD flag before scheduling out
-     */
-    clear_bit(CSCHED_FLAG_VCPU_YIELD, &scurr->flags);
-
-    /*
      * SMP Load balance:
      *
      * If the next highest priority local runnable VCPU has already eaten
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 569174b..c8e0ee7 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -2267,36 +2267,40 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     struct list_head *iter;
     struct csched2_vcpu *snext = NULL;
     struct csched2_private *prv = CSCHED2_PRIV(per_cpu(scheduler, cpu));
-    int yield_bias = 0;
-
-    /* Default to current if runnable, idle otherwise */
-    if ( vcpu_runnable(scurr->vcpu) )
-    {
-        /*
-         * The way we actually take yields into account is like this:
-         * if scurr is yielding, when comparing its credits with other
-         * vcpus in the runqueue, act like those other vcpus had yield_bias
-         * more credits.
-         */
-        if ( unlikely(scurr->flags & CSFLAG_vcpu_yield) )
-            yield_bias = CSCHED2_YIELD_BIAS;
-
-        snext = scurr;
-    }
-    else
-        snext = CSCHED2_VCPU(idle_vcpu[cpu]);
+    /*
+     * The way we actually take yields into account is like this:
+     * if scurr is yielding, when comparing its credits with other vcpus in
+     * the runqueue, act like those other vcpus had yield_bias more credits.
+     */
+    int yield_bias = __test_and_clear_bit(__CSFLAG_vcpu_yield, &scurr->flags) ?
+                     CSCHED2_YIELD_BIAS : 0;
 
     /*
      * Return the current vcpu if it has executed for less than ratelimit.
      * Adjuststment for the selected vcpu's credit and decision
      * for how long it will run will be taken in csched2_runtime.
+     *
+     * Note that, if scurr is yielding, we don't let rate limiting kick in.
+     * In fact, it may be the case that scurr is about to spin, and there's
+     * no point forcing it to do so until rate limiting expires.
+     *
+     * To check whether we are yielding, it's enough to look at yield_bias
+     * (as CSCHED2_YIELD_BIAS can't be zero). Also, note that the yield flag
+     * has been cleared already above.
      */
-    if ( prv->ratelimit_us && !is_idle_vcpu(scurr->vcpu) &&
+    if ( !yield_bias &&
+         prv->ratelimit_us && !is_idle_vcpu(scurr->vcpu) &&
          vcpu_runnable(scurr->vcpu) &&
          (now - scurr->vcpu->runstate.state_entry_time) <
           MICROSECS(prv->ratelimit_us) )
         return scurr;
 
+    /* Default to current if runnable, idle otherwise */
+    if ( vcpu_runnable(scurr->vcpu) )
+        snext = scurr;
+    else
+        snext = CSCHED2_VCPU(idle_vcpu[cpu]);
+
     list_for_each( iter, &rqd->runq )
     {
         struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
@@ -2423,7 +2427,8 @@ csched2_schedule(
      */
     if ( tasklet_work_scheduled )
     {
-        trace_var(TRC_CSCHED2_SCHED_TASKLET, 1, 0,  NULL);
+        __clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
+        trace_var(TRC_CSCHED2_SCHED_TASKLET, 1, 0, NULL);
         snext = CSCHED2_VCPU(idle_vcpu[cpu]);
     }
     else
@@ -2436,8 +2441,6 @@ csched2_schedule(
          && vcpu_runnable(current) )
         __set_bit(__CSFLAG_delayed_runq_add, &scurr->flags);
 
-    __clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
-
     ret.migrated = 0;
 
     /* Accounting for non-idle tasks */


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 08/24] xen: tracing: add trace records for schedule and rate-limiting.
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (6 preceding siblings ...)
  2016-08-17 17:18 ` [PATCH 07/24] xen: sched: don't rate limit context switches in case of yields Dario Faggioli
@ 2016-08-17 17:18 ` Dario Faggioli
  2016-08-18  0:57   ` Meng Xu
  2016-09-20 13:50   ` George Dunlap
  2016-08-17 17:18 ` [PATCH 09/24] xen/tools: tracing: improve tracing of context switches Dario Faggioli
                   ` (18 subsequent siblings)
  26 siblings, 2 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:18 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Anshul Makkar, Meng Xu

As far as {csched, csched2, rt}_schedule() are concerned,
an "empty" event, would already make it easier to read and
understand a trace.

But while there, add a few useful information, like
if the cpu that is going through the scheduler has
been tickled or not, if it is currently idle, etc
(they vary, on a per-scheduler basis).

For Credit1 and Credit2, add a record about when
rate-limiting kicks in too.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Meng Xu <mengxu@cis.upenn.edu>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
---
 xen/common/sched_credit.c  |    7 +++++++
 xen/common/sched_credit2.c |   38 +++++++++++++++++++++++++++++++++++++-
 xen/common/sched_rt.c      |   15 +++++++++++++++
 3 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index ca04732..f9d3ac9 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -134,6 +134,8 @@
 #define TRC_CSCHED_TICKLE        TRC_SCHED_CLASS_EVT(CSCHED, 6)
 #define TRC_CSCHED_BOOST_START   TRC_SCHED_CLASS_EVT(CSCHED, 7)
 #define TRC_CSCHED_BOOST_END     TRC_SCHED_CLASS_EVT(CSCHED, 8)
+#define TRC_CSCHED_SCHEDULE      TRC_SCHED_CLASS_EVT(CSCHED, 9)
+#define TRC_CSCHED_RATELIMIT     TRC_SCHED_CLASS_EVT(CSCHED, 10)
 
 
 /*
@@ -1743,6 +1745,9 @@ csched_schedule(
     SCHED_STAT_CRANK(schedule);
     CSCHED_VCPU_CHECK(current);
 
+    TRACE_3D(TRC_CSCHED_SCHEDULE, cpu, tasklet_work_scheduled,
+             is_idle_vcpu(current));
+
     runtime = now - current->runstate.state_entry_time;
     if ( runtime < 0 ) /* Does this ever happen? */
         runtime = 0;
@@ -1792,6 +1797,8 @@ csched_schedule(
         snext->start_time += now;
         perfc_incr(delay_ms);
         tslice = MICROSECS(prv->ratelimit_us) - runtime;
+        TRACE_3D(TRC_CSCHED_RATELIMIT, scurr->vcpu->domain->domain_id,
+                 scurr->vcpu->vcpu_id, runtime);
         ret.migrated = 0;
         goto out;
     }
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index c8e0ee7..164296b 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -55,6 +55,8 @@
 #define TRC_CSCHED2_LOAD_BALANCE     TRC_SCHED_CLASS_EVT(CSCHED2, 17)
 #define TRC_CSCHED2_PICKED_CPU       TRC_SCHED_CLASS_EVT(CSCHED2, 19)
 #define TRC_CSCHED2_RUNQ_CANDIDATE   TRC_SCHED_CLASS_EVT(CSCHED2, 20)
+#define TRC_CSCHED2_SCHEDULE         TRC_SCHED_CLASS_EVT(CSCHED2, 21)
+#define TRC_CSCHED2_RATELIMIT        TRC_SCHED_CLASS_EVT(CSCHED2, 22)
 
 /*
  * WARNING: This is still in an experimental phase.  Status and work can be found at the
@@ -2293,7 +2295,22 @@ runq_candidate(struct csched2_runqueue_data *rqd,
          vcpu_runnable(scurr->vcpu) &&
          (now - scurr->vcpu->runstate.state_entry_time) <
           MICROSECS(prv->ratelimit_us) )
+    {
+        if ( unlikely(tb_init_done) )
+        {
+            struct {
+                unsigned vcpu:16, dom:16;
+                unsigned runtime;
+            } d;
+            d.dom = scurr->vcpu->domain->domain_id;
+            d.vcpu = scurr->vcpu->vcpu_id;
+            d.runtime = now - scurr->vcpu->runstate.state_entry_time;
+            __trace_var(TRC_CSCHED2_RATELIMIT, 1,
+                        sizeof(d),
+                        (unsigned char *)&d);
+        }
         return scurr;
+    }
 
     /* Default to current if runnable, idle otherwise */
     if ( vcpu_runnable(scurr->vcpu) )
@@ -2383,6 +2400,7 @@ csched2_schedule(
     struct csched2_vcpu *snext = NULL;
     unsigned int snext_pos = 0;
     struct task_slice ret;
+    bool_t tickled;
 
     SCHED_STAT_CRANK(schedule);
     CSCHED2_VCPU_CHECK(current);
@@ -2397,13 +2415,31 @@ csched2_schedule(
     BUG_ON(!is_idle_vcpu(scurr->vcpu) && scurr->rqd != rqd);
 
     /* Clear "tickled" bit now that we've been scheduled */
-    if ( cpumask_test_cpu(cpu, &rqd->tickled) )
+    tickled = cpumask_test_cpu(cpu, &rqd->tickled);
+    if ( tickled )
     {
         __cpumask_clear_cpu(cpu, &rqd->tickled);
         cpumask_andnot(cpumask_scratch, &rqd->idle, &rqd->tickled);
         smt_idle_mask_set(cpu, cpumask_scratch, &rqd->smt_idle);
     }
 
+    if ( unlikely(tb_init_done) )
+    {
+        struct {
+            unsigned cpu:16, rq_id:16;
+            unsigned tasklet:8, idle:8, smt_idle:8, tickled:8;
+        } d;
+        d.cpu = cpu;
+        d.rq_id = c2r(ops, cpu);
+        d.tasklet = tasklet_work_scheduled;
+        d.idle = is_idle_vcpu(current);
+        d.smt_idle = cpumask_test_cpu(cpu, &rqd->smt_idle);
+        d.tickled = tickled;
+        __trace_var(TRC_CSCHED2_SCHEDULE, 1,
+                    sizeof(d),
+                    (unsigned char *)&d);
+    }
+
     /* Update credits */
     burn_credits(rqd, scurr, now);
 
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 41c61a7..903dbd8 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -160,6 +160,7 @@
 #define TRC_RTDS_BUDGET_BURN      TRC_SCHED_CLASS_EVT(RTDS, 3)
 #define TRC_RTDS_BUDGET_REPLENISH TRC_SCHED_CLASS_EVT(RTDS, 4)
 #define TRC_RTDS_SCHED_TASKLET    TRC_SCHED_CLASS_EVT(RTDS, 5)
+#define TRC_RTDS_SCHEDULE         TRC_SCHED_CLASS_EVT(RTDS, 6)
 
 static void repl_timer_handler(void *data);
 
@@ -1035,6 +1036,20 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
     struct rt_vcpu *snext = NULL;
     struct task_slice ret = { .migrated = 0 };
 
+    /* TRACE */
+    {
+        struct __packed {
+            unsigned cpu:17, tasklet:8, tickled:4, idle:4;
+        } d;
+        d.cpu = cpu;
+        d.tasklet = tasklet_work_scheduled;
+        d.tickled = cpumask_test_cpu(cpu, &prv->tickled);
+        d.idle = is_idle_vcpu(current);
+        __trace_var(TRC_RTDS_SCHEDULE, 1,
+                    sizeof(d),
+                    (unsigned char *)&d);
+    }
+
     /* clear ticked bit now that we've been scheduled */
     cpumask_clear_cpu(cpu, &prv->tickled);
 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 09/24] xen/tools: tracing: improve tracing of context switches.
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (7 preceding siblings ...)
  2016-08-17 17:18 ` [PATCH 08/24] xen: tracing: add trace records for schedule and rate-limiting Dario Faggioli
@ 2016-08-17 17:18 ` Dario Faggioli
  2016-09-20 14:08   ` George Dunlap
  2016-08-17 17:18 ` [PATCH 10/24] xen: tracing: improve Credit2's tickle_check and burn_credits records Dario Faggioli
                   ` (17 subsequent siblings)
  26 siblings, 1 reply; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:18 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Wei Liu, Ian Jackson

Right now, two out of the three events related to
context switch (that is TRC_SCHED_SWITCH_INFPREV and
TRC_SCHED_SWITCH_INFNEXT) only report the domain id,
and not the vcpu id.

That's omitting a useful piece of information, and
even if we be figured that out by looking at other
records, that's unnecessarily complicated (especially
if working on a trace from a sctipt).

This changes both the tracing code in Xen and the parsing
code in tools at once, to avoid introducing transitional
regressions.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/xentrace/formats    |    4 ++--
 tools/xentrace/xenalyze.c |   17 +++++++++--------
 xen/common/schedule.c     |    8 ++++----
 3 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/tools/xentrace/formats b/tools/xentrace/formats
index caafb5f..0de7990 100644
--- a/tools/xentrace/formats
+++ b/tools/xentrace/formats
@@ -32,8 +32,8 @@
 0x0002800b  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  s_timer_fn
 0x0002800c  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  t_timer_fn
 0x0002800d  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  dom_timer_fn
-0x0002800e  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  switch_infprev    [ old_domid = 0x%(1)08x, runtime = %(2)d ]
-0x0002800f  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  switch_infnext    [ new_domid = 0x%(1)08x, time = %(2)d, r_time = %(3)d ]
+0x0002800e  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  switch_infprev    [ dom:vcpu = 0x%(1)04x%(2)04x, runtime = %(3)d ]
+0x0002800f  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  switch_infnext    [ new_dom:vcpu = 0x%(1)04x%(2)04x, time = %(3)d, r_time = %(4)d ]
 0x00028010  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  domain_shutdown_code [ dom:vcpu = 0x%(1)04x%(2)04x, reason = 0x%(3)08x ]
 
 0x00022001  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched:sched_tasklet
diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
index 11763a8..0b697d0 100644
--- a/tools/xentrace/xenalyze.c
+++ b/tools/xentrace/xenalyze.c
@@ -7501,28 +7501,29 @@ void sched_process(struct pcpu_info *p)
         case TRC_SCHED_SWITCH_INFPREV:
             if(opt.dump_all) {
                 struct {
-                    unsigned int domid, runtime;
+                    unsigned int domid, vcpuid, runtime;
                 } *r = (typeof(r))ri->d;
 
-                printf(" %s sched_switch prev d%u, run for %u.%uus\n",
-                       ri->dump_header, r->domid, r->runtime / 1000,
-                       r->runtime % 1000);
+                printf(" %s sched_switch prev d%uv%d, run for %u.%uus\n",
+                       ri->dump_header, r->domid, r->vcpuid,
+                       r->runtime / 1000, r->runtime % 1000);
             }
             break;
         case TRC_SCHED_SWITCH_INFNEXT:
             if(opt.dump_all)
             {
                 struct {
-                    unsigned int domid, rsince;
+                    unsigned int domid, vcpuid, rsince;
                     int slice;
                 } *r = (typeof(r))ri->d;
 
-                printf(" %s sched_switch next d%u", ri->dump_header, r->domid);
+                printf(" %s sched_switch next d%uv%u", ri->dump_header,
+                       r->domid, r->vcpuid);
                 if ( r->rsince != 0 )
-                    printf(", was runnable for %u.%uus, ", r->rsince / 1000,
+                    printf(", was runnable for %u.%uus", r->rsince / 1000,
                            r->rsince % 1000);
                 if ( r->slice > 0 )
-                    printf("next slice %u.%uus", r->slice / 1000,
+                    printf(", next slice %u.%uus", r->slice / 1000,
                            r->slice % 1000);
                 printf("\n");
             }
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index abe063d..5b444c4 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1390,11 +1390,11 @@ static void schedule(void)
         return continue_running(prev);
     }
 
-    TRACE_2D(TRC_SCHED_SWITCH_INFPREV,
-             prev->domain->domain_id,
+    TRACE_3D(TRC_SCHED_SWITCH_INFPREV,
+             prev->domain->domain_id, prev->vcpu_id,
              now - prev->runstate.state_entry_time);
-    TRACE_3D(TRC_SCHED_SWITCH_INFNEXT,
-             next->domain->domain_id,
+    TRACE_4D(TRC_SCHED_SWITCH_INFNEXT,
+             next->domain->domain_id, next->vcpu_id,
              (next->runstate.state == RUNSTATE_runnable) ?
              (now - next->runstate.state_entry_time) : 0,
              next_slice.time);


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 10/24] xen: tracing: improve Credit2's tickle_check and burn_credits records
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (8 preceding siblings ...)
  2016-08-17 17:18 ` [PATCH 09/24] xen/tools: tracing: improve tracing of context switches Dario Faggioli
@ 2016-08-17 17:18 ` Dario Faggioli
  2016-09-20 14:35   ` George Dunlap
  2016-08-17 17:18 ` [PATCH 11/24] tools: tracing: handle more scheduling related events Dario Faggioli
                   ` (16 subsequent siblings)
  26 siblings, 1 reply; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:18 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Wei Liu, Ian Jackson

In both Credit2's trace records relative to checking
whether we want to preempt a vcpu (in runq_tickle()),
and to credits being burn, make it explicit on which
pcpu the vcpu being considered is running.

Such information isn't currently available, not even
by looking at on which pcpu the events happen, as we
do both the above operation from a certain pcpu on
vcpus running on different pcpus.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/xentrace/formats     |    4 ++--
 tools/xentrace/xenalyze.c  |   15 +++++++++------
 xen/common/sched_credit2.c |    6 ++++--
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/tools/xentrace/formats b/tools/xentrace/formats
index 0de7990..adff681 100644
--- a/tools/xentrace/formats
+++ b/tools/xentrace/formats
@@ -45,9 +45,9 @@
 
 0x00022201  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:tick
 0x00022202  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:runq_pos       [ dom:vcpu = 0x%(1)08x, pos = %(2)d]
-0x00022203  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:credit burn    [ dom:vcpu = 0x%(1)08x, credit = %(2)d, delta = %(3)d ]
+0x00022203  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:credit burn    [ dom:vcpu = 0x%(1)08x, cpu = %(3)d, credit = %(2)d, delta = %(4)d ]
 0x00022204  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:credit_add
-0x00022205  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:tickle_check   [ dom:vcpu = 0x%(1)08x, credit = %(2)d ]
+0x00022205  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:tickle_check   [ dom:vcpu = 0x%(1)08x, cpu = %(2)d, credit = %(3)d ]
 0x00022206  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:tickle         [ cpu = %(1)d ]
 0x00022207  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:credit_reset   [ dom:vcpu = 0x%(1)08x, cr_start = %(2)d, cr_end = %(3)d, mult = %(4)d ]
 0x00022208  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:sched_tasklet
diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
index 0b697d0..58a8d41 100644
--- a/tools/xentrace/xenalyze.c
+++ b/tools/xentrace/xenalyze.c
@@ -7607,24 +7607,27 @@ void sched_process(struct pcpu_info *p)
         case TRC_SCHED_CLASS_EVT(CSCHED2, 3): /* CREDIT_BURN       */
             if(opt.dump_all) {
                 struct {
-                    unsigned int vcpuid:16, domid:16, credit;
+                    unsigned int vcpuid:16, domid:16, credit, cpu;
                     int delta;
                 } *r = (typeof(r))ri->d;
 
-                printf(" %s csched2:burn_credits d%uv%u, credit = %u, delta = %d\n",
+                printf(" %s csched2:burn_credits d%uv%u, "
+                       "on cpu = %u, credit = %u, delta = %d\n",
                        ri->dump_header, r->domid, r->vcpuid,
-                       r->credit, r->delta);
+                       r->cpu, r->credit, r->delta);
             }
             break;
         case TRC_SCHED_CLASS_EVT(CSCHED2, 5): /* TICKLE_CHECK      */
             if(opt.dump_all) {
                 struct {
                     unsigned int vcpuid:16, domid:16;
-                    unsigned int credit;
+                    unsigned int cpu, credit;
                 } *r = (typeof(r))ri->d;
 
-                printf(" %s csched2:tickle_check d%uv%u, credit = %u\n",
-                       ri->dump_header, r->domid, r->vcpuid, r->credit);
+                printf(" %s csched2:tickle_check d%uv%u, "
+                       "on cpu = %u, credits = %u\n",
+                       ri->dump_header, r->domid, r->vcpuid,
+                       r->cpu, r->credit);
             }
             break;
         case TRC_SCHED_CLASS_EVT(CSCHED2, 6): /* TICKLE            */
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 164296b..c8396a8 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1027,11 +1027,12 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
         {
             struct {
                 unsigned vcpu:16, dom:16;
-                unsigned credit;
+                unsigned cpu, credit;
             } d;
             d.dom = cur->vcpu->domain->domain_id;
             d.vcpu = cur->vcpu->vcpu_id;
             d.credit = cur->credit;
+            d.cpu = i;
             __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
                         sizeof(d),
                         (unsigned char *)&d);
@@ -1181,12 +1182,13 @@ void burn_credits(struct csched2_runqueue_data *rqd,
     {
         struct {
             unsigned vcpu:16, dom:16;
-            unsigned credit;
+            unsigned credit, cpu;
             int delta;
         } d;
         d.dom = svc->vcpu->domain->domain_id;
         d.vcpu = svc->vcpu->vcpu_id;
         d.credit = svc->credit;
+        d.cpu = svc->vcpu->processor;
         d.delta = delta;
         __trace_var(TRC_CSCHED2_CREDIT_BURN, 1,
                     sizeof(d),


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 11/24] tools: tracing: handle more scheduling related events.
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (9 preceding siblings ...)
  2016-08-17 17:18 ` [PATCH 10/24] xen: tracing: improve Credit2's tickle_check and burn_credits records Dario Faggioli
@ 2016-08-17 17:18 ` Dario Faggioli
  2016-09-20 14:37   ` George Dunlap
  2016-08-17 17:18 ` [PATCH 12/24] xen: libxc: allow to set the ratelimit value online Dario Faggioli
                   ` (15 subsequent siblings)
  26 siblings, 1 reply; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:18 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Wei Liu, Ian Jackson

There are some scheduling related trace records that
are not being taken care of (and hence only dumped as
raw records).

Some of them are being introduced in this series, while
other were just neglected by previous patches.

Add support for them.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/xentrace/formats    |    8 ++++
 tools/xentrace/xenalyze.c |  101 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 109 insertions(+)

diff --git a/tools/xentrace/formats b/tools/xentrace/formats
index adff681..3488a06 100644
--- a/tools/xentrace/formats
+++ b/tools/xentrace/formats
@@ -42,6 +42,10 @@
 0x00022004  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched:stolen_vcpu   [ dom:vcpu = 0x%(2)04x%(3)04x, from = %(1)d ]
 0x00022005  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched:picked_cpu    [ dom:vcpu = 0x%(1)04x%(2)04x, cpu = %(3)d ]
 0x00022006  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched:tickle        [ cpu = %(1)d ]
+0x00022007  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched:boost         [ dom:vcpu = 0x%(1)04x%(2)04x ]
+0x00022008  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched:unboost       [ dom:vcpu = 0x%(1)04x%(2)04x ]
+0x00022009  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched:schedule      [ cpu = %(1)d, tasklet_scheduled = %(2)d, was_idle = %(3)d ]
+0x0002200A  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched:ratelimit     [ dom:vcpu = 0x%(1)04x%(2)04x, runtime = %(3)d ]
 
 0x00022201  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:tick
 0x00022202  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:runq_pos       [ dom:vcpu = 0x%(1)08x, pos = %(2)d]
@@ -61,12 +65,16 @@
 0x00022210  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:load_check     [ lrq_id[16]:orq_id[16] = 0x%(1)08x, delta = %(2)d ]
 0x00022211  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:load_balance   [ l_bavgload = 0x%(2)08x%(1)08x, o_bavgload = 0x%(4)08x%(3)08x, lrq_id[16]:orq_id[16] = 0x%(5)08x ]
 0x00022212  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:pick_cpu       [ b_avgload = 0x%(2)08x%(1)08x, dom:vcpu = 0x%(3)08x, rq_id[16]:new_cpu[16] = %(4)d ]
+0x00022213  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:runq_candidate [ dom:vcpu = 0x%(1)08x, runq_pos = %(2)d tickled_cpu = %(3)d ]
+0x00022214  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:schedule       [ rq:cpu = 0x%(1)08x, tasklet[8]:idle[8]:smt_idle[8]:tickled[8] = %(2)08x ]
+0x00022215  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:ratelimit      [ dom:vcpu = 0x%(1)08x, runtime = %(2)d ]
 
 0x00022801  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:tickle        [ cpu = %(1)d ]
 0x00022802  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:runq_pick     [ dom:vcpu = 0x%(1)08x, cur_deadline = 0x%(3)08x%(2)08x, cur_budget = 0x%(5)08x%(4)08x ]
 0x00022803  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:burn_budget   [ dom:vcpu = 0x%(1)08x, cur_budget = 0x%(3)08x%(2)08x, delta = %(4)d ]
 0x00022804  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:repl_budget   [ dom:vcpu = 0x%(1)08x, cur_deadline = 0x%(3)08x%(2)08x, cur_budget = 0x%(5)08x%(4)08x ]
 0x00022805  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:sched_tasklet
+0x00022806  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:schedule      [ cpu[16]:tasklet[8]:idle[4]:tickled[4] = %(1)08x ]
 
 0x00041001  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  domain_create   [ dom = 0x%(1)08x ]
 0x00041002  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  domain_destroy  [ dom = 0x%(1)08x ]
diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
index 58a8d41..aaff1d9 100644
--- a/tools/xentrace/xenalyze.c
+++ b/tools/xentrace/xenalyze.c
@@ -7590,6 +7590,50 @@ void sched_process(struct pcpu_info *p)
                        ri->dump_header, r->cpu);
             }
             break;
+        case TRC_SCHED_CLASS_EVT(CSCHED, 7): /* BOOST_START   */
+            if(opt.dump_all) {
+                struct {
+                    unsigned int domid, vcpuid;
+                } *r = (typeof(r))ri->d;
+
+                printf(" %s csched: d%uv%u boosted\n",
+                       ri->dump_header, r->domid, r->vcpuid);
+            }
+            break;
+        case TRC_SCHED_CLASS_EVT(CSCHED, 8): /* BOOST_END     */
+            if(opt.dump_all) {
+                struct {
+                    unsigned int domid, vcpuid;
+                } *r = (typeof(r))ri->d;
+
+                printf(" %s csched: d%uv%u unboosted\n",
+                       ri->dump_header, r->domid, r->vcpuid);
+            }
+            break;
+        case TRC_SCHED_CLASS_EVT(CSCHED, 9): /* SCHEDULE      */
+            if(opt.dump_all) {
+                struct {
+                    unsigned int cpu, tasklet, idle;
+                } *r = (typeof(r))ri->d;
+
+                printf(" %s csched:schedule cpu %u, %s%s\n",
+                       ri->dump_header, r->cpu,
+                       r->tasklet ? ", tasklet scheduled" : "",
+                       r->idle ? ", idle" : ", busy");
+            }
+            break;
+        case TRC_SCHED_CLASS_EVT(CSCHED, 10): /* RATELIMIT     */
+            if(opt.dump_all) {
+                struct {
+                    unsigned int domid, vcpuid;
+                    unsigned int runtime;
+                } *r = (typeof(r))ri->d;
+
+                printf(" %s csched:ratelimit, d%uv%u run only %u.%uus\n",
+                       ri->dump_header, r->domid, r->vcpuid,
+                       r->runtime / 1000, r->runtime % 1000);
+            }
+            break;
         /* CREDIT 2 (TRC_CSCHED2_xxx) */
         case TRC_SCHED_CLASS_EVT(CSCHED2, 1): /* TICK              */
         case TRC_SCHED_CLASS_EVT(CSCHED2, 4): /* CREDIT_ADD        */
@@ -7779,6 +7823,50 @@ void sched_process(struct pcpu_info *p)
                        ri->dump_header, r->domid, r->vcpuid, r->rqi, r->cpu);
             }
             break;
+        case TRC_SCHED_CLASS_EVT(CSCHED2, 20): /* RUNQ_CANDIDATE   */
+            if (opt.dump_all) {
+                struct {
+                    unsigned vcpuid:16, domid:16;
+                    unsigned tickled_cpu, position;
+                } *r = (typeof(r))ri->d;
+
+                printf(" %s csched2:runq_candidate d%uv%u, "
+                       "pos in runq %u, ",
+                       ri->dump_header, r->domid, r->vcpuid,
+                       r->position);
+                if (r->tickled_cpu == (unsigned)-1)
+                    printf("no cpu was tickled");
+                else
+                    printf("cpu %u was tickled\n", r->tickled_cpu);
+            }
+            break;
+        case TRC_SCHED_CLASS_EVT(CSCHED2, 21): /* SCHEDULE         */
+            if (opt.dump_all) {
+                struct {
+                    unsigned cpu:16, rqi:16;
+                    unsigned tasklet:8, idle:8, smt_idle:8, tickled:8;
+                } *r = (typeof(r))ri->d;
+
+                printf(" %s csched2:schedule cpu %u, rq# %u%s%s%s%s\n",
+                       ri->dump_header, r->cpu, r->rqi,
+                       r->tasklet ? ", tasklet scheduled" : "",
+                       r->idle ? ", idle" : ", busy",
+                       r->idle ? (r->smt_idle ? ", SMT idle" : ", SMT busy") : "",
+                       r->tickled ? ", tickled" : ", not tickled");
+            }
+            break;
+        case TRC_SCHED_CLASS_EVT(CSCHED2, 22): /* RATELIMIT        */
+            if (opt.dump_all) {
+                struct {
+                    unsigned int vcpuid:16, domid:16;
+                    unsigned int runtime;
+                } *r = (typeof(r))ri->d;
+
+                printf(" %s csched2:ratelimit, d%uv%u run only %u.%uus\n",
+                       ri->dump_header, r->domid, r->vcpuid,
+                       r->runtime / 1000, r->runtime % 1000);
+            }
+            break;
         /* RTDS (TRC_RTDS_xxx) */
         case TRC_SCHED_CLASS_EVT(RTDS, 1): /* TICKLE           */
             if(opt.dump_all) {
@@ -7831,6 +7919,19 @@ void sched_process(struct pcpu_info *p)
             if(opt.dump_all)
                 printf(" %s rtds:sched_tasklet\n", ri->dump_header);
             break;
+        case TRC_SCHED_CLASS_EVT(RTDS, 6): /* SCHEDULE         */
+            if (opt.dump_all) {
+                struct {
+                    unsigned cpu:16, tasklet:8, idle:4, tickled:4;
+                } __attribute__((packed)) *r = (typeof(r))ri->d;
+
+                printf(" %s rtds:schedule cpu %u, %s%s%s\n",
+                       ri->dump_header, r->cpu,
+                       r->tasklet ? ", tasklet scheduled" : "",
+                       r->idle ? ", idle" : ", busy",
+                       r->tickled ? ", tickled" : ", not tickled");
+            }
+            break;
         default:
             process_generic(ri);
         }


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 12/24] xen: libxc: allow to set the ratelimit value online
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (10 preceding siblings ...)
  2016-08-17 17:18 ` [PATCH 11/24] tools: tracing: handle more scheduling related events Dario Faggioli
@ 2016-08-17 17:18 ` Dario Faggioli
  2016-09-20 14:43   ` George Dunlap
  2016-09-28 15:44   ` George Dunlap
  2016-08-17 17:19 ` [PATCH 13/24] libxc: improve error handling of xc Credit1 and Credit2 helpers Dario Faggioli
                   ` (14 subsequent siblings)
  26 siblings, 2 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:18 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Wei Liu, Anshul Makkar, Ian Jackson, Jan Beulich

The main purpose of the patch is to provide the xen-libxc
plumbing necessary to be able to change the value of the
ratelimit_us parameter online, for Credit2 (like it is
already for Credit1).

While there:
 - mention in the Xen logs when rate limiting was enables
   and is being disabled (and vice-versa);
 - fix csched2_sys_cntl() which was always returning
   -EINVAL in the XEN_SYSCTL_SCHEDOP_putinfo case.

And also:
 - fix style of an if in csched_sys_cntl();
 - fix the style of the switch in csched2_sys_cntl();

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxc/include/xenctrl.h |   32 ++++++++++++++++++------------
 tools/libxc/xc_csched2.c      |   44 +++++++++++++++++++++++++++++++++++++++++
 xen/common/sched_credit.c     |   16 +++++++++------
 xen/common/sched_credit2.c    |   38 ++++++++++++++++++++---------------
 xen/include/public/sysctl.h   |   17 +++++++++++++---
 5 files changed, 108 insertions(+), 39 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 560ce7b..7a50895 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -910,25 +910,31 @@ int xc_sched_credit_domain_get(xc_interface *xch,
                                uint32_t domid,
                                struct xen_domctl_sched_credit *sdom);
 int xc_sched_credit_params_set(xc_interface *xch,
-                              uint32_t cpupool_id,
-                              struct xen_sysctl_credit_schedule *schedule);
+                               uint32_t cpupool_id,
+                               struct xen_sysctl_credit_schedule *schedule);
 int xc_sched_credit_params_get(xc_interface *xch,
-                              uint32_t cpupool_id,
-                              struct xen_sysctl_credit_schedule *schedule);
+                               uint32_t cpupool_id,
+                               struct xen_sysctl_credit_schedule *schedule);
+
+int xc_sched_credit2_params_set(xc_interface *xch,
+                                uint32_t cpupool_id,
+                                struct xen_sysctl_credit2_schedule *schedule);
+int xc_sched_credit2_params_get(xc_interface *xch,
+                                uint32_t cpupool_id,
+                                struct xen_sysctl_credit2_schedule *schedule);
 int xc_sched_credit2_domain_set(xc_interface *xch,
-                               uint32_t domid,
-                               struct xen_domctl_sched_credit2 *sdom);
-
+                                uint32_t domid,
+                                struct xen_domctl_sched_credit2 *sdom);
 int xc_sched_credit2_domain_get(xc_interface *xch,
-                               uint32_t domid,
-                               struct xen_domctl_sched_credit2 *sdom);
+                                uint32_t domid,
+                                struct xen_domctl_sched_credit2 *sdom);
 
 int xc_sched_rtds_domain_set(xc_interface *xch,
-                            uint32_t domid,
-                            struct xen_domctl_sched_rtds *sdom);
+                             uint32_t domid,
+                             struct xen_domctl_sched_rtds *sdom);
 int xc_sched_rtds_domain_get(xc_interface *xch,
-                            uint32_t domid,
-                            struct xen_domctl_sched_rtds *sdom);
+                             uint32_t domid,
+                             struct xen_domctl_sched_rtds *sdom);
 int xc_sched_rtds_vcpu_set(xc_interface *xch,
                            uint32_t domid,
                            struct xen_domctl_schedparam_vcpu *vcpus,
diff --git a/tools/libxc/xc_csched2.c b/tools/libxc/xc_csched2.c
index ed99605..5b62a5f 100644
--- a/tools/libxc/xc_csched2.c
+++ b/tools/libxc/xc_csched2.c
@@ -60,3 +60,47 @@ xc_sched_credit2_domain_get(
 
     return err;
 }
+
+int
+xc_sched_credit2_params_set(
+    xc_interface *xch,
+    uint32_t cpupool_id,
+    struct xen_sysctl_credit2_schedule *schedule)
+{
+    DECLARE_SYSCTL;
+
+    sysctl.cmd = XEN_SYSCTL_scheduler_op;
+    sysctl.u.scheduler_op.cpupool_id = cpupool_id;
+    sysctl.u.scheduler_op.sched_id = XEN_SCHEDULER_CREDIT2;
+    sysctl.u.scheduler_op.cmd = XEN_SYSCTL_SCHEDOP_putinfo;
+
+    sysctl.u.scheduler_op.u.sched_credit2 = *schedule;
+
+    if ( do_sysctl(xch, &sysctl) )
+        return -1;
+
+    *schedule = sysctl.u.scheduler_op.u.sched_credit2;
+
+    return 0;
+}
+
+int
+xc_sched_credit2_params_get(
+    xc_interface *xch,
+    uint32_t cpupool_id,
+    struct xen_sysctl_credit2_schedule *schedule)
+{
+    DECLARE_SYSCTL;
+
+    sysctl.cmd = XEN_SYSCTL_scheduler_op;
+    sysctl.u.scheduler_op.cpupool_id = cpupool_id;
+    sysctl.u.scheduler_op.sched_id = XEN_SCHEDULER_CREDIT2;
+    sysctl.u.scheduler_op.cmd = XEN_SYSCTL_SCHEDOP_getinfo;
+
+    if ( do_sysctl(xch, &sysctl) )
+        return -1;
+
+    *schedule = sysctl.u.scheduler_op.u.sched_credit2;
+
+    return 0;
+}
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index f9d3ac9..14b207d 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1203,16 +1203,20 @@ csched_sys_cntl(const struct scheduler *ops,
     switch ( sc->cmd )
     {
     case XEN_SYSCTL_SCHEDOP_putinfo:
-        if (params->tslice_ms > XEN_SYSCTL_CSCHED_TSLICE_MAX
-            || params->tslice_ms < XEN_SYSCTL_CSCHED_TSLICE_MIN 
-            || (params->ratelimit_us
-                && (params->ratelimit_us > XEN_SYSCTL_SCHED_RATELIMIT_MAX
-                    || params->ratelimit_us < XEN_SYSCTL_SCHED_RATELIMIT_MIN))
-            || MICROSECS(params->ratelimit_us) > MILLISECS(params->tslice_ms) )
+        if ( params->tslice_ms > XEN_SYSCTL_CSCHED_TSLICE_MAX
+             || params->tslice_ms < XEN_SYSCTL_CSCHED_TSLICE_MIN
+             || (params->ratelimit_us
+                 && (params->ratelimit_us > XEN_SYSCTL_SCHED_RATELIMIT_MAX
+                     || params->ratelimit_us < XEN_SYSCTL_SCHED_RATELIMIT_MIN))
+             || MICROSECS(params->ratelimit_us) > MILLISECS(params->tslice_ms) )
                 goto out;
 
         spin_lock_irqsave(&prv->lock, flags);
         __csched_set_tslice(prv, params->tslice_ms);
+        if ( !prv->ratelimit_us && params->ratelimit_us )
+            printk(XENLOG_INFO "Enabling context switch rate limiting\n");
+        else if ( prv->ratelimit_us && !params->ratelimit_us )
+            printk(XENLOG_INFO "Disabling context switch rate limiting\n");
         prv->ratelimit_us = params->ratelimit_us;
         spin_unlock_irqrestore(&prv->lock, flags);
 
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index c8396a8..0d83bd7 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -2020,29 +2020,33 @@ csched2_dom_cntl(
 static int csched2_sys_cntl(const struct scheduler *ops,
                             struct xen_sysctl_scheduler_op *sc)
 {
-    int rc = -EINVAL;
-    xen_sysctl_credit_schedule_t *params = &sc->u.sched_credit;
+    xen_sysctl_credit2_schedule_t *params = &sc->u.sched_credit2;
     struct csched2_private *prv = CSCHED2_PRIV(ops);
     unsigned long flags;
 
     switch (sc->cmd )
     {
-        case XEN_SYSCTL_SCHEDOP_putinfo:
-            if ( params->ratelimit_us &&
-                ( params->ratelimit_us > XEN_SYSCTL_SCHED_RATELIMIT_MAX ||
-                  params->ratelimit_us < XEN_SYSCTL_SCHED_RATELIMIT_MIN ))
-                return rc;
-            write_lock_irqsave(&prv->lock, flags);
-            prv->ratelimit_us = params->ratelimit_us;
-            write_unlock_irqrestore(&prv->lock, flags);
-            break;
-
-        case XEN_SYSCTL_SCHEDOP_getinfo:
-            params->ratelimit_us = prv->ratelimit_us;
-            rc = 0;
-            break;
+    case XEN_SYSCTL_SCHEDOP_putinfo:
+        if ( params->ratelimit_us &&
+             (params->ratelimit_us > XEN_SYSCTL_SCHED_RATELIMIT_MAX ||
+              params->ratelimit_us < XEN_SYSCTL_SCHED_RATELIMIT_MIN ))
+            return -EINVAL;
+
+        write_lock_irqsave(&prv->lock, flags);
+        if ( !prv->ratelimit_us && params->ratelimit_us )
+            printk(XENLOG_INFO "Enabling context switch rate limiting\n");
+        else if ( prv->ratelimit_us && !params->ratelimit_us )
+            printk(XENLOG_INFO "Disabling context switch rate limiting\n");
+        prv->ratelimit_us = params->ratelimit_us;
+        write_unlock_irqrestore(&prv->lock, flags);
+
+    /* FALLTHRU */
+    case XEN_SYSCTL_SCHEDOP_getinfo:
+        params->ratelimit_us = prv->ratelimit_us;
+        break;
     }
-    return rc;
+
+    return 0;
 }
 
 static void *
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 8197c14..fd0fa67 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -623,19 +623,29 @@ struct xen_sysctl_arinc653_schedule {
 typedef struct xen_sysctl_arinc653_schedule xen_sysctl_arinc653_schedule_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_arinc653_schedule_t);
 
+/*
+ * Valid range for context switch rate limit (in microseconds).
+ * Applicable to Credit and Credit2 schedulers.
+ */
+#define XEN_SYSCTL_SCHED_RATELIMIT_MAX 500000
+#define XEN_SYSCTL_SCHED_RATELIMIT_MIN 100
+
 struct xen_sysctl_credit_schedule {
     /* Length of timeslice in milliseconds */
 #define XEN_SYSCTL_CSCHED_TSLICE_MAX 1000
 #define XEN_SYSCTL_CSCHED_TSLICE_MIN 1
     unsigned tslice_ms;
-    /* Rate limit (minimum timeslice) in microseconds */
-#define XEN_SYSCTL_SCHED_RATELIMIT_MAX 500000
-#define XEN_SYSCTL_SCHED_RATELIMIT_MIN 100
     unsigned ratelimit_us;
 };
 typedef struct xen_sysctl_credit_schedule xen_sysctl_credit_schedule_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_credit_schedule_t);
 
+struct xen_sysctl_credit2_schedule {
+    unsigned ratelimit_us;
+};
+typedef struct xen_sysctl_credit2_schedule xen_sysctl_credit2_schedule_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_credit2_schedule_t);
+
 /* XEN_SYSCTL_scheduler_op */
 /* Set or get info? */
 #define XEN_SYSCTL_SCHEDOP_putinfo 0
@@ -649,6 +659,7 @@ struct xen_sysctl_scheduler_op {
             XEN_GUEST_HANDLE_64(xen_sysctl_arinc653_schedule_t) schedule;
         } sched_arinc653;
         struct xen_sysctl_credit_schedule sched_credit;
+        struct xen_sysctl_credit2_schedule sched_credit2;
     } u;
 };
 typedef struct xen_sysctl_scheduler_op xen_sysctl_scheduler_op_t;


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 13/24] libxc: improve error handling of xc Credit1 and Credit2 helpers
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (11 preceding siblings ...)
  2016-08-17 17:18 ` [PATCH 12/24] xen: libxc: allow to set the ratelimit value online Dario Faggioli
@ 2016-08-17 17:19 ` Dario Faggioli
  2016-09-20 15:10   ` Wei Liu
  2016-08-17 17:19 ` [PATCH 14/24] libxl: allow to set the ratelimit value online for Credit2 Dario Faggioli
                   ` (13 subsequent siblings)
  26 siblings, 1 reply; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:19 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Wei Liu, Ian Jackson

In fact, libxc wrappers should, on error, set errno and
return -1.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxc/xc_csched.c  |   27 +++++++++++++++------------
 tools/libxc/xc_csched2.c |   15 +++++++++------
 2 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/tools/libxc/xc_csched.c b/tools/libxc/xc_csched.c
index bf03bfc..139fc16 100644
--- a/tools/libxc/xc_csched.c
+++ b/tools/libxc/xc_csched.c
@@ -37,7 +37,10 @@ xc_sched_credit_domain_set(
     domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_putinfo;
     domctl.u.scheduler_op.u.credit = *sdom;
 
-    return do_domctl(xch, &domctl);
+    if ( do_domctl(xch, &domctl) )
+        return -1;
+
+    return 0;
 }
 
 int
@@ -47,18 +50,18 @@ xc_sched_credit_domain_get(
     struct xen_domctl_sched_credit *sdom)
 {
     DECLARE_DOMCTL;
-    int err;
 
     domctl.cmd = XEN_DOMCTL_scheduler_op;
     domctl.domain = (domid_t) domid;
     domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_CREDIT;
     domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_getinfo;
 
-    err = do_domctl(xch, &domctl);
-    if ( err == 0 )
-        *sdom = domctl.u.scheduler_op.u.credit;
+    if ( do_domctl(xch, &domctl) )
+        return -1;
+
+    *sdom = domctl.u.scheduler_op.u.credit;
 
-    return err;
+    return 0;
 }
 
 int
@@ -67,7 +70,6 @@ xc_sched_credit_params_set(
     uint32_t cpupool_id,
     struct xen_sysctl_credit_schedule *schedule)
 {
-    int rc;
     DECLARE_SYSCTL;
 
     sysctl.cmd = XEN_SYSCTL_scheduler_op;
@@ -77,11 +79,12 @@ xc_sched_credit_params_set(
 
     sysctl.u.scheduler_op.u.sched_credit = *schedule;
 
-    rc = do_sysctl(xch, &sysctl);
+    if ( do_sysctl(xch, &sysctl) )
+        return -1;
 
     *schedule = sysctl.u.scheduler_op.u.sched_credit;
 
-    return rc;
+    return 0;
 }
 
 int
@@ -90,7 +93,6 @@ xc_sched_credit_params_get(
     uint32_t cpupool_id,
     struct xen_sysctl_credit_schedule *schedule)
 {
-    int rc;
     DECLARE_SYSCTL;
 
     sysctl.cmd = XEN_SYSCTL_scheduler_op;
@@ -98,9 +100,10 @@ xc_sched_credit_params_get(
     sysctl.u.scheduler_op.sched_id = XEN_SCHEDULER_CREDIT;
     sysctl.u.scheduler_op.cmd = XEN_SYSCTL_SCHEDOP_getinfo;
 
-    rc = do_sysctl(xch, &sysctl);
+    if ( do_sysctl(xch, &sysctl) )
+        return -1;
 
     *schedule = sysctl.u.scheduler_op.u.sched_credit;
 
-    return rc;
+    return 0;
 }
diff --git a/tools/libxc/xc_csched2.c b/tools/libxc/xc_csched2.c
index 5b62a5f..12c95e6 100644
--- a/tools/libxc/xc_csched2.c
+++ b/tools/libxc/xc_csched2.c
@@ -37,7 +37,10 @@ xc_sched_credit2_domain_set(
     domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_putinfo;
     domctl.u.scheduler_op.u.credit2 = *sdom;
 
-    return do_domctl(xch, &domctl);
+    if ( do_domctl(xch, &domctl) )
+        return -1;
+
+    return 0;
 }
 
 int
@@ -47,18 +50,18 @@ xc_sched_credit2_domain_get(
     struct xen_domctl_sched_credit2 *sdom)
 {
     DECLARE_DOMCTL;
-    int err;
 
     domctl.cmd = XEN_DOMCTL_scheduler_op;
     domctl.domain = (domid_t) domid;
     domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_CREDIT2;
     domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_getinfo;
 
-    err = do_domctl(xch, &domctl);
-    if ( err == 0 )
-        *sdom = domctl.u.scheduler_op.u.credit2;
+    if ( do_domctl(xch, &domctl) )
+        return -1;
+
+    *sdom = domctl.u.scheduler_op.u.credit2;
 
-    return err;
+    return 0;
 }
 
 int


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 14/24] libxl: allow to set the ratelimit value online for Credit2
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (12 preceding siblings ...)
  2016-08-17 17:19 ` [PATCH 13/24] libxc: improve error handling of xc Credit1 and Credit2 helpers Dario Faggioli
@ 2016-08-17 17:19 ` Dario Faggioli
  2016-08-22  9:21   ` Ian Jackson
                     ` (2 more replies)
  2016-08-17 17:19 ` [PATCH 15/24] xl: " Dario Faggioli
                   ` (12 subsequent siblings)
  26 siblings, 3 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:19 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Wei Liu, Ian Jackson

This is the remaining part of the plumbing (the libxl
one) necessary to be able to change the value of the
ratelimit_us parameter online, for Credit2 (like it is
already for Credit1).

Note that, so far, we were rejecting (for Credit1) a
new value of zero, despite it is a pretty nice way to
ask for the rate limiting to be disabled, and the
hypervisor is already capable of dealing with it in
that way.

Therefore, we change things so that it is possible to
do so, both for Credit1 and Credit2

While there, fix the error handling path (make it
compliant with libxl's codying style) in Credit1
rate limiting related functions.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
---
 tools/libxl/libxl.c         |  111 ++++++++++++++++++++++++++++++++++---------
 tools/libxl/libxl.h         |    4 ++
 tools/libxl/libxl_types.idl |    4 ++
 3 files changed, 95 insertions(+), 24 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 6a50e49..d6a8d02 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5229,69 +5229,132 @@ static int sched_credit_domain_set(libxl__gc *gc, uint32_t domid,
     return 0;
 }
 
+static int sched_ratelimit_check(libxl__gc *gc, int ratelimit)
+{
+    if (ratelimit != 0 &&
+        (ratelimit <  XEN_SYSCTL_SCHED_RATELIMIT_MIN ||
+         ratelimit > XEN_SYSCTL_SCHED_RATELIMIT_MAX)) {
+        LOG(ERROR, "Ratelimit out of range, valid range is from %d to %d",
+            XEN_SYSCTL_SCHED_RATELIMIT_MIN, XEN_SYSCTL_SCHED_RATELIMIT_MAX);
+        return ERROR_INVAL;
+    }
+
+    return 0;
+}
+
 int libxl_sched_credit_params_get(libxl_ctx *ctx, uint32_t poolid,
                                   libxl_sched_credit_params *scinfo)
 {
     struct xen_sysctl_credit_schedule sparam;
-    int rc;
+    int r, rc;
     GC_INIT(ctx);
 
-    rc = xc_sched_credit_params_get(ctx->xch, poolid, &sparam);
-    if (rc != 0) {
-        LOGE(ERROR, "getting sched credit param");
-        GC_FREE;
-        return ERROR_FAIL;
+    r = xc_sched_credit_params_get(ctx->xch, poolid, &sparam);
+    if (r < 0) {
+        LOGE(ERROR, "getting Credit scheduler parameters");
+        rc = ERROR_FAIL;
+        goto out;
     }
 
     scinfo->tslice_ms = sparam.tslice_ms;
     scinfo->ratelimit_us = sparam.ratelimit_us;
 
+    rc = 0;
+ out:
     GC_FREE;
-    return 0;
+    return rc;
 }
 
 int libxl_sched_credit_params_set(libxl_ctx *ctx, uint32_t poolid,
                                   libxl_sched_credit_params *scinfo)
 {
     struct xen_sysctl_credit_schedule sparam;
-    int rc=0;
+    int r, rc;
     GC_INIT(ctx);
 
     if (scinfo->tslice_ms <  XEN_SYSCTL_CSCHED_TSLICE_MIN
         || scinfo->tslice_ms > XEN_SYSCTL_CSCHED_TSLICE_MAX) {
         LOG(ERROR, "Time slice out of range, valid range is from %d to %d",
             XEN_SYSCTL_CSCHED_TSLICE_MIN, XEN_SYSCTL_CSCHED_TSLICE_MAX);
-        GC_FREE;
-        return ERROR_INVAL;
+        rc = ERROR_INVAL;
+        goto out;
     }
-    if (scinfo->ratelimit_us <  XEN_SYSCTL_SCHED_RATELIMIT_MIN
-        || scinfo->ratelimit_us > XEN_SYSCTL_SCHED_RATELIMIT_MAX) {
-        LOG(ERROR, "Ratelimit out of range, valid range is from %d to %d",
-            XEN_SYSCTL_SCHED_RATELIMIT_MIN, XEN_SYSCTL_SCHED_RATELIMIT_MAX);
-        GC_FREE;
-        return ERROR_INVAL;
+    rc = sched_ratelimit_check(gc, scinfo->ratelimit_us);
+    if (rc) {
+        goto out;
     }
     if (scinfo->ratelimit_us > scinfo->tslice_ms*1000) {
         LOG(ERROR, "Ratelimit cannot be greater than timeslice");
-        GC_FREE;
-        return ERROR_INVAL;
+        rc = ERROR_INVAL;
+        goto out;
     }
 
     sparam.tslice_ms = scinfo->tslice_ms;
     sparam.ratelimit_us = scinfo->ratelimit_us;
 
-    rc = xc_sched_credit_params_set(ctx->xch, poolid, &sparam);
-    if ( rc < 0 ) {
-        LOGE(ERROR, "setting sched credit param");
-        GC_FREE;
-        return ERROR_FAIL;
+    r = xc_sched_credit_params_set(ctx->xch, poolid, &sparam);
+    if ( r < 0 ) {
+        LOGE(ERROR, "Setting Credit scheduler parameters");
+        rc = ERROR_FAIL;
+        goto out;
     }
 
     scinfo->tslice_ms = sparam.tslice_ms;
     scinfo->ratelimit_us = sparam.ratelimit_us;
 
+ out:
     GC_FREE;
-    return 0;
+    return rc;
+}
+
+int libxl_sched_credit2_params_get(libxl_ctx *ctx, uint32_t poolid,
+                                   libxl_sched_credit2_params *scinfo)
+{
+    struct xen_sysctl_credit2_schedule sparam;
+    int r, rc;
+    GC_INIT(ctx);
+
+    r = xc_sched_credit2_params_get(ctx->xch, poolid, &sparam);
+    if (r < 0) {
+        LOGE(ERROR, "getting Credit2 scheduler parameters");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    scinfo->ratelimit_us = sparam.ratelimit_us;
+
+    rc = 0;
+ out:
+    GC_FREE;
+    return rc;
+}
+
+int libxl_sched_credit2_params_set(libxl_ctx *ctx, uint32_t poolid,
+                                   libxl_sched_credit2_params *scinfo)
+{
+    struct xen_sysctl_credit2_schedule sparam;
+    int r, rc;
+    GC_INIT(ctx);
+
+    rc = sched_ratelimit_check(gc, scinfo->ratelimit_us);
+    if (rc) {
+        goto out;
+    }
+
+    sparam.ratelimit_us = scinfo->ratelimit_us;
+
+    r = xc_sched_credit2_params_set(ctx->xch, poolid, &sparam);
+    if ( r < 0 ) {
+        LOGE(ERROR, "Setting Credit2 scheduler parameters");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    scinfo->ratelimit_us = sparam.ratelimit_us;
+
+ out:
+    GC_FREE;
+    return rc;
 }
 
 static int sched_credit2_domain_get(libxl__gc *gc, uint32_t domid,
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index ae21302..efc5912 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1978,6 +1978,10 @@ int libxl_sched_credit_params_get(libxl_ctx *ctx, uint32_t poolid,
                                   libxl_sched_credit_params *scinfo);
 int libxl_sched_credit_params_set(libxl_ctx *ctx, uint32_t poolid,
                                   libxl_sched_credit_params *scinfo);
+int libxl_sched_credit2_params_get(libxl_ctx *ctx, uint32_t poolid,
+                                   libxl_sched_credit2_params *scinfo);
+int libxl_sched_credit2_params_set(libxl_ctx *ctx, uint32_t poolid,
+                                   libxl_sched_credit2_params *scinfo);
 
 /* Scheduler Per-domain parameters */
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index ef614be..38a4222 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -833,6 +833,10 @@ libxl_sched_credit_params = Struct("sched_credit_params", [
     ("ratelimit_us", integer),
     ], dispose_fn=None)
 
+libxl_sched2_credit_params = Struct("sched_credit2_params", [
+    ("ratelimit_us", integer),
+    ], dispose_fn=None)
+
 libxl_domain_remus_info = Struct("domain_remus_info",[
     ("interval",     integer),
     ("allow_unsafe", libxl_defbool),


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 15/24] xl: allow to set the ratelimit value online for Credit2
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (13 preceding siblings ...)
  2016-08-17 17:19 ` [PATCH 14/24] libxl: allow to set the ratelimit value online for Credit2 Dario Faggioli
@ 2016-08-17 17:19 ` Dario Faggioli
  2016-09-28 15:46   ` George Dunlap
  2016-08-17 17:19 ` [PATCH 16/24] xen: sched: factor affinity helpers out of sched_credit.c Dario Faggioli
                   ` (11 subsequent siblings)
  26 siblings, 1 reply; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:19 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Wei Liu, Ian Jackson

Last part of the wiring necessary for allowing to
change the value of the ratelimit_us parameter online,
for Credit2 (like it is already for Credit1).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
---
 docs/man/xl.pod.1.in      |    9 ++++
 tools/libxl/xl_cmdimpl.c  |   91 +++++++++++++++++++++++++++++++++++++--------
 tools/libxl/xl_cmdtable.c |    2 +
 3 files changed, 86 insertions(+), 16 deletions(-)

diff --git a/docs/man/xl.pod.1.in b/docs/man/xl.pod.1.in
index 1adf322..013591b 100644
--- a/docs/man/xl.pod.1.in
+++ b/docs/man/xl.pod.1.in
@@ -1089,6 +1089,15 @@ to 65535 and the default is 256.
 
 Restrict output to domains in the specified cpupool.
 
+=item B<-s>, B<--schedparam>
+
+Specify to list or set pool-wide scheduler parameters.
+
+=item B<-r RLIMIT>, B<--ratelimit_us=RLIMIT>
+
+Attempts to limit the rate of context switching. It is basically the same
+as B<--ratelimit_us> in B<sched-credit>
+
 =back
 
 =item B<sched-rtds> [I<OPTIONS>]
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 7f961e3..5bdeda8 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -6452,8 +6452,29 @@ static int sched_credit_pool_output(uint32_t poolid)
     return 0;
 }
 
-static int sched_credit2_domain_output(
-    int domid)
+static int sched_credit2_params_set(int poolid,
+                                    libxl_sched_credit2_params *scinfo)
+{
+    if (libxl_sched_credit2_params_set(ctx, poolid, scinfo)) {
+        fprintf(stderr, "libxl_sched_credit2_params_set failed.\n");
+        return 1;
+    }
+
+    return 0;
+}
+
+static int sched_credit2_params_get(int poolid,
+                                    libxl_sched_credit2_params *scinfo)
+{
+    if (libxl_sched_credit2_params_get(ctx, poolid, scinfo)) {
+        fprintf(stderr, "libxl_sched_credit2_params_get failed.\n");
+        return 1;
+    }
+
+    return 0;
+}
+
+static int sched_credit2_domain_output(int domid)
 {
     char *domname;
     libxl_domain_sched_params scinfo;
@@ -6478,6 +6499,22 @@ static int sched_credit2_domain_output(
     return 0;
 }
 
+static int sched_credit2_pool_output(uint32_t poolid)
+{
+    libxl_sched_credit2_params scparam;
+    char *poolname = libxl_cpupoolid_to_name(ctx, poolid);
+
+    if (sched_credit2_params_get(poolid, &scparam))
+        printf("Cpupool %s: [sched params unavailable]\n", poolname);
+    else
+        printf("Cpupool %s: ratelimit=%dus\n",
+               poolname, scparam.ratelimit_us);
+
+    free(poolname);
+
+    return 0;
+}
+
 static int sched_rtds_domain_output(
     int domid)
 {
@@ -6577,17 +6614,6 @@ static int sched_rtds_pool_output(uint32_t poolid)
     return 0;
 }
 
-static int sched_default_pool_output(uint32_t poolid)
-{
-    char *poolname;
-
-    poolname = libxl_cpupoolid_to_name(ctx, poolid);
-    printf("Cpupool %s:\n",
-           poolname);
-    free(poolname);
-    return 0;
-}
-
 static int sched_domain_output(libxl_scheduler sched, int (*output)(int),
                                int (*pooloutput)(uint32_t), const char *cpupool)
 {
@@ -6833,17 +6859,22 @@ int main_sched_credit2(int argc, char **argv)
 {
     const char *dom = NULL;
     const char *cpupool = NULL;
+    int ratelimit = 0;
     int weight = 256;
+    bool opt_s = false;
+    bool opt_r = false;
     bool opt_w = false;
     int opt, rc;
     static struct option opts[] = {
         {"domain", 1, 0, 'd'},
         {"weight", 1, 0, 'w'},
+        {"schedparam", 0, 0, 's'},
+        {"ratelimit_us", 1, 0, 'r'},
         {"cpupool", 1, 0, 'p'},
         COMMON_LONG_OPTS
     };
 
-    SWITCH_FOREACH_OPT(opt, "d:w:p:", opts, "sched-credit2", 0) {
+    SWITCH_FOREACH_OPT(opt, "d:w:p:r:s", opts, "sched-credit2", 0) {
     case 'd':
         dom = optarg;
         break;
@@ -6851,6 +6882,13 @@ int main_sched_credit2(int argc, char **argv)
         weight = strtol(optarg, NULL, 10);
         opt_w = true;
         break;
+    case 's':
+        opt_s = true;
+        break;
+    case 'r':
+        ratelimit = strtol(optarg, NULL, 10);
+        opt_r = true;
+        break;
     case 'p':
         cpupool = optarg;
         break;
@@ -6866,10 +6904,31 @@ int main_sched_credit2(int argc, char **argv)
         return EXIT_FAILURE;
     }
 
-    if (!dom) { /* list all domain's credit scheduler info */
+    if (opt_s) {
+        libxl_sched_credit2_params scparam;
+        uint32_t poolid = 0;
+
+        if (cpupool) {
+            if (libxl_cpupool_qualifier_to_cpupoolid(ctx, cpupool,
+                                                     &poolid, NULL) ||
+                !libxl_cpupoolid_is_valid(ctx, poolid)) {
+                fprintf(stderr, "unknown cpupool \'%s\'\n", cpupool);
+                return EXIT_FAILURE;
+            }
+        }
+
+        if (!opt_r) { /* Output scheduling parameters */
+            if (sched_credit2_pool_output(poolid))
+                return EXIT_FAILURE;
+        } else {      /* Set scheduling parameters (so far, just ratelimit) */
+            scparam.ratelimit_us = ratelimit;
+            if (sched_credit2_params_set(poolid, &scparam))
+                return EXIT_FAILURE;
+        }
+    } else if (!dom) { /* list all domain's credit scheduler info */
         if (sched_domain_output(LIBXL_SCHEDULER_CREDIT2,
                                 sched_credit2_domain_output,
-                                sched_default_pool_output,
+                                sched_credit2_pool_output,
                                 cpupool))
             return EXIT_FAILURE;
     } else {
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 85c1e0f..a420415 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -265,6 +265,8 @@ struct cmd_spec cmd_table[] = {
       "[-d <Domain> [-w[=WEIGHT]]] [-p CPUPOOL]",
       "-d DOMAIN, --domain=DOMAIN     Domain to modify\n"
       "-w WEIGHT, --weight=WEIGHT     Weight (int)\n"
+      "-s         --schedparam        Query / modify scheduler parameters\n"
+      "-r RLIMIT, --ratelimit_us=RLIMIT Set the scheduling rate limit, in microseconds\n"
       "-p CPUPOOL, --cpupool=CPUPOOL  Restrict output to CPUPOOL"
     },
     { "sched-rtds",


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 16/24] xen: sched: factor affinity helpers out of sched_credit.c
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (14 preceding siblings ...)
  2016-08-17 17:19 ` [PATCH 15/24] xl: " Dario Faggioli
@ 2016-08-17 17:19 ` Dario Faggioli
  2016-09-28 15:49   ` George Dunlap
  2016-08-17 17:19 ` [PATCH 17/24] xen: credit2: soft-affinity awareness in runq_tickle() Dario Faggioli
                   ` (10 subsequent siblings)
  26 siblings, 1 reply; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:19 UTC (permalink / raw)
  To: xen-devel; +Cc: Anshul Makkar, Justin T. Weaver, George Dunlap

make it possible to use the various helpers from other
schedulers, e.g., for implementing soft affinity within
them.

Since we are touching the code, also make it start using
variables called v for struct_vcpu*, as it is preferrable.

No functional change intended.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
---
 xen/common/sched_credit.c  |   98 +++++++-------------------------------------
 xen/include/xen/sched-if.h |   65 +++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+), 83 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 14b207d..5d5bba9 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -137,27 +137,6 @@
 #define TRC_CSCHED_SCHEDULE      TRC_SCHED_CLASS_EVT(CSCHED, 9)
 #define TRC_CSCHED_RATELIMIT     TRC_SCHED_CLASS_EVT(CSCHED, 10)
 
-
-/*
- * Hard and soft affinity load balancing.
- *
- * Idea is each vcpu has some pcpus that it prefers, some that it does not
- * prefer but is OK with, and some that it cannot run on at all. The first
- * set of pcpus are the ones that are both in the soft affinity *and* in the
- * hard affinity; the second set of pcpus are the ones that are in the hard
- * affinity but *not* in the soft affinity; the third set of pcpus are the
- * ones that are not in the hard affinity.
- *
- * We implement a two step balancing logic. Basically, every time there is
- * the need to decide where to run a vcpu, we first check the soft affinity
- * (well, actually, the && between soft and hard affinity), to see if we can
- * send it where it prefers to (and can) run on. However, if the first step
- * does not find any suitable and free pcpu, we fall back checking the hard
- * affinity.
- */
-#define CSCHED_BALANCE_SOFT_AFFINITY    0
-#define CSCHED_BALANCE_HARD_AFFINITY    1
-
 /*
  * Boot parameters
  */
@@ -287,53 +266,6 @@ __runq_remove(struct csched_vcpu *svc)
     list_del_init(&svc->runq_elem);
 }
 
-
-#define for_each_csched_balance_step(step) \
-    for ( (step) = 0; (step) <= CSCHED_BALANCE_HARD_AFFINITY; (step)++ )
-
-
-/*
- * Hard affinity balancing is always necessary and must never be skipped.
- * But soft affinity need only be considered when it has a functionally
- * different effect than other constraints (such as hard affinity, cpus
- * online, or cpupools).
- *
- * Soft affinity only needs to be considered if:
- * * The cpus in the cpupool are not a subset of soft affinity
- * * The hard affinity is not a subset of soft affinity
- * * There is an overlap between the soft affinity and the mask which is
- *   currently being considered.
- */
-static inline int __vcpu_has_soft_affinity(const struct vcpu *vc,
-                                           const cpumask_t *mask)
-{
-    return !cpumask_subset(cpupool_domain_cpumask(vc->domain),
-                           vc->cpu_soft_affinity) &&
-           !cpumask_subset(vc->cpu_hard_affinity, vc->cpu_soft_affinity) &&
-           cpumask_intersects(vc->cpu_soft_affinity, mask);
-}
-
-/*
- * Each csched-balance step uses its own cpumask. This function determines
- * which one (given the step) and copies it in mask. For the soft affinity
- * balancing step, the pcpus that are not part of vc's hard affinity are
- * filtered out from the result, to avoid running a vcpu where it would
- * like, but is not allowed to!
- */
-static void
-csched_balance_cpumask(const struct vcpu *vc, int step, cpumask_t *mask)
-{
-    if ( step == CSCHED_BALANCE_SOFT_AFFINITY )
-    {
-        cpumask_and(mask, vc->cpu_soft_affinity, vc->cpu_hard_affinity);
-
-        if ( unlikely(cpumask_empty(mask)) )
-            cpumask_copy(mask, vc->cpu_hard_affinity);
-    }
-    else /* step == CSCHED_BALANCE_HARD_AFFINITY */
-        cpumask_copy(mask, vc->cpu_hard_affinity);
-}
-
 static void burn_credits(struct csched_vcpu *svc, s_time_t now)
 {
     s_time_t delta;
@@ -398,18 +330,18 @@ static inline void __runq_tickle(struct csched_vcpu *new)
          * Soft and hard affinity balancing loop. For vcpus without
          * a useful soft affinity, consider hard affinity only.
          */
-        for_each_csched_balance_step( balance_step )
+        for_each_affinity_balance_step( balance_step )
         {
             int new_idlers_empty;
 
-            if ( balance_step == CSCHED_BALANCE_SOFT_AFFINITY
-                 && !__vcpu_has_soft_affinity(new->vcpu,
-                                              new->vcpu->cpu_hard_affinity) )
+            if ( balance_step == BALANCE_SOFT_AFFINITY
+                 && !has_soft_affinity(new->vcpu,
+                                       new->vcpu->cpu_hard_affinity) )
                 continue;
 
             /* Are there idlers suitable for new (for this balance step)? */
-            csched_balance_cpumask(new->vcpu, balance_step,
-                                   cpumask_scratch_cpu(cpu));
+            affinity_balance_cpumask(new->vcpu, balance_step,
+                                    cpumask_scratch_cpu(cpu));
             cpumask_and(cpumask_scratch_cpu(cpu),
                         cpumask_scratch_cpu(cpu), &idle_mask);
             new_idlers_empty = cpumask_empty(cpumask_scratch_cpu(cpu));
@@ -420,7 +352,7 @@ static inline void __runq_tickle(struct csched_vcpu *new)
              * hard affinity as well, before taking final decisions.
              */
             if ( new_idlers_empty
-                 && balance_step == CSCHED_BALANCE_SOFT_AFFINITY )
+                 && balance_step == BALANCE_SOFT_AFFINITY )
                 continue;
 
             /*
@@ -721,7 +653,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
     online = cpupool_domain_cpumask(vc->domain);
     cpumask_and(&cpus, vc->cpu_hard_affinity, online);
 
-    for_each_csched_balance_step( balance_step )
+    for_each_affinity_balance_step( balance_step )
     {
         /*
          * We want to pick up a pcpu among the ones that are online and
@@ -741,12 +673,12 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
          * cpus and, if the result is empty, we just skip the soft affinity
          * balancing step all together.
          */
-        if ( balance_step == CSCHED_BALANCE_SOFT_AFFINITY
-             && !__vcpu_has_soft_affinity(vc, &cpus) )
+        if ( balance_step == BALANCE_SOFT_AFFINITY
+             && !has_soft_affinity(vc, &cpus) )
             continue;
 
         /* Pick an online CPU from the proper affinity mask */
-        csched_balance_cpumask(vc, balance_step, &cpus);
+        affinity_balance_cpumask(vc, balance_step, &cpus);
         cpumask_and(&cpus, &cpus, online);
 
         /* If present, prefer vc's current processor */
@@ -1605,11 +1537,11 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
              * vCPUs with useful soft affinities in some sort of bitmap
              * or counter.
              */
-            if ( balance_step == CSCHED_BALANCE_SOFT_AFFINITY
-                 && !__vcpu_has_soft_affinity(vc, vc->cpu_hard_affinity) )
+            if ( balance_step == BALANCE_SOFT_AFFINITY
+                 && !has_soft_affinity(vc, vc->cpu_hard_affinity) )
                 continue;
 
-            csched_balance_cpumask(vc, balance_step, cpumask_scratch_cpu(cpu));
+            affinity_balance_cpumask(vc, balance_step, cpumask_scratch_cpu(cpu));
             if ( __csched_vcpu_is_migrateable(vc, cpu,
                                               cpumask_scratch_cpu(cpu)) )
             {
@@ -1665,7 +1597,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
      *  1. any "soft-affine work" to steal first,
      *  2. if not finding anything, any "hard-affine work" to steal.
      */
-    for_each_csched_balance_step( bstep )
+    for_each_affinity_balance_step( bstep )
     {
         /*
          * We peek at the non-idling CPUs in a node-wise fashion. In fact,
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index bc0e794..496ed80 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -201,4 +201,69 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
     return d->cpupool->cpu_valid;
 }
 
+/*
+ * Hard and soft affinity load balancing.
+ *
+ * Idea is each vcpu has some pcpus that it prefers, some that it does not
+ * prefer but is OK with, and some that it cannot run on at all. The first
+ * set of pcpus are the ones that are both in the soft affinity *and* in the
+ * hard affinity; the second set of pcpus are the ones that are in the hard
+ * affinity but *not* in the soft affinity; the third set of pcpus are the
+ * ones that are not in the hard affinity.
+ *
+ * We implement a two step balancing logic. Basically, every time there is
+ * the need to decide where to run a vcpu, we first check the soft affinity
+ * (well, actually, the && between soft and hard affinity), to see if we can
+ * send it where it prefers to (and can) run on. However, if the first step
+ * does not find any suitable and free pcpu, we fall back checking the hard
+ * affinity.
+ */
+#define BALANCE_SOFT_AFFINITY    0
+#define BALANCE_HARD_AFFINITY    1
+
+#define for_each_affinity_balance_step(step) \
+    for ( (step) = 0; (step) <= BALANCE_HARD_AFFINITY; (step)++ )
+
+/*
+ * Hard affinity balancing is always necessary and must never be skipped.
+ * But soft affinity need only be considered when it has a functionally
+ * different effect than other constraints (such as hard affinity, cpus
+ * online, or cpupools).
+ *
+ * Soft affinity only needs to be considered if:
+ * * The cpus in the cpupool are not a subset of soft affinity
+ * * The hard affinity is not a subset of soft affinity
+ * * There is an overlap between the soft affinity and the mask which is
+ *   currently being considered.
+ */
+static inline int has_soft_affinity(const struct vcpu *v,
+                                    const cpumask_t *mask)
+{
+    return !cpumask_subset(cpupool_domain_cpumask(v->domain),
+                           v->cpu_soft_affinity) &&
+           !cpumask_subset(v->cpu_hard_affinity, v->cpu_soft_affinity) &&
+           cpumask_intersects(v->cpu_soft_affinity, mask);
+}
+
+/*
+ * This function determines copies in mask the cpumask that should be
+ * used for a particular affinity balancing step. For the soft affinity
+ * one, the pcpus that are not part of vc's hard affinity are filtered
+ * out from the result, to avoid running a vcpu where it would like,
+ * but is not allowed to!
+ */
+static inline void
+affinity_balance_cpumask(const struct vcpu *v, int step, cpumask_t *mask)
+{
+    if ( step == BALANCE_SOFT_AFFINITY )
+    {
+        cpumask_and(mask, v->cpu_soft_affinity, v->cpu_hard_affinity);
+
+        if ( unlikely(cpumask_empty(mask)) )
+            cpumask_copy(mask, v->cpu_hard_affinity);
+    }
+    else /* step == BALANCE_HARD_AFFINITY */
+        cpumask_copy(mask, v->cpu_hard_affinity);
+}
+
 #endif /* __XEN_SCHED_IF_H__ */


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 17/24] xen: credit2: soft-affinity awareness in runq_tickle()
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (15 preceding siblings ...)
  2016-08-17 17:19 ` [PATCH 16/24] xen: sched: factor affinity helpers out of sched_credit.c Dario Faggioli
@ 2016-08-17 17:19 ` Dario Faggioli
  2016-09-01 10:52   ` anshul makkar
  2016-09-28 20:44   ` George Dunlap
  2016-08-17 17:19 ` [PATCH 18/24] xen: credit2: soft-affinity awareness fallback_cpu() and cpu_pick() Dario Faggioli
                   ` (9 subsequent siblings)
  26 siblings, 2 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:19 UTC (permalink / raw)
  To: xen-devel; +Cc: Anshul Makkar, Justin T. Weaver, George Dunlap

This is done by means of the "usual" two steps loop:
 - soft affinity balance step;
 - hard affinity balance step.

The entire logic implemented in runq_tickle() is
applied, during the first step, considering only the
CPUs in the vcpu's soft affinity. In the second step,
we fall back to use all the CPUs from its hard
affinity (as it is doing now, without this patch).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
---
 xen/common/sched_credit2.c |  243 ++++++++++++++++++++++++++++----------------
 1 file changed, 157 insertions(+), 86 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 0d83bd7..3aef1b4 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -902,6 +902,42 @@ __runq_remove(struct csched2_vcpu *svc)
     list_del_init(&svc->runq_elem);
 }
 
+/*
+ * During the soft-affinity step, only actually preempt someone if
+ * he does not have soft-affinity with cpu (while we have).
+ *
+ * BEWARE that this uses cpumask_scratch, trowing away what's in there!
+ */
+static inline bool_t soft_aff_check_preempt(unsigned int bs, unsigned int cpu)
+{
+    struct csched2_vcpu * cur = CSCHED2_VCPU(curr_on_cpu(cpu));
+
+    /*
+     * If we're doing hard-affinity, always check whether to preempt cur.
+     * If we're doing soft-affinity, but cur doesn't have one, check as well.
+     */
+    if ( bs == BALANCE_HARD_AFFINITY ||
+         !has_soft_affinity(cur->vcpu, cur->vcpu->cpu_hard_affinity) )
+        return 1;
+
+    /*
+     * We're doing soft-affinity, and we know that the current vcpu on cpu
+     * has a soft affinity. We now want to know whether cpu itself is in
+     * such affinity. In fact, since we now that new (in runq_tickle()) is:
+     *  - if cpu is not in cur's soft-affinity, we should indeed check to
+     *    see whether new should preempt cur. If that will be the case, that
+     *    would be an improvement wrt respecting soft affinity;
+     *  - if cpu is in cur's soft-affinity, we leave it alone and (in
+     *    runq_tickle()) move on to another cpu. In fact, we don't want to
+     *    be too harsh with someone which is running within its soft-affinity.
+     *    This is safe because later, if we don't fine anyone else during the
+     *    soft-affinity step, we will check cpu for preemption anyway, when
+     *    doing hard-affinity.
+     */
+    affinity_balance_cpumask(cur->vcpu, BALANCE_SOFT_AFFINITY, cpumask_scratch);
+    return !cpumask_test_cpu(cpu, cpumask_scratch);
+}
+
 void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_vcpu *, s_time_t);
 
 /*
@@ -925,7 +961,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
 {
     int i, ipid = -1;
     s_time_t lowest = (1<<30);
-    unsigned int cpu = new->vcpu->processor;
+    unsigned int bs, cpu = new->vcpu->processor;
     struct csched2_runqueue_data *rqd = RQD(ops, cpu);
     cpumask_t mask;
     struct csched2_vcpu * cur;
@@ -947,109 +983,144 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
                     (unsigned char *)&d);
     }
 
-    /*
-     * First of all, consider idle cpus, checking if we can just
-     * re-use the pcpu where we were running before.
-     *
-     * If there are cores where all the siblings are idle, consider
-     * them first, honoring whatever the spreading-vs-consolidation
-     * SMT policy wants us to do.
-     */
-    if ( unlikely(sched_smt_power_savings) )
-        cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
-    else
-        cpumask_copy(&mask, &rqd->smt_idle);
-    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
-    i = cpumask_test_or_cycle(cpu, &mask);
-    if ( i < nr_cpu_ids )
+    for_each_affinity_balance_step( bs )
     {
-        SCHED_STAT_CRANK(tickled_idle_cpu);
-        ipid = i;
-        goto tickle;
-    }
+        /*
+         * First things first: if we are at the first (soft affinity) step,
+         * but new doesn't have a soft affinity, skip this step.
+         */
+        if ( bs == BALANCE_SOFT_AFFINITY &&
+             !has_soft_affinity(new->vcpu, new->vcpu->cpu_hard_affinity) )
+            continue;
 
-    /*
-     * If there are no fully idle cores, check all idlers, after
-     * having filtered out pcpus that have been tickled but haven't
-     * gone through the scheduler yet.
-     */
-    cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
-    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
-    i = cpumask_test_or_cycle(cpu, &mask);
-    if ( i < nr_cpu_ids )
-    {
-        SCHED_STAT_CRANK(tickled_idle_cpu);
-        ipid = i;
-        goto tickle;
-    }
+        affinity_balance_cpumask(new->vcpu, bs, cpumask_scratch);
 
-    /*
-     * Otherwise, look for the non-idle (and non-tickled) processors with
-     * the lowest credit, among the ones new is allowed to run on. Again,
-     * the cpu were it was running on would be the best candidate.
-     */
-    cpumask_andnot(&mask, &rqd->active, &rqd->idle);
-    cpumask_andnot(&mask, &mask, &rqd->tickled);
-    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
-    if ( cpumask_test_cpu(cpu, &mask) )
-    {
-        cur = CSCHED2_VCPU(curr_on_cpu(cpu));
-        burn_credits(rqd, cur, now);
+        /*
+         * First of all, consider idle cpus, checking if we can just
+         * re-use the pcpu where we were running before.
+         *
+         * If there are cores where all the siblings are idle, consider
+         * them first, honoring whatever the spreading-vs-consolidation
+         * SMT policy wants us to do.
+         */
+        if ( unlikely(sched_smt_power_savings) )
+            cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
+        else
+            cpumask_copy(&mask, &rqd->smt_idle);
+        cpumask_and(&mask, &mask, cpumask_scratch);
+        i = cpumask_test_or_cycle(cpu, &mask);
+        if ( i < nr_cpu_ids )
+        {
+            SCHED_STAT_CRANK(tickled_idle_cpu);
+            ipid = i;
+            goto tickle;
+        }
 
-        if ( cur->credit < new->credit )
+        /*
+         * If there are no fully idle cores, check all idlers, after
+         * having filtered out pcpus that have been tickled but haven't
+         * gone through the scheduler yet.
+         */
+        cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
+        cpumask_and(&mask, &mask, cpumask_scratch);
+        i = cpumask_test_or_cycle(cpu, &mask);
+        if ( i < nr_cpu_ids )
         {
-            SCHED_STAT_CRANK(tickled_busy_cpu);
-            ipid = cpu;
+            SCHED_STAT_CRANK(tickled_idle_cpu);
+            ipid = i;
             goto tickle;
         }
-    }
 
-    for_each_cpu(i, &mask)
-    {
-        /* Already looked at this one above */
-        if ( i == cpu )
-            continue;
+        /*
+         * Otherwise, look for the non-idle (and non-tickled) processors with
+         * the lowest credit, among the ones new is allowed to run on. Again,
+         * the cpu were it was running on would be the best candidate.
+         */
+        cpumask_andnot(&mask, &rqd->active, &rqd->idle);
+        cpumask_andnot(&mask, &mask, &rqd->tickled);
+        cpumask_and(&mask, &mask, cpumask_scratch);
+        if ( cpumask_test_cpu(cpu, &mask) )
+        {
+            cur = CSCHED2_VCPU(curr_on_cpu(cpu));
 
-        cur = CSCHED2_VCPU(curr_on_cpu(i));
+            if ( soft_aff_check_preempt(bs, cpu) )
+            {
+                burn_credits(rqd, cur, now);
+
+                if ( unlikely(tb_init_done) )
+                {
+                    struct {
+                        unsigned vcpu:16, dom:16;
+                        unsigned cpu, credit;
+                    } d;
+                    d.dom = cur->vcpu->domain->domain_id;
+                    d.vcpu = cur->vcpu->vcpu_id;
+                    d.credit = cur->credit;
+                    d.cpu = cpu;
+                    __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
+                                sizeof(d),
+                                (unsigned char *)&d);
+                }
+
+                if ( cur->credit < new->credit )
+                {
+                    SCHED_STAT_CRANK(tickled_busy_cpu);
+                    ipid = cpu;
+                    goto tickle;
+                }
+            }
+        }
 
-        ASSERT(!is_idle_vcpu(cur->vcpu));
+        for_each_cpu(i, &mask)
+        {
+            /* Already looked at this one above */
+            if ( i == cpu )
+                continue;
 
-        /* Update credits for current to see if we want to preempt. */
-        burn_credits(rqd, cur, now);
+            cur = CSCHED2_VCPU(curr_on_cpu(i));
+            ASSERT(!is_idle_vcpu(cur->vcpu));
 
-        if ( cur->credit < lowest )
-        {
-            ipid = i;
-            lowest = cur->credit;
+            if ( soft_aff_check_preempt(bs, i) )
+            {
+                /* Update credits for current to see if we want to preempt. */
+                burn_credits(rqd, cur, now);
+
+                if ( unlikely(tb_init_done) )
+                {
+                    struct {
+                        unsigned vcpu:16, dom:16;
+                        unsigned cpu, credit;
+                    } d;
+                    d.dom = cur->vcpu->domain->domain_id;
+                    d.vcpu = cur->vcpu->vcpu_id;
+                    d.credit = cur->credit;
+                    d.cpu = i;
+                    __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
+                                sizeof(d),
+                                (unsigned char *)&d);
+                }
+
+                if ( cur->credit < lowest )
+                {
+                    ipid = i;
+                    lowest = cur->credit;
+                }
+            }
         }
 
-        if ( unlikely(tb_init_done) )
+        /*
+         * Only switch to another processor if the credit difference is
+         * greater than the migrate resistance.
+         */
+        if ( ipid != -1 && lowest + CSCHED2_MIGRATE_RESIST <= new->credit )
         {
-            struct {
-                unsigned vcpu:16, dom:16;
-                unsigned cpu, credit;
-            } d;
-            d.dom = cur->vcpu->domain->domain_id;
-            d.vcpu = cur->vcpu->vcpu_id;
-            d.credit = cur->credit;
-            d.cpu = i;
-            __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
-                        sizeof(d),
-                        (unsigned char *)&d);
+            SCHED_STAT_CRANK(tickled_busy_cpu);
+            goto tickle;
         }
     }
 
-    /*
-     * Only switch to another processor if the credit difference is
-     * greater than the migrate resistance.
-     */
-    if ( ipid == -1 || lowest + CSCHED2_MIGRATE_RESIST > new->credit )
-    {
-        SCHED_STAT_CRANK(tickled_no_cpu);
-        return;
-    }
-
-    SCHED_STAT_CRANK(tickled_busy_cpu);
+    SCHED_STAT_CRANK(tickled_no_cpu);
+    return;
  tickle:
     BUG_ON(ipid == -1);
 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 18/24] xen: credit2: soft-affinity awareness fallback_cpu() and cpu_pick()
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (16 preceding siblings ...)
  2016-08-17 17:19 ` [PATCH 17/24] xen: credit2: soft-affinity awareness in runq_tickle() Dario Faggioli
@ 2016-08-17 17:19 ` Dario Faggioli
  2016-09-01 11:08   ` anshul makkar
  2016-09-29 11:11   ` George Dunlap
  2016-08-17 17:19 ` [PATCH 19/24] xen: credit2: soft-affinity awareness in load balancing Dario Faggioli
                   ` (8 subsequent siblings)
  26 siblings, 2 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:19 UTC (permalink / raw)
  To: xen-devel; +Cc: Anshul Makkar, Justin T. Weaver, George Dunlap

For get_fallback_cpu(), by putting in place the "usual"
two steps (soft affinity step and hard affinity step)
loop. We just move the core logic of the function inside
the body of the loop itself.

For csched2_cpu_pick(), what is important is to find
the runqueue with the least average load. Currently,
we do that by looping on all runqueues and checking,
well, their load. For soft affinity, we want to know
which one is the runqueue with the least load, among
the ones where the vcpu would prefer to be assigned.

We find both the least loaded runqueue among the soft
affinity "friendly" ones, and the overall least loaded
one, in the same pass.

(Also, kill a spurious ';' when defining MAX_LOAD.)

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
---
 xen/common/sched_credit2.c |  136 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 111 insertions(+), 25 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 3aef1b4..2d7228a 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -506,34 +506,68 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
 }
 
 /*
- * When a hard affinity change occurs, we may not be able to check some
- * (any!) of the other runqueues, when looking for the best new processor
- * for svc (as trylock-s in csched2_cpu_pick() can fail). If that happens, we
- * pick, in order of decreasing preference:
- *  - svc's current pcpu;
- *  - another pcpu from svc's current runq;
- *  - any cpu.
+ * In csched2_cpu_pick(), it may not be possible to actually look at remote
+ * runqueues (the trylock-s on their spinlocks can fail!). If that happens,
+ * we pick, in order of decreasing preference:
+ *  1) svc's current pcpu, if it is part of svc's soft affinity;
+ *  2) a pcpu in svc's current runqueue that is also in svc's soft affinity;
+ *  3) just one valid pcpu from svc's soft affinity;
+ *  4) svc's current pcpu, if it is part of svc's hard affinity;
+ *  5) a pcpu in svc's current runqueue that is also in svc's hard affinity;
+ *  6) just one valid pcpu from svc's hard affinity
+ *
+ * Of course, 1, 2 and 3 makes sense only if svc has a soft affinity. Also
+ * note that at least 6 is guaranteed to _always_ return at least one pcpu.
  */
 static int get_fallback_cpu(struct csched2_vcpu *svc)
 {
     int cpu;
+    unsigned int bs;
 
-    if ( likely(cpumask_test_cpu(svc->vcpu->processor,
-                                 svc->vcpu->cpu_hard_affinity)) )
-        return svc->vcpu->processor;
+    for_each_affinity_balance_step( bs )
+    {
+        if ( bs == BALANCE_SOFT_AFFINITY &&
+             !has_soft_affinity(svc->vcpu, svc->vcpu->cpu_hard_affinity) )
+            continue;
 
-    cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
-                &svc->rqd->active);
-    cpu = cpumask_first(cpumask_scratch);
-    if ( likely(cpu < nr_cpu_ids) )
-        return cpu;
+        affinity_balance_cpumask(svc->vcpu, bs, cpumask_scratch);
 
-    cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
-                cpupool_domain_cpumask(svc->vcpu->domain));
+        /*
+         * This is cases 1 or 4 (depending on bs): if v->processor is (still)
+         * in our affinity, go for it, for cache betterness.
+         */
+        if ( likely(cpumask_test_cpu(svc->vcpu->processor,
+                                     cpumask_scratch)) )
+            return svc->vcpu->processor;
 
-    ASSERT(!cpumask_empty(cpumask_scratch));
+        /*
+         * This is cases 2 or 5 (depending on bsp): v->processor isn't there
+         * any longer, check if we at least can stay in our current runq.
+         */
+        cpumask_and(cpumask_scratch, cpumask_scratch,
+                    &svc->rqd->active);
+        cpu = cpumask_first(cpumask_scratch);
+        if ( likely(cpu < nr_cpu_ids) )
+            return cpu;
 
-    return cpumask_first(cpumask_scratch);
+        /*
+         * This is cases 3 or 6 (depending on bs): last stand, just one valid
+         * pcpu from our soft affinity, if we have one and if there's any. In
+         * fact, if we are doing soft-affinity, it is possible that we fail,
+         * which means we stay in the loop and look for hard affinity. OTOH,
+         * if we are at the hard-affinity balancing step, it's guaranteed that
+         * there is at least one valid cpu, and therefore we are sure that we
+         * return it, and never really exit the loop.
+         */
+        cpumask_and(cpumask_scratch, cpumask_scratch,
+                    cpupool_domain_cpumask(svc->vcpu->domain));
+        ASSERT(!cpumask_empty(cpumask_scratch) || bs == BALANCE_SOFT_AFFINITY);
+        cpu = cpumask_first(cpumask_scratch);
+        if ( likely(cpu < nr_cpu_ids) )
+            return cpu;
+    }
+    BUG_ON(1);
+    return -1;
 }
 
 /*
@@ -1561,14 +1595,15 @@ csched2_context_saved(const struct scheduler *ops, struct vcpu *vc)
     vcpu_schedule_unlock_irq(lock, vc);
 }
 
-#define MAX_LOAD (STIME_MAX);
+#define MAX_LOAD (STIME_MAX)
 static int
 csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
 {
     struct csched2_private *prv = CSCHED2_PRIV(ops);
-    int i, min_rqi = -1, new_cpu;
+    int i, min_rqi = -1, min_s_rqi = -1, new_cpu;
     struct csched2_vcpu *svc = CSCHED2_VCPU(vc);
-    s_time_t min_avgload = MAX_LOAD;
+    s_time_t min_avgload = MAX_LOAD, min_s_avgload = MAX_LOAD;
+    bool_t has_soft;
 
     ASSERT(!cpumask_empty(&prv->active_queues));
 
@@ -1613,6 +1648,12 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         }
         else
         {
+            /*
+             * If we've been asked to move to migrate_rqd, we should just do
+             * that, which we actually do by returning one cpu from that runq.
+             * There is no need to take care of soft affinity, as that will
+             * happen in runq_tickle().
+             */
             cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
                         &svc->migrate_rqd->active);
             new_cpu = cpumask_any(cpumask_scratch);
@@ -1622,7 +1663,21 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         /* Fall-through to normal cpu pick */
     }
 
-    /* Find the runqueue with the lowest average load. */
+    has_soft = has_soft_affinity(vc, vc->cpu_hard_affinity);
+    if ( has_soft )
+        affinity_balance_cpumask(vc, BALANCE_SOFT_AFFINITY, cpumask_scratch);
+
+    /*
+     * What we want is:
+     *  - if we have soft affinity, the runqueue with the lowest average
+     *    load, among the ones that contain cpus in our soft affinity; this
+     *    represents the best runq on which we would want to run.
+     *  - the runqueue with the lowest average load among the ones that
+     *    contains cpus in our hard affinity; this represent the best runq
+     *    on which we can run.
+     *
+     * Find both runqueues in one pass.
+     */
     for_each_cpu(i, &prv->active_queues)
     {
         struct csched2_runqueue_data *rqd;
@@ -1656,6 +1711,13 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
             spin_unlock(&rqd->lock);
         }
 
+        if ( has_soft &&
+             rqd_avgload < min_s_avgload &&
+             cpumask_intersects(cpumask_scratch, &rqd->active) )
+        {
+            min_s_avgload = rqd_avgload;
+            min_s_rqi = i;
+        }
         if ( rqd_avgload < min_avgload )
         {
             min_avgload = rqd_avgload;
@@ -1663,9 +1725,33 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         }
     }
 
-    /* We didn't find anyone (most likely because of spinlock contention). */
-    if ( min_rqi == -1 )
+    if ( has_soft && min_s_rqi != -1 )
+    {
+        /*
+         * We have soft affinity, and we have a candidate runq, so go for it.
+         *
+         * Note that, since has_soft is true, cpumask_scratch holds the proper
+         * soft-affinity mask.
+         */
+        cpumask_and(cpumask_scratch, cpumask_scratch,
+                    &prv->rqd[min_s_rqi].active);
+    }
+    else if ( min_rqi != -1 )
     {
+        /*
+         * Either we don't have soft affinity, or we do, but we did not find
+         * any suitable runq. But we did find one when considering hard
+         * affinity, so go for it.
+         */
+        cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
+                    &prv->rqd[min_rqi].active);
+    }
+    else
+    {
+        /*
+         * We didn't find anyone at all (most likely because of spinlock
+         * contention).
+         */
         new_cpu = get_fallback_cpu(svc);
         min_rqi = c2r(ops, new_cpu);
         min_avgload = prv->rqd[min_rqi].b_avgload;


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 19/24] xen: credit2: soft-affinity awareness in load balancing
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (17 preceding siblings ...)
  2016-08-17 17:19 ` [PATCH 18/24] xen: credit2: soft-affinity awareness fallback_cpu() and cpu_pick() Dario Faggioli
@ 2016-08-17 17:19 ` Dario Faggioli
  2016-09-02 11:46   ` anshul makkar
  2016-08-17 17:19 ` [PATCH 20/24] xen: credit2: kick away vcpus not running within their soft-affinity Dario Faggioli
                   ` (7 subsequent siblings)
  26 siblings, 1 reply; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:19 UTC (permalink / raw)
  To: xen-devel; +Cc: Anshul Makkar, George Dunlap

We want is soft-affinity to play a role in load
balancing, i.e., when deciding whether or not to
push this vcpu from here to there, and/or pull that
other vcpu from there to here.

A way of doing that, is considering the following
(for both pushes and pulls, just with the roles of
current an new runqueues inverted):
 - how much affinity does the vcpu have with
   its current runq?
 - how much affinity will the vcpu it have with
   its new runq?

We call this 'degree of soft-affinity' of a vcpu
to a runq, and define it as, informally speaking,
a quantity that is proportional to the fraction of
pcpus of a runq that a vcpu has soft-affinity with.

Then, we use this 'degree of soft-affinity' to
compute a value that is used as a modifier of the
baseline --purely load based-- results of the load
balancer, we apply it (potentially, with a scaling
factor), and use the modified result for the actual
load balancing decision.

This modifier based approach is chosen because it
integrates well into the existing load balancing
framework, it is modular and can easily accommodate
further extensions.

A note on performance and optimization: since we
call (potentially) call consider() O(nr_vcpus^2)
times, we absolutely need that it is lean and
quick. Therefore, a bunch of things are
pre-calculated outside of it. This makes things
look less encapsulated and clean, but at the same
time, makes the code faster (and this is a critical
path, so we want it fast!).

Finally, this patch does not interfere with the
load balancing triggering logic. This is to say
that vcpus running outside of their soft-affinity
_don't_ trigger additional load balancing point.
Early numbers show that this is ok, but it well
may be the case that we will want to introduce
something like that at some point.

(Oh, and while there, just a couple of style fixes
are also done.)

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
---
 xen/common/sched_credit2.c |  359 ++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 326 insertions(+), 33 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 2d7228a..3722f46 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1786,19 +1786,21 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
     return new_cpu;
 }
 
-/* Working state of the load-balancing algorithm */
+/* Working state of the load-balancing algorithm. */
 typedef struct {
-    /* NB: Modified by consider() */
+    /* NB: Modified by consider(). */
     s_time_t load_delta;
     struct csched2_vcpu * best_push_svc, *best_pull_svc;
-    /* NB: Read by consider() */
+    /* NB: Read by consider() (and the various consider_foo() functions). */
     struct csched2_runqueue_data *lrqd;
-    struct csched2_runqueue_data *orqd;                  
+    struct csched2_runqueue_data *orqd;
+    bool_t push_has_soft_aff, pull_has_soft_aff;
+    s_time_t push_soft_aff_load, pull_soft_aff_load;
 } balance_state_t;
 
-static void consider(balance_state_t *st, 
-                     struct csched2_vcpu *push_svc,
-                     struct csched2_vcpu *pull_svc)
+static inline s_time_t consider_load(balance_state_t *st,
+                                     struct csched2_vcpu *push_svc,
+                                     struct csched2_vcpu *pull_svc)
 {
     s_time_t l_load, o_load, delta;
 
@@ -1821,11 +1823,166 @@ static void consider(balance_state_t *st,
     if ( delta < 0 )
         delta = -delta;
 
+    return delta;
+}
+
+/*
+ * Load balancing and soft-affinity.
+ *
+ * When trying to figure out whether or not it's best to move a vcpu from
+ * one runqueue to another, we must keep soft-affinity in mind. Intuitively
+ * we would want to know the following:
+ *  - 'how much' affinity does the vcpu have with its current runq?
+ *  - 'how much' affinity will it have with its new runq?
+ *
+ * But we certainly need to be more precise about how much it is that 'how
+ * much'! Let's start with some definitions:
+ *
+ *  - let v be a vcpu, running in runq I, with soft-affinity to vi
+ *    pcpus of runq I, and soft affinity with vj pcpus of runq J;
+ *  - let k be another vcpu, running in runq J, with soft-affinity to kj
+ *    pcpus of runq J, and with ki pcpus of runq I;
+ *  - let runq I have Ci pcpus, and runq J Cj pcpus;
+ *  - let vcpu v have an average load of lv, and k an average load of lk;
+ *  - let runq I have an average load of Li, and J an average load of Lj.
+ *
+ * We also define the following::
+ *
+ *  - lvi = lv * (vi / Ci)  as the 'perceived load' of v, when running
+ *                          in runq i;
+ *  - lvj = lv * (vj / Cj)  as the 'perceived load' of v, it running
+ *                          in runq j;
+ *  - the same for k, mutatis mutandis.
+ *
+ * Idea is that vi/Ci (i.e., the ratio of the number of cpus of a runq that
+ * a vcpu has soft-affinity with, over the total number of cpus of the runq
+ * itself) can be seen as the 'degree of soft-affinity' of v to runq I (and
+ * vj/Cj the one of v to J). In other words, we define the degree of soft
+ * affinity of a vcpu to a runq as what fraction of pcpus of the runq itself
+ * the vcpu has soft-affinity with. Then, we multiply this 'degree of
+ * soft-affinity' by the vcpu load, and call the result the 'perceived load'.
+ *
+ * Basically, if a soft-affinity is defined, the work done by a vcpu on a
+ * runq to which it has higher degree of soft-affinity, is considered
+ * 'lighter' than the same work done by the same vcpu on a runq to which it
+ * has smaller degree of soft-affinity (degree of soft affinity is <= 1). In
+ * fact, if soft-affinity is used to achieve NUMA-aware scheduling, the higher
+ * the degree of soft-affinity of the vcpu to a runq, the greater the probability
+ * of accessing local memory, when running on such runq. And that is certainly\
+ * 'lighter' than having to fetch memory from remote NUMA nodes.
+ *
+ * SoXX, evaluating pushing v from I to J would mean removing (from I) a
+ * perceived load of lv*(vi/Ci) and adding (to J) a perceived load of
+ * lv*(vj/Cj), which we (looking at things from the point of view of I,
+ * which is what balance_load() does) can call D_push:
+ *
+ *  - D_push = -lv * (vi / Ci) + lv * (vj / Cj) =
+ *           = lv * (vj/Cj - vi/Ci)
+ *
+ * On the other hand, pulling k from J to I would entail a D_pull:
+ *
+ *  - D_pull = lk * (ki / Ci) - lk * (kj / Cj) =
+ *           = lk * (ki/Ci - kj/Cj)
+ *
+ * Note that if v (k) has soft-afinity with all the cpus of both I and J,
+ * D_push (D_pull) will be 0, and the same is true in case it has no soft
+ * affinity at all with any of the cpus of I and J. Note also that both
+ * D_push and D_pull can be positive or negative (there's no abs() around
+ * in this case!) depending on the relationship between the degrees of soft
+ * affinity of the vcpu to I and J.
+ *
+ * If there is no soft-affinity, load_balance() (actually, consider()) acts
+ * as follows:
+ *
+ *  - D = abs(Li - Lj)
+ *  - consider pushing v from I to J:
+ *     - D' = abs(Li - lv - (Lj + lv))   (from now, abs(x) == |x|)
+ *     - if (D' < D) { push }
+ *  - consider pulling k from J to I:
+ *     - D' = |Li + lk - (Lj - lk)|
+ *     - if (D' < D) { pull }
+ *  - consider both push and pull:
+ *     - D' = |Li - lv + lk - (Lj + lv - lk)|
+ *     - if (D' < D) { push; pull }
+ *
+ * In order to make soft-affinity part of the process, we use D_push and
+ * D_pull, so that, the final behavior will look like this:
+ *
+ *  - D = abs(Li - Lj)
+ *  - consider pushing v from I to J:
+ *     - D' = |Li - lv - (Lj + lv)|
+ *     - D_push = lv * (vj/Cj - vi/Ci)
+ *     - if (D' + D_push < D) { push }
+ *  - consider pulling k from J to I:
+ *     - D' = |Li + lk - (Lj - lk)|
+ *       D_pull = lk * (ki/Ci - kj/Cj)
+ *     - if (D' < D) { pull }
+ *  - consider both push and pull:
+ *     - D' = |Li - lv + lk - (Lj + lv - lk)|
+ *     - D_push = lv * (vj/Cj - vi/Ci)
+ *       D_pull = lk * (ki/Ci - kj/Cj)
+ *     - if (D' + D_push + D_pull < D) { push; pull }
+ *
+ * So, for instance, the complete formula, in case of a push, with soft
+ * affinity being considered looks like this:
+ *
+ *  - D'' = D' + D_push =
+ *        = |Li - lv - (Lj + lv)| + lv*(vj/Cj - vi/Ci)
+ *
+ * which highlights how soft-affinity being considered acts as a *modifier*
+ * of the "normal" results obtained by just using the actual vcpus loads.
+ * This approach is modular, in the sense that it only takes implementing
+ * another function that returns another modifier, to make the load balancer
+ * consider some other factor or characteristics of the vcpus.
+ *
+ * Finally there is the scope for actually using a scaling factor, to limit
+ * the influence that soft-affinity will actually have on baseline results
+ * from consider_load(). Basically, that means that instead of D_push and/or
+ * D_pull, we'll be adding D_push/S and/or D_pull/S (with S the scaling
+ * factor). Check prep_soft_aff_load() for details on this.
+ */
+
+static inline s_time_t consider_soft_affinity(balance_state_t *st,
+                                              struct csched2_vcpu *push_svc,
+                                              struct csched2_vcpu *pull_svc)
+{
+    s_time_t push_load = push_svc ? st->push_soft_aff_load : 0;
+    s_time_t pull_load = pull_svc ? st->pull_soft_aff_load : 0;
+
+    /*
+     * This is potentially called many times, and a few of them, on the same
+     * vcpu(s). Therefore, all the expensive operations (e.g., the cpumask
+     * manipulations) are done in balance_load(), in the attempt of amortizing
+     * the cost, and all that remains to be done here is return the proper
+     * combination of results.
+     */
+    return push_load + pull_load;
+}
+
+static void consider(balance_state_t *st,
+                     struct csched2_vcpu *push_svc,
+                     struct csched2_vcpu *pull_svc)
+{
+    s_time_t delta, delta_soft_aff;
+
+    /*
+     * The idea here is:
+     *  - consider_load() is the basic step. It compares what would happen
+     *    if the requested combination of pushes and pulls is done, using
+     *    the actual load of the vcpus being considered;
+     *  - subsequent steps return a *modifier* for the result obtained
+     *    above, which is then applied, before drawing conclusions.
+     */
+    delta = consider_load(st, push_svc, pull_svc);
+    delta_soft_aff = consider_soft_affinity(st, push_svc, pull_svc);
+
+    delta += delta_soft_aff;
+
     if ( delta < st->load_delta )
     {
         st->load_delta = delta;
-        st->best_push_svc=push_svc;
-        st->best_pull_svc=pull_svc;
+        st->best_push_svc = push_svc;
+        st->best_pull_svc = pull_svc;
     }
 }
 
@@ -1901,12 +2058,101 @@ static bool_t vcpu_is_migrateable(struct csched2_vcpu *svc,
            cpumask_intersects(svc->vcpu->cpu_hard_affinity, &rqd->active);
 }
 
+/*
+ * Compute the load modifiers to be used in consider() and store them in st.
+ *
+ * aff_shift gives the chance to change how much soft-affinity should affect
+ * load balancing, i.e., the scaling factor introduced above.
+ *
+ * It's a shift width and using for it the same value as prec_shift will make
+ * the scaling factor 1. Using smaller values, will exponentially decrease
+ * the impact of soft-affinity. E.g., aff_shift=prec_shift-1 would make the
+ * scaling factor be 2 (S=2 in the formulas above), which in turn means
+ * D_push and D_pull will be halved, and so on and so forth.
+ */
+static inline s_time_t prep_soft_aff_load(const struct csched2_vcpu *svc,
+                                          const balance_state_t *st,
+                                          unsigned int nr_cpus_lrq,
+                                          unsigned int nr_cpus_orq,
+                                          unsigned int prec_shift,
+                                          unsigned int aff_shift,
+                                          bool_t is_push)
+{
+    s_time_t l_weight, o_weight;
+    s_time_t load, weight;
+    cpumask_t aff_mask;
+
+    ASSERT(aff_shift <= prec_shift);
+
+    /* Compute the degree of soft-affinity of svc to both lrq and orq. */
+    affinity_balance_cpumask(svc->vcpu, BALANCE_SOFT_AFFINITY, &aff_mask);
+    cpumask_and(cpumask_scratch, &aff_mask, &st->lrqd->active);
+    l_weight = (cpumask_weight(cpumask_scratch) << aff_shift) / nr_cpus_lrq;
+    cpumask_and(cpumask_scratch, &aff_mask, &st->orqd->active);
+    o_weight = (cpumask_weight(cpumask_scratch) << aff_shift) / nr_cpus_orq;
+
+    if ( l_weight >= o_weight )
+    {
+        weight = l_weight - o_weight;
+        load = 1;
+    }
+    else
+    {
+        weight = o_weight - l_weight;
+        load = -1;
+    }
+
+    if ( !is_push )
+        weight = -weight;
+
+    load *= (svc->avgload * weight) >> prec_shift;
+
+    /*
+     * Remember that what we are after is actually a modifier. So, for
+     * instance, let our vcpu be v, with load 30%, let's consider pushing
+     * it from runq I, to which it has a degree of soft-affinity of 3/8, to
+     * J, to which it has a degree of soft-affinity of 6/8, and let's say
+     * load of I is 40% and load of J is 15%
+     *
+     * Plain load calculations (i.e., no soft affinity involved) are as
+     * follows:
+     *  - D = abs(40 - 15) = 25
+     *  - consider_load:
+     *    - D' = abs(40 - 30 - (15 + 30)) = abs(10 - 45) = 35
+     *  - D' > D ==> don't push
+     *
+     * And this indeed will be the result returned by consider_load(). Now
+     * we need D_push (let's assume a scaling factor of 1). Following the
+     * code above:
+     *
+     *  - l_weight = 3/8 = 0.375
+     *  - o_weight = 6/8 = 0.75
+     *  - weight = 0.75 - 0.375 = 0.375, load = -1
+     *  - load = -1 * 30 * 0.375 = -11.25
+     *
+     * which, once back in consider() would mean:
+     *  - D = 25
+     *  - D' = 35
+     *  - D_push = -11.25
+     *  - D' + D_push = 35 - 11.25 = 23.75
+     *  - D ' + D_push < D ==> *push*
+     *
+     * which means considering soft-affinity changed the original load
+     * balancer decision, and seems to makes sense, considering that we'd be
+     * moving v from a runq where it only has affinity with 3 vcpus (out of
+     * 8) to one where it has twice as much of that.
+     */
+    ASSERT(load >= 0 ? svc->avgload >= load : svc->avgload >= -load);
+    return load;
+}
+
 static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)
 {
     struct csched2_private *prv = CSCHED2_PRIV(ops);
     int i, max_delta_rqi = -1;
     struct list_head *push_iter, *pull_iter;
-    bool_t inner_load_updated = 0;
+    unsigned int nr_cpus_lrq, nr_cpus_orq;
+    bool_t inner_loop_done_once = 0;
 
     balance_state_t st = { .best_push_svc = NULL, .best_pull_svc = NULL };
 
@@ -1962,15 +2208,13 @@ retry:
         s_time_t load_max;
         int cpus_max;
 
-        
         load_max = st.lrqd->b_avgload;
         if ( st.orqd->b_avgload > load_max )
             load_max = st.orqd->b_avgload;
 
-        cpus_max = cpumask_weight(&st.lrqd->active);
-        i = cpumask_weight(&st.orqd->active);
-        if ( i > cpus_max )
-            cpus_max = i;
+        cpus_max = nr_cpus_lrq = cpumask_weight(&st.lrqd->active);
+        nr_cpus_orq = cpumask_weight(&st.orqd->active);
+        cpus_max = nr_cpus_orq > cpus_max ? nr_cpus_orq : cpus_max;
 
         if ( unlikely(tb_init_done) )
         {
@@ -2033,6 +2277,8 @@ retry:
      * FIXME: O(n^2)! */
 
     /* Reuse load delta (as we're trying to minimize it) */
+    st.load_delta = st.load_delta;
+
     list_for_each( push_iter, &st.lrqd->svc )
     {
         struct csched2_vcpu * push_svc = list_entry(push_iter, struct csched2_vcpu, rqd_elem);
@@ -2042,34 +2288,81 @@ retry:
         if ( !vcpu_is_migrateable(push_svc, st.orqd) )
             continue;
 
+        /*
+         * Soft affinity related code being present here, is the price we pay
+         * to performance. In fact, ideally, this would all go into
+         * consider_soft_aff(). However, that would mean doing all it's done
+         * here (i.e., cpumask stuff in has_soft_affinity() and both cpumask
+         * stuff and medium heavy math in prep_soft_aff_load()), for the *same*
+         * vcpu, 1+nr_vcpus_in_orq times!
+         *
+         * So, yes, it's less beautiful and modular than it could have been,
+         * but this is an hot path, and we can't afford being that beautiful
+         * and modular.
+         */
+        if ( has_soft_affinity(push_svc->vcpu,
+                               push_svc->vcpu->cpu_hard_affinity) )
+        {
+            st.push_has_soft_aff = 1;
+            st.push_soft_aff_load = prep_soft_aff_load(push_svc, &st,
+                                                       nr_cpus_lrq,
+                                                       nr_cpus_orq,
+                                                       prv->load_precision_shift,
+                                                       prv->load_precision_shift - 0,
+                                                       1 /* is a push */);
+        }
+        else
+        {
+            st.push_has_soft_aff = 0;
+            st.push_soft_aff_load = 0;
+        }
+
+        /* Consider push only. */
+        consider(&st, push_svc, NULL);
+
         list_for_each( pull_iter, &st.orqd->svc )
         {
             struct csched2_vcpu * pull_svc = list_entry(pull_iter, struct csched2_vcpu, rqd_elem);
             
-            if ( !inner_load_updated )
+            if ( !inner_loop_done_once )
                 __update_svc_load(ops, pull_svc, 0, now);
-        
+
             if ( !vcpu_is_migrateable(pull_svc, st.lrqd) )
                 continue;
 
-            consider(&st, push_svc, pull_svc);
-        }
-
-        inner_load_updated = 1;
-
-        /* Consider push only */
-        consider(&st, push_svc, NULL);
-    }
+            /*
+             * Same argument as above about modularity. For pulls, it would
+             * be a little less of a problem having stuff done in
+             * consider_soft_aff(), as we'd be "only" doing the same ops on
+             * the same vcpus twice, but:
+             *  - doing them twice is still worse than doing them once;
+             *  - since push side is like this, it's better to be consistent.
+             */
+            if ( has_soft_affinity(pull_svc->vcpu,
+                                   pull_svc->vcpu->cpu_hard_affinity) )
+            {
+                st.pull_has_soft_aff = 1;
+                st.pull_soft_aff_load = prep_soft_aff_load(pull_svc, &st,
+                                                           nr_cpus_lrq,
+                                                           nr_cpus_orq,
+                                                           prv->load_precision_shift,
+                                                           prv->load_precision_shift - 0,
+                                                           0 /* is a pull */);
+            }
+            else
+            {
+                st.pull_has_soft_aff = 0;
+                st.pull_soft_aff_load = 0;
+            }
 
-    list_for_each( pull_iter, &st.orqd->svc )
-    {
-        struct csched2_vcpu * pull_svc = list_entry(pull_iter, struct csched2_vcpu, rqd_elem);
-        
-        if ( !vcpu_is_migrateable(pull_svc, st.lrqd) )
-            continue;
+            /* Consider pull only. */
+            if ( !inner_loop_done_once )
+                consider(&st, NULL, pull_svc);
 
-        /* Consider pull only */
-        consider(&st, NULL, pull_svc);
+            /* Consider both push and pull. */
+            consider(&st, push_svc, pull_svc);
+        }
+        inner_loop_done_once = 1;
     }
 
     /* OK, now we have some candidates; do the moving */


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 20/24] xen: credit2: kick away vcpus not running within their soft-affinity
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (18 preceding siblings ...)
  2016-08-17 17:19 ` [PATCH 19/24] xen: credit2: soft-affinity awareness in load balancing Dario Faggioli
@ 2016-08-17 17:19 ` Dario Faggioli
  2016-08-17 17:20 ` [PATCH 21/24] xen: credit2: optimize runq_candidate() a little bit Dario Faggioli
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:19 UTC (permalink / raw)
  To: xen-devel; +Cc: Anshul Makkar, George Dunlap

If, during scheduling, we realize that the current vcpu
is running outside of its own soft-affinity, it would be
preferable to send it somewhere else.

Of course, that may not be possible, and if we're too
strict, we risk having vcpus sit in runqueues, even if
there are idle pcpus (violating work-conservingness).
In fact, what about there are no pcpus, from the soft
affinity mask of the vcpu in question, where it can
run?

To make sure we don't fall in the above described trap,
only actually de-schedule the vcpu if there are idle and
not already tickled cpus from its soft affinity where it
can run immediately.

If there is (at least one) of such cpus, we let current
be preempted, so that csched2_context_saved() will put
it back in the runq, and runq_tickle() will wake (one
of) the cpu.

If there is not even one, we let current run where it is,
as running outside its soft-affinity is still better than
not running at all.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
---
 xen/common/sched_credit2.c |   34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 3722f46..ab0122b 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -2732,6 +2732,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
      */
     int yield_bias = __test_and_clear_bit(__CSFLAG_vcpu_yield, &scurr->flags) ?
                      CSCHED2_YIELD_BIAS : 0;
+    bool_t cpu_in_soft_aff = 1;
 
     /*
      * Return the current vcpu if it has executed for less than ratelimit.
@@ -2768,8 +2769,37 @@ runq_candidate(struct csched2_runqueue_data *rqd,
         return scurr;
     }
 
-    /* Default to current if runnable, idle otherwise */
-    if ( vcpu_runnable(scurr->vcpu) )
+    if ( !is_idle_vcpu(scurr->vcpu) &&
+         has_soft_affinity(scurr->vcpu, scurr->vcpu->cpu_hard_affinity) )
+    {
+        affinity_balance_cpumask(scurr->vcpu, BALANCE_SOFT_AFFINITY,
+                                 cpumask_scratch);
+        cpu_in_soft_aff = cpumask_test_cpu(cpu, cpumask_scratch);
+        /* Idle and not-tickled cpus from scurr's soft-affinity. */
+        cpumask_and(cpumask_scratch, cpumask_scratch, &rqd->idle);
+        cpumask_andnot(cpumask_scratch, cpumask_scratch, &rqd->tickled);
+    }
+
+    /*
+     * If scurr is runnable, and this cpu is in its soft-affinity, default to
+     * it. We also default to it, even if cpu is not in its soft-affinity, if
+     * there aren't any idle and not tickled cpu in its soft-affinity. In
+     * fact, we don't want to risk leaving scurr in the runq and this cpu idle
+     * only because it running outside of its soft-affinity.
+     *
+     * On the other hand, if cpu is not in scurr's soft-affinity, and there
+     * looks to be better options, go for them. That happens by defaulting to
+     * idle here, which means scurr will be preempted, put back in runq, and
+     * one of those idle and not tickled cpus from its soft affinity will be
+     * tickled to pick it up.
+     *
+     * If scurr does not have a valid soft-affinity, we allow it to continue
+     * run here (that's why cpu_in_soft_aff is initialized to 1).
+     *
+     * Of course, we also default to idle also if scurr is not runnable.
+     */
+    if ( vcpu_runnable(scurr->vcpu) &&
+         (cpu_in_soft_aff || cpumask_empty(cpumask_scratch)) )
         snext = scurr;
     else
         snext = CSCHED2_VCPU(idle_vcpu[cpu]);


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 21/24] xen: credit2: optimize runq_candidate() a little bit
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (19 preceding siblings ...)
  2016-08-17 17:19 ` [PATCH 20/24] xen: credit2: kick away vcpus not running within their soft-affinity Dario Faggioli
@ 2016-08-17 17:20 ` Dario Faggioli
  2016-08-17 17:20 ` [PATCH 22/24] xen: credit2: "relax" CSCHED2_MAX_TIMER Dario Faggioli
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Anshul Makkar, George Dunlap

By factoring into one (at the top) all the checks
to see whether current is the idle vcpu, and mark
it as unlikely().

In fact, if current is idle, all the logic for
dealing with yielding, context switching rate
limiting and soft-affinity, is just pure overhead,
and we better rush checking the runq and pick some
vcpu up.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
---
 xen/common/sched_credit2.c |   18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ab0122b..21b1f91 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -2730,10 +2730,18 @@ runq_candidate(struct csched2_runqueue_data *rqd,
      * if scurr is yielding, when comparing its credits with other vcpus in
      * the runqueue, act like those other vcpus had yield_bias more credits.
      */
-    int yield_bias = __test_and_clear_bit(__CSFLAG_vcpu_yield, &scurr->flags) ?
-                     CSCHED2_YIELD_BIAS : 0;
+    int yield_bias = 0;
     bool_t cpu_in_soft_aff = 1;
 
+    if ( unlikely(is_idle_vcpu(scurr->vcpu)) )
+    {
+        snext = scurr;
+        goto check_runq;
+    }
+
+    if ( __test_and_clear_bit(__CSFLAG_vcpu_yield, &scurr->flags) )
+        yield_bias = CSCHED2_YIELD_BIAS;
+
     /*
      * Return the current vcpu if it has executed for less than ratelimit.
      * Adjuststment for the selected vcpu's credit and decision
@@ -2748,7 +2756,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
      * has been cleared already above.
      */
     if ( !yield_bias &&
-         prv->ratelimit_us && !is_idle_vcpu(scurr->vcpu) &&
+         prv->ratelimit_us &&
          vcpu_runnable(scurr->vcpu) &&
          (now - scurr->vcpu->runstate.state_entry_time) <
           MICROSECS(prv->ratelimit_us) )
@@ -2769,8 +2777,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
         return scurr;
     }
 
-    if ( !is_idle_vcpu(scurr->vcpu) &&
-         has_soft_affinity(scurr->vcpu, scurr->vcpu->cpu_hard_affinity) )
+    if ( has_soft_affinity(scurr->vcpu, scurr->vcpu->cpu_hard_affinity) )
     {
         affinity_balance_cpumask(scurr->vcpu, BALANCE_SOFT_AFFINITY,
                                  cpumask_scratch);
@@ -2804,6 +2811,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     else
         snext = CSCHED2_VCPU(idle_vcpu[cpu]);
 
+ check_runq:
     list_for_each( iter, &rqd->runq )
     {
         struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 22/24] xen: credit2: "relax" CSCHED2_MAX_TIMER
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (20 preceding siblings ...)
  2016-08-17 17:20 ` [PATCH 21/24] xen: credit2: optimize runq_candidate() a little bit Dario Faggioli
@ 2016-08-17 17:20 ` Dario Faggioli
  2016-09-30 15:30   ` George Dunlap
  2016-08-17 17:20 ` [PATCH 23/24] xen: credit2: optimize runq_tickle() a little bit Dario Faggioli
                   ` (4 subsequent siblings)
  26 siblings, 1 reply; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Anshul Makkar, George Dunlap

Credit2 is already event based, rather than tick
based. This means, the time at which the (i+1)-eth
scheduling decision needs to happen is computed
during the i-eth scheduling decision, and a timer
is set accordingly.

If there's nothing imminent (or, the most imminent
event is really really really far away), it is
ok to say "well, let's double-check things in
a little bit anyway", but such 'little bit' does
not need to be too little, as, most likely, it's
just pure overhead.

The current period, for this "safety catch"-alike
timer is 2ms, which indeed is high, but it can
well be higher. In fact, benchmarks show that
setting it to 10ms --combined with other
optimizations-- does actually improve performance.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
---
 xen/common/sched_credit2.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 21b1f91..6963872 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -152,7 +152,7 @@
 /* Reset: Value below which credit will be reset. */
 #define CSCHED2_CREDIT_RESET         0
 /* Max timer: Maximum time a guest can be run for. */
-#define CSCHED2_MAX_TIMER            MILLISECS(2)
+#define CSCHED2_MAX_TIMER            CSCHED2_CREDIT_INIT
 
 
 #define CSCHED2_IDLE_CREDIT                 (-(1<<30))


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 23/24] xen: credit2: optimize runq_tickle() a little bit
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (21 preceding siblings ...)
  2016-08-17 17:20 ` [PATCH 22/24] xen: credit2: "relax" CSCHED2_MAX_TIMER Dario Faggioli
@ 2016-08-17 17:20 ` Dario Faggioli
  2016-09-02 12:38   ` anshul makkar
  2016-08-17 17:20 ` [PATCH 24/24] xen: credit2: try to avoid tickling cpus subject to ratelimiting Dario Faggioli
                   ` (3 subsequent siblings)
  26 siblings, 1 reply; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Anshul Makkar, George Dunlap

By not looking at the same cpu (to check whether
we want to preempt who's running there) twice, if
the vcpu being woken up has both soft and hard
affinity.

In fact, all the cpus that are part of both soft
affinity and hard-affinity (of the waking vcpu)
are checked during the soft-affinity balancing
step. If none turns out to be suitable, e.g.,
because they're running vcpus with higher credits,
there's no point in re-checking them, only to
re-assess the same, during the hard-affinity
step.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
---
 xen/common/sched_credit2.c |   43 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 39 insertions(+), 4 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 6963872..f03ecce 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -997,7 +997,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
     s_time_t lowest = (1<<30);
     unsigned int bs, cpu = new->vcpu->processor;
     struct csched2_runqueue_data *rqd = RQD(ops, cpu);
-    cpumask_t mask;
+    cpumask_t mask, skip_mask;
     struct csched2_vcpu * cur;
 
     ASSERT(new->rqd == rqd);
@@ -1017,6 +1017,13 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
                     (unsigned char *)&d);
     }
 
+    /*
+     * Cpus that end up in this mask, have been already checked during the
+     * soft-affinity step, and need not to be checked again when doing hard
+     * affinity.
+     */
+    cpumask_clear(&skip_mask);
+
     for_each_affinity_balance_step( bs )
     {
         /*
@@ -1073,7 +1080,8 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
         cpumask_andnot(&mask, &rqd->active, &rqd->idle);
         cpumask_andnot(&mask, &mask, &rqd->tickled);
         cpumask_and(&mask, &mask, cpumask_scratch);
-        if ( cpumask_test_cpu(cpu, &mask) )
+        if ( cpumask_test_cpu(cpu, &mask) &&
+             !cpumask_test_cpu(cpu, &skip_mask) )
         {
             cur = CSCHED2_VCPU(curr_on_cpu(cpu));
 
@@ -1102,13 +1110,26 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
                     ipid = cpu;
                     goto tickle;
                 }
+
+                /*
+                 * If we're here, cpu is just not a valid candidate for being
+                 * tickled. Set its bit in skip_mask, to avoid calling
+                 * burn_credits() and check its current vcpu for preemption
+                 * twice.
+                 */
+                __cpumask_set_cpu(cpu, &skip_mask);
             }
         }
 
         for_each_cpu(i, &mask)
         {
-            /* Already looked at this one above */
-            if ( i == cpu )
+            /*
+             * Already looked at these ones above, either because it's the
+             * cpu where new was running before, or because we are at the
+             * hard-affinity step, and we checked this during the
+             * soft-affinity one
+             */
+            if ( i == cpu || cpumask_test_cpu(i, &skip_mask) )
                 continue;
 
             cur = CSCHED2_VCPU(curr_on_cpu(i));
@@ -1139,6 +1160,20 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
                     ipid = i;
                     lowest = cur->credit;
                 }
+
+                /*
+                 * No matter if i is the new lowest or not. We've run
+                 * burn_credits() on it, and we've checked it for preemption.
+                 *
+                 * If we are at soft-affinity balancing step, and i is indeed
+                 * the lowest, it will be tickled (and we exit the function).
+                 * If it is not the lowest among the cpus in the soft-affinity
+                 * mask, it can't be the lowest among the cpus in the hard
+                 * affinity mask (assuming we'll actually do the second
+                 * balancing step), as hard-affinity is a superset of soft
+                 * affinity, and therefore we can flag it to be skipped.
+                 */
+                __cpumask_set_cpu(i, &skip_mask);
             }
         }
 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 24/24] xen: credit2: try to avoid tickling cpus subject to ratelimiting
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (22 preceding siblings ...)
  2016-08-17 17:20 ` [PATCH 23/24] xen: credit2: optimize runq_tickle() a little bit Dario Faggioli
@ 2016-08-17 17:20 ` Dario Faggioli
  2016-08-18  0:11 ` [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 17:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Anshul Makkar, George Dunlap

With context switching ratelimiting enabled, the following
pattern is quite common in a scheduling trace:

     0.000845622 |||||||||||.x||| d32768v12 csched2:runq_insert d0v13, position 0
     0.000845831 |||||||||||.x||| d32768v12 csched2:runq_tickle_new d0v13, processor = 12, credit = 10135529
     0.000846546 |||||||||||.x||| d32768v12 csched2:burn_credits d2v7, credit = 2619231, delta = 255937
 [1] 0.000846739 |||||||||||.x||| d32768v12 csched2:runq_tickle cpu 12
     [...]
 [2] 0.000850597 ||||||||||||x||| d32768v12 csched2:schedule cpu 12, rq# 1, busy, SMT busy, tickled
     0.000850760 ||||||||||||x||| d32768v12 csched2:burn_credits d2v7, credit = 2614028, delta = 5203
 [3] 0.000851022 ||||||||||||x||| d32768v12 csched2:ratelimit triggered
 [4] 0.000851614 ||||||||||||x||| d32768v12 runstate_continue d2v7 running->running

Basically, what happens is that runq_tickle() realizes
d0v13 should preempt d2v7, running on cpu 12, as it
has higher credits (10135529 vs. 2619231). It therefore
tickles cpu 12 [1], which, in turn, schedules [2].

But --surprise surprise-- d2v7 has run for less than the
ratelimit interval [3], and hence it is _not_ preempted,
and continues to run. This indeed looks fine. Actually,
this is what ratelimiting is there for. Note, however,
that:
 1) we interrupted cpu 12 for nothing;
 2) what if, say on cpu 8, there is a vcpu that has:
    + less credit than d0v13 (so d0v13 can well
      preempt it),
    + more credit than d2v7 (that's why it was not
      selected to be preempted),
    + run for more than the ratelimiting interval
      (so it can really be scheduled out)?

This patch tries to figure out whether the situation
is the one described at 2) and, if it is, tickles 8 (in
the example above) instead of 12.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
---
 xen/common/sched_credit2.c |   31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index f03ecce..3bb764d 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -146,6 +146,8 @@
 #define CSCHED2_MIGRATE_RESIST       ((opt_migrate_resist)*MICROSECS(1))
 /* How much to "compensate" a vcpu for L2 migration */
 #define CSCHED2_MIGRATE_COMPENSATION MICROSECS(50)
+/* How tolerant we should be when peeking at runtime of vcpus on other cpus */
+#define CSCHED2_RATELIMIT_TICKLE_TOLERANCE MICROSECS(50)
 /* How big of a bias we should have against a yielding vcpu */
 #define CSCHED2_YIELD_BIAS           ((opt_yield_bias)*MICROSECS(1))
 #define CSCHED2_YIELD_BIAS_MIN       CSCHED2_MIN_TIMER
@@ -972,6 +974,27 @@ static inline bool_t soft_aff_check_preempt(unsigned int bs, unsigned int cpu)
     return !cpumask_test_cpu(cpu, cpumask_scratch);
 }
 
+/*
+ * What we want to know is whether svc, which we assume to be running on some
+ * pcpu, can be interrupted and preempted. So fat, the only reason because of
+ * which a preemption would be deferred is context switch ratelimiting, so
+ * check for that.
+ *
+ * Use a caller provided value of ratelimit, instead of the scheduler's own
+ * prv->ratelimit_us so the caller can play some tricks, if he wants (which,
+ * as a matter of fact, he does, by applying the tolerance).
+ */
+static inline bool_t is_preemptable(const struct csched2_vcpu *svc,
+                                    s_time_t now, s_time_t ratelimit)
+{
+    s_time_t runtime;
+
+    ASSERT(svc->vcpu->is_running);
+    runtime = now - svc->vcpu->runstate.state_entry_time;
+
+    return runtime > ratelimit;
+}
+
 void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_vcpu *, s_time_t);
 
 /*
@@ -997,6 +1020,8 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
     s_time_t lowest = (1<<30);
     unsigned int bs, cpu = new->vcpu->processor;
     struct csched2_runqueue_data *rqd = RQD(ops, cpu);
+    s_time_t ratelimit = MICROSECS(CSCHED2_PRIV(ops)->ratelimit_us) -
+                         CSCHED2_RATELIMIT_TICKLE_TOLERANCE;
     cpumask_t mask, skip_mask;
     struct csched2_vcpu * cur;
 
@@ -1104,7 +1129,8 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
                                 (unsigned char *)&d);
                 }
 
-                if ( cur->credit < new->credit )
+                if ( cur->credit < new->credit &&
+                     is_preemptable(cur, now, ratelimit) )
                 {
                     SCHED_STAT_CRANK(tickled_busy_cpu);
                     ipid = cpu;
@@ -1155,7 +1181,8 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
                                 (unsigned char *)&d);
                 }
 
-                if ( cur->credit < lowest )
+                if ( cur->credit < lowest &&
+                     is_preemptable(cur, now, ratelimit) )
                 {
                     ipid = i;
                     lowest = cur->credit;


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* Re: [PATCH 02/24] xen: credit1: fix mask to be used for tickling in Credit1
  2016-08-17 17:17 ` [PATCH 02/24] xen: credit1: fix mask to be used for tickling in Credit1 Dario Faggioli
@ 2016-08-17 23:42   ` Dario Faggioli
  2016-09-12 15:04     ` George Dunlap
  0 siblings, 1 reply; 84+ messages in thread
From: Dario Faggioli @ 2016-08-17 23:42 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Anshul Makkar, David Vrabel


[-- Attachment #1.1: Type: text/plain, Size: 2637 bytes --]

On Wed, 2016-08-17 at 19:17 +0200, Dario Faggioli wrote:
> If there are idle pcpus inside the waking vcpu's
> soft-affinity mask, we should really tickle one
> of them (this is one of the purposes of the
> __runq_tickle() function itself!), not just
> any idle pcpu.
> 
> The issue has been introduced in 02ea5031825d
> ("credit1: properly deal with pCPUs not in any cpupool"),
> where the usage of idle_mask is changed, without
> updating the bottom of the function, where it
> is also referenced.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@citrix.eu.com>
>
Oops!

Sorry George, got your email address wrong for this one.
Dario

> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> ---
> David, a while ago you asked what could have been that was causing
> awful
> results for Credit1, for CPU bound workloads, in the overloaded
> scenario of one
> of my benchmarks. I think the bug fixed either here or in next patch
> (but I'd
> be rather sure it's this one) is where the problem was. :-)
> ---
>  xen/common/sched_credit.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index 6eccf09..3d4f223 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -454,11 +454,12 @@ static inline void __runq_tickle(struct
> csched_vcpu *new)
>                  if ( opt_tickle_one_idle )
>                  {
>                      this_cpu(last_tickle_cpu) =
> -                        cpumask_cycle(this_cpu(last_tickle_cpu),
> &idle_mask);
> +                        cpumask_cycle(this_cpu(last_tickle_cpu),
> +                                      cpumask_scratch_cpu(cpu));
>                      __cpumask_set_cpu(this_cpu(last_tickle_cpu),
> &mask);
>                  }
>                  else
> -                    cpumask_or(&mask, &mask, &idle_mask);
> +                    cpumask_or(&mask, &mask,
> cpumask_scratch_cpu(cpu));
>              }
>  
>              /* Did we find anyone? */
> 
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2!
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (23 preceding siblings ...)
  2016-08-17 17:20 ` [PATCH 24/24] xen: credit2: try to avoid tickling cpus subject to ratelimiting Dario Faggioli
@ 2016-08-18  0:11 ` Dario Faggioli
  2016-08-18 11:49 ` Dario Faggioli
  2016-08-18 11:53 ` Dario Faggioli
  26 siblings, 0 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-08-18  0:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Andrew Cooper, Anshul Makkar, Ian Jackson,
	George Dunlap, David Vrabel, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 3088 bytes --]

On Wed, 2016-08-17 at 19:17 +0200, Dario Faggioli wrote:
> The last 4 patches, still for Credit2, are optimizations, either wrt
> existing
> code, or wrt new code introduced in this series. I've chosen to keep
> them
> separate to make reviewing/understanding new code easier. In fact,
> although
> they look pretty simple, the soft-affinity code was pretty complex
> already, and
> even these simple optimization, if done all at once, would have made
> the
> reviewer's life (unnecessary) tougher.
> 
About this.

I've run the benchmarks with and without these performance optimization
patches, in order to assess their effect as good as I could.

The baseline on top of which I was applying the series is different
from the one used to produce the other numbers reported in the cover
letter, so what's shown there and what I show here is not directly
comparable (but that's not a problem).

Given the nature of the improvements, I've run more iterations of each
configuration of the benchmarks (i.e., 15 iterations instead of 5) to
get more stable results.

Here's my findings:

++++++++++++++++++++++++++++++++++
|    CREDIT1, for reference      |
++++++++++++++++++++++++++++++++++
|               | MAKEXEN IPERF  |
|---------------|----------------|
|no dom0 load   | 28.353  11.793 |
|with dom0 load | 43.955  10.932*|
++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++
|   CREDIT2, until patch 20      |
++++++++++++++++++++++++++++++++++
|               | MAKEXEN IPERF  |
|---------------| ---------------|
|no dom0 load   | 28.367  11.716 |
|with dom0 load | 40.591  10.645 |
++++++++++++++++++++++++++++++++++
|+++++++++++++++++++++++++++++++++
|    CREDIT2, full series        |
++++++++++++++++++++++++++++++++++
|               | MAKEXEN IPERF  |
|---------------|----------------|
|no dom0 load   | 27.597* 12.059*|
|with dom0 load | 39.706* 10.609 |
|--------------------------------|

 * marks the best results

So:
 - apart from a glitch on "IPERF with dom0 load", Credit2 with the 
   full series applied is confirmed to be the best. About the glitch:
    - wrt the fact that Credit1 is better, we also have other evidences
      that network throughput could be a bit of a weak spot of Credit2
      versus Credit1 so far (although, we have to admit, they're
      pretty close), and we already have ideas on how to try improve
      the situation;
    - wrt the role played by optimization patches, well, results are
      basically the same.
 - The performance optimization patches do have an (positive!) impact.
 - In case of "no dom0 load, it's actually thanks to the optimization
   patches that Credit2 beats Credit1.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 08/24] xen: tracing: add trace records for schedule and rate-limiting.
  2016-08-17 17:18 ` [PATCH 08/24] xen: tracing: add trace records for schedule and rate-limiting Dario Faggioli
@ 2016-08-18  0:57   ` Meng Xu
  2016-08-18  9:41     ` Dario Faggioli
  2016-09-20 13:50   ` George Dunlap
  1 sibling, 1 reply; 84+ messages in thread
From: Meng Xu @ 2016-08-18  0:57 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Anshul Makkar

On Wed, Aug 17, 2016 at 1:18 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
>
> As far as {csched, csched2, rt}_schedule() are concerned,
> an "empty" event, would already make it easier to read and
> understand a trace.
>
> But while there, add a few useful information, like
> if the cpu that is going through the scheduler has
> been tickled or not, if it is currently idle, etc
> (they vary, on a per-scheduler basis).
>
> For Credit1 and Credit2, add a record about when
> rate-limiting kicks in too.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Meng Xu <mengxu@cis.upenn.edu>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> ---
>  xen/common/sched_credit.c  |    7 +++++++
>  xen/common/sched_credit2.c |   38 +++++++++++++++++++++++++++++++++++++-
>  xen/common/sched_rt.c      |   15 +++++++++++++++
>  3 files changed, 59 insertions(+), 1 deletion(-)
>


> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 41c61a7..903dbd8 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -160,6 +160,7 @@
>  #define TRC_RTDS_BUDGET_BURN      TRC_SCHED_CLASS_EVT(RTDS, 3)
>  #define TRC_RTDS_BUDGET_REPLENISH TRC_SCHED_CLASS_EVT(RTDS, 4)
>  #define TRC_RTDS_SCHED_TASKLET    TRC_SCHED_CLASS_EVT(RTDS, 5)
> +#define TRC_RTDS_SCHEDULE         TRC_SCHED_CLASS_EVT(RTDS, 6)
>
>  static void repl_timer_handler(void *data);
>
> @@ -1035,6 +1036,20 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
>      struct rt_vcpu *snext = NULL;
>      struct task_slice ret = { .migrated = 0 };
>
> +    /* TRACE */
> +    {
> +        struct __packed {
> +            unsigned cpu:17, tasklet:8, tickled:4, idle:4;
> +        } d;
> +        d.cpu = cpu;
> +        d.tasklet = tasklet_work_scheduled;
> +        d.tickled = cpumask_test_cpu(cpu, &prv->tickled);
> +        d.idle = is_idle_vcpu(current);
> +        __trace_var(TRC_RTDS_SCHEDULE, 1,
> +                    sizeof(d),
> +                    (unsigned char *)&d);
> +    }

I have two questions here:
1) IMHO, the trace should be wrapped by the if (
unlikely(tb_init_done) ) {} statement as done in sched_credit2.c.
Otherwise, we always enable the tracing which may hurt the
performance, I think.

2) Why does the cpu field here use 17 bits instead of 16 bits as used
in credit2?
This may be a typo I guess (since you are trying to align the
structure to 32 bits I guess ;-) )?
In addition, I'm wondering if uint16 is better than unsigned? I'm not
that confident if unsigned type will always have 16 bits on all types
of machines?

Thanks,

Meng

------------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 08/24] xen: tracing: add trace records for schedule and rate-limiting.
  2016-08-18  0:57   ` Meng Xu
@ 2016-08-18  9:41     ` Dario Faggioli
  0 siblings, 0 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-08-18  9:41 UTC (permalink / raw)
  To: Meng Xu; +Cc: George Dunlap, xen-devel, Anshul Makkar


[-- Attachment #1.1: Type: text/plain, Size: 2655 bytes --]

On Wed, 2016-08-17 at 20:57 -0400, Meng Xu wrote:
> On Wed, Aug 17, 2016 at 1:18 PM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:
> > 
> > diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> > index 41c61a7..903dbd8 100644
> > --- a/xen/common/sched_rt.c
> > +++ b/xen/common/sched_rt.c
> > 
> > @@ -1035,6 +1036,20 @@ rt_schedule(const struct scheduler *ops,
> > s_time_t now, bool_t tasklet_work_sched
> >      struct rt_vcpu *snext = NULL;
> >      struct task_slice ret = { .migrated = 0 };
> > 
> > +    /* TRACE */
> > +    {
> > +        struct __packed {
> > +            unsigned cpu:17, tasklet:8, tickled:4, idle:4;
> > +        } d;
> > +        d.cpu = cpu;
> > +        d.tasklet = tasklet_work_scheduled;
> > +        d.tickled = cpumask_test_cpu(cpu, &prv->tickled);
> > +        d.idle = is_idle_vcpu(current);
> > +        __trace_var(TRC_RTDS_SCHEDULE, 1,
> > +                    sizeof(d),
> > +                    (unsigned char *)&d);
> > +    }
>

> 1) IMHO, the trace should be wrapped by the if (
> unlikely(tb_init_done) ) {} statement as done in sched_credit2.c.
> Otherwise, we always enable the tracing which may hurt the
> performance, I think.
> 
You're right. Actually, in order to follow suite from sched_rt.c, I
think using trace_var() instead of __trace_var() is what we want.

Then, I think it will be a good thing, at some point, to convert all
these /*TRACE*/ blocks to extrapolate the tb_init_done check and make
it guard the trace record marshalling, like I did a few patches ago for
Credit2.... but that's another patch.

This is a cut-&-paste mistake, good job noticing it. :-)

> 2) Why does the cpu field here use 17 bits instead of 16 bits as used
> in credit2?
> This may be a typo I guess (since you are trying to align the
> structure to 32 bits I guess ;-) )?
>
Wow... 17.. I must have been drunk when writing that! :-O

> In addition, I'm wondering if uint16 is better than unsigned? I'm not
> that confident if unsigned type will always have 16 bits on all types
> of machines?
> 
Yeah, well, TBH, all this bitfields and packing, etc., is something I
truly hate. But I think in this case unsigned is fine.

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2!
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (24 preceding siblings ...)
  2016-08-18  0:11 ` [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
@ 2016-08-18 11:49 ` Dario Faggioli
  2016-08-18 11:53 ` Dario Faggioli
  26 siblings, 0 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-08-18 11:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Andrew Cooper, Anshul Makkar, Ian Jackson,
	George Dunlap, David Vrabel, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 1053 bytes --]

On Wed, 2016-08-17 at 19:17 +0200, Dario Faggioli wrote:
> Hi everyone,
> 
> Here's another rather big scheduler-related series. The most of the
> content (as
> usual, lately) is about Credit2, but there are other things, about
> tracing and
> also about Credit1. In fact, this patch series introduces soft-
> affinity support
> for Credit2.
> 
And here's a git branch with the series applied:

git://xenbits.xen.org/people/dariof/xen.git  rel/sched/misc-credit1-credit2-plus-credit2-softaff
http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/rel/sched/misc-credit1-credit2-plus-credit2-softaff

https://github.com/fdario/xen/tree/rel/sched/misc-credit1-credit2-plus-credit2-softaff

https://travis-ci.org/fdario/xen/builds/153230352

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2!
  2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
                   ` (25 preceding siblings ...)
  2016-08-18 11:49 ` Dario Faggioli
@ 2016-08-18 11:53 ` Dario Faggioli
  26 siblings, 0 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-08-18 11:53 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, George Dunlap, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 705 bytes --]

And finally, Jan,

On Wed, 2016-08-17 at 19:17 +0200, Dario Faggioli wrote:
>
I think that these three:

> Dario Faggioli (24):
> 
>       xen: credit1: fix mask to be used for tickling in Credit1
>       xen: credit1: return the 'time remaining to the limit' as next timeslice.
>       xen: credit2: properly schedule migration of a running vcpu.
>
Are (in due course, obviously) backport candidates.

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 14/24] libxl: allow to set the ratelimit value online for Credit2
  2016-08-17 17:19 ` [PATCH 14/24] libxl: allow to set the ratelimit value online for Credit2 Dario Faggioli
@ 2016-08-22  9:21   ` Ian Jackson
  2016-09-05 14:02     ` Dario Faggioli
  2016-08-22  9:28   ` Ian Jackson
  2016-09-28 15:39   ` George Dunlap
  2 siblings, 1 reply; 84+ messages in thread
From: Ian Jackson @ 2016-08-22  9:21 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Wei Liu

Dario Faggioli writes ("[PATCH 14/24] libxl: allow to set the ratelimit value online for Credit2"):
> This is the remaining part of the plumbing (the libxl
> one) necessary to be able to change the value of the
> ratelimit_us parameter online, for Credit2 (like it is
> already for Credit1).

I think this should have a HAVE #define.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 14/24] libxl: allow to set the ratelimit value online for Credit2
  2016-08-17 17:19 ` [PATCH 14/24] libxl: allow to set the ratelimit value online for Credit2 Dario Faggioli
  2016-08-22  9:21   ` Ian Jackson
@ 2016-08-22  9:28   ` Ian Jackson
  2016-09-28 15:37     ` George Dunlap
  2016-09-30  1:03     ` Dario Faggioli
  2016-09-28 15:39   ` George Dunlap
  2 siblings, 2 replies; 84+ messages in thread
From: Ian Jackson @ 2016-08-22  9:28 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Wei Liu

Dario Faggioli writes ("[PATCH 14/24] libxl: allow to set the ratelimit value online for Credit2"):
...
> -    rc = xc_sched_credit_params_set(ctx->xch, poolid, &sparam);
> -    if ( rc < 0 ) {
> -        LOGE(ERROR, "setting sched credit param");
> -        GC_FREE;
> -        return ERROR_FAIL;
> +    r = xc_sched_credit_params_set(ctx->xch, poolid, &sparam);
> +    if ( r < 0 ) {
> +        LOGE(ERROR, "Setting Credit scheduler parameters");
> +        rc = ERROR_FAIL;
> +        goto out;

I had to read this three times to figure out what the change was.

It is good that you are fixing the coding style but can you please put
it in a separate patch ?

In general I was surprised at how large this patch, and the
corresponding xl one, was.  Perhaps after the coding style fixes are
split off the functional patches will be smaller.

But I wonder whether there will still be lots of rather formulaic code
that could profitably be generalised somehow.  I'd appreciate your
views on whether that would be possible, and whether it would be a
good idea..

For now I will stop my review here.

Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 05/24] xen: credit2: make tickling more deterministic
  2016-08-17 17:18 ` [PATCH 05/24] xen: credit2: make tickling more deterministic Dario Faggioli
@ 2016-08-31 17:10   ` anshul makkar
  2016-09-05 13:47     ` Dario Faggioli
  2016-09-13 11:28   ` George Dunlap
  1 sibling, 1 reply; 84+ messages in thread
From: anshul makkar @ 2016-08-31 17:10 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Andrew Cooper, George Dunlap, Jan Beulich

On 17/08/16 18:18, Dario Faggioli wrote:
> Right now, the following scenario can occurr:
>   - upon vcpu v wakeup, v itself is put in the runqueue,
>     and pcpu X is tickled;
>   - pcpu Y schedules (for whatever reason), sees v in
>     the runqueue and picks it up.
>
> This may seem ok (or even a good thing), but it's not.
> In fact, if runq_tickle() decided X is where v should
> run, it did it for a reason (load distribution, SMT
> support, cache hotness, affinity, etc), and we really
> should try as hard as possible to stick to that.
>
> Of course, we can't be too strict, or we risk leaving
> vcpus in the runqueue while there is available CPU
> capacity. So, we only leave v in runqueue --for X to
> pick it up-- if we see that X has been tickled and
> has not scheduled yet, i.e., it will have a real chance
> of actually select and schedule v.
>
> If that is not the case, we schedule it on Y (or, at
> least, we consider that), as running somewhere non-ideal
> is better than not running at all.
>
> The commit also adds performance counters for each of
> the possible situations.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>   xen/common/sched_credit2.c   |   65 +++++++++++++++++++++++++++++++++++++++---
>   xen/include/xen/perfc_defn.h |    3 ++
>   2 files changed, 64 insertions(+), 4 deletions(-)
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 12dfd20..a3d7beb 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -54,6 +54,7 @@
>   #define TRC_CSCHED2_LOAD_CHECK       TRC_SCHED_CLASS_EVT(CSCHED2, 16)
>   #define TRC_CSCHED2_LOAD_BALANCE     TRC_SCHED_CLASS_EVT(CSCHED2, 17)
>   #define TRC_CSCHED2_PICKED_CPU       TRC_SCHED_CLASS_EVT(CSCHED2, 19)
> +#define TRC_CSCHED2_RUNQ_CANDIDATE   TRC_SCHED_CLASS_EVT(CSCHED2, 20)
>
>   /*
>    * WARNING: This is still in an experimental phase.  Status and work can be found at the
> @@ -398,6 +399,7 @@ struct csched2_vcpu {
>       int credit;
>       s_time_t start_time; /* When we were scheduled (used for credit) */
>       unsigned flags;      /* 16 bits doesn't seem to play well with clear_bit() */
> +    int tickled_cpu;     /* cpu tickled for picking us up (-1 if none) */
>
>       /* Individual contribution to load */
>       s_time_t load_last_update;  /* Last time average was updated */
> @@ -1049,6 +1051,10 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>       __cpumask_set_cpu(ipid, &rqd->tickled);
>       smt_idle_mask_clear(ipid, &rqd->smt_idle);
>       cpu_raise_softirq(ipid, SCHEDULE_SOFTIRQ);
> +
> +    if ( unlikely(new->tickled_cpu != -1) )
> +        SCHED_STAT_CRANK(tickled_cpu_overwritten);
> +    new->tickled_cpu = ipid;
>   }
>
>   /*
> @@ -1266,6 +1272,7 @@ csched2_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
>           ASSERT(svc->sdom != NULL);
>           svc->credit = CSCHED2_CREDIT_INIT;
>           svc->weight = svc->sdom->weight;
> +        svc->tickled_cpu = -1;
>           /* Starting load of 50% */
>           svc->avgload = 1ULL << (CSCHED2_PRIV(ops)->load_precision_shift - 1);
>           svc->load_last_update = NOW() >> LOADAVG_GRANULARITY_SHIFT;
> @@ -1273,6 +1280,7 @@ csched2_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
>       else
>       {
>           ASSERT(svc->sdom == NULL);
> +        svc->tickled_cpu = svc->vcpu->vcpu_id;
If I understood correctly, tickled_cpu refers to pcpu and not a vcpu. 
Saving vcpu_id in tickled_cpu looks wrong.

>           svc->credit = CSCHED2_IDLE_CREDIT;
>           svc->weight = 0;
>       }
> @@ -2233,7 +2241,8 @@ void __dump_execstate(void *unused);
>   static struct csched2_vcpu *
>   runq_candidate(struct csched2_runqueue_data *rqd,
>                  struct csched2_vcpu *scurr,
> -               int cpu, s_time_t now)
> +               int cpu, s_time_t now,
> +               unsigned int *pos)
>   {
>       struct list_head *iter;
>       struct csched2_vcpu *snext = NULL;
> @@ -2262,13 +2271,29 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>
>           /* Only consider vcpus that are allowed to run on this processor. */
>           if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
> +        {
> +            (*pos)++;
>               continue;
> +        }
> +
> +        /*
> +         * If a vcpu is meant to be picked up by another processor, and such
> +         * processor has not scheduled yet, leave it in the runqueue for him.
> +         */
> +        if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
> +             cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
> +        {
> +            (*pos)++;
> +            SCHED_STAT_CRANK(deferred_to_tickled_cpu);
> +            continue;
> +        }
>
>           /* If this is on a different processor, don't pull it unless
>            * its credit is at least CSCHED2_MIGRATE_RESIST higher. */
>           if ( svc->vcpu->processor != cpu
>                && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
>           {
> +            (*pos)++;
>               SCHED_STAT_CRANK(migrate_resisted);
>               continue;
>           }
> @@ -2280,9 +2305,26 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>
>           /* In any case, if we got this far, break. */
>           break;
> +    }
>
> +    if ( unlikely(tb_init_done) )
> +    {
> +        struct {
> +            unsigned vcpu:16, dom:16;
> +            unsigned tickled_cpu, position;
> +        } d;
> +        d.dom = snext->vcpu->domain->domain_id;
> +        d.vcpu = snext->vcpu->vcpu_id;
> +        d.tickled_cpu = snext->tickled_cpu;
> +        d.position = *pos;
> +        __trace_var(TRC_CSCHED2_RUNQ_CANDIDATE, 1,
> +                    sizeof(d),
> +                    (unsigned char *)&d);
>       }
>
> +    if ( unlikely(snext->tickled_cpu != -1 && snext->tickled_cpu != cpu) )
> +        SCHED_STAT_CRANK(tickled_cpu_overridden);
> +
>       return snext;
>   }
>
> @@ -2298,6 +2340,7 @@ csched2_schedule(
>       struct csched2_runqueue_data *rqd;
>       struct csched2_vcpu * const scurr = CSCHED2_VCPU(current);
>       struct csched2_vcpu *snext = NULL;
> +    unsigned int snext_pos = 0;
>       struct task_slice ret;
>
>       SCHED_STAT_CRANK(schedule);
> @@ -2347,7 +2390,7 @@ csched2_schedule(
>           snext = CSCHED2_VCPU(idle_vcpu[cpu]);
>       }
>       else
> -        snext=runq_candidate(rqd, scurr, cpu, now);
> +        snext = runq_candidate(rqd, scurr, cpu, now, &snext_pos);
>
>       /* If switching from a non-idle runnable vcpu, put it
>        * back on the runqueue. */
> @@ -2371,8 +2414,21 @@ csched2_schedule(
>               __set_bit(__CSFLAG_scheduled, &snext->flags);
>           }
>
> -        /* Check for the reset condition */
> -        if ( snext->credit <= CSCHED2_CREDIT_RESET )
> +        /*
> +         * The reset condition is "has a scheduler epoch come to an end?".
> +         * The way this is enforced is checking whether the vcpu at the top
> +         * of the runqueue has negative credits. This means the epochs have
> +         * variable lenght, as in one epoch expores when:
> +         *  1) the vcpu at the top of the runqueue has executed for
> +         *     around 10 ms (with default parameters);
> +         *  2) no other vcpu with higher credits wants to run.
> +         *
> +         * Here, where we want to check for reset, we need to make sure the
> +         * proper vcpu is being used. In fact, runqueue_candidate() may have
> +         * not returned the first vcpu in the runqueue, for various reasons
> +         * (e.g., affinity). Only trigger a reset when it does.
> +         */
> +        if ( snext_pos == 0 && snext->credit <= CSCHED2_CREDIT_RESET )
>           {
>               reset_credit(ops, cpu, now, snext);
>               balance_load(ops, cpu, now);
> @@ -2386,6 +2442,7 @@ csched2_schedule(
>           }
>
>           snext->start_time = now;
> +        snext->tickled_cpu = -1;
>
>           /* Safe because lock for old processor is held */
>           if ( snext->vcpu->processor != cpu )
> diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
> index a336c71..4a835b8 100644
> --- a/xen/include/xen/perfc_defn.h
> +++ b/xen/include/xen/perfc_defn.h
> @@ -66,6 +66,9 @@ PERFCOUNTER(runtime_max_timer,      "csched2: runtime_max_timer")
>   PERFCOUNTER(migrated,               "csched2: migrated")
>   PERFCOUNTER(migrate_resisted,       "csched2: migrate_resisted")
>   PERFCOUNTER(credit_reset,           "csched2: credit_reset")
> +PERFCOUNTER(deferred_to_tickled_cpu,"csched2: deferred_to_tickled_cpu")
> +PERFCOUNTER(tickled_cpu_overwritten,"csched2: tickled_cpu_overwritten")
> +PERFCOUNTER(tickled_cpu_overridden, "csched2: tickled_cpu_overridden")
>
>   PERFCOUNTER(need_flush_tlb_flush,   "PG_need_flush tlb flushes")
>
>Anshul


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 17/24] xen: credit2: soft-affinity awareness in runq_tickle()
  2016-08-17 17:19 ` [PATCH 17/24] xen: credit2: soft-affinity awareness in runq_tickle() Dario Faggioli
@ 2016-09-01 10:52   ` anshul makkar
  2016-09-05 14:55     ` Dario Faggioli
  2016-09-28 20:44   ` George Dunlap
  1 sibling, 1 reply; 84+ messages in thread
From: anshul makkar @ 2016-09-01 10:52 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Justin T. Weaver, George Dunlap

On 17/08/16 18:19, Dario Faggioli wrote:
> This is done by means of the "usual" two steps loop:
>   - soft affinity balance step;
>   - hard affinity balance step.
>
> The entire logic implemented in runq_tickle() is
> applied, during the first step, considering only the
> CPUs in the vcpu's soft affinity. In the second step,
> we fall back to use all the CPUs from its hard
> affinity (as it is doing now, without this patch).
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> ---
>   xen/common/sched_credit2.c |  243 ++++++++++++++++++++++++++++----------------
>   1 file changed, 157 insertions(+), 86 deletions(-)
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 0d83bd7..3aef1b4 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -902,6 +902,42 @@ __runq_remove(struct csched2_vcpu *svc)
>       list_del_init(&svc->runq_elem);
>   }
>
> +/*
> + * During the soft-affinity step, only actually preempt someone if
> + * he does not have soft-affinity with cpu (while we have).
> + *
> + * BEWARE that this uses cpumask_scratch, trowing away what's in there!
Typo:* BEWARE that this uses cpumask_scratch, throwing away what's in there!

> + */
> +static inline bool_t soft_aff_check_preempt(unsigned int bs, unsigned int cpu)
> +{
> +    struct csched2_vcpu * cur = CSCHED2_VCPU(curr_on_cpu(cpu));
> +
> +    /*
> +     * If we're doing hard-affinity, always check whether to preempt cur.
> +     * If we're doing soft-affinity, but cur doesn't have one, check as well.
> +     */
> +    if ( bs == BALANCE_HARD_AFFINITY ||
> +         !has_soft_affinity(cur->vcpu, cur->vcpu->cpu_hard_affinity) )
> +        return 1;
> +
> +    /*
> +     * We're doing soft-affinity, and we know that the current vcpu on cpu
> +     * has a soft affinity. We now want to know whether cpu itself is in
Please can you explain the above statment. If the vcpu has soft affinity 
and its currently executing, doesn;t it always means that its running on 
one of the pcpu which is there in its soft affinity or hard affinity?
> +     * such affinity. In fact, since we now that new (in runq_tickle()) is
Typo:   * such affinity. In fact, since now we know that new (in 
runq_tickle()) is
> +     *  - if cpu is not in cur's soft-affinity, we should indeed check to
> +     *    see whether new should preempt cur. If that will be the case, that
> +     *    would be an improvement wrt respecting soft affinity;
> +     *  - if cpu is in cur's soft-affinity, we leave it alone and (in
> +     *    runq_tickle()) move on to another cpu. In fact, we don't want to
> +     *    be too harsh with someone which is running within its soft-affinity.
> +     *    This is safe because later, if we don't fine anyone else during the
> +     *    soft-affinity step, we will check cpu for preemption anyway, when
> +     *    doing hard-affinity.
> +     */
> +    affinity_balance_cpumask(cur->vcpu, BALANCE_SOFT_AFFINITY, cpumask_scratch);
> +    return !cpumask_test_cpu(cpu, cpumask_scratch);
> +}
> +
>   void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_vcpu *, s_time_t);
>
>   /*
> @@ -925,7 +961,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>   {
>       int i, ipid = -1;
>       s_time_t lowest = (1<<30);
> -    unsigned int cpu = new->vcpu->processor;
> +    unsigned int bs, cpu = new->vcpu->processor;
>       struct csched2_runqueue_data *rqd = RQD(ops, cpu);
>       cpumask_t mask;
>       struct csched2_vcpu * cur;
> @@ -947,109 +983,144 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>                       (unsigned char *)&d);
>       }
>
> -    /*
> -     * First of all, consider idle cpus, checking if we can just
> -     * re-use the pcpu where we were running before.
> -     *
> -     * If there are cores where all the siblings are idle, consider
> -     * them first, honoring whatever the spreading-vs-consolidation
> -     * SMT policy wants us to do.
> -     */
> -    if ( unlikely(sched_smt_power_savings) )
> -        cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
> -    else
> -        cpumask_copy(&mask, &rqd->smt_idle);
> -    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
> -    i = cpumask_test_or_cycle(cpu, &mask);
> -    if ( i < nr_cpu_ids )
> +    for_each_affinity_balance_step( bs )
>       {
> -        SCHED_STAT_CRANK(tickled_idle_cpu);
> -        ipid = i;
> -        goto tickle;
> -    }
> +        /*
> +         * First things first: if we are at the first (soft affinity) step,
> +         * but new doesn't have a soft affinity, skip this step.
> +         */
> +        if ( bs == BALANCE_SOFT_AFFINITY &&
> +             !has_soft_affinity(new->vcpu, new->vcpu->cpu_hard_affinity) )
> +            continue;
>
> -    /*
> -     * If there are no fully idle cores, check all idlers, after
> -     * having filtered out pcpus that have been tickled but haven't
> -     * gone through the scheduler yet.
> -     */
> -    cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
> -    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
> -    i = cpumask_test_or_cycle(cpu, &mask);
> -    if ( i < nr_cpu_ids )
> -    {
> -        SCHED_STAT_CRANK(tickled_idle_cpu);
> -        ipid = i;
> -        goto tickle;
> -    }
> +        affinity_balance_cpumask(new->vcpu, bs, cpumask_scratch);
>
> -    /*
> -     * Otherwise, look for the non-idle (and non-tickled) processors with
> -     * the lowest credit, among the ones new is allowed to run on. Again,
> -     * the cpu were it was running on would be the best candidate.
> -     */
> -    cpumask_andnot(&mask, &rqd->active, &rqd->idle);
> -    cpumask_andnot(&mask, &mask, &rqd->tickled);
> -    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
> -    if ( cpumask_test_cpu(cpu, &mask) )
> -    {
> -        cur = CSCHED2_VCPU(curr_on_cpu(cpu));
> -        burn_credits(rqd, cur, now);
> +        /*
> +         * First of all, consider idle cpus, checking if we can just
> +         * re-use the pcpu where we were running before.
> +         *
> +         * If there are cores where all the siblings are idle, consider
> +         * them first, honoring whatever the spreading-vs-consolidation
> +         * SMT policy wants us to do.
> +         */
> +        if ( unlikely(sched_smt_power_savings) )
> +            cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
> +        else
> +            cpumask_copy(&mask, &rqd->smt_idle);
> +        cpumask_and(&mask, &mask, cpumask_scratch);
> +        i = cpumask_test_or_cycle(cpu, &mask);
> +        if ( i < nr_cpu_ids )
> +        {
> +            SCHED_STAT_CRANK(tickled_idle_cpu);
> +            ipid = i;
> +            goto tickle;
> +        }
>
> -        if ( cur->credit < new->credit )
> +        /*
> +         * If there are no fully idle cores, check all idlers, after
> +         * having filtered out pcpus that have been tickled but haven't
> +         * gone through the scheduler yet.
> +         */
> +        cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
> +        cpumask_and(&mask, &mask, cpumask_scratch);
> +        i = cpumask_test_or_cycle(cpu, &mask);
> +        if ( i < nr_cpu_ids )
>           {
> -            SCHED_STAT_CRANK(tickled_busy_cpu);
> -            ipid = cpu;
> +            SCHED_STAT_CRANK(tickled_idle_cpu);
> +            ipid = i;
>               goto tickle;
>           }
> -    }
>
> -    for_each_cpu(i, &mask)
> -    {
> -        /* Already looked at this one above */
> -        if ( i == cpu )
> -            continue;
> +        /*
> +         * Otherwise, look for the non-idle (and non-tickled) processors with
> +         * the lowest credit, among the ones new is allowed to run on. Again,
> +         * the cpu were it was running on would be the best candidate.
> +         */
> +        cpumask_andnot(&mask, &rqd->active, &rqd->idle);
> +        cpumask_andnot(&mask, &mask, &rqd->tickled);
> +        cpumask_and(&mask, &mask, cpumask_scratch);
> +        if ( cpumask_test_cpu(cpu, &mask) )
> +        {
> +            cur = CSCHED2_VCPU(curr_on_cpu(cpu));
>
> -        cur = CSCHED2_VCPU(curr_on_cpu(i));
> +            if ( soft_aff_check_preempt(bs, cpu) )
> +            {
> +                burn_credits(rqd, cur, now);
> +
> +                if ( unlikely(tb_init_done) )
> +                {
> +                    struct {
> +                        unsigned vcpu:16, dom:16;
> +                        unsigned cpu, credit;
> +                    } d;
> +                    d.dom = cur->vcpu->domain->domain_id;
> +                    d.vcpu = cur->vcpu->vcpu_id;
> +                    d.credit = cur->credit;
> +                    d.cpu = cpu;
> +                    __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
> +                                sizeof(d),
> +                                (unsigned char *)&d);
> +                }
> +
> +                if ( cur->credit < new->credit )
> +                {
> +                    SCHED_STAT_CRANK(tickled_busy_cpu);
> +                    ipid = cpu;
> +                    goto tickle;
> +                }
> +            }
> +        }
>
> -        ASSERT(!is_idle_vcpu(cur->vcpu));
> +        for_each_cpu(i, &mask)
> +        {
> +            /* Already looked at this one above */
> +            if ( i == cpu )
> +                continue;
>
> -        /* Update credits for current to see if we want to preempt. */
> -        burn_credits(rqd, cur, now);
> +            cur = CSCHED2_VCPU(curr_on_cpu(i));
> +            ASSERT(!is_idle_vcpu(cur->vcpu));
>
> -        if ( cur->credit < lowest )
> -        {
> -            ipid = i;
> -            lowest = cur->credit;
> +            if ( soft_aff_check_preempt(bs, i) )
> +            {
> +                /* Update credits for current to see if we want to preempt. */
> +                burn_credits(rqd, cur, now);
> +
> +                if ( unlikely(tb_init_done) )
> +                {
> +                    struct {
> +                        unsigned vcpu:16, dom:16;
> +                        unsigned cpu, credit;
> +                    } d;
> +                    d.dom = cur->vcpu->domain->domain_id;
> +                    d.vcpu = cur->vcpu->vcpu_id;
> +                    d.credit = cur->credit;
> +                    d.cpu = i;
> +                    __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
> +                                sizeof(d),
> +                                (unsigned char *)&d);
> +                }
> +
> +                if ( cur->credit < lowest )
> +                {
> +                    ipid = i;
> +                    lowest = cur->credit;
> +                }
> +            }
>           }
>
> -        if ( unlikely(tb_init_done) )
> +        /*
> +         * Only switch to another processor if the credit difference is
> +         * greater than the migrate resistance.
> +         */
> +        if ( ipid != -1 && lowest + CSCHED2_MIGRATE_RESIST <= new->credit )
>           {
> -            struct {
> -                unsigned vcpu:16, dom:16;
> -                unsigned cpu, credit;
> -            } d;
> -            d.dom = cur->vcpu->domain->domain_id;
> -            d.vcpu = cur->vcpu->vcpu_id;
> -            d.credit = cur->credit;
> -            d.cpu = i;
> -            __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
> -                        sizeof(d),
> -                        (unsigned char *)&d);
> +            SCHED_STAT_CRANK(tickled_busy_cpu);
> +            goto tickle;
>           }
>       }
>
> -    /*
> -     * Only switch to another processor if the credit difference is
> -     * greater than the migrate resistance.
> -     */
> -    if ( ipid == -1 || lowest + CSCHED2_MIGRATE_RESIST > new->credit )
> -    {
> -        SCHED_STAT_CRANK(tickled_no_cpu);
> -        return;
> -    }
> -
> -    SCHED_STAT_CRANK(tickled_busy_cpu);
> +    SCHED_STAT_CRANK(tickled_no_cpu);
> +    return;
>    tickle:
>       BUG_ON(ipid == -1);
>
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 18/24] xen: credit2: soft-affinity awareness fallback_cpu() and cpu_pick()
  2016-08-17 17:19 ` [PATCH 18/24] xen: credit2: soft-affinity awareness fallback_cpu() and cpu_pick() Dario Faggioli
@ 2016-09-01 11:08   ` anshul makkar
  2016-09-05 13:26     ` Dario Faggioli
  2016-09-29 11:11   ` George Dunlap
  1 sibling, 1 reply; 84+ messages in thread
From: anshul makkar @ 2016-09-01 11:08 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Justin T. Weaver, George Dunlap

On 17/08/16 18:19, Dario Faggioli wrote:
> For get_fallback_cpu(), by putting in place the "usual"
> two steps (soft affinity step and hard affinity step)
> loop. We just move the core logic of the function inside
> the body of the loop itself.
>
> For csched2_cpu_pick(), what is important is to find
> the runqueue with the least average load. Currently,
> we do that by looping on all runqueues and checking,
> well, their load. For soft affinity, we want to know
> which one is the runqueue with the least load, among
> the ones where the vcpu would prefer to be assigned.
>
> We find both the least loaded runqueue among the soft
> affinity "friendly" ones, and the overall least loaded
> one, in the same pass.
>
> (Also, kill a spurious ';' when defining MAX_LOAD.)
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> ---
>   xen/common/sched_credit2.c |  136 ++++++++++++++++++++++++++++++++++++--------
>   1 file changed, 111 insertions(+), 25 deletions(-)
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 3aef1b4..2d7228a 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -506,34 +506,68 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
>   }
>
>   /*
> - * When a hard affinity change occurs, we may not be able to check some
> - * (any!) of the other runqueues, when looking for the best new processor
> - * for svc (as trylock-s in csched2_cpu_pick() can fail). If that happens, we
> - * pick, in order of decreasing preference:
> - *  - svc's current pcpu;
> - *  - another pcpu from svc's current runq;
> - *  - any cpu.
> + * In csched2_cpu_pick(), it may not be possible to actually look at remote
> + * runqueues (the trylock-s on their spinlocks can fail!). If that happens,
With remote runqueues , do you mean runqs on remote socket ? Can't we 
just read their workload or we can change the locktype to allow reading ?
> + * we pick, in order of decreasing preference:
> + *  1) svc's current pcpu, if it is part of svc's soft affinity;
> + *  2) a pcpu in svc's current runqueue that is also in svc's soft affinity;
svc's current runqueue. Do you mean the runq in which svc is currently 
queued ?
> + *  3) just one valid pcpu from svc's soft affinity;
> + *  4) svc's current pcpu, if it is part of svc's hard affinity;
> + *  5) a pcpu in svc's current runqueue that is also in svc's hard affinity;
> + *  6) just one valid pcpu from svc's hard affinity
> + *
> + * Of course, 1, 2 and 3 makes sense only if svc has a soft affinity. Also
> + * note that at least 6 is guaranteed to _always_ return at least one pcpu.
>    */
>   static int get_fallback_cpu(struct csched2_vcpu *svc)
>   {
>       int cpu;
> +    unsigned int bs;
>
> -    if ( likely(cpumask_test_cpu(svc->vcpu->processor,
> -                                 svc->vcpu->cpu_hard_affinity)) )
> -        return svc->vcpu->processor;
> +    for_each_affinity_balance_step( bs )
> +    {
> +        if ( bs == BALANCE_SOFT_AFFINITY &&
> +             !has_soft_affinity(svc->vcpu, svc->vcpu->cpu_hard_affinity) )
> +            continue;
>
>
Anshul

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 19/24] xen: credit2: soft-affinity awareness in load balancing
  2016-08-17 17:19 ` [PATCH 19/24] xen: credit2: soft-affinity awareness in load balancing Dario Faggioli
@ 2016-09-02 11:46   ` anshul makkar
  2016-09-05 12:49     ` Dario Faggioli
  0 siblings, 1 reply; 84+ messages in thread
From: anshul makkar @ 2016-09-02 11:46 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap

On 17/08/16 18:19, Dario Faggioli wrote:
> We want is soft-affinity to play a role in load
> balancing, i.e., when deciding whether or not to

> something like that at some point.
>
> (Oh, and while there, just a couple of style fixes
> are also done.)
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> ---
>   xen/common/sched_credit2.c |  359 ++++++++++++++++++++++++++++++++++++++++----
>   1 file changed, 326 insertions(+), 33 deletions(-)
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 2d7228a..3722f46 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -1786,19 +1786,21 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>       return new_cpu;
>   }
>
> -/* Working state of the load-balancing algorithm */
> +/* Working state of the load-balancing algorithm. */
>   typedef struct {
> -    /* NB: Modified by consider() */
> +    /* NB: Modified by consider(). */
>       s_time_t load_delta;
>       struct csched2_vcpu * best_push_svc, *best_pull_svc;
> -    /* NB: Read by consider() */
> +    /* NB: Read by consider() (and the various consider_foo() functions). */
>       struct csched2_runqueue_data *lrqd;
> -    struct csched2_runqueue_data *orqd;
> +    struct csched2_runqueue_data *orqd;
> +    bool_t push_has_soft_aff, pull_has_soft_aff;
> +    s_time_t push_soft_aff_load, pull_soft_aff_load;
>   } balance_state_t;
>
> -static void consider(balance_state_t *st,
> -                     struct csched2_vcpu *push_svc,
> -                     struct csched2_vcpu *pull_svc)
> +static inline s_time_t consider_load(balance_state_t *st,
> +                                     struct csched2_vcpu *push_svc,
> +                                     struct csched2_vcpu *pull_svc)
>   {
>       s_time_t l_load, o_load, delta;
>
> @@ -1821,11 +1823,166 @@ static void consider(balance_state_t *st,
>       if ( delta < 0 )
>           delta = -delta;
>
> +    return delta;
> +}
> +
> +/*
> + * Load balancing and soft-affinity.
> + *
> + * When trying to figure out whether or not it's best to move a vcpu from
> + * one runqueue to another, we must keep soft-affinity in mind. Intuitively
> + * we would want to know the following:
> + *  - 'how much' affinity does the vcpu have with its current runq?
> + *  - 'how much' affinity will it have with its new runq?
> + *
> + * But we certainly need to be more precise about how much it is that 'how
> + * much'! Let's start with some definitions:
> + *
> + *  - let v be a vcpu, running in runq I, with soft-affinity to vi
> + *    pcpus of runq I, and soft affinity with vj pcpus of runq J;
> + *  - let k be another vcpu, running in runq J, with soft-affinity to kj
> + *    pcpus of runq J, and with ki pcpus of runq I;
> + *  - let runq I have Ci pcpus, and runq J Cj pcpus;
> + *  - let vcpu v have an average load of lv, and k an average load of lk;
> + *  - let runq I have an average load of Li, and J an average load of Lj.
> + *
> + * We also define the following::
> + *
> + *  - lvi = lv * (vi / Ci)  as the 'perceived load' of v, when running
> + *                          in runq i;
> + *  - lvj = lv * (vj / Cj)  as the 'perceived load' of v, it running
> + *                          in runq j;
> + *  - the same for k, mutatis mutandis.
> + *
> + * Idea is that vi/Ci (i.e., the ratio of the number of cpus of a runq that
> + * a vcpu has soft-affinity with, over the total number of cpus of the runq
> + * itself) can be seen as the 'degree of soft-affinity' of v to runq I (and
> + * vj/Cj the one of v to J). In other words, we define the degree of soft
> + * affinity of a vcpu to a runq as what fraction of pcpus of the runq itself
> + * the vcpu has soft-affinity with. Then, we multiply this 'degree of
> + * soft-affinity' by the vcpu load, and call the result the 'perceived load'.
> + *
> + * Basically, if a soft-affinity is defined, the work done by a vcpu on a
> + * runq to which it has higher degree of soft-affinity, is considered
> + * 'lighter' than the same work done by the same vcpu on a runq to which it
> + * has smaller degree of soft-affinity (degree of soft affinity is <= 1). In
> + * fact, if soft-affinity is used to achieve NUMA-aware scheduling, the higher
> + * the degree of soft-affinity of the vcpu to a runq, the greater the probability
> + * of accessing local memory, when running on such runq. And that is certainly\
> + * 'lighter' than having to fetch memory from remote NUMA nodes.
Do we ensure that while defining soft-affinity for a vcpu, NUMA 
architecture is considered. If not, then this whole calculation can go 
wrong and have negative impact on performance.

Degree of affinity to runq will give good result if the affinity to 
pcpus has been chosen after due consideration ..
> + *
> + * SoXX, evaluating pushing v from I to J would mean removing (from I) a
> + * perceived load of lv*(vi/Ci) and adding (to J) a perceived load of
> + * lv*(vj/Cj), which we (looking at things from the point of view of I,
> + * which is what balance_load() does) can call D_push:
> + *
> + *  - D_push = -lv * (vi / Ci) + lv * (vj / Cj) =
> + *           = lv * (vj/Cj - vi/Ci)
> + *
> + * On the other hand, pulling k from J to I would entail a D_pull:
> + *
> + *  - D_pull = lk * (ki / Ci) - lk * (kj / Cj) =
> + *           = lk * (ki/Ci - kj/Cj)
> + *
> + * Note that if v (k) has soft-afinity with all the cpus of both I and J,
> + * D_push (D_pull) will be 0, and the same is true in case it has no soft
> + * affinity at all with any of the cpus of I and J. Note also that both
> + * D_push and D_pull can be positive or negative (there's no abs() around
> + * in this case!) depending on the relationship between the degrees of soft
> + * affinity of the vcpu to I and J.
> + *
> + * If there is no soft-affinity, load_balance() (actually, consider()) acts
> + * as follows:
> + *
> + *  - D = abs(Li - Lj)
If we are consider absolute of Li -Lj, how will we know which runq has 
less workload which, I think, is an essential parameter for load 
balancing. Am I missing something here ?
> + *  - consider pushing v from I to J:
> + *     - D' = abs(Li - lv - (Lj + lv))   (from now, abs(x) == |x|)
> + *     - if (D' < D) { push }
> + *  - consider pulling k from J to I:
> + *     - D' = |Li + lk - (Lj - lk)|
> + *     - if (D' < D) { pull }
For both push and pull we are checking (D` < D) ?
> + *  - consider both push and pull:
> + *     - D' = |Li - lv + lk - (Lj + lv - lk)|
> + *     - if (D' < D) { push; pull }
> + *
> + * In order to make soft-affinity part of the process, we use D_push and
> + * D_pull, so that, the final behavior will look like this:
> + *
> + *  - D = abs(Li - Lj)
> + *  - consider pushing v from I to J:
> + *     - D' = |Li - lv - (Lj + lv)|
> + *     - D_push = lv * (vj/Cj - vi/Ci)
> + *     - if (D' + D_push < D) { push }
> + *  - consider pulling k from J to I:
> + *     - D' = |Li + lk - (Lj - lk)|
> + *       D_pull = lk * (ki/Ci - kj/Cj)
> + *     - if (D' < D) { pull }
> + *  - consider both push and pull:
> + *     - D' = |Li - lv + lk - (Lj + lv - lk)|
> + *     - D_push = lv * (vj/Cj - vi/Ci)
> + *       D_pull = lk * (ki/Ci - kj/Cj)
> + *     - if (D' + D_push + D_pull < D) { push; pull }
> + *
> + * So, for instance, the complete formula, in case of a push, with soft
> + * affinity being considered looks like this:
> + *
> + *  - D'' = D' + D_push =
> + *        = |Li - lv - (Lj + lv)| + lv*(vj/Cj - vi/Ci)
> + *
> + * which highlights how soft-affinity being considered acts as a *modifier*
> + * of the "normal" results obtained by just using the actual vcpus loads.
> + * This approach is modular, in the sense that it only takes implementing
> + * another function that returns another modifier, to make the load balancer
> + * consider some other factor or characteristics of the vcpus.
> + *
> + * Finally there is the scope for actually using a scaling factor, to limit
> + * the influence that soft-affinity will actually have on baseline results
> + * from consider_load(). Basically, that means that instead of D_push and/or
> + * D_pull, we'll be adding D_push/S and/or D_pull/S (with S the scaling
> + * factor). Check prep_soft_aff_load() for details on this.
> + */
> +
Anshul


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 23/24] xen: credit2: optimize runq_tickle() a little bit
  2016-08-17 17:20 ` [PATCH 23/24] xen: credit2: optimize runq_tickle() a little bit Dario Faggioli
@ 2016-09-02 12:38   ` anshul makkar
  2016-09-05 12:52     ` Dario Faggioli
  0 siblings, 1 reply; 84+ messages in thread
From: anshul makkar @ 2016-09-02 12:38 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap

On 17/08/16 18:20, Dario Faggioli wrote:
> By not looking at the same cpu (to check whether
> we want to preempt who's running there) twice, if
> the vcpu being woken up has both soft and hard
> affinity.
>
> In fact, all the cpus that are part of both soft
> affinity and hard-affinity (of the waking vcpu)
> are checked during the soft-affinity balancing
> step. If none turns out to be suitable, e.g.,
> because they're running vcpus with higher credits,
> there's no point in re-checking them, only to
> re-assess the same, during the hard-affinity
> step.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> ---
>   xen/common/sched_credit2.c |   43 +++++++++++++++++++++++++++++++++++++++----
>   1 file changed, 39 insertions(+), 4 deletions(-)
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 6963872..f03ecce 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -997,7 +997,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>       s_time_t lowest = (1<<30);
>       unsigned int bs, cpu = new->vcpu->processor;
>       struct csched2_runqueue_data *rqd = RQD(ops, cpu);
> -    cpumask_t mask;
> +    cpumask_t mask, skip_mask;
>       struct csched2_vcpu * cur;
>
>       ASSERT(new->rqd == rqd);
> @@ -1017,6 +1017,13 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>                       (unsigned char *)&d);
>       }
>
> +    /*
> +     * Cpus that end up in this mask, have been already checked during the
> +     * soft-affinity step, and need not to be checked again when doing hard
> +     * affinity.
> +     */
> +    cpumask_clear(&skip_mask);
> +
>       for_each_affinity_balance_step( bs )
>       {
>           /*
> @@ -1073,7 +1080,8 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>           cpumask_andnot(&mask, &rqd->active, &rqd->idle);
>           cpumask_andnot(&mask, &mask, &rqd->tickled);
>           cpumask_and(&mask, &mask, cpumask_scratch);
> -        if ( cpumask_test_cpu(cpu, &mask) )
> +        if ( cpumask_test_cpu(cpu, &mask) &&
> +             !cpumask_test_cpu(cpu, &skip_mask) )
>           {
>               cur = CSCHED2_VCPU(curr_on_cpu(cpu));
>
> @@ -1102,13 +1110,26 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>                       ipid = cpu;
>                       goto tickle;
>                   }
> +
> +                /*
> +                 * If we're here, cpu is just not a valid candidate for being
> +                 * tickled. Set its bit in skip_mask, to avoid calling
> +                 * burn_credits() and check its current vcpu for preemption
> +                 * twice.
> +                 */
> +                __cpumask_set_cpu(cpu, &skip_mask);
>               }
>           }
>
>           for_each_cpu(i, &mask)
>           {
> -            /* Already looked at this one above */
> -            if ( i == cpu )
> +            /*
> +             * Already looked at these ones above, either because it's the
> +             * cpu where new was running before, or because we are at the
> +             * hard-affinity step, and we checked this during the
> +             * soft-affinity one
> +             */
Sorry for my naiveness here, but can we be sure that situation has not 
changed since we checked during soft-affinity step. ?
> +            if ( i == cpu || cpumask_test_cpu(i, &skip_mask) )
>                   continue;
>
>               cur = CSCHED2_VCPU(curr_on_cpu(i));
> @@ -1139,6 +1160,20 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>                       ipid = i;
>                       lowest = cur->credit;
>                   }
> +
> +                /*
> +                 * No matter if i is the new lowest or not. We've run
> +                 * burn_credits() on it, and we've checked it for preemption.
> +                 *
> +                 * If we are at soft-affinity balancing step, and i is indeed
> +                 * the lowest, it will be tickled (and we exit the function).
> +                 * If it is not the lowest among the cpus in the soft-affinity
> +                 * mask, it can't be the lowest among the cpus in the hard
> +                 * affinity mask (assuming we'll actually do the second
> +                 * balancing step), as hard-affinity is a superset of soft
> +                 * affinity, and therefore we can flag it to be skipped.
> +                 */
> +                __cpumask_set_cpu(i, &skip_mask);
>               }
>           }
>
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 19/24] xen: credit2: soft-affinity awareness in load balancing
  2016-09-02 11:46   ` anshul makkar
@ 2016-09-05 12:49     ` Dario Faggioli
  0 siblings, 0 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-09-05 12:49 UTC (permalink / raw)
  To: anshul makkar, xen-devel; +Cc: George Dunlap


[-- Attachment #1.1: Type: text/plain, Size: 3542 bytes --]

On Fri, 2016-09-02 at 12:46 +0100, anshul makkar wrote:

Hey, Anshul,

Thanks for having a look at the patch!

> On 17/08/16 18:19, Dario Faggioli wrote:
> > 
> > --- a/xen/common/sched_credit2.c
> > +++ b/xen/common/sched_credit2.c
> > 
> > + * Basically, if a soft-affinity is defined, the work done by a
> > vcpu on a
> > + * runq to which it has higher degree of soft-affinity, is
> > considered
> > + * 'lighter' than the same work done by the same vcpu on a runq to
> > which it
> > + * has smaller degree of soft-affinity (degree of soft affinity is
> > <= 1). In
> > + * fact, if soft-affinity is used to achieve NUMA-aware
> > scheduling, the higher
> > + * the degree of soft-affinity of the vcpu to a runq, the greater
> > the probability
> > + * of accessing local memory, when running on such runq. And that
> > is certainly\
> > + * 'lighter' than having to fetch memory from remote NUMA nodes.
> Do we ensure that while defining soft-affinity for a vcpu, NUMA 
> architecture is considered. If not, then this whole calculation can
> go 
> wrong and have negative impact on performance.
> 
Defining soft-affinity after topology is what we do by default, just
not here in Xen: we do it in toolstack (in libxl, to be precise).

NUMA aware scheduling is indeed the most obvious use case for all this
--and, in fact that's why we configure things in such a way in higher
layers-- but the mechanism is, at the Xen level, flexible enough to be
used for any purpose that the user may find interesting.

> Degree of affinity to runq will give good result if the affinity to 
> pcpus has been chosen after due consideration ..
>
At this level, 'good result' means 'making sure that a vcpu runs for as
much time as possible on a pcpu to which it has soft-affinity'. Whether
that is good or not for performance (or for any other aspect or
metric), it's not this algorithm's job to determine.

Note that things are exactly the same for hard-affinity/pinning, or for
weights. In fact, Xen won't stop one to, say, pin 128 vcpu all to pcpu
3. This will deeply suck, but it's the higher layers' will (fault?) and
Xen should just comply to that.

> > + * If there is no soft-affinity, load_balance() (actually,
> > consider()) acts
> > + * as follows:
> > + *
> > + *  - D = abs(Li - Lj)
> If we are consider absolute of Li -Lj, how will we know which runq
> has 
> less workload which, I think, is an essential parameter for load 
> balancing. Am I missing something here ?
>
What we are aiming for is making the queues more balanced, which means
we want the difference between their load to be smaller than how it is
when the balancing start. As far as that happens, we don't care which
loads goes down and which one goes up, as far as the final result is a
smaller load delta.

> > + *  - consider pushing v from I to J:
> > + *     - D' = abs(Li - lv - (Lj + lv))   (from now, abs(x) == |x|)
> > + *     - if (D' < D) { push }
> > + *  - consider pulling k from J to I:
> > + *     - D' = |Li + lk - (Lj - lk)|
> > + *     - if (D' < D) { pull }
> For both push and pull we are checking (D` < D) ?
>
Indeed. And that's because of the abs(). :-)


Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 23/24] xen: credit2: optimize runq_tickle() a little bit
  2016-09-02 12:38   ` anshul makkar
@ 2016-09-05 12:52     ` Dario Faggioli
  0 siblings, 0 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-09-05 12:52 UTC (permalink / raw)
  To: anshul makkar, xen-devel; +Cc: George Dunlap


[-- Attachment #1.1: Type: text/plain, Size: 1304 bytes --]

On Fri, 2016-09-02 at 13:38 +0100, anshul makkar wrote:
> On 17/08/16 18:20, Dario Faggioli wrote:
> > 
> > diff --git a/xen/common/sched_credit2.c
> > b/xen/common/sched_credit2.c
> > 
> > @@ -1102,13 +1110,26 @@ runq_tickle(const struct scheduler *ops, 
> >           for_each_cpu(i, &mask)
> >           {
> > -            /* Already looked at this one above */
> > -            if ( i == cpu )
> > +            /*
> > +             * Already looked at these ones above, either because
> > it's the
> > +             * cpu where new was running before, or because we are
> > at the
> > +             * hard-affinity step, and we checked this during the
> > +             * soft-affinity one
> > +             */
> Sorry for my naiveness here,
>
NP.

>  but can we be sure that situation has not 
> changed since we checked during soft-affinity step. ?
>
Yes we can, since we're holding the runqueue lock.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 18/24] xen: credit2: soft-affinity awareness fallback_cpu() and cpu_pick()
  2016-09-01 11:08   ` anshul makkar
@ 2016-09-05 13:26     ` Dario Faggioli
  2016-09-07 12:52       ` anshul makkar
  0 siblings, 1 reply; 84+ messages in thread
From: Dario Faggioli @ 2016-09-05 13:26 UTC (permalink / raw)
  To: anshul makkar, xen-devel; +Cc: Justin T. Weaver, George Dunlap


[-- Attachment #1.1: Type: text/plain, Size: 3502 bytes --]

On Thu, 2016-09-01 at 12:08 +0100, anshul makkar wrote:
> On 17/08/16 18:19, Dario Faggioli wrote:
> > 
> > diff --git a/xen/common/sched_credit2.c
> > b/xen/common/sched_credit2.c
> > 
> > @@ -506,34 +506,68 @@ void smt_idle_mask_clear(unsigned int cpu,
> > cpumask_t *mask)
> >   }
> > 
> >   /*
> > - * When a hard affinity change occurs, we may not be able to check
> > some
> > - * (any!) of the other runqueues, when looking for the best new
> > processor
> > - * for svc (as trylock-s in csched2_cpu_pick() can fail). If that
> > happens, we
> > - * pick, in order of decreasing preference:
> > - *  - svc's current pcpu;
> > - *  - another pcpu from svc's current runq;
> > - *  - any cpu.
> > + * In csched2_cpu_pick(), it may not be possible to actually look
> > at remote
> > + * runqueues (the trylock-s on their spinlocks can fail!). If that
> > happens,
> With remote runqueues , do you mean runqs on remote socket ? 
>
I mean runqueues different from the runq the vcpu is currently assigned
to (as per runq_assign()/runq_deassing()).

If you have runqueues configured to be per-socket, yes, it will try to
lock runqueues in which there are pcpus that are on a different socket
wrt svc->vcpu->processor.

> Can't we 
> just read their workload or we can change the locktype to allow
> reading ?
>
Reading without taking the lock would race against the load value being
updated. And updating the load is done by __update_runq_load(), which,
with all it's shifts and maths, by no means is an atomic operation.

So it's not just a matter of risking to read a slightly outdated value,
which, I agree, may not be that bad, it's that we risk reading
something inconsistent. :-/

About "changing the locktype", I guess you mean that we can turn also
the runqueue lock into an rw-lock? If yes, that's indeed interesting,
and I've also thought about it, but, for now, always deferred trying to
actually do that.

It's technically non trivial, as it would involve changing
schedule_data->schedule_lock and all the {v,p}cpu_schedule_lock*()
functions. Also, it's a lock that will almost all the times be taken
for writing, which usually means what you want is a proper spinlock.

So, IMO, before embarking in doing something like that, it'd be good to
figure out how frequently we actually fail to take the remote runqueue
lock, and what's the real impact of having to deal the consequence of
that.

I'm not saying it's not worth a try, but I'm saying that's it's
something at high risk of being a lot of work for a very small gain,
and that there are more important things to focus on.

> > + * we pick, in order of decreasing preference:
> > + *  1) svc's current pcpu, if it is part of svc's soft affinity;
> > + *  2) a pcpu in svc's current runqueue that is also in svc's soft
> > affinity;
> svc's current runqueue. Do you mean the runq in which svc is
> currently 
> queued ?
>
I mean the runqueue to which svc is currently assigned (again, as per
runq_assign()/runq_deassing()), which in turns mean that, if svc is
queued in a runqueue, it's queues there (so, I guess the short answer
to your question is "yes" :-D).

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 05/24] xen: credit2: make tickling more deterministic
  2016-08-31 17:10   ` anshul makkar
@ 2016-09-05 13:47     ` Dario Faggioli
  2016-09-07 12:25       ` anshul makkar
  2016-09-13 11:13       ` George Dunlap
  0 siblings, 2 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-09-05 13:47 UTC (permalink / raw)
  To: anshul makkar, xen-devel; +Cc: Andrew Cooper, George Dunlap, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 5116 bytes --]

On Wed, 2016-08-31 at 18:10 +0100, anshul makkar wrote:
> On 17/08/16 18:18, Dario Faggioli wrote:
> > 
> > Right now, the following scenario can occurr:
> >   - upon vcpu v wakeup, v itself is put in the runqueue,
> >     and pcpu X is tickled;
> >   - pcpu Y schedules (for whatever reason), sees v in
> >     the runqueue and picks it up.
> > 
> > This may seem ok (or even a good thing), but it's not.
> > In fact, if runq_tickle() decided X is where v should
> > run, it did it for a reason (load distribution, SMT
> > support, cache hotness, affinity, etc), and we really
> > should try as hard as possible to stick to that.
> > 
> > Of course, we can't be too strict, or we risk leaving
> > vcpus in the runqueue while there is available CPU
> > capacity. So, we only leave v in runqueue --for X to
> > pick it up-- if we see that X has been tickled and
> > has not scheduled yet, i.e., it will have a real chance
> > of actually select and schedule v.
> > 
> > If that is not the case, we schedule it on Y (or, at
> > least, we consider that), as running somewhere non-ideal
> > is better than not running at all.
> > 
> > The commit also adds performance counters for each of
> > the possible situations.
> > 
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> > ---
> > Cc: George Dunlap <george.dunlap@citrix.com>
> > Cc: Anshul Makkar <anshul.makkar@citrix.com>
> > Cc: Jan Beulich <JBeulich@suse.com>
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > ---
> >   xen/common/sched_credit2.c   |   65
> > +++++++++++++++++++++++++++++++++++++++---
> >   xen/include/xen/perfc_defn.h |    3 ++
> >   2 files changed, 64 insertions(+), 4 deletions(-)
> > 
> > diff --git a/xen/common/sched_credit2.c
> > b/xen/common/sched_credit2.c
> > index 12dfd20..a3d7beb 100644
> > --- a/xen/common/sched_credit2.c
> > +++ b/xen/common/sched_credit2.c
> > @@ -54,6 +54,7 @@
> >   #define TRC_CSCHED2_LOAD_CHECK       TRC_SCHED_CLASS_EVT(CSCHED2,
> > 16)
> >   #define TRC_CSCHED2_LOAD_BALANCE     TRC_SCHED_CLASS_EVT(CSCHED2,
> > 17)
> >   #define TRC_CSCHED2_PICKED_CPU       TRC_SCHED_CLASS_EVT(CSCHED2,
> > 19)
> > +#define TRC_CSCHED2_RUNQ_CANDIDATE   TRC_SCHED_CLASS_EVT(CSCHED2,
> > 20)
> > 
> >   /*
> >    * WARNING: This is still in an experimental phase.  Status and
> > work can be found at the
> > @@ -398,6 +399,7 @@ struct csched2_vcpu {
> >       int credit;
> >       s_time_t start_time; /* When we were scheduled (used for
> > credit) */
> >       unsigned flags;      /* 16 bits doesn't seem to play well
> > with clear_bit() */
> > +    int tickled_cpu;     /* cpu tickled for picking us up (-1 if
> > none) */
> > 
> >       /* Individual contribution to load */
> >       s_time_t load_last_update;  /* Last time average was updated
> > */
> > @@ -1049,6 +1051,10 @@ runq_tickle(const struct scheduler *ops,
> > struct csched2_vcpu *new, s_time_t now)
> >       __cpumask_set_cpu(ipid, &rqd->tickled);
> >       smt_idle_mask_clear(ipid, &rqd->smt_idle);
> >       cpu_raise_softirq(ipid, SCHEDULE_SOFTIRQ);
> > +
> > +    if ( unlikely(new->tickled_cpu != -1) )
> > +        SCHED_STAT_CRANK(tickled_cpu_overwritten);
> > +    new->tickled_cpu = ipid;
> >   }
> > 
> >   /*
> > @@ -1266,6 +1272,7 @@ csched2_alloc_vdata(const struct scheduler
> > *ops, struct vcpu *vc, void *dd)
> >           ASSERT(svc->sdom != NULL);
> >           svc->credit = CSCHED2_CREDIT_INIT;
> >           svc->weight = svc->sdom->weight;
> > +        svc->tickled_cpu = -1;
> >           /* Starting load of 50% */
> >           svc->avgload = 1ULL << (CSCHED2_PRIV(ops)-
> > >load_precision_shift - 1);
> >           svc->load_last_update = NOW() >>
> > LOADAVG_GRANULARITY_SHIFT;
> > @@ -1273,6 +1280,7 @@ csched2_alloc_vdata(const struct scheduler
> > *ops, struct vcpu *vc, void *dd)
> >       else
> >       {
> >           ASSERT(svc->sdom == NULL);
> > +        svc->tickled_cpu = svc->vcpu->vcpu_id;
> If I understood correctly, tickled_cpu refers to pcpu and not a
> vcpu. 
> Saving vcpu_id in tickled_cpu looks wrong.
> 
Yes, and in fact, as you can see in the previous hunk, for pretty much
all vcpus, tickled_cpu is initialized to -1.

Here, we are dealing with the vcpus of the idle domain. And for vcpus
of the idle domain, their vcpu id is the same as the id of the pcpu
they're associated to.

I agree it looks a little bit weird, but it's both correct, and the
easiest and cleanest way for initializing this.

I guess I can add a comment...

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 14/24] libxl: allow to set the ratelimit value online for Credit2
  2016-08-22  9:21   ` Ian Jackson
@ 2016-09-05 14:02     ` Dario Faggioli
  0 siblings, 0 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-09-05 14:02 UTC (permalink / raw)
  To: Ian Jackson; +Cc: George Dunlap, xen-devel, Wei Liu


[-- Attachment #1.1: Type: text/plain, Size: 766 bytes --]

On Mon, 2016-08-22 at 10:21 +0100, Ian Jackson wrote:
> Dario Faggioli writes ("[PATCH 14/24] libxl: allow to set the
> ratelimit value online for Credit2"):
> > 
> > This is the remaining part of the plumbing (the libxl
> > one) necessary to be able to change the value of the
> > ratelimit_us parameter online, for Credit2 (like it is
> > already for Credit1).
>
> I think this should have a HAVE #define.
> 
Ah, sure. Sorry I've forgotten, I'll add it in v2.

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 17/24] xen: credit2: soft-affinity awareness in runq_tickle()
  2016-09-01 10:52   ` anshul makkar
@ 2016-09-05 14:55     ` Dario Faggioli
  2016-09-07 13:24       ` anshul makkar
  0 siblings, 1 reply; 84+ messages in thread
From: Dario Faggioli @ 2016-09-05 14:55 UTC (permalink / raw)
  To: anshul makkar, xen-devel; +Cc: Justin T. Weaver, George Dunlap


[-- Attachment #1.1: Type: text/plain, Size: 2466 bytes --]

On Thu, 2016-09-01 at 11:52 +0100, anshul makkar wrote:
> On 17/08/16 18:19, Dario Faggioli wrote:
> > 
> > +    /*
> > +     * We're doing soft-affinity, and we know that the current
> > vcpu on cpu
> > +     * has a soft affinity. We now want to know whether cpu itself
> > is in
> Please can you explain the above statment. If the vcpu has soft
> affinity 
> and its currently executing, doesn;t it always means that its running
> on 
> one of the pcpu which is there in its soft affinity or hard affinity?
>
A vcpu will always run on a pcpu from its own hard-affinity (that's the
definition of hard-affinity).

On the other hand, a vcpu will, most of the time, run on a cpu from its
own soft affinity, but can run on a cpu that is in its hard-affinity,
but *IS NOT* in its soft-affinity.

That's the definition of soft-affinity: the scheduler will try to run
it there, but it that can't happen, it will run it will run it outside
of it (but still within its hard-affinity, of course).

So, yes, we know already that it's running in a cpu at least from its
hard affinity, what is it exactly that you are not understanding?

> > +     * such affinity. In fact, since we now that new (in
> > runq_tickle()) is
> Typo:   * such affinity. In fact, since now we know that new (in 
> runq_tickle()) is
>
Thanks. :-)

> > +     *  - if cpu is not in cur's soft-affinity, we should indeed
> > check to
> > +     *    see whether new should preempt cur. If that will be the
> > case, that
> > +     *    would be an improvement wrt respecting soft affinity;
> > +     *  - if cpu is in cur's soft-affinity, we leave it alone and
> > (in
> > +     *    runq_tickle()) move on to another cpu. In fact, we don't
> > want to
> > +     *    be too harsh with someone which is running within its
> > soft-affinity.
> > +     *    This is safe because later, if we don't fine anyone else
> > during the
> > +     *    soft-affinity step, we will check cpu for preemption
> > anyway, when
> > +     *    doing hard-affinity.
> > +     */
>
Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 05/24] xen: credit2: make tickling more deterministic
  2016-09-05 13:47     ` Dario Faggioli
@ 2016-09-07 12:25       ` anshul makkar
  2016-09-13 11:13       ` George Dunlap
  1 sibling, 0 replies; 84+ messages in thread
From: anshul makkar @ 2016-09-07 12:25 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Andrew Cooper, George Dunlap, Jan Beulich

On 05/09/16 14:47, Dario Faggioli wrote:
> On Wed, 2016-08-31 at 18:10 +0100, anshul makkar wrote:
>> On 17/08/16 18:18, Dario Faggioli wrote:
>>>
>>> @@ -1266,6 +1272,7 @@ csched2_alloc_vdata(const struct scheduler
>>> *ops, struct vcpu *vc, void *dd)
>>>            ASSERT(svc->sdom != NULL);
>>>            svc->credit = CSCHED2_CREDIT_INIT;
>>>            svc->weight = svc->sdom->weight;
>>> +        svc->tickled_cpu = -1;
>>>            /* Starting load of 50% */
>>>            svc->avgload = 1ULL << (CSCHED2_PRIV(ops)-
>>>> load_precision_shift - 1);
>>>            svc->load_last_update = NOW() >>
>>> LOADAVG_GRANULARITY_SHIFT;
>>> @@ -1273,6 +1280,7 @@ csched2_alloc_vdata(const struct scheduler
>>> *ops, struct vcpu *vc, void *dd)
>>>        else
>>>        {
>>>            ASSERT(svc->sdom == NULL);
>>> +        svc->tickled_cpu = svc->vcpu->vcpu_id;
>> If I understood correctly, tickled_cpu refers to pcpu and not a
>> vcpu.
>> Saving vcpu_id in tickled_cpu looks wrong.
>>
> Yes, and in fact, as you can see in the previous hunk, for pretty much
> all vcpus, tickled_cpu is initialized to -1.
>
> Here, we are dealing with the vcpus of the idle domain. And for vcpus
> of the idle domain, their vcpu id is the same as the id of the pcpu
> they're associated to.
Ah, that makes it clear .
>
> I agree it looks a little bit weird, but it's both correct, and the
> easiest and cleanest way for initializing this.
>
> I guess I can add a comment...
That will be useful.
>
> Thanks and Regards,
> Dario
>
Thanks
Anshul

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 18/24] xen: credit2: soft-affinity awareness fallback_cpu() and cpu_pick()
  2016-09-05 13:26     ` Dario Faggioli
@ 2016-09-07 12:52       ` anshul makkar
  0 siblings, 0 replies; 84+ messages in thread
From: anshul makkar @ 2016-09-07 12:52 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Justin T. Weaver, George Dunlap

On 05/09/16 14:26, Dario Faggioli wrote:
> On Thu, 2016-09-01 at 12:08 +0100, anshul makkar wrote:
>> On 17/08/16 18:19, Dario Faggioli wrote:
>
>> Can't we
>> just read their workload or we can change the locktype to allow
>> reading ?
>>
> Reading without taking the lock would race against the load value being
> updated. And updating the load is done by __update_runq_load(), which,
> with all it's shifts and maths, by no means is an atomic operation.
>
> So it's not just a matter of risking to read a slightly outdated value,
> which, I agree, may not be that bad, it's that we risk reading
> something inconsistent. :-/
>
Ok. Got it and agree.
> About "changing the locktype", I guess you mean that we can turn also
> the runqueue lock into an rw-lock? If yes, that's indeed interesting,
> and I've also thought about it, but, for now, always deferred trying to
> actually do that.
Yes.
>
> It's technically non trivial, as it would involve changing
> schedule_data->schedule_lock and all the {v,p}cpu_schedule_lock*()
> functions. Also, it's a lock that will almost all the times be taken
> for writing, which usually means what you want is a proper spinlock.
>
> So, IMO, before embarking in doing something like that, it'd be good to
> figure out how frequently we actually fail to take the remote runqueue
> lock, and what's the real impact of having to deal the consequence of
> that.
>
Ok. Lets discuss on that to finalize the approach.
> I'm not saying it's not worth a try, but I'm saying that's it's
> something at high risk of being a lot of work for a very small gain,
> and that there are more important things to focus on.
>
>>> + * we pick, in order of decreasing preference:
>>> + *  1) svc's current pcpu, if it is part of svc's soft affinity;
>>> + *  2) a pcpu in svc's current runqueue that is also in svc's soft
>>> affinity;
>> svc's current runqueue. Do you mean the runq in which svc is
>> currently
>> queued ?
>>
> I mean the runqueue to which svc is currently assigned (again, as per
> runq_assign()/runq_deassing()), which in turns mean that, if svc is
> queued in a runqueue, it's queues there (so, I guess the short answer
> to your question is "yes" :-D).
>
Ok.
> Regards,
> Dario
>
Anshul


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 17/24] xen: credit2: soft-affinity awareness in runq_tickle()
  2016-09-05 14:55     ` Dario Faggioli
@ 2016-09-07 13:24       ` anshul makkar
  2016-09-07 13:31         ` Dario Faggioli
  0 siblings, 1 reply; 84+ messages in thread
From: anshul makkar @ 2016-09-07 13:24 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Justin T. Weaver, George Dunlap

On 05/09/16 15:55, Dario Faggioli wrote:
> On Thu, 2016-09-01 at 11:52 +0100, anshul makkar wrote:
>> On 17/08/16 18:19, Dario Faggioli wrote:
>>>
>>> +    /*
>>> +     * We're doing soft-affinity, and we know that the current
>>> vcpu on cpu
>>> +     * has a soft affinity. We now want to know whether cpu itself
>>> is in
>> Please can you explain the above statment. If the vcpu has soft
>> affinity
>> and its currently executing, doesn;t it always means that its running
>> on
>> one of the pcpu which is there in its soft affinity or hard affinity?
>>
> A vcpu will always run on a pcpu from its own hard-affinity (that's the
> definition of hard-affinity).
>
> On the other hand, a vcpu will, most of the time, run on a cpu from its
> own soft affinity, but can run on a cpu that is in its hard-affinity,
> but *IS NOT* in its soft-affinity.
>
> That's the definition of soft-affinity: the scheduler will try to run
> it there, but it that can't happen, it will run it will run it outside
> of it (but still within its hard-affinity, of course).
>
> So, yes, we know already that it's running in a cpu at least from its
> hard affinity, what is it exactly that you are not understanding?

If I put it simply ,  can  (X being a vcpu)
x {soft affinity pcpus} Intersect x { hard affinity pcpu} -> be Null or 
disjoint set ?
and

x{runnable pcpu} intersect (x{hard affinity pcpu} union x{soft affinity 
pcpu} ) -> be null or disjoint ??

>
>>> +     * such affinity. In fact, since we now that new (in
>>> runq_tickle()) is
>> Typo:   * such affinity. In fact, since now we know that new (in
>> runq_tickle()) is
>>
> Thanks. :-)

>>> +     */
>>
> Regards,
> Dario
>
Anshul Makkar

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 17/24] xen: credit2: soft-affinity awareness in runq_tickle()
  2016-09-07 13:24       ` anshul makkar
@ 2016-09-07 13:31         ` Dario Faggioli
  0 siblings, 0 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-09-07 13:31 UTC (permalink / raw)
  To: anshul makkar, xen-devel; +Cc: Justin T. Weaver, George Dunlap


[-- Attachment #1.1: Type: text/plain, Size: 1754 bytes --]

On Wed, 2016-09-07 at 14:24 +0100, anshul makkar wrote:
> On 05/09/16 15:55, Dario Faggioli wrote:
> > On Thu, 2016-09-01 at 11:52 +0100, anshul makkar wrote:
> > So, yes, we know already that it's running in a cpu at least from
> > its
> > hard affinity, what is it exactly that you are not understanding?
> If I put it simply ,  can  (X being a vcpu)
> x {soft affinity pcpus} Intersect x { hard affinity pcpu} -> be Null
> or 
> disjoint set ?
> and
> 
The user can setup things such that:

 soft-affinity{X} intersection hard-affinity{X} = O

but this, here inside the scheduler, is considered like X does not have
any soft-affinity at all, i.e., only X's hard-affinity is considered,
and all the balancing steps and operations and consideration related to
soft-affinity are ignored/skipped.

That's because it's absolutely pointless to try figure out where to
execute X, among the set of the pCPUs it prefers to run on, if it
_can't_ actually run on any pCPU from that same set.

So the answer to your question is: "it seems to be possible for the
intersection to be void, but in practise, it is not." :-)

> x{runnable pcpu} intersect (x{hard affinity pcpu} union x{soft
> affinity 
> pcpu} ) -> be null or disjoint ??
> 
I still don't get this. In particular, I'm not sure what
'x{runnable pcpu}' is. Also the union of a vcpu's soft and hard
affinity is never done (it's, as explained above, the intersection that
counts).

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 01/24] xen: credit1: small optimization in Credit1's tickling logic.
  2016-08-17 17:17 ` [PATCH 01/24] xen: credit1: small optimization in Credit1's tickling logic Dario Faggioli
@ 2016-09-12 15:01   ` George Dunlap
  0 siblings, 0 replies; 84+ messages in thread
From: George Dunlap @ 2016-09-12 15:01 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Anshul Makkar, David Vrabel

On 17/08/16 18:17, Dario Faggioli wrote:
> If, when vcpu x wakes up, there are no idle pcpus in x's
> soft-affinity, we just go ahead and look at its hard
> affinity. This basically means that, if, in __runq_tickle(),
> new_idlers_empty is true, balance_step is equal to
> CSCHED_BALANCE_HARD_AFFINITY, and that calling
> csched_balance_cpumask() for whatever vcpu, would just
> return the vcpu's cpu_hard_affinity.
> 
> Therefore, don't bother calling it (it's just pure
> overhead) and use cpu_hard_affinity directly.
> 
> For this very reason, this patch should only be
> a (slight) optimization, and entail no functional
> change.
> 
> As a side note, it would make sense to do what the
> patch does, even if we could be inside the
> [[ new_idlers_empty && new->pri > cur->pri ]] if
> with balance_step equal to CSCHED_BALANCE_SOFT_AFFINITY.
> In fact, what is actually happening is:
>  - vcpu x is waking up, and (since there aren't suitable
>    idlers, and it's entitled for it) it is preempting
>    vcpu y;
>  - vcpu y's hard-affinity is a superset of its
>    soft-affinity mask.
> 
> Therefore, it makes sense to use wider possible mask,
> as by doing that, we maximize the probability of
> finding an idle pcpu in there, to which we can send
> vcpu y, which then will be able to run.
> 
> While there, also fix the comment, which included
> an awkward parenthesis nesting.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: George Dunlap <george.dunlap@citrix.com>

> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> ---
>  xen/common/sched_credit.c |    8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index 220ff0d..6eccf09 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -424,9 +424,9 @@ static inline void __runq_tickle(struct csched_vcpu *new)
>              /*
>               * If there are no suitable idlers for new, and it's higher
>               * priority than cur, check whether we can migrate cur away.
> -             * (We have to do it indirectly, via _VPF_migrating, instead
> +             * We have to do it indirectly, via _VPF_migrating (instead
>               * of just tickling any idler suitable for cur) because cur
> -             * is running.)
> +             * is running.
>               *
>               * If there are suitable idlers for new, no matter priorities,
>               * leave cur alone (as it is running and is, likely, cache-hot)
> @@ -435,9 +435,7 @@ static inline void __runq_tickle(struct csched_vcpu *new)
>               */
>              if ( new_idlers_empty && new->pri > cur->pri )
>              {
> -                csched_balance_cpumask(cur->vcpu, balance_step,
> -                                       cpumask_scratch_cpu(cpu));
> -                if ( cpumask_intersects(cpumask_scratch_cpu(cpu),
> +                if ( cpumask_intersects(cur->vcpu->cpu_hard_affinity,
>                                          &idle_mask) )
>                  {
>                      SCHED_VCPU_STAT_CRANK(cur, kicked_away);
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 02/24] xen: credit1: fix mask to be used for tickling in Credit1
  2016-08-17 23:42   ` Dario Faggioli
@ 2016-09-12 15:04     ` George Dunlap
  0 siblings, 0 replies; 84+ messages in thread
From: George Dunlap @ 2016-09-12 15:04 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: xen-devel, Anshul Makkar, David Vrabel

On Thu, Aug 18, 2016 at 12:42 AM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On Wed, 2016-08-17 at 19:17 +0200, Dario Faggioli wrote:
>> If there are idle pcpus inside the waking vcpu's
>> soft-affinity mask, we should really tickle one
>> of them (this is one of the purposes of the
>> __runq_tickle() function itself!), not just
>> any idle pcpu.
>>
>> The issue has been introduced in 02ea5031825d
>> ("credit1: properly deal with pCPUs not in any cpupool"),
>> where the usage of idle_mask is changed, without
>> updating the bottom of the function, where it
>> is also referenced.
>>
>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: George Dunlap <george.dunlap@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 03/24] xen: credit1: return the 'time remaining to the limit' as next timeslice.
  2016-08-17 17:17 ` [PATCH 03/24] xen: credit1: return the 'time remaining to the limit' as next timeslice Dario Faggioli
@ 2016-09-12 15:14   ` George Dunlap
  2016-09-12 17:00     ` Dario Faggioli
  0 siblings, 1 reply; 84+ messages in thread
From: George Dunlap @ 2016-09-12 15:14 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap

On 17/08/16 18:17, Dario Faggioli wrote:
> If vcpu x has run for 200us, and sched_ratelimit_us is
> 1000us, continue running x _but_ return 1000us-200us as
> the next time slice. This way, next scheduling point will
> happen in 800us, i.e., exactly at the point when x crosses
> the threshold, and can be descheduled (if appropriate).
> 
> Right now (without this patch), we're always returning
> sched_ratelimit_us (1000us, in the example above), which
> means we're (potentially) allowing x to run more than
> it should have been able to (even when considering rate
> limiting into account).

Part of the reason I went with this in the first place was because I
wanted to avoid really short timers.  Part of the reason for the
ratelimit was actually to limit the amount of time spent in the
scheduler.  Since we expect ratelimit to normally be pretty short,
waiting for the whole ratelimit time seemed like a good idea.

Thoughts?

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 03/24] xen: credit1: return the 'time remaining to the limit' as next timeslice.
  2016-09-12 15:14   ` George Dunlap
@ 2016-09-12 17:00     ` Dario Faggioli
  2016-09-14  9:34       ` George Dunlap
  0 siblings, 1 reply; 84+ messages in thread
From: Dario Faggioli @ 2016-09-12 17:00 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: George Dunlap


[-- Attachment #1.1: Type: text/plain, Size: 3053 bytes --]

On Mon, 2016-09-12 at 16:14 +0100, George Dunlap wrote:
> On 17/08/16 18:17, Dario Faggioli wrote:
> > 
> > If vcpu x has run for 200us, and sched_ratelimit_us is
> > 1000us, continue running x _but_ return 1000us-200us as
> > the next time slice. This way, next scheduling point will
> > happen in 800us, i.e., exactly at the point when x crosses
> > the threshold, and can be descheduled (if appropriate).
> > 
> > Right now (without this patch), we're always returning
> > sched_ratelimit_us (1000us, in the example above), which
> > means we're (potentially) allowing x to run more than
> > it should have been able to (even when considering rate
> > limiting into account).
> Part of the reason I went with this in the first place was because I
> wanted to avoid really short timers.  Part of the reason for the
> ratelimit was actually to limit the amount of time spent in the
> scheduler.  Since we expect ratelimit to normally be pretty short,
> waiting for the whole ratelimit time seemed like a good idea.
> 
I see what you mean.

I personally am not a fan of ratelimit, because of how it acts behind
the algorithm's back and mess with and partly invalidates the
algorithms own assumptions and properties (not that these assumptions
and properties are very clear in Credit1, even before ratelimiting,
but, anyway :-)), and this is an example of that.

Nevertheless, I see that this patch may, up to some extent, re-
introduce some of the "small timers" that it was ratelemiting's own
purpose to mitigate.

So, I guess, either we make this a three way condition, introducing an
absolute minimum runtime value, under which we don't ever want to go,
e.g.:

 tsclice = MICROSECS(prv->runtime_us) - runtime > CSCHED_MIN_TIMER :
           MICROSECS(prv->runtime_us) - runtime : CSCHED_MIN_TIMER;

or we leave things as they are now.

The MIN_TIMER option would let at least the cases where the vcpu has
run for a few, but not too much, time to be more precisely scheduled.
E.g., if ratelimit_us is 1000us, MIN_TIMER is 500us, and the vcpu run
for 400us, we let it be preempted after 600us more (i.e., 1000us in
total == ratelimit_us), instead of after 1000us more (i.e., 1400us in
total).

I also agree on the fact that most of the times ratelimit_us and
MIN_TIMER will be close enough (like in the example above) that it
won't probably matter much... but if someone set ratelimit_us to
something higher (say, 10ms --we accept values as big as the timeslice,
which is 30ms b default) it may matter a bit.

What do you think?

If we decide not to care, and leave things as they are, I'd add a
comment saying that code is like that on purpose, so we won't trip over
this again in 1 or 2 years. :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 04/24] xen: credit2: properly schedule migration of a running vcpu.
  2016-08-17 17:18 ` [PATCH 04/24] xen: credit2: properly schedule migration of a running vcpu Dario Faggioli
@ 2016-09-12 17:11   ` George Dunlap
  0 siblings, 0 replies; 84+ messages in thread
From: George Dunlap @ 2016-09-12 17:11 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Anshul Makkar

On 17/08/16 18:18, Dario Faggioli wrote:
> If wanting to migrate a vcpu that is actually running,
> we need to ask the scheduler to chime in as soon as
> possible, to have the vcpu itself stopped and actually
> moved.
> 
> Make sure this happens by, after setting all the relevant
> flags, raising the scheduler softirq.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: George Dunlap <george.dunlap@citrix.com>

> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> ---
>  xen/common/sched_credit2.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index a5a744f..12dfd20 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -1667,6 +1667,7 @@ static void migrate(const struct scheduler *ops,
>          svc->migrate_rqd = trqd;
>          __set_bit(_VPF_migrating, &svc->vcpu->pause_flags);
>          __set_bit(__CSFLAG_runq_migrate_request, &svc->flags);
> +        cpu_raise_softirq(svc->vcpu->processor, SCHEDULE_SOFTIRQ);
>          SCHED_STAT_CRANK(migrate_requested);
>      }
>      else
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 05/24] xen: credit2: make tickling more deterministic
  2016-09-05 13:47     ` Dario Faggioli
  2016-09-07 12:25       ` anshul makkar
@ 2016-09-13 11:13       ` George Dunlap
  2016-09-29 15:24         ` Dario Faggioli
  1 sibling, 1 reply; 84+ messages in thread
From: George Dunlap @ 2016-09-13 11:13 UTC (permalink / raw)
  To: Dario Faggioli, anshul makkar, xen-devel; +Cc: Andrew Cooper, Jan Beulich

On 05/09/16 14:47, Dario Faggioli wrote:
> On Wed, 2016-08-31 at 18:10 +0100, anshul makkar wrote:
>> On 17/08/16 18:18, Dario Faggioli wrote:
>>>
>>> Right now, the following scenario can occurr:
>>>   - upon vcpu v wakeup, v itself is put in the runqueue,
>>>     and pcpu X is tickled;
>>>   - pcpu Y schedules (for whatever reason), sees v in
>>>     the runqueue and picks it up.
>>>
>>> This may seem ok (or even a good thing), but it's not.
>>> In fact, if runq_tickle() decided X is where v should
>>> run, it did it for a reason (load distribution, SMT
>>> support, cache hotness, affinity, etc), and we really
>>> should try as hard as possible to stick to that.
>>>
>>> Of course, we can't be too strict, or we risk leaving
>>> vcpus in the runqueue while there is available CPU
>>> capacity. So, we only leave v in runqueue --for X to
>>> pick it up-- if we see that X has been tickled and
>>> has not scheduled yet, i.e., it will have a real chance
>>> of actually select and schedule v.
>>>
>>> If that is not the case, we schedule it on Y (or, at
>>> least, we consider that), as running somewhere non-ideal
>>> is better than not running at all.
>>>
>>> The commit also adds performance counters for each of
>>> the possible situations.
>>>
>>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>>> ---
>>> Cc: George Dunlap <george.dunlap@citrix.com>
>>> Cc: Anshul Makkar <anshul.makkar@citrix.com>
>>> Cc: Jan Beulich <JBeulich@suse.com>
>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>> ---
>>>   xen/common/sched_credit2.c   |   65
>>> +++++++++++++++++++++++++++++++++++++++---
>>>   xen/include/xen/perfc_defn.h |    3 ++
>>>   2 files changed, 64 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/xen/common/sched_credit2.c
>>> b/xen/common/sched_credit2.c
>>> index 12dfd20..a3d7beb 100644
>>> --- a/xen/common/sched_credit2.c
>>> +++ b/xen/common/sched_credit2.c
>>> @@ -54,6 +54,7 @@
>>>   #define TRC_CSCHED2_LOAD_CHECK       TRC_SCHED_CLASS_EVT(CSCHED2,
>>> 16)
>>>   #define TRC_CSCHED2_LOAD_BALANCE     TRC_SCHED_CLASS_EVT(CSCHED2,
>>> 17)
>>>   #define TRC_CSCHED2_PICKED_CPU       TRC_SCHED_CLASS_EVT(CSCHED2,
>>> 19)
>>> +#define TRC_CSCHED2_RUNQ_CANDIDATE   TRC_SCHED_CLASS_EVT(CSCHED2,
>>> 20)
>>>
>>>   /*
>>>    * WARNING: This is still in an experimental phase.  Status and
>>> work can be found at the
>>> @@ -398,6 +399,7 @@ struct csched2_vcpu {
>>>       int credit;
>>>       s_time_t start_time; /* When we were scheduled (used for
>>> credit) */
>>>       unsigned flags;      /* 16 bits doesn't seem to play well
>>> with clear_bit() */
>>> +    int tickled_cpu;     /* cpu tickled for picking us up (-1 if
>>> none) */
>>>
>>>       /* Individual contribution to load */
>>>       s_time_t load_last_update;  /* Last time average was updated
>>> */
>>> @@ -1049,6 +1051,10 @@ runq_tickle(const struct scheduler *ops,
>>> struct csched2_vcpu *new, s_time_t now)
>>>       __cpumask_set_cpu(ipid, &rqd->tickled);
>>>       smt_idle_mask_clear(ipid, &rqd->smt_idle);
>>>       cpu_raise_softirq(ipid, SCHEDULE_SOFTIRQ);
>>> +
>>> +    if ( unlikely(new->tickled_cpu != -1) )
>>> +        SCHED_STAT_CRANK(tickled_cpu_overwritten);
>>> +    new->tickled_cpu = ipid;
>>>   }
>>>
>>>   /*
>>> @@ -1266,6 +1272,7 @@ csched2_alloc_vdata(const struct scheduler
>>> *ops, struct vcpu *vc, void *dd)
>>>           ASSERT(svc->sdom != NULL);
>>>           svc->credit = CSCHED2_CREDIT_INIT;
>>>           svc->weight = svc->sdom->weight;
>>> +        svc->tickled_cpu = -1;
>>>           /* Starting load of 50% */
>>>           svc->avgload = 1ULL << (CSCHED2_PRIV(ops)-
>>>> load_precision_shift - 1);
>>>           svc->load_last_update = NOW() >>
>>> LOADAVG_GRANULARITY_SHIFT;
>>> @@ -1273,6 +1280,7 @@ csched2_alloc_vdata(const struct scheduler
>>> *ops, struct vcpu *vc, void *dd)
>>>       else
>>>       {
>>>           ASSERT(svc->sdom == NULL);
>>> +        svc->tickled_cpu = svc->vcpu->vcpu_id;
>> If I understood correctly, tickled_cpu refers to pcpu and not a
>> vcpu. 
>> Saving vcpu_id in tickled_cpu looks wrong.
>>
> Yes, and in fact, as you can see in the previous hunk, for pretty much
> all vcpus, tickled_cpu is initialized to -1.
> 
> Here, we are dealing with the vcpus of the idle domain. And for vcpus
> of the idle domain, their vcpu id is the same as the id of the pcpu
> they're associated to.

But what I haven't sussed out yet is why we need to initialize this for
the idle domain at all.  What benefit does it give you, and what effect
does it have?

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 05/24] xen: credit2: make tickling more deterministic
  2016-08-17 17:18 ` [PATCH 05/24] xen: credit2: make tickling more deterministic Dario Faggioli
  2016-08-31 17:10   ` anshul makkar
@ 2016-09-13 11:28   ` George Dunlap
  2016-09-30  2:22     ` Dario Faggioli
  1 sibling, 1 reply; 84+ messages in thread
From: George Dunlap @ 2016-09-13 11:28 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Andrew Cooper, Anshul Makkar, Jan Beulich

On 17/08/16 18:18, Dario Faggioli wrote:
> Right now, the following scenario can occurr:
>  - upon vcpu v wakeup, v itself is put in the runqueue,
>    and pcpu X is tickled;
>  - pcpu Y schedules (for whatever reason), sees v in
>    the runqueue and picks it up.
> 
> This may seem ok (or even a good thing), but it's not.
> In fact, if runq_tickle() decided X is where v should
> run, it did it for a reason (load distribution, SMT
> support, cache hotness, affinity, etc), and we really
> should try as hard as possible to stick to that.
> 
> Of course, we can't be too strict, or we risk leaving
> vcpus in the runqueue while there is available CPU
> capacity. So, we only leave v in runqueue --for X to
> pick it up-- if we see that X has been tickled and
> has not scheduled yet, i.e., it will have a real chance
> of actually select and schedule v.
> 
> If that is not the case, we schedule it on Y (or, at
> least, we consider that), as running somewhere non-ideal
> is better than not running at all.
> 
> The commit also adds performance counters for each of
> the possible situations.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  xen/common/sched_credit2.c   |   65 +++++++++++++++++++++++++++++++++++++++---
>  xen/include/xen/perfc_defn.h |    3 ++
>  2 files changed, 64 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 12dfd20..a3d7beb 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -54,6 +54,7 @@
>  #define TRC_CSCHED2_LOAD_CHECK       TRC_SCHED_CLASS_EVT(CSCHED2, 16)
>  #define TRC_CSCHED2_LOAD_BALANCE     TRC_SCHED_CLASS_EVT(CSCHED2, 17)
>  #define TRC_CSCHED2_PICKED_CPU       TRC_SCHED_CLASS_EVT(CSCHED2, 19)
> +#define TRC_CSCHED2_RUNQ_CANDIDATE   TRC_SCHED_CLASS_EVT(CSCHED2, 20)
>  
>  /*
>   * WARNING: This is still in an experimental phase.  Status and work can be found at the
> @@ -398,6 +399,7 @@ struct csched2_vcpu {
>      int credit;
>      s_time_t start_time; /* When we were scheduled (used for credit) */
>      unsigned flags;      /* 16 bits doesn't seem to play well with clear_bit() */
> +    int tickled_cpu;     /* cpu tickled for picking us up (-1 if none) */
>  
>      /* Individual contribution to load */
>      s_time_t load_last_update;  /* Last time average was updated */
> @@ -1049,6 +1051,10 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>      __cpumask_set_cpu(ipid, &rqd->tickled);
>      smt_idle_mask_clear(ipid, &rqd->smt_idle);
>      cpu_raise_softirq(ipid, SCHEDULE_SOFTIRQ);
> +
> +    if ( unlikely(new->tickled_cpu != -1) )
> +        SCHED_STAT_CRANK(tickled_cpu_overwritten);
> +    new->tickled_cpu = ipid;
>  }
>  
>  /*
> @@ -1266,6 +1272,7 @@ csched2_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
>          ASSERT(svc->sdom != NULL);
>          svc->credit = CSCHED2_CREDIT_INIT;
>          svc->weight = svc->sdom->weight;
> +        svc->tickled_cpu = -1;
>          /* Starting load of 50% */
>          svc->avgload = 1ULL << (CSCHED2_PRIV(ops)->load_precision_shift - 1);
>          svc->load_last_update = NOW() >> LOADAVG_GRANULARITY_SHIFT;
> @@ -1273,6 +1280,7 @@ csched2_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
>      else
>      {
>          ASSERT(svc->sdom == NULL);
> +        svc->tickled_cpu = svc->vcpu->vcpu_id;
>          svc->credit = CSCHED2_IDLE_CREDIT;
>          svc->weight = 0;
>      }
> @@ -2233,7 +2241,8 @@ void __dump_execstate(void *unused);
>  static struct csched2_vcpu *
>  runq_candidate(struct csched2_runqueue_data *rqd,
>                 struct csched2_vcpu *scurr,
> -               int cpu, s_time_t now)
> +               int cpu, s_time_t now,
> +               unsigned int *pos)

I think I'd prefer if this were called "skipped" or something like that
-- to indicate how many vcpus in the runqueue had been skipped before
coming to this one.

>  {
>      struct list_head *iter;
>      struct csched2_vcpu *snext = NULL;
> @@ -2262,13 +2271,29 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>  
>          /* Only consider vcpus that are allowed to run on this processor. */
>          if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
> +        {
> +            (*pos)++;
>              continue;
> +        }
> +
> +        /*
> +         * If a vcpu is meant to be picked up by another processor, and such
> +         * processor has not scheduled yet, leave it in the runqueue for him.
> +         */
> +        if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
> +             cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
> +        {
> +            (*pos)++;
> +            SCHED_STAT_CRANK(deferred_to_tickled_cpu);
> +            continue;
> +        }
>  
>          /* If this is on a different processor, don't pull it unless
>           * its credit is at least CSCHED2_MIGRATE_RESIST higher. */
>          if ( svc->vcpu->processor != cpu
>               && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
>          {
> +            (*pos)++;
>              SCHED_STAT_CRANK(migrate_resisted);
>              continue;
>          }
> @@ -2280,9 +2305,26 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>  
>          /* In any case, if we got this far, break. */
>          break;
> +    }
>  
> +    if ( unlikely(tb_init_done) )
> +    {
> +        struct {
> +            unsigned vcpu:16, dom:16;
> +            unsigned tickled_cpu, position;
> +        } d;
> +        d.dom = snext->vcpu->domain->domain_id;
> +        d.vcpu = snext->vcpu->vcpu_id;
> +        d.tickled_cpu = snext->tickled_cpu;
> +        d.position = *pos;
> +        __trace_var(TRC_CSCHED2_RUNQ_CANDIDATE, 1,
> +                    sizeof(d),
> +                    (unsigned char *)&d);
>      }
>  
> +    if ( unlikely(snext->tickled_cpu != -1 && snext->tickled_cpu != cpu) )
> +        SCHED_STAT_CRANK(tickled_cpu_overridden);
> +
>      return snext;
>  }
>  
> @@ -2298,6 +2340,7 @@ csched2_schedule(
>      struct csched2_runqueue_data *rqd;
>      struct csched2_vcpu * const scurr = CSCHED2_VCPU(current);
>      struct csched2_vcpu *snext = NULL;
> +    unsigned int snext_pos = 0;
>      struct task_slice ret;
>  
>      SCHED_STAT_CRANK(schedule);
> @@ -2347,7 +2390,7 @@ csched2_schedule(
>          snext = CSCHED2_VCPU(idle_vcpu[cpu]);
>      }
>      else
> -        snext=runq_candidate(rqd, scurr, cpu, now);
> +        snext = runq_candidate(rqd, scurr, cpu, now, &snext_pos);
>  
>      /* If switching from a non-idle runnable vcpu, put it
>       * back on the runqueue. */
> @@ -2371,8 +2414,21 @@ csched2_schedule(
>              __set_bit(__CSFLAG_scheduled, &snext->flags);
>          }
>  
> -        /* Check for the reset condition */
> -        if ( snext->credit <= CSCHED2_CREDIT_RESET )
> +        /*
> +         * The reset condition is "has a scheduler epoch come to an end?".
> +         * The way this is enforced is checking whether the vcpu at the top
> +         * of the runqueue has negative credits. This means the epochs have
> +         * variable lenght, as in one epoch expores when:
> +         *  1) the vcpu at the top of the runqueue has executed for
> +         *     around 10 ms (with default parameters);
> +         *  2) no other vcpu with higher credits wants to run.
> +         *
> +         * Here, where we want to check for reset, we need to make sure the
> +         * proper vcpu is being used. In fact, runqueue_candidate() may have
> +         * not returned the first vcpu in the runqueue, for various reasons
> +         * (e.g., affinity). Only trigger a reset when it does.
> +         */
> +        if ( snext_pos == 0 && snext->credit <= CSCHED2_CREDIT_RESET )

This bit wasn't mentioned in the description. :-)

There's a certain amount of sense to the idea here, but it's the kind of
thing that may have strange side effects.  Did you look at traces before
and after this change?  And does the behavior seem more rational?

If so, I'm happy to trust your judgement -- just want to check to make
sure. :-)

Everything else looks good, thanks.

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 06/24] xen: credit2: implement yield()
  2016-08-17 17:18 ` [PATCH 06/24] xen: credit2: implement yield() Dario Faggioli
@ 2016-09-13 13:33   ` George Dunlap
  2016-09-29 16:05     ` Dario Faggioli
  2016-09-20 13:25   ` George Dunlap
  1 sibling, 1 reply; 84+ messages in thread
From: George Dunlap @ 2016-09-13 13:33 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: George Dunlap, Andrew Cooper, Anshul Makkar, Jan Beulich

On 17/08/16 18:18, Dario Faggioli wrote:
> When a vcpu explicitly yields it is usually giving
> us an advice of "let someone else run and come back
> to me in a bit."
> 
> Credit2 isn't, so far, doing anything when a vcpu
> yields, which means an yield is basically a NOP (well,
> actually, it's pure overhead, as it causes the scheduler
> kick in, but the result is --at least 99% of the time--
> that the very same vcpu that yielded continues to run).
> 
> Implement a "preempt bias", to be applied to yielding
> vcpus. Basically when evaluating what vcpu to run next,
> if a vcpu that has just yielded is encountered, we give
> it a credit penalty, and check whether there is anyone
> else that would better take over the cpu (of course,
> if there isn't the yielding vcpu will continue).
> 
> The value of this bias can be configured with a boot
> time parameter, and the default is set to 1 ms.
> 
> Also, add an yield performance counter, and fix the
> style of a couple of comments.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Cool!  A few comments...

> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> Note that this *only* consider the bias during the very scheduling decision
> that retults from the vcpu calling yield. After that, the __CSFLAG_vcpu_yield
> flag is reset, and during all furute scheduling decisions, the vcpu will
> compete with the other ones with its own amount of credits.
> 
> Alternatively, we can actually _subtract_ some credits to a yielding vcpu.
> That will sort of make the effect of a call to yield last in time.

But normally we want the yield to be temporary, right?  The kinds of
places it typically gets called is when the vcpu is waiting for a
spinlock held by another (probably pre-empted) vcpu.  Doing a permanent
credit subtraction will bias the credit algorithm against cpus that have
a high amount of spinlock contention (since probably all the vcpus will
be calling yield pretty regularly)

> I'm not sure which path is best. Personally, I like the subtract approach
> (perhaps, with a smaller bias than 1ms), but I think the "one shot" behavior
> implemented here is a good starting point. It is _something_, which is better
> than nothing, which is what we have without this patch! :-) It's lightweight
> (in its impact on the crediting algorithm, I mean), and benchmarks looks nice,
> so I propose we go for this one, and explore the "permanent" --subtraction
> based-- solution a bit more.

Yes, this is simple and should be effective for now.  We can look at
improving it later.

> ---
>  docs/misc/xen-command-line.markdown |   10 ++++++
>  xen/common/sched_credit2.c          |   62 +++++++++++++++++++++++++++++++----
>  xen/common/schedule.c               |    2 +
>  xen/include/xen/perfc_defn.h        |    1 +
>  4 files changed, 68 insertions(+), 7 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index 3a250cb..5f469b1 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1389,6 +1389,16 @@ Choose the default scheduler.
>  ### sched\_credit2\_migrate\_resist
>  > `= <integer>`
>  
> +### sched\_credit2\_yield\_bias
> +> `= <integer>`
> +
> +> Default: `1000`
> +
> +Set how much a yielding vcpu will be penalized, in order to actually
> +give a chance to run to some other vcpu. This is basically a bias, in
> +favour of the non-yielding vcpus, expressed in microseconds (default
> +is 1ms).

Probably add _us to the end to indicate that the number is in microseconds.

> @@ -2247,10 +2267,22 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>      struct list_head *iter;
>      struct csched2_vcpu *snext = NULL;
>      struct csched2_private *prv = CSCHED2_PRIV(per_cpu(scheduler, cpu));
> +    int yield_bias = 0;
>  
>      /* Default to current if runnable, idle otherwise */
>      if ( vcpu_runnable(scurr->vcpu) )
> +    {
> +        /*
> +         * The way we actually take yields into account is like this:
> +         * if scurr is yielding, when comparing its credits with other
> +         * vcpus in the runqueue, act like those other vcpus had yield_bias
> +         * more credits.
> +         */
> +        if ( unlikely(scurr->flags & CSFLAG_vcpu_yield) )
> +            yield_bias = CSCHED2_YIELD_BIAS;
> +
>          snext = scurr;
> +    }
>      else
>          snext = CSCHED2_VCPU(idle_vcpu[cpu]);
>  
> @@ -2268,6 +2300,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>      list_for_each( iter, &rqd->runq )
>      {
>          struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
> +        int svc_credit = svc->credit + yield_bias;

Just curious, why did you decide to add yield_bias to everyone else,
rather than just subtracting it from snext->credit?

>  
>          /* Only consider vcpus that are allowed to run on this processor. */
>          if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
> @@ -2288,19 +2321,23 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>              continue;
>          }
>  
> -        /* If this is on a different processor, don't pull it unless
> -         * its credit is at least CSCHED2_MIGRATE_RESIST higher. */
> +        /*
> +         * If this is on a different processor, don't pull it unless
> +         * its credit is at least CSCHED2_MIGRATE_RESIST higher.
> +         */
>          if ( svc->vcpu->processor != cpu
> -             && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
> +             && snext->credit + CSCHED2_MIGRATE_RESIST > svc_credit )
>          {
>              (*pos)++;
>              SCHED_STAT_CRANK(migrate_resisted);
>              continue;
>          }
>  
> -        /* If the next one on the list has more credit than current
> -         * (or idle, if current is not runnable), choose it. */
> -        if ( svc->credit > snext->credit )
> +        /*
> +         * If the next one on the list has more credit than current
> +         * (or idle, if current is not runnable), choose it.
> +         */
> +        if ( svc_credit > snext->credit )
>              snext = svc;
>  
>          /* In any case, if we got this far, break. */
> @@ -2399,6 +2436,8 @@ csched2_schedule(
>           && vcpu_runnable(current) )
>          __set_bit(__CSFLAG_delayed_runq_add, &scurr->flags);
>  
> +    __clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
> +
>      ret.migrated = 0;
>  
>      /* Accounting for non-idle tasks */
> @@ -2918,6 +2957,14 @@ csched2_init(struct scheduler *ops)
>      printk(XENLOG_INFO "load tracking window lenght %llu ns\n",
>             1ULL << opt_load_window_shift);
>  
> +    if ( opt_yield_bias < CSCHED2_YIELD_BIAS_MIN )
> +    {
> +        printk("WARNING: %s: opt_yield_bias %d too small, resetting\n",
> +               __func__, opt_yield_bias);
> +        opt_yield_bias = 1000; /* 1 ms */
> +    }

Why do we need a minimum bias?  And why reset it to 1ms rather than
SCHED2_YIELD_BIAS_MIN?

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 03/24] xen: credit1: return the 'time remaining to the limit' as next timeslice.
  2016-09-12 17:00     ` Dario Faggioli
@ 2016-09-14  9:34       ` George Dunlap
  2016-09-14 13:54         ` Dario Faggioli
  0 siblings, 1 reply; 84+ messages in thread
From: George Dunlap @ 2016-09-14  9:34 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap

On 12/09/16 18:00, Dario Faggioli wrote:
> On Mon, 2016-09-12 at 16:14 +0100, George Dunlap wrote:
>> On 17/08/16 18:17, Dario Faggioli wrote:
>>>
>>> If vcpu x has run for 200us, and sched_ratelimit_us is
>>> 1000us, continue running x _but_ return 1000us-200us as
>>> the next time slice. This way, next scheduling point will
>>> happen in 800us, i.e., exactly at the point when x crosses
>>> the threshold, and can be descheduled (if appropriate).
>>>
>>> Right now (without this patch), we're always returning
>>> sched_ratelimit_us (1000us, in the example above), which
>>> means we're (potentially) allowing x to run more than
>>> it should have been able to (even when considering rate
>>> limiting into account).
>> Part of the reason I went with this in the first place was because I
>> wanted to avoid really short timers.  Part of the reason for the
>> ratelimit was actually to limit the amount of time spent in the
>> scheduler.  Since we expect ratelimit to normally be pretty short,
>> waiting for the whole ratelimit time seemed like a good idea.
>>
> I see what you mean.
> 
> I personally am not a fan of ratelimit, because of how it acts behind
> the algorithm's back and mess with and partly invalidates the
> algorithms own assumptions and properties (not that these assumptions
> and properties are very clear in Credit1, even before ratelimiting,
> but, anyway :-)), and this is an example of that.
> 
> Nevertheless, I see that this patch may, up to some extent, re-
> introduce some of the "small timers" that it was ratelemiting's own
> purpose to mitigate.
> 
> So, I guess, either we make this a three way condition, introducing an
> absolute minimum runtime value, under which we don't ever want to go,
> e.g.:
> 
>  tsclice = MICROSECS(prv->runtime_us) - runtime > CSCHED_MIN_TIMER :
>            MICROSECS(prv->runtime_us) - runtime : CSCHED_MIN_TIMER;
> 
> or we leave things as they are now.
> 
> The MIN_TIMER option would let at least the cases where the vcpu has
> run for a few, but not too much, time to be more precisely scheduled.
> E.g., if ratelimit_us is 1000us, MIN_TIMER is 500us, and the vcpu run
> for 400us, we let it be preempted after 600us more (i.e., 1000us in
> total == ratelimit_us), instead of after 1000us more (i.e., 1400us in
> total).
> 
> I also agree on the fact that most of the times ratelimit_us and
> MIN_TIMER will be close enough (like in the example above) that it
> won't probably matter much... but if someone set ratelimit_us to
> something higher (say, 10ms --we accept values as big as the timeslice,
> which is 30ms b default) it may matter a bit.
> 
> What do you think?
> 
> If we decide not to care, and leave things as they are, I'd add a
> comment saying that code is like that on purpose, so we won't trip over
> this again in 1 or 2 years. :-)

Yeah, I think while we're thinking about it we might as well add in the
MIN_TIMER thing you mention above (if you don't mind doing it).

Thanks,
 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 03/24] xen: credit1: return the 'time remaining to the limit' as next timeslice.
  2016-09-14  9:34       ` George Dunlap
@ 2016-09-14 13:54         ` Dario Faggioli
  0 siblings, 0 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-09-14 13:54 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: George Dunlap


[-- Attachment #1.1: Type: text/plain, Size: 1152 bytes --]

On Wed, 2016-09-14 at 10:34 +0100, George Dunlap wrote:
> On 12/09/16 18:00, Dario Faggioli wrote:
> > 
> > I also agree on the fact that most of the times ratelimit_us and
> > MIN_TIMER will be close enough (like in the example above) that it
> > won't probably matter much... but if someone set ratelimit_us to
> > something higher (say, 10ms --we accept values as big as the
> > timeslice,
> > which is 30ms b default) it may matter a bit.
> > 
> > What do you think?
> > 
> > If we decide not to care, and leave things as they are, I'd add a
> > comment saying that code is like that on purpose, so we won't trip
> > over
> > this again in 1 or 2 years. :-)
> Yeah, I think while we're thinking about it we might as well add in
> the
> MIN_TIMER thing you mention above (if you don't mind doing it).
> 
Sure I don't mind. I'll do it.

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 06/24] xen: credit2: implement yield()
  2016-08-17 17:18 ` [PATCH 06/24] xen: credit2: implement yield() Dario Faggioli
  2016-09-13 13:33   ` George Dunlap
@ 2016-09-20 13:25   ` George Dunlap
  2016-09-20 13:37     ` George Dunlap
  1 sibling, 1 reply; 84+ messages in thread
From: George Dunlap @ 2016-09-20 13:25 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: George Dunlap, Andrew Cooper, Anshul Makkar, Jan Beulich

On 17/08/16 18:18, Dario Faggioli wrote:
> When a vcpu explicitly yields it is usually giving
> us an advice of "let someone else run and come back
> to me in a bit."
> 
> Credit2 isn't, so far, doing anything when a vcpu
> yields, which means an yield is basically a NOP (well,
> actually, it's pure overhead, as it causes the scheduler
> kick in, but the result is --at least 99% of the time--
> that the very same vcpu that yielded continues to run).
> 
> Implement a "preempt bias", to be applied to yielding
> vcpus. Basically when evaluating what vcpu to run next,
> if a vcpu that has just yielded is encountered, we give
> it a credit penalty, and check whether there is anyone
> else that would better take over the cpu (of course,
> if there isn't the yielding vcpu will continue).
> 
> The value of this bias can be configured with a boot
> time parameter, and the default is set to 1 ms.
> 
> Also, add an yield performance counter, and fix the
> style of a couple of comments.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> Note that this *only* consider the bias during the very scheduling decision
> that retults from the vcpu calling yield. After that, the __CSFLAG_vcpu_yield
> flag is reset, and during all furute scheduling decisions, the vcpu will
> compete with the other ones with its own amount of credits.
> 
> Alternatively, we can actually _subtract_ some credits to a yielding vcpu.
> That will sort of make the effect of a call to yield last in time.
> 
> I'm not sure which path is best. Personally, I like the subtract approach
> (perhaps, with a smaller bias than 1ms), but I think the "one shot" behavior
> implemented here is a good starting point. It is _something_, which is better
> than nothing, which is what we have without this patch! :-) It's lightweight
> (in its impact on the crediting algorithm, I mean), and benchmarks looks nice,
> so I propose we go for this one, and explore the "permanent" --subtraction
> based-- solution a bit more.
> ---
>  docs/misc/xen-command-line.markdown |   10 ++++++
>  xen/common/sched_credit2.c          |   62 +++++++++++++++++++++++++++++++----
>  xen/common/schedule.c               |    2 +
>  xen/include/xen/perfc_defn.h        |    1 +
>  4 files changed, 68 insertions(+), 7 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index 3a250cb..5f469b1 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1389,6 +1389,16 @@ Choose the default scheduler.
>  ### sched\_credit2\_migrate\_resist
>  > `= <integer>`
>  
> +### sched\_credit2\_yield\_bias
> +> `= <integer>`
> +
> +> Default: `1000`
> +
> +Set how much a yielding vcpu will be penalized, in order to actually
> +give a chance to run to some other vcpu. This is basically a bias, in
> +favour of the non-yielding vcpus, expressed in microseconds (default
> +is 1ms).
> +
>  ### sched\_credit\_tslice\_ms
>  > `= <integer>`
>  
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index a3d7beb..569174b 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -144,6 +144,9 @@
>  #define CSCHED2_MIGRATE_RESIST       ((opt_migrate_resist)*MICROSECS(1))
>  /* How much to "compensate" a vcpu for L2 migration */
>  #define CSCHED2_MIGRATE_COMPENSATION MICROSECS(50)
> +/* How big of a bias we should have against a yielding vcpu */
> +#define CSCHED2_YIELD_BIAS           ((opt_yield_bias)*MICROSECS(1))
> +#define CSCHED2_YIELD_BIAS_MIN       CSCHED2_MIN_TIMER
>  /* Reset: Value below which credit will be reset. */
>  #define CSCHED2_CREDIT_RESET         0
>  /* Max timer: Maximum time a guest can be run for. */
> @@ -181,11 +184,20 @@
>   */
>  #define __CSFLAG_runq_migrate_request 3
>  #define CSFLAG_runq_migrate_request (1<<__CSFLAG_runq_migrate_request)
> -
> +/*
> + * CSFLAG_vcpu_yield: this vcpu was running, and has called vcpu_yield(). The
> + * scheduler is invoked to see if we can give the cpu to someone else, and
> + * get back to the yielding vcpu in a while.
> + */
> +#define __CSFLAG_vcpu_yield 4
> +#define CSFLAG_vcpu_yield (1<<__CSFLAG_vcpu_yield)
>  
>  static unsigned int __read_mostly opt_migrate_resist = 500;
>  integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
>  
> +static unsigned int __read_mostly opt_yield_bias = 1000;
> +integer_param("sched_credit2_yield_bias", opt_yield_bias);
> +
>  /*
>   * Useful macros
>   */
> @@ -1432,6 +1444,14 @@ out:
>  }
>  
>  static void
> +csched2_vcpu_yield(const struct scheduler *ops, struct vcpu *v)
> +{
> +    struct csched2_vcpu * const svc = CSCHED2_VCPU(v);
> +
> +    __set_bit(__CSFLAG_vcpu_yield, &svc->flags);
> +}
> +
> +static void
>  csched2_context_saved(const struct scheduler *ops, struct vcpu *vc)
>  {
>      struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
> @@ -2247,10 +2267,22 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>      struct list_head *iter;
>      struct csched2_vcpu *snext = NULL;
>      struct csched2_private *prv = CSCHED2_PRIV(per_cpu(scheduler, cpu));
> +    int yield_bias = 0;
>  
>      /* Default to current if runnable, idle otherwise */
>      if ( vcpu_runnable(scurr->vcpu) )
> +    {
> +        /*
> +         * The way we actually take yields into account is like this:
> +         * if scurr is yielding, when comparing its credits with other
> +         * vcpus in the runqueue, act like those other vcpus had yield_bias
> +         * more credits.
> +         */
> +        if ( unlikely(scurr->flags & CSFLAG_vcpu_yield) )
> +            yield_bias = CSCHED2_YIELD_BIAS;
> +
>          snext = scurr;
> +    }
>      else
>          snext = CSCHED2_VCPU(idle_vcpu[cpu]);
>  
> @@ -2268,6 +2300,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>      list_for_each( iter, &rqd->runq )
>      {
>          struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
> +        int svc_credit = svc->credit + yield_bias;
>  
>          /* Only consider vcpus that are allowed to run on this processor. */
>          if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
> @@ -2288,19 +2321,23 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>              continue;
>          }
>  
> -        /* If this is on a different processor, don't pull it unless
> -         * its credit is at least CSCHED2_MIGRATE_RESIST higher. */
> +        /*
> +         * If this is on a different processor, don't pull it unless
> +         * its credit is at least CSCHED2_MIGRATE_RESIST higher.
> +         */
>          if ( svc->vcpu->processor != cpu
> -             && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
> +             && snext->credit + CSCHED2_MIGRATE_RESIST > svc_credit )
>          {
>              (*pos)++;
>              SCHED_STAT_CRANK(migrate_resisted);
>              continue;
>          }
>  
> -        /* If the next one on the list has more credit than current
> -         * (or idle, if current is not runnable), choose it. */
> -        if ( svc->credit > snext->credit )
> +        /*
> +         * If the next one on the list has more credit than current
> +         * (or idle, if current is not runnable), choose it.
> +         */
> +        if ( svc_credit > snext->credit )
>              snext = svc;

Hmm, if we change snext, shouldn't we also zero out the yield bias?
Otherwise vcpus competing with snext (which will at this point have had
the YIELD flag cleared) will be given the yield bonus as well, which is
not what we want.  In fact, I think in this case we'll always choose the
*last* vcpu on the list unless there's one where the gap between N and
N+1 is greater than YIELD_BIAS, won't it?

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 07/24] xen: sched: don't rate limit context switches in case of yields
  2016-08-17 17:18 ` [PATCH 07/24] xen: sched: don't rate limit context switches in case of yields Dario Faggioli
@ 2016-09-20 13:32   ` George Dunlap
  2016-09-29 16:46     ` Dario Faggioli
  0 siblings, 1 reply; 84+ messages in thread
From: George Dunlap @ 2016-09-20 13:32 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Anshul Makkar

On 17/08/16 18:18, Dario Faggioli wrote:
> In both Credit1 and Credit2, if a vcpu yields, let it...
> well... yield!
> 
> In fact, context switch rate limiting has been primarily
> introduced to avoid too heavy context switch rate due to
> interrupts, and, in general, asynchronous events.
> 
> In a vcpu "voluntarily" yields, we really should let it
> give up the cpu for a while. For instance, the reason may
> be that it's about to start spinning, and there's few
> point in forcing a vcpu to spin for (potentially) the
> entire rate-limiting period.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> ---
>  xen/common/sched_credit.c  |   20 +++++++++++--------
>  xen/common/sched_credit2.c |   47 +++++++++++++++++++++++---------------------
>  2 files changed, 37 insertions(+), 30 deletions(-)
> 
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index 3f439a0..ca04732 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -1771,9 +1771,18 @@ csched_schedule(
>       *   cpu and steal it.
>       */
>  
> -    /* If we have schedule rate limiting enabled, check to see
> -     * how long we've run for. */
> -    if ( !tasklet_work_scheduled
> +    /*
> +     * If we have schedule rate limiting enabled, check to see
> +     * how long we've run for.
> +     *
> +     * If scurr is yielding, however, we don't let rate limiting kick in.
> +     * In fact, it may be the case that scurr is about to spin, and there's
> +     * no point forcing it to do so until rate limiting expires.
> +     *
> +     * While there, take the chance for clearing the yield flag at once.
> +     */
> +    if ( !test_and_clear_bit(CSCHED_FLAG_VCPU_YIELD, &scurr->flags)

It looks like YIELD is implemented by putting it lower in the runqueue
in __runqueue_insert().  But here you're clearing the flag before the
insert happens -- won't this effectively disable yield() for credit1?


> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 569174b..c8e0ee7 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -2267,36 +2267,40 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>      struct list_head *iter;
>      struct csched2_vcpu *snext = NULL;
>      struct csched2_private *prv = CSCHED2_PRIV(per_cpu(scheduler, cpu));
> -    int yield_bias = 0;
> -
> -    /* Default to current if runnable, idle otherwise */
> -    if ( vcpu_runnable(scurr->vcpu) )
> -    {
> -        /*
> -         * The way we actually take yields into account is like this:
> -         * if scurr is yielding, when comparing its credits with other
> -         * vcpus in the runqueue, act like those other vcpus had yield_bias
> -         * more credits.
> -         */
> -        if ( unlikely(scurr->flags & CSFLAG_vcpu_yield) )
> -            yield_bias = CSCHED2_YIELD_BIAS;
> -
> -        snext = scurr;
> -    }
> -    else
> -        snext = CSCHED2_VCPU(idle_vcpu[cpu]);
> +    /*
> +     * The way we actually take yields into account is like this:
> +     * if scurr is yielding, when comparing its credits with other vcpus in
> +     * the runqueue, act like those other vcpus had yield_bias more credits.
> +     */
> +    int yield_bias = __test_and_clear_bit(__CSFLAG_vcpu_yield, &scurr->flags) ?
> +                     CSCHED2_YIELD_BIAS : 0;
>  
>      /*
>       * Return the current vcpu if it has executed for less than ratelimit.
>       * Adjuststment for the selected vcpu's credit and decision
>       * for how long it will run will be taken in csched2_runtime.
> +     *
> +     * Note that, if scurr is yielding, we don't let rate limiting kick in.
> +     * In fact, it may be the case that scurr is about to spin, and there's
> +     * no point forcing it to do so until rate limiting expires.
> +     *
> +     * To check whether we are yielding, it's enough to look at yield_bias
> +     * (as CSCHED2_YIELD_BIAS can't be zero). Also, note that the yield flag
> +     * has been cleared already above.
>       */
> -    if ( prv->ratelimit_us && !is_idle_vcpu(scurr->vcpu) &&
> +    if ( !yield_bias &&
> +         prv->ratelimit_us && !is_idle_vcpu(scurr->vcpu) &&
>           vcpu_runnable(scurr->vcpu) &&
>           (now - scurr->vcpu->runstate.state_entry_time) <
>            MICROSECS(prv->ratelimit_us) )
>          return scurr;
>  
> +    /* Default to current if runnable, idle otherwise */
> +    if ( vcpu_runnable(scurr->vcpu) )
> +        snext = scurr;
> +    else
> +        snext = CSCHED2_VCPU(idle_vcpu[cpu]);

This looks good, but the code re-organization probably goes better in
the previous patch.  Since you're re-sending anyway, would you mind
moving it there?

I'm not sure the credit2 yield-ratelimit needs to be in a separate
patch; since you're implementing yield in credit2 from scratch you could
just implement it all in one go.  But since you have a patch for credit1
anyway, I think whichever way is fine.

> +
>      list_for_each( iter, &rqd->runq )
>      {
>          struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
> @@ -2423,7 +2427,8 @@ csched2_schedule(
>       */
>      if ( tasklet_work_scheduled )
>      {
> -        trace_var(TRC_CSCHED2_SCHED_TASKLET, 1, 0,  NULL);
> +        __clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
> +        trace_var(TRC_CSCHED2_SCHED_TASKLET, 1, 0, NULL);
>          snext = CSCHED2_VCPU(idle_vcpu[cpu]);
>      }
>      else
> @@ -2436,8 +2441,6 @@ csched2_schedule(
>           && vcpu_runnable(current) )
>          __set_bit(__CSFLAG_delayed_runq_add, &scurr->flags);
>  
> -    __clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
> -

This should probably go in the previous patch though.

Thanks,
 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 06/24] xen: credit2: implement yield()
  2016-09-20 13:25   ` George Dunlap
@ 2016-09-20 13:37     ` George Dunlap
  0 siblings, 0 replies; 84+ messages in thread
From: George Dunlap @ 2016-09-20 13:37 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: George Dunlap, Andrew Cooper, Anshul Makkar, Jan Beulich

On 20/09/16 14:25, George Dunlap wrote:
> On 17/08/16 18:18, Dario Faggioli wrote:
>> When a vcpu explicitly yields it is usually giving
>> us an advice of "let someone else run and come back
>> to me in a bit."
>>
>> Credit2 isn't, so far, doing anything when a vcpu
>> yields, which means an yield is basically a NOP (well,
>> actually, it's pure overhead, as it causes the scheduler
>> kick in, but the result is --at least 99% of the time--
>> that the very same vcpu that yielded continues to run).
>>
>> Implement a "preempt bias", to be applied to yielding
>> vcpus. Basically when evaluating what vcpu to run next,
>> if a vcpu that has just yielded is encountered, we give
>> it a credit penalty, and check whether there is anyone
>> else that would better take over the cpu (of course,
>> if there isn't the yielding vcpu will continue).
>>
>> The value of this bias can be configured with a boot
>> time parameter, and the default is set to 1 ms.
>>
>> Also, add an yield performance counter, and fix the
>> style of a couple of comments.
>>
>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>> ---
>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>> Cc: Anshul Makkar <anshul.makkar@citrix.com>
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> Note that this *only* consider the bias during the very scheduling decision
>> that retults from the vcpu calling yield. After that, the __CSFLAG_vcpu_yield
>> flag is reset, and during all furute scheduling decisions, the vcpu will
>> compete with the other ones with its own amount of credits.
>>
>> Alternatively, we can actually _subtract_ some credits to a yielding vcpu.
>> That will sort of make the effect of a call to yield last in time.
>>
>> I'm not sure which path is best. Personally, I like the subtract approach
>> (perhaps, with a smaller bias than 1ms), but I think the "one shot" behavior
>> implemented here is a good starting point. It is _something_, which is better
>> than nothing, which is what we have without this patch! :-) It's lightweight
>> (in its impact on the crediting algorithm, I mean), and benchmarks looks nice,
>> so I propose we go for this one, and explore the "permanent" --subtraction
>> based-- solution a bit more.
>> ---
>>  docs/misc/xen-command-line.markdown |   10 ++++++
>>  xen/common/sched_credit2.c          |   62 +++++++++++++++++++++++++++++++----
>>  xen/common/schedule.c               |    2 +
>>  xen/include/xen/perfc_defn.h        |    1 +
>>  4 files changed, 68 insertions(+), 7 deletions(-)
>>
>> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
>> index 3a250cb..5f469b1 100644
>> --- a/docs/misc/xen-command-line.markdown
>> +++ b/docs/misc/xen-command-line.markdown
>> @@ -1389,6 +1389,16 @@ Choose the default scheduler.
>>  ### sched\_credit2\_migrate\_resist
>>  > `= <integer>`
>>  
>> +### sched\_credit2\_yield\_bias
>> +> `= <integer>`
>> +
>> +> Default: `1000`
>> +
>> +Set how much a yielding vcpu will be penalized, in order to actually
>> +give a chance to run to some other vcpu. This is basically a bias, in
>> +favour of the non-yielding vcpus, expressed in microseconds (default
>> +is 1ms).
>> +
>>  ### sched\_credit\_tslice\_ms
>>  > `= <integer>`
>>  
>> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
>> index a3d7beb..569174b 100644
>> --- a/xen/common/sched_credit2.c
>> +++ b/xen/common/sched_credit2.c
>> @@ -144,6 +144,9 @@
>>  #define CSCHED2_MIGRATE_RESIST       ((opt_migrate_resist)*MICROSECS(1))
>>  /* How much to "compensate" a vcpu for L2 migration */
>>  #define CSCHED2_MIGRATE_COMPENSATION MICROSECS(50)
>> +/* How big of a bias we should have against a yielding vcpu */
>> +#define CSCHED2_YIELD_BIAS           ((opt_yield_bias)*MICROSECS(1))
>> +#define CSCHED2_YIELD_BIAS_MIN       CSCHED2_MIN_TIMER
>>  /* Reset: Value below which credit will be reset. */
>>  #define CSCHED2_CREDIT_RESET         0
>>  /* Max timer: Maximum time a guest can be run for. */
>> @@ -181,11 +184,20 @@
>>   */
>>  #define __CSFLAG_runq_migrate_request 3
>>  #define CSFLAG_runq_migrate_request (1<<__CSFLAG_runq_migrate_request)
>> -
>> +/*
>> + * CSFLAG_vcpu_yield: this vcpu was running, and has called vcpu_yield(). The
>> + * scheduler is invoked to see if we can give the cpu to someone else, and
>> + * get back to the yielding vcpu in a while.
>> + */
>> +#define __CSFLAG_vcpu_yield 4
>> +#define CSFLAG_vcpu_yield (1<<__CSFLAG_vcpu_yield)
>>  
>>  static unsigned int __read_mostly opt_migrate_resist = 500;
>>  integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
>>  
>> +static unsigned int __read_mostly opt_yield_bias = 1000;
>> +integer_param("sched_credit2_yield_bias", opt_yield_bias);
>> +
>>  /*
>>   * Useful macros
>>   */
>> @@ -1432,6 +1444,14 @@ out:
>>  }
>>  
>>  static void
>> +csched2_vcpu_yield(const struct scheduler *ops, struct vcpu *v)
>> +{
>> +    struct csched2_vcpu * const svc = CSCHED2_VCPU(v);
>> +
>> +    __set_bit(__CSFLAG_vcpu_yield, &svc->flags);
>> +}
>> +
>> +static void
>>  csched2_context_saved(const struct scheduler *ops, struct vcpu *vc)
>>  {
>>      struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
>> @@ -2247,10 +2267,22 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>>      struct list_head *iter;
>>      struct csched2_vcpu *snext = NULL;
>>      struct csched2_private *prv = CSCHED2_PRIV(per_cpu(scheduler, cpu));
>> +    int yield_bias = 0;
>>  
>>      /* Default to current if runnable, idle otherwise */
>>      if ( vcpu_runnable(scurr->vcpu) )
>> +    {
>> +        /*
>> +         * The way we actually take yields into account is like this:
>> +         * if scurr is yielding, when comparing its credits with other
>> +         * vcpus in the runqueue, act like those other vcpus had yield_bias
>> +         * more credits.
>> +         */
>> +        if ( unlikely(scurr->flags & CSFLAG_vcpu_yield) )
>> +            yield_bias = CSCHED2_YIELD_BIAS;
>> +
>>          snext = scurr;
>> +    }
>>      else
>>          snext = CSCHED2_VCPU(idle_vcpu[cpu]);
>>  
>> @@ -2268,6 +2300,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>>      list_for_each( iter, &rqd->runq )
>>      {
>>          struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
>> +        int svc_credit = svc->credit + yield_bias;
>>  
>>          /* Only consider vcpus that are allowed to run on this processor. */
>>          if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
>> @@ -2288,19 +2321,23 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>>              continue;
>>          }
>>  
>> -        /* If this is on a different processor, don't pull it unless
>> -         * its credit is at least CSCHED2_MIGRATE_RESIST higher. */
>> +        /*
>> +         * If this is on a different processor, don't pull it unless
>> +         * its credit is at least CSCHED2_MIGRATE_RESIST higher.
>> +         */
>>          if ( svc->vcpu->processor != cpu
>> -             && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
>> +             && snext->credit + CSCHED2_MIGRATE_RESIST > svc_credit )
>>          {
>>              (*pos)++;
>>              SCHED_STAT_CRANK(migrate_resisted);
>>              continue;
>>          }
>>  
>> -        /* If the next one on the list has more credit than current
>> -         * (or idle, if current is not runnable), choose it. */
>> -        if ( svc->credit > snext->credit )
>> +        /*
>> +         * If the next one on the list has more credit than current
>> +         * (or idle, if current is not runnable), choose it.
>> +         */
>> +        if ( svc_credit > snext->credit )
>>              snext = svc;
> 
> Hmm, if we change snext, shouldn't we also zero out the yield bias?
> Otherwise vcpus competing with snext (which will at this point have had
> the YIELD flag cleared) will be given the yield bonus as well, which is
> not what we want.  In fact, I think in this case we'll always choose the
> *last* vcpu on the list unless there's one where the gap between N and
> N+1 is greater than YIELD_BIAS, won't it?

Oops -- just noticed the next line:

        /* In any case, if we got this far, break. */
        break;

Nevermind. :-)

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 08/24] xen: tracing: add trace records for schedule and rate-limiting.
  2016-08-17 17:18 ` [PATCH 08/24] xen: tracing: add trace records for schedule and rate-limiting Dario Faggioli
  2016-08-18  0:57   ` Meng Xu
@ 2016-09-20 13:50   ` George Dunlap
  1 sibling, 0 replies; 84+ messages in thread
From: George Dunlap @ 2016-09-20 13:50 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Anshul Makkar, Meng Xu

On 17/08/16 18:18, Dario Faggioli wrote:
> As far as {csched, csched2, rt}_schedule() are concerned,
> an "empty" event, would already make it easier to read and
> understand a trace.
> 
> But while there, add a few useful information, like
> if the cpu that is going through the scheduler has
> been tickled or not, if it is currently idle, etc
> (they vary, on a per-scheduler basis).
> 
> For Credit1 and Credit2, add a record about when
> rate-limiting kicks in too.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Meng Xu <mengxu@cis.upenn.edu>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> ---
>  xen/common/sched_credit.c  |    7 +++++++
>  xen/common/sched_credit2.c |   38 +++++++++++++++++++++++++++++++++++++-
>  xen/common/sched_rt.c      |   15 +++++++++++++++
>  3 files changed, 59 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index ca04732..f9d3ac9 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -134,6 +134,8 @@
>  #define TRC_CSCHED_TICKLE        TRC_SCHED_CLASS_EVT(CSCHED, 6)
>  #define TRC_CSCHED_BOOST_START   TRC_SCHED_CLASS_EVT(CSCHED, 7)
>  #define TRC_CSCHED_BOOST_END     TRC_SCHED_CLASS_EVT(CSCHED, 8)
> +#define TRC_CSCHED_SCHEDULE      TRC_SCHED_CLASS_EVT(CSCHED, 9)
> +#define TRC_CSCHED_RATELIMIT     TRC_SCHED_CLASS_EVT(CSCHED, 10)
>  
>  
>  /*
> @@ -1743,6 +1745,9 @@ csched_schedule(
>      SCHED_STAT_CRANK(schedule);
>      CSCHED_VCPU_CHECK(current);
>  
> +    TRACE_3D(TRC_CSCHED_SCHEDULE, cpu, tasklet_work_scheduled,
> +             is_idle_vcpu(current));

Sorry to be annoying, but you're using two full words here for two bits
of information, and scheduling isn't exactly a once-every-few-seconds
phenomenon. :-)  Would you mind packing this in a bit?

> +
>      runtime = now - current->runstate.state_entry_time;
>      if ( runtime < 0 ) /* Does this ever happen? */
>          runtime = 0;
> @@ -1792,6 +1797,8 @@ csched_schedule(
>          snext->start_time += now;
>          perfc_incr(delay_ms);
>          tslice = MICROSECS(prv->ratelimit_us) - runtime;
> +        TRACE_3D(TRC_CSCHED_RATELIMIT, scurr->vcpu->domain->domain_id,
> +                 scurr->vcpu->vcpu_id, runtime);

Same for this one, if you don't mind -- this one is less important
probably, but since you essentially have the code in credit2, it seems
like a pretty straightforward exercise to copy-and-paste it.

The credit2 traces look good -- thanks!

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 09/24] xen/tools: tracing: improve tracing of context switches.
  2016-08-17 17:18 ` [PATCH 09/24] xen/tools: tracing: improve tracing of context switches Dario Faggioli
@ 2016-09-20 14:08   ` George Dunlap
  0 siblings, 0 replies; 84+ messages in thread
From: George Dunlap @ 2016-09-20 14:08 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Wei Liu, Ian Jackson

On 17/08/16 18:18, Dario Faggioli wrote:
> Right now, two out of the three events related to
> context switch (that is TRC_SCHED_SWITCH_INFPREV and
> TRC_SCHED_SWITCH_INFNEXT) only report the domain id,
> and not the vcpu id.
> 
> That's omitting a useful piece of information, and
> even if we be figured that out by looking at other
> records, that's unnecessarily complicated (especially
> if working on a trace from a sctipt).
> 
> This changes both the tracing code in Xen and the parsing
> code in tools at once, to avoid introducing transitional
> regressions.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Hmm, I'm tempted to complain about the lack of packing; but in any case
I think these traces are redundant with the runstate change information,
so there's no need to be terribly picky. :-)

Acked-by: George Dunlap <george.dunlap@citrix.com>

> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
>  tools/xentrace/formats    |    4 ++--
>  tools/xentrace/xenalyze.c |   17 +++++++++--------
>  xen/common/schedule.c     |    8 ++++----
>  3 files changed, 15 insertions(+), 14 deletions(-)
> 
> diff --git a/tools/xentrace/formats b/tools/xentrace/formats
> index caafb5f..0de7990 100644
> --- a/tools/xentrace/formats
> +++ b/tools/xentrace/formats
> @@ -32,8 +32,8 @@
>  0x0002800b  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  s_timer_fn
>  0x0002800c  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  t_timer_fn
>  0x0002800d  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  dom_timer_fn
> -0x0002800e  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  switch_infprev    [ old_domid = 0x%(1)08x, runtime = %(2)d ]
> -0x0002800f  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  switch_infnext    [ new_domid = 0x%(1)08x, time = %(2)d, r_time = %(3)d ]
> +0x0002800e  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  switch_infprev    [ dom:vcpu = 0x%(1)04x%(2)04x, runtime = %(3)d ]
> +0x0002800f  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  switch_infnext    [ new_dom:vcpu = 0x%(1)04x%(2)04x, time = %(3)d, r_time = %(4)d ]
>  0x00028010  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  domain_shutdown_code [ dom:vcpu = 0x%(1)04x%(2)04x, reason = 0x%(3)08x ]
>  
>  0x00022001  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched:sched_tasklet
> diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
> index 11763a8..0b697d0 100644
> --- a/tools/xentrace/xenalyze.c
> +++ b/tools/xentrace/xenalyze.c
> @@ -7501,28 +7501,29 @@ void sched_process(struct pcpu_info *p)
>          case TRC_SCHED_SWITCH_INFPREV:
>              if(opt.dump_all) {
>                  struct {
> -                    unsigned int domid, runtime;
> +                    unsigned int domid, vcpuid, runtime;
>                  } *r = (typeof(r))ri->d;
>  
> -                printf(" %s sched_switch prev d%u, run for %u.%uus\n",
> -                       ri->dump_header, r->domid, r->runtime / 1000,
> -                       r->runtime % 1000);
> +                printf(" %s sched_switch prev d%uv%d, run for %u.%uus\n",
> +                       ri->dump_header, r->domid, r->vcpuid,
> +                       r->runtime / 1000, r->runtime % 1000);
>              }
>              break;
>          case TRC_SCHED_SWITCH_INFNEXT:
>              if(opt.dump_all)
>              {
>                  struct {
> -                    unsigned int domid, rsince;
> +                    unsigned int domid, vcpuid, rsince;
>                      int slice;
>                  } *r = (typeof(r))ri->d;
>  
> -                printf(" %s sched_switch next d%u", ri->dump_header, r->domid);
> +                printf(" %s sched_switch next d%uv%u", ri->dump_header,
> +                       r->domid, r->vcpuid);
>                  if ( r->rsince != 0 )
> -                    printf(", was runnable for %u.%uus, ", r->rsince / 1000,
> +                    printf(", was runnable for %u.%uus", r->rsince / 1000,
>                             r->rsince % 1000);
>                  if ( r->slice > 0 )
> -                    printf("next slice %u.%uus", r->slice / 1000,
> +                    printf(", next slice %u.%uus", r->slice / 1000,
>                             r->slice % 1000);
>                  printf("\n");
>              }
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index abe063d..5b444c4 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -1390,11 +1390,11 @@ static void schedule(void)
>          return continue_running(prev);
>      }
>  
> -    TRACE_2D(TRC_SCHED_SWITCH_INFPREV,
> -             prev->domain->domain_id,
> +    TRACE_3D(TRC_SCHED_SWITCH_INFPREV,
> +             prev->domain->domain_id, prev->vcpu_id,
>               now - prev->runstate.state_entry_time);
> -    TRACE_3D(TRC_SCHED_SWITCH_INFNEXT,
> -             next->domain->domain_id,
> +    TRACE_4D(TRC_SCHED_SWITCH_INFNEXT,
> +             next->domain->domain_id, next->vcpu_id,
>               (next->runstate.state == RUNSTATE_runnable) ?
>               (now - next->runstate.state_entry_time) : 0,
>               next_slice.time);
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 10/24] xen: tracing: improve Credit2's tickle_check and burn_credits records
  2016-08-17 17:18 ` [PATCH 10/24] xen: tracing: improve Credit2's tickle_check and burn_credits records Dario Faggioli
@ 2016-09-20 14:35   ` George Dunlap
  2016-09-29 17:23     ` Dario Faggioli
  0 siblings, 1 reply; 84+ messages in thread
From: George Dunlap @ 2016-09-20 14:35 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Wei Liu, Ian Jackson

On 17/08/16 18:18, Dario Faggioli wrote:
> In both Credit2's trace records relative to checking
> whether we want to preempt a vcpu (in runq_tickle()),
> and to credits being burn, make it explicit on which
> pcpu the vcpu being considered is running.
> 
> Such information isn't currently available, not even
> by looking at on which pcpu the events happen, as we
> do both the above operation from a certain pcpu on
> vcpus running on different pcpus.

But you should be able to tell where a given vcpu is currently running
from the runstate changes, right?  Obviously xentrace_format couldn't
tell you that, but xenalyze should be able to, unless there were lost
trace records on the vcpu in question.

My modus operandi has been "try to keep trace volume from growing"
rather than "wait until trace volume is noticably an issue and reduce
it".  Presumably you've been doing a lot of tracing -- do you think I
should change my approach?

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 11/24] tools: tracing: handle more scheduling related events.
  2016-08-17 17:18 ` [PATCH 11/24] tools: tracing: handle more scheduling related events Dario Faggioli
@ 2016-09-20 14:37   ` George Dunlap
  0 siblings, 0 replies; 84+ messages in thread
From: George Dunlap @ 2016-09-20 14:37 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Wei Liu, Ian Jackson

On 17/08/16 18:18, Dario Faggioli wrote:
> There are some scheduling related trace records that
> are not being taken care of (and hence only dumped as
> raw records).
> 
> Some of them are being introduced in this series, while
> other were just neglected by previous patches.
> 
> Add support for them.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: George Dunlap <george.dunlap@citrix.com>

> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
>  tools/xentrace/formats    |    8 ++++
>  tools/xentrace/xenalyze.c |  101 +++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 109 insertions(+)
> 
> diff --git a/tools/xentrace/formats b/tools/xentrace/formats
> index adff681..3488a06 100644
> --- a/tools/xentrace/formats
> +++ b/tools/xentrace/formats
> @@ -42,6 +42,10 @@
>  0x00022004  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched:stolen_vcpu   [ dom:vcpu = 0x%(2)04x%(3)04x, from = %(1)d ]
>  0x00022005  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched:picked_cpu    [ dom:vcpu = 0x%(1)04x%(2)04x, cpu = %(3)d ]
>  0x00022006  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched:tickle        [ cpu = %(1)d ]
> +0x00022007  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched:boost         [ dom:vcpu = 0x%(1)04x%(2)04x ]
> +0x00022008  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched:unboost       [ dom:vcpu = 0x%(1)04x%(2)04x ]
> +0x00022009  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched:schedule      [ cpu = %(1)d, tasklet_scheduled = %(2)d, was_idle = %(3)d ]
> +0x0002200A  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched:ratelimit     [ dom:vcpu = 0x%(1)04x%(2)04x, runtime = %(3)d ]
>  
>  0x00022201  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:tick
>  0x00022202  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:runq_pos       [ dom:vcpu = 0x%(1)08x, pos = %(2)d]
> @@ -61,12 +65,16 @@
>  0x00022210  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:load_check     [ lrq_id[16]:orq_id[16] = 0x%(1)08x, delta = %(2)d ]
>  0x00022211  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:load_balance   [ l_bavgload = 0x%(2)08x%(1)08x, o_bavgload = 0x%(4)08x%(3)08x, lrq_id[16]:orq_id[16] = 0x%(5)08x ]
>  0x00022212  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:pick_cpu       [ b_avgload = 0x%(2)08x%(1)08x, dom:vcpu = 0x%(3)08x, rq_id[16]:new_cpu[16] = %(4)d ]
> +0x00022213  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:runq_candidate [ dom:vcpu = 0x%(1)08x, runq_pos = %(2)d tickled_cpu = %(3)d ]
> +0x00022214  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:schedule       [ rq:cpu = 0x%(1)08x, tasklet[8]:idle[8]:smt_idle[8]:tickled[8] = %(2)08x ]
> +0x00022215  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:ratelimit      [ dom:vcpu = 0x%(1)08x, runtime = %(2)d ]
>  
>  0x00022801  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:tickle        [ cpu = %(1)d ]
>  0x00022802  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:runq_pick     [ dom:vcpu = 0x%(1)08x, cur_deadline = 0x%(3)08x%(2)08x, cur_budget = 0x%(5)08x%(4)08x ]
>  0x00022803  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:burn_budget   [ dom:vcpu = 0x%(1)08x, cur_budget = 0x%(3)08x%(2)08x, delta = %(4)d ]
>  0x00022804  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:repl_budget   [ dom:vcpu = 0x%(1)08x, cur_deadline = 0x%(3)08x%(2)08x, cur_budget = 0x%(5)08x%(4)08x ]
>  0x00022805  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:sched_tasklet
> +0x00022806  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtds:schedule      [ cpu[16]:tasklet[8]:idle[4]:tickled[4] = %(1)08x ]
>  
>  0x00041001  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  domain_create   [ dom = 0x%(1)08x ]
>  0x00041002  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  domain_destroy  [ dom = 0x%(1)08x ]
> diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
> index 58a8d41..aaff1d9 100644
> --- a/tools/xentrace/xenalyze.c
> +++ b/tools/xentrace/xenalyze.c
> @@ -7590,6 +7590,50 @@ void sched_process(struct pcpu_info *p)
>                         ri->dump_header, r->cpu);
>              }
>              break;
> +        case TRC_SCHED_CLASS_EVT(CSCHED, 7): /* BOOST_START   */
> +            if(opt.dump_all) {
> +                struct {
> +                    unsigned int domid, vcpuid;
> +                } *r = (typeof(r))ri->d;
> +
> +                printf(" %s csched: d%uv%u boosted\n",
> +                       ri->dump_header, r->domid, r->vcpuid);
> +            }
> +            break;
> +        case TRC_SCHED_CLASS_EVT(CSCHED, 8): /* BOOST_END     */
> +            if(opt.dump_all) {
> +                struct {
> +                    unsigned int domid, vcpuid;
> +                } *r = (typeof(r))ri->d;
> +
> +                printf(" %s csched: d%uv%u unboosted\n",
> +                       ri->dump_header, r->domid, r->vcpuid);
> +            }
> +            break;
> +        case TRC_SCHED_CLASS_EVT(CSCHED, 9): /* SCHEDULE      */
> +            if(opt.dump_all) {
> +                struct {
> +                    unsigned int cpu, tasklet, idle;
> +                } *r = (typeof(r))ri->d;
> +
> +                printf(" %s csched:schedule cpu %u, %s%s\n",
> +                       ri->dump_header, r->cpu,
> +                       r->tasklet ? ", tasklet scheduled" : "",
> +                       r->idle ? ", idle" : ", busy");
> +            }
> +            break;
> +        case TRC_SCHED_CLASS_EVT(CSCHED, 10): /* RATELIMIT     */
> +            if(opt.dump_all) {
> +                struct {
> +                    unsigned int domid, vcpuid;
> +                    unsigned int runtime;
> +                } *r = (typeof(r))ri->d;
> +
> +                printf(" %s csched:ratelimit, d%uv%u run only %u.%uus\n",
> +                       ri->dump_header, r->domid, r->vcpuid,
> +                       r->runtime / 1000, r->runtime % 1000);
> +            }
> +            break;
>          /* CREDIT 2 (TRC_CSCHED2_xxx) */
>          case TRC_SCHED_CLASS_EVT(CSCHED2, 1): /* TICK              */
>          case TRC_SCHED_CLASS_EVT(CSCHED2, 4): /* CREDIT_ADD        */
> @@ -7779,6 +7823,50 @@ void sched_process(struct pcpu_info *p)
>                         ri->dump_header, r->domid, r->vcpuid, r->rqi, r->cpu);
>              }
>              break;
> +        case TRC_SCHED_CLASS_EVT(CSCHED2, 20): /* RUNQ_CANDIDATE   */
> +            if (opt.dump_all) {
> +                struct {
> +                    unsigned vcpuid:16, domid:16;
> +                    unsigned tickled_cpu, position;
> +                } *r = (typeof(r))ri->d;
> +
> +                printf(" %s csched2:runq_candidate d%uv%u, "
> +                       "pos in runq %u, ",
> +                       ri->dump_header, r->domid, r->vcpuid,
> +                       r->position);
> +                if (r->tickled_cpu == (unsigned)-1)
> +                    printf("no cpu was tickled");
> +                else
> +                    printf("cpu %u was tickled\n", r->tickled_cpu);
> +            }
> +            break;
> +        case TRC_SCHED_CLASS_EVT(CSCHED2, 21): /* SCHEDULE         */
> +            if (opt.dump_all) {
> +                struct {
> +                    unsigned cpu:16, rqi:16;
> +                    unsigned tasklet:8, idle:8, smt_idle:8, tickled:8;
> +                } *r = (typeof(r))ri->d;
> +
> +                printf(" %s csched2:schedule cpu %u, rq# %u%s%s%s%s\n",
> +                       ri->dump_header, r->cpu, r->rqi,
> +                       r->tasklet ? ", tasklet scheduled" : "",
> +                       r->idle ? ", idle" : ", busy",
> +                       r->idle ? (r->smt_idle ? ", SMT idle" : ", SMT busy") : "",
> +                       r->tickled ? ", tickled" : ", not tickled");
> +            }
> +            break;
> +        case TRC_SCHED_CLASS_EVT(CSCHED2, 22): /* RATELIMIT        */
> +            if (opt.dump_all) {
> +                struct {
> +                    unsigned int vcpuid:16, domid:16;
> +                    unsigned int runtime;
> +                } *r = (typeof(r))ri->d;
> +
> +                printf(" %s csched2:ratelimit, d%uv%u run only %u.%uus\n",
> +                       ri->dump_header, r->domid, r->vcpuid,
> +                       r->runtime / 1000, r->runtime % 1000);
> +            }
> +            break;
>          /* RTDS (TRC_RTDS_xxx) */
>          case TRC_SCHED_CLASS_EVT(RTDS, 1): /* TICKLE           */
>              if(opt.dump_all) {
> @@ -7831,6 +7919,19 @@ void sched_process(struct pcpu_info *p)
>              if(opt.dump_all)
>                  printf(" %s rtds:sched_tasklet\n", ri->dump_header);
>              break;
> +        case TRC_SCHED_CLASS_EVT(RTDS, 6): /* SCHEDULE         */
> +            if (opt.dump_all) {
> +                struct {
> +                    unsigned cpu:16, tasklet:8, idle:4, tickled:4;
> +                } __attribute__((packed)) *r = (typeof(r))ri->d;
> +
> +                printf(" %s rtds:schedule cpu %u, %s%s%s\n",
> +                       ri->dump_header, r->cpu,
> +                       r->tasklet ? ", tasklet scheduled" : "",
> +                       r->idle ? ", idle" : ", busy",
> +                       r->tickled ? ", tickled" : ", not tickled");
> +            }
> +            break;
>          default:
>              process_generic(ri);
>          }
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 12/24] xen: libxc: allow to set the ratelimit value online
  2016-08-17 17:18 ` [PATCH 12/24] xen: libxc: allow to set the ratelimit value online Dario Faggioli
@ 2016-09-20 14:43   ` George Dunlap
  2016-09-20 14:45     ` Wei Liu
  2016-09-28 15:44   ` George Dunlap
  1 sibling, 1 reply; 84+ messages in thread
From: George Dunlap @ 2016-09-20 14:43 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: George Dunlap, Wei Liu, Anshul Makkar, Ian Jackson, Jan Beulich

On 17/08/16 18:18, Dario Faggioli wrote:
> The main purpose of the patch is to provide the xen-libxc
> plumbing necessary to be able to change the value of the
> ratelimit_us parameter online, for Credit2 (like it is
> already for Credit1).
> 
> While there:
>  - mention in the Xen logs when rate limiting was enables
>    and is being disabled (and vice-versa);
>  - fix csched2_sys_cntl() which was always returning
>    -EINVAL in the XEN_SYSCTL_SCHEDOP_putinfo case.

How weird!

> 
> And also:
>  - fix style of an if in csched_sys_cntl();
>  - fix the style of the switch in csched2_sys_cntl();
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

Wei / Ian, I think this is relatively independent of other changes -- if
you give me an Ack for the tools side I can check this in.

  -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 12/24] xen: libxc: allow to set the ratelimit value online
  2016-09-20 14:43   ` George Dunlap
@ 2016-09-20 14:45     ` Wei Liu
  0 siblings, 0 replies; 84+ messages in thread
From: Wei Liu @ 2016-09-20 14:45 UTC (permalink / raw)
  To: George Dunlap
  Cc: Wei Liu, George Dunlap, Dario Faggioli, Ian Jackson, Jan Beulich,
	xen-devel, Anshul Makkar

On Tue, Sep 20, 2016 at 03:43:57PM +0100, George Dunlap wrote:
> On 17/08/16 18:18, Dario Faggioli wrote:
> > The main purpose of the patch is to provide the xen-libxc
> > plumbing necessary to be able to change the value of the
> > ratelimit_us parameter online, for Credit2 (like it is
> > already for Credit1).
> > 
> > While there:
> >  - mention in the Xen logs when rate limiting was enables
> >    and is being disabled (and vice-versa);
> >  - fix csched2_sys_cntl() which was always returning
> >    -EINVAL in the XEN_SYSCTL_SCHEDOP_putinfo case.
> 
> How weird!
> 
> > 
> > And also:
> >  - fix style of an if in csched_sys_cntl();
> >  - fix the style of the switch in csched2_sys_cntl();
> > 
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> 
> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
> 
> Wei / Ian, I think this is relatively independent of other changes -- if
> you give me an Ack for the tools side I can check this in.
> 

Here you go:

Acked-by: Wei Liu <wei.liu2@citrix.com>

>   -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 13/24] libxc: improve error handling of xc Credit1 and Credit2 helpers
  2016-08-17 17:19 ` [PATCH 13/24] libxc: improve error handling of xc Credit1 and Credit2 helpers Dario Faggioli
@ 2016-09-20 15:10   ` Wei Liu
  0 siblings, 0 replies; 84+ messages in thread
From: Wei Liu @ 2016-09-20 15:10 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Ian Jackson, Wei Liu

On Wed, Aug 17, 2016 at 07:19:04PM +0200, Dario Faggioli wrote:
> In fact, libxc wrappers should, on error, set errno and
> return -1.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 14/24] libxl: allow to set the ratelimit value online for Credit2
  2016-08-22  9:28   ` Ian Jackson
@ 2016-09-28 15:37     ` George Dunlap
  2016-09-30  1:03     ` Dario Faggioli
  1 sibling, 0 replies; 84+ messages in thread
From: George Dunlap @ 2016-09-28 15:37 UTC (permalink / raw)
  To: Ian Jackson, Dario Faggioli; +Cc: George Dunlap, xen-devel, Wei Liu

On 22/08/16 10:28, Ian Jackson wrote:
> Dario Faggioli writes ("[PATCH 14/24] libxl: allow to set the ratelimit value online for Credit2"):
> ...
>> -    rc = xc_sched_credit_params_set(ctx->xch, poolid, &sparam);
>> -    if ( rc < 0 ) {
>> -        LOGE(ERROR, "setting sched credit param");
>> -        GC_FREE;
>> -        return ERROR_FAIL;
>> +    r = xc_sched_credit_params_set(ctx->xch, poolid, &sparam);
>> +    if ( r < 0 ) {
>> +        LOGE(ERROR, "Setting Credit scheduler parameters");
>> +        rc = ERROR_FAIL;
>> +        goto out;
> 
> I had to read this three times to figure out what the change was.
> 
> It is good that you are fixing the coding style but can you please put
> it in a separate patch ?
> 
> In general I was surprised at how large this patch, and the
> corresponding xl one, was.  Perhaps after the coding style fixes are
> split off the functional patches will be smaller.
> 
> But I wonder whether there will still be lots of rather formulaic code
> that could profitably be generalised somehow.  I'd appreciate your
> views on whether that would be possible, and whether it would be a
> good idea..

FWIW, I find that the massive use of #defines to define entire functions
or large code blocks (particularly blocks which define variables or hide
code flowss) make it very difficult to tell what's going on.  When there
is risk of code getting out of sync, then it may well be worth the cost;
but in a case like this, when there is no inter-dependency, and little
shared functionality, it doesn't make sense to unify things more than
Dario already has (e.g., by making a function to check the parameter value).

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 14/24] libxl: allow to set the ratelimit value online for Credit2
  2016-08-17 17:19 ` [PATCH 14/24] libxl: allow to set the ratelimit value online for Credit2 Dario Faggioli
  2016-08-22  9:21   ` Ian Jackson
  2016-08-22  9:28   ` Ian Jackson
@ 2016-09-28 15:39   ` George Dunlap
  2 siblings, 0 replies; 84+ messages in thread
From: George Dunlap @ 2016-09-28 15:39 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Wei Liu, Ian Jackson

On 17/08/16 18:19, Dario Faggioli wrote:
> This is the remaining part of the plumbing (the libxl
> one) necessary to be able to change the value of the
> ratelimit_us parameter online, for Credit2 (like it is
> already for Credit1).
> 
> Note that, so far, we were rejecting (for Credit1) a
> new value of zero, despite it is a pretty nice way to
> ask for the rate limiting to be disabled, and the
> hypervisor is already capable of dealing with it in
> that way.
> 
> Therefore, we change things so that it is possible to
> do so, both for Credit1 and Credit2
> 
> While there, fix the error handling path (make it
> compliant with libxl's codying style) in Credit1
> rate limiting related functions.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

With the LIBXL_HAVE #define:

Reviewed-by: George Dunlap <george.dunlap@citrix.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 12/24] xen: libxc: allow to set the ratelimit value online
  2016-08-17 17:18 ` [PATCH 12/24] xen: libxc: allow to set the ratelimit value online Dario Faggioli
  2016-09-20 14:43   ` George Dunlap
@ 2016-09-28 15:44   ` George Dunlap
  1 sibling, 0 replies; 84+ messages in thread
From: George Dunlap @ 2016-09-28 15:44 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: George Dunlap, Wei Liu, Anshul Makkar, Ian Jackson, Jan Beulich

On 17/08/16 18:18, Dario Faggioli wrote:
> The main purpose of the patch is to provide the xen-libxc
> plumbing necessary to be able to change the value of the
> ratelimit_us parameter online, for Credit2 (like it is
> already for Credit1).
> 
> While there:
>  - mention in the Xen logs when rate limiting was enables
>    and is being disabled (and vice-versa);
>  - fix csched2_sys_cntl() which was always returning
>    -EINVAL in the XEN_SYSCTL_SCHEDOP_putinfo case.
> 
> And also:
>  - fix style of an if in csched_sys_cntl();
>  - fix the style of the switch in csched2_sys_cntl();
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

BTW, I've applied both this and the following patch.

When I did a rebase of the series on top of this, this patch erroneously
double-applies the hunk in xc_csched2.c, causing the subsequent patch to
report a merge conflict.  So you may want to beware of that if you just
do a plain git rebase. :-)

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 15/24] xl: allow to set the ratelimit value online for Credit2
  2016-08-17 17:19 ` [PATCH 15/24] xl: " Dario Faggioli
@ 2016-09-28 15:46   ` George Dunlap
  0 siblings, 0 replies; 84+ messages in thread
From: George Dunlap @ 2016-09-28 15:46 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Wei Liu, Ian Jackson

On 17/08/16 18:19, Dario Faggioli wrote:
> Last part of the wiring necessary for allowing to
> change the value of the ratelimit_us parameter online,
> for Credit2 (like it is already for Credit1).
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> ---
>  docs/man/xl.pod.1.in      |    9 ++++
>  tools/libxl/xl_cmdimpl.c  |   91 +++++++++++++++++++++++++++++++++++++--------
>  tools/libxl/xl_cmdtable.c |    2 +
>  3 files changed, 86 insertions(+), 16 deletions(-)
> 
> diff --git a/docs/man/xl.pod.1.in b/docs/man/xl.pod.1.in
> index 1adf322..013591b 100644
> --- a/docs/man/xl.pod.1.in
> +++ b/docs/man/xl.pod.1.in
> @@ -1089,6 +1089,15 @@ to 65535 and the default is 256.
>  
>  Restrict output to domains in the specified cpupool.
>  
> +=item B<-s>, B<--schedparam>
> +
> +Specify to list or set pool-wide scheduler parameters.
> +
> +=item B<-r RLIMIT>, B<--ratelimit_us=RLIMIT>
> +
> +Attempts to limit the rate of context switching. It is basically the same
> +as B<--ratelimit_us> in B<sched-credit>
> +

This is a bit of a strange interface, but it follows suit with what the
credit1 command does, so:

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

>  =back
>  
>  =item B<sched-rtds> [I<OPTIONS>]
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 7f961e3..5bdeda8 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -6452,8 +6452,29 @@ static int sched_credit_pool_output(uint32_t poolid)
>      return 0;
>  }
>  
> -static int sched_credit2_domain_output(
> -    int domid)
> +static int sched_credit2_params_set(int poolid,
> +                                    libxl_sched_credit2_params *scinfo)
> +{
> +    if (libxl_sched_credit2_params_set(ctx, poolid, scinfo)) {
> +        fprintf(stderr, "libxl_sched_credit2_params_set failed.\n");
> +        return 1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int sched_credit2_params_get(int poolid,
> +                                    libxl_sched_credit2_params *scinfo)
> +{
> +    if (libxl_sched_credit2_params_get(ctx, poolid, scinfo)) {
> +        fprintf(stderr, "libxl_sched_credit2_params_get failed.\n");
> +        return 1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int sched_credit2_domain_output(int domid)
>  {
>      char *domname;
>      libxl_domain_sched_params scinfo;
> @@ -6478,6 +6499,22 @@ static int sched_credit2_domain_output(
>      return 0;
>  }
>  
> +static int sched_credit2_pool_output(uint32_t poolid)
> +{
> +    libxl_sched_credit2_params scparam;
> +    char *poolname = libxl_cpupoolid_to_name(ctx, poolid);
> +
> +    if (sched_credit2_params_get(poolid, &scparam))
> +        printf("Cpupool %s: [sched params unavailable]\n", poolname);
> +    else
> +        printf("Cpupool %s: ratelimit=%dus\n",
> +               poolname, scparam.ratelimit_us);
> +
> +    free(poolname);
> +
> +    return 0;
> +}
> +
>  static int sched_rtds_domain_output(
>      int domid)
>  {
> @@ -6577,17 +6614,6 @@ static int sched_rtds_pool_output(uint32_t poolid)
>      return 0;
>  }
>  
> -static int sched_default_pool_output(uint32_t poolid)
> -{
> -    char *poolname;
> -
> -    poolname = libxl_cpupoolid_to_name(ctx, poolid);
> -    printf("Cpupool %s:\n",
> -           poolname);
> -    free(poolname);
> -    return 0;
> -}
> -
>  static int sched_domain_output(libxl_scheduler sched, int (*output)(int),
>                                 int (*pooloutput)(uint32_t), const char *cpupool)
>  {
> @@ -6833,17 +6859,22 @@ int main_sched_credit2(int argc, char **argv)
>  {
>      const char *dom = NULL;
>      const char *cpupool = NULL;
> +    int ratelimit = 0;
>      int weight = 256;
> +    bool opt_s = false;
> +    bool opt_r = false;
>      bool opt_w = false;
>      int opt, rc;
>      static struct option opts[] = {
>          {"domain", 1, 0, 'd'},
>          {"weight", 1, 0, 'w'},
> +        {"schedparam", 0, 0, 's'},
> +        {"ratelimit_us", 1, 0, 'r'},
>          {"cpupool", 1, 0, 'p'},
>          COMMON_LONG_OPTS
>      };
>  
> -    SWITCH_FOREACH_OPT(opt, "d:w:p:", opts, "sched-credit2", 0) {
> +    SWITCH_FOREACH_OPT(opt, "d:w:p:r:s", opts, "sched-credit2", 0) {
>      case 'd':
>          dom = optarg;
>          break;
> @@ -6851,6 +6882,13 @@ int main_sched_credit2(int argc, char **argv)
>          weight = strtol(optarg, NULL, 10);
>          opt_w = true;
>          break;
> +    case 's':
> +        opt_s = true;
> +        break;
> +    case 'r':
> +        ratelimit = strtol(optarg, NULL, 10);
> +        opt_r = true;
> +        break;
>      case 'p':
>          cpupool = optarg;
>          break;
> @@ -6866,10 +6904,31 @@ int main_sched_credit2(int argc, char **argv)
>          return EXIT_FAILURE;
>      }
>  
> -    if (!dom) { /* list all domain's credit scheduler info */
> +    if (opt_s) {
> +        libxl_sched_credit2_params scparam;
> +        uint32_t poolid = 0;
> +
> +        if (cpupool) {
> +            if (libxl_cpupool_qualifier_to_cpupoolid(ctx, cpupool,
> +                                                     &poolid, NULL) ||
> +                !libxl_cpupoolid_is_valid(ctx, poolid)) {
> +                fprintf(stderr, "unknown cpupool \'%s\'\n", cpupool);
> +                return EXIT_FAILURE;
> +            }
> +        }
> +
> +        if (!opt_r) { /* Output scheduling parameters */
> +            if (sched_credit2_pool_output(poolid))
> +                return EXIT_FAILURE;
> +        } else {      /* Set scheduling parameters (so far, just ratelimit) */
> +            scparam.ratelimit_us = ratelimit;
> +            if (sched_credit2_params_set(poolid, &scparam))
> +                return EXIT_FAILURE;
> +        }
> +    } else if (!dom) { /* list all domain's credit scheduler info */
>          if (sched_domain_output(LIBXL_SCHEDULER_CREDIT2,
>                                  sched_credit2_domain_output,
> -                                sched_default_pool_output,
> +                                sched_credit2_pool_output,
>                                  cpupool))
>              return EXIT_FAILURE;
>      } else {
> diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
> index 85c1e0f..a420415 100644
> --- a/tools/libxl/xl_cmdtable.c
> +++ b/tools/libxl/xl_cmdtable.c
> @@ -265,6 +265,8 @@ struct cmd_spec cmd_table[] = {
>        "[-d <Domain> [-w[=WEIGHT]]] [-p CPUPOOL]",
>        "-d DOMAIN, --domain=DOMAIN     Domain to modify\n"
>        "-w WEIGHT, --weight=WEIGHT     Weight (int)\n"
> +      "-s         --schedparam        Query / modify scheduler parameters\n"
> +      "-r RLIMIT, --ratelimit_us=RLIMIT Set the scheduling rate limit, in microseconds\n"
>        "-p CPUPOOL, --cpupool=CPUPOOL  Restrict output to CPUPOOL"
>      },
>      { "sched-rtds",
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 16/24] xen: sched: factor affinity helpers out of sched_credit.c
  2016-08-17 17:19 ` [PATCH 16/24] xen: sched: factor affinity helpers out of sched_credit.c Dario Faggioli
@ 2016-09-28 15:49   ` George Dunlap
  0 siblings, 0 replies; 84+ messages in thread
From: George Dunlap @ 2016-09-28 15:49 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Anshul Makkar, Justin T. Weaver

On 17/08/16 18:19, Dario Faggioli wrote:
> make it possible to use the various helpers from other
> schedulers, e.g., for implementing soft affinity within
> them.
> 
> Since we are touching the code, also make it start using
> variables called v for struct_vcpu*, as it is preferrable.
> 
> No functional change intended.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>

Reviewed-by: George Dunlap <george.dunlap@citrix.com>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 17/24] xen: credit2: soft-affinity awareness in runq_tickle()
  2016-08-17 17:19 ` [PATCH 17/24] xen: credit2: soft-affinity awareness in runq_tickle() Dario Faggioli
  2016-09-01 10:52   ` anshul makkar
@ 2016-09-28 20:44   ` George Dunlap
  1 sibling, 0 replies; 84+ messages in thread
From: George Dunlap @ 2016-09-28 20:44 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Anshul Makkar, Justin T. Weaver

[-- Attachment #1: Type: text/plain, Size: 3674 bytes --]

On 17/08/16 18:19, Dario Faggioli wrote:
> This is done by means of the "usual" two steps loop:
>  - soft affinity balance step;
>  - hard affinity balance step.
> 
> The entire logic implemented in runq_tickle() is
> applied, during the first step, considering only the
> CPUs in the vcpu's soft affinity. In the second step,
> we fall back to use all the CPUs from its hard
> affinity (as it is doing now, without this patch).
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> ---
>  xen/common/sched_credit2.c |  243 ++++++++++++++++++++++++++++----------------
>  1 file changed, 157 insertions(+), 86 deletions(-)
> 
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 0d83bd7..3aef1b4 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -902,6 +902,42 @@ __runq_remove(struct csched2_vcpu *svc)
>      list_del_init(&svc->runq_elem);
>  }
>  
> +/*
> + * During the soft-affinity step, only actually preempt someone if
> + * he does not have soft-affinity with cpu (while we have).
> + *
> + * BEWARE that this uses cpumask_scratch, trowing away what's in there!
> + */
> +static inline bool_t soft_aff_check_preempt(unsigned int bs, unsigned int cpu)
> +{
> +    struct csched2_vcpu * cur = CSCHED2_VCPU(curr_on_cpu(cpu));
> +
> +    /*
> +     * If we're doing hard-affinity, always check whether to preempt cur.
> +     * If we're doing soft-affinity, but cur doesn't have one, check as well.
> +     */
> +    if ( bs == BALANCE_HARD_AFFINITY ||
> +         !has_soft_affinity(cur->vcpu, cur->vcpu->cpu_hard_affinity) )
> +        return 1;
> +
> +    /*
> +     * We're doing soft-affinity, and we know that the current vcpu on cpu
> +     * has a soft affinity. We now want to know whether cpu itself is in
> +     * such affinity. In fact, since we now that new (in runq_tickle()) is:

This is a bit confusing.  I think you mean, "We know that the vcpu we
want to place has soft affinity with the target cpu; now we want to know
whether the vcpu running on the target cpu has soft affinity with that
cpu or not."

> +     *  - if cpu is not in cur's soft-affinity, we should indeed check to
> +     *    see whether new should preempt cur. If that will be the case, that
> +     *    would be an improvement wrt respecting soft affinity;
> +     *  - if cpu is in cur's soft-affinity, we leave it alone and (in
> +     *    runq_tickle()) move on to another cpu. In fact, we don't want to
> +     *    be too harsh with someone which is running within its soft-affinity.
> +     *    This is safe because later, if we don't fine anyone else during the
> +     *    soft-affinity step, we will check cpu for preemption anyway, when
> +     *    doing hard-affinity.

But by doing this, isn't it actually more likely that we'll end up
somewhere outside our soft affinity, even though there are cpus inside
our soft affinity where we have higher credit?

It would be nice if we could pre-empt somebody outside their soft
affinity before pre-empting somebody inside their soft affinity.

It occurs to me -- in the normal case, the number of cpus involved here
should be lower than 8, and often lower than that.  Rather than loop
around twice, with potentially two "inner" loops, would it make sense
just to sweep through the hard affinity once, calculating a "score" that
factored in the different things we want to factor in, and then choosing
the highest score (if any)?

Something like the attached (compile-tested only)?

 -George



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-xen-credit2-soft-affinity-awareness-in-runq_tickle.patch --]
[-- Type: text/x-diff; name="0001-xen-credit2-soft-affinity-awareness-in-runq_tickle.patch", Size: 6835 bytes --]

From 93346e02da5def9c1bca502e0e47aa8be9b3f2a6 Mon Sep 17 00:00:00 2001
From: Dario Faggioli <dario.faggioli@citrix.com>
Date: Thu, 15 Sep 2016 12:35:05 +0100
Subject: [PATCH] xen: credit2: soft-affinity awareness in runq_tickle()

Rather than the usual two-step loop, after first checking for idlers,
we scan through each cpu in the runqueue and find a "score" for the
utility of tickling each cpu.

FIXME - needs filling out. :-)

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
 xen/common/sched_credit2.c | 126 ++++++++++++++++++++++++++++++---------------
 1 file changed, 84 insertions(+), 42 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 0d83bd7..36acf82 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -904,6 +904,64 @@ __runq_remove(struct csched2_vcpu *svc)
 
 void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_vcpu *, s_time_t);
 
+/* 
+ * Score to preempt the target cpu.  Return a negative number if the
+ * credit isn't high enough; if it is, favor preemptions in this
+ * order:
+ * - cpu is in new's soft affinity, not in cur's soft affinity
+ * - cpu is in new's soft affinity and cur's soft affinity
+ * - cpu is not in new's soft affinity
+ * - Within the same class, the highest difference of credit
+ */
+static s_time_t tickle_score(struct csched2_runqueue_data *rqd, s_time_t now,
+                             struct csched2_vcpu *new, unsigned int cpu)
+{
+    struct csched2_vcpu * cur;
+    s_time_t score;
+    
+    cur = CSCHED2_VCPU(curr_on_cpu(cpu));
+
+    burn_credits(rqd, cur, now);
+
+    score = new->credit - cur->credit;
+
+    if ( new->vcpu->processor != cpu )
+        score -= CSCHED2_MIGRATE_RESIST;
+
+    /* 
+     * At this point, if cur->credit + RESISTANCE >= new->credit,
+     * score will be negative (or zero), which means default -1 is
+     * still higher.  
+     *
+     * Otherwise, add bonuses for soft affinities.
+     */
+    
+    if ( score > 0 && cpumask_test_cpu(cpu, new->vcpu->cpu_soft_affinity) )
+    {
+        score += CSCHED2_CREDIT_INIT;
+        if ( !cpumask_test_cpu(cpu, cur->vcpu->cpu_soft_affinity) )
+            score += CSCHED2_CREDIT_INIT;
+    }
+
+    if ( unlikely(tb_init_done) )
+    {
+        struct {
+            unsigned vcpu:16, dom:16;
+            unsigned cpu, credit, score;
+        } d;
+        d.dom = cur->vcpu->domain->domain_id;
+        d.vcpu = cur->vcpu->vcpu_id;
+        d.credit = cur->credit;
+        d.score = score;
+        d.cpu = cpu;
+        __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
+                    sizeof(d),
+                    (unsigned char *)&d);
+    }
+        
+    return score;
+}
+
 /*
  * Check what processor it is best to 'wake', for picking up a vcpu that has
  * just been put (back) in the runqueue. Logic is as follows:
@@ -924,11 +982,10 @@ static void
 runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
 {
     int i, ipid = -1;
-    s_time_t lowest = (1<<30);
+    s_time_t max = 0;
     unsigned int cpu = new->vcpu->processor;
     struct csched2_runqueue_data *rqd = RQD(ops, cpu);
     cpumask_t mask;
-    struct csched2_vcpu * cur;
 
     ASSERT(new->rqd == rqd);
 
@@ -959,7 +1016,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
         cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
     else
         cpumask_copy(&mask, &rqd->smt_idle);
-    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+    cpumask_and(&mask, &mask, new->vcpu->cpu_soft_affinity);
     i = cpumask_test_or_cycle(cpu, &mask);
     if ( i < nr_cpu_ids )
     {
@@ -974,7 +1031,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
      * gone through the scheduler yet.
      */
     cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
-    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+    cpumask_and(&mask, &mask, new->vcpu->cpu_soft_affinity);
     i = cpumask_test_or_cycle(cpu, &mask);
     if ( i < nr_cpu_ids )
     {
@@ -993,63 +1050,48 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
     cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
     if ( cpumask_test_cpu(cpu, &mask) )
     {
-        cur = CSCHED2_VCPU(curr_on_cpu(cpu));
-        burn_credits(rqd, cur, now);
+        s_time_t score = tickle_score(rqd, now, new, cpu);
 
-        if ( cur->credit < new->credit )
+        if ( score > max )
         {
-            SCHED_STAT_CRANK(tickled_busy_cpu);
+            max = score;
             ipid = cpu;
-            goto tickle;
+            
+            /* If this is in the vcpu's soft affinity, just take it */
+            if ( cpumask_test_cpu(cpu, new->vcpu->cpu_soft_affinity) )
+                goto tickle;
         }
     }
 
     for_each_cpu(i, &mask)
     {
+        s_time_t score;
+        
         /* Already looked at this one above */
         if ( i == cpu )
             continue;
 
-        cur = CSCHED2_VCPU(curr_on_cpu(i));
-
-        ASSERT(!is_idle_vcpu(cur->vcpu));
-
-        /* Update credits for current to see if we want to preempt. */
-        burn_credits(rqd, cur, now);
-
-        if ( cur->credit < lowest )
-        {
-            ipid = i;
-            lowest = cur->credit;
-        }
+        /* 
+         * This will factor in both our soft affinity and the soft
+         * affinity of the vcpu currently running on i.
+         */
+        score = tickle_score(rqd, now, new, i);
 
-        if ( unlikely(tb_init_done) )
+        if ( score > max )
         {
-            struct {
-                unsigned vcpu:16, dom:16;
-                unsigned cpu, credit;
-            } d;
-            d.dom = cur->vcpu->domain->domain_id;
-            d.vcpu = cur->vcpu->vcpu_id;
-            d.credit = cur->credit;
-            d.cpu = i;
-            __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
-                        sizeof(d),
-                        (unsigned char *)&d);
+            max = score;
+            ipid = cpu;
         }
     }
-
-    /*
-     * Only switch to another processor if the credit difference is
-     * greater than the migrate resistance.
-     */
-    if ( ipid == -1 || lowest + CSCHED2_MIGRATE_RESIST > new->credit )
+        
+    if ( ipid != -1 )
     {
-        SCHED_STAT_CRANK(tickled_no_cpu);
-        return;
+        SCHED_STAT_CRANK(tickled_busy_cpu);
+        goto tickle;
     }
 
-    SCHED_STAT_CRANK(tickled_busy_cpu);
+    SCHED_STAT_CRANK(tickled_no_cpu);
+    return;
  tickle:
     BUG_ON(ipid == -1);
 
-- 
2.1.4


[-- Attachment #3: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* Re: [PATCH 18/24] xen: credit2: soft-affinity awareness fallback_cpu() and cpu_pick()
  2016-08-17 17:19 ` [PATCH 18/24] xen: credit2: soft-affinity awareness fallback_cpu() and cpu_pick() Dario Faggioli
  2016-09-01 11:08   ` anshul makkar
@ 2016-09-29 11:11   ` George Dunlap
  1 sibling, 0 replies; 84+ messages in thread
From: George Dunlap @ 2016-09-29 11:11 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Anshul Makkar, Justin T. Weaver

On 17/08/16 18:19, Dario Faggioli wrote:
> For get_fallback_cpu(), by putting in place the "usual"
> two steps (soft affinity step and hard affinity step)
> loop. We just move the core logic of the function inside
> the body of the loop itself.
> 
> For csched2_cpu_pick(), what is important is to find
> the runqueue with the least average load. Currently,
> we do that by looping on all runqueues and checking,
> well, their load. For soft affinity, we want to know
> which one is the runqueue with the least load, among
> the ones where the vcpu would prefer to be assigned.
> 
> We find both the least loaded runqueue among the soft
> affinity "friendly" ones, and the overall least loaded
> one, in the same pass.
> 
> (Also, kill a spurious ';' when defining MAX_LOAD.)
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>

Looks good:

Reviewed-by: George Dunlap <george.dunlap@citrix.com>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 05/24] xen: credit2: make tickling more deterministic
  2016-09-13 11:13       ` George Dunlap
@ 2016-09-29 15:24         ` Dario Faggioli
  0 siblings, 0 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-09-29 15:24 UTC (permalink / raw)
  To: George Dunlap, anshul makkar, xen-devel; +Cc: Andrew Cooper, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 2597 bytes --]

On Tue, 2016-09-13 at 12:13 +0100, George Dunlap wrote:
> On 05/09/16 14:47, Dario Faggioli wrote:
> > On Wed, 2016-08-31 at 18:10 +0100, anshul makkar wrote:
> > > > @@ -1273,6 +1280,7 @@ csched2_alloc_vdata(const struct
> > > > scheduler
> > > > *ops, struct vcpu *vc, void *dd)
> > > >       else
> > > >       {
> > > >           ASSERT(svc->sdom == NULL);
> > > > +        svc->tickled_cpu = svc->vcpu->vcpu_id;
> > > If I understood correctly, tickled_cpu refers to pcpu and not a
> > > vcpu. 
> > > Saving vcpu_id in tickled_cpu looks wrong.
> > > 
> > Yes, and in fact, as you can see in the previous hunk, for pretty
> > much
> > all vcpus, tickled_cpu is initialized to -1.
> > 
> > Here, we are dealing with the vcpus of the idle domain. And for
> > vcpus
> > of the idle domain, their vcpu id is the same as the id of the pcpu
> > they're associated to.
> 
> But what I haven't sussed out yet is why we need to initialize this
> for
> the idle domain at all.  What benefit does it give you, and what
> effect
> does it have?
> 
It makes things more consistent and uniform, one effect being that it
is easier to manage a performance counter, for recording the number of
time this 'direct tickling' mechanism has been overridden.

In fact, I've tried it and, AFAICR, doing this was looking worse, when
I was not initializing the field for idle vcpus.

static struct csched2_vcpu *
runq_candidate(struct csched2_runqueue_data *rqd,
               struct csched2_vcpu *scurr,
               int cpu, s_time_t now,
               unsigned int *pos)
{
    ...
    [1]
    if ( unlikely(snext->tickled_cpu != -1 && snext->tickled_cpu != cpu) )
        SCHED_STAT_CRANK(tickled_cpu_overridden);

    return snext;
}

In fact, it is possible to reach [1] with snext being the idle vcpu, in
which case, if tickled_cpu is 0 for all of them (which would be the
case, I think, if I don't init it) the counter will be incremented in a
not really predictable way.

That being said, initializing to -1 should work, and it's easier to
read and understand (as I won't be special casing idle vcpus). So I'd
go for it (and test it, of course).. what do you think?

Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 06/24] xen: credit2: implement yield()
  2016-09-13 13:33   ` George Dunlap
@ 2016-09-29 16:05     ` Dario Faggioli
  0 siblings, 0 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-09-29 16:05 UTC (permalink / raw)
  To: George Dunlap, xen-devel
  Cc: George Dunlap, Andrew Cooper, Anshul Makkar, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 5463 bytes --]

On Tue, 2016-09-13 at 14:33 +0100, George Dunlap wrote:
> On 17/08/16 18:18, Dario Faggioli wrote:
> > Alternatively, we can actually _subtract_ some credits to a
> > yielding vcpu.
> > That will sort of make the effect of a call to yield last in time.
> 
> But normally we want the yield to be temporary, right?  The kinds of
> places it typically gets called is when the vcpu is waiting for a
> spinlock held by another (probably pre-empted) vcpu.  Doing a
> permanent
> credit subtraction will bias the credit algorithm against cpus that
> have
> a high amount of spinlock contention (since probably all the vcpus
> will
> be calling yield pretty regularly)
> 
Yes, indeed. Good point, actually. However, one can also think of a
scenario where:
 - A yields, and is descheduled in favour of B, as a consequence of
   that
 - B runs for just a little while and blocks
 - C and A are in runqueue, and A, without counting the idle bias, has 
   more credit than C. So A will be picked up again, even if it 
   yielded very recently, and it may still be in the spinlock wait (or 
   whatever place that is yielding in a tight loop)

Well, in this case, A will yield again, and C will be picked, i.e.,
what would have happened in the first place, if we subtracted credits
to A. (I.e., functionally, this would work the same way, but with more
overhead.)

So, again, can this happen? How frequently, both in absolute and
relative terms? Very hard to tell! So, really...
> 
> Yes, this is simple and should be effective for now.  We can look at
> improving it later.
> 
...glad you also think this. Let's go for it. :-)

> > diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-
> > command-line.markdown
> > @@ -1389,6 +1389,16 @@ Choose the default scheduler.
> >  ### sched\_credit2\_migrate\_resist
> >  > `= <integer>`
> >  
> > +### sched\_credit2\_yield\_bias
> > +> `= <integer>`
> > +
> > +> Default: `1000`
> > +
> > +Set how much a yielding vcpu will be penalized, in order to
> > actually
> > +give a chance to run to some other vcpu. This is basically a bias,
> > in
> > +favour of the non-yielding vcpus, expressed in microseconds
> > (default
> > +is 1ms).
> 
> Probably add _us to the end to indicate that the number is in
> microseconds.
> 
Good idea, although right now we have "sched_credit2_migrate_resist"
which does not have the suffixe.

Still, I'm doing as you suggest because I like it better, and we'll fix
"migrate_resist" later, if we want consistency.

> > @@ -2247,10 +2267,22 @@ runq_candidate(struct csched2_runqueue_data
> > *rqd,
> >      struct list_head *iter;
> >      struct csched2_vcpu *snext = NULL;
> >      struct csched2_private *prv = CSCHED2_PRIV(per_cpu(scheduler,
> > cpu));
> > +    int yield_bias = 0;
> >  
> >      /* Default to current if runnable, idle otherwise */
> >      if ( vcpu_runnable(scurr->vcpu) )
> > +    {
> > +        /*
> > +         * The way we actually take yields into account is like
> > this:
> > +         * if scurr is yielding, when comparing its credits with
> > other
> > +         * vcpus in the runqueue, act like those other vcpus had
> > yield_bias
> > +         * more credits.
> > +         */
> > +        if ( unlikely(scurr->flags & CSFLAG_vcpu_yield) )
> > +            yield_bias = CSCHED2_YIELD_BIAS;
> > +
> >          snext = scurr;
> > +    }
> >      else
> >          snext = CSCHED2_VCPU(idle_vcpu[cpu]);
> >  
> > @@ -2268,6 +2300,7 @@ runq_candidate(struct csched2_runqueue_data
> > *rqd,
> >      list_for_each( iter, &rqd->runq )
> >      {
> >          struct csched2_vcpu * svc = list_entry(iter, struct
> > csched2_vcpu, runq_elem);
> > +        int svc_credit = svc->credit + yield_bias;
> 
> Just curious, why did you decide to add yield_bias to everyone else,
> rather than just subtracting it from snext->credit?
> 
I honestly don't recall. :-)

It indeed feels more natural to subtract to next. I've done it that way
now, let me give it a test spin and resend...

> > @@ -2918,6 +2957,14 @@ csched2_init(struct scheduler *ops)
> >      printk(XENLOG_INFO "load tracking window lenght %llu ns\n",
> >             1ULL << opt_load_window_shift);
> >  
> > +    if ( opt_yield_bias < CSCHED2_YIELD_BIAS_MIN )
> > +    {
> > +        printk("WARNING: %s: opt_yield_bias %d too small,
> > resetting\n",
> > +               __func__, opt_yield_bias);
> > +        opt_yield_bias = 1000; /* 1 ms */
> > +    }
> 
> Why do we need a minimum bias?  And why reset it to 1ms rather than
> SCHED2_YIELD_BIAS_MIN?
> 
You know what, I don't think we need that. I probably was thinking that
we may always want to force yield to have _some_ effect, but there may
be (or may will be) someone who just want to disable it at all... And
in that case, this check will be in his way.

I'll kill it.

Thanks and regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 07/24] xen: sched: don't rate limit context switches in case of yields
  2016-09-20 13:32   ` George Dunlap
@ 2016-09-29 16:46     ` Dario Faggioli
  0 siblings, 0 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-09-29 16:46 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: George Dunlap, Anshul Makkar


[-- Attachment #1.1: Type: text/plain, Size: 2660 bytes --]

On Tue, 2016-09-20 at 14:32 +0100, George Dunlap wrote:
> On 17/08/16 18:18, Dario Faggioli wrote:
> > diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> > @@ -1771,9 +1771,18 @@ csched_schedule(
> >       *   cpu and steal it.
> >       */
> >  
> > -    /* If we have schedule rate limiting enabled, check to see
> > -     * how long we've run for. */
> > -    if ( !tasklet_work_scheduled
> > +    /*
> > +     * If we have schedule rate limiting enabled, check to see
> > +     * how long we've run for.
> > +     *
> > +     * If scurr is yielding, however, we don't let rate limiting
> > kick in.
> > +     * In fact, it may be the case that scurr is about to spin,
> > and there's
> > +     * no point forcing it to do so until rate limiting expires.
> > +     *
> > +     * While there, take the chance for clearing the yield flag at
> > once.
> > +     */
> > +    if ( !test_and_clear_bit(CSCHED_FLAG_VCPU_YIELD, &scurr-
> > >flags)
> 
> It looks like YIELD is implemented by putting it lower in the
> runqueue
> in __runqueue_insert().  But here you're clearing the flag before the
> insert happens -- won't this effectively disable yield() for credit1?
> 
Yes, I think you're right... I'm not sure how I thought it would work.
:-O

Thanks for pointing this out, will fix in v2.

> > diff --git a/xen/common/sched_credit2.c
> > b/xen/common/sched_credit2.c
> > index 569174b..c8e0ee7 100644
> > --- a/xen/common/sched_credit2.c
> > +++ b/xen/common/sched_credit2.c
> > @@ -2267,36 +2267,40 @@ runq_candidate(struct csched2_runqueue_data
> > *rqd,
> > [...]
> This looks good, but the code re-organization probably goes better in
> the previous patch.  Since you're re-sending anyway, would you mind
> moving it there?
> 
> I'm not sure the credit2 yield-ratelimit needs to be in a separate
> patch; since you're implementing yield in credit2 from scratch you
> could
> just implement it all in one go.  But since you have a patch for
> credit1
> anyway, I think whichever way is fine.
> 
Ok, yes, I'm moving all the Credit2 bits from this patch to the one
that is actually implementing yield in Credit2 from scratch, and
leaving this as the patch that makes Credit1 stop ratelimiting upon
yield.

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 10/24] xen: tracing: improve Credit2's tickle_check and burn_credits records
  2016-09-20 14:35   ` George Dunlap
@ 2016-09-29 17:23     ` Dario Faggioli
  2016-09-29 17:28       ` George Dunlap
  0 siblings, 1 reply; 84+ messages in thread
From: Dario Faggioli @ 2016-09-29 17:23 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: George Dunlap, Wei Liu, Ian Jackson


[-- Attachment #1.1: Type: text/plain, Size: 2346 bytes --]

On Tue, 2016-09-20 at 15:35 +0100, George Dunlap wrote:
> On 17/08/16 18:18, Dario Faggioli wrote:
> > 
> > In both Credit2's trace records relative to checking
> > whether we want to preempt a vcpu (in runq_tickle()),
> > and to credits being burn, make it explicit on which
> > pcpu the vcpu being considered is running.
> > 
> > Such information isn't currently available, not even
> > by looking at on which pcpu the events happen, as we
> > do both the above operation from a certain pcpu on
> > vcpus running on different pcpus.
> 
> But you should be able to tell where a given vcpu is currently
> running
> from the runstate changes, right?  Obviously xentrace_format couldn't
> tell you that, but xenalyze should be able to, unless there were lost
> trace records on the vcpu in question.
> 
Well, yes and no. For instance, burn_credits() is not only called from
csched_schedule(), where indeed we have the information in close
records. It's also called from inside runq_tickle() itself (as we want
to update the credits of the various vcpus we are considering
preempting), which in turns can be called during vcpu wakeup.

In that case, knowing where a certain vcpu that we're asking to burn
its credit is running, may mean going quite a bit up in the trace, to
find its last context switch/runstate change, which is not always the
easiest thing to do.

It indeed can be scripted, but when _looking_ at a trace, trying to
figure out why you're observing this or that weird behavior, I think
knowing v->processor is a useful information.

> My modus operandi has been "try to keep trace volume from growing"
> rather than "wait until trace volume is noticably an issue and reduce
> it".  Presumably you've been doing a lot of tracing -- do you think I
> should change my approach?
> 
No, I think the approach is a good one. It's just that, in this case, I
think this is useful information, so I'll keep the patch in v2. But if
you're not sure, just ignore it, and we can sort this at another time.
:-)

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 10/24] xen: tracing: improve Credit2's tickle_check and burn_credits records
  2016-09-29 17:23     ` Dario Faggioli
@ 2016-09-29 17:28       ` George Dunlap
  2016-09-29 20:53         ` Dario Faggioli
  0 siblings, 1 reply; 84+ messages in thread
From: George Dunlap @ 2016-09-29 17:28 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Wei Liu, Ian Jackson

On 29/09/16 18:23, Dario Faggioli wrote:
> On Tue, 2016-09-20 at 15:35 +0100, George Dunlap wrote:
>> On 17/08/16 18:18, Dario Faggioli wrote:
>>>
>>> In both Credit2's trace records relative to checking
>>> whether we want to preempt a vcpu (in runq_tickle()),
>>> and to credits being burn, make it explicit on which
>>> pcpu the vcpu being considered is running.
>>>
>>> Such information isn't currently available, not even
>>> by looking at on which pcpu the events happen, as we
>>> do both the above operation from a certain pcpu on
>>> vcpus running on different pcpus.
>>
>> But you should be able to tell where a given vcpu is currently
>> running
>> from the runstate changes, right?  Obviously xentrace_format couldn't
>> tell you that, but xenalyze should be able to, unless there were lost
>> trace records on the vcpu in question.
>>
> Well, yes and no. For instance, burn_credits() is not only called from
> csched_schedule(), where indeed we have the information in close
> records. It's also called from inside runq_tickle() itself (as we want
> to update the credits of the various vcpus we are considering
> preempting), which in turns can be called during vcpu wakeup.
> 
> In that case, knowing where a certain vcpu that we're asking to burn
> its credit is running, may mean going quite a bit up in the trace, to
> find its last context switch/runstate change, which is not always the
> easiest thing to do.
> 
> It indeed can be scripted, but when _looking_ at a trace, trying to
> figure out why you're observing this or that weird behavior, I think
> knowing v->processor is a useful information.

But if you're using xenalyze, xenalyze will know where the vcpu is
running / was last run; couldn't you have xenalyze report that
information when dumping the burn_credits record?

Again, I'm just pushing back to make sure the additional trace volume is
actually necessary. :-)

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 10/24] xen: tracing: improve Credit2's tickle_check and burn_credits records
  2016-09-29 17:28       ` George Dunlap
@ 2016-09-29 20:53         ` Dario Faggioli
  0 siblings, 0 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-09-29 20:53 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: George Dunlap, Wei Liu, Ian Jackson


[-- Attachment #1.1: Type: text/plain, Size: 1588 bytes --]

On Thu, 2016-09-29 at 18:28 +0100, George Dunlap wrote:
> On 29/09/16 18:23, Dario Faggioli wrote:
> > In that case, knowing where a certain vcpu that we're asking to
> > burn
> > its credit is running, may mean going quite a bit up in the trace,
> > to
> > find its last context switch/runstate change, which is not always
> > the
> > easiest thing to do.
> > 
> > It indeed can be scripted, but when _looking_ at a trace, trying to
> > figure out why you're observing this or that weird behavior, I
> > think
> > knowing v->processor is a useful information.
> 
> But if you're using xenalyze, xenalyze will know where the vcpu is
> running / was last run; couldn't you have xenalyze report that
> information when dumping the burn_credits record?
> 
Yes, this is indeed a possibility.

Xenalyze is not doing anything like or similar to that for now (at
least, not in dump mode), that's probably why it did not occur to me
that this could be done.

But yeah, we've already discussed that it can become more intelligent
and do more complex things and more refined reports, and this can well
fit into that.

> Again, I'm just pushing back to make sure the additional trace volume
> is
> actually necessary. :-)
> 
Sure, that's fine! Ok, let's drop this patch for now then. :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 14/24] libxl: allow to set the ratelimit value online for Credit2
  2016-08-22  9:28   ` Ian Jackson
  2016-09-28 15:37     ` George Dunlap
@ 2016-09-30  1:03     ` Dario Faggioli
  1 sibling, 0 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-09-30  1:03 UTC (permalink / raw)
  To: Ian Jackson; +Cc: George Dunlap, xen-devel, Wei Liu


[-- Attachment #1.1: Type: text/plain, Size: 1997 bytes --]

On Mon, 2016-08-22 at 10:28 +0100, Ian Jackson wrote:
> Dario Faggioli writes ("[PATCH 14/24] libxl: allow to set the
> ratelimit value online for Credit2"):
> ...
> > 
> > -    rc = xc_sched_credit_params_set(ctx->xch, poolid, &sparam);
> > -    if ( rc < 0 ) {
> > -        LOGE(ERROR, "setting sched credit param");
> > -        GC_FREE;
> > -        return ERROR_FAIL;
> > +    r = xc_sched_credit_params_set(ctx->xch, poolid, &sparam);
> > +    if ( r < 0 ) {
> > +        LOGE(ERROR, "Setting Credit scheduler parameters");
> > +        rc = ERROR_FAIL;
> > +        goto out;
> 
> I had to read this three times to figure out what the change was.
> 
> It is good that you are fixing the coding style but can you please
> put
> it in a separate patch ?
> 
Done in v2.

> But I wonder whether there will still be lots of rather formulaic
> code
> that could profitably be generalised somehow.  I'd appreciate your
> views on whether that would be possible, and whether it would be a
> good idea..
> 
I've checked, as promised. TBH, as George said already, I don't see
much more room for factoring or generalizing. Certainly, not in libxl.

In xl (that would be next patch, though), I especially dislike
those main_sched_credit(), main_sched_credit2(), etc., but I think
that, given how the interface looks like, there is again few that we
can do (there's some level of generalization and indirection already,
actually).

So, again, apart from splitting coding style and functional changes, I
don't see other ways for improving this patch. If you have more
concrete ides, please share them. :-)

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 05/24] xen: credit2: make tickling more deterministic
  2016-09-13 11:28   ` George Dunlap
@ 2016-09-30  2:22     ` Dario Faggioli
  0 siblings, 0 replies; 84+ messages in thread
From: Dario Faggioli @ 2016-09-30  2:22 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: Andrew Cooper, Anshul Makkar, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 4698 bytes --]

On Tue, 2016-09-13 at 12:28 +0100, George Dunlap wrote:
> On 17/08/16 18:18, Dario Faggioli wrote:
> > 
> diff --git a/xen/common/sched_credit2.c
> > @@ -2233,7 +2241,8 @@ void __dump_execstate(void *unused);
> >  static struct csched2_vcpu *
> >  runq_candidate(struct csched2_runqueue_data *rqd,
> >                 struct csched2_vcpu *scurr,
> > -               int cpu, s_time_t now)
> > +               int cpu, s_time_t now,
> > +               unsigned int *pos)
> 
> I think I'd prefer if this were called "skipped" or something like
> that
> -- to indicate how many vcpus in the runqueue had been skipped before
> coming to this one.
> 
Done.

> > @@ -2298,6 +2340,7 @@ csched2_schedule(
> >      struct csched2_runqueue_data *rqd;
> >      struct csched2_vcpu * const scurr = CSCHED2_VCPU(current);
> >      struct csched2_vcpu *snext = NULL;
> > +    unsigned int snext_pos = 0;
> >      struct task_slice ret;
> >  
> >      SCHED_STAT_CRANK(schedule);
> > @@ -2347,7 +2390,7 @@ csched2_schedule(
> >          snext = CSCHED2_VCPU(idle_vcpu[cpu]);
> >      }
> >      else
> > -        snext=runq_candidate(rqd, scurr, cpu, now);
> > +        snext = runq_candidate(rqd, scurr, cpu, now, &snext_pos);
> >  
> >      /* If switching from a non-idle runnable vcpu, put it
> >       * back on the runqueue. */
> > @@ -2371,8 +2414,21 @@ csched2_schedule(
> >              __set_bit(__CSFLAG_scheduled, &snext->flags);
> >          }
> >  
> > -        /* Check for the reset condition */
> > -        if ( snext->credit <= CSCHED2_CREDIT_RESET )
> > +        /*
> > +         * The reset condition is "has a scheduler epoch come to
> > an end?".
> > +         * The way this is enforced is checking whether the vcpu
> > at the top
> > +         * of the runqueue has negative credits. This means the
> > epochs have
> > +         * variable lenght, as in one epoch expores when:
> > +         *  1) the vcpu at the top of the runqueue has executed
> > for
> > +         *     around 10 ms (with default parameters);
> > +         *  2) no other vcpu with higher credits wants to run.
> > +         *
> > +         * Here, where we want to check for reset, we need to make
> > sure the
> > +         * proper vcpu is being used. In fact,
> > runqueue_candidate() may have
> > +         * not returned the first vcpu in the runqueue, for
> > various reasons
> > +         * (e.g., affinity). Only trigger a reset when it does.
> > +         */
> > +        if ( snext_pos == 0 && snext->credit <=
> > CSCHED2_CREDIT_RESET )
> 
> This bit wasn't mentioned in the description. :-)
> 
You're right. Actually, I think this change deserves to be in its own
patch, so in v2 I'm splitting this patch in two.

> There's a certain amount of sense to the idea here, but it's the kind
> of
> thing that may have strange side effects.  Did you look at traces
> before
> and after this change?  And does the behavior seem more rational?
> 
I have. It's not like it was happening a lot of times that we were
resetting upon the wrong vcpus, but I indeed have caught a couple of
examples.

And yes, the trace looked more 'regular' with this patch. Or, IOW,
without this patch, there were some of the reset events that were
suspiciously closer between each other.

TBH, in the vast majority of the cases, even when a "spurious reset"
was involved, the difference was rather hard to tell, but please,
consider that the combination of hard-affinity, this patch and soft-
affinity will potentially make things much worse (and in fact, I saw
the most severe occurrences when using hard-affinity).

It's also rather hard to measure the effect, but I think what is
implemented here is the right thing to do. And even if it may be hard
to measure the performance impact, I claim that this is a 'correctness'
issue, or at least a matter of adhering as much as possible to the
algorithm theory and idea.

> If so, I'm happy to trust your judgement -- just want to check to
> make
> sure. :-)
> 
Ah, thanks. :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 22/24] xen: credit2: "relax" CSCHED2_MAX_TIMER
  2016-08-17 17:20 ` [PATCH 22/24] xen: credit2: "relax" CSCHED2_MAX_TIMER Dario Faggioli
@ 2016-09-30 15:30   ` George Dunlap
  0 siblings, 0 replies; 84+ messages in thread
From: George Dunlap @ 2016-09-30 15:30 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Anshul Makkar

On 17/08/16 18:20, Dario Faggioli wrote:
> Credit2 is already event based, rather than tick
> based. This means, the time at which the (i+1)-eth
> scheduling decision needs to happen is computed
> during the i-eth scheduling decision, and a timer
> is set accordingly.
> 
> If there's nothing imminent (or, the most imminent
> event is really really really far away), it is
> ok to say "well, let's double-check things in
> a little bit anyway", but such 'little bit' does
> not need to be too little, as, most likely, it's
> just pure overhead.
> 
> The current period, for this "safety catch"-alike
> timer is 2ms, which indeed is high, but it can
> well be higher. In fact, benchmarks show that
> setting it to 10ms --combined with other
> optimizations-- does actually improve performance.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

We might as well toss this one in:

Reviewed-by: George Dunlap <george.dunlap@citrix.com>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 84+ messages in thread

end of thread, other threads:[~2016-09-30 15:30 UTC | newest]

Thread overview: 84+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
2016-08-17 17:17 ` [PATCH 01/24] xen: credit1: small optimization in Credit1's tickling logic Dario Faggioli
2016-09-12 15:01   ` George Dunlap
2016-08-17 17:17 ` [PATCH 02/24] xen: credit1: fix mask to be used for tickling in Credit1 Dario Faggioli
2016-08-17 23:42   ` Dario Faggioli
2016-09-12 15:04     ` George Dunlap
2016-08-17 17:17 ` [PATCH 03/24] xen: credit1: return the 'time remaining to the limit' as next timeslice Dario Faggioli
2016-09-12 15:14   ` George Dunlap
2016-09-12 17:00     ` Dario Faggioli
2016-09-14  9:34       ` George Dunlap
2016-09-14 13:54         ` Dario Faggioli
2016-08-17 17:18 ` [PATCH 04/24] xen: credit2: properly schedule migration of a running vcpu Dario Faggioli
2016-09-12 17:11   ` George Dunlap
2016-08-17 17:18 ` [PATCH 05/24] xen: credit2: make tickling more deterministic Dario Faggioli
2016-08-31 17:10   ` anshul makkar
2016-09-05 13:47     ` Dario Faggioli
2016-09-07 12:25       ` anshul makkar
2016-09-13 11:13       ` George Dunlap
2016-09-29 15:24         ` Dario Faggioli
2016-09-13 11:28   ` George Dunlap
2016-09-30  2:22     ` Dario Faggioli
2016-08-17 17:18 ` [PATCH 06/24] xen: credit2: implement yield() Dario Faggioli
2016-09-13 13:33   ` George Dunlap
2016-09-29 16:05     ` Dario Faggioli
2016-09-20 13:25   ` George Dunlap
2016-09-20 13:37     ` George Dunlap
2016-08-17 17:18 ` [PATCH 07/24] xen: sched: don't rate limit context switches in case of yields Dario Faggioli
2016-09-20 13:32   ` George Dunlap
2016-09-29 16:46     ` Dario Faggioli
2016-08-17 17:18 ` [PATCH 08/24] xen: tracing: add trace records for schedule and rate-limiting Dario Faggioli
2016-08-18  0:57   ` Meng Xu
2016-08-18  9:41     ` Dario Faggioli
2016-09-20 13:50   ` George Dunlap
2016-08-17 17:18 ` [PATCH 09/24] xen/tools: tracing: improve tracing of context switches Dario Faggioli
2016-09-20 14:08   ` George Dunlap
2016-08-17 17:18 ` [PATCH 10/24] xen: tracing: improve Credit2's tickle_check and burn_credits records Dario Faggioli
2016-09-20 14:35   ` George Dunlap
2016-09-29 17:23     ` Dario Faggioli
2016-09-29 17:28       ` George Dunlap
2016-09-29 20:53         ` Dario Faggioli
2016-08-17 17:18 ` [PATCH 11/24] tools: tracing: handle more scheduling related events Dario Faggioli
2016-09-20 14:37   ` George Dunlap
2016-08-17 17:18 ` [PATCH 12/24] xen: libxc: allow to set the ratelimit value online Dario Faggioli
2016-09-20 14:43   ` George Dunlap
2016-09-20 14:45     ` Wei Liu
2016-09-28 15:44   ` George Dunlap
2016-08-17 17:19 ` [PATCH 13/24] libxc: improve error handling of xc Credit1 and Credit2 helpers Dario Faggioli
2016-09-20 15:10   ` Wei Liu
2016-08-17 17:19 ` [PATCH 14/24] libxl: allow to set the ratelimit value online for Credit2 Dario Faggioli
2016-08-22  9:21   ` Ian Jackson
2016-09-05 14:02     ` Dario Faggioli
2016-08-22  9:28   ` Ian Jackson
2016-09-28 15:37     ` George Dunlap
2016-09-30  1:03     ` Dario Faggioli
2016-09-28 15:39   ` George Dunlap
2016-08-17 17:19 ` [PATCH 15/24] xl: " Dario Faggioli
2016-09-28 15:46   ` George Dunlap
2016-08-17 17:19 ` [PATCH 16/24] xen: sched: factor affinity helpers out of sched_credit.c Dario Faggioli
2016-09-28 15:49   ` George Dunlap
2016-08-17 17:19 ` [PATCH 17/24] xen: credit2: soft-affinity awareness in runq_tickle() Dario Faggioli
2016-09-01 10:52   ` anshul makkar
2016-09-05 14:55     ` Dario Faggioli
2016-09-07 13:24       ` anshul makkar
2016-09-07 13:31         ` Dario Faggioli
2016-09-28 20:44   ` George Dunlap
2016-08-17 17:19 ` [PATCH 18/24] xen: credit2: soft-affinity awareness fallback_cpu() and cpu_pick() Dario Faggioli
2016-09-01 11:08   ` anshul makkar
2016-09-05 13:26     ` Dario Faggioli
2016-09-07 12:52       ` anshul makkar
2016-09-29 11:11   ` George Dunlap
2016-08-17 17:19 ` [PATCH 19/24] xen: credit2: soft-affinity awareness in load balancing Dario Faggioli
2016-09-02 11:46   ` anshul makkar
2016-09-05 12:49     ` Dario Faggioli
2016-08-17 17:19 ` [PATCH 20/24] xen: credit2: kick away vcpus not running within their soft-affinity Dario Faggioli
2016-08-17 17:20 ` [PATCH 21/24] xen: credit2: optimize runq_candidate() a little bit Dario Faggioli
2016-08-17 17:20 ` [PATCH 22/24] xen: credit2: "relax" CSCHED2_MAX_TIMER Dario Faggioli
2016-09-30 15:30   ` George Dunlap
2016-08-17 17:20 ` [PATCH 23/24] xen: credit2: optimize runq_tickle() a little bit Dario Faggioli
2016-09-02 12:38   ` anshul makkar
2016-09-05 12:52     ` Dario Faggioli
2016-08-17 17:20 ` [PATCH 24/24] xen: credit2: try to avoid tickling cpus subject to ratelimiting Dario Faggioli
2016-08-18  0:11 ` [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
2016-08-18 11:49 ` Dario Faggioli
2016-08-18 11:53 ` Dario Faggioli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.