* [PATCH v2 0/7] xen: credit2: limit the number of CPUs per runqueue
@ 2020-05-28 21:29 Dario Faggioli
  2020-05-28 21:29 ` [PATCH v2 1/7] xen: credit2: factor cpu to runqueue matching in a function Dario Faggioli
                   ` (7 more replies)
  0 siblings, 8 replies; 16+ messages in thread
From: Dario Faggioli @ 2020-05-28 21:29 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Julien Grall, Andrew Cooper,
	George Dunlap, Jan Beulich

Hello!

Here's v2 of this series... a bit late, but technically still in time
for code-freeze, although I understand this is quite tight! :-P

Anyway, in Credit2, the CPUs are assigned to runqueues according to the
host topology. For instance, if we want per-socket runqueues (which is
the default), the CPUs that are in the same socket will end up in the
same runqueue.

This is generally good for scalability, at least as long as the number
of CPUs that end up in the same runqueue is not too high. In fact, all
these CPUs compete for the same spinlock, for making scheduling decisions
and manipulating the scheduler data structures. Therefore, if they are
too many, that lock can become a bottleneck.

This has not been an issue so far, but architectures with 128 CPUs per
socket are now available, and it is certainly far from ideal to have so many
CPUs in the same runqueue, competing for the same locks, etc.

Let's therefore set a cap on the total number of CPUs that can share a
runqueue. The value is set to 16, by default, but can be changed with
a boot command line parameter.
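
For example, to lower the cap to 8, one would boot Xen with something
like the following (an illustrative GRUB snippet; the parameter itself
is the one documented in patch 4 of this series):

 multiboot2 /boot/xen.gz ... sched_credit2_max_cpus_runqueue=8 ...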

Note also that, if the host has hyperthreading (or equivalent), and we
are not using core-scheduling, additional care is taken to avoid splitting
CPUs that are hyperthread siblings among different runqueues.

In v2, in addition to trying to address the review comments, I've added
the logic for doing a full rebalance of the scheduler runqueues while
the system is running. This is actually something that itself came up
during review of v1, when we realized that we did not only want a cap,
we also wanted some balancing; and if you want real balancing, you have
to be able to re-arrange the runqueue layout dynamically.

It took a while because, although I had something that looked a lot
like the final solution implemented in this patch, I could not see how
to solve cleanly and effectively the issue of having the vCPUs in the
runqueues while trying to re-balance them. It was while talking with
Juergen that we figured out that we can actually pause the domains,
which I had not thought of... So, once again, Juergen, thanks! :-)

I have done most of the stress testing with core-scheduling
disabled, and it has survived anything I threw at it. But of course
the more testing the better, and I will be able to actually do more
of it in the coming days.

In any case, I have also verified that at least a few core-scheduling
enabled configs work.

There are git branches here:
 git://xenbits.xen.org/people/dariof/xen.git  sched/credit2-max-cpus-runqueue-v2
 http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/sched/credit2-max-cpus-runqueue-v2

 https://github.com/dfaggioli/xen/tree/sched/credit2-max-cpus-runqueue-v2

While v1 is at the following link:
 https://lore.kernel.org/xen-devel/158818022727.24327.14309662489731832234.stgit@Palanthas/T/#m1e885a0f0a1feef83790ac179ab66512201cb770

Thanks and Regards
---
Dario Faggioli (7):
      xen: credit2: factor cpu to runqueue matching in a function
      xen: credit2: factor runqueue initialization in its own function.
      xen: cpupool: add a back-pointer from a scheduler to its pool
      xen: credit2: limit the max number of CPUs in a runqueue
      xen: credit2: compute cpus per-runqueue more dynamically.
      cpupool: create the 'cpupool sync' infrastructure
      xen: credit2: rebalance the number of CPUs in the scheduler runqueues

 docs/misc/xen-command-line.pandoc |   14 +
 xen/common/sched/cpupool.c        |   53 ++++
 xen/common/sched/credit2.c        |  437 ++++++++++++++++++++++++++++++++++---
 xen/common/sched/private.h        |    7 +
 xen/include/asm-arm/cpufeature.h  |    5 
 xen/include/asm-x86/processor.h   |    5 
 xen/include/xen/sched.h           |    1 
 7 files changed, 491 insertions(+), 31 deletions(-)

--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 1/7] xen: credit2: factor cpu to runqueue matching in a function
  2020-05-28 21:29 [PATCH v2 0/7] xen: credit2: limit the number of CPUs per runqueue Dario Faggioli
@ 2020-05-28 21:29 ` Dario Faggioli
  2020-05-29 14:48   ` Jürgen Groß
  2020-05-28 21:29 ` [PATCH v2 2/7] xen: credit2: factor runqueue initialization in its own function Dario Faggioli
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Dario Faggioli @ 2020-05-28 21:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap

Just move the big if() condition into an inline function.

No functional change intended.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
---
 xen/common/sched/credit2.c |   28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index 34f05c3e2a..697c9f917d 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -838,6 +838,20 @@ static inline bool same_core(unsigned int cpua, unsigned int cpub)
            cpu_to_core(cpua) == cpu_to_core(cpub);
 }
 
+static inline bool
+cpu_runqueue_match(const struct csched2_runqueue_data *rqd, unsigned int cpu)
+{
+    unsigned int peer_cpu = rqd->pick_bias;
+
+    BUG_ON(cpu_to_socket(peer_cpu) == XEN_INVALID_SOCKET_ID);
+
+    /* OPT_RUNQUEUE_CPU will never find an existing runqueue. */
+    return opt_runqueue == OPT_RUNQUEUE_ALL ||
+           (opt_runqueue == OPT_RUNQUEUE_CORE && same_core(peer_cpu, cpu)) ||
+           (opt_runqueue == OPT_RUNQUEUE_SOCKET && same_socket(peer_cpu, cpu)) ||
+           (opt_runqueue == OPT_RUNQUEUE_NODE && same_node(peer_cpu, cpu));
+}
+
 static struct csched2_runqueue_data *
 cpu_add_to_runqueue(struct csched2_private *prv, unsigned int cpu)
 {
@@ -855,21 +869,11 @@ cpu_add_to_runqueue(struct csched2_private *prv, unsigned int cpu)
     rqd_ins = &prv->rql;
     list_for_each_entry ( rqd, &prv->rql, rql )
     {
-        unsigned int peer_cpu;
-
         /* Remember first unused queue index. */
         if ( !rqi_unused && rqd->id > rqi )
             rqi_unused = true;
 
-        peer_cpu = rqd->pick_bias;
-        BUG_ON(cpu_to_socket(cpu) == XEN_INVALID_SOCKET_ID ||
-               cpu_to_socket(peer_cpu) == XEN_INVALID_SOCKET_ID);
-
-        /* OPT_RUNQUEUE_CPU will never find an existing runqueue. */
-        if ( opt_runqueue == OPT_RUNQUEUE_ALL ||
-             (opt_runqueue == OPT_RUNQUEUE_CORE && same_core(peer_cpu, cpu)) ||
-             (opt_runqueue == OPT_RUNQUEUE_SOCKET && same_socket(peer_cpu, cpu)) ||
-             (opt_runqueue == OPT_RUNQUEUE_NODE && same_node(peer_cpu, cpu)) )
+        if ( cpu_runqueue_match(rqd, cpu) )
         {
             rqd_valid = true;
             break;
@@ -3744,6 +3748,8 @@ csched2_alloc_pdata(const struct scheduler *ops, int cpu)
     struct csched2_pcpu *spc;
     struct csched2_runqueue_data *rqd;
 
+    BUG_ON(cpu_to_socket(cpu) == XEN_INVALID_SOCKET_ID);
+
     spc = xzalloc(struct csched2_pcpu);
     if ( spc == NULL )
         return ERR_PTR(-ENOMEM);



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 2/7] xen: credit2: factor runqueue initialization in its own function.
  2020-05-28 21:29 [PATCH v2 0/7] xen: credit2: limit the number of CPUs per runqueue Dario Faggioli
  2020-05-28 21:29 ` [PATCH v2 1/7] xen: credit2: factor cpu to runqueue matching in a function Dario Faggioli
@ 2020-05-28 21:29 ` Dario Faggioli
  2020-05-29 14:50   ` Jürgen Groß
  2020-05-28 21:29 ` [PATCH v2 3/7] xen: cpupool: add a back-pointer from a scheduler to its pool Dario Faggioli
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Dario Faggioli @ 2020-05-28 21:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap

As it will be useful in later changes. While there, fix
the doc-comment.

No functional change intended.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
---
Changes from v1:
* new patch
---
 xen/common/sched/credit2.c |   35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index 697c9f917d..8a4f28b9f5 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -3766,21 +3766,16 @@ csched2_alloc_pdata(const struct scheduler *ops, int cpu)
     return spc;
 }
 
-/* Returns the ID of the runqueue the cpu is assigned to. */
-static struct csched2_runqueue_data *
-init_pdata(struct csched2_private *prv, struct csched2_pcpu *spc,
-           unsigned int cpu)
+/*
+ * Do what's necessary to add cpu to the rqd (including activating the
+ * runqueue, if this is the first CPU we put in it).
+ */
+static void
+init_cpu_runqueue(struct csched2_private *prv, struct csched2_pcpu *spc,
+                  unsigned int cpu, struct csched2_runqueue_data *rqd)
 {
-    struct csched2_runqueue_data *rqd;
     unsigned int rcpu;
 
-    ASSERT(rw_is_write_locked(&prv->lock));
-    ASSERT(!cpumask_test_cpu(cpu, &prv->initialized));
-    /* CPU data needs to be allocated, but still uninitialized. */
-    ASSERT(spc);
-
-    rqd = spc->rqd;
-
     ASSERT(rqd && !cpumask_test_cpu(cpu, &spc->rqd->active));
 
     printk(XENLOG_INFO "Adding cpu %d to runqueue %d\n", cpu, rqd->id);
@@ -3816,6 +3811,22 @@ init_pdata(struct csched2_private *prv, struct csched2_pcpu *spc,
 
     if ( rqd->nr_cpus == 1 )
         rqd->pick_bias = cpu;
+}
+
+/* Returns a pointer to the runqueue the cpu is assigned to. */
+static struct csched2_runqueue_data *
+init_pdata(struct csched2_private *prv, struct csched2_pcpu *spc,
+           unsigned int cpu)
+{
+    struct csched2_runqueue_data *rqd;
+
+    ASSERT(rw_is_write_locked(&prv->lock));
+    ASSERT(!cpumask_test_cpu(cpu, &prv->initialized));
+    /* CPU data needs to be allocated, but still uninitialized. */
+    ASSERT(spc);
+
+    rqd = spc->rqd;
+    init_cpu_runqueue(prv, spc, cpu, rqd);
 
     return rqd;
 }



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 3/7] xen: cpupool: add a back-pointer from a scheduler to its pool
  2020-05-28 21:29 [PATCH v2 0/7] xen: credit2: limit the number of CPUs per runqueue Dario Faggioli
  2020-05-28 21:29 ` [PATCH v2 1/7] xen: credit2: factor cpu to runqueue matching in a function Dario Faggioli
  2020-05-28 21:29 ` [PATCH v2 2/7] xen: credit2: factor runqueue initialization in its own function Dario Faggioli
@ 2020-05-28 21:29 ` Dario Faggioli
  2020-05-29 14:54   ` Jürgen Groß
  2020-05-28 21:29 ` [PATCH v2 4/7] xen: credit2: limit the max number of CPUs in a runqueue Dario Faggioli
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Dario Faggioli @ 2020-05-28 21:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap

If we need to know within which pool a particular scheduler
is working, we can do that by querying the cpupool pointer
of any of the sched_resource-s (i.e., ~ any of the CPUs)
assigned to the scheduler itself.

Basically, we pick any sched_resource that we know uses that
scheduler, and we check its *cpupool pointer. If we really
know that the resource uses the scheduler, this is fine, as
it also means the resource is inside the pool we are
looking for.

But, of course, we can do that for a pool/scheduler that has
not yet been given any sched_resource (or if we do not
know whether or not it has any sched_resource).

To overcome such limitation, add a back pointer from the
scheduler, to its own pool.
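
As a minimal sketch of what this enables from scheduler code (the
function below is made up for illustration; the actual user is the
runqueue rebalancing in patch 7 of this series):

    static void example(const struct scheduler *ops)
    {
        /* Reach the pool directly, without going via a sched_resource. */
        struct cpupool *c = ops->cpupool;

        ASSERT(c->sched == ops);
    }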

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
Cc: Juergen Gross <jgross@suse.com>
Cc: George Dunlap <george.dunlap@citrix.com>
---
Changes from v1:
* new patch
---
 xen/common/sched/cpupool.c |    1 +
 xen/common/sched/private.h |    1 +
 2 files changed, 2 insertions(+)

diff --git a/xen/common/sched/cpupool.c b/xen/common/sched/cpupool.c
index 0664f7fa3d..7ea641ca26 100644
--- a/xen/common/sched/cpupool.c
+++ b/xen/common/sched/cpupool.c
@@ -287,6 +287,7 @@ static struct cpupool *cpupool_create(
         if ( c->sched == NULL )
             goto err;
     }
+    c->sched->cpupool = c;
     c->gran = opt_sched_granularity;
 
     *q = c;
diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
index b9a5b4c01c..df50976eb2 100644
--- a/xen/common/sched/private.h
+++ b/xen/common/sched/private.h
@@ -275,6 +275,7 @@ struct scheduler {
     char *opt_name;         /* option name for this scheduler    */
     unsigned int sched_id;  /* ID for this scheduler             */
     void *sched_data;       /* global data pointer               */
+    struct cpupool *cpupool;/* points to this scheduler's pool   */
 
     int          (*global_init)    (void);
 



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 4/7] xen: credit2: limit the max number of CPUs in a runqueue
  2020-05-28 21:29 [PATCH v2 0/7] xen: credit2: limit the number of CPUs per runqueue Dario Faggioli
                   ` (2 preceding siblings ...)
  2020-05-28 21:29 ` [PATCH v2 3/7] xen: cpupool: add a back-pointer from a scheduler to its pool Dario Faggioli
@ 2020-05-28 21:29 ` Dario Faggioli
  2020-05-29 15:23   ` Jürgen Groß
  2020-05-28 21:29 ` [PATCH v2 5/7] xen: credit2: compute cpus per-runqueue more dynamically Dario Faggioli
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Dario Faggioli @ 2020-05-28 21:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Andrew Cooper, George Dunlap, Jan Beulich

In Credit2 CPUs (can) share runqueues, depending on the topology. For
instance, with per-socket runqueues (the default) all the CPUs that are
part of the same socket share a runqueue.

On platforms with a huge number of CPUs per socket, that could be a
problem. An example is AMD EPYC2 servers, where we can have up to 128
CPUs in a socket.

It is of course possible to define other, still topology-based, runqueue
arrangements (e.g., per-LLC, per-DIE, etc). But that may still result in
runqueues with too many CPUs on other/future platforms. For instance, a
system with 96 CPUs and 2 NUMA nodes will end up having 48 CPUs per
runqueue. Not as bad, but still a lot!

Therefore, let's set a limit to the max number of CPUs that can share a
Credit2 runqueue. The actual value is configurable (at boot time), the
default being 16. If, for instance, there are more than 16 CPUs in a
socket, they'll be split among two (or more) runqueues.
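
As a worked example of the splitting and balancing math that this patch
adds to cpu_add_to_runqueue() (variable names are the ones used there):

    /* An EPYC2 socket with 128 CPUs, and the default cap of 16: */
    min_rqs = ((128 - 1) / 16) + 1;  /* = 8 runqueues                */
    max_cpus_runq = 128 / min_rqs;   /* = 16 CPUs, at most, in each  */

    /* With, say, 24 CPUs, instead of runqueues of 16 and 8 CPUs:   */
    min_rqs = ((24 - 1) / 16) + 1;   /* = 2 runqueues                */
    max_cpus_runq = 24 / min_rqs;    /* = 12 CPUs, at most, in each  */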

Note: with core scheduling enabled, this parameter sets the max number
of *scheduling resources* that can share a runqueue. Therefore, with
granularity set to core (and assuming 2 threads per core), we will have
at most 16 cores per runqueue, which corresponds to 32 threads. But that
is fine, considering how core scheduling works.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Juergen Gross <jgross@suse.com>
---
Changes from v1:
- always try to add a CPU to the runqueue with the least CPUs already in
  it. This should guarantee a more even distribution of CPUs among
  runqueues, as requested during review;
- rename the matching function from foo_smt_bar() to foo_siblings_bar(),
  which is more generic, and do the same to the per-arch wrappers;
- deal with the case where the user is trying to set fewer CPUs per
  runqueue than there are siblings per core (by putting siblings in the
  same runq anyway, but logging a message), as requested during review;
- use the per-cpupool value for the scheduling granularity, as requested
  during review;
- add a comment about why we also count siblings that are currently
  outside of our cpupool, as suggested during review;
- add a boot command line doc entry;
- fix typos in comments;
---
 docs/misc/xen-command-line.pandoc |   14 ++++
 xen/common/sched/credit2.c        |  144 +++++++++++++++++++++++++++++++++++--
 xen/include/asm-arm/cpufeature.h  |    5 +
 xen/include/asm-x86/processor.h   |    5 +
 4 files changed, 162 insertions(+), 6 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index e16bb90184..1787f2c8fb 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1840,6 +1840,20 @@ with read and write permissions.
 
 Choose the default scheduler.
 
+### sched_credit2_max_cpus_runqueue
+> `= <integer>`
+
+> Default: `16`
+
+Defines how many CPUs will be put, at most, in each Credit2 runqueue.
+
+Runqueues are still arranged according to the host topology (and following
+what is indicated by the 'credit2_runqueue' parameter). But we also have a
+cap on the number of CPUs that share each runqueue.
+
+A value that is a submultiple of the number of online CPUs is recommended,
+as that would likely produce a perfectly balanced runqueue configuration.
+
 ### sched_credit2_migrate_resist
 > `= <integer>`
 
diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index 8a4f28b9f5..f4d3f8ae6b 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -25,6 +25,7 @@
 #include <xen/trace.h>
 #include <xen/cpu.h>
 #include <xen/keyhandler.h>
+#include <asm/processor.h>
 
 #include "private.h"
 
@@ -471,6 +472,22 @@ static int __init parse_credit2_runqueue(const char *s)
 }
 custom_param("credit2_runqueue", parse_credit2_runqueue);
 
+/*
+ * How many CPUs will be put, at most, in each runqueue.
+ *
+ * Runqueues are still arranged according to the host topology (and according
+ * to the value of the 'credit2_runqueue' parameter). But we also have a cap
+ * on the number of CPUs that share runqueues.
+ *
+ * This should be considered an upper limit. In fact, we also try to balance
+ * the number of CPUs in each runqueue. And, when doing that, it is possible
+ * that fewer CPUs than what this parameter mandates will actually be put
+ * in each runqueue.
+ */
+#define MAX_CPUS_RUNQ 16
+static unsigned int __read_mostly opt_max_cpus_runqueue = MAX_CPUS_RUNQ;
+integer_param("sched_credit2_max_cpus_runqueue", opt_max_cpus_runqueue);
+
 /*
  * Per-runqueue data
  */
@@ -852,18 +869,83 @@ cpu_runqueue_match(const struct csched2_runqueue_data *rqd, unsigned int cpu)
            (opt_runqueue == OPT_RUNQUEUE_NODE && same_node(peer_cpu, cpu));
 }
 
+/*
+ * Additional checks, to avoid separating siblings in different runqueues.
+ * This deals with both Intel's HTs and AMD's CUs. An arch that does not have
+ * any similar concept will just have cpu_nr_siblings() always return 1, and
+ * set up the cpu_sibling_mask-s accordingly (as ARM currently does), and things
+ * will just work as well.
+ */
+static bool
+cpu_runqueue_siblings_match(const struct csched2_runqueue_data *rqd,
+                            unsigned int cpu, unsigned int max_cpus_runq)
+{
+    unsigned int nr_sibls = cpu_nr_siblings(cpu);
+    unsigned int rcpu, tot_sibls = 0;
+
+    /*
+     * If we put the CPU in this runqueue, we must be sure that there will
+     * be enough room for accepting its sibling(s) as well.
+     */
+    cpumask_clear(cpumask_scratch_cpu(cpu));
+    for_each_cpu ( rcpu, &rqd->active )
+    {
+        ASSERT(rcpu != cpu);
+        if ( !cpumask_intersects(per_cpu(cpu_sibling_mask, rcpu), cpumask_scratch_cpu(cpu)) )
+        {
+            /*
+             * For each CPU already in the runqueue, account for it and for
+             * its sibling(s), independently from whether they are in the
+             * runqueue or not. Of course, we do this only once, for each CPU
+             * that is already inside the runqueue and all its siblings!
+             *
+             * This way, even if there are CPUs in the runqueue with siblings
+             * in different cpupools, we still count all of them here.
+             * The reason for this is that, if at some future point we will
+             * move those sibling CPUs to this cpupool, we want them to land
+             * in this runqueue. Hence we must be sure to leave space for them.
+             */
+            cpumask_or(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+                       per_cpu(cpu_sibling_mask, rcpu));
+            tot_sibls += cpu_nr_siblings(rcpu);
+        }
+    }
+    /*
+     * We know that neither the CPU nor any of its siblings are here,
+     * or we wouldn't even have entered the function.
+     */
+    ASSERT(!cpumask_intersects(cpumask_scratch_cpu(cpu),
+                               per_cpu(cpu_sibling_mask, cpu)));
+
+    /* Try adding CPU and its sibling(s) to the count and check... */
+    return tot_sibls + nr_sibls <= max_cpus_runq;
+}
+
 static struct csched2_runqueue_data *
-cpu_add_to_runqueue(struct csched2_private *prv, unsigned int cpu)
+cpu_add_to_runqueue(const struct scheduler *ops, unsigned int cpu)
 {
+    struct csched2_private *prv = csched2_priv(ops);
     struct csched2_runqueue_data *rqd, *rqd_new;
+    struct csched2_runqueue_data *rqd_valid = NULL;
     struct list_head *rqd_ins;
     unsigned long flags;
     int rqi = 0;
-    bool rqi_unused = false, rqd_valid = false;
+    unsigned int min_rqs, max_cpus_runq;
+    bool rqi_unused = false;
 
     /* Prealloc in case we need it - not allowed with interrupts off. */
     rqd_new = xzalloc(struct csched2_runqueue_data);
 
+    /*
+     * While respecting the limit of not having more than the max number of
+     * CPUs per runqueue, let's also try to "spread" the CPUs, as evenly as
+     * possible, among the runqueues. For doing that, we need to know upfront
+     * how many CPUs we have, so let's use the number of CPUs that are online
+     * for that.
+     */
+    min_rqs = ((num_online_cpus() - 1) / opt_max_cpus_runqueue) + 1;
+    max_cpus_runq = num_online_cpus() / min_rqs;
+
     write_lock_irqsave(&prv->lock, flags);
 
     rqd_ins = &prv->rql;
@@ -873,10 +955,59 @@ cpu_add_to_runqueue(struct csched2_private *prv, unsigned int cpu)
         if ( !rqi_unused && rqd->id > rqi )
             rqi_unused = true;
 
+        /*
+         * First of all, let's check whether, according to the system
+         * topology, this CPU belongs in this runqueue.
+         */
         if ( cpu_runqueue_match(rqd, cpu) )
         {
-            rqd_valid = true;
-            break;
+            /*
+             * If the CPU has any siblings, they are online and they are
+             * being added to this cpupool, always keep them together. Even
+             * if that means violating what the opt_max_cpus_runqueue param
+             * indicates. However, if this happens, chances are high that a
+             * too small value was used for the parameter, so warn the user
+             * about that.
+             *
+             * Note that we cannot check this once and for all, say, during
+             * scheduler initialization. In fact, at least in theory, the
+             * number of siblings a CPU has may not be the same for all the
+             * CPUs.
+             */
+            if ( cpumask_intersects(&rqd->active, per_cpu(cpu_sibling_mask, cpu)) )
+            {
+                if ( cpumask_weight(&rqd->active) >= opt_max_cpus_runqueue )
+                {
+                    printk("WARNING: %s: more than opt_max_cpus_runqueue "
+                           "in a runqueue (%u vs %u), due to topology constraints.\n"
+                           "Consider raising it!\n",
+                           __func__, opt_max_cpus_runqueue,
+                           cpumask_weight(&rqd->active));
+                }
+                rqd_valid = rqd;
+                break;
+            }
+
+            /*
+             * If we're using core (or socket) scheduling, no need to do any
+             * further checking beyond the number of CPUs already in this
+             * runqueue respecting our upper bound.
+             *
+             * Otherwise, let's try to make sure that siblings stay in the
+             * same runqueue, pretty much under any circumstances.
+             */
+            if ( rqd->refcnt < max_cpus_runq && (ops->cpupool->gran != SCHED_GRAN_cpu ||
+                  cpu_runqueue_siblings_match(rqd, cpu, max_cpus_runq)) )
+            {
+                /*
+                 * This runqueue is ok, but as we said, we also want an even
+                 * distribution of the CPUs. So, unless this is the very first
+                 * match, we go on, check all runqueues and actually add the
+                 * CPU into the one that is less full.
+                 */
+                if ( !rqd_valid || rqd->refcnt < rqd_valid->refcnt )
+                    rqd_valid = rqd;
+            }
         }
 
         if ( !rqi_unused )
@@ -900,6 +1031,8 @@ cpu_add_to_runqueue(struct csched2_private *prv, unsigned int cpu)
         rqd->pick_bias = cpu;
         rqd->id = rqi;
     }
+    else
+        rqd = rqd_valid;
 
     rqd->refcnt++;
 
@@ -3744,7 +3877,6 @@ csched2_dump(const struct scheduler *ops)
 static void *
 csched2_alloc_pdata(const struct scheduler *ops, int cpu)
 {
-    struct csched2_private *prv = csched2_priv(ops);
     struct csched2_pcpu *spc;
     struct csched2_runqueue_data *rqd;
 
@@ -3754,7 +3886,7 @@ csched2_alloc_pdata(const struct scheduler *ops, int cpu)
     if ( spc == NULL )
         return ERR_PTR(-ENOMEM);
 
-    rqd = cpu_add_to_runqueue(prv, cpu);
+    rqd = cpu_add_to_runqueue(ops, cpu);
     if ( IS_ERR(rqd) )
     {
         xfree(spc);
diff --git a/xen/include/asm-arm/cpufeature.h b/xen/include/asm-arm/cpufeature.h
index 9af5666628..8fdf9685d7 100644
--- a/xen/include/asm-arm/cpufeature.h
+++ b/xen/include/asm-arm/cpufeature.h
@@ -64,6 +64,11 @@ static inline bool cpus_have_cap(unsigned int num)
     return test_bit(num, cpu_hwcaps);
 }
 
+static inline int cpu_nr_siblings(unsigned int cpu)
+{
+    return 1;
+}
+
 /* System capability check for constant cap */
 #define cpus_have_const_cap(num) ({                 \
         register_t __ret;                           \
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 070691882b..73017c3f4b 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -174,6 +174,11 @@ extern void init_intel_cacheinfo(struct cpuinfo_x86 *c);
 
 unsigned int apicid_to_socket(unsigned int);
 
+static inline int cpu_nr_siblings(unsigned int cpu)
+{
+    return cpu_data[cpu].x86_num_siblings;
+}
+
 /*
  * Generic CPUID function
  * clear %ecx since some cpus (Cyrix MII) do not set or clear %ecx



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 5/7] xen: credit2: compute cpus per-runqueue more dynamically.
  2020-05-28 21:29 [PATCH v2 0/7] xen: credit2: limit the number of CPUs per runqueue Dario Faggioli
                   ` (3 preceding siblings ...)
  2020-05-28 21:29 ` [PATCH v2 4/7] xen: credit2: limit the max number of CPUs in a runqueue Dario Faggioli
@ 2020-05-28 21:29 ` Dario Faggioli
  2020-05-28 21:29 ` [PATCH v2 6/7] cpupool: create the 'cpupool sync' infrastructure Dario Faggioli
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Dario Faggioli @ 2020-05-28 21:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap

During boot, we use num_online_cpus() as an indication of how
many CPUs will end up in cpupool 0. We then decide (based also
on the value of the boot time parameter opt_max_cpus_runqueue)
the actual number of CPUs that we want in each runqueue, in such
a way that the runqueues themselves are as balanced as possible
(in terms of how many CPUs they have).

After boot, though, when for instance we are creating a cpupool,
it would be more appropriate to use the number of CPUs of the
pool, rather than the total number of online CPUs.

Do exactly that, even if this means (since from Xen's perspective
CPUs are added to pools one by one) we'll be computing a different
maximum number of CPUs per runqueue each time.

In fact, we do it in preparation for the next change where,
after having computed the new value, we will also re-balance
the runqueues, by rebuilding them in such a way that the newly
computed maximum is actually respected for all of them.
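
As an illustrative example (with made-up numbers, and the default cap
of 16): during boot with 96 online CPUs, we compute max_cpus_runq = 16
right away. When, instead, a CPU is being added to a live pool that
already has 20 of them, we use nr_cpus = 20 + 1 = 21, which gives
min_rqs = 2 and, hence, max_cpus_runq = 10.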

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
---
Changes from v1:
* new patch
---
 xen/common/sched/credit2.c |   30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index f4d3f8ae6b..af6d374677 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -922,7 +922,8 @@ cpu_runqueue_siblings_match(const struct csched2_runqueue_data *rqd,
 }
 
 static struct csched2_runqueue_data *
-cpu_add_to_runqueue(const struct scheduler *ops, unsigned int cpu)
+cpu_add_to_runqueue(const struct scheduler *ops, unsigned int nr_cpus,
+                    unsigned int cpu)
 {
     struct csched2_private *prv = csched2_priv(ops);
     struct csched2_runqueue_data *rqd, *rqd_new;
@@ -943,8 +944,8 @@ cpu_add_to_runqueue(const struct scheduler *ops, unsigned int cpu)
      * how many CPUs we have, so let's use the number of CPUs that are online
      * for that.
      */
-    min_rqs = ((num_online_cpus() - 1) / opt_max_cpus_runqueue) + 1;
-    max_cpus_runq = num_online_cpus() / min_rqs;
+    min_rqs = ((nr_cpus - 1) / opt_max_cpus_runqueue) + 1;
+    max_cpus_runq = nr_cpus / min_rqs;
 
     write_lock_irqsave(&prv->lock, flags);
 
@@ -3781,8 +3782,10 @@ csched2_dump(const struct scheduler *ops)
      */
     read_lock_irqsave(&prv->lock, flags);
 
-    printk("Active queues: %d\n"
+    printk("Active CPUs: %u\n"
+           "Active queues: %u\n"
            "\tdefault-weight     = %d\n",
+           cpumask_weight(&prv->initialized),
            prv->active_queues,
            CSCHED2_DEFAULT_WEIGHT);
     list_for_each_entry ( rqd, &prv->rql, rql )
@@ -3879,6 +3882,7 @@ csched2_alloc_pdata(const struct scheduler *ops, int cpu)
 {
     struct csched2_pcpu *spc;
     struct csched2_runqueue_data *rqd;
+    unsigned int nr_cpus;
 
     BUG_ON(cpu_to_socket(cpu) == XEN_INVALID_SOCKET_ID);
 
@@ -3886,7 +3890,23 @@ csched2_alloc_pdata(const struct scheduler *ops, int cpu)
     if ( spc == NULL )
         return ERR_PTR(-ENOMEM);
 
-    rqd = cpu_add_to_runqueue(ops, cpu);
+    /*
+     * If the system is booting, we know that, at this point, num_online_cpus()
+     * CPUs have been brought up, and will be added to the default cpupool and
+     * hence to this scheduler. This is valuable information that we can use
+     * to build the runqueues in an already balanced state.
+     *
+     * On the other hand, when we are live, and e.g., are creating a new
+     * cpupool, or adding CPUs to an already existing one,  we have no way to
+     * know in advance, from here, how many CPUs it will have. Therefore, in
+     * that case, for the balancing, we just use the current number of CPUs
+     * that the pool has, plus 1, because we are in the process of adding one.
+     * This will likely provide suboptimal results, and we rely on dynamic
+     * runqueue rebalancing for fixing it up.
+     */
+    nr_cpus = system_state < SYS_STATE_active ? num_online_cpus() :
+        cpumask_weight(&csched2_priv(ops)->initialized) + 1;
+    rqd = cpu_add_to_runqueue(ops, nr_cpus, cpu);
     if ( IS_ERR(rqd) )
     {
         xfree(spc);



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 6/7] cpupool: create the 'cpupool sync' infrastructure
  2020-05-28 21:29 [PATCH v2 0/7] xen: credit2: limit the number of CPUs per runqueue Dario Faggioli
                   ` (4 preceding siblings ...)
  2020-05-28 21:29 ` [PATCH v2 5/7] xen: credit2: compute cpus per-runqueue more dynamically Dario Faggioli
@ 2020-05-28 21:29 ` Dario Faggioli
  2020-05-28 21:30 ` [PATCH v2 7/7] xen: credit2: rebalance the number of CPUs in the scheduler runqueues Dario Faggioli
  2020-07-21 12:08 ` [PATCH v2 0/7] xen: credit2: limit the number of CPUs per runqueue Jan Beulich
  7 siblings, 0 replies; 16+ messages in thread
From: Dario Faggioli @ 2020-05-28 21:29 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Julien Grall, Andrew Cooper,
	George Dunlap, Jan Beulich

In case we want to make some live changes to the configuration
of (typically) the scheduler of a cpupool, we need things to be
quiet in that pool.

Not necessarily like with stop machine, but we at least need
to make sure that no domain is running, nor sitting in the
runqueues of the scheduler itself.

In fact, we need exactly something like this mechanism, for
changing "on the fly" which CPUs are assigned to which runqueue
in a Credit2 cpupool (check the following changes).
Therefore, instead of doing something specific for such a
use case, let's implement a generic mechanism.

The reason is, of course, that it may turn out to be useful for
other purposes in the future. But even for this specific case,
it is much easier and cleaner to just cede control to cpupool
code, instead of trying to do everything inside the scheduler.

Within the new cpupool_sync() function, we want to pause all
domains of a pool, including potentially the one calling
the function. Therefore, we defer the pausing, the actual work
and also the unpausing to a tasklet.
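
A minimal usage sketch (the callback name is made up; the actual user
is the runqueue rebalancing in patch 7 of this series). The callback
runs from the tasklet, after every domain of the pool has been paused,
and it receives the pool's scheduler as its argument:

    /* Runs with all the domains of the pool paused. */
    static void reconfigure(void *arg)
    {
        struct scheduler *sched = arg;

        /* ... safely manipulate the scheduler's data structures ... */
    }

    /* Returns immediately; pausing and the work happen in the tasklet. */
    cpupool_sync(c, reconfigure);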

Suggested-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
Cc: Juergen Gross <jgross@suse.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien@xen.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
---
Changes from v1:
* new patch
---
 xen/common/sched/cpupool.c |   52 ++++++++++++++++++++++++++++++++++++++++++++
 xen/common/sched/private.h |    6 +++++
 xen/include/xen/sched.h    |    1 +
 3 files changed, 59 insertions(+)

diff --git a/xen/common/sched/cpupool.c b/xen/common/sched/cpupool.c
index 7ea641ca26..122c371c7a 100644
--- a/xen/common/sched/cpupool.c
+++ b/xen/common/sched/cpupool.c
@@ -234,6 +234,42 @@ void cpupool_put(struct cpupool *pool)
     free_cpupool_struct(pool);
 }
 
+void do_cpupool_sync(void *arg)
+{
+    struct cpupool *c = arg;
+    struct domain *d;
+
+
+    spin_lock(&cpupool_lock);
+
+    /*
+     * With this second call (and this time to domain_pause()) we basically
+     * make sure that all the domains have actually stopped running.
+     */
+    rcu_read_lock(&domlist_read_lock);
+    for_each_domain_in_cpupool(d, c)
+        domain_pause(d);
+    rcu_read_unlock(&domlist_read_lock);
+
+    /*
+     * Let's invoke the function that the caller provided. We pass a reference
+     * to our own scheduler as a parameter, with which it should easily reach
+     * anything it needs.
+     */
+    c->sync_ctl.func(c->sched);
+
+    /* We called pause twice, so we need to do the same with unpause. */
+    rcu_read_lock(&domlist_read_lock);
+    for_each_domain_in_cpupool(d, c)
+    {
+        domain_unpause(d);
+        domain_unpause(d);
+    }
+    rcu_read_unlock(&domlist_read_lock);
+
+    spin_unlock(&cpupool_lock);
+}
+
 /*
  * create a new cpupool with specified poolid and scheduler
  * returns pointer to new cpupool structure if okay, NULL else
@@ -292,6 +328,8 @@ static struct cpupool *cpupool_create(
 
     *q = c;
 
+    tasklet_init(&c->sync_ctl.tasklet, do_cpupool_sync, c);
+
     spin_unlock(&cpupool_lock);
 
     debugtrace_printk("Created cpupool %d with scheduler %s (%s)\n",
@@ -332,6 +370,7 @@ static int cpupool_destroy(struct cpupool *c)
         return -EBUSY;
     }
     *q = c->next;
+    tasklet_kill(&c->sync_ctl.tasklet);
     spin_unlock(&cpupool_lock);
 
     cpupool_put(c);
@@ -372,6 +411,19 @@ int cpupool_move_domain(struct domain *d, struct cpupool *c)
     return ret;
 }
 
+void cpupool_sync(struct cpupool *c, void (*func)(void*))
+{
+    struct domain *d;
+
+    rcu_read_lock(&domlist_read_lock);
+    for_each_domain_in_cpupool(d, c)
+        domain_pause_nosync(d);
+    rcu_read_unlock(&domlist_read_lock);
+
+    c->sync_ctl.func = func;
+    tasklet_schedule_on_cpu(&c->sync_ctl.tasklet, cpumask_first(c->cpu_valid));
+}
+
 /*
  * assign a specific cpu to a cpupool
  * cpupool_lock must be held
diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
index df50976eb2..4705c8b119 100644
--- a/xen/common/sched/private.h
+++ b/xen/common/sched/private.h
@@ -503,6 +503,11 @@ static inline void sched_unit_unpause(const struct sched_unit *unit)
 #define REGISTER_SCHEDULER(x) static const struct scheduler *x##_entry \
   __used_section(".data.schedulers") = &x;
 
+struct cpupool_sync_ctl {
+    struct tasklet tasklet;
+    void (*func)(void*);
+};
+
 struct cpupool
 {
     int              cpupool_id;
@@ -514,6 +519,7 @@ struct cpupool
     struct scheduler *sched;
     atomic_t         refcnt;
     enum sched_gran  gran;
+    struct cpupool_sync_ctl sync_ctl;
 };
 
 static inline cpumask_t *cpupool_domain_master_cpumask(const struct domain *d)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ac53519d7f..e2a233c96c 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -1061,6 +1061,7 @@ extern enum cpufreq_controller {
 } cpufreq_controller;
 
 int cpupool_move_domain(struct domain *d, struct cpupool *c);
+void cpupool_sync(struct cpupool *c, void (*func)(void*));
 int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op);
 int cpupool_get_id(const struct domain *d);
 const cpumask_t *cpupool_valid_cpus(const struct cpupool *pool);



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 7/7] xen: credit2: rebalance the number of CPUs in the scheduler runqueues
  2020-05-28 21:29 [PATCH v2 0/7] xen: credit2: limit the number of CPUs per runqueue Dario Faggioli
                   ` (5 preceding siblings ...)
  2020-05-28 21:29 ` [PATCH v2 6/7] cpupool: create the 'cpupool sync' infrastructure Dario Faggioli
@ 2020-05-28 21:30 ` Dario Faggioli
  2020-07-21 12:08 ` [PATCH v2 0/7] xen: credit2: limit the number of CPUs per runqueue Jan Beulich
  7 siblings, 0 replies; 16+ messages in thread
From: Dario Faggioli @ 2020-05-28 21:30 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap

When adding and removing CPUs to/from a pool, we can end up in a
situation where some runqueues have a lot of CPUs, while other have only
a couple of them. Even if the scheduler (namely, the load balancer)
should be capable of dealing with such a situation, it is something that
is better avoided.

We now have all the pieces in place to attempt an actual rebalancing
of the Credit2 scheduler runqueues, so let's go for it.

In short:
- every time we add or remove a CPU, especially considering the topology
  implications (e.g., we may have removed the last HT from a queue, so
  now there's space there for two CPUs, etc), we try to rebalance;
- rebalancing happens under the control of the cpupool_sync() mechanism.
  Basically, it happens from inside a tasklet, after having put the
  cpupool in a quiescent state;
- the new runqueue configuration may end up being different from the
  current one, but it may also turn out to be identical. It would be
  good to have a way to check in advance whether the result would be
  identical, and skip the balancing in that case, but there is no way
  to do that.

Rebalancing, since it pauses all the domains of a pool, etc, is a time
consuming operation. But it only happens when the cpupool configuration
is changed, so it is considered acceptable.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
---
Changes from v1:
* new patch
---
 xen/common/sched/credit2.c |  208 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 207 insertions(+), 1 deletion(-)

diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index af6d374677..72d1961b1b 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -3983,6 +3983,194 @@ init_pdata(struct csched2_private *prv, struct csched2_pcpu *spc,
     return rqd;
 }
 
+/*
+ * Let's get the hard work of rebalancing runqueues done.
+ *
+ * This function is called from a tasklet, spawned by cpupool_sync().
+ * We run in idle vcpu context and we can assume that all the vcpus of all
+ * the domains within this cpupool are stopped... So we are relatively free
+ * to manipulate the scheduler's runqueues.
+ */
+static void do_rebalance_runqueues(void *arg)
+{
+    const struct scheduler *ops = arg;
+    struct csched2_private *prv = csched2_priv(ops);
+    struct csched2_runqueue_data *rqd, *rqd_temp;
+    struct csched2_unit *csu, *csu_temp;
+    struct list_head rq_list, csu_list;
+    spinlock_t temp_lock;
+    unsigned long flags;
+    unsigned int cpu;
+
+    INIT_LIST_HEAD(&rq_list);
+    INIT_LIST_HEAD(&csu_list);
+    spin_lock_init(&temp_lock);
+
+    /*
+     * This is where we will temporarily re-route the locks of all the CPUs,
+     * while we take them outside of their current runqueue, and before adding
+     * them to their new ones. Let's just take it right away, so any sort of
+     * scheduling activity in any of them will stop at it, and won't race with
+     * us.
+     */
+    spin_lock_irq(&temp_lock);
+
+    /*
+     * Everyone is paused, so we don't have any unit in any runqueue. Still,
+     * units are "assigned" each one to a runqueue, for debug dumps and for
+     * calculating and tracking the weights. Since the current runqueues are
+     * going away, we need to deassign everyone from its runqueue. We will
+     * put all of them back into one of the new runqueues, before the end.
+     */
+    write_lock(&prv->lock);
+    list_for_each_entry_safe ( rqd, rqd_temp, &prv->rql, rql )
+    {
+        spin_lock(&rqd->lock);
+        /*
+         * We're deassigning the units, but we don't want to lose track
+         * of them... Otherwise how do we do the re-assignment to the new
+         * runqueues? So, let's stash them in a list.
+         */
+        list_for_each_entry_safe ( csu, csu_temp, &rqd->svc, rqd_elem )
+        {
+            runq_deassign(csu->unit);
+            list_add(&csu->rqd_elem, &csu_list);
+        }
+
+        /*
+         * Now we want to prepare for getting rid of the runqueues as well.
+         * Each CPU has a pointer to the scheduler lock, which in case of
+         * Credit2 is the runqueue lock of the runqueue where the CPU is.
+         * But again, runqueues are vanishing, so let's re-route all such
+         * locks to our safe temporary solution that we introduced above.
+         */
+        for_each_cpu ( cpu, &rqd->active )
+            get_sched_res(cpu)->schedule_lock = &temp_lock;
+        spin_unlock(&rqd->lock);
+
+        /*
+         * And, finally, "dequeue the runqueues", one by one. Similarly to
+         * what we do with units, we need to park them in a temporary list.
+         * In this case, they are actually going away, but we need to do this
+         * because we can't free() them with IRQs disabled.
+         */
+        prv->active_queues--;
+        list_del(&rqd->rql);
+        list_add(&rqd->rql, &rq_list);
+    }
+    ASSERT(prv->active_queues == 0);
+    write_unlock(&prv->lock);
+
+    spin_unlock_irq(&temp_lock);
+
+    /*
+     * Since we have to drop the lock anyway (in order to be able to call
+     * cpu_add_to_runqueue() below), let's also get rid of the old runqueues,
+     * now that we can.
+     */
+    list_for_each_entry_safe ( rqd, rqd_temp, &rq_list, rql )
+    {
+        list_del(&rqd->rql);
+        xfree(rqd);
+    }
+    rqd = NULL;
+
+    /*
+     * We've got no lock! Well, this is still fine as:
+     * - the CPUs may, for some reason, try to schedule, and manage to do so,
+     *   taking turns on our global temporary spinlock. But there should be
+     *   nothing to schedule;
+     * - we are safe from more cpupool manipulations as cpupool_sync() owns
+     *   the cpupool_lock.
+     */
+
+    /*
+     * Now, for each CPU, we have to put them back in a runqueue. Of course,
+     * we have no runqueue any longer, so they'll be re-created. We basically
+     * follow pretty much the exact same path of when we add a CPU to a pool.
+     */
+    for_each_cpu ( cpu, &prv->initialized )
+    {
+        struct csched2_pcpu *spc = csched2_pcpu(cpu);
+
+        /*
+         * The new runqueues need to be allocated, and cpu_add_to_runqueue()
+         * takes care of that. We are, however, in a very delicate state, as
+         * we have destroyed all the previous runqueues. I.e., if an error
+         * (e.g., not enough memory) occurs here, there is no way we can
+         * go back to a sane previous state, so let's just crash.
+         *
+         * Note that, at this time, the number of CPUs we have in the
+         * "initialized" mask represents how many CPUs we have in this pool.
+         * So we can use it, for computing the balance, basically in the same
+         * way as we use num_online_cpus() during boot time.
+         */
+        rqd = cpu_add_to_runqueue(ops, cpumask_weight(&prv->initialized), cpu);
+        if ( IS_ERR(rqd) )
+        {
+            printk(XENLOG_ERR " Major problems while rebalancing the runqueues!\n");
+            BUG();
+        }
+        spc->rqd = rqd;
+
+        spin_lock_irq(&temp_lock);
+        write_lock(&prv->lock);
+
+        init_cpu_runqueue(prv, spc, cpu, rqd);
+
+        /*
+         * Bring the scheduler lock back to where it belongs, given the new
+         * runqueue, for the various CPUs. Barrier is there because we want
+         * all the runqueue initialization steps that we have made to be
+         * visible, exactly as it was for everyone that takes the lock
+         * (see the comment in common/sched/core.c:schedule_cpu_add() ).
+         */
+        smp_wmb();
+        get_sched_res(cpu)->schedule_lock = &rqd->lock;
+
+        write_unlock(&prv->lock);
+        spin_unlock_irq(&temp_lock);
+    }
+
+    /*
+     * And, finally, everything should be in place again. We can finalize the
+     * work by adding back the units in the runqueues' lists (picking them
+     * up from the temporary list we used). Note that it is not necessary to
+     * call csched2_res_pick(), for deciding on which runqueue to put each
+     * of them. This is because:
+     *  - with the old runqueues, each entity was associated to a
+     *    sched_resource / CPU;
+     *  - they were assigned to the runqueue in which such CPU was;
+     *  - all the CPUs that were there, with the old runqueues, are still
+     *    here, although in different runqueues;
+     *  - we can just let the units be associated with the runqueues where
+     *    their CPU has gone.
+     *
+     *  This means that, even though the load was balanced with the previous
+     *  runqueues, it most likely will not be now. But this is not a big
+     *  deal as the load balancer will make things even again, given a little
+     *  time.
+     */
+    list_for_each_entry_safe ( csu, csu_temp, &csu_list, rqd_elem )
+    {
+        spinlock_t *lock;
+
+        lock = unit_schedule_lock_irqsave(csu->unit, &flags);
+        list_del_init(&csu->rqd_elem);
+        runq_assign(csu->unit);
+        unit_schedule_unlock_irqrestore(lock, flags, csu->unit);
+    }
+}
+
+static void rebalance_runqueues(const struct scheduler *ops)
+{
+    struct cpupool *c = ops->cpupool;
+
+    ASSERT(c->sched == ops);
+
+    cpupool_sync(c, do_rebalance_runqueues);
+}
+
 /* Change the scheduler of cpu to us (Credit2). */
 static spinlock_t *
 csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
@@ -4017,6 +4205,16 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
      */
     ASSERT(get_sched_res(cpu)->schedule_lock != &rqd->lock);
 
+    /*
+     * We have added a CPU to the pool. Unless we are booting (in which
+     * case cpu_add_to_runqueue() balances the CPUs by itself) or we are in
+     * the very special case of still having fewer CPUs than how many we
+     * can put in just one runqueue... We need to try to rebalance.
+     */
+    if ( system_state >= SYS_STATE_active &&
+          cpumask_weight(&prv->initialized) > opt_max_cpus_runqueue )
+        rebalance_runqueues(new_ops);
+
     write_unlock(&prv->lock);
 
     return &rqd->lock;
@@ -4105,7 +4303,15 @@ csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
     else
         rqd = NULL;
 
-    write_unlock_irqrestore(&prv->lock, flags);
+    /*
+     * Similarly to what said in csched2_switch_sched(), since we have just
+     * removed a CPU, it's good to check whether we can rebalance the
+     * runqueues.
+     */
+    if ( cpumask_weight(&prv->initialized) >= opt_max_cpus_runqueue )
+        rebalance_runqueues(ops);
+
+    write_unlock_irqrestore(&prv->lock, flags);
 
     xfree(rqd);
     xfree(pcpu);



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/7] xen: credit2: factor cpu to runqueue matching in a function
  2020-05-28 21:29 ` [PATCH v2 1/7] xen: credit2: factor cpu to runqueue matching in a function Dario Faggioli
@ 2020-05-29 14:48   ` Jürgen Groß
  0 siblings, 0 replies; 16+ messages in thread
From: Jürgen Groß @ 2020-05-29 14:48 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap

On 28.05.20 23:29, Dario Faggioli wrote:
> Just move the big if() condition into an inline function.
> 
> No functional change intended.
> 
> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/7] xen: credit2: factor runqueue initialization in its own function.
  2020-05-28 21:29 ` [PATCH v2 2/7] xen: credit2: factor runqueue initialization in its own function Dario Faggioli
@ 2020-05-29 14:50   ` Jürgen Groß
  0 siblings, 0 replies; 16+ messages in thread
From: Jürgen Groß @ 2020-05-29 14:50 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap

On 28.05.20 23:29, Dario Faggioli wrote:
> As it will be useful in later changes. While there, fix
> the doc-comment.
> 
> No functional change intended.
> 
> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 3/7] xen: cpupool: add a back-pointer from a scheduler to its pool
  2020-05-28 21:29 ` [PATCH v2 3/7] xen: cpupool: add a back-pointer from a scheduler to its pool Dario Faggioli
@ 2020-05-29 14:54   ` Jürgen Groß
  2020-05-29 14:56     ` Dario Faggioli
  0 siblings, 1 reply; 16+ messages in thread
From: Jürgen Groß @ 2020-05-29 14:54 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap

On 28.05.20 23:29, Dario Faggioli wrote:
> If we need to know within which pool a particular scheduler
> is working, we can do that by querying the cpupool pointer
> of any of the sched_resource-s (i.e., ~ any of the CPUs)
> assigned to the scheduler itself.
> 
> Basically, we pick any sched_resource that we know uses that
> scheduler, and we check its *cpupool pointer. If we really
> know that the resource uses the scheduler, this is fine, as
> it also means the resource is inside the pool we are
> looking for.
> 
> But, of course, we can do that for a pool/scheduler that has

s/can/can't/ ?

> not yet been given any sched_resource (or if we do not
> know whether or not it has any sched_resource).
> 
> To overcome such limitation, add a back pointer from the
> scheduler, to its own pool.
> 
> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 3/7] xen: cpupool: add a back-pointer from a scheduler to its pool
  2020-05-29 14:54   ` Jürgen Groß
@ 2020-05-29 14:56     ` Dario Faggioli
  0 siblings, 0 replies; 16+ messages in thread
From: Dario Faggioli @ 2020-05-29 14:56 UTC (permalink / raw)
  To: Jürgen Groß, xen-devel; +Cc: George Dunlap

On Fri, 2020-05-29 at 16:54 +0200, Jürgen Groß wrote:
> On 28.05.20 23:29, Dario Faggioli wrote:
> > If we need to know within which pool a particular scheduler
> > is working, we can do that by querying the cpupool pointer
> > of any of the sched_resource-s (i.e., ~ any of the CPUs)
> > assigned to the scheduler itself.
> > 
> > Basically, we pick any sched_resource that we know uses that
> > scheduler, and we check its *cpupool pointer. If we really
> > know that the resource uses the scheduler, this is fine, as
> > it also means the resource is inside the pool we are
> > looking for.
> > 
> > But, of course, we can do that for a pool/scheduler that has
> 
> s/can/can't/ ?
> 
And I even wrote "of course"... :-/

Yes, _of_course_, I meant can't. :-)

> Reviewed-by: Juergen Gross <jgross@suse.com>
> 
Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 4/7] xen: credit2: limit the max number of CPUs in a runqueue
  2020-05-28 21:29 ` [PATCH v2 4/7] xen: credit2: limit the max number of CPUs in a runqueue Dario Faggioli
@ 2020-05-29 15:23   ` Jürgen Groß
  2020-05-29 15:36     ` Dario Faggioli
  0 siblings, 1 reply; 16+ messages in thread
From: Jürgen Groß @ 2020-05-29 15:23 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Andrew Cooper, George Dunlap, Jan Beulich

On 28.05.20 23:29, Dario Faggioli wrote:
> In Credit2 CPUs (can) share runqueues, depending on the topology. For
> instance, with per-socket runqueues (the default) all the CPUs that are
> part of the same socket share a runqueue.
> 
> On platforms with a huge number of CPUs per socket, that could be a
> problem. An example is AMD EPYC2 servers, where we can have up to 128
> CPUs in a socket.
> 
> It is of course possible to define other, still topology-based, runqueue
> arrangements (e.g., per-LLC, per-DIE, etc). But that may still result in
> runqueues with too many CPUs on other/future platforms. For instance, a
> system with 96 CPUs and 2 NUMA nodes will end up having 48 CPUs per
> runqueue. Not as bad, but still a lot!
> 
> Therefore, let's set a limit to the max number of CPUs that can share a
> Credit2 runqueue. The actual value is configurable (at boot time), the
> default being 16. If, for instance, there are more than 16 CPUs in a
> socket, they'll be split among two (or more) runqueues.
> 
> Note: with core scheduling enabled, this parameter sets the max number
> of *scheduling resources* that can share a runqueue. Therefore, with
> granularity set to core (and assuming 2 threads per core), we will have
> at most 16 cores per runqueue, which corresponds to 32 threads. But that
> is fine, considering how core scheduling works.
> 
> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>

Reviewed-by: Juergen Gross <jgross@suse.com>

with one additional question below.

> ---
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Juergen Gross <jgross@suse.com>
> ---
> Changes from v1:
> - always try to add a CPU to the runqueue with the least CPUs already in
>    it. This should guarantee a more even distribution of CPUs among
>    runqueues, as requested during review;
> - rename the matching function from foo_smt_bar() to foo_siblings_bar(),
>    which is more generic, and do the same to the per-arch wrappers;
> - deal with the case where the user is trying to set fewer CPUs per
>    runqueue than there are siblings per core (by putting siblings in the
>    same runq anyway, but logging a message), as requested during review;
> - use the per-cpupool value for the scheduling granularity, as requested
>    during review;
> - add a comment about why we also count siblings that are currently
>    outside of our cpupool, as suggested during review;
> - add a boot command line doc entry;
> - fix typos in comments;
> ---
>   docs/misc/xen-command-line.pandoc |   14 ++++
>   xen/common/sched/credit2.c        |  144 +++++++++++++++++++++++++++++++++++--
>   xen/include/asm-arm/cpufeature.h  |    5 +
>   xen/include/asm-x86/processor.h   |    5 +
>   4 files changed, 162 insertions(+), 6 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> index e16bb90184..1787f2c8fb 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1840,6 +1840,20 @@ with read and write permissions.
>   
>   Choose the default scheduler.
>   
> +### sched_credit2_max_cpus_runqueue
> +> `= <integer>`
> +
> +> Default: `16`
> +
> +Defines how many CPUs will be put, at most, in each Credit2 runqueue.
> +
> +Runqueues are still arranged according to the host topology (and following
> +what is indicated by the 'credit2_runqueue' parameter). But we also have a cap
> +on the number of CPUs that share each runqueue.
> +
> +A value that is a submultiple of the number of online CPUs is recommended,
> +as that would likely produce a perfectly balanced runqueue configuration.
> +
>   ### sched_credit2_migrate_resist
>   > `= <integer>`
>   
> diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
> index 8a4f28b9f5..f4d3f8ae6b 100644
> --- a/xen/common/sched/credit2.c
> +++ b/xen/common/sched/credit2.c
> @@ -25,6 +25,7 @@
>   #include <xen/trace.h>
>   #include <xen/cpu.h>
>   #include <xen/keyhandler.h>
> +#include <asm/processor.h>
>   
>   #include "private.h"
>   
> @@ -471,6 +472,22 @@ static int __init parse_credit2_runqueue(const char *s)
>   }
>   custom_param("credit2_runqueue", parse_credit2_runqueue);
>   
> +/*
> + * How many CPUs will be put, at most, in each runqueue.
> + *
> + * Runqueues are still arranged according to the host topology (and according
> + * to the value of the 'credit2_runqueue' parameter). But we also have a cap
> + * on the number of CPUs that share runqueues.
> + *
> + * This should be considered an upper limit. In fact, we also try to balance
> + * the number of CPUs in each runqueue. And, when doing that, it is possible
> + * that fewer CPUs than this parameter mandates will actually be put
> + * in each runqueue.
> + */
> +#define MAX_CPUS_RUNQ 16
> +static unsigned int __read_mostly opt_max_cpus_runqueue = MAX_CPUS_RUNQ;
> +integer_param("sched_credit2_max_cpus_runqueue", opt_max_cpus_runqueue);
> +
>   /*
>    * Per-runqueue data
>    */
> @@ -852,18 +869,83 @@ cpu_runqueue_match(const struct csched2_runqueue_data *rqd, unsigned int cpu)
>              (opt_runqueue == OPT_RUNQUEUE_NODE && same_node(peer_cpu, cpu));
>   }
>   
> +/*
> + * Additional checks, to avoid separating siblings in different runqueues.
> + * This deals with both Intel's HTs and AMD's CUs. An arch that does not have
> + * any similar concept will just have cpu_nr_siblings() always return 1, and
> + * set up the cpu_sibling_mask-s accordingly (as ARM currently does), and
> + * things will just work as well.
> + */
> +static bool
> +cpu_runqueue_siblings_match(const struct csched2_runqueue_data *rqd,
> +                            unsigned int cpu, unsigned int max_cpus_runq)
> +{
> +    unsigned int nr_sibls = cpu_nr_siblings(cpu);
> +    unsigned int rcpu, tot_sibls = 0;
> +
> +    /*
> +     * If we put the CPU in this runqueue, we must be sure that there will
> +     * be enough room for accepting its sibling(s) as well.
> +     */
> +    cpumask_clear(cpumask_scratch_cpu(cpu));
> +    for_each_cpu ( rcpu, &rqd->active )
> +    {
> +        ASSERT(rcpu != cpu);
> +        if ( !cpumask_intersects(per_cpu(cpu_sibling_mask, rcpu), cpumask_scratch_cpu(cpu)) )
> +        {
> +            /*
> +             * For each CPU already in the runqueue, account for it and for
> +             * its sibling(s), independently of whether they are in the
> +             * runqueue or not. Of course, we do this only once, for each CPU
> +             * that is already inside the runqueue and all its siblings!
> +             *
> +             * This way, even if there are CPUs in the runqueue with siblings
> +             * in different cpupools, we still count all of them here.
> +             * The reason for this is that, if at some future point we will
> +             * move those sibling CPUs to this cpupool, we want them to land
> +             * in this runqueue. Hence we must be sure to leave space for them.
> +             */
> +            cpumask_or(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
> +                       per_cpu(cpu_sibling_mask, rcpu));
> +            tot_sibls += cpu_nr_siblings(rcpu);
> +        }
> +    }
> +    /*
> +     * We know that neither the CPU nor any of its siblings are here,
> +     * or we wouldn't even have entered the function.
> +     */
> +    ASSERT(!cpumask_intersects(cpumask_scratch_cpu(cpu),
> +                               per_cpu(cpu_sibling_mask, cpu)));
> +
> +    /* Try adding CPU and its sibling(s) to the count and check... */
> +    return tot_sibls + nr_sibls <= max_cpus_runq;
> +}
> +
>   static struct csched2_runqueue_data *
> -cpu_add_to_runqueue(struct csched2_private *prv, unsigned int cpu)
> +cpu_add_to_runqueue(const struct scheduler *ops, unsigned int cpu)
>   {
> +    struct csched2_private *prv = csched2_priv(ops);
>       struct csched2_runqueue_data *rqd, *rqd_new;
> +    struct csched2_runqueue_data *rqd_valid = NULL;
>       struct list_head *rqd_ins;
>       unsigned long flags;
>       int rqi = 0;
> -    bool rqi_unused = false, rqd_valid = false;
> +    unsigned int min_rqs, max_cpus_runq;
> +    bool rqi_unused = false;
>   
>       /* Prealloc in case we need it - not allowed with interrupts off. */
>       rqd_new = xzalloc(struct csched2_runqueue_data);
>   
> +    /*
> +     * While respecting the limit of not having more than the max number of
> +     * CPUs per runqueue, let's also try to "spread" the CPUs, as evenly as
> +     * possible, among the runqueues. For doing that, we need to know upfront
> +     * how many CPUs we have, so let's use the number of CPUs that are online
> +     * for that.
> +     */
> +    min_rqs = ((num_online_cpus() - 1) / opt_max_cpus_runqueue) + 1;
> +    max_cpus_runq = num_online_cpus() / min_rqs;
> +
>       write_lock_irqsave(&prv->lock, flags);
>   
>       rqd_ins = &prv->rql;
> @@ -873,10 +955,59 @@ cpu_add_to_runqueue(struct csched2_private *prv, unsigned int cpu)
>           if ( !rqi_unused && rqd->id > rqi )
>               rqi_unused = true;
>   
> +        /*
> +         * First of all, let's check whether, according to the system
> +         * topology, this CPU belongs in this runqueue.
> +         */
>           if ( cpu_runqueue_match(rqd, cpu) )
>           {
> -            rqd_valid = true;
> -            break;
> +            /*
> +             * If the CPU has any siblings, they are online and they are
> +             * being added to this cpupool, always keep them together. Even
> +             * if that means violating what the opt_max_cpus_runqueue param
> +             * indicates. However, if this happens, chances are high that
> +             * too small a value was used for the parameter, so warn the user
> +             * about that.
> +             *
> +             * Note that we cannot check this once and for all, say, during
> +             * scheduler initialization. In fact, at least in theory, the
> +             * number of siblings a CPU has may not be the same for all the
> +             * CPUs.
> +             */
> +            if ( cpumask_intersects(&rqd->active, per_cpu(cpu_sibling_mask, cpu)) )
> +            {
> +                if ( cpumask_weight(&rqd->active) >= opt_max_cpus_runqueue )
> +                {
> +                        printk("WARNING: %s: more than opt_max_cpus_runqueue "
> +                               "in a runqueue (%u vs %u), due to topology constraints.\n"
> +                               "Consider raising it!\n",
> +                               __func__, opt_max_cpus_runqueue,

Is printing the function name really adding any important information?

> +                               cpumask_weight(&rqd->active));
> +                }
> +                rqd_valid = rqd;
> +                break;
> +            }
> +
> +            /*
> +             * If we're using core (or socket) scheduling, no need to do any
> +             * further checking beyond the number of CPUs already in this
> +             * runqueue respecting our upper bound.
> +             *
> +             * Otherwise, let's try to make sure that siblings stay in the
> +             * same runqueue, pretty much under any circumstances.
> +             */
> +            if ( rqd->refcnt < max_cpus_runq && (ops->cpupool->gran != SCHED_GRAN_cpu ||
> +                  cpu_runqueue_siblings_match(rqd, cpu, max_cpus_runq)) )
> +            {
> +                /*
> +                 * This runqueue is ok, but as we said, we also want an even
> +                 * distribution of the CPUs. So, unless this is the very first
> +                 * match, we go on, check all runqueues and actually add the
> +                 * CPU into the one that is least full.
> +                 */
> +                if ( !rqd_valid || rqd->refcnt < rqd_valid->refcnt )
> +                    rqd_valid = rqd;
> +            }
>           }
>   
>           if ( !rqi_unused )
> @@ -900,6 +1031,8 @@ cpu_add_to_runqueue(struct csched2_private *prv, unsigned int cpu)
>           rqd->pick_bias = cpu;
>           rqd->id = rqi;
>       }
> +    else
> +        rqd = rqd_valid;
>   
>       rqd->refcnt++;
>   
> @@ -3744,7 +3877,6 @@ csched2_dump(const struct scheduler *ops)
>   static void *
>   csched2_alloc_pdata(const struct scheduler *ops, int cpu)
>   {
> -    struct csched2_private *prv = csched2_priv(ops);
>       struct csched2_pcpu *spc;
>       struct csched2_runqueue_data *rqd;
>   
> @@ -3754,7 +3886,7 @@ csched2_alloc_pdata(const struct scheduler *ops, int cpu)
>       if ( spc == NULL )
>           return ERR_PTR(-ENOMEM);
>   
> -    rqd = cpu_add_to_runqueue(prv, cpu);
> +    rqd = cpu_add_to_runqueue(ops, cpu);
>       if ( IS_ERR(rqd) )
>       {
>           xfree(spc);
> diff --git a/xen/include/asm-arm/cpufeature.h b/xen/include/asm-arm/cpufeature.h
> index 9af5666628..8fdf9685d7 100644
> --- a/xen/include/asm-arm/cpufeature.h
> +++ b/xen/include/asm-arm/cpufeature.h
> @@ -64,6 +64,11 @@ static inline bool cpus_have_cap(unsigned int num)
>       return test_bit(num, cpu_hwcaps);
>   }
>   
> +static inline int cpu_nr_siblings(unsigned int cpu)
> +{
> +    return 1;
> +}
> +
>   /* System capability check for constant cap */
>   #define cpus_have_const_cap(num) ({                 \
>           register_t __ret;                           \
> diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
> index 070691882b..73017c3f4b 100644
> --- a/xen/include/asm-x86/processor.h
> +++ b/xen/include/asm-x86/processor.h
> @@ -174,6 +174,11 @@ extern void init_intel_cacheinfo(struct cpuinfo_x86 *c);
>   
>   unsigned int apicid_to_socket(unsigned int);
>   
> +static inline int cpu_nr_siblings(unsigned int cpu)
> +{
> +    return cpu_data[cpu].x86_num_siblings;
> +}
> +
>   /*
>    * Generic CPUID function
>    * clear %ecx since some cpus (Cyrix MII) do not set or clear %ecx
> 
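
BTW, I double-checked the rounding in the min_rqs / max_cpus_runq
computation with a quick standalone sketch (plain C, not Xen code; the
constant below stands in for num_online_cpus()). It also shows why the
command line doc recommends a submultiple of the number of online CPUs:

    #include <stdio.h>

    int main(void)
    {
        unsigned int online = 96;   /* stand-in for num_online_cpus()     */
        unsigned int opt_max = 16;  /* opt_max_cpus_runqueue, the default */

        /* Fewest runqueues that hold all CPUs without breaking the cap. */
        unsigned int min_rqs = ((online - 1) / opt_max) + 1;   /* = 6  */
        /* Per-runqueue target that spreads the CPUs evenly among them.  */
        unsigned int max_cpus_runq = online / min_rqs;         /* = 16 */

        printf("%u runqueues, at most %u CPUs each\n",
               min_rqs, max_cpus_runq);
        return 0;
    }

With 96 online CPUs this gives 6 runqueues of at most 16 CPUs each,
i.e., the 96-CPU example from the changelog ends up at the cap instead
of at 48 CPUs per runqueue.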



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 4/7] xen: credit2: limit the max number of CPUs in a runqueue
  2020-05-29 15:23   ` Jürgen Groß
@ 2020-05-29 15:36     ` Dario Faggioli
  0 siblings, 0 replies; 16+ messages in thread
From: Dario Faggioli @ 2020-05-29 15:36 UTC (permalink / raw)
  To: Jürgen Groß, xen-devel
  Cc: Andrew Cooper, George Dunlap, Jan Beulich

[-- Attachment #1: Type: text/plain, Size: 3227 bytes --]

On Fri, 2020-05-29 at 17:23 +0200, Jürgen Groß wrote:
> On 28.05.20 23:29, Dario Faggioli wrote:
> > In Credit2, CPUs (can) share runqueues, depending on the topology.
> > For
> > instance, with per-socket runqueues (the default) all the CPUs that
> > are
> > part of the same socket share a runqueue.
> > 
> > On platforms with a huge number of CPUs per socket, that could be a
> > problem. An example is AMD EPYC2 servers, where we can have up to
> > 128
> > CPUs in a socket.
> > 
> > It is of course possible to define other, still topology-based,
> > runqueue
> > arrangements (e.g., per-LLC, per-DIE, etc). But that may still
> > result in
> > runqueues with too many CPUs on other/future platforms. For
> > instance, a
> > system with 96 CPUs and 2 NUMA nodes will end up having 48 CPUs per
> > runqueue. Not as bad, but still a lot!
> > 
> > Therefore, let's set a limit to the max number of CPUs that can
> > share a
> > Credit2 runqueue. The actual value is configurable (at boot time),
> > the
> > default being 16. If, for instance, there are more than 16 CPUs in
> > a
> > socket, they'll be split among two (or more) runqueues.
> > 
> > Note: with core scheduling enabled, this parameter sets the max
> > number
> > of *scheduling resources* that can share a runqueue. Therefore,
> > with
> > granularity set to core (and assuming 2 threads per core), we will
> > have
> > at most 16 cores per runqueue, which corresponds to 32 threads. But
> > that
> > is fine, considering how core scheduling works.
> > 
> > Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
> 
> Reviewed-by: Juergen Gross <jgross@suse.com>
> 
Thanks!

> with one additional question below.
> 
Sure...

> > +            if ( cpumask_intersects(&rqd->active,
> > per_cpu(cpu_sibling_mask, cpu)) )
> > +            {
> > +                if ( cpumask_weight(&rqd->active) >=
> > opt_max_cpus_runqueue )
> > +                {
> > +                        printk("WARNING: %s: more than
> > opt_max_cpus_runqueue "
> > +                               "in a runqueue (%u vs %u), due to
> > topology constraints.\n"
> > +                               "Consider raising it!\n",
> > +                               __func__, opt_max_cpus_runqueue,
> 
> Is printing the function name really adding any important
> information?
> 
I personally don't think it does. I am mostly following suit with both
this file and other places, even for this kind of warning, striving
for consistency.

I'd surely be fine removing it from all the warnings about the boot
time parameter, here in credit2 and in scheduling in general. And if
now is not the time for doing that, I'm happy to get rid of it from
here and do the rest later. Or to leave it everywhere for now and
remove it from everywhere later.

I don't have a strong opinion myself, so I'm happy to do whatever we
think is best.
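
For concreteness, dropping it from this one would look something like
the following (just a sketch of the rewording, not a committed change):

    printk("WARNING: more than sched_credit2_max_cpus_runqueue CPUs in "
           "a runqueue (%u vs %u), due to topology constraints.\n"
           "Consider raising it!\n",
           opt_max_cpus_runqueue, cpumask_weight(&rqd->active));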

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 0/7] xen: credit2: limit the number of CPUs per runqueue
  2020-05-28 21:29 [PATCH v2 0/7] xen: credit2: limit the number of CPUs per runqueue Dario Faggioli
                   ` (6 preceding siblings ...)
  2020-05-28 21:30 ` [PATCH v2 7/7] xen: credit2: rebalance the number of CPUs in the scheduler runqueues Dario Faggioli
@ 2020-07-21 12:08 ` Jan Beulich
  2020-07-22 15:33   ` Dario Faggioli
  7 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2020-07-21 12:08 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Juergen Gross, Stefano Stabellini, Julien Grall, Andrew Cooper,
	George Dunlap, xen-devel

On 28.05.2020 23:29, Dario Faggioli wrote:
> Dario Faggioli (7):
>       xen: credit2: factor cpu to runqueue matching in a function
>       xen: credit2: factor runqueue initialization in its own function.
>       xen: cpupool: add a back-pointer from a scheduler to its pool
>       xen: credit2: limit the max number of CPUs in a runqueue
>       xen: credit2: compute cpus per-runqueue more dynamically.
>       cpupool: create an the 'cpupool sync' infrastructure
>       xen: credit2: rebalance the number of CPUs in the scheduler runqueues

I still have the last three patches here as well as "xen: credit2:
document that min_rqd is valid and ok to use" in my "waiting for
sufficient acks" folder. Would you mind indicating if they should
stay there (and you will take care of pinging or whatever is
needed), or whether I may drop them (and you'll eventually re-
submit)?

Thanks, Jan


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 0/7] xen: credit2: limit the number of CPUs per runqueue
  2020-07-21 12:08 ` [PATCH v2 0/7] xen: credit2: limit the number of CPUs per runqueue Jan Beulich
@ 2020-07-22 15:33   ` Dario Faggioli
  0 siblings, 0 replies; 16+ messages in thread
From: Dario Faggioli @ 2020-07-22 15:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Stefano Stabellini, Julien Grall, Andrew Cooper,
	George Dunlap, xen-devel

[-- Attachment #1: Type: text/plain, Size: 1897 bytes --]

On Tue, 2020-07-21 at 14:08 +0200, Jan Beulich wrote:
> On 28.05.2020 23:29, Dario Faggioli wrote:
> > Dario Faggioli (7):
> >       xen: credit2: factor cpu to runqueue matching in a function
> >       xen: credit2: factor runqueue initialization in its own
> > function.
> >       xen: cpupool: add a back-pointer from a scheduler to its pool
> >       xen: credit2: limit the max number of CPUs in a runqueue
> >       xen: credit2: compute cpus per-runqueue more dynamically.
> >       cpupool: create an the 'cpupool sync' infrastructure
> >       xen: credit2: rebalance the number of CPUs in the scheduler
> > runqueues
> 
> I still have the last three patches here as well as "xen: credit2:
> document that min_rqd is valid and ok to use" in my "waiting for
> sufficient acks" folder. 
>
Ok.

> Would you mind indicating if they should
> stay there (and you will take care of pinging or whatever is
> needed), or whether I may drop them (and you'll eventually re-
> submit)?
> 
So, about the last 3 patches of this series: their status is indeed
"waiting for sufficient acks", in the sense that they have not been
looked at, due to code freeze being imminent back then, and it would
still be ok for people to review them. That said, I'm happy for you
to drop them from your queue.

I will take care of resending just those in a new series. And thanks.

For the min_rqd one... That should be quite easy, in theory, so let me
ping the thread right now. Especially considering that I just looked
back at it, and noticed that I forgot to Cc George in the first place!
:-O

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-07-22 15:33 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-28 21:29 [PATCH v2 0/7] xen: credit2: limit the number of CPUs per runqueue Dario Faggioli
2020-05-28 21:29 ` [PATCH v2 1/7] xen: credit2: factor cpu to runqueue matching in a function Dario Faggioli
2020-05-29 14:48   ` Jürgen Groß
2020-05-28 21:29 ` [PATCH v2 2/7] xen: credit2: factor runqueue initialization in its own function Dario Faggioli
2020-05-29 14:50   ` Jürgen Groß
2020-05-28 21:29 ` [PATCH v2 3/7] xen: cpupool: add a back-pointer from a scheduler to its pool Dario Faggioli
2020-05-29 14:54   ` Jürgen Groß
2020-05-29 14:56     ` Dario Faggioli
2020-05-28 21:29 ` [PATCH v2 4/7] xen: credit2: limit the max number of CPUs in a runqueue Dario Faggioli
2020-05-29 15:23   ` Jürgen Groß
2020-05-29 15:36     ` Dario Faggioli
2020-05-28 21:29 ` [PATCH v2 5/7] xen: credit2: compute cpus per-runqueue more dynamically Dario Faggioli
2020-05-28 21:29 ` [PATCH v2 6/7] cpupool: create an the 'cpupool sync' infrastructure Dario Faggioli
2020-05-28 21:30 ` [PATCH v2 7/7] xen: credit2: rebalance the number of CPUs in the scheduler runqueues Dario Faggioli
2020-07-21 12:08 ` [PATCH v2 0/7] xen: credit2: limit the number of CPUs per runqueue Jan Beulich
2020-07-22 15:33   ` Dario Faggioli
