From: Juergen Gross <jgross@suse.com>
To: Dario Faggioli <dario.faggioli@citrix.com>,
xen-devel@lists.xenproject.org
Cc: George Dunlap <george.dunlap@eu.citrix.com>,
Uma Sharma <uma.sharma523@gmail.com>
Subject: Re: [PATCH 13/16] xen: sched: allow for choosing credit2 runqueues configuration at boot
Date: Tue, 22 Mar 2016 08:46:14 +0100 [thread overview]
Message-ID: <56F0F846.3020804@suse.com> (raw)
In-Reply-To: <20160318190538.8117.96025.stgit@Solace.station>
On 18/03/16 20:05, Dario Faggioli wrote:
> In fact, credit2 uses CPU topology to decide how to arrange
> its internal runqueues. Before this change, only 'one runqueue
> per socket' was allowed. However, experiments have shown that,
> for instance, having one runqueue per physical core improves
> performance, especially in case hyperthreading is available.
>
> In general, it makes sense to allow users to pick one runqueue
> arrangement at boot time, so that:
> - more experiments can be easily performed to even better
> assess and improve performance;
> - one can select the best configuration for his specific
> use case and/or hardware.
>
> This patch enables the above.
>
> Note that, for correctly arranging runqueues to be per-core,
> just checking cpu_to_core() on the host CPUs is not enough.
> In fact, cores (and hyperthreads) on different sockets, can
> have the same core (and thread) IDs! We, therefore, need to
> check whether the full topology of two CPUs matches, for
> them to be put in the same runqueue.
>
> Note also that the default for credit2 has, until now, been
> per-socket runqueues (although that was not fully functional).
> This patch leaves things that way, to avoid mixing policy and
> technical changes.
>
> Finally, it would be a nice feature to be able to select
> a particular runqueue arrangement, even when creating a
> Credit2 cpupool. This is left as future work.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Signed-off-by: Uma Sharma <uma.sharma523@gmail.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Uma Sharma <uma.sharma523@gmail.com>
> Cc: Juergen Gross <jgross@suse.com>
> ---
> Changes from v1:
> * added 'node' and 'global' runqueue arrangements, as
> suggested during review;
> ---
> docs/misc/xen-command-line.markdown | 19 +++++++++
> xen/common/sched_credit2.c | 76 +++++++++++++++++++++++++++++++++--
> 2 files changed, 90 insertions(+), 5 deletions(-)
>
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index ca77e3b..0047f94 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -469,6 +469,25 @@ combination with the `low_crashinfo` command line option.
> ### credit2\_load\_window\_shift
> > `= <integer>`
>
> +### credit2\_runqueue
> +> `= core | socket | node | all`
> +
> +> Default: `socket`
> +
> +Specify how host CPUs are arranged in runqueues. Runqueues are kept
> +balanced with respect to the load generated by the vCPUs running on
> +them. Smaller runqueues (as in with `core`) means more accurate load
> +balancing (for instance, it will deal better with hyperthreading),
> +but also more overhead.
> +
> +Available alternatives, with their meaning, are:
> +* `core`: one runqueue per each physical core of the host;
> +* `socket`: one runqueue per each physical socket (which often,
> + but not always, matches a NUMA node) of the host;
> +* `node`: one runqueue per each NUMA node of the host;
> +* `all`: just one runqueue shared by all the logical pCPUs of
> + the host
> +
> ### dbgp
> > `= ehci[ <integer> | @pci<bus>:<slot>.<func> ]`
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 456b9ea..c242dc4 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -81,10 +81,6 @@
> * Credits are "reset" when the next vcpu in the runqueue is less than
> * or equal to zero. At that point, everyone's credits are "clipped"
> * to a small value, and a fixed credit is added to everyone.
> - *
> - * The plan is for all cores that share an L2 will share the same
> - * runqueue. At the moment, there is one global runqueue for all
> - * cores.
> */
>
> /*
> @@ -193,6 +189,55 @@ static int __read_mostly opt_overload_balance_tolerance = -3;
> integer_param("credit2_balance_over", opt_overload_balance_tolerance);
>
> /*
> + * Runqueue organization.
> + *
> + * The various cpus are to be assigned each one to a runqueue, and we
> + * want that to happen basing on topology. At the moment, it is possible
> + * to choose to arrange runqueues to be:
> + *
> + * - per-core: meaning that there will be one runqueue per each physical
> + * core of the host. This will happen if the opt_runqueue
> + * parameter is set to 'core';
> + *
> + * - per-node: meaning that there will be one runqueue per each physical
> + * NUMA node of the host. This will happen if the opt_runqueue
> + * parameter is set to 'node';
> + *
> + * - per-socket: meaning that there will be one runqueue per each physical
> + * socket (AKA package, which often, but not always, also
> + * matches a NUMA node) of the host; This will happen if
> + * the opt_runqueue parameter is set to 'socket';
> + *
> + * - global: meaning that there will be only one runqueue to which all the
> + * (logical) processors of the host belongs. This will happen if
> + * the opt_runqueue parameter is set to 'all'.
> + *
> + * Depending on the value of opt_runqueue, therefore, cpus that are part of
> + * either the same physical core, or of the same physical socket, will be
> + * put together to form runqueues.
> + */
> +#define OPT_RUNQUEUE_CORE 1
> +#define OPT_RUNQUEUE_SOCKET 2
> +#define OPT_RUNQUEUE_NODE 3
> +#define OPT_RUNQUEUE_ALL 4
> +static int __read_mostly opt_runqueue = OPT_RUNQUEUE_SOCKET;
> +
> +static void parse_credit2_runqueue(const char *s)
> +{
> + if ( !strncmp(s, "core", 4) && !s[4] )
> + opt_runqueue = OPT_RUNQUEUE_CORE;
> + else if ( !strncmp(s, "socket", 6) && !s[6] )
> + opt_runqueue = OPT_RUNQUEUE_SOCKET;
> + else if ( !strncmp(s, "node", 4) && !s[4] )
> + opt_runqueue = OPT_RUNQUEUE_NODE;
> + else if ( !strncmp(s, "all", 6) && !s[6] )
The length is wrong: it should be 3, not 6.
Which raises the question: why not simply use strcmp() here? I don't
see any advantage to strncmp() in this case, especially as it has just
proven to be more error prone.
> + opt_runqueue = OPT_RUNQUEUE_ALL;
> + else
> + printk("WARNING, unrecognized value of credit2_runqueue option!\n");
> +}
> +custom_param("credit2_runqueue", parse_credit2_runqueue);
> +
> +/*
> * Per-runqueue data
> */
> struct csched2_runqueue_data {
> @@ -1971,6 +2016,22 @@ static void deactivate_runqueue(struct csched2_private *prv, int rqi)
> cpumask_clear_cpu(rqi, &prv->active_queues);
> }
>
> +static inline bool_t same_node(unsigned int cpua, unsigned int cpub)
> +{
> + return cpu_to_node(cpua) == cpu_to_node(cpub);
> +}
> +
> +static inline bool_t same_socket(unsigned int cpua, unsigned int cpub)
> +{
> + return cpu_to_socket(cpua) == cpu_to_socket(cpub);
> +}
> +
> +static inline bool_t same_core(unsigned int cpua, unsigned int cpub)
> +{
> + return same_socket(cpua, cpub) &&
> + cpu_to_core(cpua) == cpu_to_core(cpub);
> +}
> +
> static unsigned int
> cpu_to_runqueue(struct csched2_private *prv, unsigned int cpu)
> {
> @@ -2003,7 +2064,10 @@ cpu_to_runqueue(struct csched2_private *prv, unsigned int cpu)
> BUG_ON(cpu_to_socket(cpu) == XEN_INVALID_SOCKET_ID ||
> cpu_to_socket(peer_cpu) == XEN_INVALID_SOCKET_ID);
>
> - if ( cpu_to_socket(cpumask_first(&rqd->active)) == cpu_to_socket(cpu) )
> + if ( opt_runqueue == OPT_RUNQUEUE_ALL ||
> + (opt_runqueue == OPT_RUNQUEUE_CORE && same_core(peer_cpu, cpu)) ||
> + (opt_runqueue == OPT_RUNQUEUE_SOCKET && same_socket(peer_cpu, cpu)) ||
> + (opt_runqueue == OPT_RUNQUEUE_NODE && same_node(peer_cpu, cpu)) )
> break;
> }
>
> @@ -2157,6 +2221,8 @@ csched2_init(struct scheduler *ops)
> printk(" load_window_shift: %d\n", opt_load_window_shift);
> printk(" underload_balance_tolerance: %d\n", opt_underload_balance_tolerance);
> printk(" overload_balance_tolerance: %d\n", opt_overload_balance_tolerance);
> + printk(" runqueues arrangement: per-%s\n",
> + opt_runqueue == OPT_RUNQUEUE_CORE ? "core" : "socket");
What about "node" and "all"? Only "core" and "socket" are covered here.
>
> if ( opt_load_window_shift < LOADAVG_WINDOW_SHIFT_MIN )
> {
Juergen