From: Juergen Gross <jgross@suse.com>
To: Dario Faggioli <dario.faggioli@citrix.com>,
	xen-devel@lists.xenproject.org
Cc: George Dunlap <george.dunlap@eu.citrix.com>,
	Uma Sharma <uma.sharma523@gmail.com>
Subject: Re: [PATCH 13/16] xen: sched: allow for choosing credit2 runqueues configuration at boot
Date: Tue, 22 Mar 2016 08:46:14 +0100
Message-ID: <56F0F846.3020804@suse.com>
In-Reply-To: <20160318190538.8117.96025.stgit@Solace.station>

On 18/03/16 20:05, Dario Faggioli wrote:
> In fact, credit2 uses CPU topology to decide how to arrange
> its internal runqueues. Before this change, only 'one runqueue
> per socket' was allowed. However, experiments have shown that,
> for instance, having one runqueue per physical core improves
> performance, especially in case hyperthreading is available.
> 
> In general, it makes sense to allow users to pick one runqueue
> arrangement at boot time, so that:
>  - more experiments can be easily performed to even better
>    assess and improve performance;
>  - one can select the best configuration for his specific
>    use case and/or hardware.
> 
> This patch enables the above.
> 
> Note that, for correctly arranging runqueues to be per-core,
> just checking cpu_to_core() on the host CPUs is not enough.
> In fact, cores (and hyperthreads) on different sockets, can
> have the same core (and thread) IDs! We, therefore, need to
> check whether the full topology of two CPUs matches, for
> them to be put in the same runqueue.
> 
> Note also that the default (although not functional) for
> credit2, until now, has been per-socket runqueue. This patch
> leaves things that way, to avoid mixing policy and technical
> changes.
> 
> Finally, it would be a nice feature to be able to select
> a particular runqueue arrangement, even when creating a
> Credit2 cpupool. This is left as future work.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Signed-off-by: Uma Sharma <uma.sharma523@gmail.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Uma Sharma <uma.sharma523@gmail.com>
> Cc: Juergen Gross <jgross@suse.com>
> ---
> Changes from v1:
>  * added 'node' and 'global' runqueue arrangements, as
>    suggested during review;
> ---
>  docs/misc/xen-command-line.markdown |   19 +++++++++
>  xen/common/sched_credit2.c          |   76 +++++++++++++++++++++++++++++++++--
>  2 files changed, 90 insertions(+), 5 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index ca77e3b..0047f94 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -469,6 +469,25 @@ combination with the `low_crashinfo` command line option.
>  ### credit2\_load\_window\_shift
>  > `= <integer>`
>  
> +### credit2\_runqueue
> +> `= core | socket | node | all`
> +
> +> Default: `socket`
> +
> +Specify how host CPUs are arranged in runqueues. Runqueues are kept
> +balanced with respect to the load generated by the vCPUs running on
> +them. Smaller runqueues (as in with `core`) means more accurate load
> +balancing (for instance, it will deal better with hyperthreading),
> +but also more overhead.
> +
> +Available alternatives, with their meaning, are:
> +* `core`: one runqueue per each physical core of the host;
> +* `socket`: one runqueue per each physical socket (which often,
> +            but not always, matches a NUMA node) of the host;
> +* `node`: one runqueue per each NUMA node of the host;
> +* `all`: just one runqueue shared by all the logical pCPUs of
> +         the host
> +
>  ### dbgp
>  > `= ehci[ <integer> | @pci<bus>:<slot>.<func> ]`
>  
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 456b9ea..c242dc4 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -81,10 +81,6 @@
>   * Credits are "reset" when the next vcpu in the runqueue is less than
>   * or equal to zero.  At that point, everyone's credits are "clipped"
>   * to a small value, and a fixed credit is added to everyone.
> - *
> - * The plan is for all cores that share an L2 will share the same
> - * runqueue.  At the moment, there is one global runqueue for all
> - * cores.
>   */
>  
>  /*
> @@ -193,6 +189,55 @@ static int __read_mostly opt_overload_balance_tolerance = -3;
>  integer_param("credit2_balance_over", opt_overload_balance_tolerance);
>  
>  /*
> + * Runqueue organization.
> + *
> + * The various cpus are to be assigned each one to a runqueue, and we
> + * want that to happen basing on topology. At the moment, it is possible
> + * to choose to arrange runqueues to be:
> + *
> + * - per-core: meaning that there will be one runqueue per each physical
> + *             core of the host. This will happen if the opt_runqueue
> + *             parameter is set to 'core';
> + *
> + * - per-node: meaning that there will be one runqueue per each physical
> + *             NUMA node of the host. This will happen if the opt_runqueue
> + *             parameter is set to 'node';
> + *
> + * - per-socket: meaning that there will be one runqueue per each physical
> + *               socket (AKA package, which often, but not always, also
> + *               matches a NUMA node) of the host; This will happen if
> + *               the opt_runqueue parameter is set to 'socket';
> + *
> + * - global: meaning that there will be only one runqueue to which all the
> + *           (logical) processors of the host belongs. This will happen if
> + *           the opt_runqueue parameter is set to 'all'.
> + *
> + * Depending on the value of opt_runqueue, therefore, cpus that are part of
> + * either the same physical core, or of the same physical socket, will be
> + * put together to form runqueues.
> + */
> +#define OPT_RUNQUEUE_CORE   1
> +#define OPT_RUNQUEUE_SOCKET 2
> +#define OPT_RUNQUEUE_NODE   3
> +#define OPT_RUNQUEUE_ALL    4
> +static int __read_mostly opt_runqueue = OPT_RUNQUEUE_SOCKET;
> +
> +static void parse_credit2_runqueue(const char *s)
> +{
> +    if ( !strncmp(s, "core", 4) && !s[4] )
> +        opt_runqueue = OPT_RUNQUEUE_CORE;
> +    else if ( !strncmp(s, "socket", 6) && !s[6] )
> +        opt_runqueue = OPT_RUNQUEUE_SOCKET;
> +    else if ( !strncmp(s, "node", 4) && !s[4] )
> +        opt_runqueue = OPT_RUNQUEUE_NODE;
> +    else if ( !strncmp(s, "all", 6) && !s[6] )

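The length is wrong: it should be 3 instead of 6 here. With 6, the
strncmp() itself still matches "all" (it stops at the terminating NUL),
but the !s[6] check then reads past the end of the string.

Which raises the question: why don't you use strcmp() here? I don't see
any advantage in using strncmp() in this case, especially as you've just
proven it is more error prone.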
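
A minimal sketch of what the strcmp() variant suggested above could look
like, written as a standalone userspace program rather than as hypervisor
code. Assumptions that are not in the patch: fprintf() stands in for
printk(), the int return value is added only so the result can be checked,
and __read_mostly and custom_param() are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Values mirror the OPT_RUNQUEUE_* defines in the patch. */
#define OPT_RUNQUEUE_CORE   1
#define OPT_RUNQUEUE_SOCKET 2
#define OPT_RUNQUEUE_NODE   3
#define OPT_RUNQUEUE_ALL    4

static int opt_runqueue = OPT_RUNQUEUE_SOCKET;

/*
 * strcmp() compares the whole NUL-terminated string, so there is no
 * per-literal length to keep in sync and no separate "!s[n]" check.
 *
 * Returns 0 if the value was recognized, -1 otherwise. The return value
 * is an addition for this sketch; the parser in the patch returns void.
 */
static int parse_credit2_runqueue(const char *s)
{
    assert(s != NULL);

    if ( !strcmp(s, "core") )
        opt_runqueue = OPT_RUNQUEUE_CORE;
    else if ( !strcmp(s, "socket") )
        opt_runqueue = OPT_RUNQUEUE_SOCKET;
    else if ( !strcmp(s, "node") )
        opt_runqueue = OPT_RUNQUEUE_NODE;
    else if ( !strcmp(s, "all") )
        opt_runqueue = OPT_RUNQUEUE_ALL;
    else
    {
        /* Stand-in for printk(); opt_runqueue keeps its previous value. */
        fprintf(stderr,
                "WARNING, unrecognized value of credit2_runqueue option!\n");
        return -1;
    }

    return 0;
}
```

With this shape, a trailing character ("allx") or a truncated value ("al")
is rejected by the comparison itself.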

> +        opt_runqueue = OPT_RUNQUEUE_ALL;
> +    else
> +        printk("WARNING, unrecognized value of credit2_runqueue option!\n");
> +}
> +custom_param("credit2_runqueue", parse_credit2_runqueue);
> +
> +/*
>   * Per-runqueue data
>   */
>  struct csched2_runqueue_data {
> @@ -1971,6 +2016,22 @@ static void deactivate_runqueue(struct csched2_private *prv, int rqi)
>      cpumask_clear_cpu(rqi, &prv->active_queues);
>  }
>  
> +static inline bool_t same_node(unsigned int cpua, unsigned int cpub)
> +{
> +    return cpu_to_node(cpua) == cpu_to_node(cpub);
> +}
> +
> +static inline bool_t same_socket(unsigned int cpua, unsigned int cpub)
> +{
> +    return cpu_to_socket(cpua) == cpu_to_socket(cpub);
> +}
> +
> +static inline bool_t same_core(unsigned int cpua, unsigned int cpub)
> +{
> +    return same_socket(cpua, cpub) &&
> +           cpu_to_core(cpua) == cpu_to_core(cpub);
> +}
> +
>  static unsigned int
>  cpu_to_runqueue(struct csched2_private *prv, unsigned int cpu)
>  {
> @@ -2003,7 +2064,10 @@ cpu_to_runqueue(struct csched2_private *prv, unsigned int cpu)
>          BUG_ON(cpu_to_socket(cpu) == XEN_INVALID_SOCKET_ID ||
>                 cpu_to_socket(peer_cpu) == XEN_INVALID_SOCKET_ID);
>  
> -        if ( cpu_to_socket(cpumask_first(&rqd->active)) == cpu_to_socket(cpu) )
> +        if ( opt_runqueue == OPT_RUNQUEUE_ALL ||
> +             (opt_runqueue == OPT_RUNQUEUE_CORE && same_core(peer_cpu, cpu)) ||
> +             (opt_runqueue == OPT_RUNQUEUE_SOCKET && same_socket(peer_cpu, cpu)) ||
> +             (opt_runqueue == OPT_RUNQUEUE_NODE && same_node(peer_cpu, cpu)) )
>              break;
>      }
>  
> @@ -2157,6 +2221,8 @@ csched2_init(struct scheduler *ops)
>      printk(" load_window_shift: %d\n", opt_load_window_shift);
>      printk(" underload_balance_tolerance: %d\n", opt_underload_balance_tolerance);
>      printk(" overload_balance_tolerance: %d\n", opt_overload_balance_tolerance);
> +    printk(" runqueues arrangement: per-%s\n",
> +           opt_runqueue == OPT_RUNQUEUE_CORE ? "core" : "socket");

What about node and all? Both would be reported as "per-socket" here.
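
One possible way to cover all four values, again as a standalone sketch
and not as a proposed hunk. The table opt_runqueue_str[] and the helper
opt_runqueue_name() are hypothetical names that do not appear in the
patch, and printf() stands in for printk(). The "per-" prefix is dropped
from the message because "per-all" would read oddly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Values mirror the OPT_RUNQUEUE_* defines in the patch. */
#define OPT_RUNQUEUE_CORE   1
#define OPT_RUNQUEUE_SOCKET 2
#define OPT_RUNQUEUE_NODE   3
#define OPT_RUNQUEUE_ALL    4

/* Hypothetical lookup table, indexed by the OPT_RUNQUEUE_* value. */
static const char *const opt_runqueue_str[] = {
    [OPT_RUNQUEUE_CORE]   = "core",
    [OPT_RUNQUEUE_SOCKET] = "socket",
    [OPT_RUNQUEUE_NODE]   = "node",
    [OPT_RUNQUEUE_ALL]    = "all",
};

/* Hypothetical helper: name of the arrangement as given on the command line. */
static const char *opt_runqueue_name(int opt)
{
    /* Index 0 and anything above OPT_RUNQUEUE_ALL have no entry. */
    assert(opt >= OPT_RUNQUEUE_CORE && opt <= OPT_RUNQUEUE_ALL);
    return opt_runqueue_str[opt];
}

/* Stand-in for the printk() call in csched2_init(). */
static void print_runqueue_arrangement(int opt)
{
    printf(" runqueues arrangement: %s\n", opt_runqueue_name(opt));
}
```

Keeping the names in one table next to the defines means a fifth
arrangement only needs one extra line for the boot message.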

>  
>      if ( opt_load_window_shift < LOADAVG_WINDOW_SHIFT_MIN )
>      {

Juergen


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
