* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
@ 2019-07-01 14:08 Jan Beulich
  2019-07-01 15:10 ` Juergen Gross
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2019-07-01 14:08 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap,
	Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall, Meng Xu,
	Ian Jackson, xen-devel, Roger Pau Monne

>>> On 28.05.19 at 12:32, <jgross@suse.com> wrote:
> @@ -155,8 +156,8 @@ static void nmi_mce_softirq(void)
>       * Set the tmp value unconditionally, so that the check in the iret
>       * hypercall works.
>       */
> -    cpumask_copy(st->vcpu->cpu_hard_affinity_tmp,
> -                 st->vcpu->cpu_hard_affinity);
> +    cpumask_copy(st->vcpu->sched_unit->cpu_hard_affinity_tmp,
> +                 st->vcpu->sched_unit->cpu_hard_affinity);

Aiui this affects all vCPU-s in the unit, which is unlikely to be what we
want here: There's now only one cpu_hard_affinity_tmp for all vCPU-s
in the unit, yet every vCPU in there may want to make use of the
field in parallel.

I also wonder how the code further down in this function fits with
the scheduler unit concept. But perhaps that's going to be dealt with
by later patches...

> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -125,11 +125,6 @@ static void vcpu_info_reset(struct vcpu *v)
>  
>  static void vcpu_destroy(struct vcpu *v)
>  {
> -    free_cpumask_var(v->cpu_hard_affinity);
> -    free_cpumask_var(v->cpu_hard_affinity_tmp);
> -    free_cpumask_var(v->cpu_hard_affinity_saved);
> -    free_cpumask_var(v->cpu_soft_affinity);
> -
>      free_vcpu_struct(v);
>  }
>  
> @@ -153,12 +148,6 @@ struct vcpu *vcpu_create(
>  
>      grant_table_init_vcpu(v);
>  
> -    if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) ||
> -         !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) ||
> -         !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) ||
> -         !zalloc_cpumask_var(&v->cpu_soft_affinity) )
> -        goto fail;

Seeing these, I'm actually having trouble understanding how you mean
to retain the user-visible interface behavior here: If you only store an
affinity per sched unit, then how are you going to honor the vCPU-granular
requests coming in?

> @@ -557,9 +545,10 @@ void domain_update_node_affinity(struct domain *d)
>           */
>          for_each_vcpu ( d, v )
>          {
> -            cpumask_or(dom_cpumask, dom_cpumask, v->cpu_hard_affinity);
> +            cpumask_or(dom_cpumask, dom_cpumask,
> +                       v->sched_unit->cpu_hard_affinity);
>              cpumask_or(dom_cpumask_soft, dom_cpumask_soft,
> -                       v->cpu_soft_affinity);
> +                       v->sched_unit->cpu_soft_affinity);
>          }

There's not going to be a for_each_sched_unit(), is there? It
would mean fewer iterations here, and fewer redundant CPU mask
operations. Ah, that's the subject of patch 30.

> @@ -1226,7 +1215,7 @@ int vcpu_reset(struct vcpu *v)
>      v->async_exception_mask = 0;
>      memset(v->async_exception_state, 0, sizeof(v->async_exception_state));
>  #endif
> -    cpumask_clear(v->cpu_hard_affinity_tmp);
> +    cpumask_clear(v->sched_unit->cpu_hard_affinity_tmp);

Same issue as above - you're affecting other vCPU-s here.

> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -614,6 +614,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>      case XEN_DOMCTL_getvcpuaffinity:
>      {
>          struct vcpu *v;
> +        struct sched_unit *unit;

const?

> --- a/xen/common/keyhandler.c
> +++ b/xen/common/keyhandler.c
> @@ -312,8 +312,8 @@ static void dump_domains(unsigned char key)
>                  printk("dirty_cpu=%u", v->dirty_cpu);
>              printk("\n");
>              printk("    cpu_hard_affinity={%*pbl} cpu_soft_affinity={%*pbl}\n",
> -                   nr_cpu_ids, cpumask_bits(v->cpu_hard_affinity),
> -                   nr_cpu_ids, cpumask_bits(v->cpu_soft_affinity));
> +                   nr_cpu_ids, cpumask_bits(v->sched_unit->cpu_hard_affinity),
> +                   nr_cpu_ids, cpumask_bits(v->sched_unit->cpu_soft_affinity));

I don't see the value of logging the same information multiple times
(for each vCPU in a sched unit). I think you want to split this up.

> --- a/xen/common/wait.c
> +++ b/xen/common/wait.c
> @@ -132,7 +132,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
>  
>      /* Save current VCPU affinity; force wakeup on *this* CPU only. */
>      wqv->wakeup_cpu = smp_processor_id();
> -    cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
> +    cpumask_copy(&wqv->saved_affinity, curr->sched_unit->cpu_hard_affinity);
>      if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
>      {
>          gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
> @@ -199,7 +199,7 @@ void check_wakeup_from_wait(void)
>      {
>          /* Re-set VCPU affinity and re-enter the scheduler. */
>          struct vcpu *curr = current;
> -        cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
> +        cpumask_copy(&wqv->saved_affinity, curr->sched_unit->cpu_hard_affinity);
>          if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
>          {
>              gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");

Same problem as above - the consumer of ->saved_affinity will affect
vCPU-s other than the subject one.

> --- a/xen/include/xen/sched-if.h
> +++ b/xen/include/xen/sched-if.h
> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
>   * * The hard affinity is not a subset of soft affinity
>   * * There is an overlap between the soft and hard affinity masks
>   */
> -static inline int has_soft_affinity(const struct vcpu *v)
> +static inline int has_soft_affinity(const struct sched_unit *unit)
>  {
> -    return v->soft_aff_effective &&
> -           !cpumask_subset(cpupool_domain_cpumask(v->domain),
> -                           v->cpu_soft_affinity);
> +    return unit->soft_aff_effective &&
> +           !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain),
> +                           unit->cpu_soft_affinity);
>  }

Okay, at the moment there looks to be a 1:1 relationship between sched
units and vCPU-s. This would - at this point of the series - invalidate most
of my earlier comments. However, in patch 57 I don't see how this unit->vcpu
mapping would get broken, and I can't seem to identify any other patch
where this might be happening. Looking at the github branch I also get the
impression that the struct vcpu * pointer out of struct sched_unit survives
until the end of the series, which doesn't seem right to me.

In any event, for the purpose here, I think there should be a backlink to
struct domain in struct sched_unit right away, and it should get used here.

> @@ -283,6 +265,22 @@ struct sched_unit {
>      void                  *priv;      /* scheduler private data */
>      struct sched_unit     *next_in_list;
>      struct sched_resource *res;
> +
> +    /* Last time when unit has been scheduled out. */
> +    uint64_t               last_run_time;
> +
> +    /* Item needs affinity restored. */
> +    bool                   affinity_broken;
> +    /* Does soft affinity actually play a role (given hard affinity)? */
> +    bool                   soft_aff_effective;
> +    /* Bitmask of CPUs on which this VCPU may run. */
> +    cpumask_var_t          cpu_hard_affinity;
> +    /* Used to change affinity temporarily. */
> +    cpumask_var_t          cpu_hard_affinity_tmp;
> +    /* Used to restore affinity across S3. */
> +    cpumask_var_t          cpu_hard_affinity_saved;
> +    /* Bitmask of CPUs on which this VCPU prefers to run. */
> +    cpumask_var_t          cpu_soft_affinity;
>  };

The mentions of "VCPU" in the comments here also survive till the end
of the series, which I also don't think is quite right.

> @@ -980,7 +978,7 @@ static inline bool is_hvm_vcpu(const struct vcpu *v)
>  static inline bool is_hwdom_pinned_vcpu(const struct vcpu *v)
>  {
>      return (is_hardware_domain(v->domain) &&
> -            cpumask_weight(v->cpu_hard_affinity) == 1);
> +            cpumask_weight(v->sched_unit->cpu_hard_affinity) == 1);
>  }

Seeing this - how is pinning (by command line option or by Dom0
doing this on itself transiently) going to work with core scheduling?

Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

* [PATCH 00/60] xen: add core scheduling support
@ 2019-05-28 10:32 Juergen Gross
  2019-05-28 10:32   ` Juergen Gross
  0 siblings, 1 reply; 21+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich, Roger Pau Monné

Add support for core- and socket-scheduling in the Xen hypervisor.

Via boot parameter sched-gran=core (or sched-gran=socket)
it is possible to change the scheduling granularity from cpu (the
default) to either whole cores or even sockets.

All logical cpus (threads) of the core or socket are always scheduled
together. This means that only vcpus of the same domain will ever be
active on a core, and those vcpus will always be scheduled at the same
time.

This is achieved by switching the scheduler to no longer see vcpus as
the primary object to schedule, but "schedule units". Each schedule
unit consists of as many vcpus as each core has threads on the current
system. The vcpu->unit relation is fixed.

I have done some very basic performance testing: on a 4 cpu system
(2 cores with 2 threads each) I did a "make -j 4" build of the Xen
hypervisor. This test has been run in dom0, once with no other
guest active and once with another guest with 4 vcpus running the
same test. The results are (always elapsed time, system time, user time):

sched-gran=cpu,    no other guest: 116.10 177.65 207.84
sched-gran=core,   no other guest: 114.04 175.47 207.45
sched-gran=cpu,    other guest:    202.30 334.21 384.63
sched-gran=core,   other guest:    207.24 293.04 371.37

The performance tests have been performed with credit2; the other
schedulers have been tested only briefly, just far enough to be able
to create a domain in a cpupool with them.

Cpupools have been moderately tested (cpu add/remove, create, destroy,
move domain).

Cpu on-/offlining has been moderately tested, too.

The complete patch series is available under:

  git://github.com/jgross1/xen/ sched-v1

Changes in V1:
- cpupools are working now
- cpu on-/offlining working now
- all schedulers working now
- renamed "items" to "units"
- introduction of "idle scheduler"
- several new patches (see individual patches, mostly splits of
  former patches or cpupool and cpu on-/offlining support)
- all review comments addressed
- some minor changes (see individual patches)

Changes in RFC V2:
- ARM is building now
- HVM domains are working now
- idling will always be done with idle_vcpu active
- other small changes see individual patches

Juergen Gross (60):
  xen/sched: only allow schedulers with all mandatory functions
    available
  xen/sched: add inline wrappers for calling per-scheduler functions
  xen/sched: let sched_switch_sched() return new lock address
  xen/sched: use new sched_unit instead of vcpu in scheduler interfaces
  xen/sched: alloc struct sched_unit for each vcpu
  xen/sched: move per-vcpu scheduler private data pointer to sched_unit
  xen/sched: build a linked list of struct sched_unit
  xen/sched: introduce struct sched_resource
  xen/sched: let pick_cpu return a scheduler resource
  xen/sched: switch schedule_data.curr to point at sched_unit
  xen/sched: move per cpu scheduler private data into struct
    sched_resource
  xen/sched: switch vcpu_schedule_lock to unit_schedule_lock
  xen/sched: move some per-vcpu items to struct sched_unit
  xen/sched: add scheduler helpers hiding vcpu
  xen/sched: add domain pointer to struct sched_unit
  xen/sched: add id to struct sched_unit
  xen/sched: rename scheduler related perf counters
  xen/sched: switch struct task_slice from vcpu to sched_unit
  xen/sched: add is_running indicator to struct sched_unit
  xen/sched: make null scheduler vcpu agnostic.
  xen/sched: make rt scheduler vcpu agnostic.
  xen/sched: make credit scheduler vcpu agnostic.
  xen/sched: make credit2 scheduler vcpu agnostic.
  xen/sched: make arinc653 scheduler vcpu agnostic.
  xen: add sched_unit_pause_nosync() and sched_unit_unpause()
  xen: let vcpu_create() select processor
  xen/sched: use sched_resource cpu instead smp_processor_id in
    schedulers
  xen/sched: switch schedule() from vcpus to sched_units
  xen/sched: switch sched_move_irqs() to take sched_unit as parameter
  xen: switch from for_each_vcpu() to for_each_sched_unit()
  xen/sched: add runstate counters to struct sched_unit
  xen/sched: rework and rename vcpu_force_reschedule()
  xen/sched: Change vcpu_migrate_*() to operate on schedule unit
  xen/sched: move struct task_slice into struct sched_unit
  xen/sched: add code to sync scheduling of all vcpus of a sched unit
  xen/sched: introduce unit_runnable_state()
  xen/sched: add support for multiple vcpus per sched unit where missing
  x86: make loading of GDT at context switch more modular
  x86: optimize loading of GDT at context switch
  xen/sched: modify cpupool_domain_cpumask() to be an unit mask
  xen/sched: support allocating multiple vcpus into one sched unit
  xen/sched: add a scheduler_percpu_init() function
  xen/sched: add a percpu resource index
  xen/sched: add fall back to idle vcpu when scheduling unit
  xen/sched: make vcpu_wake() and vcpu_sleep() core scheduling aware
  xen/sched: carve out freeing sched_unit memory into dedicated function
  xen/sched: move per-cpu variable scheduler to struct sched_resource
  xen/sched: move per-cpu variable cpupool to struct sched_resource
  xen/sched: reject switching smt on/off with core scheduling active
  xen/sched: prepare per-cpupool scheduling granularity
  xen/sched: use one schedule lock for all free cpus
  xen/sched: populate cpupool0 only after all cpus are up
  xen/sched: remove cpu from pool0 before removing it
  xen/sched: add minimalistic idle scheduler for free cpus
  xen/sched: split schedule_cpu_switch()
  xen/sched: protect scheduling resource via rcu
  xen/sched: support multiple cpus per scheduling resource
  xen/sched: support differing granularity in schedule_cpu_[add/rm]()
  xen/sched: support core scheduling for moving cpus to/from cpupools
  xen/sched: add scheduling granularity enum

 xen/arch/arm/domain.c             |    2 +-
 xen/arch/arm/domain_build.c       |   13 +-
 xen/arch/x86/acpi/cpu_idle.c      |    1 -
 xen/arch/x86/cpu/common.c         |    3 +
 xen/arch/x86/cpu/mcheck/mce.c     |    1 -
 xen/arch/x86/cpu/mcheck/mctelem.c |    1 -
 xen/arch/x86/dom0_build.c         |   10 +-
 xen/arch/x86/domain.c             |   93 +-
 xen/arch/x86/hvm/dom0_build.c     |    9 +-
 xen/arch/x86/pv/dom0_build.c      |   10 +-
 xen/arch/x86/pv/emul-priv-op.c    |    1 +
 xen/arch/x86/pv/shim.c            |    4 +-
 xen/arch/x86/pv/traps.c           |    5 +-
 xen/arch/x86/setup.c              |    1 -
 xen/arch/x86/smpboot.c            |    1 -
 xen/arch/x86/sysctl.c             |    3 +-
 xen/arch/x86/traps.c              |    9 +-
 xen/common/cpupool.c              |  326 ++++---
 xen/common/domain.c               |   34 +-
 xen/common/domctl.c               |   23 +-
 xen/common/keyhandler.c           |    4 +-
 xen/common/sched_arinc653.c       |  270 +++---
 xen/common/sched_credit.c         |  783 ++++++++-------
 xen/common/sched_credit2.c        | 1134 +++++++++++-----------
 xen/common/sched_null.c           |  443 +++++----
 xen/common/sched_rt.c             |  555 +++++------
 xen/common/schedule.c             | 1923 +++++++++++++++++++++++++++++--------
 xen/common/softirq.c              |    6 +-
 xen/common/wait.c                 |    4 +-
 xen/include/asm-arm/current.h     |    1 +
 xen/include/asm-x86/cpuidle.h     |   11 -
 xen/include/asm-x86/current.h     |    7 +-
 xen/include/asm-x86/desc.h        |    1 +
 xen/include/asm-x86/dom0_build.h  |    3 +-
 xen/include/asm-x86/smp.h         |    3 +
 xen/include/xen/domain.h          |    3 +-
 xen/include/xen/perfc_defn.h      |   32 +-
 xen/include/xen/sched-if.h        |  444 +++++++--
 xen/include/xen/sched.h           |   99 +-
 xen/include/xen/softirq.h         |    1 +
 40 files changed, 3905 insertions(+), 2372 deletions(-)

-- 
2.16.4


