xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
@ 2019-07-01 14:08 Jan Beulich
  2019-07-01 15:10 ` Juergen Gross
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2019-07-01 14:08 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap,
	Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall, Meng Xu,
	Ian Jackson, xen-devel, Roger Pau Monne

>>> On 28.05.19 at 12:32, <jgross@suse.com> wrote:
> @@ -155,8 +156,8 @@ static void nmi_mce_softirq(void)
>       * Set the tmp value unconditionally, so that the check in the iret
>       * hypercall works.
>       */
> -    cpumask_copy(st->vcpu->cpu_hard_affinity_tmp,
> -                 st->vcpu->cpu_hard_affinity);
> +    cpumask_copy(st->vcpu->sched_unit->cpu_hard_affinity_tmp,
> +                 st->vcpu->sched_unit->cpu_hard_affinity);

Aiui this affects all vCPU-s in the unit, which is unlikely to be what we
want here: There's now only one cpu_hard_affinity_tmp for all vCPU-s
in the unit, yet every vCPU in there may want to make use of the
field in parallel.

I also wonder how the code further down in this function fits with
the scheduler unit concept. But perhaps that's going to be dealt with
by later patches...

> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -125,11 +125,6 @@ static void vcpu_info_reset(struct vcpu *v)
>  
>  static void vcpu_destroy(struct vcpu *v)
>  {
> -    free_cpumask_var(v->cpu_hard_affinity);
> -    free_cpumask_var(v->cpu_hard_affinity_tmp);
> -    free_cpumask_var(v->cpu_hard_affinity_saved);
> -    free_cpumask_var(v->cpu_soft_affinity);
> -
>      free_vcpu_struct(v);
>  }
>  
> @@ -153,12 +148,6 @@ struct vcpu *vcpu_create(
>  
>      grant_table_init_vcpu(v);
>  
> -    if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) ||
> -         !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) ||
> -         !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) ||
> -         !zalloc_cpumask_var(&v->cpu_soft_affinity) )
> -        goto fail;

Seeing these, I'm actually having trouble understanding how you mean
to retain the user visible interface behavior here: If you only store an
affinity per sched unit, then how are you meaning to honor the vCPU
granular requests coming in?

> @@ -557,9 +545,10 @@ void domain_update_node_affinity(struct domain *d)
>           */
>          for_each_vcpu ( d, v )
>          {
> -            cpumask_or(dom_cpumask, dom_cpumask, v->cpu_hard_affinity);
> +            cpumask_or(dom_cpumask, dom_cpumask,
> +                       v->sched_unit->cpu_hard_affinity);
>              cpumask_or(dom_cpumask_soft, dom_cpumask_soft,
> -                       v->cpu_soft_affinity);
> +                       v->sched_unit->cpu_soft_affinity);
>          }

There's not going to be a for_each_sched_unit(), is there? It
would mean less iterations here, and less redundant CPU mask
operations. Ah, that's the subject of patch 30.

> @@ -1226,7 +1215,7 @@ int vcpu_reset(struct vcpu *v)
>      v->async_exception_mask = 0;
>      memset(v->async_exception_state, 0, sizeof(v->async_exception_state));
>  #endif
> -    cpumask_clear(v->cpu_hard_affinity_tmp);
> +    cpumask_clear(v->sched_unit->cpu_hard_affinity_tmp);

Same issue as above - you're affecting other vCPU-s here.

> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -614,6 +614,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>      case XEN_DOMCTL_getvcpuaffinity:
>      {
>          struct vcpu *v;
> +        struct sched_unit *unit;

const?

> --- a/xen/common/keyhandler.c
> +++ b/xen/common/keyhandler.c
> @@ -312,8 +312,8 @@ static void dump_domains(unsigned char key)
>                  printk("dirty_cpu=%u", v->dirty_cpu);
>              printk("\n");
>              printk("    cpu_hard_affinity={%*pbl} cpu_soft_affinity={%*pbl}\n",
> -                   nr_cpu_ids, cpumask_bits(v->cpu_hard_affinity),
> -                   nr_cpu_ids, cpumask_bits(v->cpu_soft_affinity));
> +                   nr_cpu_ids, cpumask_bits(v->sched_unit->cpu_hard_affinity),
> +                   nr_cpu_ids, cpumask_bits(v->sched_unit->cpu_soft_affinity));

I don't see the value of logging the same information multiple times
(for each vCPU in a sched unit). I think you want to split this up.

> --- a/xen/common/wait.c
> +++ b/xen/common/wait.c
> @@ -132,7 +132,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
>  
>      /* Save current VCPU affinity; force wakeup on *this* CPU only. */
>      wqv->wakeup_cpu = smp_processor_id();
> -    cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
> +    cpumask_copy(&wqv->saved_affinity, curr->sched_unit->cpu_hard_affinity);
>      if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
>      {
>          gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
> @@ -199,7 +199,7 @@ void check_wakeup_from_wait(void)
>      {
>          /* Re-set VCPU affinity and re-enter the scheduler. */
>          struct vcpu *curr = current;
> -        cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
> +        cpumask_copy(&wqv->saved_affinity, curr->sched_unit->cpu_hard_affinity);
>          if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
>          {
>              gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");

Same problem as above - the consumer of ->saved_affinity will affect
vCPU-s other than the subject one.

> --- a/xen/include/xen/sched-if.h
> +++ b/xen/include/xen/sched-if.h
> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
>   * * The hard affinity is not a subset of soft affinity
>   * * There is an overlap between the soft and hard affinity masks
>   */
> -static inline int has_soft_affinity(const struct vcpu *v)
> +static inline int has_soft_affinity(const struct sched_unit *unit)
>  {
> -    return v->soft_aff_effective &&
> -           !cpumask_subset(cpupool_domain_cpumask(v->domain),
> -                           v->cpu_soft_affinity);
> +    return unit->soft_aff_effective &&
> +           !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain),
> +                           unit->cpu_soft_affinity);
>  }

Okay, at the moment there looks to be a 1:1 relationship between sched
units and vCPU-s. This would - at this point of the series - invalidate most
my earlier comments. However, in patch 57 I don't see how this unit->vcpu
mapping would get broken, and I can't seem to identify any other patch
where this might be happening. Looking at the github branch I also get the
impression that the struct vcpu * pointer out of struct sched_unit survives
until the end of the series, which doesn't seem right to me.

In any event, for the purpose here, I think there should be a backlink to
struct domain in struct sched_unit right away, and it should get used here.

> @@ -283,6 +265,22 @@ struct sched_unit {
>      void                  *priv;      /* scheduler private data */
>      struct sched_unit     *next_in_list;
>      struct sched_resource *res;
> +
> +    /* Last time when unit has been scheduled out. */
> +    uint64_t               last_run_time;
> +
> +    /* Item needs affinity restored. */
> +    bool                   affinity_broken;
> +    /* Does soft affinity actually play a role (given hard affinity)? */
> +    bool                   soft_aff_effective;
> +    /* Bitmask of CPUs on which this VCPU may run. */
> +    cpumask_var_t          cpu_hard_affinity;
> +    /* Used to change affinity temporarily. */
> +    cpumask_var_t          cpu_hard_affinity_tmp;
> +    /* Used to restore affinity across S3. */
> +    cpumask_var_t          cpu_hard_affinity_saved;
> +    /* Bitmask of CPUs on which this VCPU prefers to run. */
> +    cpumask_var_t          cpu_soft_affinity;
>  };

The mentions of "VCPU" in the comments here also survive till the end
of the series, which I also don't think is quite right.

> @@ -980,7 +978,7 @@ static inline bool is_hvm_vcpu(const struct vcpu *v)
>  static inline bool is_hwdom_pinned_vcpu(const struct vcpu *v)
>  {
>      return (is_hardware_domain(v->domain) &&
> -            cpumask_weight(v->cpu_hard_affinity) == 1);
> +            cpumask_weight(v->sched_unit->cpu_hard_affinity) == 1);
>  }

Seeing this - how is pinning (by command line option or by Dom0
doing this on itself transiently) going to work with core scheduling?

Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-07-01 14:08 [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit Jan Beulich
@ 2019-07-01 15:10 ` Juergen Gross
  2019-07-01 15:46   ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Juergen Gross @ 2019-07-01 15:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap,
	Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall, Meng Xu,
	Ian Jackson, xen-devel, Roger Pau Monne

On 01.07.19 16:08, Jan Beulich wrote:
>>>> On 28.05.19 at 12:32, <jgross@suse.com> wrote:
>> @@ -155,8 +156,8 @@ static void nmi_mce_softirq(void)
>>        * Set the tmp value unconditionally, so that the check in the iret
>>        * hypercall works.
>>        */
>> -    cpumask_copy(st->vcpu->cpu_hard_affinity_tmp,
>> -                 st->vcpu->cpu_hard_affinity);
>> +    cpumask_copy(st->vcpu->sched_unit->cpu_hard_affinity_tmp,
>> +                 st->vcpu->sched_unit->cpu_hard_affinity);
> 
> Aiui this affects all vCPU-s in the unit, which is unlikely to be what we
> want here: There's now only one cpu_hard_affinity_tmp for all vCPU-s
> in the unit, yet every vCPU in there may want to make use of the
> field in parallel.

Hmm, yes, we'll need a usage bitmask.

Please note that affecting all vcpus in the unit is per design. With
multiple vcpus of a unit needing this feature in parallel there is no
way they can have different needs regarding temporary affinity.

> 
> I also wonder how the code further down in this function fits with
> the scheduler unit concept. But perhaps that's going to be dealt with
> by later patches...
> 
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -125,11 +125,6 @@ static void vcpu_info_reset(struct vcpu *v)
>>   
>>   static void vcpu_destroy(struct vcpu *v)
>>   {
>> -    free_cpumask_var(v->cpu_hard_affinity);
>> -    free_cpumask_var(v->cpu_hard_affinity_tmp);
>> -    free_cpumask_var(v->cpu_hard_affinity_saved);
>> -    free_cpumask_var(v->cpu_soft_affinity);
>> -
>>       free_vcpu_struct(v);
>>   }
>>   
>> @@ -153,12 +148,6 @@ struct vcpu *vcpu_create(
>>   
>>       grant_table_init_vcpu(v);
>>   
>> -    if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) ||
>> -         !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) ||
>> -         !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) ||
>> -         !zalloc_cpumask_var(&v->cpu_soft_affinity) )
>> -        goto fail;
> 
> Seeing these, I'm actually having trouble understanding how you mean
> to retain the user visible interface behavior here: If you only store an
> affinity per sched unit, then how are you meaning to honor the vCPU
> granular requests coming in?

With core scheduling it is only possible to set (virtual) core
affinities. Whenever an affinity of a vcpu is being set it will set the
affinity of the whole unit.

> 
>> @@ -557,9 +545,10 @@ void domain_update_node_affinity(struct domain *d)
>>            */
>>           for_each_vcpu ( d, v )
>>           {
>> -            cpumask_or(dom_cpumask, dom_cpumask, v->cpu_hard_affinity);
>> +            cpumask_or(dom_cpumask, dom_cpumask,
>> +                       v->sched_unit->cpu_hard_affinity);
>>               cpumask_or(dom_cpumask_soft, dom_cpumask_soft,
>> -                       v->cpu_soft_affinity);
>> +                       v->sched_unit->cpu_soft_affinity);
>>           }
> 
> There's not going to be a for_each_sched_unit(), is there? It
> would mean less iterations here, and less redundant CPU mask
> operations. Ah, that's the subject of patch 30.

Right.

> 
>> @@ -1226,7 +1215,7 @@ int vcpu_reset(struct vcpu *v)
>>       v->async_exception_mask = 0;
>>       memset(v->async_exception_state, 0, sizeof(v->async_exception_state));
>>   #endif
>> -    cpumask_clear(v->cpu_hard_affinity_tmp);
>> +    cpumask_clear(v->sched_unit->cpu_hard_affinity_tmp);
> 
> Same issue as above - you're affecting other vCPU-s here.

Yes, we'll need a usage bitmask to be tested here.

> 
>> --- a/xen/common/domctl.c
>> +++ b/xen/common/domctl.c
>> @@ -614,6 +614,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>>       case XEN_DOMCTL_getvcpuaffinity:
>>       {
>>           struct vcpu *v;
>> +        struct sched_unit *unit;
> 
> const?
> 
>> --- a/xen/common/keyhandler.c
>> +++ b/xen/common/keyhandler.c
>> @@ -312,8 +312,8 @@ static void dump_domains(unsigned char key)
>>                   printk("dirty_cpu=%u", v->dirty_cpu);
>>               printk("\n");
>>               printk("    cpu_hard_affinity={%*pbl} cpu_soft_affinity={%*pbl}\n",
>> -                   nr_cpu_ids, cpumask_bits(v->cpu_hard_affinity),
>> -                   nr_cpu_ids, cpumask_bits(v->cpu_soft_affinity));
>> +                   nr_cpu_ids, cpumask_bits(v->sched_unit->cpu_hard_affinity),
>> +                   nr_cpu_ids, cpumask_bits(v->sched_unit->cpu_soft_affinity));
> 
> I don't see the value of logging the same information multiple times
> (for each vCPU in a sched unit). I think you want to split this up.

Yes, true.

> 
>> --- a/xen/common/wait.c
>> +++ b/xen/common/wait.c
>> @@ -132,7 +132,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
>>   
>>       /* Save current VCPU affinity; force wakeup on *this* CPU only. */
>>       wqv->wakeup_cpu = smp_processor_id();
>> -    cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
>> +    cpumask_copy(&wqv->saved_affinity, curr->sched_unit->cpu_hard_affinity);
>>       if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
>>       {
>>           gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
>> @@ -199,7 +199,7 @@ void check_wakeup_from_wait(void)
>>       {
>>           /* Re-set VCPU affinity and re-enter the scheduler. */
>>           struct vcpu *curr = current;
>> -        cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
>> +        cpumask_copy(&wqv->saved_affinity, curr->sched_unit->cpu_hard_affinity);
>>           if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
>>           {
>>               gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
> 
> Same problem as above - the consumer of ->saved_affinity will affect
> vCPU-s other than the subject one.

Yes.

> 
>> --- a/xen/include/xen/sched-if.h
>> +++ b/xen/include/xen/sched-if.h
>> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
>>    * * The hard affinity is not a subset of soft affinity
>>    * * There is an overlap between the soft and hard affinity masks
>>    */
>> -static inline int has_soft_affinity(const struct vcpu *v)
>> +static inline int has_soft_affinity(const struct sched_unit *unit)
>>   {
>> -    return v->soft_aff_effective &&
>> -           !cpumask_subset(cpupool_domain_cpumask(v->domain),
>> -                           v->cpu_soft_affinity);
>> +    return unit->soft_aff_effective &&
>> +           !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain),
>> +                           unit->cpu_soft_affinity);
>>   }
> 
> Okay, at the moment there looks to be a 1:1 relationship between sched
> units and vCPU-s. This would - at this point of the series - invalidate most
> my earlier comments. However, in patch 57 I don't see how this unit->vcpu
> mapping would get broken, and I can't seem to identify any other patch
> where this might be happening. Looking at the github branch I also get the
> impression that the struct vcpu * pointer out of struct sched_unit survives
> until the end of the series, which doesn't seem right to me.

It is right. The vcpu pointer in the sched_unit is pointing to the first
vcpu of the unit at the end of the series. Further vcpus are found via
v->next_in_list.

> In any event, for the purpose here, I think there should be a backlink to
> struct domain in struct sched_unit right away, and it should get used here.

See patch 15.

> 
>> @@ -283,6 +265,22 @@ struct sched_unit {
>>       void                  *priv;      /* scheduler private data */
>>       struct sched_unit     *next_in_list;
>>       struct sched_resource *res;
>> +
>> +    /* Last time when unit has been scheduled out. */
>> +    uint64_t               last_run_time;
>> +
>> +    /* Item needs affinity restored. */
>> +    bool                   affinity_broken;
>> +    /* Does soft affinity actually play a role (given hard affinity)? */
>> +    bool                   soft_aff_effective;
>> +    /* Bitmask of CPUs on which this VCPU may run. */
>> +    cpumask_var_t          cpu_hard_affinity;
>> +    /* Used to change affinity temporarily. */
>> +    cpumask_var_t          cpu_hard_affinity_tmp;
>> +    /* Used to restore affinity across S3. */
>> +    cpumask_var_t          cpu_hard_affinity_saved;
>> +    /* Bitmask of CPUs on which this VCPU prefers to run. */
>> +    cpumask_var_t          cpu_soft_affinity;
>>   };
> 
> The mentions of "VCPU" in the comments here also survive till the end
> of the series, which I also don't think is quite right.

Will modify.

> 
>> @@ -980,7 +978,7 @@ static inline bool is_hvm_vcpu(const struct vcpu *v)
>>   static inline bool is_hwdom_pinned_vcpu(const struct vcpu *v)
>>   {
>>       return (is_hardware_domain(v->domain) &&
>> -            cpumask_weight(v->cpu_hard_affinity) == 1);
>> +            cpumask_weight(v->sched_unit->cpu_hard_affinity) == 1);
>>   }
> 
> Seeing this - how is pinning (by command line option or by Dom0
> doing this on itself transiently) going to work with core scheduling?

In the end only the bit of the first vcpu of a unit will be set in the
affinity masks, affecting all vcpus of the unit.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-07-01 15:10 ` Juergen Gross
@ 2019-07-01 15:46   ` Jan Beulich
  2019-07-02  6:30     ` Juergen Gross
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2019-07-01 15:46 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap,
	Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall, Meng Xu,
	xen-devel, Ian Jackson, Roger Pau Monne

On 01.07.2019 17:10, Juergen Gross wrote:
> On 01.07.19 16:08, Jan Beulich wrote:
>>>>> On 28.05.19 at 12:32, <jgross@suse.com> wrote:
>>> @@ -155,8 +156,8 @@ static void nmi_mce_softirq(void)
>>>        * Set the tmp value unconditionally, so that the check in the iret
>>>        * hypercall works.
>>>        */
>>> -    cpumask_copy(st->vcpu->cpu_hard_affinity_tmp,
>>> -                 st->vcpu->cpu_hard_affinity);
>>> +    cpumask_copy(st->vcpu->sched_unit->cpu_hard_affinity_tmp,
>>> +                 st->vcpu->sched_unit->cpu_hard_affinity);
>>
>> Aiui this affects all vCPU-s in the unit, which is unlikely to be what we
>> want here: There's now only one cpu_hard_affinity_tmp for all vCPU-s
>> in the unit, yet every vCPU in there may want to make use of the
>> field in parallel.
> 
> Hmm, yes, we'll need a usage bitmask.
> 
> Please note that affecting all vcpus in the unit is per design. With
> multiple vcpus of a unit needing this feature in parallel there is no
> way they can have different needs regarding temporary affinity.

But how will this work? I.e. how will all vCPU-s in a unit get
their temporary affinity pointing to the one specific pCPU in question?
It's not just the "all at the same time" that I don't see working here,
I'm also having trouble seeing how the potential cross-core or cross-
node movement that's apparently needed here would end up working. I'm
not going to exclude that the set of possible pCPU-s a vCPU needs to
move to here is still within the unit, but then I'd expect assertions
to that effect to be added.

>>> --- a/xen/common/domain.c
>>> +++ b/xen/common/domain.c
>>> @@ -125,11 +125,6 @@ static void vcpu_info_reset(struct vcpu *v)
>>>   static void vcpu_destroy(struct vcpu *v)
>>>   {
>>> -    free_cpumask_var(v->cpu_hard_affinity);
>>> -    free_cpumask_var(v->cpu_hard_affinity_tmp);
>>> -    free_cpumask_var(v->cpu_hard_affinity_saved);
>>> -    free_cpumask_var(v->cpu_soft_affinity);
>>> -
>>>       free_vcpu_struct(v);
>>>   }
>>> @@ -153,12 +148,6 @@ struct vcpu *vcpu_create(
>>>       grant_table_init_vcpu(v);
>>> -    if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) ||
>>> -         !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) ||
>>> -         !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) ||
>>> -         !zalloc_cpumask_var(&v->cpu_soft_affinity) )
>>> -        goto fail;
>>
>> Seeing these, I'm actually having trouble understanding how you mean
>> to retain the user visible interface behavior here: If you only store an
>> affinity per sched unit, then how are you meaning to honor the vCPU
>> granular requests coming in?
> 
> With core scheduling it is only possible to set (virtual) core
> affinities. Whenever an affinity of a vcpu is being set it will set the
> affinity of the whole unit.

Hmm, that's indeed what I was deducing, but how will we sell this
to people actually fiddling with vCPU affinities? I foresee getting
bug reports that the respective xl command(s) do(es)n't do anymore
what it used to do.

>>> --- a/xen/include/xen/sched-if.h
>>> +++ b/xen/include/xen/sched-if.h
>>> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
>>>    * * The hard affinity is not a subset of soft affinity
>>>    * * There is an overlap between the soft and hard affinity masks
>>>    */
>>> -static inline int has_soft_affinity(const struct vcpu *v)
>>> +static inline int has_soft_affinity(const struct sched_unit *unit)
>>>   {
>>> -    return v->soft_aff_effective &&
>>> -           !cpumask_subset(cpupool_domain_cpumask(v->domain),
>>> -                           v->cpu_soft_affinity);
>>> +    return unit->soft_aff_effective &&
>>> +           !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain),
>>> +                           unit->cpu_soft_affinity);
>>>   }
>>
>> Okay, at the moment there looks to be a 1:1 relationship between sched
>> units and vCPU-s. This would - at this point of the series - invalidate most
>> my earlier comments. However, in patch 57 I don't see how this unit->vcpu
>> mapping would get broken, and I can't seem to identify any other patch
>> where this might be happening. Looking at the github branch I also get the
>> impression that the struct vcpu * pointer out of struct sched_unit survives
>> until the end of the series, which doesn't seem right to me.
> 
> It is right. The vcpu pointer in the sched_unit is pointing to the first
> vcpu of the unit at the end of the series. Further vcpus are found via
> v->next_in_list.

I'm afraid this sets us up for misunderstanding and misuse. I don't
think there should be a straight struct vcpu * out of struct sched_unit.

>>> @@ -980,7 +978,7 @@ static inline bool is_hvm_vcpu(const struct vcpu *v)
>>>   static inline bool is_hwdom_pinned_vcpu(const struct vcpu *v)
>>>   {
>>>       return (is_hardware_domain(v->domain) &&
>>> -            cpumask_weight(v->cpu_hard_affinity) == 1);
>>> +            cpumask_weight(v->sched_unit->cpu_hard_affinity) == 1);
>>>   }
>>
>> Seeing this - how is pinning (by command line option or by Dom0
>> doing this on itself transiently) going to work with core scheduling?
> 
> In the end only the bit of the first vcpu of a unit will be set in the
> affinity masks, affecting all vcpus of the unit.

I'm confused - what "bit of the first vcpu of a unit" are you referring
to?

To give an example of what I meant with my earlier reply: How is Dom0
requesting its vCPU 5 to be pinned to pCPU 3 going to be satisfied,
independent of the sched unit that vCPU 5 is associated with? Is the
whole sched unit getting moved over then? If so, what if another vCPU
in the same sched unit at the same time requests to be pinned to pCPU
17, on a different node/socket?

Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-07-01 15:46   ` Jan Beulich
@ 2019-07-02  6:30     ` Juergen Gross
  2019-07-02  7:54       ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Juergen Gross @ 2019-07-02  6:30 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap,
	Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall, Meng Xu,
	xen-devel, Ian Jackson, Roger Pau Monne

On 01.07.19 17:46, Jan Beulich wrote:
> On 01.07.2019 17:10, Juergen Gross wrote:
>> On 01.07.19 16:08, Jan Beulich wrote:
>>>>>> On 28.05.19 at 12:32, <jgross@suse.com> wrote:
>>>> @@ -155,8 +156,8 @@ static void nmi_mce_softirq(void)
>>>>         * Set the tmp value unconditionally, so that the check in the iret
>>>>         * hypercall works.
>>>>         */
>>>> -    cpumask_copy(st->vcpu->cpu_hard_affinity_tmp,
>>>> -                 st->vcpu->cpu_hard_affinity);
>>>> +    cpumask_copy(st->vcpu->sched_unit->cpu_hard_affinity_tmp,
>>>> +                 st->vcpu->sched_unit->cpu_hard_affinity);
>>>
>>> Aiui this affects all vCPU-s in the unit, which is unlikely to be what we
>>> want here: There's now only one cpu_hard_affinity_tmp for all vCPU-s
>>> in the unit, yet every vCPU in there may want to make use of the
>>> field in parallel.
>>
>> Hmm, yes, we'll need a usage bitmask.
>>
>> Please note that affecting all vcpus in the unit is per design. With
>> multiple vcpus of a unit needing this feature in parallel there is no
>> way they can have different needs regarding temporary affinity.
> 
> But how will this work? I.e. how will all vCPU-s in a unit get
> their temporary affinity pointing to the one specific pCPU in question?

The _unit_ is pinned, so all the vcpus in that unit are pinned, too.

> It's not just the "all at the same time" that I don't see working here,
> I'm also having trouble seeing how the potential cross-core or cross-
> node movement that's apparently needed here would end up working. I'm

The unit is moved to another core via normal scheduling mechanisms. As
switching context is synchronized (see patch 35) all vcpus of a unit are
moved together.

> not going to exclude that the set of possible pCPU-s a vCPU needs to
> move to here is still within the unit, but then I'd expect assertions
> to that effect to be added.

Okay, will do.

> 
>>>> --- a/xen/common/domain.c
>>>> +++ b/xen/common/domain.c
>>>> @@ -125,11 +125,6 @@ static void vcpu_info_reset(struct vcpu *v)
>>>>    static void vcpu_destroy(struct vcpu *v)
>>>>    {
>>>> -    free_cpumask_var(v->cpu_hard_affinity);
>>>> -    free_cpumask_var(v->cpu_hard_affinity_tmp);
>>>> -    free_cpumask_var(v->cpu_hard_affinity_saved);
>>>> -    free_cpumask_var(v->cpu_soft_affinity);
>>>> -
>>>>        free_vcpu_struct(v);
>>>>    }
>>>> @@ -153,12 +148,6 @@ struct vcpu *vcpu_create(
>>>>        grant_table_init_vcpu(v);
>>>> -    if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) ||
>>>> -         !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) ||
>>>> -         !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) ||
>>>> -         !zalloc_cpumask_var(&v->cpu_soft_affinity) )
>>>> -        goto fail;
>>>
>>> Seeing these, I'm actually having trouble understanding how you mean
>>> to retain the user visible interface behavior here: If you only store an
>>> affinity per sched unit, then how are you meaning to honor the vCPU
>>> granular requests coming in?
>>
>> With core scheduling it is only possible to set (virtual) core
>> affinities. Whenever an affinity of a vcpu is being set it will set the
>> affinity of the whole unit.
> 
> Hmm, that's indeed what I was deducing, but how will we sell this
> to people actually fiddling with vCPU affinities? I foresee getting
> bug reports that the respective xl command(s) do(es)n't do anymore
> what it used to do.

The new behavior must be documented, sure.

> 
>>>> --- a/xen/include/xen/sched-if.h
>>>> +++ b/xen/include/xen/sched-if.h
>>>> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
>>>>     * * The hard affinity is not a subset of soft affinity
>>>>     * * There is an overlap between the soft and hard affinity masks
>>>>     */
>>>> -static inline int has_soft_affinity(const struct vcpu *v)
>>>> +static inline int has_soft_affinity(const struct sched_unit *unit)
>>>>    {
>>>> -    return v->soft_aff_effective &&
>>>> -           !cpumask_subset(cpupool_domain_cpumask(v->domain),
>>>> -                           v->cpu_soft_affinity);
>>>> +    return unit->soft_aff_effective &&
>>>> +           !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain),
>>>> +                           unit->cpu_soft_affinity);
>>>>    }
>>>
>>> Okay, at the moment there looks to be a 1:1 relationship between sched
>>> units and vCPU-s. This would - at this point of the series - invalidate most
>>> my earlier comments. However, in patch 57 I don't see how this unit->vcpu
>>> mapping would get broken, and I can't seem to identify any other patch
>>> where this might be happening. Looking at the github branch I also get the
>>> impression that the struct vcpu * pointer out of struct sched_unit survives
>>> until the end of the series, which doesn't seem right to me.
>>
>> It is right. The vcpu pointer in the sched_unit is pointing to the first
>> vcpu of the unit at the end of the series. Further vcpus are found via
>> v->next_in_list.
> 
> I'm afraid this sets us up for misunderstanding and misuse. I don't
> think there should be a straight struct vcpu * out of struct sched_unit.

That was the most effective way to do it. What are you suggesting?

> 
>>>> @@ -980,7 +978,7 @@ static inline bool is_hvm_vcpu(const struct vcpu *v)
>>>>    static inline bool is_hwdom_pinned_vcpu(const struct vcpu *v)
>>>>    {
>>>>        return (is_hardware_domain(v->domain) &&
>>>> -            cpumask_weight(v->cpu_hard_affinity) == 1);
>>>> +            cpumask_weight(v->sched_unit->cpu_hard_affinity) == 1);
>>>>    }
>>>
>>> Seeing this - how is pinning (by command line option or by Dom0
>>> doing this on itself transiently) going to work with core scheduling?
>>
>> In the end only the bit of the first vcpu of a unit will be set in the
>> affinity masks, affecting all vcpus of the unit.
> 
> I'm confused - what "bit of the first vcpu of a unit" are you referring
> to?

Sorry, I meant to rewrite this sentence and forgot before sending.

But I think below explanation is making it rather clear.

> To give an example of what I meant with my earlier reply: How is Dom0
> requesting its vCPU 5 to be pinned to pCPU 3 going to be satisfied,
> independent of the sched unit that vCPU 5 is associated with? Is the
> whole sched unit getting moved over then? If so, what if another vCPU
> in the same sched unit at the same time requests to be pinned to pCPU
> 17, on a different node/socket?

Yes, the whole sched unit will be moved.

And in case a unit is already pinned and a conflicting request is made
the new request will either override the old pinning (in case that was
a "normal" pinning via xl command) or it will be rejected (in case the
old pinning was done via SCHEDOP_pin_override).


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-07-02  6:30     ` Juergen Gross
@ 2019-07-02  7:54       ` Jan Beulich
  2019-07-02  8:14         ` Juergen Gross
  2019-07-02  8:21         ` Dario Faggioli
  0 siblings, 2 replies; 21+ messages in thread
From: Jan Beulich @ 2019-07-02  7:54 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap,
	Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall, Meng Xu,
	xen-devel, Ian Jackson, Roger Pau Monne

On 02.07.2019 08:30, Juergen Gross wrote:
> On 01.07.19 17:46, Jan Beulich wrote:
>> On 01.07.2019 17:10, Juergen Gross wrote:
>>> On 01.07.19 16:08, Jan Beulich wrote:
>>>>>>> On 28.05.19 at 12:32, <jgross@suse.com> wrote:
>>>>> @@ -155,8 +156,8 @@ static void nmi_mce_softirq(void)
>>>>>         * Set the tmp value unconditionally, so that the check in the iret
>>>>>         * hypercall works.
>>>>>         */
>>>>> -    cpumask_copy(st->vcpu->cpu_hard_affinity_tmp,
>>>>> -                 st->vcpu->cpu_hard_affinity);
>>>>> +    cpumask_copy(st->vcpu->sched_unit->cpu_hard_affinity_tmp,
>>>>> +                 st->vcpu->sched_unit->cpu_hard_affinity);
>>>>
>>>> Aiui this affects all vCPU-s in the unit, which is unlikely to be what we
>>>> want here: There's now only one cpu_hard_affinity_tmp for all vCPU-s
>>>> in the unit, yet every vCPU in there may want to make use of the
>>>> field in parallel.
>>>
>>> Hmm, yes, we'll need a usage bitmask.
>>>
>>> Please note that affecting all vcpus in the unit is per design. With
>>> multiple vcpus of a unit needing this feature in parallel there is no
>>> way they can have different needs regarding temporary affinity.
>>
>> But how will this work? I.e. how will all vCPU-s in a unit get
>> their temporary affinity pointing to the one specific pCPU in question?
> 
> The _unit_ is pinned, so all the vcpus in that unit are pinned, too.

Yes, but ...

>> It's not just the "all at the same time" that I don't see working here,
>> I'm also having trouble seeing how the potential cross-core or cross-
>> node movement that's apparently needed here would end up working. I'm
> 
> The unit is moved to another core via normal scheduling mechanisms. As
> switching context is synchronized (see patch 35) all vcpus of a unit are
> moved together.

... they may get pinned to different pCPU-s or all the same pCPU here.
Both cases need to work, and I'm currently not seeing how that would
be achieved.

>>>>> --- a/xen/common/domain.c
>>>>> +++ b/xen/common/domain.c
>>>>> @@ -125,11 +125,6 @@ static void vcpu_info_reset(struct vcpu *v)
>>>>>    static void vcpu_destroy(struct vcpu *v)
>>>>>    {
>>>>> -    free_cpumask_var(v->cpu_hard_affinity);
>>>>> -    free_cpumask_var(v->cpu_hard_affinity_tmp);
>>>>> -    free_cpumask_var(v->cpu_hard_affinity_saved);
>>>>> -    free_cpumask_var(v->cpu_soft_affinity);
>>>>> -
>>>>>        free_vcpu_struct(v);
>>>>>    }
>>>>> @@ -153,12 +148,6 @@ struct vcpu *vcpu_create(
>>>>>        grant_table_init_vcpu(v);
>>>>> -    if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) ||
>>>>> -         !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) ||
>>>>> -         !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) ||
>>>>> -         !zalloc_cpumask_var(&v->cpu_soft_affinity) )
>>>>> -        goto fail;
>>>>
>>>> Seeing these, I'm actually having trouble understanding how you mean
>>>> to retain the user visible interface behavior here: If you only store an
>>>> affinity per sched unit, then how are you meaning to honor the vCPU
>>>> granular requests coming in?
>>>
>>> With core scheduling it is only possible to set (virtual) core
>>> affinities. Whenever an affinity of a vcpu is being set it will set the
>>> affinity of the whole unit.
>>
>> Hmm, that's indeed what I was deducing, but how will we sell this
>> to people actually fiddling with vCPU affinities? I foresee getting
>> bug reports that the respective xl command(s) do(es)n't do anymore
>> what it used to do.
> 
> The new behavior must be documented, sure.

Documentation is just one aspect. Often enough people only read docs
when wanting to introduce new functionality, which I consider a fair
model. Such people will be caught by surprise that the pinning
behavior does not work the same way anymore.

And again - if someone pins every vCPU to a single pCPU, that last
such pinning operation will be what takes long term effect. Aiui all
vCPU-s in the unit will then be pinned to that one pCPU, i.e.
they'll either all compete for the one pCPU's time, or only one of
them will ever get scheduled.

>>>>> --- a/xen/include/xen/sched-if.h
>>>>> +++ b/xen/include/xen/sched-if.h
>>>>> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
>>>>>     * * The hard affinity is not a subset of soft affinity
>>>>>     * * There is an overlap between the soft and hard affinity masks
>>>>>     */
>>>>> -static inline int has_soft_affinity(const struct vcpu *v)
>>>>> +static inline int has_soft_affinity(const struct sched_unit *unit)
>>>>>    {
>>>>> -    return v->soft_aff_effective &&
>>>>> -           !cpumask_subset(cpupool_domain_cpumask(v->domain),
>>>>> -                           v->cpu_soft_affinity);
>>>>> +    return unit->soft_aff_effective &&
>>>>> +           !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain),
>>>>> +                           unit->cpu_soft_affinity);
>>>>>    }
>>>>
>>>> Okay, at the moment there looks to be a 1:1 relationship between sched
>>>> units and vCPU-s. This would - at this point of the series - invalidate most
>>>> my earlier comments. However, in patch 57 I don't see how this unit->vcpu
>>>> mapping would get broken, and I can't seem to identify any other patch
>>>> where this might be happening. Looking at the github branch I also get the
>>>> impression that the struct vcpu * pointer out of struct sched_unit survives
>>>> until the end of the series, which doesn't seem right to me.
>>>
>>> It is right. The vcpu pointer in the sched_unit is pointing to the first
>>> vcpu of the unit at the end of the series. Further vcpus are found via
>>> v->next_in_list.
>>
>> I'm afraid this sets us up for misunderstanding and misuse. I don't
>> think there should be a straight struct vcpu * out of struct sched_unit.
> 
> That was the most effective way to do it. What are you suggesting?

An actual list, i.e. with a struct list_head. That'll make obvious that
more than one vCPU might be associated with a unit. That's even more so
that the ability to associate more than one appears only quite late in
the series, i.e. there may be further instances like the code above, and
it would require a careful audit (rather than the compiler finding such
instance) to determine all places where using the first vCPU in a unit
isn't really what was meant.

Once the list approach was enforced, in a second step we might then
discuss whether the list management is too much overhead, and hence
whether perhaps to switch to the simpler model you're using right now.

Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-07-02  7:54       ` Jan Beulich
@ 2019-07-02  8:14         ` Juergen Gross
  2019-07-02  8:27           ` Jan Beulich
  2019-07-02  8:21         ` Dario Faggioli
  1 sibling, 1 reply; 21+ messages in thread
From: Juergen Gross @ 2019-07-02  8:14 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap,
	Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall, Meng Xu,
	xen-devel, Ian Jackson, Roger Pau Monne

On 02.07.19 09:54, Jan Beulich wrote:
> On 02.07.2019 08:30, Juergen Gross wrote:
>> On 01.07.19 17:46, Jan Beulich wrote:
>>> On 01.07.2019 17:10, Juergen Gross wrote:
>>>> On 01.07.19 16:08, Jan Beulich wrote:
>>>>>>>> On 28.05.19 at 12:32, <jgross@suse.com> wrote:
>>>>>> @@ -155,8 +156,8 @@ static void nmi_mce_softirq(void)
>>>>>>          * Set the tmp value unconditionally, so that the check in the iret
>>>>>>          * hypercall works.
>>>>>>          */
>>>>>> -    cpumask_copy(st->vcpu->cpu_hard_affinity_tmp,
>>>>>> -                 st->vcpu->cpu_hard_affinity);
>>>>>> +    cpumask_copy(st->vcpu->sched_unit->cpu_hard_affinity_tmp,
>>>>>> +                 st->vcpu->sched_unit->cpu_hard_affinity);
>>>>>
>>>>> Aiui this affects all vCPU-s in the unit, which is unlikely to be what we
>>>>> want here: There's now only one cpu_hard_affinity_tmp for all vCPU-s
>>>>> in the unit, yet every vCPU in there may want to make use of the
>>>>> field in parallel.
>>>>
>>>> Hmm, yes, we'll need a usage bitmask.
>>>>
>>>> Please note that affecting all vcpus in the unit is per design. With
>>>> multiple vcpus of a unit needing this feature in parallel there is no
>>>> way they can have different needs regarding temporary affinity.
>>>
>>> But how will this work? I.e. how will all vCPU-s in a unit get
>>> their temporary affinity pointing to the one specific pCPU in question?
>>
>> The _unit_ is pinned, so all the vcpus in that unit are pinned, too.
> 
> Yes, but ...
> 
>>> It's not just the "all at the same time" that I don't see working here,
>>> I'm also having trouble seeing how the potential cross-core or cross-
>>> node movement that's apparently needed here would end up working. I'm
>>
>> The unit is moved to another core via normal scheduling mechanisms. As
>> switching context is synchronized (see patch 35) all vcpus of a unit are
>> moved together.
> 
> ... they may get pinned to different pCPU-s or all the same pCPU here.
> Both cases need to work, and I'm currently not seeing how that would
> be achieved.
> 
>>>>>> --- a/xen/common/domain.c
>>>>>> +++ b/xen/common/domain.c
>>>>>> @@ -125,11 +125,6 @@ static void vcpu_info_reset(struct vcpu *v)
>>>>>>     static void vcpu_destroy(struct vcpu *v)
>>>>>>     {
>>>>>> -    free_cpumask_var(v->cpu_hard_affinity);
>>>>>> -    free_cpumask_var(v->cpu_hard_affinity_tmp);
>>>>>> -    free_cpumask_var(v->cpu_hard_affinity_saved);
>>>>>> -    free_cpumask_var(v->cpu_soft_affinity);
>>>>>> -
>>>>>>         free_vcpu_struct(v);
>>>>>>     }
>>>>>> @@ -153,12 +148,6 @@ struct vcpu *vcpu_create(
>>>>>>         grant_table_init_vcpu(v);
>>>>>> -    if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) ||
>>>>>> -         !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) ||
>>>>>> -         !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) ||
>>>>>> -         !zalloc_cpumask_var(&v->cpu_soft_affinity) )
>>>>>> -        goto fail;
>>>>>
>>>>> Seeing these, I'm actually having trouble understanding how you mean
>>>>> to retain the user visible interface behavior here: If you only store an
>>>>> affinity per sched unit, then how are you meaning to honor the vCPU
>>>>> granular requests coming in?
>>>>
>>>> With core scheduling it is only possible to set (virtual) core
>>>> affinities. Whenever an affinity of a vcpu is being set it will set the
>>>> affinity of the whole unit.
>>>
>>> Hmm, that's indeed what I was deducing, but how will we sell this
>>> to people actually fiddling with vCPU affinities? I foresee getting
>>> bug reports that the respective xl command(s) do(es)n't do anymore
>>> what it used to do.
>>
>> The new behavior must be documented, sure.
> 
> Documentation is just one aspect. Often enough people only read docs
> when wanting to introduce new functionality, which I consider a fair
> model. Such people will be caught by surprise that the pinning
> behavior does not work the same way anymore.
> 
> And again - if someone pins every vCPU to a single pCPU, that last
> such pinning operation will be what takes long term effect. Aiui all
> vCPU-s in the unit will then be pinned to that one pCPU, i.e.
> they'll either all compete for the one pCPU's time, or only one of
> them will ever get scheduled.

No, that's not how it works. Lets say we have a system with the
following topology and core scheduling active:

cpu0: core 0, thread 0
cpu1: core 0, thread 1
cpu2: core 1, thread 0
cpu3: core 1, thread 1

Then any even numbered vcpu will only ever be scheduled on cpu0 or cpu2,
while any odd numbered vcpu will only run on cpu1 or cpu3.

So virtual cores get scheduled on physical cores. Virtual thread 0 will
only run on physical thread 0 and the associated virtual thread 1 will
run on the associated physical thread 1 of the same physical core.

Pinning a virtual thread 1 to a physical thread 0 is not possible (in
reality only the virtual core is being pinned to the physical core).


Juergen

> 
>>>>>> --- a/xen/include/xen/sched-if.h
>>>>>> +++ b/xen/include/xen/sched-if.h
>>>>>> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
>>>>>>      * * The hard affinity is not a subset of soft affinity
>>>>>>      * * There is an overlap between the soft and hard affinity masks
>>>>>>      */
>>>>>> -static inline int has_soft_affinity(const struct vcpu *v)
>>>>>> +static inline int has_soft_affinity(const struct sched_unit *unit)
>>>>>>     {
>>>>>> -    return v->soft_aff_effective &&
>>>>>> -           !cpumask_subset(cpupool_domain_cpumask(v->domain),
>>>>>> -                           v->cpu_soft_affinity);
>>>>>> +    return unit->soft_aff_effective &&
>>>>>> +           !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain),
>>>>>> +                           unit->cpu_soft_affinity);
>>>>>>     }
>>>>>
>>>>> Okay, at the moment there looks to be a 1:1 relationship between sched
>>>>> units and vCPU-s. This would - at this point of the series - invalidate most
>>>>> my earlier comments. However, in patch 57 I don't see how this unit->vcpu
>>>>> mapping would get broken, and I can't seem to identify any other patch
>>>>> where this might be happening. Looking at the github branch I also get the
>>>>> impression that the struct vcpu * pointer out of struct sched_unit survives
>>>>> until the end of the series, which doesn't seem right to me.
>>>>
>>>> It is right. The vcpu pointer in the sched_unit is pointing to the first
>>>> vcpu of the unit at the end of the series. Further vcpus are found via
>>>> v->next_in_list.
>>>
>>> I'm afraid this sets us up for misunderstanding and misuse. I don't
>>> think there should be a straight struct vcpu * out of struct sched_unit.
>>
>> That was the most effective way to do it. What are you suggesting?
> 
> An actual list, i.e. with a struct list_head. That'll make obvious that
> more than one vCPU might be associated with a unit. That's even more so
> that the ability to associate more than one appears only quite late in
> the series, i.e. there may be further instances like the code above, and
> it would require a careful audit (rather than the compiler finding such
> instance) to determine all places where using the first vCPU in a unit
> isn't really what was meant.

TBH I don't see how this would help at all.

Instead of using the vcpu pointer I'd had to use the list head instead.
Why is that different to a plain pointer regarding finding the places
where using the first vcpu was wrong?


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-07-02  7:54       ` Jan Beulich
  2019-07-02  8:14         ` Juergen Gross
@ 2019-07-02  8:21         ` Dario Faggioli
  2019-07-02  8:29           ` Jan Beulich
  1 sibling, 1 reply; 21+ messages in thread
From: Dario Faggioli @ 2019-07-02  8:21 UTC (permalink / raw)
  To: Jan Beulich, Juergen Gross
  Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap,
	Andrew Cooper, Tim Deegan, Julien Grall, Meng Xu, xen-devel,
	Ian Jackson, Roger Pau Monne


[-- Attachment #1.1: Type: text/plain, Size: 1647 bytes --]

On Tue, 2019-07-02 at 07:54 +0000, Jan Beulich wrote:
> On 02.07.2019 08:30, Juergen Gross wrote:
> > On 01.07.19 17:46, Jan Beulich wrote:
> > > 
> > > Hmm, that's indeed what I was deducing, but how will we sell this
> > > to people actually fiddling with vCPU affinities? I foresee
> > > getting
> > > bug reports that the respective xl command(s) do(es)n't do
> > > anymore
> > > what it used to do.
> > 
> > The new behavior must be documented, sure.
> 
> Documentation is just one aspect. Often enough people only read docs
> when wanting to introduce new functionality, which I consider a fair
> model. Such people will be caught by surprise that the pinning
> behavior does not work the same way anymore.
> 
That is indeed the case, and we need to think about how to address it,
I agree.

> And again - if someone pins every vCPU to a single pCPU, that last
> such pinning operation will be what takes long term effect. Aiui all
> vCPU-s in the unit will then be pinned to that one pCPU, i.e.
> they'll either all compete for the one pCPU's time, or only one of
> them will ever get scheduled.
> 
I'm not sure I'm getting this. On an, say, SMT system, with 4 threads
per core, a unit is 4 vCPUs and a pCPU is 4 threads.

If we pin all the 4 vCPUs of a unit to one 4 thread pCPU, each vCPU
will get a thread.

Isn't it so?

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-07-02  8:14         ` Juergen Gross
@ 2019-07-02  8:27           ` Jan Beulich
  2019-07-02  8:44             ` Juergen Gross
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2019-07-02  8:27 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap,
	Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall, Meng Xu,
	xen-devel, Ian Jackson, Roger Pau Monne

On 02.07.2019 10:14, Juergen Gross wrote:
> On 02.07.19 09:54, Jan Beulich wrote:
>> And again - if someone pins every vCPU to a single pCPU, that last
>> such pinning operation will be what takes long term effect. Aiui all
>> vCPU-s in the unit will then be pinned to that one pCPU, i.e.
>> they'll either all compete for the one pCPU's time, or only one of
>> them will ever get scheduled.
> 
> No, that's not how it works. Lets say we have a system with the
> following topology and core scheduling active:
> 
> cpu0: core 0, thread 0
> cpu1: core 0, thread 1
> cpu2: core 1, thread 0
> cpu3: core 1, thread 1
> 
> Then any even numbered vcpu will only ever be scheduled on cpu0 or cpu2,
> while any odd numbered vcpu will only run on cpu1 or cpu3.
> 
> So virtual cores get scheduled on physical cores. Virtual thread 0 will
> only run on physical thread 0 and the associated virtual thread 1 will
> run on the associated physical thread 1 of the same physical core.
> 
> Pinning a virtual thread 1 to a physical thread 0 is not possible (in
> reality only the virtual core is being pinned to the physical core).

But that's what existing guests may be doing. You may want to
take a look at our old, non-pvops kernels, in particular the
functionality provided by their drivers/xen/core/domctl.c. Yes,
{sys,dom}ctl-s aren't supposed to be used by the kernel, but to
achieve the intended effects I saw no way around (ab)using them.
(I mean this to be taken as an example only - I realize that the
code there wouldn't work on modern Xen without updating, due to
the sysctl/domctl interface version that needs setting.)

>>>>>>> --- a/xen/include/xen/sched-if.h
>>>>>>> +++ b/xen/include/xen/sched-if.h
>>>>>>> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
>>>>>>>      * * The hard affinity is not a subset of soft affinity
>>>>>>>      * * There is an overlap between the soft and hard affinity masks
>>>>>>>      */
>>>>>>> -static inline int has_soft_affinity(const struct vcpu *v)
>>>>>>> +static inline int has_soft_affinity(const struct sched_unit *unit)
>>>>>>>     {
>>>>>>> -    return v->soft_aff_effective &&
>>>>>>> -           !cpumask_subset(cpupool_domain_cpumask(v->domain),
>>>>>>> -                           v->cpu_soft_affinity);
>>>>>>> +    return unit->soft_aff_effective &&
>>>>>>> +           !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain),
>>>>>>> +                           unit->cpu_soft_affinity);
>>>>>>>     }
>>>>>>
>>>>>> Okay, at the moment there looks to be a 1:1 relationship between sched
>>>>>> units and vCPU-s. This would - at this point of the series - invalidate most
>>>>>> my earlier comments. However, in patch 57 I don't see how this unit->vcpu
>>>>>> mapping would get broken, and I can't seem to identify any other patch
>>>>>> where this might be happening. Looking at the github branch I also get the
>>>>>> impression that the struct vcpu * pointer out of struct sched_unit survives
>>>>>> until the end of the series, which doesn't seem right to me.
>>>>>
>>>>> It is right. The vcpu pointer in the sched_unit is pointing to the first
>>>>> vcpu of the unit at the end of the series. Further vcpus are found via
>>>>> v->next_in_list.
>>>>
>>>> I'm afraid this sets us up for misunderstanding and misuse. I don't
>>>> think there should be a straight struct vcpu * out of struct sched_unit.
>>>
>>> That was the most effective way to do it. What are you suggesting?
>>
>> An actual list, i.e. with a struct list_head. That'll make obvious that
>> more than one vCPU might be associated with a unit. That's even more so
>> that the ability to associate more than one appears only quite late in
>> the series, i.e. there may be further instances like the code above, and
>> it would require a careful audit (rather than the compiler finding such
>> instance) to determine all places where using the first vCPU in a unit
>> isn't really what was meant.
> 
> TBH I don't see how this would help at all.
> 
> Instead of using the vcpu pointer I'd had to use the list head instead.
> Why is that different to a plain pointer regarding finding the places
> where using the first vcpu was wrong?

Take the example above: Is it correct to act on just the first vCPU?
I guess _here_ it is, but the same pattern could be found elsewhere.
If, from the beginning, you use a clearly identifiable list construct,
then it'll be obvious to you as the writer and to reviewers that by
the end of the series there may be multiple entities that need dealing
with - we'd see list_first*() or for_each*() constructs right away
(and you wouldn't be able to circumvent their use in a way that
wouldn't trigger "don't open-code" comments).

Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-07-02  8:21         ` Dario Faggioli
@ 2019-07-02  8:29           ` Jan Beulich
  2019-07-02  9:40             ` Dario Faggioli
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2019-07-02  8:29 UTC (permalink / raw)
  To: Dario Faggioli, Juergen Gross
  Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap,
	Andrew Cooper, Tim Deegan, Julien Grall, Meng Xu, xen-devel,
	Ian Jackson, Roger Pau Monne

On 02.07.2019 10:21, Dario Faggioli wrote:
> On Tue, 2019-07-02 at 07:54 +0000, Jan Beulich wrote:
>> On 02.07.2019 08:30, Juergen Gross wrote:
>>> On 01.07.19 17:46, Jan Beulich wrote:
>>>>
>>>> Hmm, that's indeed what I was deducing, but how will we sell this
>>>> to people actually fiddling with vCPU affinities? I foresee
>>>> getting
>>>> bug reports that the respective xl command(s) do(es)n't do
>>>> anymore
>>>> what it used to do.
>>>
>>> The new behavior must be documented, sure.
>>
>> Documentation is just one aspect. Often enough people only read docs
>> when wanting to introduce new functionality, which I consider a fair
>> model. Such people will be caught by surprise that the pinning
>> behavior does not work the same way anymore.
>>
> That is indeed the case, and we need to think about how to address it,
> I agree.
> 
>> And again - if someone pins every vCPU to a single pCPU, that last
>> such pinning operation will be what takes long term effect. Aiui all
>> vCPU-s in the unit will then be pinned to that one pCPU, i.e.
>> they'll either all compete for the one pCPU's time, or only one of
>> them will ever get scheduled.
>>
> I'm not sure I'm getting this. On an, say, SMT system, with 4 threads
> per core, a unit is 4 vCPUs and a pCPU is 4 threads.

No, the meaning of pCPU is a single thread of a single core. I.e.
what is represented by a single cpumask_t bit.

Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-07-02  8:27           ` Jan Beulich
@ 2019-07-02  8:44             ` Juergen Gross
  2019-07-02  9:05               ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Juergen Gross @ 2019-07-02  8:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap,
	Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall, Meng Xu,
	xen-devel, Ian Jackson, Roger Pau Monne

On 02.07.19 10:27, Jan Beulich wrote:
> On 02.07.2019 10:14, Juergen Gross wrote:
>> On 02.07.19 09:54, Jan Beulich wrote:
>>> And again - if someone pins every vCPU to a single pCPU, that last
>>> such pinning operation will be what takes long term effect. Aiui all
>>> vCPU-s in the unit will then be pinned to that one pCPU, i.e.
>>> they'll either all compete for the one pCPU's time, or only one of
>>> them will ever get scheduled.
>>
>> No, that's not how it works. Lets say we have a system with the
>> following topology and core scheduling active:
>>
>> cpu0: core 0, thread 0
>> cpu1: core 0, thread 1
>> cpu2: core 1, thread 0
>> cpu3: core 1, thread 1
>>
>> Then any even numbered vcpu will only ever be scheduled on cpu0 or cpu2,
>> while any odd numbered vcpu will only run on cpu1 or cpu3.
>>
>> So virtual cores get scheduled on physical cores. Virtual thread 0 will
>> only run on physical thread 0 and the associated virtual thread 1 will
>> run on the associated physical thread 1 of the same physical core.
>>
>> Pinning a virtual thread 1 to a physical thread 0 is not possible (in
>> reality only the virtual core is being pinned to the physical core).
> 
> But that's what existing guests may be doing. You may want to
> take a look at our old, non-pvops kernels, in particular the
> functionality provided by their drivers/xen/core/domctl.c. Yes,
> {sys,dom}ctl-s aren't supposed to be used by the kernel, but to
> achieve the intended effects I saw no way around (ab)using them.
> (I mean this to be taken as an example only - I realize that the
> code there wouldn't work on modern Xen without updating, due to
> the sysctl/domctl interface version that needs setting.)

First - speaking of "guests" is a little bit misleading here. This is
restricted to dom0.

So when you want to use such a dom0 kernel with Xen 4.13 or later you'd
need to stay with cpu scheduling. Any recent kernel will run just fine
as dom0 with core scheduling active.

> 
>>>>>>>> --- a/xen/include/xen/sched-if.h
>>>>>>>> +++ b/xen/include/xen/sched-if.h
>>>>>>>> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
>>>>>>>>       * * The hard affinity is not a subset of soft affinity
>>>>>>>>       * * There is an overlap between the soft and hard affinity masks
>>>>>>>>       */
>>>>>>>> -static inline int has_soft_affinity(const struct vcpu *v)
>>>>>>>> +static inline int has_soft_affinity(const struct sched_unit *unit)
>>>>>>>>      {
>>>>>>>> -    return v->soft_aff_effective &&
>>>>>>>> -           !cpumask_subset(cpupool_domain_cpumask(v->domain),
>>>>>>>> -                           v->cpu_soft_affinity);
>>>>>>>> +    return unit->soft_aff_effective &&
>>>>>>>> +           !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain),
>>>>>>>> +                           unit->cpu_soft_affinity);
>>>>>>>>      }
>>>>>>>
>>>>>>> Okay, at the moment there looks to be a 1:1 relationship between sched
>>>>>>> units and vCPU-s. This would - at this point of the series - invalidate most
>>>>>>> my earlier comments. However, in patch 57 I don't see how this unit->vcpu
>>>>>>> mapping would get broken, and I can't seem to identify any other patch
>>>>>>> where this might be happening. Looking at the github branch I also get the
>>>>>>> impression that the struct vcpu * pointer out of struct sched_unit survives
>>>>>>> until the end of the series, which doesn't seem right to me.
>>>>>>
>>>>>> It is right. The vcpu pointer in the sched_unit is pointing to the first
>>>>>> vcpu of the unit at the end of the series. Further vcpus are found via
>>>>>> v->next_in_list.
>>>>>
>>>>> I'm afraid this sets us up for misunderstanding and misuse. I don't
>>>>> think there should be a straight struct vcpu * out of struct sched_unit.
>>>>
>>>> That was the most effective way to do it. What are you suggesting?
>>>
>>> An actual list, i.e. with a struct list_head. That'll make obvious that
>>> more than one vCPU might be associated with a unit. That's even more so
>>> that the ability to associate more than one appears only quite late in
>>> the series, i.e. there may be further instances like the code above, and
>>> it would require a careful audit (rather than the compiler finding such
>>> instance) to determine all places where using the first vCPU in a unit
>>> isn't really what was meant.
>>
>> TBH I don't see how this would help at all.
>>
>> Instead of using the vcpu pointer I'd had to use the list head instead.
>> Why is that different to a plain pointer regarding finding the places
>> where using the first vcpu was wrong?
> 
> Take the example above: Is it correct to act on just the first vCPU?
> I guess _here_ it is, but the same pattern could be found elsewhere.
> If, from the beginning, you use a clearly identifiable list construct,
> then it'll be obvious to you as the writer and to reviewers that by
> the end of the series there may be multiple entities that need dealing
> with - we'd see list_first*() or for_each*() constructs right away
> (and you wouldn't be able to circumvent their use in a way that
> wouldn't trigger "don't open-code" comments).

Would you be fine with just renaming the pointer to "vcpu_list"?
This would avoid the need to introduce another vcpu list in struct vcpu
and I could re-use the already existing list as today.

It should be noted that I named the pointer just "vcpu" in order to
avoid lots of reformatting due to longer lines, though.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-07-02  8:44             ` Juergen Gross
@ 2019-07-02  9:05               ` Jan Beulich
  2019-07-02  9:16                 ` Juergen Gross
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2019-07-02  9:05 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap,
	Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall, Meng Xu,
	xen-devel, Ian Jackson, Roger Pau Monne

On 02.07.2019 10:44, Juergen Gross wrote:
> On 02.07.19 10:27, Jan Beulich wrote:
>> On 02.07.2019 10:14, Juergen Gross wrote:
>>> On 02.07.19 09:54, Jan Beulich wrote:
>>>> And again - if someone pins every vCPU to a single pCPU, that last
>>>> such pinning operation will be what takes long term effect. Aiui all
>>>> vCPU-s in the unit will then be pinned to that one pCPU, i.e.
>>>> they'll either all compete for the one pCPU's time, or only one of
>>>> them will ever get scheduled.
>>>
>>> No, that's not how it works. Lets say we have a system with the
>>> following topology and core scheduling active:
>>>
>>> cpu0: core 0, thread 0
>>> cpu1: core 0, thread 1
>>> cpu2: core 1, thread 0
>>> cpu3: core 1, thread 1
>>>
>>> Then any even numbered vcpu will only ever be scheduled on cpu0 or cpu2,
>>> while any odd numbered vcpu will only run on cpu1 or cpu3.
>>>
>>> So virtual cores get scheduled on physical cores. Virtual thread 0 will
>>> only run on physical thread 0 and the associated virtual thread 1 will
>>> run on the associated physical thread 1 of the same physical core.
>>>
>>> Pinning a virtual thread 1 to a physical thread 0 is not possible (in
>>> reality only the virtual core is being pinned to the physical core).
>>
>> But that's what existing guests may be doing. You may want to
>> take a look at our old, non-pvops kernels, in particular the
>> functionality provided by their drivers/xen/core/domctl.c. Yes,
>> {sys,dom}ctl-s aren't supposed to be used by the kernel, but to
>> achieve the intended effects I saw no way around (ab)using them.
>> (I mean this to be taken as an example only - I realize that the
>> code there wouldn't work on modern Xen without updating, due to
>> the sysctl/domctl interface version that needs setting.)
> 
> First - speaking of "guests" is a little bit misleading here. This is
> restricted to dom0.
> 
> So when you want to use such a dom0 kernel with Xen 4.13 or later you'd
> need to stay with cpu scheduling. Any recent kernel will run just fine
> as dom0 with core scheduling active.

Right, but such recent kernels have (afaict) no solution to some of
the problems that the (ab)use of the {sys,dom}ctl-s were meant to
address.

>>>>>>>>> --- a/xen/include/xen/sched-if.h
>>>>>>>>> +++ b/xen/include/xen/sched-if.h
>>>>>>>>> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
>>>>>>>>>       * * The hard affinity is not a subset of soft affinity
>>>>>>>>>       * * There is an overlap between the soft and hard affinity masks
>>>>>>>>>       */
>>>>>>>>> -static inline int has_soft_affinity(const struct vcpu *v)
>>>>>>>>> +static inline int has_soft_affinity(const struct sched_unit *unit)
>>>>>>>>>      {
>>>>>>>>> -    return v->soft_aff_effective &&
>>>>>>>>> -           !cpumask_subset(cpupool_domain_cpumask(v->domain),
>>>>>>>>> -                           v->cpu_soft_affinity);
>>>>>>>>> +    return unit->soft_aff_effective &&
>>>>>>>>> +           !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain),
>>>>>>>>> +                           unit->cpu_soft_affinity);
>>>>>>>>>      }
>>>>>>>>
>>>>>>>> Okay, at the moment there looks to be a 1:1 relationship between sched
>>>>>>>> units and vCPU-s. This would - at this point of the series - invalidate most
>>>>>>>> my earlier comments. However, in patch 57 I don't see how this unit->vcpu
>>>>>>>> mapping would get broken, and I can't seem to identify any other patch
>>>>>>>> where this might be happening. Looking at the github branch I also get the
>>>>>>>> impression that the struct vcpu * pointer out of struct sched_unit survives
>>>>>>>> until the end of the series, which doesn't seem right to me.
>>>>>>>
>>>>>>> It is right. The vcpu pointer in the sched_unit is pointing to the first
>>>>>>> vcpu of the unit at the end of the series. Further vcpus are found via
>>>>>>> v->next_in_list.
>>>>>>
>>>>>> I'm afraid this sets us up for misunderstanding and misuse. I don't
>>>>>> think there should be a straight struct vcpu * out of struct sched_unit.
>>>>>
>>>>> That was the most effective way to do it. What are you suggesting?
>>>>
>>>> An actual list, i.e. with a struct list_head. That'll make obvious that
>>>> more than one vCPU might be associated with a unit. That's even more so
>>>> that the ability to associate more than one appears only quite late in
>>>> the series, i.e. there may be further instances like the code above, and
>>>> it would require a careful audit (rather than the compiler finding such
>>>> instance) to determine all places where using the first vCPU in a unit
>>>> isn't really what was meant.
>>>
>>> TBH I don't see how this would help at all.
>>>
>>> Instead of using the vcpu pointer I'd had to use the list head instead.
>>> Why is that different to a plain pointer regarding finding the places
>>> where using the first vcpu was wrong?
>>
>> Take the example above: Is it correct to act on just the first vCPU?
>> I guess _here_ it is, but the same pattern could be found elsewhere.
>> If, from the beginning, you use a clearly identifiable list construct,
>> then it'll be obvious to you as the writer and to reviewers that by
>> the end of the series there may be multiple entities that need dealing
>> with - we'd see list_first*() or for_each*() constructs right away
>> (and you wouldn't be able to circumvent their use in a way that
>> wouldn't trigger "don't open-code" comments).
> 
> Would you be fine with just renaming the pointer to "vcpu_list"?
> This would avoid the need to introduce another vcpu list in struct vcpu
> and I could re-use the already existing list as today.

Well, yes, this would at least make obvious at use sites that there
may be more than one of them.

> It should be noted that I named the pointer just "vcpu" in order to
> avoid lots of reformatting due to longer lines, though.

I can appreciate this aspect, but as said I'm afraid it would be
misleading (and potentially inviting for mis-use).

Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-07-02  9:05               ` Jan Beulich
@ 2019-07-02  9:16                 ` Juergen Gross
  0 siblings, 0 replies; 21+ messages in thread
From: Juergen Gross @ 2019-07-02  9:16 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap,
	Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall, Meng Xu,
	xen-devel, Ian Jackson, Roger Pau Monne

On 02.07.19 11:05, Jan Beulich wrote:
> On 02.07.2019 10:44, Juergen Gross wrote:
>> On 02.07.19 10:27, Jan Beulich wrote:
>>> On 02.07.2019 10:14, Juergen Gross wrote:
>>>> On 02.07.19 09:54, Jan Beulich wrote:
>>>>> And again - if someone pins every vCPU to a single pCPU, that last
>>>>> such pinning operation will be what takes long term effect. Aiui all
>>>>> vCPU-s in the unit will then be pinned to that one pCPU, i.e.
>>>>> they'll either all compete for the one pCPU's time, or only one of
>>>>> them will ever get scheduled.
>>>>
>>>> No, that's not how it works. Lets say we have a system with the
>>>> following topology and core scheduling active:
>>>>
>>>> cpu0: core 0, thread 0
>>>> cpu1: core 0, thread 1
>>>> cpu2: core 1, thread 0
>>>> cpu3: core 1, thread 1
>>>>
>>>> Then any even numbered vcpu will only ever be scheduled on cpu0 or cpu2,
>>>> while any odd numbered vcpu will only run on cpu1 or cpu3.
>>>>
>>>> So virtual cores get scheduled on physical cores. Virtual thread 0 will
>>>> only run on physical thread 0 and the associated virtual thread 1 will
>>>> run on the associated physical thread 1 of the same physical core.
>>>>
>>>> Pinning a virtual thread 1 to a physical thread 0 is not possible (in
>>>> reality only the virtual core is being pinned to the physical core).
>>>
>>> But that's what existing guests may be doing. You may want to
>>> take a look at our old, non-pvops kernels, in particular the
>>> functionality provided by their drivers/xen/core/domctl.c. Yes,
>>> {sys,dom}ctl-s aren't supposed to be used by the kernel, but to
>>> achieve the intended effects I saw no way around (ab)using them.
>>> (I mean this to be taken as an example only - I realize that the
>>> code there wouldn't work on modern Xen without updating, due to
>>> the sysctl/domctl interface version that needs setting.)
>>
>> First - speaking of "guests" is a little bit misleading here. This is
>> restricted to dom0.
>>
>> So when you want to use such a dom0 kernel with Xen 4.13 or later you'd
>> need to stay with cpu scheduling. Any recent kernel will run just fine
>> as dom0 with core scheduling active.
> 
> Right, but such recent kernels have (afaict) no solution to some of
> the problems that the (ab)use of the {sys,dom}ctl-s were meant to
> address.

With SCHEDOP_pin_override this should all be doable. The kernel would
need to execute the SCHEDOP_pin_override code on a suitable vcpu
(ideally with vcpu-id == pcpu-id).

I'd prefer new hypervisor interfaces for that purpose, though.

> 
>>>>>>>>>> --- a/xen/include/xen/sched-if.h
>>>>>>>>>> +++ b/xen/include/xen/sched-if.h
>>>>>>>>>> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
>>>>>>>>>>        * * The hard affinity is not a subset of soft affinity
>>>>>>>>>>        * * There is an overlap between the soft and hard affinity masks
>>>>>>>>>>        */
>>>>>>>>>> -static inline int has_soft_affinity(const struct vcpu *v)
>>>>>>>>>> +static inline int has_soft_affinity(const struct sched_unit *unit)
>>>>>>>>>>       {
>>>>>>>>>> -    return v->soft_aff_effective &&
>>>>>>>>>> -           !cpumask_subset(cpupool_domain_cpumask(v->domain),
>>>>>>>>>> -                           v->cpu_soft_affinity);
>>>>>>>>>> +    return unit->soft_aff_effective &&
>>>>>>>>>> +           !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain),
>>>>>>>>>> +                           unit->cpu_soft_affinity);
>>>>>>>>>>       }
>>>>>>>>>
>>>>>>>>> Okay, at the moment there looks to be a 1:1 relationship between sched
>>>>>>>>> units and vCPU-s. This would - at this point of the series - invalidate most
>>>>>>>>> my earlier comments. However, in patch 57 I don't see how this unit->vcpu
>>>>>>>>> mapping would get broken, and I can't seem to identify any other patch
>>>>>>>>> where this might be happening. Looking at the github branch I also get the
>>>>>>>>> impression that the struct vcpu * pointer out of struct sched_unit survives
>>>>>>>>> until the end of the series, which doesn't seem right to me.
>>>>>>>>
>>>>>>>> It is right. The vcpu pointer in the sched_unit is pointing to the first
>>>>>>>> vcpu of the unit at the end of the series. Further vcpus are found via
>>>>>>>> v->next_in_list.
>>>>>>>
>>>>>>> I'm afraid this sets us up for misunderstanding and misuse. I don't
>>>>>>> think there should be a straight struct vcpu * out of struct sched_unit.
>>>>>>
>>>>>> That was the most effective way to do it. What are you suggesting?
>>>>>
>>>>> An actual list, i.e. with a struct list_head. That'll make obvious that
>>>>> more than one vCPU might be associated with a unit. That's even more so
>>>>> that the ability to associate more than one appears only quite late in
>>>>> the series, i.e. there may be further instances like the code above, and
>>>>> it would require a careful audit (rather than the compiler finding such
>>>>> instance) to determine all places where using the first vCPU in a unit
>>>>> isn't really what was meant.
>>>>
>>>> TBH I don't see how this would help at all.
>>>>
>>>> Instead of using the vcpu pointer I'd had to use the list head instead.
>>>> Why is that different to a plain pointer regarding finding the places
>>>> where using the first vcpu was wrong?
>>>
>>> Take the example above: Is it correct to act on just the first vCPU?
>>> I guess _here_ it is, but the same pattern could be found elsewhere.
>>> If, from the beginning, you use a clearly identifiable list construct,
>>> then it'll be obvious to you as the writer and to reviewers that by
>>> the end of the series there may be multiple entities that need dealing
>>> with - we'd see list_first*() or for_each*() constructs right away
>>> (and you wouldn't be able to circumvent their use in a way that
>>> wouldn't trigger "don't open-code" comments).
>>
>> Would you be fine with just renaming the pointer to "vcpu_list"?
>> This would avoid the need to introduce another vcpu list in struct vcpu
>> and I could re-use the already existing list as today.
> 
> Well, yes, this would at least make obvious at use sites that there
> may be more than one of them.

Thanks.

> 
>> It should be noted that I named the pointer just "vcpu" in order to
>> avoid lots of reformatting due to longer lines, though.
> 
> I can appreciate this aspect, but as said I'm afraid it would be
> misleading (and potentially inviting for mis-use).

Okay, lets live with some more multi-line statements then.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-07-02  8:29           ` Jan Beulich
@ 2019-07-02  9:40             ` Dario Faggioli
  2019-07-02 10:01               ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Dario Faggioli @ 2019-07-02  9:40 UTC (permalink / raw)
  To: Jan Beulich, Juergen Gross
  Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap,
	Andrew Cooper, Tim Deegan, Julien Grall, Meng Xu, xen-devel,
	Ian Jackson, Roger Pau Monne


[-- Attachment #1.1: Type: text/plain, Size: 1411 bytes --]

On Tue, 2019-07-02 at 08:29 +0000, Jan Beulich wrote:
> On 02.07.2019 10:21, Dario Faggioli wrote:
> > On Tue, 2019-07-02 at 07:54 +0000, Jan Beulich wrote:
> > > 
> > > And again - if someone pins every vCPU to a single pCPU, that
> > > last
> > > such pinning operation will be what takes long term effect. Aiui
> > > all
> > > vCPU-s in the unit will then be pinned to that one pCPU, i.e.
> > > they'll either all compete for the one pCPU's time, or only one
> > > of
> > > them will ever get scheduled.
> > > 
> > I'm not sure I'm getting this. On an, say, SMT system, with 4
> > threads
> > per core, a unit is 4 vCPUs and a pCPU is 4 threads.
> 
> No, the meaning of pCPU is a single thread of a single core. I.e.
> what is represented by a single cpumask_t bit.
>
Fine, let's continue to call that a pCPU. Then, when core-scheduling is
enabled, there's no <<multiple vCPUs of a unit being pinned to the same
pCPU and all competing for jut its CPU time>>.

There's units of 4 vCPUs being pinned to 4 pCPUs (the 4 pCPUs of a
core, not 4 random, nor arbitrary, ones).

That is the point, AFAIUI.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-07-02  9:40             ` Dario Faggioli
@ 2019-07-02 10:01               ` Jan Beulich
  2019-07-02 10:25                 ` Juergen Gross
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2019-07-02 10:01 UTC (permalink / raw)
  To: Dario Faggioli, Juergen Gross
  Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap,
	Andrew Cooper, Tim Deegan, Julien Grall, Meng Xu, xen-devel,
	Ian Jackson, Roger Pau Monne

On 02.07.2019 11:40, Dario Faggioli wrote:
> On Tue, 2019-07-02 at 08:29 +0000, Jan Beulich wrote:
>> On 02.07.2019 10:21, Dario Faggioli wrote:
>>> On Tue, 2019-07-02 at 07:54 +0000, Jan Beulich wrote:
>>>>
>>>> And again - if someone pins every vCPU to a single pCPU, that
>>>> last
>>>> such pinning operation will be what takes long term effect. Aiui
>>>> all
>>>> vCPU-s in the unit will then be pinned to that one pCPU, i.e.
>>>> they'll either all compete for the one pCPU's time, or only one
>>>> of
>>>> them will ever get scheduled.
>>>>
>>> I'm not sure I'm getting this. On an, say, SMT system, with 4
>>> threads
>>> per core, a unit is 4 vCPUs and a pCPU is 4 threads.
>>
>> No, the meaning of pCPU is a single thread of a single core. I.e.
>> what is represented by a single cpumask_t bit.
>>
> Fine, let's continue to call that a pCPU. Then, when core-scheduling is
> enabled, there's no <<multiple vCPUs of a unit being pinned to the same
> pCPU and all competing for jut its CPU time>>.
> 
> There's units of 4 vCPUs being pinned to 4 pCPUs (the 4 pCPUs of a
> core, not 4 random, nor arbitrary, ones).
> 
> That is the point, AFAIUI.

Well, okay, quite possible. But then for the example topology
you gave, what is going to be the effect of

xl vcpu-pin 0 0 0

? In particular for Dom0 and in particular for CPU 0, there may
be reasons to pin a particular vCPU to it.

Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-07-02 10:01               ` Jan Beulich
@ 2019-07-02 10:25                 ` Juergen Gross
  0 siblings, 0 replies; 21+ messages in thread
From: Juergen Gross @ 2019-07-02 10:25 UTC (permalink / raw)
  To: Jan Beulich, Dario Faggioli
  Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap,
	Andrew Cooper, Tim Deegan, Julien Grall, Meng Xu, xen-devel,
	Ian Jackson, Roger Pau Monne

On 02.07.19 12:01, Jan Beulich wrote:
> On 02.07.2019 11:40, Dario Faggioli wrote:
>> On Tue, 2019-07-02 at 08:29 +0000, Jan Beulich wrote:
>>> On 02.07.2019 10:21, Dario Faggioli wrote:
>>>> On Tue, 2019-07-02 at 07:54 +0000, Jan Beulich wrote:
>>>>>
>>>>> And again - if someone pins every vCPU to a single pCPU, that
>>>>> last
>>>>> such pinning operation will be what takes long term effect. Aiui
>>>>> all
>>>>> vCPU-s in the unit will then be pinned to that one pCPU, i.e.
>>>>> they'll either all compete for the one pCPU's time, or only one
>>>>> of
>>>>> them will ever get scheduled.
>>>>>
>>>> I'm not sure I'm getting this. On an, say, SMT system, with 4
>>>> threads
>>>> per core, a unit is 4 vCPUs and a pCPU is 4 threads.
>>>
>>> No, the meaning of pCPU is a single thread of a single core. I.e.
>>> what is represented by a single cpumask_t bit.
>>>
>> Fine, let's continue to call that a pCPU. Then, when core-scheduling is
>> enabled, there's no <<multiple vCPUs of a unit being pinned to the same
>> pCPU and all competing for jut its CPU time>>.
>>
>> There's units of 4 vCPUs being pinned to 4 pCPUs (the 4 pCPUs of a
>> core, not 4 random, nor arbitrary, ones).
>>
>> That is the point, AFAIUI.
> 
> Well, okay, quite possible. But then for the example topology
> you gave, what is going to be the effect of
> 
> xl vcpu-pin 0 0 0

dom0 vcpu0 will be pinned to cpu0.
dom0 vcpu1 will be pinned to cpu1.
dom0 vcpu2 will be pinned to cpu2.
dom0 vcpu3 will be pinned to cpu3.

> ? In particular for Dom0 and in particular for CPU 0, there may
> be reasons to pin a particular vCPU to it.

In reality only pinning vcpu0 to cpu0 should be needed. And this is done
e.g. in dcdbas_smi_request().


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-06-13  8:39         ` Juergen Gross
@ 2019-06-13  8:49           ` Andrii Anisov
  0 siblings, 0 replies; 21+ messages in thread
From: Andrii Anisov @ 2019-06-13  8:49 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Meng Xu, Jan Beulich, xen-devel, Dario Faggioli,
	Roger Pau Monné


[-- Attachment #1.1: Type: text/plain, Size: 278 bytes --]

чт, 13 черв. 2019 о 11:39 Juergen Gross <jgross@suse.com> пише:

> github.com/jgross1/xen sched-v1-rebase
>
> Only compile tested on x86 up to now, but rebase was rather easy.
>

Cool, will take it and check for ARM.
Thank you.

Sincerely,
Andrii Anisov.

[-- Attachment #1.2: Type: text/html, Size: 809 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-06-13  7:34       ` Andrii Anisov
@ 2019-06-13  8:39         ` Juergen Gross
  2019-06-13  8:49           ` Andrii Anisov
  0 siblings, 1 reply; 21+ messages in thread
From: Juergen Gross @ 2019-06-13  8:39 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Meng Xu, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

On 13.06.19 09:34, Andrii Anisov wrote:
> 
> 
> On 13.06.19 10:29, Juergen Gross wrote:
>> Thanks for the heads up, but I've rebased already. :-)
> 
> Oh, great. I'm just wondering if you put it already on your github?

github.com/jgross1/xen sched-v1-rebase

Only compile tested on x86 up to now, but rebase was rather easy.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-06-13  7:29     ` Juergen Gross
@ 2019-06-13  7:34       ` Andrii Anisov
  2019-06-13  8:39         ` Juergen Gross
  0 siblings, 1 reply; 21+ messages in thread
From: Andrii Anisov @ 2019-06-13  7:34 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Meng Xu, Jan Beulich, Dario Faggioli,
	Roger Pau Monné



On 13.06.19 10:29, Juergen Gross wrote:
> Thanks for the heads up, but I've rebased already. :-)

Oh, great. I'm just wondering if you put it already on your github?
I'm playing with scheduling on my site, and I have a strong feeling I should be based on your series ;)

-- 
Sincerely,
Andrii Anisov.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-06-13  7:18   ` Andrii Anisov
@ 2019-06-13  7:29     ` Juergen Gross
  2019-06-13  7:34       ` Andrii Anisov
  0 siblings, 1 reply; 21+ messages in thread
From: Juergen Gross @ 2019-06-13  7:29 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Meng Xu, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

Hi Andrii,

On 13.06.19 09:18, Andrii Anisov wrote:
> Hello Juergen,
> 
> Please note that this patch will clash with [1].
> 
> On 28.05.19 13:32, Juergen Gross wrote:
>> vcpu->last_run_time is primarily used by sched_credit, so move it to
>> struct sched_unit, too.
> 
> `last_run_time` is moved to credit privates as for current staging.

Thanks for the heads up, but I've rebased already. :-)


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-05-28 10:32 ` [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit Juergen Gross
  2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
@ 2019-06-13  7:18   ` Andrii Anisov
  2019-06-13  7:29     ` Juergen Gross
  1 sibling, 1 reply; 21+ messages in thread
From: Andrii Anisov @ 2019-06-13  7:18 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Meng Xu, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

Hello Juergen,

Please note that this patch will clash with [1].

On 28.05.19 13:32, Juergen Gross wrote:
> vcpu->last_run_time is primarily used by sched_credit, so move it to
> struct sched_unit, too.

`last_run_time` is moved to credit privates as for current staging.


[1] https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=608639ffa0a0d6f219e14ba7397ab2cc018b93c9

-- 
Sincerely,
Andrii Anisov.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-05-28 10:32 ` [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit Juergen Gross
@ 2019-05-28 10:32   ` Juergen Gross
  2019-06-13  7:18   ` Andrii Anisov
  1 sibling, 0 replies; 21+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Meng Xu, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

Affinities are scheduler specific attributes, they should be per
scheduling unit. So move all affinity related fields in struct vcpu
to struct sched_unit. While at it switch affinity related functions in
sched-if.h to use a pointer to sched_unit instead to vcpu as parameter.

vcpu->last_run_time is primarily used by sched_credit, so move it to
struct sched_unit, too.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/pv/emul-priv-op.c |   1 +
 xen/arch/x86/pv/traps.c        |   5 +-
 xen/arch/x86/traps.c           |   9 ++--
 xen/common/domain.c            |  19 ++-----
 xen/common/domctl.c            |  13 +++--
 xen/common/keyhandler.c        |   4 +-
 xen/common/sched_credit.c      |  20 ++++----
 xen/common/sched_credit2.c     |  42 ++++++++--------
 xen/common/sched_null.c        |  16 +++---
 xen/common/sched_rt.c          |   9 ++--
 xen/common/schedule.c          | 110 ++++++++++++++++++++++++-----------------
 xen/common/wait.c              |   4 +-
 xen/include/xen/sched-if.h     |  17 ++++---
 xen/include/xen/sched.h        |  36 +++++++-------
 14 files changed, 163 insertions(+), 142 deletions(-)

diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index b20d79c7a3..a7a24a2053 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -23,6 +23,7 @@
 #include <xen/event.h>
 #include <xen/guest_access.h>
 #include <xen/iocap.h>
+#include <xen/sched.h>
 #include <xen/spinlock.h>
 #include <xen/trace.h>
 
diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 1740784ff2..419abc3d95 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -22,6 +22,7 @@
 #include <xen/event.h>
 #include <xen/hypercall.h>
 #include <xen/lib.h>
+#include <xen/sched.h>
 #include <xen/trace.h>
 #include <xen/softirq.h>
 
@@ -155,8 +156,8 @@ static void nmi_mce_softirq(void)
      * Set the tmp value unconditionally, so that the check in the iret
      * hypercall works.
      */
-    cpumask_copy(st->vcpu->cpu_hard_affinity_tmp,
-                 st->vcpu->cpu_hard_affinity);
+    cpumask_copy(st->vcpu->sched_unit->cpu_hard_affinity_tmp,
+                 st->vcpu->sched_unit->cpu_hard_affinity);
 
     if ( (cpu != st->processor) ||
          (st->processor != st->vcpu->processor) )
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 05ddc39bfe..c3b39d6296 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1594,16 +1594,17 @@ static void pci_serr_softirq(void)
 void async_exception_cleanup(struct vcpu *curr)
 {
     int trap;
+    struct sched_unit *unit = curr->sched_unit;
 
     if ( !curr->async_exception_mask )
         return;
 
     /* Restore affinity.  */
-    if ( !cpumask_empty(curr->cpu_hard_affinity_tmp) &&
-         !cpumask_equal(curr->cpu_hard_affinity_tmp, curr->cpu_hard_affinity) )
+    if ( !cpumask_empty(unit->cpu_hard_affinity_tmp) &&
+         !cpumask_equal(unit->cpu_hard_affinity_tmp, unit->cpu_hard_affinity) )
     {
-        vcpu_set_hard_affinity(curr, curr->cpu_hard_affinity_tmp);
-        cpumask_clear(curr->cpu_hard_affinity_tmp);
+        vcpu_set_hard_affinity(curr, unit->cpu_hard_affinity_tmp);
+        cpumask_clear(unit->cpu_hard_affinity_tmp);
     }
 
     if ( !(curr->async_exception_mask & (curr->async_exception_mask - 1)) )
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 90c66079f9..c200e9024f 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -125,11 +125,6 @@ static void vcpu_info_reset(struct vcpu *v)
 
 static void vcpu_destroy(struct vcpu *v)
 {
-    free_cpumask_var(v->cpu_hard_affinity);
-    free_cpumask_var(v->cpu_hard_affinity_tmp);
-    free_cpumask_var(v->cpu_hard_affinity_saved);
-    free_cpumask_var(v->cpu_soft_affinity);
-
     free_vcpu_struct(v);
 }
 
@@ -153,12 +148,6 @@ struct vcpu *vcpu_create(
 
     grant_table_init_vcpu(v);
 
-    if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) ||
-         !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) ||
-         !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) ||
-         !zalloc_cpumask_var(&v->cpu_soft_affinity) )
-        goto fail;
-
     if ( is_idle_domain(d) )
     {
         v->runstate.state = RUNSTATE_running;
@@ -198,7 +187,6 @@ struct vcpu *vcpu_create(
     sched_destroy_vcpu(v);
  fail_wq:
     destroy_waitqueue_vcpu(v);
- fail:
     vcpu_destroy(v);
 
     return NULL;
@@ -557,9 +545,10 @@ void domain_update_node_affinity(struct domain *d)
          */
         for_each_vcpu ( d, v )
         {
-            cpumask_or(dom_cpumask, dom_cpumask, v->cpu_hard_affinity);
+            cpumask_or(dom_cpumask, dom_cpumask,
+                       v->sched_unit->cpu_hard_affinity);
             cpumask_or(dom_cpumask_soft, dom_cpumask_soft,
-                       v->cpu_soft_affinity);
+                       v->sched_unit->cpu_soft_affinity);
         }
         /* Filter out non-online cpus */
         cpumask_and(dom_cpumask, dom_cpumask, online);
@@ -1226,7 +1215,7 @@ int vcpu_reset(struct vcpu *v)
     v->async_exception_mask = 0;
     memset(v->async_exception_state, 0, sizeof(v->async_exception_state));
 #endif
-    cpumask_clear(v->cpu_hard_affinity_tmp);
+    cpumask_clear(v->sched_unit->cpu_hard_affinity_tmp);
     clear_bit(_VPF_blocked, &v->pause_flags);
     clear_bit(_VPF_in_reset, &v->pause_flags);
 
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index bade9a63b1..bc986d131d 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -614,6 +614,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
     case XEN_DOMCTL_getvcpuaffinity:
     {
         struct vcpu *v;
+        struct sched_unit *unit;
         struct xen_domctl_vcpuaffinity *vcpuaff = &op->u.vcpuaffinity;
 
         ret = -EINVAL;
@@ -624,6 +625,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         if ( (v = d->vcpu[vcpuaff->vcpu]) == NULL )
             break;
 
+        unit = v->sched_unit;
         ret = -EINVAL;
         if ( vcpuaffinity_params_invalid(vcpuaff) )
             break;
@@ -643,7 +645,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
                 ret = -ENOMEM;
                 break;
             }
-            cpumask_copy(old_affinity, v->cpu_hard_affinity);
+            cpumask_copy(old_affinity, unit->cpu_hard_affinity);
 
             if ( !alloc_cpumask_var(&new_affinity) )
             {
@@ -676,7 +678,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
                  * For hard affinity, what we return is the intersection of
                  * cpupool's online mask and the new hard affinity.
                  */
-                cpumask_and(new_affinity, online, v->cpu_hard_affinity);
+                cpumask_and(new_affinity, online, unit->cpu_hard_affinity);
                 ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_hard,
                                                new_affinity);
             }
@@ -705,7 +707,8 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
                  * hard affinity.
                  */
                 cpumask_and(new_affinity, new_affinity, online);
-                cpumask_and(new_affinity, new_affinity, v->cpu_hard_affinity);
+                cpumask_and(new_affinity, new_affinity,
+                            unit->cpu_hard_affinity);
                 ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_soft,
                                                new_affinity);
             }
@@ -718,10 +721,10 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         {
             if ( vcpuaff->flags & XEN_VCPUAFFINITY_HARD )
                 ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_hard,
-                                               v->cpu_hard_affinity);
+                                               unit->cpu_hard_affinity);
             if ( vcpuaff->flags & XEN_VCPUAFFINITY_SOFT )
                 ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_soft,
-                                               v->cpu_soft_affinity);
+                                               unit->cpu_soft_affinity);
         }
         break;
     }
diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c
index 4f4a660b0c..1729f73af0 100644
--- a/xen/common/keyhandler.c
+++ b/xen/common/keyhandler.c
@@ -312,8 +312,8 @@ static void dump_domains(unsigned char key)
                 printk("dirty_cpu=%u", v->dirty_cpu);
             printk("\n");
             printk("    cpu_hard_affinity={%*pbl} cpu_soft_affinity={%*pbl}\n",
-                   nr_cpu_ids, cpumask_bits(v->cpu_hard_affinity),
-                   nr_cpu_ids, cpumask_bits(v->cpu_soft_affinity));
+                   nr_cpu_ids, cpumask_bits(v->sched_unit->cpu_hard_affinity),
+                   nr_cpu_ids, cpumask_bits(v->sched_unit->cpu_soft_affinity));
             printk("    pause_count=%d pause_flags=%lx\n",
                    atomic_read(&v->pause_count), v->pause_flags);
             arch_dump_vcpu_info(v);
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 74c85df334..ffac2f4bbb 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -350,6 +350,7 @@ DEFINE_PER_CPU(unsigned int, last_tickle_cpu);
 static inline void __runq_tickle(struct csched_unit *new)
 {
     unsigned int cpu = new->vcpu->processor;
+    struct sched_unit *unit = new->vcpu->sched_unit;
     struct csched_unit * const cur = CSCHED_UNIT(curr_on_cpu(cpu));
     struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
     cpumask_t mask, idle_mask, *online;
@@ -375,7 +376,7 @@ static inline void __runq_tickle(struct csched_unit *new)
     if ( unlikely(test_bit(CSCHED_FLAG_VCPU_PINNED, &new->flags) &&
                   cpumask_test_cpu(cpu, &idle_mask)) )
     {
-        ASSERT(cpumask_cycle(cpu, new->vcpu->cpu_hard_affinity) == cpu);
+        ASSERT(cpumask_cycle(cpu, unit->cpu_hard_affinity) == cpu);
         SCHED_STAT_CRANK(tickled_idle_cpu_excl);
         __cpumask_set_cpu(cpu, &mask);
         goto tickle;
@@ -410,11 +411,11 @@ static inline void __runq_tickle(struct csched_unit *new)
             int new_idlers_empty;
 
             if ( balance_step == BALANCE_SOFT_AFFINITY
-                 && !has_soft_affinity(new->vcpu) )
+                 && !has_soft_affinity(unit) )
                 continue;
 
             /* Are there idlers suitable for new (for this balance step)? */
-            affinity_balance_cpumask(new->vcpu, balance_step,
+            affinity_balance_cpumask(unit, balance_step,
                                      cpumask_scratch_cpu(cpu));
             cpumask_and(cpumask_scratch_cpu(cpu),
                         cpumask_scratch_cpu(cpu), &idle_mask);
@@ -443,8 +444,7 @@ static inline void __runq_tickle(struct csched_unit *new)
              */
             if ( new_idlers_empty && new->pri > cur->pri )
             {
-                if ( cpumask_intersects(cur->vcpu->cpu_hard_affinity,
-                                        &idle_mask) )
+                if ( cpumask_intersects(unit->cpu_hard_affinity, &idle_mask) )
                 {
                     SCHED_VCPU_STAT_CRANK(cur, kicked_away);
                     SCHED_VCPU_STAT_CRANK(cur, migrate_r);
@@ -695,7 +695,7 @@ static inline bool
 __csched_vcpu_is_cache_hot(const struct csched_private *prv, struct vcpu *v)
 {
     bool hot = prv->vcpu_migr_delay &&
-               (NOW() - v->last_run_time) < prv->vcpu_migr_delay;
+               (NOW() - v->sched_unit->last_run_time) < prv->vcpu_migr_delay;
 
     if ( hot )
         SCHED_STAT_CRANK(vcpu_hot);
@@ -733,7 +733,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
 
     for_each_affinity_balance_step( balance_step )
     {
-        affinity_balance_cpumask(vc, balance_step, cpus);
+        affinity_balance_cpumask(vc->sched_unit, balance_step, cpus);
         cpumask_and(cpus, online, cpus);
         /*
          * We want to pick up a pcpu among the ones that are online and
@@ -752,7 +752,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
          * balancing step all together.
          */
         if ( balance_step == BALANCE_SOFT_AFFINITY &&
-             (!has_soft_affinity(vc) || cpumask_empty(cpus)) )
+             (!has_soft_affinity(vc->sched_unit) || cpumask_empty(cpus)) )
             continue;
 
         /* If present, prefer vc's current processor */
@@ -1652,10 +1652,10 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
          * or counter.
          */
         if ( vc->is_running || (balance_step == BALANCE_SOFT_AFFINITY &&
-                                !has_soft_affinity(vc)) )
+                                !has_soft_affinity(vc->sched_unit)) )
             continue;
 
-        affinity_balance_cpumask(vc, balance_step, cpumask_scratch);
+        affinity_balance_cpumask(vc->sched_unit, balance_step, cpumask_scratch);
         if ( __csched_vcpu_is_migrateable(prv, vc, cpu, cpumask_scratch) )
         {
             /* We got a candidate. Grab it! */
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 562e73d99e..dabd5636f5 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -699,10 +699,10 @@ static int get_fallback_cpu(struct csched2_unit *svc)
     {
         int cpu = v->processor;
 
-        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(v) )
+        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(v->sched_unit) )
             continue;
 
-        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
+        affinity_balance_cpumask(v->sched_unit, bs, cpumask_scratch_cpu(cpu));
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                     cpupool_domain_cpumask(v->domain));
 
@@ -1390,10 +1390,10 @@ static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
      */
     if ( score > 0 )
     {
-        if ( cpumask_test_cpu(cpu, new->vcpu->cpu_soft_affinity) )
+        if ( cpumask_test_cpu(cpu, new->vcpu->sched_unit->cpu_soft_affinity) )
             score += CSCHED2_CREDIT_INIT;
 
-        if ( !cpumask_test_cpu(cpu, cur->vcpu->cpu_soft_affinity) )
+        if ( !cpumask_test_cpu(cpu, cur->vcpu->sched_unit->cpu_soft_affinity) )
             score += CSCHED2_CREDIT_INIT;
     }
 
@@ -1436,6 +1436,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
 {
     int i, ipid = -1;
     s_time_t max = 0;
+    struct sched_unit *unit = new->vcpu->sched_unit;
     unsigned int bs, cpu = new->vcpu->processor;
     struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
     cpumask_t *online = cpupool_domain_cpumask(new->vcpu->domain);
@@ -1473,7 +1474,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
                   cpumask_test_cpu(cpu, &rqd->idle) &&
                   !cpumask_test_cpu(cpu, &rqd->tickled)) )
     {
-        ASSERT(cpumask_cycle(cpu, new->vcpu->cpu_hard_affinity) == cpu);
+        ASSERT(cpumask_cycle(cpu, unit->cpu_hard_affinity) == cpu);
         SCHED_STAT_CRANK(tickled_idle_cpu_excl);
         ipid = cpu;
         goto tickle;
@@ -1482,10 +1483,10 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
     for_each_affinity_balance_step( bs )
     {
         /* Just skip first step, if we don't have a soft affinity */
-        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(new->vcpu) )
+        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(unit) )
             continue;
 
-        affinity_balance_cpumask(new->vcpu, bs, cpumask_scratch_cpu(cpu));
+        affinity_balance_cpumask(unit, bs, cpumask_scratch_cpu(cpu));
 
         /*
          * First of all, consider idle cpus, checking if we can just
@@ -1557,7 +1558,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
             ipid = cpu;
 
             /* If this is in new's soft affinity, just take it */
-            if ( cpumask_test_cpu(cpu, new->vcpu->cpu_soft_affinity) )
+            if ( cpumask_test_cpu(cpu, unit->cpu_soft_affinity) )
             {
                 SCHED_STAT_CRANK(tickled_busy_cpu);
                 goto tickle;
@@ -2243,7 +2244,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
         goto out;
     }
 
-    cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
+    cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                 cpupool_domain_cpumask(vc->domain));
 
     /*
@@ -2288,7 +2289,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
      *
      * Find both runqueues in one pass.
      */
-    has_soft = has_soft_affinity(vc);
+    has_soft = has_soft_affinity(unit);
     for_each_cpu(i, &prv->active_queues)
     {
         struct csched2_runqueue_data *rqd;
@@ -2335,7 +2336,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
             cpumask_t mask;
 
             cpumask_and(&mask, cpumask_scratch_cpu(cpu), &rqd->active);
-            if ( cpumask_intersects(&mask, svc->vcpu->cpu_soft_affinity) )
+            if ( cpumask_intersects(&mask, unit->cpu_soft_affinity) )
             {
                 min_s_avgload = rqd_avgload;
                 min_s_rqi = i;
@@ -2357,9 +2358,9 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
          * Note that, to obtain the soft-affinity mask, we "just" put what we
          * have in cpumask_scratch in && with vc->cpu_soft_affinity. This is
          * ok because:
-         * - we know that vc->cpu_hard_affinity and vc->cpu_soft_affinity have
+         * - we know that unit->cpu_hard_affinity and ->cpu_soft_affinity have
          *   a non-empty intersection (because has_soft is true);
-         * - we have vc->cpu_hard_affinity & cpupool_domain_cpumask() already
+         * - we have unit->cpu_hard_affinity & cpupool_domain_cpumask() already
          *   in cpumask_scratch, we do save a lot doing like this.
          *
          * It's kind of like open coding affinity_balance_cpumask() but, in
@@ -2367,7 +2368,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
          * cpumask operations.
          */
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
-                    vc->cpu_soft_affinity);
+                    unit->cpu_soft_affinity);
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                     &prv->rqd[min_s_rqi].active);
     }
@@ -2475,6 +2476,7 @@ static void migrate(const struct scheduler *ops,
                     s_time_t now)
 {
     int cpu = svc->vcpu->processor;
+    struct sched_unit *unit = svc->vcpu->sched_unit;
 
     if ( unlikely(tb_init_done) )
     {
@@ -2512,7 +2514,7 @@ static void migrate(const struct scheduler *ops,
         }
         _runq_deassign(svc);
 
-        cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
+        cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                     cpupool_domain_cpumask(svc->vcpu->domain));
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                     &trqd->active);
@@ -2546,7 +2548,7 @@ static bool vcpu_is_migrateable(struct csched2_unit *svc,
     struct vcpu *v = svc->vcpu;
     int cpu = svc->vcpu->processor;
 
-    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+    cpumask_and(cpumask_scratch_cpu(cpu), v->sched_unit->cpu_hard_affinity,
                 cpupool_domain_cpumask(v->domain));
 
     return !(svc->flags & CSFLAG_runq_migrate_request) &&
@@ -2780,7 +2782,7 @@ csched2_unit_migrate(
 
     /* If here, new_cpu must be a valid Credit2 pCPU, and in our affinity. */
     ASSERT(cpumask_test_cpu(new_cpu, &csched2_priv(ops)->initialized));
-    ASSERT(cpumask_test_cpu(new_cpu, vc->cpu_hard_affinity));
+    ASSERT(cpumask_test_cpu(new_cpu, unit->cpu_hard_affinity));
 
     trqd = c2rqd(ops, new_cpu);
 
@@ -3320,9 +3322,9 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     }
 
     /* If scurr has a soft-affinity, let's check whether cpu is part of it */
-    if ( has_soft_affinity(scurr->vcpu) )
+    if ( has_soft_affinity(scurr->vcpu->sched_unit) )
     {
-        affinity_balance_cpumask(scurr->vcpu, BALANCE_SOFT_AFFINITY,
+        affinity_balance_cpumask(scurr->vcpu->sched_unit, BALANCE_SOFT_AFFINITY,
                                  cpumask_scratch);
         if ( unlikely(!cpumask_test_cpu(cpu, cpumask_scratch)) )
         {
@@ -3377,7 +3379,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
         }
 
         /* Only consider vcpus that are allowed to run on this processor. */
-        if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
+        if ( !cpumask_test_cpu(cpu, svc->vcpu->sched_unit->cpu_hard_affinity) )
         {
             (*skipped)++;
             continue;
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index ee3a8cf064..56b0055a42 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -123,7 +123,8 @@ static inline struct null_unit *null_unit(const struct sched_unit *unit)
 static inline bool vcpu_check_affinity(struct vcpu *v, unsigned int cpu,
                                        unsigned int balance_step)
 {
-    affinity_balance_cpumask(v, balance_step, cpumask_scratch_cpu(cpu));
+    affinity_balance_cpumask(v->sched_unit, balance_step,
+                             cpumask_scratch_cpu(cpu));
     cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                 cpupool_domain_cpumask(v->domain));
 
@@ -281,10 +282,10 @@ pick_res(struct null_private *prv, struct sched_unit *unit)
 
     for_each_affinity_balance_step( bs )
     {
-        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(v) )
+        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(unit) )
             continue;
 
-        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
+        affinity_balance_cpumask(unit, bs, cpumask_scratch_cpu(cpu));
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), cpus);
 
         /*
@@ -321,7 +322,7 @@ pick_res(struct null_private *prv, struct sched_unit *unit)
      * as we will actually assign the vCPU to the pCPU we return from here,
      * only if the pCPU is free.
      */
-    cpumask_and(cpumask_scratch_cpu(cpu), cpus, v->cpu_hard_affinity);
+    cpumask_and(cpumask_scratch_cpu(cpu), cpus, unit->cpu_hard_affinity);
     new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
 
  out:
@@ -430,7 +431,7 @@ static void null_unit_insert(const struct scheduler *ops,
 
     lock = unit_schedule_lock(unit);
 
-    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+    cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                 cpupool_domain_cpumask(v->domain));
 
     /* If the pCPU is free, we assign v to it */
@@ -488,7 +489,8 @@ static void _vcpu_remove(struct null_private *prv, struct vcpu *v)
     {
         list_for_each_entry( wvc, &prv->waitq, waitq_elem )
         {
-            if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(wvc->vcpu) )
+            if ( bs == BALANCE_SOFT_AFFINITY &&
+                 !has_soft_affinity(wvc->vcpu->sched_unit) )
                 continue;
 
             if ( vcpu_check_affinity(wvc->vcpu, cpu, bs) )
@@ -767,7 +769,7 @@ static struct task_slice null_schedule(const struct scheduler *ops,
             list_for_each_entry( wvc, &prv->waitq, waitq_elem )
             {
                 if ( bs == BALANCE_SOFT_AFFINITY &&
-                     !has_soft_affinity(wvc->vcpu) )
+                     !has_soft_affinity(wvc->vcpu->sched_unit) )
                     continue;
 
                 if ( vcpu_check_affinity(wvc->vcpu, cpu, bs) )
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index cd737131a3..d640d87b43 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -327,7 +327,7 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_unit *svc)
     mask = cpumask_scratch_cpu(svc->vcpu->processor);
 
     cpupool_mask = cpupool_domain_cpumask(svc->vcpu->domain);
-    cpumask_and(mask, cpupool_mask, svc->vcpu->cpu_hard_affinity);
+    cpumask_and(mask, cpupool_mask, svc->vcpu->sched_unit->cpu_hard_affinity);
     printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime"),"
            " cur_b=%"PRI_stime" cur_d=%"PRI_stime" last_start=%"PRI_stime"\n"
            " \t\t priority_level=%d has_extratime=%d\n"
@@ -645,7 +645,7 @@ rt_res_pick(const struct scheduler *ops, struct sched_unit *unit)
     int cpu;
 
     online = cpupool_domain_cpumask(vc->domain);
-    cpumask_and(&cpus, online, vc->cpu_hard_affinity);
+    cpumask_and(&cpus, online, unit->cpu_hard_affinity);
 
     cpu = cpumask_test_cpu(vc->processor, &cpus)
             ? vc->processor
@@ -1023,7 +1023,8 @@ runq_pick(const struct scheduler *ops, const cpumask_t *mask)
 
         /* mask cpu_hard_affinity & cpupool & mask */
         online = cpupool_domain_cpumask(iter_svc->vcpu->domain);
-        cpumask_and(&cpu_common, online, iter_svc->vcpu->cpu_hard_affinity);
+        cpumask_and(&cpu_common, online,
+                    iter_svc->vcpu->sched_unit->cpu_hard_affinity);
         cpumask_and(&cpu_common, mask, &cpu_common);
         if ( cpumask_empty(&cpu_common) )
             continue;
@@ -1192,7 +1193,7 @@ runq_tickle(const struct scheduler *ops, struct rt_unit *new)
         return;
 
     online = cpupool_domain_cpumask(new->vcpu->domain);
-    cpumask_and(&not_tickled, online, new->vcpu->cpu_hard_affinity);
+    cpumask_and(&not_tickled, online, new->vcpu->sched_unit->cpu_hard_affinity);
     cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);
 
     /*
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 1321c86111..212c1e637f 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -270,6 +270,12 @@ static void sched_free_unit(struct sched_unit *unit)
     }
 
     unit->vcpu->sched_unit = NULL;
+
+    free_cpumask_var(unit->cpu_hard_affinity);
+    free_cpumask_var(unit->cpu_hard_affinity_tmp);
+    free_cpumask_var(unit->cpu_hard_affinity_saved);
+    free_cpumask_var(unit->cpu_soft_affinity);
+
     xfree(unit);
 }
 
@@ -293,7 +299,17 @@ static struct sched_unit *sched_alloc_unit(struct vcpu *v)
     unit->next_in_list = *prev_unit;
     *prev_unit = unit;
 
+    if ( !zalloc_cpumask_var(&unit->cpu_hard_affinity) ||
+         !zalloc_cpumask_var(&unit->cpu_hard_affinity_tmp) ||
+         !zalloc_cpumask_var(&unit->cpu_hard_affinity_saved) ||
+         !zalloc_cpumask_var(&unit->cpu_soft_affinity) )
+        goto fail;
+
     return unit;
+
+ fail:
+    sched_free_unit(unit);
+    return NULL;
 }
 
 int sched_init_vcpu(struct vcpu *v, unsigned int processor)
@@ -363,7 +379,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
     for_each_vcpu ( d, v )
     {
-        if ( v->affinity_broken )
+        if ( v->sched_unit->affinity_broken )
             return -EBUSY;
     }
 
@@ -682,7 +698,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
              */
             if ( pick_called &&
                  (new_lock == get_sched_res(new_cpu)->schedule_lock) &&
-                 cpumask_test_cpu(new_cpu, v->cpu_hard_affinity) &&
+                 cpumask_test_cpu(new_cpu, v->sched_unit->cpu_hard_affinity) &&
                  cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
                 break;
 
@@ -758,6 +774,7 @@ void restore_vcpu_affinity(struct domain *d)
     {
         spinlock_t *lock;
         unsigned int old_cpu = v->processor;
+        struct sched_unit *unit = v->sched_unit;
 
         ASSERT(!vcpu_runnable(v));
 
@@ -769,15 +786,15 @@ void restore_vcpu_affinity(struct domain *d)
          * set v->processor of each of their vCPUs to something that will
          * make sense for the scheduler of the cpupool in which they are in.
          */
-        cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+        cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                     cpupool_domain_cpumask(d));
         if ( cpumask_empty(cpumask_scratch_cpu(cpu)) )
         {
-            if ( v->affinity_broken )
+            if ( unit->affinity_broken )
             {
-                sched_set_affinity(v, v->cpu_hard_affinity_saved, NULL);
-                v->affinity_broken = 0;
-                cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                sched_set_affinity(v, unit->cpu_hard_affinity_saved, NULL);
+                unit->affinity_broken = 0;
+                cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                             cpupool_domain_cpumask(d));
             }
 
@@ -785,18 +802,17 @@ void restore_vcpu_affinity(struct domain *d)
             {
                 printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
                 sched_set_affinity(v, &cpumask_all, NULL);
-                cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                             cpupool_domain_cpumask(d));
             }
         }
 
         v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
-        v->sched_unit->res = get_sched_res(v->processor);
+        unit->res = get_sched_res(v->processor);
 
-        lock = unit_schedule_lock_irq(v->sched_unit);
-        v->sched_unit->res = sched_pick_resource(vcpu_scheduler(v),
-                                                 v->sched_unit);
-        v->processor = v->sched_unit->res->processor;
+        lock = unit_schedule_lock_irq(unit);
+        unit->res = sched_pick_resource(vcpu_scheduler(v), unit);
+        v->processor = unit->res->processor;
         spin_unlock_irq(lock);
 
         if ( old_cpu != v->processor )
@@ -828,16 +844,17 @@ int cpu_disable_scheduler(unsigned int cpu)
         for_each_vcpu ( d, v )
         {
             unsigned long flags;
-            spinlock_t *lock = unit_schedule_lock_irqsave(v->sched_unit, &flags);
+            struct sched_unit *unit = v->sched_unit;
+            spinlock_t *lock = unit_schedule_lock_irqsave(unit, &flags);
 
-            cpumask_and(&online_affinity, v->cpu_hard_affinity, c->cpu_valid);
+            cpumask_and(&online_affinity, unit->cpu_hard_affinity, c->cpu_valid);
             if ( cpumask_empty(&online_affinity) &&
-                 cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
+                 cpumask_test_cpu(cpu, unit->cpu_hard_affinity) )
             {
-                if ( v->affinity_broken )
+                if ( unit->affinity_broken )
                 {
                     /* The vcpu is temporarily pinned, can't move it. */
-                    unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit);
+                    unit_schedule_unlock_irqrestore(lock, flags, unit);
                     ret = -EADDRINUSE;
                     break;
                 }
@@ -850,7 +867,7 @@ int cpu_disable_scheduler(unsigned int cpu)
             if ( v->processor != cpu )
             {
                 /* The vcpu is not on this cpu, so we can move on. */
-                unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit);
+                unit_schedule_unlock_irqrestore(lock, flags, unit);
                 continue;
             }
 
@@ -863,7 +880,7 @@ int cpu_disable_scheduler(unsigned int cpu)
              *    things would have failed before getting in here.
              */
             vcpu_migrate_start(v);
-            unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit);
+            unit_schedule_unlock_irqrestore(lock, flags, unit);
 
             vcpu_migrate_finish(v);
 
@@ -892,7 +909,7 @@ static int cpu_disable_scheduler_check(unsigned int cpu)
 
     for_each_domain_in_cpupool ( d, c )
         for_each_vcpu ( d, v )
-            if ( v->affinity_broken )
+            if ( v->sched_unit->affinity_broken )
                 return -EADDRINUSE;
 
     return 0;
@@ -908,28 +925,31 @@ static int cpu_disable_scheduler_check(unsigned int cpu)
 void sched_set_affinity(
     struct vcpu *v, const cpumask_t *hard, const cpumask_t *soft)
 {
-    sched_adjust_affinity(dom_scheduler(v->domain), v->sched_unit, hard, soft);
+    struct sched_unit *unit = v->sched_unit;
+
+    sched_adjust_affinity(dom_scheduler(v->domain), unit, hard, soft);
 
     if ( hard )
-        cpumask_copy(v->cpu_hard_affinity, hard);
+        cpumask_copy(unit->cpu_hard_affinity, hard);
     if ( soft )
-        cpumask_copy(v->cpu_soft_affinity, soft);
+        cpumask_copy(unit->cpu_soft_affinity, soft);
 
-    v->soft_aff_effective = !cpumask_subset(v->cpu_hard_affinity,
-                                            v->cpu_soft_affinity) &&
-                            cpumask_intersects(v->cpu_soft_affinity,
-                                               v->cpu_hard_affinity);
+    unit->soft_aff_effective = !cpumask_subset(unit->cpu_hard_affinity,
+                                               unit->cpu_soft_affinity) &&
+                               cpumask_intersects(unit->cpu_soft_affinity,
+                                                  unit->cpu_hard_affinity);
 }
 
 static int vcpu_set_affinity(
     struct vcpu *v, const cpumask_t *affinity, const cpumask_t *which)
 {
+    struct sched_unit *unit = v->sched_unit;
     spinlock_t *lock;
     int ret = 0;
 
-    lock = unit_schedule_lock_irq(v->sched_unit);
+    lock = unit_schedule_lock_irq(unit);
 
-    if ( v->affinity_broken )
+    if ( unit->affinity_broken )
         ret = -EBUSY;
     else
     {
@@ -937,19 +957,19 @@ static int vcpu_set_affinity(
          * Tell the scheduler we changes something about affinity,
          * and ask to re-evaluate vcpu placement.
          */
-        if ( which == v->cpu_hard_affinity )
+        if ( which == unit->cpu_hard_affinity )
         {
             sched_set_affinity(v, affinity, NULL);
         }
         else
         {
-            ASSERT(which == v->cpu_soft_affinity);
+            ASSERT(which == unit->cpu_soft_affinity);
             sched_set_affinity(v, NULL, affinity);
         }
         vcpu_migrate_start(v);
     }
 
-    unit_schedule_unlock_irq(lock, v->sched_unit);
+    unit_schedule_unlock_irq(lock, unit);
 
     domain_update_node_affinity(v->domain);
 
@@ -968,12 +988,12 @@ int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity)
     if ( cpumask_empty(&online_affinity) )
         return -EINVAL;
 
-    return vcpu_set_affinity(v, affinity, v->cpu_hard_affinity);
+    return vcpu_set_affinity(v, affinity, v->sched_unit->cpu_hard_affinity);
 }
 
 int vcpu_set_soft_affinity(struct vcpu *v, const cpumask_t *affinity)
 {
-    return vcpu_set_affinity(v, affinity, v->cpu_soft_affinity);
+    return vcpu_set_affinity(v, affinity, v->sched_unit->cpu_soft_affinity);
 }
 
 /* Block the currently-executing domain until a pertinent event occurs. */
@@ -1167,28 +1187,30 @@ void watchdog_domain_destroy(struct domain *d)
 
 int vcpu_pin_override(struct vcpu *v, int cpu)
 {
+    struct sched_unit *unit = v->sched_unit;
     spinlock_t *lock;
     int ret = -EINVAL;
 
-    lock = unit_schedule_lock_irq(v->sched_unit);
+    lock = unit_schedule_lock_irq(unit);
 
     if ( cpu < 0 )
     {
-        if ( v->affinity_broken )
+        if ( unit->affinity_broken )
         {
-            sched_set_affinity(v, v->cpu_hard_affinity_saved, NULL);
-            v->affinity_broken = 0;
+            sched_set_affinity(v, unit->cpu_hard_affinity_saved, NULL);
+            unit->affinity_broken = 0;
             ret = 0;
         }
     }
     else if ( cpu < nr_cpu_ids )
     {
-        if ( v->affinity_broken )
+        if ( unit->affinity_broken )
             ret = -EBUSY;
         else if ( cpumask_test_cpu(cpu, VCPU2ONLINE(v)) )
         {
-            cpumask_copy(v->cpu_hard_affinity_saved, v->cpu_hard_affinity);
-            v->affinity_broken = 1;
+            cpumask_copy(unit->cpu_hard_affinity_saved,
+                         unit->cpu_hard_affinity);
+            unit->affinity_broken = 1;
             sched_set_affinity(v, cpumask_of(cpu), NULL);
             ret = 0;
         }
@@ -1197,7 +1219,7 @@ int vcpu_pin_override(struct vcpu *v, int cpu)
     if ( ret == 0 )
         vcpu_migrate_start(v);
 
-    unit_schedule_unlock_irq(lock, v->sched_unit);
+    unit_schedule_unlock_irq(lock, unit);
 
     domain_update_node_affinity(v->domain);
 
@@ -1549,7 +1571,7 @@ static void schedule(void)
         ((prev->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
          (vcpu_runnable(prev) ? RUNSTATE_runnable : RUNSTATE_offline)),
         now);
-    prev->last_run_time = now;
+    prev->sched_unit->last_run_time = now;
 
     ASSERT(next->runstate.state != RUNSTATE_running);
     vcpu_runstate_change(next, RUNSTATE_running, now);
diff --git a/xen/common/wait.c b/xen/common/wait.c
index 4f830a14e8..37e9e0d016 100644
--- a/xen/common/wait.c
+++ b/xen/common/wait.c
@@ -132,7 +132,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
 
     /* Save current VCPU affinity; force wakeup on *this* CPU only. */
     wqv->wakeup_cpu = smp_processor_id();
-    cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
+    cpumask_copy(&wqv->saved_affinity, curr->sched_unit->cpu_hard_affinity);
     if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
     {
         gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
@@ -199,7 +199,7 @@ void check_wakeup_from_wait(void)
     {
         /* Re-set VCPU affinity and re-enter the scheduler. */
         struct vcpu *curr = current;
-        cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
+        cpumask_copy(&wqv->saved_affinity, curr->sched_unit->cpu_hard_affinity);
         if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
         {
             gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 20e36ea39b..17c01abc25 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
  * * The hard affinity is not a subset of soft affinity
  * * There is an overlap between the soft and hard affinity masks
  */
-static inline int has_soft_affinity(const struct vcpu *v)
+static inline int has_soft_affinity(const struct sched_unit *unit)
 {
-    return v->soft_aff_effective &&
-           !cpumask_subset(cpupool_domain_cpumask(v->domain),
-                           v->cpu_soft_affinity);
+    return unit->soft_aff_effective &&
+           !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain),
+                           unit->cpu_soft_affinity);
 }
 
 /*
@@ -452,17 +452,18 @@ static inline int has_soft_affinity(const struct vcpu *v)
  * to avoid running a vcpu where it would like, but is not allowed to!
  */
 static inline void
-affinity_balance_cpumask(const struct vcpu *v, int step, cpumask_t *mask)
+affinity_balance_cpumask(const struct sched_unit *unit, int step,
+                         cpumask_t *mask)
 {
     if ( step == BALANCE_SOFT_AFFINITY )
     {
-        cpumask_and(mask, v->cpu_soft_affinity, v->cpu_hard_affinity);
+        cpumask_and(mask, unit->cpu_soft_affinity, unit->cpu_hard_affinity);
 
         if ( unlikely(cpumask_empty(mask)) )
-            cpumask_copy(mask, v->cpu_hard_affinity);
+            cpumask_copy(mask, unit->cpu_hard_affinity);
     }
     else /* step == BALANCE_HARD_AFFINITY */
-        cpumask_copy(mask, v->cpu_hard_affinity);
+        cpumask_copy(mask, unit->cpu_hard_affinity);
 }
 
 #endif /* __XEN_SCHED_IF_H__ */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ee316cddd7..13c99a9194 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -175,9 +175,6 @@ struct vcpu
     } runstate_guest; /* guest address */
 #endif
 
-    /* last time when vCPU is scheduled out */
-    uint64_t last_run_time;
-
     /* Has the FPU been initialised? */
     bool             fpu_initialised;
     /* Has the FPU been used since it was last saved? */
@@ -203,8 +200,6 @@ struct vcpu
     bool             defer_shutdown;
     /* VCPU is paused following shutdown request (d->is_shutting_down)? */
     bool             paused_for_shutdown;
-    /* VCPU need affinity restored */
-    bool             affinity_broken;
 
     /* A hypercall has been preempted. */
     bool             hcall_preempted;
@@ -213,9 +208,6 @@ struct vcpu
     bool             hcall_compat;
 #endif
 
-    /* Does soft affinity actually play a role (given hard affinity)? */
-    bool             soft_aff_effective;
-
     /* The CPU, if any, which is holding onto this VCPU's state. */
 #define VCPU_CPU_CLEAN (~0u)
     unsigned int     dirty_cpu;
@@ -247,16 +239,6 @@ struct vcpu
     evtchn_port_t    virq_to_evtchn[NR_VIRQS];
     spinlock_t       virq_lock;
 
-    /* Bitmask of CPUs on which this VCPU may run. */
-    cpumask_var_t    cpu_hard_affinity;
-    /* Used to change affinity temporarily. */
-    cpumask_var_t    cpu_hard_affinity_tmp;
-    /* Used to restore affinity across S3. */
-    cpumask_var_t    cpu_hard_affinity_saved;
-
-    /* Bitmask of CPUs on which this VCPU prefers to run. */
-    cpumask_var_t    cpu_soft_affinity;
-
     /* Tasklet for continue_hypercall_on_cpu(). */
     struct tasklet   continue_hypercall_tasklet;
 
@@ -283,6 +265,22 @@ struct sched_unit {
     void                  *priv;      /* scheduler private data */
     struct sched_unit     *next_in_list;
     struct sched_resource *res;
+
+    /* Last time when unit has been scheduled out. */
+    uint64_t               last_run_time;
+
+    /* Item needs affinity restored. */
+    bool                   affinity_broken;
+    /* Does soft affinity actually play a role (given hard affinity)? */
+    bool                   soft_aff_effective;
+    /* Bitmask of CPUs on which this VCPU may run. */
+    cpumask_var_t          cpu_hard_affinity;
+    /* Used to change affinity temporarily. */
+    cpumask_var_t          cpu_hard_affinity_tmp;
+    /* Used to restore affinity across S3. */
+    cpumask_var_t          cpu_hard_affinity_saved;
+    /* Bitmask of CPUs on which this VCPU prefers to run. */
+    cpumask_var_t          cpu_soft_affinity;
 };
 
 #define for_each_sched_unit(d, e)                                         \
@@ -980,7 +978,7 @@ static inline bool is_hvm_vcpu(const struct vcpu *v)
 static inline bool is_hwdom_pinned_vcpu(const struct vcpu *v)
 {
     return (is_hardware_domain(v->domain) &&
-            cpumask_weight(v->cpu_hard_affinity) == 1);
+            cpumask_weight(v->sched_unit->cpu_hard_affinity) == 1);
 }
 
 #ifdef CONFIG_HAS_PASSTHROUGH
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-07-02 10:25 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-01 14:08 [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit Jan Beulich
2019-07-01 15:10 ` Juergen Gross
2019-07-01 15:46   ` Jan Beulich
2019-07-02  6:30     ` Juergen Gross
2019-07-02  7:54       ` Jan Beulich
2019-07-02  8:14         ` Juergen Gross
2019-07-02  8:27           ` Jan Beulich
2019-07-02  8:44             ` Juergen Gross
2019-07-02  9:05               ` Jan Beulich
2019-07-02  9:16                 ` Juergen Gross
2019-07-02  8:21         ` Dario Faggioli
2019-07-02  8:29           ` Jan Beulich
2019-07-02  9:40             ` Dario Faggioli
2019-07-02 10:01               ` Jan Beulich
2019-07-02 10:25                 ` Juergen Gross
  -- strict thread matches above, loose matches on Subject: below --
2019-05-28 10:32 [PATCH 00/60] xen: add core scheduling support Juergen Gross
2019-05-28 10:32 ` [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit Juergen Gross
2019-05-28 10:32   ` [Xen-devel] " Juergen Gross
2019-06-13  7:18   ` Andrii Anisov
2019-06-13  7:29     ` Juergen Gross
2019-06-13  7:34       ` Andrii Anisov
2019-06-13  8:39         ` Juergen Gross
2019-06-13  8:49           ` Andrii Anisov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).