Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit

From: Jan Beulich <JBeulich@suse.com>
To: Juergen Gross <JGross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>, Wei Liu <wl@xen.org>,
	Konrad Wilk <konrad.wilk@oracle.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Tim Deegan <tim@xen.org>, Dario Faggioli <dfaggioli@suse.com>,
	Julien Grall <julien.grall@arm.com>,
	Meng Xu <mengxu@cis.upenn.edu>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Ian Jackson <ian.jackson@citrix.com>,
	Roger Pau Monne <roger.pau@citrix.com>
Subject: Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
Date: Mon, 1 Jul 2019 15:46:14 +0000
Message-ID: <de741925-b823-92ee-c9be-c4cc55da859d@suse.com>
In-Reply-To: <45139cf0-0b53-1ca2-c8b2-2c2e8813a82d@suse.com>

On 01.07.2019 17:10, Juergen Gross wrote:
> On 01.07.19 16:08, Jan Beulich wrote:
>>>>> On 28.05.19 at 12:32, <jgross@suse.com> wrote:
>>> @@ -155,8 +156,8 @@ static void nmi_mce_softirq(void)
>>>        * Set the tmp value unconditionally, so that the check in the iret
>>>        * hypercall works.
>>>        */
>>> -    cpumask_copy(st->vcpu->cpu_hard_affinity_tmp,
>>> -                 st->vcpu->cpu_hard_affinity);
>>> +    cpumask_copy(st->vcpu->sched_unit->cpu_hard_affinity_tmp,
>>> +                 st->vcpu->sched_unit->cpu_hard_affinity);
>>
>> Aiui this affects all vCPU-s in the unit, which is unlikely to be what we
>> want here: There's now only one cpu_hard_affinity_tmp for all vCPU-s
>> in the unit, yet every vCPU in there may want to make use of the
>> field in parallel.
> 
> Hmm, yes, we'll need a usage bitmask.
> 
> Please note that affecting all vcpus in the unit is by design. With
> multiple vcpus of a unit needing this feature in parallel there is no
> way they can have different needs regarding temporary affinity.

But how will this work? I.e. how will all vCPU-s in a unit get
their temporary affinity pointing to the one specific pCPU in question?
It's not just the "all at the same time" that I don't see working here;
I'm also having trouble seeing how the potential cross-core or
cross-node movement that's apparently needed here would end up working.
I won't rule out that the set of pCPU-s a vCPU may need to move to here
always lies within the unit, but in that case I'd expect assertions to
that effect to be added.
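
To make the two points above concrete (a usage bitmask guarding the
single saved copy, and a check that the target pCPU lies within the
unit), here is a minimal, self-contained sketch. It is not Xen code:
the type, the field names (cpus, tmp_users) and the helper names are
all hypothetical, and pinning the whole unit to the pCPUs it currently
occupies is an assumption made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the real Xen definitions. */
typedef uint64_t mask_t;          /* one bit per pCPU, at most 64 pCPUs */

struct unit_sketch {
    mask_t hard_affinity;         /* shared by every vCPU of the unit */
    mask_t hard_affinity_tmp;     /* the single saved copy */
    mask_t cpus;                  /* assumed: pCPUs the unit currently occupies */
    unsigned int tmp_users;       /* usage bitmask: bit n = vCPU n of the unit (n < 32) */
};

/* The condition one would want to assert: the target pCPU lies within the unit. */
static bool target_within_unit(const struct unit_sketch *u, unsigned int cpu)
{
    return cpu < 64 && ((u->cpus >> cpu) & 1);
}

/* Returns false (changing nothing) if the request would need a cross-core move. */
static bool tmp_affinity_begin(struct unit_sketch *u, unsigned int idx,
                               unsigned int cpu)
{
    if ( !target_within_unit(u, cpu) )
        return false;

    if ( u->tmp_users == 0 )      /* first user saves the original mask */
    {
        u->hard_affinity_tmp = u->hard_affinity;
        u->hard_affinity = u->cpus;   /* assumed: pin the whole unit in place */
    }
    u->tmp_users |= 1u << idx;
    return true;
}

static void tmp_affinity_end(struct unit_sketch *u, unsigned int idx)
{
    if ( !(u->tmp_users & (1u << idx)) )
        return;                   /* this vCPU never started a temporary pin */

    u->tmp_users &= ~(1u << idx);
    if ( u->tmp_users == 0 )      /* last user restores the original mask */
        u->hard_affinity = u->hard_affinity_tmp;
}
```

In this sketch the saved mask is written only by the first user and
restored only by the last one, so parallel users within a unit cannot
clobber each other's saved state. A request for a pCPU outside the
unit is rejected rather than silently moving the whole unit.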

>>> --- a/xen/common/domain.c
>>> +++ b/xen/common/domain.c
>>> @@ -125,11 +125,6 @@ static void vcpu_info_reset(struct vcpu *v)
>>>   static void vcpu_destroy(struct vcpu *v)
>>>   {
>>> -    free_cpumask_var(v->cpu_hard_affinity);
>>> -    free_cpumask_var(v->cpu_hard_affinity_tmp);
>>> -    free_cpumask_var(v->cpu_hard_affinity_saved);
>>> -    free_cpumask_var(v->cpu_soft_affinity);
>>> -
>>>       free_vcpu_struct(v);
>>>   }
>>> @@ -153,12 +148,6 @@ struct vcpu *vcpu_create(
>>>       grant_table_init_vcpu(v);
>>> -    if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) ||
>>> -         !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) ||
>>> -         !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) ||
>>> -         !zalloc_cpumask_var(&v->cpu_soft_affinity) )
>>> -        goto fail;
>>
>> Seeing these, I'm actually having trouble understanding how you mean
>> to retain the user visible interface behavior here: If you only store an
>> affinity per sched unit, then how are you meaning to honor the vCPU
>> granular requests coming in?
> 
> With core scheduling it is only possible to set (virtual) core
> affinities. Whenever an affinity of a vcpu is being set it will set the
> affinity of the whole unit.

Hmm, that's indeed what I was deducing, but how will we sell this
to people actually fiddling with vCPU affinities? I foresee getting
bug reports that the respective xl commands no longer do what they
used to do.

>>> --- a/xen/include/xen/sched-if.h
>>> +++ b/xen/include/xen/sched-if.h
>>> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
>>>    * * The hard affinity is not a subset of soft affinity
>>>    * * There is an overlap between the soft and hard affinity masks
>>>    */
>>> -static inline int has_soft_affinity(const struct vcpu *v)
>>> +static inline int has_soft_affinity(const struct sched_unit *unit)
>>>   {
>>> -    return v->soft_aff_effective &&
>>> -           !cpumask_subset(cpupool_domain_cpumask(v->domain),
>>> -                           v->cpu_soft_affinity);
>>> +    return unit->soft_aff_effective &&
>>> +           !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain),
>>> +                           unit->cpu_soft_affinity);
>>>   }
>>
>> Okay, at the moment there looks to be a 1:1 relationship between sched
>> units and vCPU-s. This would - at this point of the series - invalidate most
>> of my earlier comments. However, in patch 57 I don't see how this unit->vcpu
>> mapping would get broken, and I can't seem to identify any other patch
>> where this might be happening. Looking at the github branch I also get the
>> impression that the struct vcpu * pointer out of struct sched_unit survives
>> until the end of the series, which doesn't seem right to me.
> 
> It is right. The vcpu pointer in the sched_unit is pointing to the first
> vcpu of the unit at the end of the series. Further vcpus are found via
> v->next_in_list.

I'm afraid this sets us up for misunderstanding and misuse. I don't
think there should be a straight struct vcpu * out of struct sched_unit.
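
One way to avoid exposing a bare pointer is to reach the vCPU-s of a
unit only through an iterator. The following is a self-contained
sketch under stated assumptions, not the code from the series: the
structures are reduced stand-ins, and the field name vcpu_list as well
as the macro and helper names are chosen here for illustration. It
relies on the description above that further vCPU-s are found via
v->next_in_list, and additionally assumes that the vCPU-s of one unit
are adjacent in that list.

```c
#include <assert.h>
#include <stddef.h>

struct sched_unit;

/* Reduced stand-ins for illustration; not the real Xen definitions. */
struct vcpu {
    unsigned int vcpu_id;
    struct vcpu *next_in_list;       /* next vCPU of the domain */
    struct sched_unit *sched_unit;   /* unit this vCPU belongs to */
};

struct sched_unit {
    struct vcpu *vcpu_list;          /* hypothetical name: first vCPU of the unit */
};

/*
 * Walk all vCPU-s of a unit. The walk stops at the end of the domain's
 * list or at the first vCPU that belongs to a different unit.
 */
#define for_each_sched_unit_vcpu(u, v)                        \
    for ( (v) = (u)->vcpu_list;                               \
          (v) != NULL && (v)->sched_unit == (u);              \
          (v) = (v)->next_in_list )

/* Example consumer: count the vCPU-s in a unit. */
static unsigned int sched_unit_vcpu_count(const struct sched_unit *unit)
{
    const struct vcpu *v;
    unsigned int n = 0;

    for_each_sched_unit_vcpu ( unit, v )
        n++;

    return n;
}
```

With a name like vcpu_list and an iterator as the only documented
accessor, a caller is less likely to treat the pointer as "the" vCPU
of the unit.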

>>> @@ -980,7 +978,7 @@ static inline bool is_hvm_vcpu(const struct vcpu *v)
>>>   static inline bool is_hwdom_pinned_vcpu(const struct vcpu *v)
>>>   {
>>>       return (is_hardware_domain(v->domain) &&
>>> -            cpumask_weight(v->cpu_hard_affinity) == 1);
>>> +            cpumask_weight(v->sched_unit->cpu_hard_affinity) == 1);
>>>   }
>>
>> Seeing this - how is pinning (by command line option or by Dom0
>> doing this on itself transiently) going to work with core scheduling?
> 
> In the end only the bit of the first vcpu of a unit will be set in the
> affinity masks, affecting all vcpus of the unit.

I'm confused - what "bit of the first vcpu of a unit" are you referring
to?

To give an example of what I meant with my earlier reply: How is Dom0
requesting its vCPU 5 to be pinned to pCPU 3 going to be satisfied,
independent of the sched unit that vCPU 5 is associated with? Is the
whole sched unit getting moved over then? If so, what if another vCPU
in the same sched unit at the same time requests to be pinned to pCPU
17, on a different node/socket?
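
The conflict can be shown with a small self-contained sketch. It is
not Xen code: the structures are reduced stand-ins, the helper names
are hypothetical, and it assumes that vCPU 4 and vCPU 5 share one
sched unit and that a pin request simply overwrites the unit's single
hard affinity mask.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced stand-ins for illustration; not the real Xen definitions. */
typedef uint64_t mask_t;               /* one bit per pCPU, at most 64 pCPUs */

struct sched_unit {
    mask_t cpu_hard_affinity;          /* one mask for all vCPU-s of the unit */
};

struct vcpu {
    unsigned int vcpu_id;
    struct sched_unit *sched_unit;
};

/* Hypothetical helper: mask with only the given pCPU set. */
static mask_t mask_of_cpu(unsigned int cpu)
{
    assert(cpu < 64);
    return (mask_t)1 << cpu;
}

/* Hypothetical helper: a pin request lands in the unit, not in the vCPU. */
static void sketch_pin_vcpu(struct vcpu *v, unsigned int cpu)
{
    v->sched_unit->cpu_hard_affinity = mask_of_cpu(cpu);
}

/* What a vCPU observes as its own hard affinity. */
static mask_t sketch_vcpu_affinity(const struct vcpu *v)
{
    return v->sched_unit->cpu_hard_affinity;
}
```

Under these assumptions, pinning vCPU 5 to pCPU 3 and then vCPU 4 to
pCPU 17 leaves vCPU 5 with an affinity of pCPU 17 only: the later
request silently replaces the earlier one.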

Jan