From: Juergen Gross <jgross@suse.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>, Wei Liu <wl@xen.org>,
	Konrad Wilk <konrad.wilk@oracle.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Tim Deegan <tim@xen.org>, Dario Faggioli <dfaggioli@suse.com>,
	Julien Grall <julien.grall@arm.com>,
	Meng Xu <mengxu@cis.upenn.edu>,
	Ian Jackson <ian.jackson@citrix.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Roger Pau Monne <roger.pau@citrix.com>
Subject: Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
Date: Mon, 1 Jul 2019 17:10:28 +0200
Message-ID: <45139cf0-0b53-1ca2-c8b2-2c2e8813a82d@suse.com>
In-Reply-To: <be24ad7e-9510-912c-9984-7eec792fae86@suse.com>

On 01.07.19 16:08, Jan Beulich wrote:
>>>> On 28.05.19 at 12:32, <jgross@suse.com> wrote:
>> @@ -155,8 +156,8 @@ static void nmi_mce_softirq(void)
>>        * Set the tmp value unconditionally, so that the check in the iret
>>        * hypercall works.
>>        */
>> -    cpumask_copy(st->vcpu->cpu_hard_affinity_tmp,
>> -                 st->vcpu->cpu_hard_affinity);
>> +    cpumask_copy(st->vcpu->sched_unit->cpu_hard_affinity_tmp,
>> +                 st->vcpu->sched_unit->cpu_hard_affinity);
> 
> Aiui this affects all vCPU-s in the unit, which is unlikely to be what we
> want here: There's now only one cpu_hard_affinity_tmp for all vCPU-s
> in the unit, yet every vCPU in there may want to make use of the
> field in parallel.

Hmm, yes, we'll need a usage bitmask.

Please note that affecting all vcpus in the unit is by design. When
multiple vcpus of a unit need this feature in parallel, they cannot
have different needs regarding temporary affinity.
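
A minimal sketch of what such a usage bitmask could look like, assuming
one bit per vcpu of the unit. All names here (unit_sketch, tmp_users,
tmp_affinity_get/put) are hypothetical and a plain uint64_t stands in
for a cpumask; this is not code from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sched_unit; not the Xen definition. */
struct unit_sketch {
    uint64_t hard_affinity;      /* toy cpumask: bit n means CPU n */
    uint64_t hard_affinity_tmp;  /* shared by all vcpus of the unit */
    uint32_t tmp_users;          /* bit i set: vcpu i uses the tmp mask */
};

/* vcpu 'idx' of the unit starts using the temporary affinity. */
static void tmp_affinity_get(struct unit_sketch *u, unsigned int idx)
{
    assert(idx < 32);
    if ( u->tmp_users == 0 )
        u->hard_affinity_tmp = u->hard_affinity; /* first user saves it */
    u->tmp_users |= UINT32_C(1) << idx;
}

/* vcpu 'idx' is done; returns true once the tmp mask has been cleared. */
static bool tmp_affinity_put(struct unit_sketch *u, unsigned int idx)
{
    assert(idx < 32);
    u->tmp_users &= ~(UINT32_C(1) << idx);
    if ( u->tmp_users != 0 )
        return false;            /* another vcpu still relies on it */
    u->hard_affinity_tmp = 0;
    return true;
}
```

With something like this, the tmp mask would only be cleared when the
last user of the unit is gone.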

> 
> I also wonder how the code further down in this function fits with
> the scheduler unit concept. But perhaps that's going to be dealt with
> by later patches...
> 
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -125,11 +125,6 @@ static void vcpu_info_reset(struct vcpu *v)
>>   
>>   static void vcpu_destroy(struct vcpu *v)
>>   {
>> -    free_cpumask_var(v->cpu_hard_affinity);
>> -    free_cpumask_var(v->cpu_hard_affinity_tmp);
>> -    free_cpumask_var(v->cpu_hard_affinity_saved);
>> -    free_cpumask_var(v->cpu_soft_affinity);
>> -
>>       free_vcpu_struct(v);
>>   }
>>   
>> @@ -153,12 +148,6 @@ struct vcpu *vcpu_create(
>>   
>>       grant_table_init_vcpu(v);
>>   
>> -    if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) ||
>> -         !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) ||
>> -         !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) ||
>> -         !zalloc_cpumask_var(&v->cpu_soft_affinity) )
>> -        goto fail;
> 
> Seeing these, I'm actually having trouble understanding how you mean
> to retain the user visible interface behavior here: If you only store an
> affinity per sched unit, then how are you meaning to honor the vCPU
> granular requests coming in?

With core scheduling it is only possible to set (virtual) core
affinities. Whenever the affinity of a vcpu is set, the affinity of
the whole unit is set.
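
As a rough illustration of that behavior, assuming toy types and a
uint64_t in place of a cpumask (unit_sketch, vcpu_sketch and the
*_sketch helpers are hypothetical, not the functions of the series):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; only the fields needed for the illustration. */
struct unit_sketch {
    uint64_t cpu_hard_affinity;      /* toy cpumask: bit n means CPU n */
};

struct vcpu_sketch {
    unsigned int vcpu_id;
    struct unit_sketch *sched_unit;  /* unit this vcpu belongs to */
};

/* A per-vcpu request ends up in the mask shared by the whole unit. */
static void vcpu_set_hard_affinity_sketch(struct vcpu_sketch *v,
                                          uint64_t mask)
{
    assert(mask != 0);               /* an empty affinity makes no sense */
    v->sched_unit->cpu_hard_affinity = mask;
}

/* Every vcpu of the unit reads back the same mask. */
static uint64_t vcpu_get_hard_affinity_sketch(const struct vcpu_sketch *v)
{
    return v->sched_unit->cpu_hard_affinity;
}
```

Setting the affinity via one vcpu is then visible through all other
vcpus of the same unit.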

> 
>> @@ -557,9 +545,10 @@ void domain_update_node_affinity(struct domain *d)
>>            */
>>           for_each_vcpu ( d, v )
>>           {
>> -            cpumask_or(dom_cpumask, dom_cpumask, v->cpu_hard_affinity);
>> +            cpumask_or(dom_cpumask, dom_cpumask,
>> +                       v->sched_unit->cpu_hard_affinity);
>>               cpumask_or(dom_cpumask_soft, dom_cpumask_soft,
>> -                       v->cpu_soft_affinity);
>> +                       v->sched_unit->cpu_soft_affinity);
>>           }
> 
> There's not going to be a for_each_sched_unit(), is there? It
> would mean less iterations here, and less redundant CPU mask
> operations. Ah, that's the subject of patch 30.

Right.

> 
>> @@ -1226,7 +1215,7 @@ int vcpu_reset(struct vcpu *v)
>>       v->async_exception_mask = 0;
>>       memset(v->async_exception_state, 0, sizeof(v->async_exception_state));
>>   #endif
>> -    cpumask_clear(v->cpu_hard_affinity_tmp);
>> +    cpumask_clear(v->sched_unit->cpu_hard_affinity_tmp);
> 
> Same issue as above - you're affecting other vCPU-s here.

Yes, we'll need a usage bitmask to be tested here.

> 
>> --- a/xen/common/domctl.c
>> +++ b/xen/common/domctl.c
>> @@ -614,6 +614,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>>       case XEN_DOMCTL_getvcpuaffinity:
>>       {
>>           struct vcpu *v;
>> +        struct sched_unit *unit;
> 
> const?
> 
>> --- a/xen/common/keyhandler.c
>> +++ b/xen/common/keyhandler.c
>> @@ -312,8 +312,8 @@ static void dump_domains(unsigned char key)
>>                   printk("dirty_cpu=%u", v->dirty_cpu);
>>               printk("\n");
>>               printk("    cpu_hard_affinity={%*pbl} cpu_soft_affinity={%*pbl}\n",
>> -                   nr_cpu_ids, cpumask_bits(v->cpu_hard_affinity),
>> -                   nr_cpu_ids, cpumask_bits(v->cpu_soft_affinity));
>> +                   nr_cpu_ids, cpumask_bits(v->sched_unit->cpu_hard_affinity),
>> +                   nr_cpu_ids, cpumask_bits(v->sched_unit->cpu_soft_affinity));
> 
> I don't see the value of logging the same information multiple times
> (for each vCPU in a sched unit). I think you want to split this up.

Yes, true.

> 
>> --- a/xen/common/wait.c
>> +++ b/xen/common/wait.c
>> @@ -132,7 +132,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
>>   
>>       /* Save current VCPU affinity; force wakeup on *this* CPU only. */
>>       wqv->wakeup_cpu = smp_processor_id();
>> -    cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
>> +    cpumask_copy(&wqv->saved_affinity, curr->sched_unit->cpu_hard_affinity);
>>       if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
>>       {
>>           gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
>> @@ -199,7 +199,7 @@ void check_wakeup_from_wait(void)
>>       {
>>           /* Re-set VCPU affinity and re-enter the scheduler. */
>>           struct vcpu *curr = current;
>> -        cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
>> +        cpumask_copy(&wqv->saved_affinity, curr->sched_unit->cpu_hard_affinity);
>>           if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
>>           {
>>               gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
> 
> Same problem as above - the consumer of ->saved_affinity will affect
> vCPU-s other than the subject one.

Yes.

> 
>> --- a/xen/include/xen/sched-if.h
>> +++ b/xen/include/xen/sched-if.h
>> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
>>    * * The hard affinity is not a subset of soft affinity
>>    * * There is an overlap between the soft and hard affinity masks
>>    */
>> -static inline int has_soft_affinity(const struct vcpu *v)
>> +static inline int has_soft_affinity(const struct sched_unit *unit)
>>   {
>> -    return v->soft_aff_effective &&
>> -           !cpumask_subset(cpupool_domain_cpumask(v->domain),
>> -                           v->cpu_soft_affinity);
>> +    return unit->soft_aff_effective &&
>> +           !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain),
>> +                           unit->cpu_soft_affinity);
>>   }
> 
> Okay, at the moment there looks to be a 1:1 relationship between sched
> units and vCPU-s. This would - at this point of the series - invalidate most
> my earlier comments. However, in patch 57 I don't see how this unit->vcpu
> mapping would get broken, and I can't seem to identify any other patch
> where this might be happening. Looking at the github branch I also get the
> impression that the struct vcpu * pointer out of struct sched_unit survives
> until the end of the series, which doesn't seem right to me.

It is right. The vcpu pointer in the sched_unit is pointing to the first
vcpu of the unit at the end of the series. Further vcpus are found via
v->next_in_list.
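
A sketch of how such a walk could look, using toy types. The
termination check (stop when the next vcpu belongs to another unit) is
an assumption made for this illustration; unit_sketch, vcpu_sketch and
unit_nr_vcpus are hypothetical names, not code from the series.

```c
#include <assert.h>
#include <stddef.h>

struct unit_sketch;

/* Hypothetical stand-ins; only the fields needed for the walk. */
struct vcpu_sketch {
    unsigned int vcpu_id;
    struct unit_sketch *sched_unit;    /* unit this vcpu belongs to */
    struct vcpu_sketch *next_in_list;  /* next vcpu of the domain */
};

struct unit_sketch {
    struct vcpu_sketch *vcpu;          /* first vcpu of the unit */
};

/* Count the vcpus of a unit, starting at its first vcpu. */
static unsigned int unit_nr_vcpus(const struct unit_sketch *u)
{
    unsigned int n = 0;
    const struct vcpu_sketch *v;

    assert(u != NULL);
    /* Assumption: vcpus of one unit are adjacent in the domain list. */
    for ( v = u->vcpu; v != NULL && v->sched_unit == u;
          v = v->next_in_list )
        n++;

    return n;
}
```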

> In any event, for the purpose here, I think there should be a backlink to
> struct domain in struct sched_unit right away, and it should get used here.

See patch 15.

> 
>> @@ -283,6 +265,22 @@ struct sched_unit {
>>       void                  *priv;      /* scheduler private data */
>>       struct sched_unit     *next_in_list;
>>       struct sched_resource *res;
>> +
>> +    /* Last time when unit has been scheduled out. */
>> +    uint64_t               last_run_time;
>> +
>> +    /* Item needs affinity restored. */
>> +    bool                   affinity_broken;
>> +    /* Does soft affinity actually play a role (given hard affinity)? */
>> +    bool                   soft_aff_effective;
>> +    /* Bitmask of CPUs on which this VCPU may run. */
>> +    cpumask_var_t          cpu_hard_affinity;
>> +    /* Used to change affinity temporarily. */
>> +    cpumask_var_t          cpu_hard_affinity_tmp;
>> +    /* Used to restore affinity across S3. */
>> +    cpumask_var_t          cpu_hard_affinity_saved;
>> +    /* Bitmask of CPUs on which this VCPU prefers to run. */
>> +    cpumask_var_t          cpu_soft_affinity;
>>   };
> 
> The mentions of "VCPU" in the comments here also survive till the end
> of the series, which I also don't think is quite right.

Will modify.

> 
>> @@ -980,7 +978,7 @@ static inline bool is_hvm_vcpu(const struct vcpu *v)
>>   static inline bool is_hwdom_pinned_vcpu(const struct vcpu *v)
>>   {
>>       return (is_hardware_domain(v->domain) &&
>> -            cpumask_weight(v->cpu_hard_affinity) == 1);
>> +            cpumask_weight(v->sched_unit->cpu_hard_affinity) == 1);
>>   }
> 
> Seeing this - how is pinning (by command line option or by Dom0
> doing this on itself transiently) going to work with core scheduling?

In the end only the bit of the first vcpu of a unit will be set in the
affinity masks, affecting all vcpus of the unit.


Juergen

