* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit @ 2019-07-01 14:08 Jan Beulich 2019-07-01 15:10 ` Juergen Gross 0 siblings, 1 reply; 21+ messages in thread From: Jan Beulich @ 2019-07-01 14:08 UTC (permalink / raw) To: Juergen Gross Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap, Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall, Meng Xu, Ian Jackson, xen-devel, Roger Pau Monne >>> On 28.05.19 at 12:32, <jgross@suse.com> wrote: > @@ -155,8 +156,8 @@ static void nmi_mce_softirq(void) > * Set the tmp value unconditionally, so that the check in the iret > * hypercall works. > */ > - cpumask_copy(st->vcpu->cpu_hard_affinity_tmp, > - st->vcpu->cpu_hard_affinity); > + cpumask_copy(st->vcpu->sched_unit->cpu_hard_affinity_tmp, > + st->vcpu->sched_unit->cpu_hard_affinity); Aiui this affects all vCPU-s in the unit, which is unlikely to be what we want here: There's now only one cpu_hard_affinity_tmp for all vCPU-s in the unit, yet every vCPU in there may want to make use of the field in parallel. I also wonder how the code further down in this function fits with the scheduler unit concept. But perhaps that's going to be dealt with by later patches... 
> --- a/xen/common/domain.c > +++ b/xen/common/domain.c > @@ -125,11 +125,6 @@ static void vcpu_info_reset(struct vcpu *v) > > static void vcpu_destroy(struct vcpu *v) > { > - free_cpumask_var(v->cpu_hard_affinity); > - free_cpumask_var(v->cpu_hard_affinity_tmp); > - free_cpumask_var(v->cpu_hard_affinity_saved); > - free_cpumask_var(v->cpu_soft_affinity); > - > free_vcpu_struct(v); > } > > @@ -153,12 +148,6 @@ struct vcpu *vcpu_create( > > grant_table_init_vcpu(v); > > - if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) || > - !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) || > - !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) || > - !zalloc_cpumask_var(&v->cpu_soft_affinity) ) > - goto fail; Seeing these, I'm actually having trouble understanding how you mean to retain the user visible interface behavior here: If you only store an affinity per sched unit, then how are you meaning to honor the vCPU granular requests coming in? > @@ -557,9 +545,10 @@ void domain_update_node_affinity(struct domain *d) > */ > for_each_vcpu ( d, v ) > { > - cpumask_or(dom_cpumask, dom_cpumask, v->cpu_hard_affinity); > + cpumask_or(dom_cpumask, dom_cpumask, > + v->sched_unit->cpu_hard_affinity); > cpumask_or(dom_cpumask_soft, dom_cpumask_soft, > - v->cpu_soft_affinity); > + v->sched_unit->cpu_soft_affinity); > } There's not going to be a for_each_sched_unit(), is there? It would mean less iterations here, and less redundant CPU mask operations. Ah, that's the subject of patch 30. > @@ -1226,7 +1215,7 @@ int vcpu_reset(struct vcpu *v) > v->async_exception_mask = 0; > memset(v->async_exception_state, 0, sizeof(v->async_exception_state)); > #endif > - cpumask_clear(v->cpu_hard_affinity_tmp); > + cpumask_clear(v->sched_unit->cpu_hard_affinity_tmp); Same issue as above - you're affecting other vCPU-s here. 
> --- a/xen/common/domctl.c > +++ b/xen/common/domctl.c > @@ -614,6 +614,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) > case XEN_DOMCTL_getvcpuaffinity: > { > struct vcpu *v; > + struct sched_unit *unit; const? > --- a/xen/common/keyhandler.c > +++ b/xen/common/keyhandler.c > @@ -312,8 +312,8 @@ static void dump_domains(unsigned char key) > printk("dirty_cpu=%u", v->dirty_cpu); > printk("\n"); > printk(" cpu_hard_affinity={%*pbl} cpu_soft_affinity={%*pbl}\n", > - nr_cpu_ids, cpumask_bits(v->cpu_hard_affinity), > - nr_cpu_ids, cpumask_bits(v->cpu_soft_affinity)); > + nr_cpu_ids, cpumask_bits(v->sched_unit->cpu_hard_affinity), > + nr_cpu_ids, cpumask_bits(v->sched_unit->cpu_soft_affinity)); I don't see the value of logging the same information multiple times (for each vCPU in a sched unit). I think you want to split this up. > --- a/xen/common/wait.c > +++ b/xen/common/wait.c > @@ -132,7 +132,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv) > > /* Save current VCPU affinity; force wakeup on *this* CPU only. */ > wqv->wakeup_cpu = smp_processor_id(); > - cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity); > + cpumask_copy(&wqv->saved_affinity, curr->sched_unit->cpu_hard_affinity); > if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) > { > gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n"); > @@ -199,7 +199,7 @@ void check_wakeup_from_wait(void) > { > /* Re-set VCPU affinity and re-enter the scheduler. */ > struct vcpu *curr = current; > - cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity); > + cpumask_copy(&wqv->saved_affinity, curr->sched_unit->cpu_hard_affinity); > if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) > { > gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n"); Same problem as above - the consumer of ->saved_affinity will affect vCPU-s other than the subject one. 
> --- a/xen/include/xen/sched-if.h > +++ b/xen/include/xen/sched-if.h > @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d) > * * The hard affinity is not a subset of soft affinity > * * There is an overlap between the soft and hard affinity masks > */ > -static inline int has_soft_affinity(const struct vcpu *v) > +static inline int has_soft_affinity(const struct sched_unit *unit) > { > - return v->soft_aff_effective && > - !cpumask_subset(cpupool_domain_cpumask(v->domain), > - v->cpu_soft_affinity); > + return unit->soft_aff_effective && > + !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain), > + unit->cpu_soft_affinity); > } Okay, at the moment there looks to be a 1:1 relationship between sched units and vCPU-s. This would - at this point of the series - invalidate most my earlier comments. However, in patch 57 I don't see how this unit->vcpu mapping would get broken, and I can't seem to identify any other patch where this might be happening. Looking at the github branch I also get the impression that the struct vcpu * pointer out of struct sched_unit survives until the end of the series, which doesn't seem right to me. In any event, for the purpose here, I think there should be a backlink to struct domain in struct sched_unit right away, and it should get used here. > @@ -283,6 +265,22 @@ struct sched_unit { > void *priv; /* scheduler private data */ > struct sched_unit *next_in_list; > struct sched_resource *res; > + > + /* Last time when unit has been scheduled out. */ > + uint64_t last_run_time; > + > + /* Item needs affinity restored. */ > + bool affinity_broken; > + /* Does soft affinity actually play a role (given hard affinity)? */ > + bool soft_aff_effective; > + /* Bitmask of CPUs on which this VCPU may run. */ > + cpumask_var_t cpu_hard_affinity; > + /* Used to change affinity temporarily. */ > + cpumask_var_t cpu_hard_affinity_tmp; > + /* Used to restore affinity across S3. 
*/ > + cpumask_var_t cpu_hard_affinity_saved; > + /* Bitmask of CPUs on which this VCPU prefers to run. */ > + cpumask_var_t cpu_soft_affinity; > }; The mentions of "VCPU" in the comments here also survive till the end of the series, which I also don't think is quite right. > @@ -980,7 +978,7 @@ static inline bool is_hvm_vcpu(const struct vcpu *v) > static inline bool is_hwdom_pinned_vcpu(const struct vcpu *v) > { > return (is_hardware_domain(v->domain) && > - cpumask_weight(v->cpu_hard_affinity) == 1); > + cpumask_weight(v->sched_unit->cpu_hard_affinity) == 1); > } Seeing this - how is pinning (by command line option or by Dom0 doing this on itself transiently) going to work with core scheduling? Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit 2019-07-01 14:08 [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit Jan Beulich @ 2019-07-01 15:10 ` Juergen Gross 2019-07-01 15:46 ` Jan Beulich 0 siblings, 1 reply; 21+ messages in thread From: Juergen Gross @ 2019-07-01 15:10 UTC (permalink / raw) To: Jan Beulich Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap, Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall, Meng Xu, Ian Jackson, xen-devel, Roger Pau Monne On 01.07.19 16:08, Jan Beulich wrote: >>>> On 28.05.19 at 12:32, <jgross@suse.com> wrote: >> @@ -155,8 +156,8 @@ static void nmi_mce_softirq(void) >> * Set the tmp value unconditionally, so that the check in the iret >> * hypercall works. >> */ >> - cpumask_copy(st->vcpu->cpu_hard_affinity_tmp, >> - st->vcpu->cpu_hard_affinity); >> + cpumask_copy(st->vcpu->sched_unit->cpu_hard_affinity_tmp, >> + st->vcpu->sched_unit->cpu_hard_affinity); > > Aiui this affects all vCPU-s in the unit, which is unlikely to be what we > want here: There's now only one cpu_hard_affinity_tmp for all vCPU-s > in the unit, yet every vCPU in there may want to make use of the > field in parallel. Hmm, yes, we'll need a usage bitmask. Please note that affecting all vcpus in the unit is per design. With multiple vcpus of a unit needing this feature in parallel there is no way they can have different needs regarding temporary affinity. > > I also wonder how the code further down in this function fits with > the scheduler unit concept. But perhaps that's going to be dealt with > by later patches... 
> >> --- a/xen/common/domain.c >> +++ b/xen/common/domain.c >> @@ -125,11 +125,6 @@ static void vcpu_info_reset(struct vcpu *v) >> >> static void vcpu_destroy(struct vcpu *v) >> { >> - free_cpumask_var(v->cpu_hard_affinity); >> - free_cpumask_var(v->cpu_hard_affinity_tmp); >> - free_cpumask_var(v->cpu_hard_affinity_saved); >> - free_cpumask_var(v->cpu_soft_affinity); >> - >> free_vcpu_struct(v); >> } >> >> @@ -153,12 +148,6 @@ struct vcpu *vcpu_create( >> >> grant_table_init_vcpu(v); >> >> - if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) || >> - !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) || >> - !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) || >> - !zalloc_cpumask_var(&v->cpu_soft_affinity) ) >> - goto fail; > > Seeing these, I'm actually having trouble understanding how you mean > to retain the user visible interface behavior here: If you only store an > affinity per sched unit, then how are you meaning to honor the vCPU > granular requests coming in? With core scheduling it is only possible to set (virtual) core affinities. Whenever an affinity of a vcpu is being set it will set the affinity of the whole unit. > >> @@ -557,9 +545,10 @@ void domain_update_node_affinity(struct domain *d) >> */ >> for_each_vcpu ( d, v ) >> { >> - cpumask_or(dom_cpumask, dom_cpumask, v->cpu_hard_affinity); >> + cpumask_or(dom_cpumask, dom_cpumask, >> + v->sched_unit->cpu_hard_affinity); >> cpumask_or(dom_cpumask_soft, dom_cpumask_soft, >> - v->cpu_soft_affinity); >> + v->sched_unit->cpu_soft_affinity); >> } > > There's not going to be a for_each_sched_unit(), is there? It > would mean less iterations here, and less redundant CPU mask > operations. Ah, that's the subject of patch 30. Right. 
> >> @@ -1226,7 +1215,7 @@ int vcpu_reset(struct vcpu *v) >> v->async_exception_mask = 0; >> memset(v->async_exception_state, 0, sizeof(v->async_exception_state)); >> #endif >> - cpumask_clear(v->cpu_hard_affinity_tmp); >> + cpumask_clear(v->sched_unit->cpu_hard_affinity_tmp); > > Same issue as above - you're affecting other vCPU-s here. Yes, we'll need a usage bitmask to be tested here. > >> --- a/xen/common/domctl.c >> +++ b/xen/common/domctl.c >> @@ -614,6 +614,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) >> case XEN_DOMCTL_getvcpuaffinity: >> { >> struct vcpu *v; >> + struct sched_unit *unit; > > const? > >> --- a/xen/common/keyhandler.c >> +++ b/xen/common/keyhandler.c >> @@ -312,8 +312,8 @@ static void dump_domains(unsigned char key) >> printk("dirty_cpu=%u", v->dirty_cpu); >> printk("\n"); >> printk(" cpu_hard_affinity={%*pbl} cpu_soft_affinity={%*pbl}\n", >> - nr_cpu_ids, cpumask_bits(v->cpu_hard_affinity), >> - nr_cpu_ids, cpumask_bits(v->cpu_soft_affinity)); >> + nr_cpu_ids, cpumask_bits(v->sched_unit->cpu_hard_affinity), >> + nr_cpu_ids, cpumask_bits(v->sched_unit->cpu_soft_affinity)); > > I don't see the value of logging the same information multiple times > (for each vCPU in a sched unit). I think you want to split this up. Yes, true. > >> --- a/xen/common/wait.c >> +++ b/xen/common/wait.c >> @@ -132,7 +132,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv) >> >> /* Save current VCPU affinity; force wakeup on *this* CPU only. */ >> wqv->wakeup_cpu = smp_processor_id(); >> - cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity); >> + cpumask_copy(&wqv->saved_affinity, curr->sched_unit->cpu_hard_affinity); >> if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) >> { >> gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n"); >> @@ -199,7 +199,7 @@ void check_wakeup_from_wait(void) >> { >> /* Re-set VCPU affinity and re-enter the scheduler. 
*/ >> struct vcpu *curr = current; >> - cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity); >> + cpumask_copy(&wqv->saved_affinity, curr->sched_unit->cpu_hard_affinity); >> if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) >> { >> gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n"); > > Same problem as above - the consumer of ->saved_affinity will affect > vCPU-s other than the subject one. Yes. > >> --- a/xen/include/xen/sched-if.h >> +++ b/xen/include/xen/sched-if.h >> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d) >> * * The hard affinity is not a subset of soft affinity >> * * There is an overlap between the soft and hard affinity masks >> */ >> -static inline int has_soft_affinity(const struct vcpu *v) >> +static inline int has_soft_affinity(const struct sched_unit *unit) >> { >> - return v->soft_aff_effective && >> - !cpumask_subset(cpupool_domain_cpumask(v->domain), >> - v->cpu_soft_affinity); >> + return unit->soft_aff_effective && >> + !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain), >> + unit->cpu_soft_affinity); >> } > > Okay, at the moment there looks to be a 1:1 relationship between sched > units and vCPU-s. This would - at this point of the series - invalidate most > my earlier comments. However, in patch 57 I don't see how this unit->vcpu > mapping would get broken, and I can't seem to identify any other patch > where this might be happening. Looking at the github branch I also get the > impression that the struct vcpu * pointer out of struct sched_unit survives > until the end of the series, which doesn't seem right to me. It is right. The vcpu pointer in the sched_unit is pointing to the first vcpu of the unit at the end of the series. Further vcpus are found via v->next_in_list. > In any event, for the purpose here, I think there should be a backlink to > struct domain in struct sched_unit right away, and it should get used here. See patch 15. 
> >> @@ -283,6 +265,22 @@ struct sched_unit { >> void *priv; /* scheduler private data */ >> struct sched_unit *next_in_list; >> struct sched_resource *res; >> + >> + /* Last time when unit has been scheduled out. */ >> + uint64_t last_run_time; >> + >> + /* Item needs affinity restored. */ >> + bool affinity_broken; >> + /* Does soft affinity actually play a role (given hard affinity)? */ >> + bool soft_aff_effective; >> + /* Bitmask of CPUs on which this VCPU may run. */ >> + cpumask_var_t cpu_hard_affinity; >> + /* Used to change affinity temporarily. */ >> + cpumask_var_t cpu_hard_affinity_tmp; >> + /* Used to restore affinity across S3. */ >> + cpumask_var_t cpu_hard_affinity_saved; >> + /* Bitmask of CPUs on which this VCPU prefers to run. */ >> + cpumask_var_t cpu_soft_affinity; >> }; > > The mentions of "VCPU" in the comments here also survive till the end > of the series, which I also don't think is quite right. Will modify. > >> @@ -980,7 +978,7 @@ static inline bool is_hvm_vcpu(const struct vcpu *v) >> static inline bool is_hwdom_pinned_vcpu(const struct vcpu *v) >> { >> return (is_hardware_domain(v->domain) && >> - cpumask_weight(v->cpu_hard_affinity) == 1); >> + cpumask_weight(v->sched_unit->cpu_hard_affinity) == 1); >> } > > Seeing this - how is pinning (by command line option or by Dom0 > doing this on itself transiently) going to work with core scheduling? In the end only the bit of the first vcpu of a unit will be set in the affinity masks, affecting all vcpus of the unit. Juergen
* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit 2019-07-01 15:10 ` Juergen Gross @ 2019-07-01 15:46 ` Jan Beulich 2019-07-02 6:30 ` Juergen Gross 0 siblings, 1 reply; 21+ messages in thread From: Jan Beulich @ 2019-07-01 15:46 UTC (permalink / raw) To: Juergen Gross Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap, Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall, Meng Xu, xen-devel, Ian Jackson, Roger Pau Monne On 01.07.2019 17:10, Juergen Gross wrote: > On 01.07.19 16:08, Jan Beulich wrote: >>>>> On 28.05.19 at 12:32, <jgross@suse.com> wrote: >>> @@ -155,8 +156,8 @@ static void nmi_mce_softirq(void) >>> * Set the tmp value unconditionally, so that the check in the iret >>> * hypercall works. >>> */ >>> - cpumask_copy(st->vcpu->cpu_hard_affinity_tmp, >>> - st->vcpu->cpu_hard_affinity); >>> + cpumask_copy(st->vcpu->sched_unit->cpu_hard_affinity_tmp, >>> + st->vcpu->sched_unit->cpu_hard_affinity); >> >> Aiui this affects all vCPU-s in the unit, which is unlikely to be what we >> want here: There's now only one cpu_hard_affinity_tmp for all vCPU-s >> in the unit, yet every vCPU in there may want to make use of the >> field in parallel. > > Hmm, yes, we'll need a usage bitmask. > > Please note that affecting all vcpus in the unit is per design. With > multiple vcpus of a unit needing this feature in parallel there is no > way they can have different needs regarding temporary affinity. But how will this work? I.e. how will all vCPU-s in a unit get their temporary affinity pointing to the one specific pCPU in question? It's not just the "all at the same time" that I don't see working here, I'm also having trouble seeing how the potential cross-core or cross- node movement that's apparently needed here would end up working. I'm not going to exclude that the set of possible pCPU-s a vCPU needs to move to here is still within the unit, but then I'd expect assertions to that effect to be added. 
>>> --- a/xen/common/domain.c >>> +++ b/xen/common/domain.c >>> @@ -125,11 +125,6 @@ static void vcpu_info_reset(struct vcpu *v) >>> static void vcpu_destroy(struct vcpu *v) >>> { >>> - free_cpumask_var(v->cpu_hard_affinity); >>> - free_cpumask_var(v->cpu_hard_affinity_tmp); >>> - free_cpumask_var(v->cpu_hard_affinity_saved); >>> - free_cpumask_var(v->cpu_soft_affinity); >>> - >>> free_vcpu_struct(v); >>> } >>> @@ -153,12 +148,6 @@ struct vcpu *vcpu_create( >>> grant_table_init_vcpu(v); >>> - if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) || >>> - !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) || >>> - !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) || >>> - !zalloc_cpumask_var(&v->cpu_soft_affinity) ) >>> - goto fail; >> >> Seeing these, I'm actually having trouble understanding how you mean >> to retain the user visible interface behavior here: If you only store an >> affinity per sched unit, then how are you meaning to honor the vCPU >> granular requests coming in? > > With core scheduling it is only possible to set (virtual) core > affinities. Whenever an affinity of a vcpu is being set it will set the > affinity of the whole unit. Hmm, that's indeed what I was deducing, but how will we sell this to people actually fiddling with vCPU affinities? I foresee getting bug reports that the respective xl command(s) do(es)n't do anymore what it used to do. 
>>> --- a/xen/include/xen/sched-if.h >>> +++ b/xen/include/xen/sched-if.h >>> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d) >>> * * The hard affinity is not a subset of soft affinity >>> * * There is an overlap between the soft and hard affinity masks >>> */ >>> -static inline int has_soft_affinity(const struct vcpu *v) >>> +static inline int has_soft_affinity(const struct sched_unit *unit) >>> { >>> - return v->soft_aff_effective && >>> - !cpumask_subset(cpupool_domain_cpumask(v->domain), >>> - v->cpu_soft_affinity); >>> + return unit->soft_aff_effective && >>> + !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain), >>> + unit->cpu_soft_affinity); >>> } >> >> Okay, at the moment there looks to be a 1:1 relationship between sched >> units and vCPU-s. This would - at this point of the series - invalidate most >> my earlier comments. However, in patch 57 I don't see how this unit->vcpu >> mapping would get broken, and I can't seem to identify any other patch >> where this might be happening. Looking at the github branch I also get the >> impression that the struct vcpu * pointer out of struct sched_unit survives >> until the end of the series, which doesn't seem right to me. > > It is right. The vcpu pointer in the sched_unit is pointing to the first > vcpu of the unit at the end of the series. Further vcpus are found via > v->next_in_list. I'm afraid this sets us up for misunderstanding and misuse. I don't think there should be a straight struct vcpu * out of struct sched_unit. 
>>> @@ -980,7 +978,7 @@ static inline bool is_hvm_vcpu(const struct vcpu *v) >>> static inline bool is_hwdom_pinned_vcpu(const struct vcpu *v) >>> { >>> return (is_hardware_domain(v->domain) && >>> - cpumask_weight(v->cpu_hard_affinity) == 1); >>> + cpumask_weight(v->sched_unit->cpu_hard_affinity) == 1); >>> } >> >> Seeing this - how is pinning (by command line option or by Dom0 >> doing this on itself transiently) going to work with core scheduling? > > In the end only the bit of the first vcpu of a unit will be set in the > affinity masks, affecting all vcpus of the unit. I'm confused - what "bit of the first vcpu of a unit" are you referring to? To give an example of what I meant with my earlier reply: How is Dom0 requesting its vCPU 5 to be pinned to pCPU 3 going to be satisfied, independent of the sched unit that vCPU 5 is associated with? Is the whole sched unit getting moved over then? If so, what if another vCPU in the same sched unit at the same time requests to be pinned to pCPU 17, on a different node/socket? Jan
* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit 2019-07-01 15:46 ` Jan Beulich @ 2019-07-02 6:30 ` Juergen Gross 2019-07-02 7:54 ` Jan Beulich 0 siblings, 1 reply; 21+ messages in thread From: Juergen Gross @ 2019-07-02 6:30 UTC (permalink / raw) To: Jan Beulich Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap, Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall, Meng Xu, xen-devel, Ian Jackson, Roger Pau Monne On 01.07.19 17:46, Jan Beulich wrote: > On 01.07.2019 17:10, Juergen Gross wrote: >> On 01.07.19 16:08, Jan Beulich wrote: >>>>>> On 28.05.19 at 12:32, <jgross@suse.com> wrote: >>>> @@ -155,8 +156,8 @@ static void nmi_mce_softirq(void) >>>> * Set the tmp value unconditionally, so that the check in the iret >>>> * hypercall works. >>>> */ >>>> - cpumask_copy(st->vcpu->cpu_hard_affinity_tmp, >>>> - st->vcpu->cpu_hard_affinity); >>>> + cpumask_copy(st->vcpu->sched_unit->cpu_hard_affinity_tmp, >>>> + st->vcpu->sched_unit->cpu_hard_affinity); >>> >>> Aiui this affects all vCPU-s in the unit, which is unlikely to be what we >>> want here: There's now only one cpu_hard_affinity_tmp for all vCPU-s >>> in the unit, yet every vCPU in there may want to make use of the >>> field in parallel. >> >> Hmm, yes, we'll need a usage bitmask. >> >> Please note that affecting all vcpus in the unit is per design. With >> multiple vcpus of a unit needing this feature in parallel there is no >> way they can have different needs regarding temporary affinity. > > But how will this work? I.e. how will all vCPU-s in a unit get > their temporary affinity pointing to the one specific pCPU in question? The _unit_ is pinned, so all the vcpus in that unit are pinned, too. > It's not just the "all at the same time" that I don't see working here, > I'm also having trouble seeing how the potential cross-core or cross- > node movement that's apparently needed here would end up working. 
The unit is moved to another core via normal scheduling mechanisms. As switching context is synchronized (see patch 35) all vcpus of a unit are moved together. > I'm not going to exclude that the set of possible pCPU-s a vCPU needs to > move to here is still within the unit, but then I'd expect assertions > to that effect to be added. Okay, will do. > > >>>> --- a/xen/common/domain.c >>>> +++ b/xen/common/domain.c >>>> @@ -125,11 +125,6 @@ static void vcpu_info_reset(struct vcpu *v) >>>> static void vcpu_destroy(struct vcpu *v) >>>> { >>>> - free_cpumask_var(v->cpu_hard_affinity); >>>> - free_cpumask_var(v->cpu_hard_affinity_tmp); >>>> - free_cpumask_var(v->cpu_hard_affinity_saved); >>>> - free_cpumask_var(v->cpu_soft_affinity); >>>> - >>>> free_vcpu_struct(v); >>>> } >>>> @@ -153,12 +148,6 @@ struct vcpu *vcpu_create( >>>> grant_table_init_vcpu(v); >>>> - if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) || >>>> - !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) || >>>> - !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) || >>>> - !zalloc_cpumask_var(&v->cpu_soft_affinity) ) >>>> - goto fail; >>> >>> Seeing these, I'm actually having trouble understanding how you mean >>> to retain the user visible interface behavior here: If you only store an >>> affinity per sched unit, then how are you meaning to honor the vCPU >>> granular requests coming in? >> >> With core scheduling it is only possible to set (virtual) core >> affinities. Whenever an affinity of a vcpu is being set it will set the >> affinity of the whole unit. > > Hmm, that's indeed what I was deducing, but how will we sell this > to people actually fiddling with vCPU affinities? I foresee getting > bug reports that the respective xl command(s) do(es)n't do anymore > what it used to do. The new behavior must be documented, sure. 
> >>>> --- a/xen/include/xen/sched-if.h >>>> +++ b/xen/include/xen/sched-if.h >>>> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d) >>>> * * The hard affinity is not a subset of soft affinity >>>> * * There is an overlap between the soft and hard affinity masks >>>> */ >>>> -static inline int has_soft_affinity(const struct vcpu *v) >>>> +static inline int has_soft_affinity(const struct sched_unit *unit) >>>> { >>>> - return v->soft_aff_effective && >>>> - !cpumask_subset(cpupool_domain_cpumask(v->domain), >>>> - v->cpu_soft_affinity); >>>> + return unit->soft_aff_effective && >>>> + !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain), >>>> + unit->cpu_soft_affinity); >>>> } >>> >>> Okay, at the moment there looks to be a 1:1 relationship between sched >>> units and vCPU-s. This would - at this point of the series - invalidate most >>> my earlier comments. However, in patch 57 I don't see how this unit->vcpu >>> mapping would get broken, and I can't seem to identify any other patch >>> where this might be happening. Looking at the github branch I also get the >>> impression that the struct vcpu * pointer out of struct sched_unit survives >>> until the end of the series, which doesn't seem right to me. >> >> It is right. The vcpu pointer in the sched_unit is pointing to the first >> vcpu of the unit at the end of the series. Further vcpus are found via >> v->next_in_list. > > I'm afraid this sets us up for misunderstanding and misuse. I don't > think there should be a straight struct vcpu * out of struct sched_unit. That was the most effective way to do it. What are you suggesting? 
> >>>> @@ -980,7 +978,7 @@ static inline bool is_hvm_vcpu(const struct vcpu *v) >>>> static inline bool is_hwdom_pinned_vcpu(const struct vcpu *v) >>>> { >>>> return (is_hardware_domain(v->domain) && >>>> - cpumask_weight(v->cpu_hard_affinity) == 1); >>>> + cpumask_weight(v->sched_unit->cpu_hard_affinity) == 1); >>>> } >>> >>> Seeing this - how is pinning (by command line option or by Dom0 >>> doing this on itself transiently) going to work with core scheduling? >> >> In the end only the bit of the first vcpu of a unit will be set in the >> affinity masks, affecting all vcpus of the unit. > > I'm confused - what "bit of the first vcpu of a unit" are you referring > to? Sorry, I meant to rewrite this sentence and forgot before sending. But I think below explanation is making it rather clear. > To give an example of what I meant with my earlier reply: How is Dom0 > requesting its vCPU 5 to be pinned to pCPU 3 going to be satisfied, > independent of the sched unit that vCPU 5 is associated with? Is the > whole sched unit getting moved over then? If so, what if another vCPU > in the same sched unit at the same time requests to be pinned to pCPU > 17, on a different node/socket? Yes, the whole sched unit will be moved. And in case a unit is already pinned and a conflicting request is made the new request will either override the old pinning (in case that was a "normal" pinning via xl command) or it will be rejected (in case the old pinning was done via SCHEDOP_pin_override). Juergen
* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit 2019-07-02 6:30 ` Juergen Gross @ 2019-07-02 7:54 ` Jan Beulich 2019-07-02 8:14 ` Juergen Gross 2019-07-02 8:21 ` Dario Faggioli 0 siblings, 2 replies; 21+ messages in thread From: Jan Beulich @ 2019-07-02 7:54 UTC (permalink / raw) To: Juergen Gross Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap, Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall, Meng Xu, xen-devel, Ian Jackson, Roger Pau Monne On 02.07.2019 08:30, Juergen Gross wrote: > On 01.07.19 17:46, Jan Beulich wrote: >> On 01.07.2019 17:10, Juergen Gross wrote: >>> On 01.07.19 16:08, Jan Beulich wrote: >>>>>>> On 28.05.19 at 12:32, <jgross@suse.com> wrote: >>>>> @@ -155,8 +156,8 @@ static void nmi_mce_softirq(void) >>>>> * Set the tmp value unconditionally, so that the check in the iret >>>>> * hypercall works. >>>>> */ >>>>> - cpumask_copy(st->vcpu->cpu_hard_affinity_tmp, >>>>> - st->vcpu->cpu_hard_affinity); >>>>> + cpumask_copy(st->vcpu->sched_unit->cpu_hard_affinity_tmp, >>>>> + st->vcpu->sched_unit->cpu_hard_affinity); >>>> >>>> Aiui this affects all vCPU-s in the unit, which is unlikely to be what we >>>> want here: There's now only one cpu_hard_affinity_tmp for all vCPU-s >>>> in the unit, yet every vCPU in there may want to make use of the >>>> field in parallel. >>> >>> Hmm, yes, we'll need a usage bitmask. >>> >>> Please note that affecting all vcpus in the unit is per design. With >>> multiple vcpus of a unit needing this feature in parallel there is no >>> way they can have different needs regarding temporary affinity. >> >> But how will this work? I.e. how will all vCPU-s in a unit get >> their temporary affinity pointing to the one specific pCPU in question? > > The _unit_ is pinned, so all the vcpus in that unit are pinned, too. Yes, but ... 
>> It's not just the "all at the same time" that I don't see working here, >> I'm also having trouble seeing how the potential cross-core or cross- >> node movement that's apparently needed here would end up working. I'm > > The unit is moved to another core via normal scheduling mechanisms. As > switching context is synchronized (see patch 35) all vcpus of a unit are > moved together. ... they may get pinned to different pCPU-s or all the same pCPU here. Both cases need to work, and I'm currently not seeing how that would be achieved. >>>>> --- a/xen/common/domain.c >>>>> +++ b/xen/common/domain.c >>>>> @@ -125,11 +125,6 @@ static void vcpu_info_reset(struct vcpu *v) >>>>> static void vcpu_destroy(struct vcpu *v) >>>>> { >>>>> - free_cpumask_var(v->cpu_hard_affinity); >>>>> - free_cpumask_var(v->cpu_hard_affinity_tmp); >>>>> - free_cpumask_var(v->cpu_hard_affinity_saved); >>>>> - free_cpumask_var(v->cpu_soft_affinity); >>>>> - >>>>> free_vcpu_struct(v); >>>>> } >>>>> @@ -153,12 +148,6 @@ struct vcpu *vcpu_create( >>>>> grant_table_init_vcpu(v); >>>>> - if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) || >>>>> - !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) || >>>>> - !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) || >>>>> - !zalloc_cpumask_var(&v->cpu_soft_affinity) ) >>>>> - goto fail; >>>> >>>> Seeing these, I'm actually having trouble understanding how you mean >>>> to retain the user visible interface behavior here: If you only store an >>>> affinity per sched unit, then how are you meaning to honor the vCPU >>>> granular requests coming in? >>> >>> With core scheduling it is only possible to set (virtual) core >>> affinities. Whenever an affinity of a vcpu is being set it will set the >>> affinity of the whole unit. >> >> Hmm, that's indeed what I was deducing, but how will we sell this >> to people actually fiddling with vCPU affinities? I foresee getting >> bug reports that the respective xl command(s) do(es)n't do anymore >> what it used to do. 
> > The new behavior must be documented, sure. Documentation is just one aspect. Often enough people only read docs when wanting to introduce new functionality, which I consider a fair model. Such people will be caught by surprise that the pinning behavior does not work the same way anymore. And again - if someone pins every vCPU to a single pCPU, that last such pinning operation will be what takes long term effect. Aiui all vCPU-s in the unit will then be pinned to that one pCPU, i.e. they'll either all compete for the one pCPU's time, or only one of them will ever get scheduled. >>>>> --- a/xen/include/xen/sched-if.h >>>>> +++ b/xen/include/xen/sched-if.h >>>>> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d) >>>>> * * The hard affinity is not a subset of soft affinity >>>>> * * There is an overlap between the soft and hard affinity masks >>>>> */ >>>>> -static inline int has_soft_affinity(const struct vcpu *v) >>>>> +static inline int has_soft_affinity(const struct sched_unit *unit) >>>>> { >>>>> - return v->soft_aff_effective && >>>>> - !cpumask_subset(cpupool_domain_cpumask(v->domain), >>>>> - v->cpu_soft_affinity); >>>>> + return unit->soft_aff_effective && >>>>> + !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain), >>>>> + unit->cpu_soft_affinity); >>>>> } >>>> >>>> Okay, at the moment there looks to be a 1:1 relationship between sched >>>> units and vCPU-s. This would - at this point of the series - invalidate most >>>> my earlier comments. However, in patch 57 I don't see how this unit->vcpu >>>> mapping would get broken, and I can't seem to identify any other patch >>>> where this might be happening. Looking at the github branch I also get the >>>> impression that the struct vcpu * pointer out of struct sched_unit survives >>>> until the end of the series, which doesn't seem right to me. >>> >>> It is right. 
The vcpu pointer in the sched_unit is pointing to the first >>> vcpu of the unit at the end of the series. Further vcpus are found via >>> v->next_in_list. >> >> I'm afraid this sets us up for misunderstanding and misuse. I don't >> think there should be a straight struct vcpu * out of struct sched_unit. > > That was the most effective way to do it. What are you suggesting? An actual list, i.e. with a struct list_head. That'll make obvious that more than one vCPU might be associated with a unit. That's even more so that the ability to associate more than one appears only quite late in the series, i.e. there may be further instances like the code above, and it would require a careful audit (rather than the compiler finding such instance) to determine all places where using the first vCPU in a unit isn't really what was meant. Once the list approach was enforced, in a second step we might then discuss whether the list management is too much overhead, and hence whether perhaps to switch to the simpler model you're using right now. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit 2019-07-02 7:54 ` Jan Beulich @ 2019-07-02 8:14 ` Juergen Gross 2019-07-02 8:27 ` Jan Beulich 2019-07-02 8:21 ` Dario Faggioli 1 sibling, 1 reply; 21+ messages in thread From: Juergen Gross @ 2019-07-02 8:14 UTC (permalink / raw) To: Jan Beulich Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap, Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall, Meng Xu, xen-devel, Ian Jackson, Roger Pau Monne On 02.07.19 09:54, Jan Beulich wrote: > On 02.07.2019 08:30, Juergen Gross wrote: >> On 01.07.19 17:46, Jan Beulich wrote: >>> On 01.07.2019 17:10, Juergen Gross wrote: >>>> On 01.07.19 16:08, Jan Beulich wrote: >>>>>>>> On 28.05.19 at 12:32, <jgross@suse.com> wrote: >>>>>> @@ -155,8 +156,8 @@ static void nmi_mce_softirq(void) >>>>>> * Set the tmp value unconditionally, so that the check in the iret >>>>>> * hypercall works. >>>>>> */ >>>>>> - cpumask_copy(st->vcpu->cpu_hard_affinity_tmp, >>>>>> - st->vcpu->cpu_hard_affinity); >>>>>> + cpumask_copy(st->vcpu->sched_unit->cpu_hard_affinity_tmp, >>>>>> + st->vcpu->sched_unit->cpu_hard_affinity); >>>>> >>>>> Aiui this affects all vCPU-s in the unit, which is unlikely to be what we >>>>> want here: There's now only one cpu_hard_affinity_tmp for all vCPU-s >>>>> in the unit, yet every vCPU in there may want to make use of the >>>>> field in parallel. >>>> >>>> Hmm, yes, we'll need a usage bitmask. >>>> >>>> Please note that affecting all vcpus in the unit is per design. With >>>> multiple vcpus of a unit needing this feature in parallel there is no >>>> way they can have different needs regarding temporary affinity. >>> >>> But how will this work? I.e. how will all vCPU-s in a unit get >>> their temporary affinity pointing to the one specific pCPU in question? >> >> The _unit_ is pinned, so all the vcpus in that unit are pinned, too. > > Yes, but ... 
> >>> It's not just the "all at the same time" that I don't see working here, >>> I'm also having trouble seeing how the potential cross-core or cross- >>> node movement that's apparently needed here would end up working. I'm >> >> The unit is moved to another core via normal scheduling mechanisms. As >> switching context is synchronized (see patch 35) all vcpus of a unit are >> moved together. > > ... they may get pinned to different pCPU-s or all the same pCPU here. > Both cases need to work, and I'm currently not seeing how that would > be achieved. > >>>>>> --- a/xen/common/domain.c >>>>>> +++ b/xen/common/domain.c >>>>>> @@ -125,11 +125,6 @@ static void vcpu_info_reset(struct vcpu *v) >>>>>> static void vcpu_destroy(struct vcpu *v) >>>>>> { >>>>>> - free_cpumask_var(v->cpu_hard_affinity); >>>>>> - free_cpumask_var(v->cpu_hard_affinity_tmp); >>>>>> - free_cpumask_var(v->cpu_hard_affinity_saved); >>>>>> - free_cpumask_var(v->cpu_soft_affinity); >>>>>> - >>>>>> free_vcpu_struct(v); >>>>>> } >>>>>> @@ -153,12 +148,6 @@ struct vcpu *vcpu_create( >>>>>> grant_table_init_vcpu(v); >>>>>> - if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) || >>>>>> - !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) || >>>>>> - !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) || >>>>>> - !zalloc_cpumask_var(&v->cpu_soft_affinity) ) >>>>>> - goto fail; >>>>> >>>>> Seeing these, I'm actually having trouble understanding how you mean >>>>> to retain the user visible interface behavior here: If you only store an >>>>> affinity per sched unit, then how are you meaning to honor the vCPU >>>>> granular requests coming in? >>>> >>>> With core scheduling it is only possible to set (virtual) core >>>> affinities. Whenever an affinity of a vcpu is being set it will set the >>>> affinity of the whole unit. >>> >>> Hmm, that's indeed what I was deducing, but how will we sell this >>> to people actually fiddling with vCPU affinities? 
I foresee getting >>> bug reports that the respective xl command(s) do(es)n't do anymore >>> what it used to do. >> >> The new behavior must be documented, sure. > > Documentation is just one aspect. Often enough people only read docs > when wanting to introduce new functionality, which I consider a fair > model. Such people will be caught by surprise that the pinning > behavior does not work the same way anymore. > > And again - if someone pins every vCPU to a single pCPU, that last > such pinning operation will be what takes long term effect. Aiui all > vCPU-s in the unit will then be pinned to that one pCPU, i.e. > they'll either all compete for the one pCPU's time, or only one of > them will ever get scheduled. No, that's not how it works. Let's say we have a system with the following topology and core scheduling active: cpu0: core 0, thread 0 cpu1: core 0, thread 1 cpu2: core 1, thread 0 cpu3: core 1, thread 1 Then any even numbered vcpu will only ever be scheduled on cpu0 or cpu2, while any odd numbered vcpu will only run on cpu1 or cpu3. So virtual cores get scheduled on physical cores. Virtual thread 0 will only run on physical thread 0 and the associated virtual thread 1 will run on the associated physical thread 1 of the same physical core. Pinning a virtual thread 1 to a physical thread 0 is not possible (in reality only the virtual core is being pinned to the physical core). 
Juergen > >>>>>> --- a/xen/include/xen/sched-if.h >>>>>> +++ b/xen/include/xen/sched-if.h >>>>>> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d) >>>>>> * * The hard affinity is not a subset of soft affinity >>>>>> * * There is an overlap between the soft and hard affinity masks >>>>>> */ >>>>>> -static inline int has_soft_affinity(const struct vcpu *v) >>>>>> +static inline int has_soft_affinity(const struct sched_unit *unit) >>>>>> { >>>>>> - return v->soft_aff_effective && >>>>>> - !cpumask_subset(cpupool_domain_cpumask(v->domain), >>>>>> - v->cpu_soft_affinity); >>>>>> + return unit->soft_aff_effective && >>>>>> + !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain), >>>>>> + unit->cpu_soft_affinity); >>>>>> } >>>>> >>>>> Okay, at the moment there looks to be a 1:1 relationship between sched >>>>> units and vCPU-s. This would - at this point of the series - invalidate most >>>>> my earlier comments. However, in patch 57 I don't see how this unit->vcpu >>>>> mapping would get broken, and I can't seem to identify any other patch >>>>> where this might be happening. Looking at the github branch I also get the >>>>> impression that the struct vcpu * pointer out of struct sched_unit survives >>>>> until the end of the series, which doesn't seem right to me. >>>> >>>> It is right. The vcpu pointer in the sched_unit is pointing to the first >>>> vcpu of the unit at the end of the series. Further vcpus are found via >>>> v->next_in_list. >>> >>> I'm afraid this sets us up for misunderstanding and misuse. I don't >>> think there should be a straight struct vcpu * out of struct sched_unit. >> >> That was the most effective way to do it. What are you suggesting? > > An actual list, i.e. with a struct list_head. That'll make obvious that > more than one vCPU might be associated with a unit. That's even more so > that the ability to associate more than one appears only quite late in > the series, i.e. 
there may be further instances like the code above, and > it would require a careful audit (rather than the compiler finding such > instance) to determine all places where using the first vCPU in a unit > isn't really what was meant. TBH I don't see how this would help at all. Instead of using the vcpu pointer I'd have to use the list head instead. Why is that different to a plain pointer regarding finding the places where using the first vcpu was wrong? Juergen 
* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit 2019-07-02 8:14 ` Juergen Gross @ 2019-07-02 8:27 ` Jan Beulich 2019-07-02 8:44 ` Juergen Gross 0 siblings, 1 reply; 21+ messages in thread From: Jan Beulich @ 2019-07-02 8:27 UTC (permalink / raw) To: Juergen Gross Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap, Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall, Meng Xu, xen-devel, Ian Jackson, Roger Pau Monne On 02.07.2019 10:14, Juergen Gross wrote: > On 02.07.19 09:54, Jan Beulich wrote: >> And again - if someone pins every vCPU to a single pCPU, that last >> such pinning operation will be what takes long term effect. Aiui all >> vCPU-s in the unit will then be pinned to that one pCPU, i.e. >> they'll either all compete for the one pCPU's time, or only one of >> them will ever get scheduled. > > No, that's not how it works. Lets say we have a system with the > following topology and core scheduling active: > > cpu0: core 0, thread 0 > cpu1: core 0, thread 1 > cpu2: core 1, thread 0 > cpu3: core 1, thread 1 > > Then any even numbered vcpu will only ever be scheduled on cpu0 or cpu2, > while any odd numbered vcpu will only run on cpu1 or cpu3. > > So virtual cores get scheduled on physical cores. Virtual thread 0 will > only run on physical thread 0 and the associated virtual thread 1 will > run on the associated physical thread 1 of the same physical core. > > Pinning a virtual thread 1 to a physical thread 0 is not possible (in > reality only the virtual core is being pinned to the physical core). But that's what existing guests may be doing. You may want to take a look at our old, non-pvops kernels, in particular the functionality provided by their drivers/xen/core/domctl.c. Yes, {sys,dom}ctl-s aren't supposed to be used by the kernel, but to achieve the intended effects I saw no way around (ab)using them. 
(I mean this to be taken as an example only - I realize that the code there wouldn't work on modern Xen without updating, due to the sysctl/domctl interface version that needs setting.) >>>>>>> --- a/xen/include/xen/sched-if.h >>>>>>> +++ b/xen/include/xen/sched-if.h >>>>>>> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d) >>>>>>> * * The hard affinity is not a subset of soft affinity >>>>>>> * * There is an overlap between the soft and hard affinity masks >>>>>>> */ >>>>>>> -static inline int has_soft_affinity(const struct vcpu *v) >>>>>>> +static inline int has_soft_affinity(const struct sched_unit *unit) >>>>>>> { >>>>>>> - return v->soft_aff_effective && >>>>>>> - !cpumask_subset(cpupool_domain_cpumask(v->domain), >>>>>>> - v->cpu_soft_affinity); >>>>>>> + return unit->soft_aff_effective && >>>>>>> + !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain), >>>>>>> + unit->cpu_soft_affinity); >>>>>>> } >>>>>> >>>>>> Okay, at the moment there looks to be a 1:1 relationship between sched >>>>>> units and vCPU-s. This would - at this point of the series - invalidate most >>>>>> my earlier comments. However, in patch 57 I don't see how this unit->vcpu >>>>>> mapping would get broken, and I can't seem to identify any other patch >>>>>> where this might be happening. Looking at the github branch I also get the >>>>>> impression that the struct vcpu * pointer out of struct sched_unit survives >>>>>> until the end of the series, which doesn't seem right to me. >>>>> >>>>> It is right. The vcpu pointer in the sched_unit is pointing to the first >>>>> vcpu of the unit at the end of the series. Further vcpus are found via >>>>> v->next_in_list. >>>> >>>> I'm afraid this sets us up for misunderstanding and misuse. I don't >>>> think there should be a straight struct vcpu * out of struct sched_unit. >>> >>> That was the most effective way to do it. What are you suggesting? >> >> An actual list, i.e. with a struct list_head. 
That'll make obvious that >> more than one vCPU might be associated with a unit. That's even more so >> that the ability to associate more than one appears only quite late in >> the series, i.e. there may be further instances like the code above, and >> it would require a careful audit (rather than the compiler finding such >> instance) to determine all places where using the first vCPU in a unit >> isn't really what was meant. > > TBH I don't see how this would help at all. > > Instead of using the vcpu pointer I'd had to use the list head instead. > Why is that different to a plain pointer regarding finding the places > where using the first vcpu was wrong? Take the example above: Is it correct to act on just the first vCPU? I guess _here_ it is, but the same pattern could be found elsewhere. If, from the beginning, you use a clearly identifiable list construct, then it'll be obvious to you as the writer and to reviewers that by the end of the series there may be multiple entities that need dealing with - we'd see list_first*() or for_each*() constructs right away (and you wouldn't be able to circumvent their use in a way that wouldn't trigger "don't open-code" comments). Jan 
* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit 2019-07-02 8:27 ` Jan Beulich @ 2019-07-02 8:44 ` Juergen Gross 2019-07-02 9:05 ` Jan Beulich 0 siblings, 1 reply; 21+ messages in thread From: Juergen Gross @ 2019-07-02 8:44 UTC (permalink / raw) To: Jan Beulich Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap, Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall, Meng Xu, xen-devel, Ian Jackson, Roger Pau Monne On 02.07.19 10:27, Jan Beulich wrote: > On 02.07.2019 10:14, Juergen Gross wrote: >> On 02.07.19 09:54, Jan Beulich wrote: >>> And again - if someone pins every vCPU to a single pCPU, that last >>> such pinning operation will be what takes long term effect. Aiui all >>> vCPU-s in the unit will then be pinned to that one pCPU, i.e. >>> they'll either all compete for the one pCPU's time, or only one of >>> them will ever get scheduled. >> >> No, that's not how it works. Lets say we have a system with the >> following topology and core scheduling active: >> >> cpu0: core 0, thread 0 >> cpu1: core 0, thread 1 >> cpu2: core 1, thread 0 >> cpu3: core 1, thread 1 >> >> Then any even numbered vcpu will only ever be scheduled on cpu0 or cpu2, >> while any odd numbered vcpu will only run on cpu1 or cpu3. >> >> So virtual cores get scheduled on physical cores. Virtual thread 0 will >> only run on physical thread 0 and the associated virtual thread 1 will >> run on the associated physical thread 1 of the same physical core. >> >> Pinning a virtual thread 1 to a physical thread 0 is not possible (in >> reality only the virtual core is being pinned to the physical core). > > But that's what existing guests may be doing. You may want to > take a look at our old, non-pvops kernels, in particular the > functionality provided by their drivers/xen/core/domctl.c. Yes, > {sys,dom}ctl-s aren't supposed to be used by the kernel, but to > achieve the intended effects I saw no way around (ab)using them. 
> (I mean this to be taken as an example only - I realize that the > code there wouldn't work on modern Xen without updating, due to > the sysctl/domctl interface version that needs setting.) First - speaking of "guests" is a little bit misleading here. This is restricted to dom0. So when you want to use such a dom0 kernel with Xen 4.13 or later you'd need to stay with cpu scheduling. Any recent kernel will run just fine as dom0 with core scheduling active. > >>>>>>>> --- a/xen/include/xen/sched-if.h >>>>>>>> +++ b/xen/include/xen/sched-if.h >>>>>>>> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d) >>>>>>>> * * The hard affinity is not a subset of soft affinity >>>>>>>> * * There is an overlap between the soft and hard affinity masks >>>>>>>> */ >>>>>>>> -static inline int has_soft_affinity(const struct vcpu *v) >>>>>>>> +static inline int has_soft_affinity(const struct sched_unit *unit) >>>>>>>> { >>>>>>>> - return v->soft_aff_effective && >>>>>>>> - !cpumask_subset(cpupool_domain_cpumask(v->domain), >>>>>>>> - v->cpu_soft_affinity); >>>>>>>> + return unit->soft_aff_effective && >>>>>>>> + !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain), >>>>>>>> + unit->cpu_soft_affinity); >>>>>>>> } >>>>>>> >>>>>>> Okay, at the moment there looks to be a 1:1 relationship between sched >>>>>>> units and vCPU-s. This would - at this point of the series - invalidate most >>>>>>> my earlier comments. However, in patch 57 I don't see how this unit->vcpu >>>>>>> mapping would get broken, and I can't seem to identify any other patch >>>>>>> where this might be happening. Looking at the github branch I also get the >>>>>>> impression that the struct vcpu * pointer out of struct sched_unit survives >>>>>>> until the end of the series, which doesn't seem right to me. >>>>>> >>>>>> It is right. The vcpu pointer in the sched_unit is pointing to the first >>>>>> vcpu of the unit at the end of the series. 
Further vcpus are found via >>>>>> v->next_in_list. >>>>> >>>>> I'm afraid this sets us up for misunderstanding and misuse. I don't >>>>> think there should be a straight struct vcpu * out of struct sched_unit. >>>> >>>> That was the most effective way to do it. What are you suggesting? >>> >>> An actual list, i.e. with a struct list_head. That'll make obvious that >>> more than one vCPU might be associated with a unit. That's even more so >>> that the ability to associate more than one appears only quite late in >>> the series, i.e. there may be further instances like the code above, and >>> it would require a careful audit (rather than the compiler finding such >>> instance) to determine all places where using the first vCPU in a unit >>> isn't really what was meant. >> >> TBH I don't see how this would help at all. >> >> Instead of using the vcpu pointer I'd had to use the list head instead. >> Why is that different to a plain pointer regarding finding the places >> where using the first vcpu was wrong? > > Take the example above: Is it correct to act on just the first vCPU? > I guess _here_ it is, but the same pattern could be found elsewhere. > If, from the beginning, you use a clearly identifiable list construct, > then it'll be obvious to you as the writer and to reviewers that by > the end of the series there may be multiple entities that need dealing > with - we'd see list_first*() or for_each*() constructs right away > (and you wouldn't be able to circumvent their use in a way that > wouldn't trigger "don't open-code" comments). Would you be fine with just renaming the pointer to "vcpu_list"? This would avoid the need to introduce another vcpu list in struct vcpu and I could re-use the already existing list as today. It should be noted that I named the pointer just "vcpu" in order to avoid lots of reformatting due to longer lines, though. 
Juergen 
* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit 2019-07-02 8:44 ` Juergen Gross @ 2019-07-02 9:05 ` Jan Beulich 2019-07-02 9:16 ` Juergen Gross 0 siblings, 1 reply; 21+ messages in thread From: Jan Beulich @ 2019-07-02 9:05 UTC (permalink / raw) To: Juergen Gross Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap, Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall, Meng Xu, xen-devel, Ian Jackson, Roger Pau Monne On 02.07.2019 10:44, Juergen Gross wrote: > On 02.07.19 10:27, Jan Beulich wrote: >> On 02.07.2019 10:14, Juergen Gross wrote: >>> On 02.07.19 09:54, Jan Beulich wrote: >>>> And again - if someone pins every vCPU to a single pCPU, that last >>>> such pinning operation will be what takes long term effect. Aiui all >>>> vCPU-s in the unit will then be pinned to that one pCPU, i.e. >>>> they'll either all compete for the one pCPU's time, or only one of >>>> them will ever get scheduled. >>> >>> No, that's not how it works. Lets say we have a system with the >>> following topology and core scheduling active: >>> >>> cpu0: core 0, thread 0 >>> cpu1: core 0, thread 1 >>> cpu2: core 1, thread 0 >>> cpu3: core 1, thread 1 >>> >>> Then any even numbered vcpu will only ever be scheduled on cpu0 or cpu2, >>> while any odd numbered vcpu will only run on cpu1 or cpu3. >>> >>> So virtual cores get scheduled on physical cores. Virtual thread 0 will >>> only run on physical thread 0 and the associated virtual thread 1 will >>> run on the associated physical thread 1 of the same physical core. >>> >>> Pinning a virtual thread 1 to a physical thread 0 is not possible (in >>> reality only the virtual core is being pinned to the physical core). >> >> But that's what existing guests may be doing. You may want to >> take a look at our old, non-pvops kernels, in particular the >> functionality provided by their drivers/xen/core/domctl.c. 
Yes, >> {sys,dom}ctl-s aren't supposed to be used by the kernel, but to >> achieve the intended effects I saw no way around (ab)using them. >> (I mean this to be taken as an example only - I realize that the >> code there wouldn't work on modern Xen without updating, due to >> the sysctl/domctl interface version that needs setting.) > > First - speaking of "guests" is a little bit misleading here. This is > restricted to dom0. > > So when you want to use such a dom0 kernel with Xen 4.13 or later you'd > need to stay with cpu scheduling. Any recent kernel will run just fine > as dom0 with core scheduling active. Right, but such recent kernels have (afaict) no solution to some of the problems that the (ab)use of the {sys,dom}ctl-s were meant to address. >>>>>>>>> --- a/xen/include/xen/sched-if.h >>>>>>>>> +++ b/xen/include/xen/sched-if.h >>>>>>>>> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d) >>>>>>>>> * * The hard affinity is not a subset of soft affinity >>>>>>>>> * * There is an overlap between the soft and hard affinity masks >>>>>>>>> */ >>>>>>>>> -static inline int has_soft_affinity(const struct vcpu *v) >>>>>>>>> +static inline int has_soft_affinity(const struct sched_unit *unit) >>>>>>>>> { >>>>>>>>> - return v->soft_aff_effective && >>>>>>>>> - !cpumask_subset(cpupool_domain_cpumask(v->domain), >>>>>>>>> - v->cpu_soft_affinity); >>>>>>>>> + return unit->soft_aff_effective && >>>>>>>>> + !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain), >>>>>>>>> + unit->cpu_soft_affinity); >>>>>>>>> } >>>>>>>> >>>>>>>> Okay, at the moment there looks to be a 1:1 relationship between sched >>>>>>>> units and vCPU-s. This would - at this point of the series - invalidate most >>>>>>>> my earlier comments. However, in patch 57 I don't see how this unit->vcpu >>>>>>>> mapping would get broken, and I can't seem to identify any other patch >>>>>>>> where this might be happening. 
Looking at the github branch I also get the >>>>>>>> impression that the struct vcpu * pointer out of struct sched_unit survives >>>>>>>> until the end of the series, which doesn't seem right to me. >>>>>>> >>>>>>> It is right. The vcpu pointer in the sched_unit is pointing to the first >>>>>>> vcpu of the unit at the end of the series. Further vcpus are found via >>>>>>> v->next_in_list. >>>>>> >>>>>> I'm afraid this sets us up for misunderstanding and misuse. I don't >>>>>> think there should be a straight struct vcpu * out of struct sched_unit. >>>>> >>>>> That was the most effective way to do it. What are you suggesting? >>>> >>>> An actual list, i.e. with a struct list_head. That'll make obvious that >>>> more than one vCPU might be associated with a unit. That's even more so >>>> that the ability to associate more than one appears only quite late in >>>> the series, i.e. there may be further instances like the code above, and >>>> it would require a careful audit (rather than the compiler finding such >>>> instance) to determine all places where using the first vCPU in a unit >>>> isn't really what was meant. >>> >>> TBH I don't see how this would help at all. >>> >>> Instead of using the vcpu pointer I'd had to use the list head instead. >>> Why is that different to a plain pointer regarding finding the places >>> where using the first vcpu was wrong? >> >> Take the example above: Is it correct to act on just the first vCPU? >> I guess _here_ it is, but the same pattern could be found elsewhere. >> If, from the beginning, you use a clearly identifiable list construct, >> then it'll be obvious to you as the writer and to reviewers that by >> the end of the series there may be multiple entities that need dealing >> with - we'd see list_first*() or for_each*() constructs right away >> (and you wouldn't be able to circumvent their use in a way that >> wouldn't trigger "don't open-code" comments). 
> > Would you be fine with just renaming the pointer to "vcpu_list"? > This would avoid the need to introduce another vcpu list in struct vcpu > and I could re-use the already existing list as today. Well, yes, this would at least make obvious at use sites that there may be more than one of them. > It should be noted that I named the pointer just "vcpu" in order to > avoid lots of reformatting due to longer lines, though. I can appreciate this aspect, but as said I'm afraid it would be misleading (and potentially inviting for mis-use). Jan 
* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit 2019-07-02 9:05 ` Jan Beulich @ 2019-07-02 9:16 ` Juergen Gross 0 siblings, 0 replies; 21+ messages in thread From: Juergen Gross @ 2019-07-02 9:16 UTC (permalink / raw) To: Jan Beulich Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap, Andrew Cooper, Tim Deegan, Dario Faggioli, Julien Grall, Meng Xu, xen-devel, Ian Jackson, Roger Pau Monne On 02.07.19 11:05, Jan Beulich wrote: > On 02.07.2019 10:44, Juergen Gross wrote: >> On 02.07.19 10:27, Jan Beulich wrote: >>> On 02.07.2019 10:14, Juergen Gross wrote: >>>> On 02.07.19 09:54, Jan Beulich wrote: >>>>> And again - if someone pins every vCPU to a single pCPU, that last >>>>> such pinning operation will be what takes long term effect. Aiui all >>>>> vCPU-s in the unit will then be pinned to that one pCPU, i.e. >>>>> they'll either all compete for the one pCPU's time, or only one of >>>>> them will ever get scheduled. >>>> >>>> No, that's not how it works. Lets say we have a system with the >>>> following topology and core scheduling active: >>>> >>>> cpu0: core 0, thread 0 >>>> cpu1: core 0, thread 1 >>>> cpu2: core 1, thread 0 >>>> cpu3: core 1, thread 1 >>>> >>>> Then any even numbered vcpu will only ever be scheduled on cpu0 or cpu2, >>>> while any odd numbered vcpu will only run on cpu1 or cpu3. >>>> >>>> So virtual cores get scheduled on physical cores. Virtual thread 0 will >>>> only run on physical thread 0 and the associated virtual thread 1 will >>>> run on the associated physical thread 1 of the same physical core. >>>> >>>> Pinning a virtual thread 1 to a physical thread 0 is not possible (in >>>> reality only the virtual core is being pinned to the physical core). >>> >>> But that's what existing guests may be doing. You may want to >>> take a look at our old, non-pvops kernels, in particular the >>> functionality provided by their drivers/xen/core/domctl.c. 
Yes, >>> {sys,dom}ctl-s aren't supposed to be used by the kernel, but to >>> achieve the intended effects I saw no way around (ab)using them. >>> (I mean this to be taken as an example only - I realize that the >>> code there wouldn't work on modern Xen without updating, due to >>> the sysctl/domctl interface version that needs setting.) >> >> First - speaking of "guests" is a little bit misleading here. This is >> restricted to dom0. >> >> So when you want to use such a dom0 kernel with Xen 4.13 or later you'd >> need to stay with cpu scheduling. Any recent kernel will run just fine >> as dom0 with core scheduling active. > > Right, but such recent kernels have (afaict) no solution to some of > the problems that the (ab)use of the {sys,dom}ctl-s were meant to > address. With SCHEDOP_pin_override this should all be doable. The kernel would need to execute the SCHEDOP_pin_override code on a suitable vcpu (ideally with vcpu-id == pcpu-id). I'd prefer new hypervisor interfaces for that purpose, though. > >>>>>>>>>> --- a/xen/include/xen/sched-if.h >>>>>>>>>> +++ b/xen/include/xen/sched-if.h >>>>>>>>>> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d) >>>>>>>>>> * * The hard affinity is not a subset of soft affinity >>>>>>>>>> * * There is an overlap between the soft and hard affinity masks >>>>>>>>>> */ >>>>>>>>>> -static inline int has_soft_affinity(const struct vcpu *v) >>>>>>>>>> +static inline int has_soft_affinity(const struct sched_unit *unit) >>>>>>>>>> { >>>>>>>>>> - return v->soft_aff_effective && >>>>>>>>>> - !cpumask_subset(cpupool_domain_cpumask(v->domain), >>>>>>>>>> - v->cpu_soft_affinity); >>>>>>>>>> + return unit->soft_aff_effective && >>>>>>>>>> + !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain), >>>>>>>>>> + unit->cpu_soft_affinity); >>>>>>>>>> } >>>>>>>>> >>>>>>>>> Okay, at the moment there looks to be a 1:1 relationship between sched >>>>>>>>> units and vCPU-s. 
This would - at this point of the series - invalidate most >>>>>>>>> my earlier comments. However, in patch 57 I don't see how this unit->vcpu >>>>>>>>> mapping would get broken, and I can't seem to identify any other patch >>>>>>>>> where this might be happening. Looking at the github branch I also get the >>>>>>>>> impression that the struct vcpu * pointer out of struct sched_unit survives >>>>>>>>> until the end of the series, which doesn't seem right to me. >>>>>>>> >>>>>>>> It is right. The vcpu pointer in the sched_unit is pointing to the first >>>>>>>> vcpu of the unit at the end of the series. Further vcpus are found via >>>>>>>> v->next_in_list. >>>>>>> >>>>>>> I'm afraid this sets us up for misunderstanding and misuse. I don't >>>>>>> think there should be a straight struct vcpu * out of struct sched_unit. >>>>>> >>>>>> That was the most effective way to do it. What are you suggesting? >>>>> >>>>> An actual list, i.e. with a struct list_head. That'll make obvious that >>>>> more than one vCPU might be associated with a unit. That's even more so >>>>> that the ability to associate more than one appears only quite late in >>>>> the series, i.e. there may be further instances like the code above, and >>>>> it would require a careful audit (rather than the compiler finding such >>>>> instance) to determine all places where using the first vCPU in a unit >>>>> isn't really what was meant. >>>> >>>> TBH I don't see how this would help at all. >>>> >>>> Instead of using the vcpu pointer I'd had to use the list head instead. >>>> Why is that different to a plain pointer regarding finding the places >>>> where using the first vcpu was wrong? >>> >>> Take the example above: Is it correct to act on just the first vCPU? >>> I guess _here_ it is, but the same pattern could be found elsewhere. 
>>> If, from the beginning, you use a clearly identifiable list construct, >>> then it'll be obvious to you as the writer and to reviewers that by >>> the end of the series there may be multiple entities that need dealing >>> with - we'd see list_first*() or for_each*() constructs right away >>> (and you wouldn't be able to circumvent their use in a way that >>> wouldn't trigger "don't open-code" comments). >> >> Would you be fine with just renaming the pointer to "vcpu_list"? >> This would avoid the need to introduce another vcpu list in struct vcpu >> and I could re-use the already existing list as today. > > Well, yes, this would at least make obvious at use sites that there > may be more than one of them. Thanks. > >> It should be noted that I named the pointer just "vcpu" in order to >> avoid lots of reformatting due to longer lines, though. > > I can appreciate this aspect, but as said I'm afraid it would be > misleading (and potentially inviting for mis-use). Okay, lets live with some more multi-line statements then. Juergen _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit 2019-07-02 7:54 ` Jan Beulich 2019-07-02 8:14 ` Juergen Gross @ 2019-07-02 8:21 ` Dario Faggioli 2019-07-02 8:29 ` Jan Beulich 1 sibling, 1 reply; 21+ messages in thread From: Dario Faggioli @ 2019-07-02 8:21 UTC (permalink / raw) To: Jan Beulich, Juergen Gross Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap, Andrew Cooper, Tim Deegan, Julien Grall, Meng Xu, xen-devel, Ian Jackson, Roger Pau Monne On Tue, 2019-07-02 at 07:54 +0000, Jan Beulich wrote: > On 02.07.2019 08:30, Juergen Gross wrote: > > On 01.07.19 17:46, Jan Beulich wrote: > > > > > > Hmm, that's indeed what I was deducing, but how will we sell this > > > to people actually fiddling with vCPU affinities? I foresee > > > getting > > > bug reports that the respective xl command(s) do(es)n't do > > > anymore > > > what it used to do. > > > > The new behavior must be documented, sure. > > Documentation is just one aspect. Often enough people only read docs > when wanting to introduce new functionality, which I consider a fair > model. Such people will be caught by surprise that the pinning > behavior does not work the same way anymore. > That is indeed the case, and we need to think about how to address it, I agree. > And again - if someone pins every vCPU to a single pCPU, that last > such pinning operation will be what takes long term effect. Aiui all > vCPU-s in the unit will then be pinned to that one pCPU, i.e. > they'll either all compete for the one pCPU's time, or only one of > them will ever get scheduled. > I'm not sure I'm getting this. On an, say, SMT system, with 4 threads per core, a unit is 4 vCPUs and a pCPU is 4 threads. Isn't it so? 
Regards -- Dario Faggioli, Ph.D http://about.me/dario.faggioli Virtualization Software Engineer SUSE Labs, SUSE https://www.suse.com/ ------------------------------------------------------------------- <<This happens because _I_ choose it to happen!>> (Raistlin Majere)
* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit 2019-07-02 8:21 ` Dario Faggioli @ 2019-07-02 8:29 ` Jan Beulich 2019-07-02 9:40 ` Dario Faggioli 0 siblings, 1 reply; 21+ messages in thread From: Jan Beulich @ 2019-07-02 8:29 UTC (permalink / raw) To: Dario Faggioli, Juergen Gross Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap, Andrew Cooper, Tim Deegan, Julien Grall, Meng Xu, xen-devel, Ian Jackson, Roger Pau Monne On 02.07.2019 10:21, Dario Faggioli wrote: > On Tue, 2019-07-02 at 07:54 +0000, Jan Beulich wrote: >> On 02.07.2019 08:30, Juergen Gross wrote: >>> On 01.07.19 17:46, Jan Beulich wrote: >>>> >>>> Hmm, that's indeed what I was deducing, but how will we sell this >>>> to people actually fiddling with vCPU affinities? I foresee >>>> getting >>>> bug reports that the respective xl command(s) do(es)n't do >>>> anymore >>>> what it used to do. >>> >>> The new behavior must be documented, sure. >> >> Documentation is just one aspect. Often enough people only read docs >> when wanting to introduce new functionality, which I consider a fair >> model. Such people will be caught by surprise that the pinning >> behavior does not work the same way anymore. >> > That is indeed the case, and we need to think about how to address it, > I agree. > >> And again - if someone pins every vCPU to a single pCPU, that last >> such pinning operation will be what takes long term effect. Aiui all >> vCPU-s in the unit will then be pinned to that one pCPU, i.e. >> they'll either all compete for the one pCPU's time, or only one of >> them will ever get scheduled. >> > I'm not sure I'm getting this. On an, say, SMT system, with 4 threads > per core, a unit is 4 vCPUs and a pCPU is 4 threads. No, the meaning of pCPU is a single thread of a single core. I.e. what is represented by a single cpumask_t bit. 
Jan
* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit 2019-07-02 8:29 ` Jan Beulich @ 2019-07-02 9:40 ` Dario Faggioli 2019-07-02 10:01 ` Jan Beulich 0 siblings, 1 reply; 21+ messages in thread From: Dario Faggioli @ 2019-07-02 9:40 UTC (permalink / raw) To: Jan Beulich, Juergen Gross Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap, Andrew Cooper, Tim Deegan, Julien Grall, Meng Xu, xen-devel, Ian Jackson, Roger Pau Monne On Tue, 2019-07-02 at 08:29 +0000, Jan Beulich wrote: > On 02.07.2019 10:21, Dario Faggioli wrote: > > On Tue, 2019-07-02 at 07:54 +0000, Jan Beulich wrote: > > > > > > And again - if someone pins every vCPU to a single pCPU, that > > > last > > > such pinning operation will be what takes long term effect. Aiui > > > all > > > vCPU-s in the unit will then be pinned to that one pCPU, i.e. > > > they'll either all compete for the one pCPU's time, or only one > > > of > > > them will ever get scheduled. > > > > > I'm not sure I'm getting this. On an, say, SMT system, with 4 > > threads > > per core, a unit is 4 vCPUs and a pCPU is 4 threads. > > No, the meaning of pCPU is a single thread of a single core. I.e. > what is represented by a single cpumask_t bit. > Fine, let's continue to call that a pCPU. Then, when core-scheduling is enabled, there's no <<multiple vCPUs of a unit being pinned to the same pCPU and all competing for just its CPU time>>. There's units of 4 vCPUs being pinned to 4 pCPUs (the 4 pCPUs of a core, not 4 random, nor arbitrary, ones). That is the point, AFAIUI. 
Regards -- Dario Faggioli, Ph.D http://about.me/dario.faggioli Virtualization Software Engineer SUSE Labs, SUSE https://www.suse.com/ ------------------------------------------------------------------- <<This happens because _I_ choose it to happen!>> (Raistlin Majere)
* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit 2019-07-02 9:40 ` Dario Faggioli @ 2019-07-02 10:01 ` Jan Beulich 2019-07-02 10:25 ` Juergen Gross 0 siblings, 1 reply; 21+ messages in thread From: Jan Beulich @ 2019-07-02 10:01 UTC (permalink / raw) To: Dario Faggioli, Juergen Gross Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap, Andrew Cooper, Tim Deegan, Julien Grall, Meng Xu, xen-devel, Ian Jackson, Roger Pau Monne On 02.07.2019 11:40, Dario Faggioli wrote: > On Tue, 2019-07-02 at 08:29 +0000, Jan Beulich wrote: >> On 02.07.2019 10:21, Dario Faggioli wrote: >>> On Tue, 2019-07-02 at 07:54 +0000, Jan Beulich wrote: >>>> >>>> And again - if someone pins every vCPU to a single pCPU, that >>>> last >>>> such pinning operation will be what takes long term effect. Aiui >>>> all >>>> vCPU-s in the unit will then be pinned to that one pCPU, i.e. >>>> they'll either all compete for the one pCPU's time, or only one >>>> of >>>> them will ever get scheduled. >>>> >>> I'm not sure I'm getting this. On an, say, SMT system, with 4 >>> threads >>> per core, a unit is 4 vCPUs and a pCPU is 4 threads. >> >> No, the meaning of pCPU is a single thread of a single core. I.e. >> what is represented by a single cpumask_t bit. >> > Fine, let's continue to call that a pCPU. Then, when core-scheduling is > enabled, there's no <<multiple vCPUs of a unit being pinned to the same > pCPU and all competing for jut its CPU time>>. > > There's units of 4 vCPUs being pinned to 4 pCPUs (the 4 pCPUs of a > core, not 4 random, nor arbitrary, ones). > > That is the point, AFAIUI. Well, okay, quite possible. But then for the example topology you gave, what is going to be the effect of xl vcpu-pin 0 0 0 ? In particular for Dom0 and in particular for CPU 0, there may be reasons to pin a particular vCPU to it. 
Jan
* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit 2019-07-02 10:01 ` Jan Beulich @ 2019-07-02 10:25 ` Juergen Gross 0 siblings, 0 replies; 21+ messages in thread From: Juergen Gross @ 2019-07-02 10:25 UTC (permalink / raw) To: Jan Beulich, Dario Faggioli Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, George Dunlap, Andrew Cooper, Tim Deegan, Julien Grall, Meng Xu, xen-devel, Ian Jackson, Roger Pau Monne On 02.07.19 12:01, Jan Beulich wrote: > On 02.07.2019 11:40, Dario Faggioli wrote: >> On Tue, 2019-07-02 at 08:29 +0000, Jan Beulich wrote: >>> On 02.07.2019 10:21, Dario Faggioli wrote: >>>> On Tue, 2019-07-02 at 07:54 +0000, Jan Beulich wrote: >>>>> >>>>> And again - if someone pins every vCPU to a single pCPU, that >>>>> last >>>>> such pinning operation will be what takes long term effect. Aiui >>>>> all >>>>> vCPU-s in the unit will then be pinned to that one pCPU, i.e. >>>>> they'll either all compete for the one pCPU's time, or only one >>>>> of >>>>> them will ever get scheduled. >>>>> >>>> I'm not sure I'm getting this. On an, say, SMT system, with 4 >>>> threads >>>> per core, a unit is 4 vCPUs and a pCPU is 4 threads. >>> >>> No, the meaning of pCPU is a single thread of a single core. I.e. >>> what is represented by a single cpumask_t bit. >>> >> Fine, let's continue to call that a pCPU. Then, when core-scheduling is >> enabled, there's no <<multiple vCPUs of a unit being pinned to the same >> pCPU and all competing for jut its CPU time>>. >> >> There's units of 4 vCPUs being pinned to 4 pCPUs (the 4 pCPUs of a >> core, not 4 random, nor arbitrary, ones). >> >> That is the point, AFAIUI. > > Well, okay, quite possible. But then for the example topology > you gave, what is going to be the effect of > > xl vcpu-pin 0 0 0 dom0 vcpu0 will be pinned to cpu0. dom0 vcpu1 will be pinned to cpu1. dom0 vcpu2 will be pinned to cpu2. dom0 vcpu3 will be pinned to cpu3. > ? 
> In particular for Dom0 and in particular for CPU 0, there may be reasons to pin a particular vCPU to it. In reality only pinning vcpu0 to cpu0 should be needed. And this is done e.g. in dcdbas_smi_request(). Juergen
* [PATCH 00/60] xen: add core scheduling support @ 2019-05-28 10:32 Juergen Gross 2019-05-28 10:32 ` Juergen Gross 0 siblings, 1 reply; 21+ messages in thread From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw) To: xen-devel Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson, Robert VanVossen, Dario Faggioli, Julien Grall, Josh Whitehead, Meng Xu, Jan Beulich, Roger Pau Monné Add support for core- and socket-scheduling in the Xen hypervisor. Via boot parameter sched-gran=core (or sched-gran=socket) it is possible to change the scheduling granularity from cpu (the default) to either whole cores or even sockets. All logical cpus (threads) of the core or socket are always scheduled together. This means that on a core only vcpus of the same domain will be active at any one time, and those vcpus will always be scheduled at the same time. This is achieved by switching the scheduler to no longer see vcpus as the primary object to schedule, but "schedule units". Each schedule unit consists of as many vcpus as each core has threads on the current system. The vcpu->unit relation is fixed. I have done some very basic performance testing: on a 4 cpu system (2 cores with 2 threads each) I did a "make -j 4" for building the Xen hypervisor. This test has been run on dom0, once with no other guest active and once with another guest with 4 vcpus running the same test. The results are (always elapsed time, system time, user time):

sched-gran=cpu, no other guest:  116.10 177.65 207.84
sched-gran=core, no other guest: 114.04 175.47 207.45
sched-gran=cpu, other guest:     202.30 334.21 384.63
sched-gran=core, other guest:    207.24 293.04 371.37

The performance tests have been performed with credit2, the other schedulers are tested only briefly to be able to create a domain in a cpupool. Cpupools have been moderately tested (cpu add/remove, create, destroy, move domain). Cpu on-/offlining has been moderately tested, too. 
The complete patch series is available under: git://github.com/jgross1/xen/ sched-v1 Changes in V1: - cpupools are working now - cpu on-/offlining working now - all schedulers working now - renamed "items" to "units" - introduction of "idle scheduler" - several new patches (see individual patches, mostly splits of former patches or cpupool and cpu on-/offlining support) - all review comments addressed - some minor changes (see individual patches) Changes in RFC V2: - ARM is building now - HVM domains are working now - idling will always be done with idle_vcpu active - other small changes see individual patches Juergen Gross (60): xen/sched: only allow schedulers with all mandatory functions available xen/sched: add inline wrappers for calling per-scheduler functions xen/sched: let sched_switch_sched() return new lock address xen/sched: use new sched_unit instead of vcpu in scheduler interfaces xen/sched: alloc struct sched_unit for each vcpu xen/sched: move per-vcpu scheduler private data pointer to sched_unit xen/sched: build a linked list of struct sched_unit xen/sched: introduce struct sched_resource xen/sched: let pick_cpu return a scheduler resource xen/sched: switch schedule_data.curr to point at sched_unit xen/sched: move per cpu scheduler private data into struct sched_resource xen/sched: switch vcpu_schedule_lock to unit_schedule_lock xen/sched: move some per-vcpu items to struct sched_unit xen/sched: add scheduler helpers hiding vcpu xen/sched: add domain pointer to struct sched_unit xen/sched: add id to struct sched_unit xen/sched: rename scheduler related perf counters xen/sched: switch struct task_slice from vcpu to sched_unit xen/sched: add is_running indicator to struct sched_unit xen/sched: make null scheduler vcpu agnostic. xen/sched: make rt scheduler vcpu agnostic. xen/sched: make credit scheduler vcpu agnostic. xen/sched: make credit2 scheduler vcpu agnostic. xen/sched: make arinc653 scheduler vcpu agnostic. 
xen: add sched_unit_pause_nosync() and sched_unit_unpause() xen: let vcpu_create() select processor xen/sched: use sched_resource cpu instead smp_processor_id in schedulers xen/sched: switch schedule() from vcpus to sched_units xen/sched: switch sched_move_irqs() to take sched_unit as parameter xen: switch from for_each_vcpu() to for_each_sched_unit() xen/sched: add runstate counters to struct sched_unit xen/sched: rework and rename vcpu_force_reschedule() xen/sched: Change vcpu_migrate_*() to operate on schedule unit xen/sched: move struct task_slice into struct sched_unit xen/sched: add code to sync scheduling of all vcpus of a sched unit xen/sched: introduce unit_runnable_state() xen/sched: add support for multiple vcpus per sched unit where missing x86: make loading of GDT at context switch more modular x86: optimize loading of GDT at context switch xen/sched: modify cpupool_domain_cpumask() to be an unit mask xen/sched: support allocating multiple vcpus into one sched unit xen/sched: add a scheduler_percpu_init() function xen/sched: add a percpu resource index xen/sched: add fall back to idle vcpu when scheduling unit xen/sched: make vcpu_wake() and vcpu_sleep() core scheduling aware xen/sched: carve out freeing sched_unit memory into dedicated function xen/sched: move per-cpu variable scheduler to struct sched_resource xen/sched: move per-cpu variable cpupool to struct sched_resource xen/sched: reject switching smt on/off with core scheduling active xen/sched: prepare per-cpupool scheduling granularity xen/sched: use one schedule lock for all free cpus xen/sched: populate cpupool0 only after all cpus are up xen/sched: remove cpu from pool0 before removing it xen/sched: add minimalistic idle scheduler for free cpus xen/sched: split schedule_cpu_switch() xen/sched: protect scheduling resource via rcu xen/sched: support multiple cpus per scheduling resource xen/sched: support differing granularity in schedule_cpu_[add/rm]() xen/sched: support core scheduling for 
moving cpus to/from cpupools xen/sched: add scheduling granularity enum xen/arch/arm/domain.c | 2 +- xen/arch/arm/domain_build.c | 13 +- xen/arch/x86/acpi/cpu_idle.c | 1 - xen/arch/x86/cpu/common.c | 3 + xen/arch/x86/cpu/mcheck/mce.c | 1 - xen/arch/x86/cpu/mcheck/mctelem.c | 1 - xen/arch/x86/dom0_build.c | 10 +- xen/arch/x86/domain.c | 93 +- xen/arch/x86/hvm/dom0_build.c | 9 +- xen/arch/x86/pv/dom0_build.c | 10 +- xen/arch/x86/pv/emul-priv-op.c | 1 + xen/arch/x86/pv/shim.c | 4 +- xen/arch/x86/pv/traps.c | 5 +- xen/arch/x86/setup.c | 1 - xen/arch/x86/smpboot.c | 1 - xen/arch/x86/sysctl.c | 3 +- xen/arch/x86/traps.c | 9 +- xen/common/cpupool.c | 326 ++++--- xen/common/domain.c | 34 +- xen/common/domctl.c | 23 +- xen/common/keyhandler.c | 4 +- xen/common/sched_arinc653.c | 270 +++--- xen/common/sched_credit.c | 783 ++++++++------- xen/common/sched_credit2.c | 1134 +++++++++++----------- xen/common/sched_null.c | 443 +++++---- xen/common/sched_rt.c | 555 +++++------ xen/common/schedule.c | 1923 +++++++++++++++++++++++++++++-------- xen/common/softirq.c | 6 +- xen/common/wait.c | 4 +- xen/include/asm-arm/current.h | 1 + xen/include/asm-x86/cpuidle.h | 11 - xen/include/asm-x86/current.h | 7 +- xen/include/asm-x86/desc.h | 1 + xen/include/asm-x86/dom0_build.h | 3 +- xen/include/asm-x86/smp.h | 3 + xen/include/xen/domain.h | 3 +- xen/include/xen/perfc_defn.h | 32 +- xen/include/xen/sched-if.h | 444 +++++++-- xen/include/xen/sched.h | 99 +- xen/include/xen/softirq.h | 1 + 40 files changed, 3905 insertions(+), 2372 deletions(-) -- 2.16.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 21+ messages in thread
* [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit @ 2019-05-28 10:32 ` Juergen Gross 2019-06-13 7:18 ` Andrii Anisov 0 siblings, 1 reply; 21+ messages in thread From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw) To: xen-devel Cc: Juergen Gross, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall, Meng Xu, Jan Beulich, Dario Faggioli, Roger Pau Monné Affinities are scheduler specific attributes, they should be per scheduling unit. So move all affinity related fields in struct vcpu to struct sched_unit. While at it switch affinity related functions in sched-if.h to use a pointer to sched_unit instead to vcpu as parameter. vcpu->last_run_time is primarily used by sched_credit, so move it to struct sched_unit, too. Signed-off-by: Juergen Gross <jgross@suse.com> --- xen/arch/x86/pv/emul-priv-op.c | 1 + xen/arch/x86/pv/traps.c | 5 +- xen/arch/x86/traps.c | 9 ++-- xen/common/domain.c | 19 ++----- xen/common/domctl.c | 13 +++-- xen/common/keyhandler.c | 4 +- xen/common/sched_credit.c | 20 ++++---- xen/common/sched_credit2.c | 42 ++++++++-------- xen/common/sched_null.c | 16 +++--- xen/common/sched_rt.c | 9 ++-- xen/common/schedule.c | 110 ++++++++++++++++++++++++----------------- xen/common/wait.c | 4 +- xen/include/xen/sched-if.h | 17 ++++--- xen/include/xen/sched.h | 36 +++++++------- 14 files changed, 163 insertions(+), 142 deletions(-) diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c index b20d79c7a3..a7a24a2053 100644 --- a/xen/arch/x86/pv/emul-priv-op.c +++ b/xen/arch/x86/pv/emul-priv-op.c @@ -23,6 +23,7 @@ #include <xen/event.h> #include <xen/guest_access.h> #include <xen/iocap.h> +#include <xen/sched.h> #include <xen/spinlock.h> #include <xen/trace.h> diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c index 1740784ff2..419abc3d95 100644 --- a/xen/arch/x86/pv/traps.c +++ b/xen/arch/x86/pv/traps.c @@ -22,6 +22,7 @@ 
#include <xen/event.h> #include <xen/hypercall.h> #include <xen/lib.h> +#include <xen/sched.h> #include <xen/trace.h> #include <xen/softirq.h> @@ -155,8 +156,8 @@ static void nmi_mce_softirq(void) * Set the tmp value unconditionally, so that the check in the iret * hypercall works. */ - cpumask_copy(st->vcpu->cpu_hard_affinity_tmp, - st->vcpu->cpu_hard_affinity); + cpumask_copy(st->vcpu->sched_unit->cpu_hard_affinity_tmp, + st->vcpu->sched_unit->cpu_hard_affinity); if ( (cpu != st->processor) || (st->processor != st->vcpu->processor) ) diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 05ddc39bfe..c3b39d6296 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -1594,16 +1594,17 @@ static void pci_serr_softirq(void) void async_exception_cleanup(struct vcpu *curr) { int trap; + struct sched_unit *unit = curr->sched_unit; if ( !curr->async_exception_mask ) return; /* Restore affinity. */ - if ( !cpumask_empty(curr->cpu_hard_affinity_tmp) && - !cpumask_equal(curr->cpu_hard_affinity_tmp, curr->cpu_hard_affinity) ) + if ( !cpumask_empty(unit->cpu_hard_affinity_tmp) && + !cpumask_equal(unit->cpu_hard_affinity_tmp, unit->cpu_hard_affinity) ) { - vcpu_set_hard_affinity(curr, curr->cpu_hard_affinity_tmp); - cpumask_clear(curr->cpu_hard_affinity_tmp); + vcpu_set_hard_affinity(curr, unit->cpu_hard_affinity_tmp); + cpumask_clear(unit->cpu_hard_affinity_tmp); } if ( !(curr->async_exception_mask & (curr->async_exception_mask - 1)) ) diff --git a/xen/common/domain.c b/xen/common/domain.c index 90c66079f9..c200e9024f 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -125,11 +125,6 @@ static void vcpu_info_reset(struct vcpu *v) static void vcpu_destroy(struct vcpu *v) { - free_cpumask_var(v->cpu_hard_affinity); - free_cpumask_var(v->cpu_hard_affinity_tmp); - free_cpumask_var(v->cpu_hard_affinity_saved); - free_cpumask_var(v->cpu_soft_affinity); - free_vcpu_struct(v); } @@ -153,12 +148,6 @@ struct vcpu *vcpu_create( grant_table_init_vcpu(v); - if ( 
!zalloc_cpumask_var(&v->cpu_hard_affinity) || - !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) || - !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) || - !zalloc_cpumask_var(&v->cpu_soft_affinity) ) - goto fail; - if ( is_idle_domain(d) ) { v->runstate.state = RUNSTATE_running; @@ -198,7 +187,6 @@ struct vcpu *vcpu_create( sched_destroy_vcpu(v); fail_wq: destroy_waitqueue_vcpu(v); - fail: vcpu_destroy(v); return NULL; @@ -557,9 +545,10 @@ void domain_update_node_affinity(struct domain *d) */ for_each_vcpu ( d, v ) { - cpumask_or(dom_cpumask, dom_cpumask, v->cpu_hard_affinity); + cpumask_or(dom_cpumask, dom_cpumask, + v->sched_unit->cpu_hard_affinity); cpumask_or(dom_cpumask_soft, dom_cpumask_soft, - v->cpu_soft_affinity); + v->sched_unit->cpu_soft_affinity); } /* Filter out non-online cpus */ cpumask_and(dom_cpumask, dom_cpumask, online); @@ -1226,7 +1215,7 @@ int vcpu_reset(struct vcpu *v) v->async_exception_mask = 0; memset(v->async_exception_state, 0, sizeof(v->async_exception_state)); #endif - cpumask_clear(v->cpu_hard_affinity_tmp); + cpumask_clear(v->sched_unit->cpu_hard_affinity_tmp); clear_bit(_VPF_blocked, &v->pause_flags); clear_bit(_VPF_in_reset, &v->pause_flags); diff --git a/xen/common/domctl.c b/xen/common/domctl.c index bade9a63b1..bc986d131d 100644 --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -614,6 +614,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) case XEN_DOMCTL_getvcpuaffinity: { struct vcpu *v; + struct sched_unit *unit; struct xen_domctl_vcpuaffinity *vcpuaff = &op->u.vcpuaffinity; ret = -EINVAL; @@ -624,6 +625,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) if ( (v = d->vcpu[vcpuaff->vcpu]) == NULL ) break; + unit = v->sched_unit; ret = -EINVAL; if ( vcpuaffinity_params_invalid(vcpuaff) ) break; @@ -643,7 +645,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) ret = -ENOMEM; break; } - cpumask_copy(old_affinity, v->cpu_hard_affinity); + cpumask_copy(old_affinity, 
unit->cpu_hard_affinity); if ( !alloc_cpumask_var(&new_affinity) ) { @@ -676,7 +678,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) * For hard affinity, what we return is the intersection of * cpupool's online mask and the new hard affinity. */ - cpumask_and(new_affinity, online, v->cpu_hard_affinity); + cpumask_and(new_affinity, online, unit->cpu_hard_affinity); ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_hard, new_affinity); } @@ -705,7 +707,8 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) * hard affinity. */ cpumask_and(new_affinity, new_affinity, online); - cpumask_and(new_affinity, new_affinity, v->cpu_hard_affinity); + cpumask_and(new_affinity, new_affinity, + unit->cpu_hard_affinity); ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_soft, new_affinity); } @@ -718,10 +721,10 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) { if ( vcpuaff->flags & XEN_VCPUAFFINITY_HARD ) ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_hard, - v->cpu_hard_affinity); + unit->cpu_hard_affinity); if ( vcpuaff->flags & XEN_VCPUAFFINITY_SOFT ) ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_soft, - v->cpu_soft_affinity); + unit->cpu_soft_affinity); } break; } diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c index 4f4a660b0c..1729f73af0 100644 --- a/xen/common/keyhandler.c +++ b/xen/common/keyhandler.c @@ -312,8 +312,8 @@ static void dump_domains(unsigned char key) printk("dirty_cpu=%u", v->dirty_cpu); printk("\n"); printk(" cpu_hard_affinity={%*pbl} cpu_soft_affinity={%*pbl}\n", - nr_cpu_ids, cpumask_bits(v->cpu_hard_affinity), - nr_cpu_ids, cpumask_bits(v->cpu_soft_affinity)); + nr_cpu_ids, cpumask_bits(v->sched_unit->cpu_hard_affinity), + nr_cpu_ids, cpumask_bits(v->sched_unit->cpu_soft_affinity)); printk(" pause_count=%d pause_flags=%lx\n", atomic_read(&v->pause_count), v->pause_flags); arch_dump_vcpu_info(v); diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c index 74c85df334..ffac2f4bbb 100644 
--- a/xen/common/sched_credit.c +++ b/xen/common/sched_credit.c @@ -350,6 +350,7 @@ DEFINE_PER_CPU(unsigned int, last_tickle_cpu); static inline void __runq_tickle(struct csched_unit *new) { unsigned int cpu = new->vcpu->processor; + struct sched_unit *unit = new->vcpu->sched_unit; struct csched_unit * const cur = CSCHED_UNIT(curr_on_cpu(cpu)); struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu)); cpumask_t mask, idle_mask, *online; @@ -375,7 +376,7 @@ static inline void __runq_tickle(struct csched_unit *new) if ( unlikely(test_bit(CSCHED_FLAG_VCPU_PINNED, &new->flags) && cpumask_test_cpu(cpu, &idle_mask)) ) { - ASSERT(cpumask_cycle(cpu, new->vcpu->cpu_hard_affinity) == cpu); + ASSERT(cpumask_cycle(cpu, unit->cpu_hard_affinity) == cpu); SCHED_STAT_CRANK(tickled_idle_cpu_excl); __cpumask_set_cpu(cpu, &mask); goto tickle; @@ -410,11 +411,11 @@ static inline void __runq_tickle(struct csched_unit *new) int new_idlers_empty; if ( balance_step == BALANCE_SOFT_AFFINITY - && !has_soft_affinity(new->vcpu) ) + && !has_soft_affinity(unit) ) continue; /* Are there idlers suitable for new (for this balance step)? 
*/ - affinity_balance_cpumask(new->vcpu, balance_step, + affinity_balance_cpumask(unit, balance_step, cpumask_scratch_cpu(cpu)); cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), &idle_mask); @@ -443,8 +444,7 @@ static inline void __runq_tickle(struct csched_unit *new) */ if ( new_idlers_empty && new->pri > cur->pri ) { - if ( cpumask_intersects(cur->vcpu->cpu_hard_affinity, - &idle_mask) ) + if ( cpumask_intersects(unit->cpu_hard_affinity, &idle_mask) ) { SCHED_VCPU_STAT_CRANK(cur, kicked_away); SCHED_VCPU_STAT_CRANK(cur, migrate_r); @@ -695,7 +695,7 @@ static inline bool __csched_vcpu_is_cache_hot(const struct csched_private *prv, struct vcpu *v) { bool hot = prv->vcpu_migr_delay && - (NOW() - v->last_run_time) < prv->vcpu_migr_delay; + (NOW() - v->sched_unit->last_run_time) < prv->vcpu_migr_delay; if ( hot ) SCHED_STAT_CRANK(vcpu_hot); @@ -733,7 +733,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit) for_each_affinity_balance_step( balance_step ) { - affinity_balance_cpumask(vc, balance_step, cpus); + affinity_balance_cpumask(vc->sched_unit, balance_step, cpus); cpumask_and(cpus, online, cpus); /* * We want to pick up a pcpu among the ones that are online and @@ -752,7 +752,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit) * balancing step all together. */ if ( balance_step == BALANCE_SOFT_AFFINITY && - (!has_soft_affinity(vc) || cpumask_empty(cpus)) ) + (!has_soft_affinity(vc->sched_unit) || cpumask_empty(cpus)) ) continue; /* If present, prefer vc's current processor */ @@ -1652,10 +1652,10 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step) * or counter. 
*/ if ( vc->is_running || (balance_step == BALANCE_SOFT_AFFINITY && - !has_soft_affinity(vc)) ) + !has_soft_affinity(vc->sched_unit)) ) continue; - affinity_balance_cpumask(vc, balance_step, cpumask_scratch); + affinity_balance_cpumask(vc->sched_unit, balance_step, cpumask_scratch); if ( __csched_vcpu_is_migrateable(prv, vc, cpu, cpumask_scratch) ) { /* We got a candidate. Grab it! */ diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c index 562e73d99e..dabd5636f5 100644 --- a/xen/common/sched_credit2.c +++ b/xen/common/sched_credit2.c @@ -699,10 +699,10 @@ static int get_fallback_cpu(struct csched2_unit *svc) { int cpu = v->processor; - if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(v) ) + if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(v->sched_unit) ) continue; - affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu)); + affinity_balance_cpumask(v->sched_unit, bs, cpumask_scratch_cpu(cpu)); cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), cpupool_domain_cpumask(v->domain)); @@ -1390,10 +1390,10 @@ static s_time_t tickle_score(const struct scheduler *ops, s_time_t now, */ if ( score > 0 ) { - if ( cpumask_test_cpu(cpu, new->vcpu->cpu_soft_affinity) ) + if ( cpumask_test_cpu(cpu, new->vcpu->sched_unit->cpu_soft_affinity) ) score += CSCHED2_CREDIT_INIT; - if ( !cpumask_test_cpu(cpu, cur->vcpu->cpu_soft_affinity) ) + if ( !cpumask_test_cpu(cpu, cur->vcpu->sched_unit->cpu_soft_affinity) ) score += CSCHED2_CREDIT_INIT; } @@ -1436,6 +1436,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now) { int i, ipid = -1; s_time_t max = 0; + struct sched_unit *unit = new->vcpu->sched_unit; unsigned int bs, cpu = new->vcpu->processor; struct csched2_runqueue_data *rqd = c2rqd(ops, cpu); cpumask_t *online = cpupool_domain_cpumask(new->vcpu->domain); @@ -1473,7 +1474,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now) cpumask_test_cpu(cpu, &rqd->idle) && 
!cpumask_test_cpu(cpu, &rqd->tickled)) ) { - ASSERT(cpumask_cycle(cpu, new->vcpu->cpu_hard_affinity) == cpu); + ASSERT(cpumask_cycle(cpu, unit->cpu_hard_affinity) == cpu); SCHED_STAT_CRANK(tickled_idle_cpu_excl); ipid = cpu; goto tickle; @@ -1482,10 +1483,10 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now) for_each_affinity_balance_step( bs ) { /* Just skip first step, if we don't have a soft affinity */ - if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(new->vcpu) ) + if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(unit) ) continue; - affinity_balance_cpumask(new->vcpu, bs, cpumask_scratch_cpu(cpu)); + affinity_balance_cpumask(unit, bs, cpumask_scratch_cpu(cpu)); /* * First of all, consider idle cpus, checking if we can just @@ -1557,7 +1558,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now) ipid = cpu; /* If this is in new's soft affinity, just take it */ - if ( cpumask_test_cpu(cpu, new->vcpu->cpu_soft_affinity) ) + if ( cpumask_test_cpu(cpu, unit->cpu_soft_affinity) ) { SCHED_STAT_CRANK(tickled_busy_cpu); goto tickle; @@ -2243,7 +2244,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit) goto out; } - cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity, + cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, cpupool_domain_cpumask(vc->domain)); /* @@ -2288,7 +2289,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit) * * Find both runqueues in one pass. 
*/ - has_soft = has_soft_affinity(vc); + has_soft = has_soft_affinity(unit); for_each_cpu(i, &prv->active_queues) { struct csched2_runqueue_data *rqd; @@ -2335,7 +2336,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit) cpumask_t mask; cpumask_and(&mask, cpumask_scratch_cpu(cpu), &rqd->active); - if ( cpumask_intersects(&mask, svc->vcpu->cpu_soft_affinity) ) + if ( cpumask_intersects(&mask, unit->cpu_soft_affinity) ) { min_s_avgload = rqd_avgload; min_s_rqi = i; @@ -2357,9 +2358,9 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit) * Note that, to obtain the soft-affinity mask, we "just" put what we * have in cpumask_scratch in && with vc->cpu_soft_affinity. This is * ok because: - * - we know that vc->cpu_hard_affinity and vc->cpu_soft_affinity have + * - we know that unit->cpu_hard_affinity and ->cpu_soft_affinity have * a non-empty intersection (because has_soft is true); - * - we have vc->cpu_hard_affinity & cpupool_domain_cpumask() already + * - we have unit->cpu_hard_affinity & cpupool_domain_cpumask() already * in cpumask_scratch, we do save a lot doing like this. * * It's kind of like open coding affinity_balance_cpumask() but, in @@ -2367,7 +2368,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit) * cpumask operations. 
*/ cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), - vc->cpu_soft_affinity); + unit->cpu_soft_affinity); cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), &prv->rqd[min_s_rqi].active); } @@ -2475,6 +2476,7 @@ static void migrate(const struct scheduler *ops, s_time_t now) { int cpu = svc->vcpu->processor; + struct sched_unit *unit = svc->vcpu->sched_unit; if ( unlikely(tb_init_done) ) { @@ -2512,7 +2514,7 @@ static void migrate(const struct scheduler *ops, } _runq_deassign(svc); - cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity, + cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, cpupool_domain_cpumask(svc->vcpu->domain)); cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), &trqd->active); @@ -2546,7 +2548,7 @@ static bool vcpu_is_migrateable(struct csched2_unit *svc, struct vcpu *v = svc->vcpu; int cpu = svc->vcpu->processor; - cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity, + cpumask_and(cpumask_scratch_cpu(cpu), v->sched_unit->cpu_hard_affinity, cpupool_domain_cpumask(v->domain)); return !(svc->flags & CSFLAG_runq_migrate_request) && @@ -2780,7 +2782,7 @@ csched2_unit_migrate( /* If here, new_cpu must be a valid Credit2 pCPU, and in our affinity. 
*/ ASSERT(cpumask_test_cpu(new_cpu, &csched2_priv(ops)->initialized)); - ASSERT(cpumask_test_cpu(new_cpu, vc->cpu_hard_affinity)); + ASSERT(cpumask_test_cpu(new_cpu, unit->cpu_hard_affinity)); trqd = c2rqd(ops, new_cpu); @@ -3320,9 +3322,9 @@ runq_candidate(struct csched2_runqueue_data *rqd, } /* If scurr has a soft-affinity, let's check whether cpu is part of it */ - if ( has_soft_affinity(scurr->vcpu) ) + if ( has_soft_affinity(scurr->vcpu->sched_unit) ) { - affinity_balance_cpumask(scurr->vcpu, BALANCE_SOFT_AFFINITY, + affinity_balance_cpumask(scurr->vcpu->sched_unit, BALANCE_SOFT_AFFINITY, cpumask_scratch); if ( unlikely(!cpumask_test_cpu(cpu, cpumask_scratch)) ) { @@ -3377,7 +3379,7 @@ runq_candidate(struct csched2_runqueue_data *rqd, } /* Only consider vcpus that are allowed to run on this processor. */ - if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) ) + if ( !cpumask_test_cpu(cpu, svc->vcpu->sched_unit->cpu_hard_affinity) ) { (*skipped)++; continue; diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c index ee3a8cf064..56b0055a42 100644 --- a/xen/common/sched_null.c +++ b/xen/common/sched_null.c @@ -123,7 +123,8 @@ static inline struct null_unit *null_unit(const struct sched_unit *unit) static inline bool vcpu_check_affinity(struct vcpu *v, unsigned int cpu, unsigned int balance_step) { - affinity_balance_cpumask(v, balance_step, cpumask_scratch_cpu(cpu)); + affinity_balance_cpumask(v->sched_unit, balance_step, + cpumask_scratch_cpu(cpu)); cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), cpupool_domain_cpumask(v->domain)); @@ -281,10 +282,10 @@ pick_res(struct null_private *prv, struct sched_unit *unit) for_each_affinity_balance_step( bs ) { - if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(v) ) + if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(unit) ) continue; - affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu)); + affinity_balance_cpumask(unit, bs, cpumask_scratch_cpu(cpu)); 
cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), cpus); /* @@ -321,7 +322,7 @@ pick_res(struct null_private *prv, struct sched_unit *unit) * as we will actually assign the vCPU to the pCPU we return from here, * only if the pCPU is free. */ - cpumask_and(cpumask_scratch_cpu(cpu), cpus, v->cpu_hard_affinity); + cpumask_and(cpumask_scratch_cpu(cpu), cpus, unit->cpu_hard_affinity); new_cpu = cpumask_any(cpumask_scratch_cpu(cpu)); out: @@ -430,7 +431,7 @@ static void null_unit_insert(const struct scheduler *ops, lock = unit_schedule_lock(unit); - cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity, + cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, cpupool_domain_cpumask(v->domain)); /* If the pCPU is free, we assign v to it */ @@ -488,7 +489,8 @@ static void _vcpu_remove(struct null_private *prv, struct vcpu *v) { list_for_each_entry( wvc, &prv->waitq, waitq_elem ) { - if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(wvc->vcpu) ) + if ( bs == BALANCE_SOFT_AFFINITY && + !has_soft_affinity(wvc->vcpu->sched_unit) ) continue; if ( vcpu_check_affinity(wvc->vcpu, cpu, bs) ) @@ -767,7 +769,7 @@ static struct task_slice null_schedule(const struct scheduler *ops, list_for_each_entry( wvc, &prv->waitq, waitq_elem ) { if ( bs == BALANCE_SOFT_AFFINITY && - !has_soft_affinity(wvc->vcpu) ) + !has_soft_affinity(wvc->vcpu->sched_unit) ) continue; if ( vcpu_check_affinity(wvc->vcpu, cpu, bs) ) diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c index cd737131a3..d640d87b43 100644 --- a/xen/common/sched_rt.c +++ b/xen/common/sched_rt.c @@ -327,7 +327,7 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_unit *svc) mask = cpumask_scratch_cpu(svc->vcpu->processor); cpupool_mask = cpupool_domain_cpumask(svc->vcpu->domain); - cpumask_and(mask, cpupool_mask, svc->vcpu->cpu_hard_affinity); + cpumask_and(mask, cpupool_mask, svc->vcpu->sched_unit->cpu_hard_affinity); printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime")," " 
cur_b=%"PRI_stime" cur_d=%"PRI_stime" last_start=%"PRI_stime"\n" " \t\t priority_level=%d has_extratime=%d\n" @@ -645,7 +645,7 @@ rt_res_pick(const struct scheduler *ops, struct sched_unit *unit) int cpu; online = cpupool_domain_cpumask(vc->domain); - cpumask_and(&cpus, online, vc->cpu_hard_affinity); + cpumask_and(&cpus, online, unit->cpu_hard_affinity); cpu = cpumask_test_cpu(vc->processor, &cpus) ? vc->processor @@ -1023,7 +1023,8 @@ runq_pick(const struct scheduler *ops, const cpumask_t *mask) /* mask cpu_hard_affinity & cpupool & mask */ online = cpupool_domain_cpumask(iter_svc->vcpu->domain); - cpumask_and(&cpu_common, online, iter_svc->vcpu->cpu_hard_affinity); + cpumask_and(&cpu_common, online, + iter_svc->vcpu->sched_unit->cpu_hard_affinity); cpumask_and(&cpu_common, mask, &cpu_common); if ( cpumask_empty(&cpu_common) ) continue; @@ -1192,7 +1193,7 @@ runq_tickle(const struct scheduler *ops, struct rt_unit *new) return; online = cpupool_domain_cpumask(new->vcpu->domain); - cpumask_and(¬_tickled, online, new->vcpu->cpu_hard_affinity); + cpumask_and(¬_tickled, online, new->vcpu->sched_unit->cpu_hard_affinity); cpumask_andnot(¬_tickled, ¬_tickled, &prv->tickled); /* diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 1321c86111..212c1e637f 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -270,6 +270,12 @@ static void sched_free_unit(struct sched_unit *unit) } unit->vcpu->sched_unit = NULL; + + free_cpumask_var(unit->cpu_hard_affinity); + free_cpumask_var(unit->cpu_hard_affinity_tmp); + free_cpumask_var(unit->cpu_hard_affinity_saved); + free_cpumask_var(unit->cpu_soft_affinity); + xfree(unit); } @@ -293,7 +299,17 @@ static struct sched_unit *sched_alloc_unit(struct vcpu *v) unit->next_in_list = *prev_unit; *prev_unit = unit; + if ( !zalloc_cpumask_var(&unit->cpu_hard_affinity) || + !zalloc_cpumask_var(&unit->cpu_hard_affinity_tmp) || + !zalloc_cpumask_var(&unit->cpu_hard_affinity_saved) || + 
!zalloc_cpumask_var(&unit->cpu_soft_affinity) ) + goto fail; + return unit; + + fail: + sched_free_unit(unit); + return NULL; } int sched_init_vcpu(struct vcpu *v, unsigned int processor) @@ -363,7 +379,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c) for_each_vcpu ( d, v ) { - if ( v->affinity_broken ) + if ( v->sched_unit->affinity_broken ) return -EBUSY; } @@ -682,7 +698,7 @@ static void vcpu_migrate_finish(struct vcpu *v) */ if ( pick_called && (new_lock == get_sched_res(new_cpu)->schedule_lock) && - cpumask_test_cpu(new_cpu, v->cpu_hard_affinity) && + cpumask_test_cpu(new_cpu, v->sched_unit->cpu_hard_affinity) && cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) ) break; @@ -758,6 +774,7 @@ void restore_vcpu_affinity(struct domain *d) { spinlock_t *lock; unsigned int old_cpu = v->processor; + struct sched_unit *unit = v->sched_unit; ASSERT(!vcpu_runnable(v)); @@ -769,15 +786,15 @@ void restore_vcpu_affinity(struct domain *d) * set v->processor of each of their vCPUs to something that will * make sense for the scheduler of the cpupool in which they are in. 
*/ - cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity, + cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, cpupool_domain_cpumask(d)); if ( cpumask_empty(cpumask_scratch_cpu(cpu)) ) { - if ( v->affinity_broken ) + if ( unit->affinity_broken ) { - sched_set_affinity(v, v->cpu_hard_affinity_saved, NULL); - v->affinity_broken = 0; - cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity, + sched_set_affinity(v, unit->cpu_hard_affinity_saved, NULL); + unit->affinity_broken = 0; + cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, cpupool_domain_cpumask(d)); } @@ -785,18 +802,17 @@ void restore_vcpu_affinity(struct domain *d) { printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v); sched_set_affinity(v, &cpumask_all, NULL); - cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity, + cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, cpupool_domain_cpumask(d)); } } v->processor = cpumask_any(cpumask_scratch_cpu(cpu)); - v->sched_unit->res = get_sched_res(v->processor); + unit->res = get_sched_res(v->processor); - lock = unit_schedule_lock_irq(v->sched_unit); - v->sched_unit->res = sched_pick_resource(vcpu_scheduler(v), - v->sched_unit); - v->processor = v->sched_unit->res->processor; + lock = unit_schedule_lock_irq(unit); + unit->res = sched_pick_resource(vcpu_scheduler(v), unit); + v->processor = unit->res->processor; spin_unlock_irq(lock); if ( old_cpu != v->processor ) @@ -828,16 +844,17 @@ int cpu_disable_scheduler(unsigned int cpu) for_each_vcpu ( d, v ) { unsigned long flags; - spinlock_t *lock = unit_schedule_lock_irqsave(v->sched_unit, &flags); + struct sched_unit *unit = v->sched_unit; + spinlock_t *lock = unit_schedule_lock_irqsave(unit, &flags); - cpumask_and(&online_affinity, v->cpu_hard_affinity, c->cpu_valid); + cpumask_and(&online_affinity, unit->cpu_hard_affinity, c->cpu_valid); if ( cpumask_empty(&online_affinity) && - cpumask_test_cpu(cpu, v->cpu_hard_affinity) ) + cpumask_test_cpu(cpu, 
unit->cpu_hard_affinity) ) { - if ( v->affinity_broken ) + if ( unit->affinity_broken ) { /* The vcpu is temporarily pinned, can't move it. */ - unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit); + unit_schedule_unlock_irqrestore(lock, flags, unit); ret = -EADDRINUSE; break; } @@ -850,7 +867,7 @@ int cpu_disable_scheduler(unsigned int cpu) if ( v->processor != cpu ) { /* The vcpu is not on this cpu, so we can move on. */ - unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit); + unit_schedule_unlock_irqrestore(lock, flags, unit); continue; } @@ -863,7 +880,7 @@ int cpu_disable_scheduler(unsigned int cpu) * things would have failed before getting in here. */ vcpu_migrate_start(v); - unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit); + unit_schedule_unlock_irqrestore(lock, flags, unit); vcpu_migrate_finish(v); @@ -892,7 +909,7 @@ static int cpu_disable_scheduler_check(unsigned int cpu) for_each_domain_in_cpupool ( d, c ) for_each_vcpu ( d, v ) - if ( v->affinity_broken ) + if ( v->sched_unit->affinity_broken ) return -EADDRINUSE; return 0; @@ -908,28 +925,31 @@ static int cpu_disable_scheduler_check(unsigned int cpu) void sched_set_affinity( struct vcpu *v, const cpumask_t *hard, const cpumask_t *soft) { - sched_adjust_affinity(dom_scheduler(v->domain), v->sched_unit, hard, soft); + struct sched_unit *unit = v->sched_unit; + + sched_adjust_affinity(dom_scheduler(v->domain), unit, hard, soft); if ( hard ) - cpumask_copy(v->cpu_hard_affinity, hard); + cpumask_copy(unit->cpu_hard_affinity, hard); if ( soft ) - cpumask_copy(v->cpu_soft_affinity, soft); + cpumask_copy(unit->cpu_soft_affinity, soft); - v->soft_aff_effective = !cpumask_subset(v->cpu_hard_affinity, - v->cpu_soft_affinity) && - cpumask_intersects(v->cpu_soft_affinity, - v->cpu_hard_affinity); + unit->soft_aff_effective = !cpumask_subset(unit->cpu_hard_affinity, + unit->cpu_soft_affinity) && + cpumask_intersects(unit->cpu_soft_affinity, + unit->cpu_hard_affinity); } static int 
vcpu_set_affinity( struct vcpu *v, const cpumask_t *affinity, const cpumask_t *which) { + struct sched_unit *unit = v->sched_unit; spinlock_t *lock; int ret = 0; - lock = unit_schedule_lock_irq(v->sched_unit); + lock = unit_schedule_lock_irq(unit); - if ( v->affinity_broken ) + if ( unit->affinity_broken ) ret = -EBUSY; else { @@ -937,19 +957,19 @@ static int vcpu_set_affinity( * Tell the scheduler we changes something about affinity, * and ask to re-evaluate vcpu placement. */ - if ( which == v->cpu_hard_affinity ) + if ( which == unit->cpu_hard_affinity ) { sched_set_affinity(v, affinity, NULL); } else { - ASSERT(which == v->cpu_soft_affinity); + ASSERT(which == unit->cpu_soft_affinity); sched_set_affinity(v, NULL, affinity); } vcpu_migrate_start(v); } - unit_schedule_unlock_irq(lock, v->sched_unit); + unit_schedule_unlock_irq(lock, unit); domain_update_node_affinity(v->domain); @@ -968,12 +988,12 @@ int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity) if ( cpumask_empty(&online_affinity) ) return -EINVAL; - return vcpu_set_affinity(v, affinity, v->cpu_hard_affinity); + return vcpu_set_affinity(v, affinity, v->sched_unit->cpu_hard_affinity); } int vcpu_set_soft_affinity(struct vcpu *v, const cpumask_t *affinity) { - return vcpu_set_affinity(v, affinity, v->cpu_soft_affinity); + return vcpu_set_affinity(v, affinity, v->sched_unit->cpu_soft_affinity); } /* Block the currently-executing domain until a pertinent event occurs. 
*/ @@ -1167,28 +1187,30 @@ void watchdog_domain_destroy(struct domain *d) int vcpu_pin_override(struct vcpu *v, int cpu) { + struct sched_unit *unit = v->sched_unit; spinlock_t *lock; int ret = -EINVAL; - lock = unit_schedule_lock_irq(v->sched_unit); + lock = unit_schedule_lock_irq(unit); if ( cpu < 0 ) { - if ( v->affinity_broken ) + if ( unit->affinity_broken ) { - sched_set_affinity(v, v->cpu_hard_affinity_saved, NULL); - v->affinity_broken = 0; + sched_set_affinity(v, unit->cpu_hard_affinity_saved, NULL); + unit->affinity_broken = 0; ret = 0; } } else if ( cpu < nr_cpu_ids ) { - if ( v->affinity_broken ) + if ( unit->affinity_broken ) ret = -EBUSY; else if ( cpumask_test_cpu(cpu, VCPU2ONLINE(v)) ) { - cpumask_copy(v->cpu_hard_affinity_saved, v->cpu_hard_affinity); - v->affinity_broken = 1; + cpumask_copy(unit->cpu_hard_affinity_saved, + unit->cpu_hard_affinity); + unit->affinity_broken = 1; sched_set_affinity(v, cpumask_of(cpu), NULL); ret = 0; } @@ -1197,7 +1219,7 @@ int vcpu_pin_override(struct vcpu *v, int cpu) if ( ret == 0 ) vcpu_migrate_start(v); - unit_schedule_unlock_irq(lock, v->sched_unit); + unit_schedule_unlock_irq(lock, unit); domain_update_node_affinity(v->domain); @@ -1549,7 +1571,7 @@ static void schedule(void) ((prev->pause_flags & VPF_blocked) ? RUNSTATE_blocked : (vcpu_runnable(prev) ? RUNSTATE_runnable : RUNSTATE_offline)), now); - prev->last_run_time = now; + prev->sched_unit->last_run_time = now; ASSERT(next->runstate.state != RUNSTATE_running); vcpu_runstate_change(next, RUNSTATE_running, now); diff --git a/xen/common/wait.c b/xen/common/wait.c index 4f830a14e8..37e9e0d016 100644 --- a/xen/common/wait.c +++ b/xen/common/wait.c @@ -132,7 +132,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv) /* Save current VCPU affinity; force wakeup on *this* CPU only. 
*/ wqv->wakeup_cpu = smp_processor_id(); - cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity); + cpumask_copy(&wqv->saved_affinity, curr->sched_unit->cpu_hard_affinity); if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) { gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n"); @@ -199,7 +199,7 @@ void check_wakeup_from_wait(void) { /* Re-set VCPU affinity and re-enter the scheduler. */ struct vcpu *curr = current; - cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity); + cpumask_copy(&wqv->saved_affinity, curr->sched_unit->cpu_hard_affinity); if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) ) { gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n"); diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index 20e36ea39b..17c01abc25 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d) * * The hard affinity is not a subset of soft affinity * * There is an overlap between the soft and hard affinity masks */ -static inline int has_soft_affinity(const struct vcpu *v) +static inline int has_soft_affinity(const struct sched_unit *unit) { - return v->soft_aff_effective && - !cpumask_subset(cpupool_domain_cpumask(v->domain), - v->cpu_soft_affinity); + return unit->soft_aff_effective && + !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain), + unit->cpu_soft_affinity); } /* @@ -452,17 +452,18 @@ static inline int has_soft_affinity(const struct vcpu *v) * to avoid running a vcpu where it would like, but is not allowed to! 
*/ static inline void -affinity_balance_cpumask(const struct vcpu *v, int step, cpumask_t *mask) +affinity_balance_cpumask(const struct sched_unit *unit, int step, + cpumask_t *mask) { if ( step == BALANCE_SOFT_AFFINITY ) { - cpumask_and(mask, v->cpu_soft_affinity, v->cpu_hard_affinity); + cpumask_and(mask, unit->cpu_soft_affinity, unit->cpu_hard_affinity); if ( unlikely(cpumask_empty(mask)) ) - cpumask_copy(mask, v->cpu_hard_affinity); + cpumask_copy(mask, unit->cpu_hard_affinity); } else /* step == BALANCE_HARD_AFFINITY */ - cpumask_copy(mask, v->cpu_hard_affinity); + cpumask_copy(mask, unit->cpu_hard_affinity); } #endif /* __XEN_SCHED_IF_H__ */ diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index ee316cddd7..13c99a9194 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -175,9 +175,6 @@ struct vcpu } runstate_guest; /* guest address */ #endif - /* last time when vCPU is scheduled out */ - uint64_t last_run_time; - /* Has the FPU been initialised? */ bool fpu_initialised; /* Has the FPU been used since it was last saved? */ @@ -203,8 +200,6 @@ struct vcpu bool defer_shutdown; /* VCPU is paused following shutdown request (d->is_shutting_down)? */ bool paused_for_shutdown; - /* VCPU need affinity restored */ - bool affinity_broken; /* A hypercall has been preempted. */ bool hcall_preempted; @@ -213,9 +208,6 @@ struct vcpu bool hcall_compat; #endif - /* Does soft affinity actually play a role (given hard affinity)? */ - bool soft_aff_effective; - /* The CPU, if any, which is holding onto this VCPU's state. */ #define VCPU_CPU_CLEAN (~0u) unsigned int dirty_cpu; @@ -247,16 +239,6 @@ struct vcpu evtchn_port_t virq_to_evtchn[NR_VIRQS]; spinlock_t virq_lock; - /* Bitmask of CPUs on which this VCPU may run. */ - cpumask_var_t cpu_hard_affinity; - /* Used to change affinity temporarily. */ - cpumask_var_t cpu_hard_affinity_tmp; - /* Used to restore affinity across S3. 
 */
-    cpumask_var_t cpu_hard_affinity_saved;
-
-    /* Bitmask of CPUs on which this VCPU prefers to run. */
-    cpumask_var_t cpu_soft_affinity;
-
     /* Tasklet for continue_hypercall_on_cpu(). */
     struct tasklet continue_hypercall_tasklet;

@@ -283,6 +265,22 @@ struct sched_unit {
     void *priv;      /* scheduler private data */
     struct sched_unit *next_in_list;
     struct sched_resource *res;
+
+    /* Last time when unit has been scheduled out. */
+    uint64_t last_run_time;
+
+    /* Item needs affinity restored. */
+    bool affinity_broken;
+    /* Does soft affinity actually play a role (given hard affinity)? */
+    bool soft_aff_effective;
+    /* Bitmask of CPUs on which this VCPU may run. */
+    cpumask_var_t cpu_hard_affinity;
+    /* Used to change affinity temporarily. */
+    cpumask_var_t cpu_hard_affinity_tmp;
+    /* Used to restore affinity across S3. */
+    cpumask_var_t cpu_hard_affinity_saved;
+    /* Bitmask of CPUs on which this VCPU prefers to run. */
+    cpumask_var_t cpu_soft_affinity;
 };

 #define for_each_sched_unit(d, e) \

@@ -980,7 +978,7 @@ static inline bool is_hvm_vcpu(const struct vcpu *v)
 static inline bool is_hwdom_pinned_vcpu(const struct vcpu *v)
 {
     return (is_hardware_domain(v->domain) &&
-            cpumask_weight(v->cpu_hard_affinity) == 1);
+            cpumask_weight(v->sched_unit->cpu_hard_affinity) == 1);
 }

 #ifdef CONFIG_HAS_PASSTHROUGH
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
@ 2019-05-28 10:32 ` Juergen Gross
  2019-06-13  7:18 ` Andrii Anisov
  0 siblings, 1 reply; 21+ messages in thread
From: Juergen Gross @ 2019-05-28 10:32 UTC (permalink / raw)
To: xen-devel
Cc: Juergen Gross, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
    George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall,
    Meng Xu, Jan Beulich, Dario Faggioli, Roger Pau Monné

Affinities are scheduler specific attributes, so they should be per
scheduling unit. Move all affinity related fields in struct vcpu to
struct sched_unit. While at it, switch the affinity related functions in
sched-if.h to take a pointer to sched_unit instead of to vcpu as
parameter.

vcpu->last_run_time is primarily used by sched_credit, so move it to
struct sched_unit, too.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/pv/emul-priv-op.c |   1 +
 xen/arch/x86/pv/traps.c        |   5 +-
 xen/arch/x86/traps.c           |   9 ++--
 xen/common/domain.c            |  19 ++-----
 xen/common/domctl.c            |  13 +++--
 xen/common/keyhandler.c        |   4 +-
 xen/common/sched_credit.c      |  20 ++++----
 xen/common/sched_credit2.c     |  42 ++++++++--------
 xen/common/sched_null.c        |  16 +++---
 xen/common/sched_rt.c          |   9 ++--
 xen/common/schedule.c          | 110 ++++++++++++++++++++++++-----------------
 xen/common/wait.c              |   4 +-
 xen/include/xen/sched-if.h     |  17 ++++---
 xen/include/xen/sched.h        |  36 +++++++-------
 14 files changed, 163 insertions(+), 142 deletions(-)

diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index b20d79c7a3..a7a24a2053 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -23,6 +23,7 @@
 #include <xen/event.h>
 #include <xen/guest_access.h>
 #include <xen/iocap.h>
+#include <xen/sched.h>
 #include <xen/spinlock.h>
 #include <xen/trace.h>

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 1740784ff2..419abc3d95 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -22,6 +22,7 @@
#include <xen/event.h> #include <xen/hypercall.h> #include <xen/lib.h> +#include <xen/sched.h> #include <xen/trace.h> #include <xen/softirq.h> @@ -155,8 +156,8 @@ static void nmi_mce_softirq(void) * Set the tmp value unconditionally, so that the check in the iret * hypercall works. */ - cpumask_copy(st->vcpu->cpu_hard_affinity_tmp, - st->vcpu->cpu_hard_affinity); + cpumask_copy(st->vcpu->sched_unit->cpu_hard_affinity_tmp, + st->vcpu->sched_unit->cpu_hard_affinity); if ( (cpu != st->processor) || (st->processor != st->vcpu->processor) ) diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 05ddc39bfe..c3b39d6296 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -1594,16 +1594,17 @@ static void pci_serr_softirq(void) void async_exception_cleanup(struct vcpu *curr) { int trap; + struct sched_unit *unit = curr->sched_unit; if ( !curr->async_exception_mask ) return; /* Restore affinity. */ - if ( !cpumask_empty(curr->cpu_hard_affinity_tmp) && - !cpumask_equal(curr->cpu_hard_affinity_tmp, curr->cpu_hard_affinity) ) + if ( !cpumask_empty(unit->cpu_hard_affinity_tmp) && + !cpumask_equal(unit->cpu_hard_affinity_tmp, unit->cpu_hard_affinity) ) { - vcpu_set_hard_affinity(curr, curr->cpu_hard_affinity_tmp); - cpumask_clear(curr->cpu_hard_affinity_tmp); + vcpu_set_hard_affinity(curr, unit->cpu_hard_affinity_tmp); + cpumask_clear(unit->cpu_hard_affinity_tmp); } if ( !(curr->async_exception_mask & (curr->async_exception_mask - 1)) ) diff --git a/xen/common/domain.c b/xen/common/domain.c index 90c66079f9..c200e9024f 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -125,11 +125,6 @@ static void vcpu_info_reset(struct vcpu *v) static void vcpu_destroy(struct vcpu *v) { - free_cpumask_var(v->cpu_hard_affinity); - free_cpumask_var(v->cpu_hard_affinity_tmp); - free_cpumask_var(v->cpu_hard_affinity_saved); - free_cpumask_var(v->cpu_soft_affinity); - free_vcpu_struct(v); } @@ -153,12 +148,6 @@ struct vcpu *vcpu_create( grant_table_init_vcpu(v); - if ( 
!zalloc_cpumask_var(&v->cpu_hard_affinity) || - !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) || - !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) || - !zalloc_cpumask_var(&v->cpu_soft_affinity) ) - goto fail; - if ( is_idle_domain(d) ) { v->runstate.state = RUNSTATE_running; @@ -198,7 +187,6 @@ struct vcpu *vcpu_create( sched_destroy_vcpu(v); fail_wq: destroy_waitqueue_vcpu(v); - fail: vcpu_destroy(v); return NULL; @@ -557,9 +545,10 @@ void domain_update_node_affinity(struct domain *d) */ for_each_vcpu ( d, v ) { - cpumask_or(dom_cpumask, dom_cpumask, v->cpu_hard_affinity); + cpumask_or(dom_cpumask, dom_cpumask, + v->sched_unit->cpu_hard_affinity); cpumask_or(dom_cpumask_soft, dom_cpumask_soft, - v->cpu_soft_affinity); + v->sched_unit->cpu_soft_affinity); } /* Filter out non-online cpus */ cpumask_and(dom_cpumask, dom_cpumask, online); @@ -1226,7 +1215,7 @@ int vcpu_reset(struct vcpu *v) v->async_exception_mask = 0; memset(v->async_exception_state, 0, sizeof(v->async_exception_state)); #endif - cpumask_clear(v->cpu_hard_affinity_tmp); + cpumask_clear(v->sched_unit->cpu_hard_affinity_tmp); clear_bit(_VPF_blocked, &v->pause_flags); clear_bit(_VPF_in_reset, &v->pause_flags); diff --git a/xen/common/domctl.c b/xen/common/domctl.c index bade9a63b1..bc986d131d 100644 --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -614,6 +614,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) case XEN_DOMCTL_getvcpuaffinity: { struct vcpu *v; + struct sched_unit *unit; struct xen_domctl_vcpuaffinity *vcpuaff = &op->u.vcpuaffinity; ret = -EINVAL; @@ -624,6 +625,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) if ( (v = d->vcpu[vcpuaff->vcpu]) == NULL ) break; + unit = v->sched_unit; ret = -EINVAL; if ( vcpuaffinity_params_invalid(vcpuaff) ) break; @@ -643,7 +645,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) ret = -ENOMEM; break; } - cpumask_copy(old_affinity, v->cpu_hard_affinity); + cpumask_copy(old_affinity, 
unit->cpu_hard_affinity); if ( !alloc_cpumask_var(&new_affinity) ) { @@ -676,7 +678,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) * For hard affinity, what we return is the intersection of * cpupool's online mask and the new hard affinity. */ - cpumask_and(new_affinity, online, v->cpu_hard_affinity); + cpumask_and(new_affinity, online, unit->cpu_hard_affinity); ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_hard, new_affinity); } @@ -705,7 +707,8 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) * hard affinity. */ cpumask_and(new_affinity, new_affinity, online); - cpumask_and(new_affinity, new_affinity, v->cpu_hard_affinity); + cpumask_and(new_affinity, new_affinity, + unit->cpu_hard_affinity); ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_soft, new_affinity); } @@ -718,10 +721,10 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) { if ( vcpuaff->flags & XEN_VCPUAFFINITY_HARD ) ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_hard, - v->cpu_hard_affinity); + unit->cpu_hard_affinity); if ( vcpuaff->flags & XEN_VCPUAFFINITY_SOFT ) ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_soft, - v->cpu_soft_affinity); + unit->cpu_soft_affinity); } break; } diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c index 4f4a660b0c..1729f73af0 100644 --- a/xen/common/keyhandler.c +++ b/xen/common/keyhandler.c @@ -312,8 +312,8 @@ static void dump_domains(unsigned char key) printk("dirty_cpu=%u", v->dirty_cpu); printk("\n"); printk(" cpu_hard_affinity={%*pbl} cpu_soft_affinity={%*pbl}\n", - nr_cpu_ids, cpumask_bits(v->cpu_hard_affinity), - nr_cpu_ids, cpumask_bits(v->cpu_soft_affinity)); + nr_cpu_ids, cpumask_bits(v->sched_unit->cpu_hard_affinity), + nr_cpu_ids, cpumask_bits(v->sched_unit->cpu_soft_affinity)); printk(" pause_count=%d pause_flags=%lx\n", atomic_read(&v->pause_count), v->pause_flags); arch_dump_vcpu_info(v); diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c index 74c85df334..ffac2f4bbb 100644 
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -350,6 +350,7 @@ DEFINE_PER_CPU(unsigned int, last_tickle_cpu);
 static inline void __runq_tickle(struct csched_unit *new)
 {
     unsigned int cpu = new->vcpu->processor;
+    struct sched_unit *unit = new->vcpu->sched_unit;
     struct csched_unit * const cur = CSCHED_UNIT(curr_on_cpu(cpu));
     struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
     cpumask_t mask, idle_mask, *online;
@@ -375,7 +376,7 @@ static inline void __runq_tickle(struct csched_unit *new)
     if ( unlikely(test_bit(CSCHED_FLAG_VCPU_PINNED, &new->flags) &&
                   cpumask_test_cpu(cpu, &idle_mask)) )
     {
-        ASSERT(cpumask_cycle(cpu, new->vcpu->cpu_hard_affinity) == cpu);
+        ASSERT(cpumask_cycle(cpu, unit->cpu_hard_affinity) == cpu);
         SCHED_STAT_CRANK(tickled_idle_cpu_excl);
         __cpumask_set_cpu(cpu, &mask);
         goto tickle;
@@ -410,11 +411,11 @@ static inline void __runq_tickle(struct csched_unit *new)
             int new_idlers_empty;

             if ( balance_step == BALANCE_SOFT_AFFINITY
-                 && !has_soft_affinity(new->vcpu) )
+                 && !has_soft_affinity(unit) )
                 continue;

             /* Are there idlers suitable for new (for this balance step)?
*/ - affinity_balance_cpumask(new->vcpu, balance_step, + affinity_balance_cpumask(unit, balance_step, cpumask_scratch_cpu(cpu)); cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), &idle_mask); @@ -443,8 +444,7 @@ static inline void __runq_tickle(struct csched_unit *new) */ if ( new_idlers_empty && new->pri > cur->pri ) { - if ( cpumask_intersects(cur->vcpu->cpu_hard_affinity, - &idle_mask) ) + if ( cpumask_intersects(unit->cpu_hard_affinity, &idle_mask) ) { SCHED_VCPU_STAT_CRANK(cur, kicked_away); SCHED_VCPU_STAT_CRANK(cur, migrate_r); @@ -695,7 +695,7 @@ static inline bool __csched_vcpu_is_cache_hot(const struct csched_private *prv, struct vcpu *v) { bool hot = prv->vcpu_migr_delay && - (NOW() - v->last_run_time) < prv->vcpu_migr_delay; + (NOW() - v->sched_unit->last_run_time) < prv->vcpu_migr_delay; if ( hot ) SCHED_STAT_CRANK(vcpu_hot); @@ -733,7 +733,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit) for_each_affinity_balance_step( balance_step ) { - affinity_balance_cpumask(vc, balance_step, cpus); + affinity_balance_cpumask(vc->sched_unit, balance_step, cpus); cpumask_and(cpus, online, cpus); /* * We want to pick up a pcpu among the ones that are online and @@ -752,7 +752,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit) * balancing step all together. */ if ( balance_step == BALANCE_SOFT_AFFINITY && - (!has_soft_affinity(vc) || cpumask_empty(cpus)) ) + (!has_soft_affinity(vc->sched_unit) || cpumask_empty(cpus)) ) continue; /* If present, prefer vc's current processor */ @@ -1652,10 +1652,10 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step) * or counter. 
 */
         if ( vc->is_running ||
              (balance_step == BALANCE_SOFT_AFFINITY &&
-              !has_soft_affinity(vc)) )
+              !has_soft_affinity(vc->sched_unit)) )
             continue;

-        affinity_balance_cpumask(vc, balance_step, cpumask_scratch);
+        affinity_balance_cpumask(vc->sched_unit, balance_step, cpumask_scratch);
         if ( __csched_vcpu_is_migrateable(prv, vc, cpu, cpumask_scratch) )
         {
             /* We got a candidate. Grab it! */
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 562e73d99e..dabd5636f5 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -699,10 +699,10 @@ static int get_fallback_cpu(struct csched2_unit *svc)
     {
         int cpu = v->processor;

-        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(v) )
+        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(v->sched_unit) )
            continue;

-        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
+        affinity_balance_cpumask(v->sched_unit, bs, cpumask_scratch_cpu(cpu));
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                     cpupool_domain_cpumask(v->domain));
@@ -1390,10 +1390,10 @@ static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
      */
     if ( score > 0 )
     {
-        if ( cpumask_test_cpu(cpu, new->vcpu->cpu_soft_affinity) )
+        if ( cpumask_test_cpu(cpu, new->vcpu->sched_unit->cpu_soft_affinity) )
             score += CSCHED2_CREDIT_INIT;

-        if ( !cpumask_test_cpu(cpu, cur->vcpu->cpu_soft_affinity) )
+        if ( !cpumask_test_cpu(cpu, cur->vcpu->sched_unit->cpu_soft_affinity) )
             score += CSCHED2_CREDIT_INIT;
     }
@@ -1436,6 +1436,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
 {
     int i, ipid = -1;
     s_time_t max = 0;
+    struct sched_unit *unit = new->vcpu->sched_unit;
     unsigned int bs, cpu = new->vcpu->processor;
     struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
     cpumask_t *online = cpupool_domain_cpumask(new->vcpu->domain);
@@ -1473,7 +1474,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
          cpumask_test_cpu(cpu, &rqd->idle) &&
          !cpumask_test_cpu(cpu, &rqd->tickled)) )
     {
-        ASSERT(cpumask_cycle(cpu, new->vcpu->cpu_hard_affinity) == cpu);
+        ASSERT(cpumask_cycle(cpu, unit->cpu_hard_affinity) == cpu);
         SCHED_STAT_CRANK(tickled_idle_cpu_excl);
         ipid = cpu;
         goto tickle;
@@ -1482,10 +1483,10 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
     for_each_affinity_balance_step( bs )
     {
         /* Just skip first step, if we don't have a soft affinity */
-        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(new->vcpu) )
+        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(unit) )
             continue;

-        affinity_balance_cpumask(new->vcpu, bs, cpumask_scratch_cpu(cpu));
+        affinity_balance_cpumask(unit, bs, cpumask_scratch_cpu(cpu));

         /*
          * First of all, consider idle cpus, checking if we can just
@@ -1557,7 +1558,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
             ipid = cpu;

             /* If this is in new's soft affinity, just take it */
-            if ( cpumask_test_cpu(cpu, new->vcpu->cpu_soft_affinity) )
+            if ( cpumask_test_cpu(cpu, unit->cpu_soft_affinity) )
             {
                 SCHED_STAT_CRANK(tickled_busy_cpu);
                 goto tickle;
@@ -2243,7 +2244,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
         goto out;
     }

-    cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
+    cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                 cpupool_domain_cpumask(vc->domain));

     /*
@@ -2288,7 +2289,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
      *
      * Find both runqueues in one pass.
 */
-    has_soft = has_soft_affinity(vc);
+    has_soft = has_soft_affinity(unit);
     for_each_cpu(i, &prv->active_queues)
     {
         struct csched2_runqueue_data *rqd;
@@ -2335,7 +2336,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
             cpumask_t mask;

             cpumask_and(&mask, cpumask_scratch_cpu(cpu), &rqd->active);
-            if ( cpumask_intersects(&mask, svc->vcpu->cpu_soft_affinity) )
+            if ( cpumask_intersects(&mask, unit->cpu_soft_affinity) )
             {
                 min_s_avgload = rqd_avgload;
                 min_s_rqi = i;
@@ -2357,9 +2358,9 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
          * Note that, to obtain the soft-affinity mask, we "just" put what we
          * have in cpumask_scratch in && with vc->cpu_soft_affinity. This is
          * ok because:
-         * - we know that vc->cpu_hard_affinity and vc->cpu_soft_affinity have
+         * - we know that unit->cpu_hard_affinity and ->cpu_soft_affinity have
          *   a non-empty intersection (because has_soft is true);
-         * - we have vc->cpu_hard_affinity & cpupool_domain_cpumask() already
+         * - we have unit->cpu_hard_affinity & cpupool_domain_cpumask() already
          *   in cpumask_scratch, we do save a lot doing like this.
          *
          * It's kind of like open coding affinity_balance_cpumask() but, in
@@ -2367,7 +2368,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit)
          * cpumask operations.
 */
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
-                    vc->cpu_soft_affinity);
+                    unit->cpu_soft_affinity);
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                     &prv->rqd[min_s_rqi].active);
     }
@@ -2475,6 +2476,7 @@ static void migrate(const struct scheduler *ops,
                     s_time_t now)
 {
     int cpu = svc->vcpu->processor;
+    struct sched_unit *unit = svc->vcpu->sched_unit;

     if ( unlikely(tb_init_done) )
     {
@@ -2512,7 +2514,7 @@ static void migrate(const struct scheduler *ops,
         }
         _runq_deassign(svc);

-        cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
+        cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                     cpupool_domain_cpumask(svc->vcpu->domain));
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                     &trqd->active);
@@ -2546,7 +2548,7 @@ static bool vcpu_is_migrateable(struct csched2_unit *svc,
     struct vcpu *v = svc->vcpu;
     int cpu = svc->vcpu->processor;

-    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+    cpumask_and(cpumask_scratch_cpu(cpu), v->sched_unit->cpu_hard_affinity,
                 cpupool_domain_cpumask(v->domain));

     return !(svc->flags & CSFLAG_runq_migrate_request) &&
@@ -2780,7 +2782,7 @@ csched2_unit_migrate(
     /* If here, new_cpu must be a valid Credit2 pCPU, and in our affinity.
 */
     ASSERT(cpumask_test_cpu(new_cpu, &csched2_priv(ops)->initialized));
-    ASSERT(cpumask_test_cpu(new_cpu, vc->cpu_hard_affinity));
+    ASSERT(cpumask_test_cpu(new_cpu, unit->cpu_hard_affinity));

     trqd = c2rqd(ops, new_cpu);
@@ -3320,9 +3322,9 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     }

     /* If scurr has a soft-affinity, let's check whether cpu is part of it */
-    if ( has_soft_affinity(scurr->vcpu) )
+    if ( has_soft_affinity(scurr->vcpu->sched_unit) )
     {
-        affinity_balance_cpumask(scurr->vcpu, BALANCE_SOFT_AFFINITY,
+        affinity_balance_cpumask(scurr->vcpu->sched_unit, BALANCE_SOFT_AFFINITY,
                                  cpumask_scratch);
         if ( unlikely(!cpumask_test_cpu(cpu, cpumask_scratch)) )
         {
@@ -3377,7 +3379,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
         }

         /* Only consider vcpus that are allowed to run on this processor. */
-        if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
+        if ( !cpumask_test_cpu(cpu, svc->vcpu->sched_unit->cpu_hard_affinity) )
         {
             (*skipped)++;
             continue;
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index ee3a8cf064..56b0055a42 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -123,7 +123,8 @@ static inline struct null_unit *null_unit(const struct sched_unit *unit)
 static inline bool vcpu_check_affinity(struct vcpu *v, unsigned int cpu,
                                        unsigned int balance_step)
 {
-    affinity_balance_cpumask(v, balance_step, cpumask_scratch_cpu(cpu));
+    affinity_balance_cpumask(v->sched_unit, balance_step,
+                             cpumask_scratch_cpu(cpu));
     cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                 cpupool_domain_cpumask(v->domain));
@@ -281,10 +282,10 @@ pick_res(struct null_private *prv, struct sched_unit *unit)

     for_each_affinity_balance_step( bs )
     {
-        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(v) )
+        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(unit) )
            continue;

-        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
+        affinity_balance_cpumask(unit, bs, cpumask_scratch_cpu(cpu));
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), cpus);

         /*
@@ -321,7 +322,7 @@ pick_res(struct null_private *prv, struct sched_unit *unit)
      * as we will actually assign the vCPU to the pCPU we return from here,
      * only if the pCPU is free.
      */
-    cpumask_and(cpumask_scratch_cpu(cpu), cpus, v->cpu_hard_affinity);
+    cpumask_and(cpumask_scratch_cpu(cpu), cpus, unit->cpu_hard_affinity);
     new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));

 out:
@@ -430,7 +431,7 @@ static void null_unit_insert(const struct scheduler *ops,

     lock = unit_schedule_lock(unit);

-    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+    cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                 cpupool_domain_cpumask(v->domain));

     /* If the pCPU is free, we assign v to it */
@@ -488,7 +489,8 @@ static void _vcpu_remove(struct null_private *prv, struct vcpu *v)
     {
         list_for_each_entry( wvc, &prv->waitq, waitq_elem )
         {
-            if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(wvc->vcpu) )
+            if ( bs == BALANCE_SOFT_AFFINITY &&
+                 !has_soft_affinity(wvc->vcpu->sched_unit) )
                 continue;

             if ( vcpu_check_affinity(wvc->vcpu, cpu, bs) )
@@ -767,7 +769,7 @@ static struct task_slice null_schedule(const struct scheduler *ops,
             list_for_each_entry( wvc, &prv->waitq, waitq_elem )
             {
                 if ( bs == BALANCE_SOFT_AFFINITY &&
-                     !has_soft_affinity(wvc->vcpu) )
+                     !has_soft_affinity(wvc->vcpu->sched_unit) )
                     continue;

                 if ( vcpu_check_affinity(wvc->vcpu, cpu, bs) )
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index cd737131a3..d640d87b43 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -327,7 +327,7 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_unit *svc)
     mask = cpumask_scratch_cpu(svc->vcpu->processor);

     cpupool_mask = cpupool_domain_cpumask(svc->vcpu->domain);
-    cpumask_and(mask, cpupool_mask, svc->vcpu->cpu_hard_affinity);
+    cpumask_and(mask, cpupool_mask, svc->vcpu->sched_unit->cpu_hard_affinity);
     printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime"),"
            " cur_b=%"PRI_stime" cur_d=%"PRI_stime" last_start=%"PRI_stime"\n"
            " \t\t priority_level=%d has_extratime=%d\n"
@@ -645,7 +645,7 @@ rt_res_pick(const struct scheduler *ops, struct sched_unit *unit)
     int cpu;

     online = cpupool_domain_cpumask(vc->domain);
-    cpumask_and(&cpus, online, vc->cpu_hard_affinity);
+    cpumask_and(&cpus, online, unit->cpu_hard_affinity);

     cpu = cpumask_test_cpu(vc->processor, &cpus)
             ? vc->processor
@@ -1023,7 +1023,8 @@ runq_pick(const struct scheduler *ops, const cpumask_t *mask)
         /* mask cpu_hard_affinity & cpupool & mask */
         online = cpupool_domain_cpumask(iter_svc->vcpu->domain);
-        cpumask_and(&cpu_common, online, iter_svc->vcpu->cpu_hard_affinity);
+        cpumask_and(&cpu_common, online,
+                    iter_svc->vcpu->sched_unit->cpu_hard_affinity);
         cpumask_and(&cpu_common, mask, &cpu_common);
         if ( cpumask_empty(&cpu_common) )
             continue;
@@ -1192,7 +1193,7 @@ runq_tickle(const struct scheduler *ops, struct rt_unit *new)
         return;

     online = cpupool_domain_cpumask(new->vcpu->domain);
-    cpumask_and(&not_tickled, online, new->vcpu->cpu_hard_affinity);
+    cpumask_and(&not_tickled, online, new->vcpu->sched_unit->cpu_hard_affinity);
     cpumask_andnot(&not_tickled, &not_tickled, &prv->tickled);

     /*
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 1321c86111..212c1e637f 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -270,6 +270,12 @@ static void sched_free_unit(struct sched_unit *unit)
     }

     unit->vcpu->sched_unit = NULL;
+
+    free_cpumask_var(unit->cpu_hard_affinity);
+    free_cpumask_var(unit->cpu_hard_affinity_tmp);
+    free_cpumask_var(unit->cpu_hard_affinity_saved);
+    free_cpumask_var(unit->cpu_soft_affinity);
+
     xfree(unit);
 }
@@ -293,7 +299,17 @@ static struct sched_unit *sched_alloc_unit(struct vcpu *v)
     unit->next_in_list = *prev_unit;
     *prev_unit = unit;

+    if ( !zalloc_cpumask_var(&unit->cpu_hard_affinity) ||
+         !zalloc_cpumask_var(&unit->cpu_hard_affinity_tmp) ||
+         !zalloc_cpumask_var(&unit->cpu_hard_affinity_saved) ||
+         !zalloc_cpumask_var(&unit->cpu_soft_affinity) )
+        goto fail;
+
     return unit;
+
+ fail:
+    sched_free_unit(unit);
+    return NULL;
 }

 int sched_init_vcpu(struct vcpu *v, unsigned int processor)
@@ -363,7 +379,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)

     for_each_vcpu ( d, v )
     {
-        if ( v->affinity_broken )
+        if ( v->sched_unit->affinity_broken )
             return -EBUSY;
     }
@@ -682,7 +698,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
         */
        if ( pick_called &&
             (new_lock == get_sched_res(new_cpu)->schedule_lock) &&
-            cpumask_test_cpu(new_cpu, v->cpu_hard_affinity) &&
+            cpumask_test_cpu(new_cpu, v->sched_unit->cpu_hard_affinity) &&
             cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
            break;
@@ -758,6 +774,7 @@ void restore_vcpu_affinity(struct domain *d)
    {
        spinlock_t *lock;
        unsigned int old_cpu = v->processor;
+       struct sched_unit *unit = v->sched_unit;

        ASSERT(!vcpu_runnable(v));
@@ -769,15 +786,15 @@ void restore_vcpu_affinity(struct domain *d)
         * set v->processor of each of their vCPUs to something that will
         * make sense for the scheduler of the cpupool in which they are in.
 */
-        cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+        cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                     cpupool_domain_cpumask(d));
         if ( cpumask_empty(cpumask_scratch_cpu(cpu)) )
         {
-            if ( v->affinity_broken )
+            if ( unit->affinity_broken )
             {
-                sched_set_affinity(v, v->cpu_hard_affinity_saved, NULL);
-                v->affinity_broken = 0;
-                cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                sched_set_affinity(v, unit->cpu_hard_affinity_saved, NULL);
+                unit->affinity_broken = 0;
+                cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                             cpupool_domain_cpumask(d));
             }
@@ -785,18 +802,17 @@ void restore_vcpu_affinity(struct domain *d)
             {
                 printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
                 sched_set_affinity(v, &cpumask_all, NULL);
-                cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
                             cpupool_domain_cpumask(d));
             }
         }

         v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
-        v->sched_unit->res = get_sched_res(v->processor);
+        unit->res = get_sched_res(v->processor);

-        lock = unit_schedule_lock_irq(v->sched_unit);
-        v->sched_unit->res = sched_pick_resource(vcpu_scheduler(v),
-                                                 v->sched_unit);
-        v->processor = v->sched_unit->res->processor;
+        lock = unit_schedule_lock_irq(unit);
+        unit->res = sched_pick_resource(vcpu_scheduler(v), unit);
+        v->processor = unit->res->processor;
         spin_unlock_irq(lock);

         if ( old_cpu != v->processor )
@@ -828,16 +844,17 @@ int cpu_disable_scheduler(unsigned int cpu)
         for_each_vcpu ( d, v )
         {
             unsigned long flags;
-            spinlock_t *lock = unit_schedule_lock_irqsave(v->sched_unit, &flags);
+            struct sched_unit *unit = v->sched_unit;
+            spinlock_t *lock = unit_schedule_lock_irqsave(unit, &flags);

-            cpumask_and(&online_affinity, v->cpu_hard_affinity, c->cpu_valid);
+            cpumask_and(&online_affinity, unit->cpu_hard_affinity, c->cpu_valid);
             if ( cpumask_empty(&online_affinity) &&
-                 cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
+                 cpumask_test_cpu(cpu, unit->cpu_hard_affinity) )
             {
-                if ( v->affinity_broken )
+                if ( unit->affinity_broken )
                 {
                     /* The vcpu is temporarily pinned, can't move it. */
-                    unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit);
+                    unit_schedule_unlock_irqrestore(lock, flags, unit);
                     ret = -EADDRINUSE;
                     break;
                 }
@@ -850,7 +867,7 @@ int cpu_disable_scheduler(unsigned int cpu)
             if ( v->processor != cpu )
             {
                 /* The vcpu is not on this cpu, so we can move on. */
-                unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit);
+                unit_schedule_unlock_irqrestore(lock, flags, unit);
                 continue;
             }
@@ -863,7 +880,7 @@ int cpu_disable_scheduler(unsigned int cpu)
              * things would have failed before getting in here.
              */
             vcpu_migrate_start(v);
-            unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit);
+            unit_schedule_unlock_irqrestore(lock, flags, unit);

             vcpu_migrate_finish(v);
@@ -892,7 +909,7 @@ static int cpu_disable_scheduler_check(unsigned int cpu)

     for_each_domain_in_cpupool ( d, c )
         for_each_vcpu ( d, v )
-            if ( v->affinity_broken )
+            if ( v->sched_unit->affinity_broken )
                 return -EADDRINUSE;

     return 0;
@@ -908,28 +925,31 @@ static int cpu_disable_scheduler_check(unsigned int cpu)
 void sched_set_affinity(
     struct vcpu *v, const cpumask_t *hard, const cpumask_t *soft)
 {
-    sched_adjust_affinity(dom_scheduler(v->domain), v->sched_unit, hard, soft);
+    struct sched_unit *unit = v->sched_unit;
+
+    sched_adjust_affinity(dom_scheduler(v->domain), unit, hard, soft);

     if ( hard )
-        cpumask_copy(v->cpu_hard_affinity, hard);
+        cpumask_copy(unit->cpu_hard_affinity, hard);
     if ( soft )
-        cpumask_copy(v->cpu_soft_affinity, soft);
+        cpumask_copy(unit->cpu_soft_affinity, soft);

-    v->soft_aff_effective = !cpumask_subset(v->cpu_hard_affinity,
-                                            v->cpu_soft_affinity) &&
-                            cpumask_intersects(v->cpu_soft_affinity,
-                                               v->cpu_hard_affinity);
+    unit->soft_aff_effective = !cpumask_subset(unit->cpu_hard_affinity,
+                                               unit->cpu_soft_affinity) &&
+                               cpumask_intersects(unit->cpu_soft_affinity,
+                                                  unit->cpu_hard_affinity);
 }

 static int
 vcpu_set_affinity(
     struct vcpu *v, const cpumask_t *affinity, const cpumask_t *which)
 {
+    struct sched_unit *unit = v->sched_unit;
     spinlock_t *lock;
     int ret = 0;

-    lock = unit_schedule_lock_irq(v->sched_unit);
+    lock = unit_schedule_lock_irq(unit);

-    if ( v->affinity_broken )
+    if ( unit->affinity_broken )
         ret = -EBUSY;
     else
     {
@@ -937,19 +957,19 @@ static int vcpu_set_affinity(
          * Tell the scheduler we changes something about affinity,
          * and ask to re-evaluate vcpu placement.
          */
-        if ( which == v->cpu_hard_affinity )
+        if ( which == unit->cpu_hard_affinity )
         {
             sched_set_affinity(v, affinity, NULL);
         }
         else
         {
-            ASSERT(which == v->cpu_soft_affinity);
+            ASSERT(which == unit->cpu_soft_affinity);
             sched_set_affinity(v, NULL, affinity);
         }
         vcpu_migrate_start(v);
     }

-    unit_schedule_unlock_irq(lock, v->sched_unit);
+    unit_schedule_unlock_irq(lock, unit);

     domain_update_node_affinity(v->domain);
@@ -968,12 +988,12 @@ int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity)
     if ( cpumask_empty(&online_affinity) )
         return -EINVAL;

-    return vcpu_set_affinity(v, affinity, v->cpu_hard_affinity);
+    return vcpu_set_affinity(v, affinity, v->sched_unit->cpu_hard_affinity);
 }

 int vcpu_set_soft_affinity(struct vcpu *v, const cpumask_t *affinity)
 {
-    return vcpu_set_affinity(v, affinity, v->cpu_soft_affinity);
+    return vcpu_set_affinity(v, affinity, v->sched_unit->cpu_soft_affinity);
 }

 /* Block the currently-executing domain until a pertinent event occurs.
 */
@@ -1167,28 +1187,30 @@ void watchdog_domain_destroy(struct domain *d)

 int vcpu_pin_override(struct vcpu *v, int cpu)
 {
+    struct sched_unit *unit = v->sched_unit;
     spinlock_t *lock;
     int ret = -EINVAL;

-    lock = unit_schedule_lock_irq(v->sched_unit);
+    lock = unit_schedule_lock_irq(unit);

     if ( cpu < 0 )
     {
-        if ( v->affinity_broken )
+        if ( unit->affinity_broken )
         {
-            sched_set_affinity(v, v->cpu_hard_affinity_saved, NULL);
-            v->affinity_broken = 0;
+            sched_set_affinity(v, unit->cpu_hard_affinity_saved, NULL);
+            unit->affinity_broken = 0;
             ret = 0;
         }
     }
     else if ( cpu < nr_cpu_ids )
     {
-        if ( v->affinity_broken )
+        if ( unit->affinity_broken )
             ret = -EBUSY;
         else if ( cpumask_test_cpu(cpu, VCPU2ONLINE(v)) )
         {
-            cpumask_copy(v->cpu_hard_affinity_saved, v->cpu_hard_affinity);
-            v->affinity_broken = 1;
+            cpumask_copy(unit->cpu_hard_affinity_saved,
+                         unit->cpu_hard_affinity);
+            unit->affinity_broken = 1;
             sched_set_affinity(v, cpumask_of(cpu), NULL);
             ret = 0;
         }
@@ -1197,7 +1219,7 @@ int vcpu_pin_override(struct vcpu *v, int cpu)
     if ( ret == 0 )
         vcpu_migrate_start(v);

-    unit_schedule_unlock_irq(lock, v->sched_unit);
+    unit_schedule_unlock_irq(lock, unit);

     domain_update_node_affinity(v->domain);
@@ -1549,7 +1571,7 @@ static void schedule(void)
                  ((prev->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
                   (vcpu_runnable(prev) ? RUNSTATE_runnable : RUNSTATE_offline)),
                  now);
-    prev->last_run_time = now;
+    prev->sched_unit->last_run_time = now;

     ASSERT(next->runstate.state != RUNSTATE_running);
     vcpu_runstate_change(next, RUNSTATE_running, now);
diff --git a/xen/common/wait.c b/xen/common/wait.c
index 4f830a14e8..37e9e0d016 100644
--- a/xen/common/wait.c
+++ b/xen/common/wait.c
@@ -132,7 +132,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv)

     /* Save current VCPU affinity; force wakeup on *this* CPU only.
 */
     wqv->wakeup_cpu = smp_processor_id();
-    cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
+    cpumask_copy(&wqv->saved_affinity, curr->sched_unit->cpu_hard_affinity);
     if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
     {
         gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
@@ -199,7 +199,7 @@ void check_wakeup_from_wait(void)
     {
         /* Re-set VCPU affinity and re-enter the scheduler. */
         struct vcpu *curr = current;
-        cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
+        cpumask_copy(&wqv->saved_affinity, curr->sched_unit->cpu_hard_affinity);
         if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
         {
             gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 20e36ea39b..17c01abc25 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
  *  * The hard affinity is not a subset of soft affinity
  *  * There is an overlap between the soft and hard affinity masks
  */
-static inline int has_soft_affinity(const struct vcpu *v)
+static inline int has_soft_affinity(const struct sched_unit *unit)
 {
-    return v->soft_aff_effective &&
-           !cpumask_subset(cpupool_domain_cpumask(v->domain),
-                           v->cpu_soft_affinity);
+    return unit->soft_aff_effective &&
+           !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain),
+                           unit->cpu_soft_affinity);
 }

 /*
@@ -452,17 +452,18 @@ static inline int has_soft_affinity(const struct vcpu *v)
  * to avoid running a vcpu where it would like, but is not allowed to!
 */
 static inline void
-affinity_balance_cpumask(const struct vcpu *v, int step, cpumask_t *mask)
+affinity_balance_cpumask(const struct sched_unit *unit, int step,
+                         cpumask_t *mask)
 {
     if ( step == BALANCE_SOFT_AFFINITY )
     {
-        cpumask_and(mask, v->cpu_soft_affinity, v->cpu_hard_affinity);
+        cpumask_and(mask, unit->cpu_soft_affinity, unit->cpu_hard_affinity);

         if ( unlikely(cpumask_empty(mask)) )
-            cpumask_copy(mask, v->cpu_hard_affinity);
+            cpumask_copy(mask, unit->cpu_hard_affinity);
     }
     else /* step == BALANCE_HARD_AFFINITY */
-        cpumask_copy(mask, v->cpu_hard_affinity);
+        cpumask_copy(mask, unit->cpu_hard_affinity);
 }

 #endif /* __XEN_SCHED_IF_H__ */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ee316cddd7..13c99a9194 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -175,9 +175,6 @@ struct vcpu
     } runstate_guest; /* guest address */
 #endif

-    /* last time when vCPU is scheduled out */
-    uint64_t         last_run_time;
-
     /* Has the FPU been initialised? */
     bool             fpu_initialised;
     /* Has the FPU been used since it was last saved? */
@@ -203,8 +200,6 @@ struct vcpu
     bool             defer_shutdown;
     /* VCPU is paused following shutdown request (d->is_shutting_down)? */
     bool             paused_for_shutdown;
-    /* VCPU need affinity restored */
-    bool             affinity_broken;

     /* A hypercall has been preempted. */
     bool             hcall_preempted;
@@ -213,9 +208,6 @@ struct vcpu
     bool             hcall_compat;
 #endif

-    /* Does soft affinity actually play a role (given hard affinity)? */
-    bool             soft_aff_effective;
-
     /* The CPU, if any, which is holding onto this VCPU's state. */
 #define VCPU_CPU_CLEAN (~0u)
     unsigned int     dirty_cpu;
@@ -247,16 +239,6 @@ struct vcpu
     evtchn_port_t    virq_to_evtchn[NR_VIRQS];
     spinlock_t       virq_lock;

-    /* Bitmask of CPUs on which this VCPU may run. */
-    cpumask_var_t    cpu_hard_affinity;
-    /* Used to change affinity temporarily. */
-    cpumask_var_t    cpu_hard_affinity_tmp;
-    /* Used to restore affinity across S3.
 */
-    cpumask_var_t    cpu_hard_affinity_saved;
-
-    /* Bitmask of CPUs on which this VCPU prefers to run. */
-    cpumask_var_t    cpu_soft_affinity;
-
     /* Tasklet for continue_hypercall_on_cpu(). */
     struct tasklet   continue_hypercall_tasklet;
@@ -283,6 +265,22 @@ struct sched_unit {
     void                  *priv;      /* scheduler private data */
     struct sched_unit     *next_in_list;
     struct sched_resource *res;
+
+    /* Last time when unit has been scheduled out. */
+    uint64_t               last_run_time;
+
+    /* Item needs affinity restored. */
+    bool                   affinity_broken;
+    /* Does soft affinity actually play a role (given hard affinity)? */
+    bool                   soft_aff_effective;
+    /* Bitmask of CPUs on which this VCPU may run. */
+    cpumask_var_t          cpu_hard_affinity;
+    /* Used to change affinity temporarily. */
+    cpumask_var_t          cpu_hard_affinity_tmp;
+    /* Used to restore affinity across S3. */
+    cpumask_var_t          cpu_hard_affinity_saved;
+    /* Bitmask of CPUs on which this VCPU prefers to run. */
+    cpumask_var_t          cpu_soft_affinity;
 };

 #define for_each_sched_unit(d, e) \
@@ -980,7 +978,7 @@ static inline bool is_hvm_vcpu(const struct vcpu *v)
 static inline bool is_hwdom_pinned_vcpu(const struct vcpu *v)
 {
     return (is_hardware_domain(v->domain) &&
-            cpumask_weight(v->cpu_hard_affinity) == 1);
+            cpumask_weight(v->sched_unit->cpu_hard_affinity) == 1);
 }

 #ifdef CONFIG_HAS_PASSTHROUGH
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-05-28 10:32 ` Juergen Gross
@ 2019-06-13  7:18 ` Andrii Anisov
  2019-06-13  7:29   ` Juergen Gross
  0 siblings, 1 reply; 21+ messages in thread
From: Andrii Anisov @ 2019-06-13  7:18 UTC (permalink / raw)
To: Juergen Gross, xen-devel
Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall, Meng Xu,
	Jan Beulich, Dario Faggioli, Roger Pau Monné

Hello Juergen,

Please note that this patch will clash with [1].

On 28.05.19 13:32, Juergen Gross wrote:
> vcpu->last_run_time is primarily used by sched_credit, so move it to
> struct sched_unit, too.

`last_run_time` is moved to credit privates as of current staging.

[1] https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=608639ffa0a0d6f219e14ba7397ab2cc018b93c9

-- 
Sincerely,
Andrii Anisov.
* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-06-13  7:18 ` Andrii Anisov
@ 2019-06-13  7:29 ` Juergen Gross
  2019-06-13  7:34   ` Andrii Anisov
  0 siblings, 1 reply; 21+ messages in thread
From: Juergen Gross @ 2019-06-13  7:29 UTC (permalink / raw)
To: Andrii Anisov, xen-devel
Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall, Meng Xu,
	Jan Beulich, Dario Faggioli, Roger Pau Monné

Hi Andrii,

On 13.06.19 09:18, Andrii Anisov wrote:
> Hello Juergen,
>
> Please note that this patch will clash with [1].
>
> On 28.05.19 13:32, Juergen Gross wrote:
>> vcpu->last_run_time is primarily used by sched_credit, so move it to
>> struct sched_unit, too.
>
> `last_run_time` is moved to credit privates as for current staging.

Thanks for the heads up, but I've rebased already. :-)


Juergen
* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-06-13  7:29 ` Juergen Gross
@ 2019-06-13  7:34 ` Andrii Anisov
  2019-06-13  8:39   ` Juergen Gross
  0 siblings, 1 reply; 21+ messages in thread
From: Andrii Anisov @ 2019-06-13  7:34 UTC (permalink / raw)
To: Juergen Gross, xen-devel
Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall, Meng Xu,
	Jan Beulich, Dario Faggioli, Roger Pau Monné

On 13.06.19 10:29, Juergen Gross wrote:
> Thanks for the heads up, but I've rebased already. :-)

Oh, great. I'm just wondering if you have put it on your github already?
I'm playing with scheduling on my side, and I have a strong feeling I
should base my work on your series ;)

-- 
Sincerely,
Andrii Anisov.
* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-06-13  7:34 ` Andrii Anisov
@ 2019-06-13  8:39 ` Juergen Gross
  2019-06-13  8:49   ` Andrii Anisov
  0 siblings, 1 reply; 21+ messages in thread
From: Juergen Gross @ 2019-06-13  8:39 UTC (permalink / raw)
To: Andrii Anisov, xen-devel
Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall, Meng Xu,
	Jan Beulich, Dario Faggioli, Roger Pau Monné

On 13.06.19 09:34, Andrii Anisov wrote:
>
>
> On 13.06.19 10:29, Juergen Gross wrote:
>> Thanks for the heads up, but I've rebased already. :-)
>
> Oh, great. I'm just wondering if you put it already on your github?

github.com/jgross1/xen sched-v1-rebase

Only compile tested on x86 up to now, but rebase was rather easy.


Juergen
* Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit
  2019-06-13  8:39 ` Juergen Gross
@ 2019-06-13  8:49 ` Andrii Anisov
  0 siblings, 0 replies; 21+ messages in thread
From: Andrii Anisov @ 2019-06-13 8:49 UTC (permalink / raw)
To: Juergen Gross
Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
    Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall, Meng Xu,
    Jan Beulich, xen-devel, Dario Faggioli, Roger Pau Monné

On Thu, 13 Jun 2019 at 11:39, Juergen Gross <jgross@suse.com> wrote:
> github.com/jgross1/xen sched-v1-rebase
>
> Only compile tested on x86 up to now, but rebase was rather easy.

Cool, I will take it and check it on ARM. Thank you.

Sincerely,
Andrii Anisov.
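[Editorial note: a build check of the branch for ARM, as proposed above, could be sketched as below. `XEN_TARGET_ARCH` and `CROSS_COMPILE` are the standard Xen build variables; the toolchain prefix is an assumption about the tester's environment, and the commands are not run here.]

```shell
# Cross-compile the hypervisor for arm64 from the top of the tree,
# assuming an aarch64 cross toolchain is installed
make -C xen XEN_TARGET_ARCH=arm64 \
    CROSS_COMPILE=aarch64-linux-gnu- -j"$(nproc)"
```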
end of thread, other threads:[~2019-07-02 10:25 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-01 14:08 [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit Jan Beulich
2019-07-01 15:10 ` Juergen Gross
2019-07-01 15:46 ` Jan Beulich
2019-07-02  6:30 ` Juergen Gross
2019-07-02  7:54 ` Jan Beulich
2019-07-02  8:14 ` Juergen Gross
2019-07-02  8:27 ` Jan Beulich
2019-07-02  8:44 ` Juergen Gross
2019-07-02  9:05 ` Jan Beulich
2019-07-02  9:16 ` Juergen Gross
2019-07-02  8:21 ` Dario Faggioli
2019-07-02  8:29 ` Jan Beulich
2019-07-02  9:40 ` Dario Faggioli
2019-07-02 10:01 ` Jan Beulich
2019-07-02 10:25 ` Juergen Gross
-- strict thread matches above, loose matches on Subject: below --
2019-05-28 10:32 [PATCH 00/60] xen: add core scheduling support Juergen Gross
2019-05-28 10:32 ` [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit Juergen Gross
2019-05-28 10:32 ` Juergen Gross
2019-06-13  7:18 ` Andrii Anisov
2019-06-13  7:29 ` Juergen Gross
2019-06-13  7:34 ` Andrii Anisov
2019-06-13  8:39 ` Juergen Gross
2019-06-13  8:49 ` Andrii Anisov