From: Dario Faggioli <raistlin@linux.it>
To: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Andre Przywara <andre.przywara@amd.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>,
	Juergen Gross <juergen.gross@ts.fujitsu.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Jan Beulich <JBeulich@suse.com>
Subject: Re: [PATCH 07 of 10 [RFC]] sched_credit: Let the scheduler know about `node affinity`
Date: Wed, 02 May 2012 17:13:06 +0200
Message-ID: <1335971586.2961.60.camel@Abyss>
In-Reply-To: <4F9AB0F8.10102@eu.citrix.com>



On Fri, 2012-04-27 at 15:45 +0100, George Dunlap wrote: 
> Hey Dario,
> 
Hi!

> Sorry for the long delay in reviewing this.
> 
No problem, thanks to you for taking the time to look at the patches so
thoroughly!

> Overall I think the approach is good.  
>
That's nice to hear. :-)

> >   /*
> > + * Node Balancing
> > + */
> > +#define CSCHED_BALANCE_NODE_AFFINITY    1
> > +#define CSCHED_BALANCE_CPU_AFFINITY     0
> > +#define CSCHED_BALANCE_START_STEP       CSCHED_BALANCE_NODE_AFFINITY
> > +#define CSCHED_BALANCE_END_STEP         CSCHED_BALANCE_CPU_AFFINITY
> > +
> > +
> This thing of defining "START_STEP" and "END_STEP" seems a bit fragile.  
> I think it would be better to always start at 0, and go until 
> CSCHED_BALANCE_MAX.
>
Ok, I agree that this is fragile and probably also a bit overkill. I'll
make it simpler as you're suggesting.
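
Something along these lines is what I have in mind (untested sketch,
and the exact macro names are mine, so they may well change in the
actual patch):

    /*
     * Node Balancing: steps are numbered from 0 up to (and not
     * including) CSCHED_BALANCE_MAX, starting with node affinity
     * and falling back to plain (v)cpu affinity.
     */
    #define CSCHED_BALANCE_NODE_AFFINITY    0
    #define CSCHED_BALANCE_CPU_AFFINITY     1
    #define CSCHED_BALANCE_MAX              2

    #define for_each_csched_balance_step(step) \
        for ( (step) = 0; (step) < CSCHED_BALANCE_MAX; (step)++ )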

> > +    /*
> > +     * Let's cache domain's dom->node_affinity here as an
> > +     * optimization for a couple of hot paths. In fact,
> > +     * knowing whether or not dom->node_affinity has changed
> > +     * would allow us to avoid rebuilding node_affinity_cpumask
> > +     * (below) during node balancing and/or scheduling.
> > +     */
> > +    nodemask_t node_affinity_cache;
> > +    /* Based on what dom->node_affinity says,
> > +     * on which CPUs would we like to run most? */
> > +    cpumask_t node_affinity_cpumask;
> I think the comments here need to be more clear.  The main points are:
> * node_affinity_cpumask is the dom->node_affinity translated from a 
> nodemask into a cpumask
> * Because doing the nodemask -> cpumask translation may be expensive, 
> node_affinity_cache stores the last translated value, so we can avoid 
> doing the translation if nothing has changed.
> 
Ok, will do.
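
So, something like the following for those fields (just a sketch of
the reworded comments, field names unchanged):

    /*
     * node_affinity_cpumask is dom->node_affinity translated from a
     * nodemask into a cpumask. As the nodemask-to-cpumask translation
     * can be expensive, node_affinity_cache holds the last translated
     * value, so that the translation can be skipped whenever
     * dom->node_affinity has not changed.
     */
    nodemask_t node_affinity_cache;
    cpumask_t node_affinity_cpumask;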

> > +/*
> > + * Sort-of conversion between node-affinity and vcpu-affinity for the domain,
> > + * i.e., a cpumask containing all the cpus from all the set nodes in the
> > + * node-affinity mask of the domain.
> This needs to be clearer -- vcpu-affinity doesn't have anything to do 
> with this function, and there's nothing "sort-of" about the conversion. :-)
> 
> I think you mean to say, "Create a cpumask from the node affinity mask."
>
Exactly, I'll try to clarify.
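
For reference, the body of the function would be along these lines
(illustrative sketch only: the details of the node-to-cpumask helper
may differ from what is actually in the patch):

    /* Create a cpumask from the node-affinity mask of the domain. */
    static void csched_build_balance_cpumask(struct csched_dom *sdom)
    {
        int node;

        cpumask_clear(&sdom->node_affinity_cpumask);
        for_each_node_mask ( node, sdom->dom->node_affinity )
        {
            cpumask_t node_cpus = node_to_cpumask(node);

            cpumask_or(&sdom->node_affinity_cpumask,
                       &sdom->node_affinity_cpumask, &node_cpus);
        }
    }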

> >   static inline void
> > +__cpumask_tickle(cpumask_t *mask, const cpumask_t *idle_mask)
> > +{
> > +    CSCHED_STAT_CRANK(tickle_idlers_some);
> > +    if ( opt_tickle_one_idle )
> > +    {
> > +        this_cpu(last_tickle_cpu) =
> > +            cpumask_cycle(this_cpu(last_tickle_cpu), idle_mask);
> > +        cpumask_set_cpu(this_cpu(last_tickle_cpu), mask);
> > +    }
> > +    else
> > +        cpumask_or(mask, mask, idle_mask);
> > +}
> I don't see any reason to make this into a function -- it's only called 
> once, and it's not that long.  Unless you're concerned about too many 
> indentations making the lines too short?
>
That was part of it, but I'll inline it back at its call site and see
if I can keep it looking good enough. :-)
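
Inlined back at its only call site (inside the balancing loop of
__runq_tickle), it would end up more or less like this (sketch,
assuming mask and idle_mask are local cpumask_t variables there):

    if ( !cpumask_empty(&idle_mask) )
    {
        CSCHED_STAT_CRANK(tickle_idlers_some);
        if ( opt_tickle_one_idle )
        {
            this_cpu(last_tickle_cpu) =
                cpumask_cycle(this_cpu(last_tickle_cpu), &idle_mask);
            cpumask_set_cpu(this_cpu(last_tickle_cpu), &mask);
        }
        else
            cpumask_or(&mask, &mask, &idle_mask);
    }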

> >      sdom->dom = dom;
> > +    /*
> > +     *XXX This would be 'The Right Thing', but as it is still too
> > +     *    early and d->node_affinity has not settled yet, maybe we
> > +     *    can just init the two masks with something like all-nodes
> > +     *    and all-cpus and rely on the first balancing call for
> > +     *    having them updated?
> > +     */
> > +    csched_build_balance_cpumask(sdom);
> We might as well do what you've got here, unless it's likely to produce 
> garbage.  This isn't exactly a hot path. :-)
> 
Well, I wouldn't call it garbage, so that's probably fine. My concern
was more about the thing being clear and meaningful enough. Having this
here could make us (and/or future coders :-D) think that we can rely on
the balance mask somehow, while that is not entirely true. I'll
re-check the code and see how I can make it better for the next
posting...
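
For instance, the all-nodes/all-cpus initialization I was hinting at
in the XXX comment would be something like this (just a sketch):

    sdom->dom = dom;
    /*
     * dom->node_affinity has not settled yet at this point, so start
     * fully permissive: all nodes and all cpus. The first balancing
     * step will then rebuild the masks from the actual node-affinity.
     */
    nodes_setall(sdom->node_affinity_cache);
    cpumask_setall(&sdom->node_affinity_cpumask);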

> Other than that, looks good -- Thanks!
> 
Good to know, thanks to you!

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


